Spam Filter

Time Limit: 2000/1000MS (Java/Others) Memory Limit: 128000/64000KB (Java/Others)

Problem Description

      Most spam filters use so called Bayesian filter to choose which message is spam and which is not. Let us describe the idea of such filter.
      The first stage of using Bayesian filter is its training. During training the filter is shown a number of messages for each of which it is specified whether it is spam or not. Denote the set of messages that are spam as Spam and the set of other messages as Good.
      Filter parses the messages to words and for each word w it calculates the probability that it is in a spam message:

      and the probability that it is not in a spam message

      After that for each word w in a message to classify the filter uses this information to calculate using Bayes’ formula:

      Here and .Here let 0/0 = 0 (this happens if the word did not occur in training messages).

      After that the ratio of words that indicate that M ∈ Spam with probability of at least 1/2 is calculated. If it reaches some threshold t the message is classified as spam. Note that each word is analyzed only once even if it occurs several time in a message.
      In this problem you will have to implement Bayesian spam filter. You will be given the set of messages to train and after that the set of messages to classify. For each message to classify you will have to tell whether it is spam or not.

Input

      The first line of the input file contains four integer numbers: s, g, n and t — the number of spam messages to train on, the number of good messages to train on and the number of messages to classify, and the threshold in percent (1 ≤ s ≤ 1000, 1 ≤ g ≤ 1000, 1 ≤ n ≤ 1000, 1 ≤ t ≤ 100).
      The following s lines contain one message per line that are specified to be spam. A message is the sequence of one or more words that consist of letters of  the English alphabet ('a'~'z', 'A'~'Z') and are separated and/or surrounded by spaces(' '), digits('0'~'9') and punctuation marks (‘-’, ‘,’, ‘.’, ‘!’, ‘?’). All words are case insensitive.
      The length of each message is at most 200 characters. Word length doesn’t exceed 20. The following g lines contain good messages, one per line, and the last n lines contain messages to be
classified, one per line.

Output

      For each message to be classified print “spam” if at least t% words indicate that it is spam, and “good” in the other case.

Sample Input

2 2 2 50
Buy our best computers!
You will find our computers best!
I have completed writing problems for trainings.
I have problems with my computer.
Computers? Solution!
I do not know what to buy. Need help.

Sample Output

spam
good

Source

Andrew Stankevich Contest 22

Manager

Information
Solved Number22
Submit Number107
Problem Tags
No tag edit access
温馨提示:AC后可以编辑标签哦. ^-^
Login
LoginCancel