Pages: 1
I need some help calculating a probability using Bayes' theorem.
Given that 1/10 of messages are spam. A spam detector will correctly identify a message as spam 89 percent of the time. The spam detector will also correctly identify a message as not spam 89 percent of the time. The detector has just flagged a message as being spam. What is the probability that the message is actually not spam?
I have come up with some numbers, but I am really not sure they are correct.
Could someone please guide me through this? I am really confused.
Last edited by zxcvbnm123 (2013-07-24 23:57:20)
Offline
You want:
P (not spam | flag)
P (not spam | flag) = P (not spam and flag) / P (flag)
How do we work out P (not spam and flag) ?
Given that 1/10 of messages are spam.
Does this mean that 0.1 of all messages are flagged as spam?
Or that 0.1 of all messages are spam?
So is it P(flag) = 0.1?
Or is it P(spam) = 0.1?
A spam detector will correctly identify a message as spam 89 percent of the time.
The spam detector will also correctly identify a message as not spam 89 percent of the time.
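As a sketch of how the ratio above could be expanded, assuming the two 89 percent figures are read as P(flag | spam) and P(~flag | not spam), the product rule and the law of total probability give the following. The symbols are the ones used in this post, and the reading of the percentages is an assumption, not something the question states outright.

```latex
\[
P(\text{not spam} \mid \text{flag})
  = \frac{P(\text{flag} \mid \text{not spam})\, P(\text{not spam})}
         {P(\text{flag} \mid \text{spam})\, P(\text{spam})
          + P(\text{flag} \mid \text{not spam})\, P(\text{not spam})}
\]
```

The numerator is P(not spam and flag), and the denominator is P(flag) split over the two ways a message can end up flagged.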
Last edited by SteveB (2013-07-24 07:56:05)
Offline
Sorry, can you elaborate more? I just assumed that I understood the theory and guessed the numbers. If I am wrong, please tell me in more detail.
Offline
The detector has just flagged a message as being spam. What is the probability that the message is actually not spam?
So we are looking for: P(not spam given that the message is flagged as spam)
Or using your notation: P(~spam | flag)
I am a little confused myself and I may need to think some more about this.
I am not sure either way at the moment. I am trying to draw a tree diagram on paper to help.
Last edited by SteveB (2013-07-24 08:22:57)
Offline
The question is pretty confusing for me too, but I think what I need to look for is P(not spam given that the message is flagged as spam).
Offline
I am wondering whether it might help to draw a diagram similar to this, adapting where needed
to suit the problem:
Spam event     Flag event     Outcome
Spam - true    - true         spam and flag
               - false        spam and not flag
     - false   - true         not spam AND flag
               - false        not spam AND not flag
In theory, the product rule that Bayes' theorem rests on gives P(A and B) = P(A | B) x P(B)
The trouble is: do we really know the values to plug into that formula?
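A rough sketch of how the tree could be filled in, assuming P(spam) = 0.1, P(flag | spam) = 0.89 and P(~flag | not spam) = 0.89. That reading of the question is an assumption at this point in the thread, and the names below (`P_SPAM`, `branch_probability`, `tree`) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed reading of the problem statement (not confirmed by the question itself):
P_SPAM = Fraction(1, 10)                      # P(spam)
P_FLAG_GIVEN_SPAM = Fraction(89, 100)         # P(flag | spam)
P_NOFLAG_GIVEN_NOT_SPAM = Fraction(89, 100)   # P(~flag | not spam)


def branch_probability(spam: bool, flag: bool) -> Fraction:
    """Product rule: P(spam state and flag state) = P(spam state) * P(flag state | spam state)."""
    p_state = P_SPAM if spam else 1 - P_SPAM
    p_flag = P_FLAG_GIVEN_SPAM if spam else 1 - P_NOFLAG_GIVEN_NOT_SPAM
    return p_state * (p_flag if flag else 1 - p_flag)


# The four leaves of the tree, keyed by (spam, flag).
tree = {(s, f): branch_probability(s, f) for s, f in product([True, False], repeat=2)}

for (s, f), p in tree.items():
    print(f"spam={s!s:5} flag={f!s:5} P={float(p):.3f}")
```

Exact fractions are used so that the four branches can be checked to sum to exactly 1.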
Offline
(0.10*0.11 + 0.9*0.89)
To my mind this reads as P(not spam | flag) x P(flag) + P(not spam | ~flag) x P(~flag)
Which, using Bayes, simplifies to: P(flag and not spam) + P(~flag and not spam)
Which perhaps also simplifies to: P(not spam)
0.10*0.11 / (0.10*0.11 + 0.9*0.89)
This I thought came from: P(flag and not spam) / P(not spam)
Is that what you meant and why you made that calculation?
If you let A = flag
and let B = not spam
Then apply Bayes of P (A and B) = P (A | B ) x P (B)
then this supports this logic.
But P(A | B) = P (flag | not spam)
unless I am mistaken we want P (not spam | flag)
So we want P (B | A) perhaps ?
P (B | A) x P(A) = P(B and A)
The trouble with that argument is that it does not work. Perhaps your answer is correct.
because P(B and A) = P (A and B)
so P (B | A) = P(A and B) / P(A)
The problem with that is that it gives us simply the answer 0.11 which does not make sense.
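To spell out why it collapses to 0.11, here is the arithmetic under the reading used in this post, where 0.10 is taken as P(flag) and 0.11 as P(not spam | flag). Both of those are assumptions about what the numbers were meant to be.

```latex
\[
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}
            = \frac{0.10 \times 0.11}{0.10}
            = 0.11
\]
```

The P(flag) factor cancels, so the calculation only hands back the conditional probability that was assumed in the first place.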
It would be daft to ask that question, so are you sure you have formulated everything correctly
and consistently?
Make sure you use the term "flag" consistently and do not confuse it with "spam".
Is 0.1 definitely the probability that a random message is spam?
Not the probability that a random message is flagged?
Given that 1/10 of messages are spam.
Does this mean that 0.1 of all messages are flagged as spam?
Or that 0.1 of all messages are spam?
So is it P(flag) = 0.1?
Or is it P(spam) = 0.1? (I think this version is more likely, but you have used the other.)
I will have to stop for now. I am sharing your confusion about this. It is a complicated problem
and the original formulation of the probabilities is not very clear.
Last edited by SteveB (2013-07-24 09:37:32)
Offline
I think the answer might be: 0.099 / (0.089 + 0.099)
Reason:
Let us assume that P(spam) = 0.1
and that P(not spam) = 0.9
Now I will consider the two possibilities of flag and not flag as if it were an event afterwards:
Spam Event     Flag Event      Probability of Spam Event AND Flag Event
True (0.1)     True (0.89)     0.089 *
True (0.1)     False (0.11)    0.011
False (0.9)    True (0.11)     0.099 *
False (0.9)    False (0.89)    0.801
The two that I have put * next to are very significant in this because both concern a
situation where the flag event is true. So in these cases the message has been flagged
as spam. So the total probability of this is (0.089 + 0.099).
We are also interested in the individual probability of the situation where the spam event
was false but the flag event was true. This is (0.099).
Hence P(not spam | flag ) = 0.099 / (0.099 + 0.089) = 0.526596 (to 6 dp.)
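The same calculation can be sketched as a small function, assuming P(spam) = 0.1 and that both 89 percent figures are the detector's accuracy on spam and on non-spam respectively. The function and parameter names here are just for illustration.

```python
def p_not_spam_given_flag(p_spam: float, hit_rate: float, correct_reject_rate: float) -> float:
    """P(not spam | flag) from the two flagged branches of the tree.

    p_spam              -- assumed P(spam)
    hit_rate            -- assumed P(flag | spam)
    correct_reject_rate -- assumed P(~flag | not spam)
    """
    p_spam_and_flag = p_spam * hit_rate                            # 0.1 * 0.89 = 0.089
    p_not_spam_and_flag = (1 - p_spam) * (1 - correct_reject_rate)  # 0.9 * 0.11 = 0.099
    return p_not_spam_and_flag / (p_spam_and_flag + p_not_spam_and_flag)


answer = p_not_spam_given_flag(0.1, 0.89, 0.89)
print(round(answer, 6))  # 0.526596
```

Changing the first argument shows how strongly the result depends on how rare spam is assumed to be.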
If you do that style of analysis with the assumption that P(flag) = 0.1,
then it gives a rather silly answer of 0.11. It is silly because it only uses half the tree
and does not use Bayes or any conditional logic. It is just the probability of incorrectly
flagging something that is not spam, which under this assumption is (1 - 0.89) = 0.11.
This seems too simple to be true for this type of problem.
So I prefer the answer of about 52.66%.
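As a rough sanity check on that figure, here is a small simulation sketch. It assumes P(spam) = 0.1 and that the detector is right 89 percent of the time whatever the message is. The function name, sample size and seed are arbitrary choices for illustration.

```python
import random


def simulate(n_messages: int = 200_000, seed: int = 0) -> float:
    """Estimate P(not spam | flag) by counting flagged messages in a random sample."""
    rng = random.Random(seed)
    flagged = 0
    flagged_not_spam = 0
    for _ in range(n_messages):
        is_spam = rng.random() < 0.1            # assumed P(spam) = 0.1
        detector_correct = rng.random() < 0.89  # assumed 89% accuracy either way
        # A correct detector flags spam and passes non-spam; a wrong one does the opposite.
        is_flagged = is_spam if detector_correct else not is_spam
        if is_flagged:
            flagged += 1
            if not is_spam:
                flagged_not_spam += 1
    return flagged_not_spam / flagged


estimate = simulate()
print(f"{estimate:.3f}")
```

The estimate should land close to 0.099 / (0.089 + 0.099), with some sampling noise in the later decimal places.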
Last edited by SteveB (2013-07-24 17:09:28)
Offline