If a die is fair and I roll it 6000 times, then the expected number of occurrences of each number (1, 2, 3, 4, 5, 6) is 1000.
In reality, the numbers of occurrences will not all be the same. Is there any approach to determine whether the die is biased or not? If 6 comes up 3500 times, and each of the other numbers comes up 500 times, then I can see there is a clear bias. But I have no idea how to draw the line: how many occurrences indicate a biased die?
Does anyone have any suggestions?
Thanks in advance for any suggestions
Last edited by oem7110 (2010-12-09 01:44:43)
hi oem7110
If the number of throws is small, the number of sixes can be modelled using the binomial distribution, with P(6) as the probability of a six on each throw. For a large number of throws it is possible to approximate this by using the normal distribution. For any particular values you need access to normal distribution tables.
Maths is fun does have a normal distribution table but it's not so easy to use for 'right hand tail' questions like yours so I suggest
http://www.math.unb.ca/~knight/utility/NormTble.htm
The table here comes in two parts; the first is where the experiment yields results that are critical (ie. hard to tell if the die is biased or not); the second shows much less critical results because it's pretty obvious the die is biased. Your example of 3500 was off the end of even that table.
Binomial first: Say you assume the die is not biased. Then P(6) = 1/6, and, if you've rolled it 6000 times, then the expected number of sixes is 1000 with a variance of 6000 x 1/6 x 5/6 = 833.33.
Normal approximation: You then construct a normal distribution with the same parameters, ie E = 1000, V = 833.33.
Now back to your experiment. Is 3500 sixes unexpected for this distribution?
I don't need tables to tell me this is unusual, as this result is so far above the expected value. So let's take a more critical value.
Suppose you get 1100 sixes. Is this unusual?
You do the calculation (1100-1000)/SQRT(833.33) = 3.46 and look this up in normal tables.
This table gives for 3.4 the following right-hand tail probability (the value for 3.46 is smaller still):
3.4 ... 0.0003369
That's roughly how likely it is that a result of 1100 sixes or more could have come from an un-biased die, so it's very safe to assume it is biased.
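For anyone without tables to hand, the same lookup can be sketched in Python. This is only an illustration of the calculation above: the helper name `upper_tail` is made up for this example, and it assumes the standard identity P(Z > z) = erfc(z / sqrt(2)) / 2 for the right-hand tail of a standard normal.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z (hypothetical helper, not from the thread)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p = 6000, 1 / 6            # number of throws, chance of a six on a fair die
mean = n * p                  # expected number of sixes: 1000
variance = n * p * (1 - p)    # about 833.33
sixes = 1100                  # the observed count being tested

z = (sixes - mean) / math.sqrt(variance)
print(f"z = {z:.2f}")
print(f"table row 3.4: P(Z > 3.4) = {upper_tail(3.4):.7f}")
print(f"exact z:       P(Z > z)   = {upper_tail(z):.7f}")
```

The row for 3.4 reproduces the table entry quoted above; using the unrounded z gives a slightly smaller probability, which only strengthens the conclusion.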
Another even more critical example.
Suppose the number of sixes was 1050.
You do the calculation (1050-1000)/SQRT(833.33) = 1.73 and look this up in normal tables.
1.70 ... 0.9554
1.71 ... 0.9564
1.72 ... 0.9573
1.73 ... 0.9582
On this part of the table that's the probability of being to the left of the experimental result, so we need to do 1 - 0.9582 to get the probability of being beyond this value.
ie. There's a probability of 0.0418 of getting this result, or a higher one, from an un-biased die. That's pretty low, but is it so low that you're happy to reject the idea that the die is fair? You have to make a judgement. For a die, I'd be fairly happy to conclude it is biased but, if my life depended on getting a right result here, I think I'd want to repeat the experiment first.
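As a rough cross-check of the normal approximation, the exact binomial tail can also be summed directly on a computer. The sketch below is an illustration rather than part of the original working: the function names are invented for this example, and the sum is done through logarithms (`math.lgamma`) simply to avoid overflow in the binomial coefficients.

```python
import math

def binom_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1-p)^(n-i)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)   # far-tail terms underflow harmlessly to 0.0
    return total

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p, sixes = 6000, 1 / 6, 1050
z = (sixes - n * p) / math.sqrt(n * p * (1 - p))
exact = binom_upper_tail(n, p, sixes)
approx = normal_upper_tail(z)
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The two figures should land close to each other near the 5% mark, which is why a borderline result like 1050 calls for judgement rather than a mechanical verdict.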
Bob
Last edited by Bob (2010-12-09 05:09:50)
Children are not defined by school ...........The Fonz
You cannot teach a man anything; you can only help him find it within himself..........Galileo Galilei
Sometimes I deliberately make mistakes, just to test you! …………….Bob
hi oem7110
I've been thinking some more about this. Are you interested in further notes on the normal distribution? I've got a plan in my head for a full explanation with diagrams, but don't want to spend time putting it all into a post unless you're interested.
Bob
Could you please tell me how large the sample size needs to be to decide whether I should use the binomial or the normal distribution?
When I look up the value 3.46 in normal tables (0.0003369), how can I know whether the value indicates a biased or a fair die?
If I draw the line at accepting a 5% error, then when the number of sixes was 1050, and the calculated value 1.73 corresponds to 0.9582 in normal tables,
is the probability of 4.18% considered to indicate a fair die within 5% error? Am I on the right track?
Thank you very much for your suggestions
Last edited by oem7110 (2010-12-09 09:00:03)
Hi oem7110;
I see you are continuing with your dice problems! Are you planning to open a casino?
Okay, no more kidding. You will have a better understanding when you see Bob's tables. Regarding this:
"When I look up the value 3.46 in normal tables (0.0003369), how can I know whether the value indicates a biased or a fair die?"
In statistics you decide beforehand the criterion on which you will accept or reject the hypothesis that the die is fair, based on the standard deviation. In other words, I would pick 3 standard deviations first and then say: if I conduct the experiment and the outcome is outside of this, then I will say the die is biased. Sometimes we use 2 standard deviations and sometimes a 99% confidence level.
If some empirical experiment is outside of 3 standard deviations, that means there is less than one chance in 300 that the result could be due to chance. That does not mean it was definitely biased, just that it is a long shot that it is a fair die!
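The chance of a fair die landing outside a given number of standard deviations can be checked with a short Python sketch. It assumes the normal approximation and counts both tails (too many or too few sixes); `two_sided_tail` is a name chosen just for this example.

```python
import math

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z: chance of landing more than k SDs from the mean."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3):
    prob = two_sided_tail(k)
    print(f"outside {k} SD: probability {prob:.4f}, about 1 in {1 / prob:.0f}")
```

For 3 standard deviations this comes out at roughly 1 in 370, consistent with the "less than one chance in 300" figure, and for 2 standard deviations at roughly 1 in 22.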
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
Always satisfy the Prime Directive of getting the right answer above all else.
If I use the normal distribution, do I need a sample size of at least 300 in order to apply 2 or 3 standard deviations?
Is this sample size the minimum requirement for the normal distribution?
What about the binomial distribution? Does this sample size apply to the binomial distribution too?
Thanks everyone very much for any suggestions
Hi oem7110;
You do not need a sample size of 300. A good rule of thumb is that a sample size of 30 is a reasonable minimum for the Normal distribution.
There is one issue with selecting the right sample size that concerns me.
For example,
If I take a sample of 30 for evaluation based on the Normal distribution, the number of sixes is 20.
If I take a sample of 120 for evaluation based on the Normal distribution, the number of sixes is 20.
If I take a sample of 300 for evaluation based on the Normal distribution, the number of sixes is 20.
So, looking at it from the point of view of each sample size:
in the first 30 throws, 6 occurs 20 times; after that, no 6 occurs in further sampling.
The number of sixes seems too high based on a sample size of 30,
it seems fair based on a sample size of 120,
but it seems too low based on a sample size of 300.
Therefore, whether the die is judged fair or biased depends on how we select the sample size to fit what we want, and that is the problem that concerns me.
Does anyone have any suggestions on a mathematical approach to solving the issue of sample sizing?
Thanks everyone very much for any suggestions
Last edited by oem7110 (2010-12-09 17:47:27)
The question you are asking is different now.
"Therefore, whether the die is judged fair or biased depends on how we select the sample size to fit what we want, and that is the problem that concerns me."
As you are phrasing it, that may not be true.
"You do not need a sample size of 300. A good rule of thumb is that a sample size of 30 is a reasonable minimum for the Normal distribution."
There are many types of distributions other than normal. There is a theorem (the central limit theorem) which says that, no matter what type of distribution the population has, the mean of a randomly drawn sample from it will be approximately normally distributed provided the sample size is big enough. As a rule of thumb, when n is >= 30 the sample mean will be approximately normally distributed whatever the population distribution was like. That is all I said.
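One way to see the theorem at work is a small simulation. The sketch below is illustrative only: it repeatedly throws a fair die 30 times, takes the mean of each batch, and counts how often that mean falls within 1.96 standard errors of 3.5. If the means behave normally, the fraction should be near 95%. The function name, the fixed seed and the number of repetitions are all arbitrary choices for this example.

```python
import math
import random

def fraction_within(n, trials=10_000, z=1.96, seed=0):
    """Fraction of simulated sample means (n fair-die throws each) within z standard errors of 3.5."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    mu = 3.5                           # mean of one fair-die throw
    se = math.sqrt(35 / 12 / n)        # variance of one throw is 35/12
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(sample_mean - mu) <= z * se:
            hits += 1
    return hits / trials

print(fraction_within(30))
```

A single throw is uniformly distributed, nothing like a bell curve, yet the batch means already sit close to the normal prediction at n = 30.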
If you are asking how big a simulation (a sample) must be for you to have some confidence level for the mean you get, then there is a formula for that.
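The post does not say which formula is meant. A common textbook candidate, assumed here, is n = (z x sigma / margin)^2, where sigma is the standard deviation of a single observation, margin is how close you want the estimate to be, and z = 1.96 corresponds to 95% confidence. A minimal Python sketch for the proportion of sixes (where sigma = sqrt(p(1-p))):

```python
import math

def sample_size(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin (assumed textbook formula)."""
    return math.ceil((z * sigma / margin) ** 2)

p = 1 / 6                          # chance of a six on a fair die
sigma = math.sqrt(p * (1 - p))     # SD of a single six / not-six indicator

for margin in (0.02, 0.01):
    print(f"margin {margin}: n = {sample_size(sigma, margin)}")
```

Halving the margin roughly quadruples the number of throws needed, which is why very fine distinctions between a fair and a slightly biased die need long experiments.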
Hi oem7110
Edited since two hours ago.
I've been out all day so I've only just got up to date with all your posts.
"If I draw the line at accepting a 5% error, then when the number of sixes was 1050, and the calculated value 1.73 corresponds to 0.9582 in normal tables, is the probability of 4.18% considered to indicate a fair die within 5% error? Am I on the right track?"
Yes, that's about right I think! Since 4.18% is below your 5% line, you reject the idea that the die is fair. You should really word it like this: "If the die were fair, there would be only a 4.18% chance of getting a result this high, so I can reject the idea that the die is fair at the 5% level."
Your question is "What sample size do I need to be 95% confident that I can reject the hypothesis that the die is fair?"
Here's what a normal graph looks like:
[Figure: normal distribution curve]
E is the 'average' or expected value. It is found from the formula
$$E = np$$
The 'spread' of the curve is measured by the variance V, found from the formula
$$V = np(1 - p)$$
The normal tables that I suggested look like this:
[Figure: extract from the normal table]
The numbers starting 0.9 are the probabilities of being to the left of the line X.
So for 95% confidence you want the computation
$$\frac{X - E}{\sqrt{V}}$$
to evaluate to 1.65 or higher.
So if you know p, E, V and X you can work back to find n.
But the trouble is you will not know X until you've done your 'n' trials.
If you put p = 1/6 then
$$E = \frac{n}{6}, \qquad V = \frac{5n}{36}$$
So if you want
$$\frac{X - n/6}{\sqrt{5n/36}} = 1.65$$
re-arranging gives
$$n^2 - (12X + 5 \times 1.65^2)\,n + 36X^2 = 0$$
You can solve for 'n' using the quadratic formula.
I am just a bit worried I might have slipped up somewhere in this algebra so I'll check it when I'm more awake. (It's nearly 11.00pm GMT and I've had a long day including having to take someone with a broken arm to hospital!)
I've just tried X = 1050 in this and solved for 'n' and got two values of around 6000, so this seems about right.
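The solve-for-n step can be sketched in Python. This assumes the condition being solved is (X - n/6) / sqrt(5n/36) = 1.65 with p = 1/6; squaring both sides and multiplying through by 36 gives the quadratic n^2 - (12X + 5 x 1.65^2) n + 36X^2 = 0. The function names are invented for this illustration.

```python
import math

def solve_n(x, z=1.65):
    """Both roots of n^2 - (12x + 5z^2) n + 36x^2 = 0, smallest first."""
    b = -(12 * x + 5 * z * z)
    c = 36 * x * x
    root = math.sqrt(b * b - 4 * c)
    return (-b - root) / 2, (-b + root) / 2

def z_score(x, n):
    """(X - E) / sqrt(V) for a fair die: E = n/6, V = 5n/36."""
    return (x - n / 6) / math.sqrt(5 * n / 36)

low, high = solve_n(1050)
print(f"n = {low:.1f} gives z = {z_score(1050, low):+.2f}")
print(f"n = {high:.1f} gives z = {z_score(1050, high):+.2f}")
```

Squaring discards the sign, so the quadratic has two roots: the smaller one puts X = 1050 at 1.65 standard deviations above the expected count, the larger one at 1.65 below it. Only the smaller root answers the "too many sixes" question.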
Hope that answers the question,
Bob
Last edited by Bob (2010-12-10 23:46:10)
Thank you, everyone, very much for the suggestions and detailed explanations.
Do I need to try every possible X in order to determine n? For example, X = 1050; but when I try different values of X in the formula, could you please tell me when I should stop and how to find the right n?
Hi oem7110;
To work it the other way: let's say you have thrown a die 6000 times and you count 1050 threes.
How do you know whether that die is pretty much okay?
The expected number is:
$$6000 \times \frac{1}{6} = 1000$$
So you expect to get 1000.
The standard deviation is 28.867, so you say
$$1000 - 3(28.867) \le x \le 1000 + 3(28.867)$$
$$913.4 \le x \le 1086.6$$
Your 1050 fits neatly in between the left and right values, so the result is consistent with a fair die. If it did not, you should reject the hypothesis that the die is fair.
This uses 3 standard deviations from the mean. In other words, if you got 1200, for instance, you would say the die is biased, because there is less than one chance in 300 of that result happening with a fair die.
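The 3 standard deviation check can be written as a short Python sketch. The function name `fair_range` and its defaults are chosen for this illustration; the cut-off k = 3 is the one used in this post and is a convention, not a law.

```python
import math

def fair_range(n, p=1 / 6, k=3.0):
    """Interval mean +/- k standard deviations for the count of one face in n throws."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - k * sd, mean + k * sd

low, high = fair_range(6000)
print(f"accept counts between {low:.1f} and {high:.1f}")

for count in (1050, 1200):
    inside = low <= count <= high
    verdict = "consistent with a fair die" if inside else "reject fairness"
    print(f"{count}: {verdict}")
```

Note that with this stricter 3 standard deviation rule a count of 1050 is accepted, whereas a one-sided 5% rule would flag it, so the cut-off needs to be fixed before the experiment is run.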
hi oem7110
I went to sleep last night but the little guy in my head who does all the real thinking carried on trying to spot what I'd done wrong.
At 4.00am he woke me up to tell me, but I told him to shut up and went back to sleep.
There's an error with
It should be
It doesn't affect the working that follows because when I squared to eliminate the square root the sign error goes anyway.
My other worry was that I computed 'n' from a value of X = 1050 and got 6013. Should it have been more than 6000?
If the left graph here is the one for n = 6000 and the other the one for n = 6013, you can see that this second curve has more area to the right of X = 1050. In other words the probability has gone up from 4.18% to 5%. So that's ok.
As 'n' and 'X' are interdependent in the equations, you have got a problem of which you choose and which you then calculate. I don't think there's a way round that. If you put the formulas into a spreadsheet type program like 'Excel' you could experiment quickly with different values until you get a set you like.
What most statisticians would do, is decide on 'n', do the trials, analyse the results and then maybe decide to repeat with a larger value of 'n'.
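The same kind of experimenting can be done in a few lines of Python instead of a spreadsheet. The sketch below assumes the one-sided 5% rule used earlier (z = 1.65) and p = 1/6; for each chosen n it prints the count of sixes above which a fair die starts to look suspicious. `critical_count` is a name made up for this example.

```python
import math

def critical_count(n, p=1 / 6, z=1.65):
    """Count at which (X - np) / sqrt(np(1-p)) reaches z."""
    return n * p + z * math.sqrt(n * p * (1 - p))

for n in (60, 600, 6000, 60000):
    x = critical_count(n)
    print(f"n = {n:>6}: expect {n / 6:8.1f} sixes, "
          f"suspicious above {x:8.1f} ({x / n:.3f} of throws)")
```

The cut-off expressed as a fraction of throws creeps down towards 1/6 as n grows, so a larger experiment can detect a smaller bias.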
The little guy in my head says he enjoyed helping you with this question and is happy for you to post again if you've got another.
Bob