You are not logged in.
Had an interesting example the other day from basic renewal theory that can easily be explained without doing any math:
Suppose you're rolling a fair 6-sided die and you're interested in average number a trials to get a certain sequence, lets make it only 2 rolls long for simplicity, like "34". By grinding the math (involving probability generating functions) it can be shown that on average it takes 36 trials for "3 4" to appear, which totally makes sense ( since P(any particular #) = 1/6 => 6 trials for 1 number, 6^2 for two). But if you're waiting for "33" for example, surprisingly it takes 42 on average. (This was kind of funny cause prof took a poll before giving an answer and majority of class voted wrong - for needing less trials than 36). It has a really simple logical explanation though - if anybody wants to give it try.
Offline

Without doing any number crunching, here's why I think the 33 sequence should take a bit longer:
Why did the vector cross the road?
It wanted to be normal.
Offline

This problem implicitly treats 34 and 43 the same.
X'(y-Xβ)=0
Offline
I'd like to see the reasoning behind 1/36 and 1/42. Can you please post it?
Offline

Hi qmech;
When the OP posted he and the subsequent answers were unaware that I had already posed this problem.
http://www.mathisfunforum.com/viewtopic … =12832&p=6
Only I had used (1,1) and (1,2).
See posts 109 and 148 for an interesting problem set. If you cannot solve that I will post the math answer to the problem there.
If you want the reasoning what is wrong with post#2?
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
I looked at 109 and 148, but didn't see an explanation for this problem.
Yes, looking at the sequence 1,1 vs 1,2 should be the same thing.
I suppose that looking at the sequence Heads, Heads vs. Heads, Tails should also be the same for a coin flipping game.
If the problem is stated: "Let's toss a dice until 33 or 34 shows up. Then someone wins. Then we start all over again (from scratch). Keep going. Who wins more often, and what is the expected number of tosses before a win?", I don't see why one sequence is preferred over another.
Any number tossed other than 3 or 4 is noise and effectively restarts the game. Keep tossing a dice until a 3 shows up. The next toss determines if 33 wins, if 34 wins, or if the problem starts all over again. Since the next toss is equally likely to be 3 or 4, I don't see why the sequence 33 is preferred over 34 or vice versa.
Here is some C code for a simulation:
#include <conio.h>
#include <math.h>
#include <stdlib.h>
void main( void )
{
  unsigned ctr,toss, sw, n33, n34;
  float tot33, tot34;
  float ave33, ave34;
  clrscr();
  tot33=0L;
  tot34=0L;
  n33=0;
  n34=0;
l:
  for(sw=0,ctr=0;(n34<30000) && !kbhit();)
  { toss = 1+6.*rand()/RAND_MAX; /* printf("%d\n",toss); */
    ctr++;
    if (0==sw)
    { sw = (3==toss) ? 1 : 0; }
    else                      /* 1==sw */
    { switch(toss)
      { case 3: /*printf("33: %d\n",ctr);*/
     n33++; tot33+=ctr;
     ave33 = tot33/n33;
     goto l;
    case 4: /*printf("34: %d\n",ctr);*/
     n34++; tot34+=ctr;
     ave34 = tot34/n34;
     goto l;
    default: sw=0; break;
    } }
  }
  printf("33: #=%u; tot=%f; ave=%f\n",n33,tot33,ave33);
  printf("34: #=%u; tot=%f; ave=%f\n",n34,tot34,ave34);
}
The result of running this simulation until "34" has 30000 wins is:
33: #=30160; tot # tosses=633824.000000; ave # of tosses=21.015385
34: #=30000; tot # tosses=633490.000000; ave # of tosses=21.116333
Both a win with "33" and "34" are averaging 21 tosses. The 0.1 difference, and whether 33 or 34 is more, varies depending on how many games are run and the starting random number seed.
21 tosses seems reasonable because it's close to half of 36 + 2 (sequence length). For instance, the expected number of tosses needed to get one 3 by tossing one dice should be 3.5(?).
So I'm still wondering what the reasoning is behind the 36 and 42 numbers.
I do see that in the complete set of 3 tosses of a fair coin,
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT
there are 4 sequences containing HT, and only 3 containing HH. However if this were a game with a notion of time, then HH wins in 3 sequences and HT wins in 3 sequences. What happened is that HHT contains an HT sequence, but HH had already won this game on the 2nd move.
Offline

Hi
I looked at 109 and 148, but didn't see an explanation for this problem.
There is not any over there as it is a problem which I posed prior to this thread. 
I have run simulations too and get 36 and 42. Please check your simulation because 21 for both is not even close. Are you counting them by pairs?
Remember it has to be a (3,4) not a (4,3).
First let us clear some points up!
For instance, the expected number of tosses needed to get one 3 by tossing one dice should be 3.5?
For one thing the expected number of throws to get a 3 is 6 not 3.5. 3.5 is the expected value or mean, not the same as the expected number 6.So you automatically know that 21 is out. If you do not believe about the six run your own simulation.
Please rethink your first simulation also. Keep in mind that post #2 is absolutely correct. So they do not have the same expected number.
Super Important!
It will be useless to show you the math until you have convinced yourself using your own methods of what the answers might be.
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline

I think it's important to point out that, in isolation, the expected rolls to get a "33" string is more than the expected rolls to get a "34" string.
However, when trying for them together, either string is equally likely to show up first.
Those two things aren't a contradiction.
Why did the vector cross the road?
It wanted to be normal.
Offline

Yikes, you are telling the end of the movie, giving away the answer to my problem in the other thread! Maybe, no one will see it.
Yep, it is a crazy problem. Very counterintuitive.
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
I agree that the expected number of rolls to get one 3 by repeatedly tossing one dice is 6, not 3.5.
This shows up both in a simulation using code similar to the code I posted before, as well as summing the series:
s*1 + f*s*2 + f*f*s*3 + f*f*f*s*4 + ...
where s = 1/6 and f=5/6 (read success or failure), and the 1,2,3... are the number of tosses
(calculating the expected value of the number of tosses).
However we still disagree on the expected number of tosses for the sequences 33 and 34.
If you did a simulation, please post your simulation code.
If not, please post your derivation. There is no point in drawing this out further. I've already shown you my work, now please show me yours.
Offline

Hi qmech;
If the problem is stated: "Let's toss a dice until 33 or 34 shows up. Then someone wins. Then we start all over again (from scratch). Keep going. Who wins more often, and what is the expected number of tosses before a win?", I don't see why one sequence is preferred over another.
If not, please post your derivation. There is no point in drawing this out further. I've already shown you my work, now please show me yours.
There are 3 math solutions.
1) An analytical solution (difficult)
2) Absorbing Markov chain (harder)
3) Renewal theory ( easy to do , harder to understand)
Before we look at this, there is much more to draw out of this. Mathsypersons posts are the key to this problem.
You cannot do what you have done in your first quote (top of this post). That game and the OP's post are not the same thing. That is where you are going wrong. You must isolate both situations. Your game and your C code does not do that. Why should you believe the math when your simulation says something else?
You have a means of answering this question. I have said that you need to work on your program. Once you see the answer for yourself, it will be much easier for you to follow the coming math. 
Otherwise when I show you mathematically 36 and 42 you are going to say, bull! Please trust me, first things first!
Incidentally my simulation is written in Mathematica code. It is a functional programming language much different then the procedural C or C++.
I am preparing all 3 math answers, give me some time. In the meantime I urge you to recheck your thinking that produced that code.
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
If you have Mathematica code, please post it. I will try to read how your game is structured from the code.
If I'm not looking at the right game, please post a clear, concise description of the problem that you are describing. I will try to see how I've misinterpreted the problem.
Offline

Hi qmech;
First the math:
When one speaks of the "expected number of attempts needed to get one successful attempt," one might conservatively approximate it as the reciprocal of the probability of success for such an attempt.
So to get a rough estimate for 34 or 33 appearing in a string of {1,2,3,4,5,6}'s is 1 / (1 /36 ). So we have a rough estimate that it is around 36 throws. Could be more.
We do 33 first.
1) Analytical method:
Say that W is the number of throws to get a 33. So 5 / 6 of the time you will throw someting other than a 3 so that means you have made no progress  and it is W + 1, that is the first term.
The next possibility is a 3 and then something other than a 3, this is (W + 2) and happens (1 / 6) ( 5 / 6 ) of the time.
The third possibility is a 3 and then a 3, This takes 2 throws and happens ( 1 / 6 ) ( 1 / 6 ) of the time.
Solve for W.
2) Absorbing Markov chain:
With states of (3O,33,O3,OO ). O means anything other than a 3.
Going from state to state fills in the stochastic P matrix.
Standard calculations are then done on P. Outlined in any book on Markov chains.
Matrix Q:
Matrix S:
Fundamental Matrix T:
Times to absorption t:
The first and last columns of t are the important ones. It represents O -> 33, (meaning when starting from scratch). Notice the number in that column = 42.
3) Renewal theory:
Renewal theory says that the expected number of throws for 33 is with P(3) = p =1 / 6
Now for 34:
The analytical method is out because I believe it contains difficult series in it. I am not sure. So we stick to method 2 and 3.
2) Markov chain:
With states of (O,3,34 ). Going from state to state fills in the stochastic P matrix. In this case it is smaller than for 33.
Standard calculations are then done on P. Outlined in any book on Markov chains.
Matrix Q:
Matrix S:
Fundamental Matrix T:
Times to absorption t:
Look at the first column that is state O to state 34. Notice the 36 underneath that is the expected number.
3) Renewal theory says the expected number to get a 34 in a long string of {1,2,3,4,5,6}'s.
With P(3) = p =1 / 6 and P(4) = q = 1 / 6, is:
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
Thank you for showing your calculations.
I am getting the following results:
1. The game is: 1 person repeatedly throws 1 dice. He stops when he gets 3,3.
Result: The game takes on average 42 rolls. (weird!)
2. The game is: 1 person repeatedly throws 1 dice. He stops when he gets 3,4.
Result: The game takes on average 36 rolls.
3. The game is: 1 person repeatedly throws 1 dice. Each time he rolls he increments a "number of throws" counter. Each time gets a 3,3 he increments a "number of sequences" counter.
Result: After many throws, the average:
"number of sequences" / "number of throws" = 36.
4. The game is: 1 person repeatedly throws 1 dice. Each time he rolls he increments a "number of throws" counter. Each time gets a 3,4 he increments a "number of sequences" counter.
Result: After many throws, the average:
"number of sequences" / "number of throws" = 36.
5. The game is: 1 dice is repeatedly thrown. 2 people look at the same dice. Person A wins if a 3,3 comes up first. Person B wins if 3,4 comes up first.
Result: The game is fair. Person A and Person B win about equally often. Each game lasts about 21 throws.
Remember, a game stops if either sequence 3,3 or 3,4 show up.
6. The game is: 
Person A throws person A's dice. Person A wins if a 3,3 comes up on person A's dice. 
Person B throws person B's dice. Person B wins if a 3,4 comes up on person B's dice. 
Each person throws, then they look at their dice at the same time. 
Person A could win, Person B could win or they could both win.
Result: The game is not fair. Person B wins more often. 
Person B wins about 7% more often. 
The average number of games for person A to win is about 20.1.
The average number of games for person B to win is about 20.3
The results for games 1-5 seem ok. 
That game 6 isn't fair and that person B wins more often also seems ok. 
However I'm not sure why the average number of games is 20+ for scenario 6 vs. 21 for scenario 5.
Offline

Hi qmech;
1. The game is: 1 person repeatedly throws 1 dice. He stops when he gets 3,3.
Result: The game takes on average 42 rolls. (weird!)
Not exactly. Counterintuitive, our experience often does not hold up in probability. The math I posted and the simulation agree. I would think it weird if they did not.
Okay we are getting out of mathland and heading to simulation land. The difference between 5 and 6 seems perfectly natural to me.
Before I say why, let me ask you this. How big was your simulation? What random number generator did you use?
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
It's beginning to make intuitive sense.
Suppose you play a game between 2 people, and the rules of the game are as follows:
1. Player A wins if he gets a sequence 1,2.
2. Player B wins if he gets a sequence 3,4.
The expected number of rolls to win for both A and B are the same. It's a fair game.
Now modify rule 1:
3. Player A wins if he gets a sequence 1,2. However, upon rolling the first 1, he has to flip a coin. If the coin lands heads, he continues as before. If the coin lands tails, he has to start all over again (effectively erasing the '1' he already has).
Clearly player A is playing under a handicap. His expected number of rolls increases. A game between A and B is now clearly unfair.
Offline

Hi qmech;
Although I am a big fan of developing one's intuition, especially in competition, I would urge you to pay attention to my signature! I suggest checking when possible all gut feelings, when there is time to do so. That means developing the math skills and or running simulations.
Now back to the difference between 5) and 6).
Before I say why, let me ask you this. How big was your simulation? What random number generator did you use?
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline

thread post #2 (by mathsy) is very interesting indeed, and 
I think it is most likely true.  
If you want, make it a 2 sided die, and think about that for a simpler example.
igloo myrtilles fourmis
Offline
The random number generator is from Borland's Turbo C.
The number of simulations is greater than 1 million. 
To read the result of the simulation below, use "ave = tot / #".
sequence 33: #=1007117; tot=20197276.000000; ave=20.054548
sequence 34: #=1174266; tot=23728800.000000; ave=20.207346
Offline

Hi qmech;
Just as a point of interest random number generators, even from commercial packages are in some cases known to be poor. Especially ones that use linear congruential algorithms. It is possible to show under certain circumstances that they are not random enough. The newest one I believe is the Mersenne Twister algorithm, it is considered best.
I million is fine. The reason that 6 and 5 are not the same is that with 5 there is only one winner, with 6 there can be two winners. This probably accounts for the small difference.
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
Let's calculate the expected number of throws to get a 3,3 by counting the number 
of sequences that win in exactly n throws. Clearly the last 2 throws are 3,3 and 
that there are no 3,3 subsequences prior to that.
Call an n-seq a sequence of length n where the last 2 throws are 3,3.
For instance, some 6-seq are:
124533, 343533, 215533, 236533, but not 
433133, 335633, 145333.
Clearly the number of:
2-seq=1 (33)
3-seq=5 (133,233,433,533,633)
<expected number> = 
# throws=n * prob( n throws ), summed over n=1 to infinity.
= 2*(1/36) + 3*(5/216) + ...
We need to count the number of n-seq.
To create an n+1-seq from an n-seq, prepend a throw to the n-seq. If the first 
throw of the n-sequence was a 3, there are 5 choices [1,2,4,5,6]. If the first 
throw was not a 3, there are 6 choices - 5 of which are not 3 and 1 of which is a 3.
Call:
A(n) := n-seq whose 1st throw != 3.
B(n) := n-seq whose 1st throw  = 3.
Then:
A(3)=5, B(3)=0, and the number of 3-seq = A+B = 5.
A(4)=25,B(4)=5, and the number of 4-seq = A+B = 30.
The recursion relation is:
A(n+1) = 5*A(n) + 5*B(n)
B(n+1) = A(n);
Programming a computer to actually add all these numbers together gives:
expected value  = 41.972222, where I must be losing accuracy from rounding error.
A similar calculation for 3,4 with
A(n) := n-seq whose 1st throw != 4.
B(n) := n-seq whose 1st throw  = 4.
A(n+1) = 5*A(n) + 4*B(n)
B(n+1) =   A(n) +   B(n)
A(3)=5, B(3)=1
gives:
expected value  = 35.972222
Offline

Hi qmech;
I did not know you knew about recurrences. I would have tried to set up the problem using them instead of the 3 ways I did.
Okay, so now we have 4 ways to compute the expectation of 33 and 34. You can shorten your (33) recurrence to,
a[n+1] = 5 (a[n] + a[n-1])
In order for me to find where the roundoff error is occurring I need to see more of your calculation. Offhand that recurrence is what we call an error magnifier and should not be run in the forward direction as you have probably done.
For (33) again this is what your recurrence - sum idea really looks like.
Your idea is correct but it is cumbersome. You are trying to solve the problem by using the series expectation formula. While this correct it is very difficult to do this one in practice.
Is not the renewal theory or the analytical, that I showed easier?
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
The analytical approach is much easier. I've verified it works in another simpler problem. However I don't see the logic behind it yet. I have found some pdf's about it and am trying to digest them.
What do you mean by "run in the forward direction"?
Offline

It is very similar to the recursive thinking you did for your recurrence. It is a recurrence also.
Do you remember the links for the pdfs you found? Can you post them?
"Running it in the forward direction" means computing it just as you see it. For instance:
a[n+1] = 10 a[n] + 3 with a[0]=1
a[1]= 10 * 1 + 3 = 13
a[2]= 10 * 13 + 3 = 133
a[3] = 10 * 133 + 3 = 1333
.
.
.
That is the forward direction and in this case we say it is not robust because it suffers from smearing.
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
 Always satisfy the Prime Directive of getting the right answer above all else.
Offline
I presume took M =
5 5
1 0
By decomposing M = S D (S inverse), with D being the eigenvalues:
( 5+3root5)/2, (5-3root5)/2 )
with S containing the corresponding eigenvectors, you get:
M^n = S D^n (S inverse).
The entries of M^n are powers of the eigenvalues suitably
jumbled together by the eigenvectors. From there you get your sum.
The sum is in principle summable exactly since it is the differential
of a geometric series, but things are getting horribly messy by now.
Is there any easier way to do this?
Yes, I added the terms up in the "forward direction".
What is 'smearing', and is there a better way to do the sum (presumably backwards?)?
The 2 articles I found are:
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_bookChapter11.pdf
http://protist.biology.washington.edu/biol429/Lecture%20notes/Lecture8_Analyzing_Markov_Chains.pdf
Offline