hi ducmod
I'll be happy to provide further help. You'll need to understand the formulas for variance and covariance. The book I have used to teach from is called Advanced Level Statistics by A Francis (ISBN 0-85950-451-4). My copy was printed in 1986, so it may not be available now.
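As a rough illustration of where variance and covariance come in, here is a minimal Python sketch of my own, not taken from the Francis book. The data values are made up. The n − 1 divisor is one common convention for sample statistics. It cancels out when variance and covariance are combined into the correlation coefficient.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Sum of squared deviations from the mean, divided by n - 1 (sample variance).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(xs, ys):
    # Like variance, but each x-deviation is multiplied by the matching y-deviation.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

# Correlation = covariance scaled by the two standard deviations.
r = covariance(xs, ys) / sqrt(variance(xs) * variance(ys))
print(variance(xs), covariance(xs, ys), r)
```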
Post further questions when you are ready.
Bob
Bob, thank you very much for your help! Trust me, I will be back very soon. It's challenging for me to learn math on my own, but it's the only option I have right now.
Have a great one!
hi ducmod
Welcome to the forum.
It sounds like you are describing the Pearson correlation coefficient. (There are others.)
There's a wiki page but, if you are new to this subject, you may find it difficult.
http://en.wikipedia.org/wiki/Pearson_pr … oefficient
There's a lot of theory behind that formula. I'll try to give you an outline.
If you take your x and y pairs and plot them on a graph, you'll get a scatter graph. You can get a good idea of how correlated two variables are just by looking at the points. If they are randomly scattered, there's no correlation at all. If they lie on a straight line, there's a perfect correlation between the two, and you could work out the equation of the line in the form y = mx + c.
That's unlikely in practice; the points may show a tendency towards a straight line without all of them lying exactly on it. The Pearson correlation coefficient is a way to calculate a number (always between -1 and +1) that indicates how close to a straight line the points are.
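To see those two extremes and the in-between case, here is a small Python sketch. The function name `pearson_r` and the "near a line" points are my own illustrations, not from any library or textbook. Points exactly on a rising line give +1, points exactly on a falling line give -1, and points that merely cluster near a line give something just short of 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data xs, ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
on_rising_line = [2 * x + 1 for x in xs]   # every point exactly on y = 2x + 1
on_falling_line = [5 - x for x in xs]      # every point exactly on y = -x + 5
near_a_line = [3.1, 4.8, 7.2, 8.9, 11.3]   # hypothetical points close to y = 2x + 1

print(pearson_r(xs, on_rising_line))
print(pearson_r(xs, on_falling_line))
print(pearson_r(xs, near_a_line))
```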
Pretend you had a line drawn on the scatter graph. For each point, calculate how far above or below the line it lies. Square this distance to ensure all these measures are positive. Now add them up. The resulting sum gives a measure of how well that line fits the points. If you change the line, all the calculations lead to a new sum. The best-fit line is the one that makes this sum of squares a minimum; in summary, it's the line for which the squared distances are the lowest they could be. The Pearson CC is built from that best-fit line.
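Here is a rough Python sketch of that "sum of squared vertical distances" for two candidate lines on the same points. The data and the helper name `sum_of_squares` are hypothetical. The second line, y = 0.6x + 2.2, happens to be the least-squares line for these particular points, so its sum comes out smaller than the first line's.

```python
def sum_of_squares(points, m, c):
    """Sum of squared vertical distances from each point to the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in points)

points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]   # hypothetical data

# Each candidate line gives its own sum; the smaller the sum, the better the fit.
print(sum_of_squares(points, 1.0, 1.0))   # line y = x + 1
print(sum_of_squares(points, 0.6, 2.2))   # line y = 0.6x + 2.2
```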
So there's a load of algebra for calculating these distances, then squaring and adding. Finding the minimum gives the 'm' and 'c' for the best-fit line*. From that you can generate a number that has the desired property (-1 ≤ r ≤ +1), and that's why the formula has that form.
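The algebra is not repeated here, but its end result is the standard closed-form answer: slope m = Sxy / Sxx and intercept c = mean(y) − m × mean(x), where Sxy is the sum of products of deviations and Sxx the sum of squared x-deviations. The Python sketch below uses that result, with made-up data and my own function names. It then rescales the slope by the spreads of x and y to get the correlation coefficient, which is one way of seeing how r comes out of the best-fit line.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (m, c) for the Y-on-X least-squares line y = m*x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx       # slope that minimises the sum of squared vertical distances
    c = my - m * mx     # the line passes through the point of means (mx, my)
    return m, c

def r_from_slope(xs, ys, m):
    """Rescale the slope by the spreads of x and y; the result lies in [-1, +1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return m * sqrt(sxx / syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]   # hypothetical data
m, c = least_squares_line(xs, ys)
print(m, c, r_from_slope(xs, ys, m))
```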
It's part of regression analysis, or the method of least squares. I cannot find either on the Maths Is Fun site, but there are bound to be loads of online sites that cover this. It's a while since I taught this, so I'd have to remind myself how it all goes. But if you're keen, I'll have a go. I'll warn you now: it will take a few posts to cover it all, and the algebra is tough (well, it is for me anyway). Post again if you want to know more.
Bob
* It's called the Y on X regression line and can be used to predict a fresh Y given an X. Strictly, if you wanted to predict an X from a Y you should calculate the X on Y line, which is done by calculating horizontal distances from the points to the line.
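A small Python sketch comparing the two regression lines on made-up data. The function name is my own. The Y-on-X slope comes from minimising vertical distances and the X-on-Y slope from minimising horizontal ones. The two lines only coincide when the correlation is perfect, and a standard result is that the product of the two slopes equals r squared.

```python
def slopes(xs, ys):
    """Slopes of the Y-on-X and X-on-Y least-squares lines."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    y_on_x = sxy / sxx   # change in y per unit x (vertical distances minimised)
    x_on_y = sxy / syy   # change in x per unit y (horizontal distances minimised)
    return y_on_x, x_on_y

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]   # hypothetical data
b_yx, b_xy = slopes(xs, ys)
print(b_yx, b_xy)      # two different lines, because the correlation is not perfect
print(b_yx * b_xy)     # product of the two slopes equals r squared
```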
Hello Bob!
Thank you so much for your answer and your explanations. Yes, indeed, I don't just want to learn this, I have to. When posting the initial question, though, I was actually thinking about a simpler issue: the logic behind multiplying the two variables, and what this multiplication represents and means.
I would be very grateful for your further help, but before that I have to do some additional work and start reading a textbook on statistics.
I am at the initial stage, learning math on my own. It's difficult and challenging, and I still can't develop an understanding of even basic things (sometimes more difficult ones seem to be much easier to grasp).
Thank you!
Hello!
Here is a quote of the Maths Is Fun explanation of the correlation formula, with my understanding or questions after each ##:
Let us call the two sets of data "x" and "y" (in our case Temperature is x and Ice Cream Sales is y):
Step 1: Find the mean of x, and the mean of y
Step 2: Subtract the mean of x from every x value (call them "a"), do the same for y (call them "b")
## with step 2 we compute how far each value is from its mean
Step 3: Calculate a × b, a² and b² for every value
## here I come to the point where I need help: I understand that we square each value from step 2 (a and b) to avoid negative numbers;
## but I don't understand the meaning (the logic, the why) of multiplying the step 2 values together, a × b
Step 4: Sum up a × b, sum up a² and sum up b²
Step 5: Divide the sum of a × b by the square root of [(sum of a²) × (sum of b²)]
## in step 5, again, I don't understand the logic of the multiplication (what does it mean?), and then of the division.
## usually, division shows how many times the divisor fits into the dividend, or gives a percentage.
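The five steps can be traced one by one in Python. In this sketch the temperatures and sales figures are made up, not the Maths Is Fun data, and the function name is my own. The comments point out two things worth noticing. First, a × b is positive when x and y are both above or both below their means, and negative when they sit on opposite sides. Second, after the Step 5 division, changing the units of x (here Celsius to Fahrenheit) leaves the result unchanged apart from rounding.

```python
from math import sqrt

def correlation_steps(xs, ys):
    """Follow Steps 1-5 of the recipe quoted above and return r."""
    n = len(xs)
    # Step 1: the mean of x and the mean of y
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Step 2: deviations from the mean ("a" for x, "b" for y)
    a = [x - mean_x for x in xs]
    b = [y - mean_y for y in ys]
    # Step 3: a*b, a^2 and b^2 for every pair.
    # a*b > 0 when x and y are on the same side of their means, < 0 when on opposite sides.
    ab = [ai * bi for ai, bi in zip(a, b)]
    a2 = [ai ** 2 for ai in a]
    b2 = [bi ** 2 for bi in b]
    # Step 4: the three sums
    sum_ab, sum_a2, sum_b2 = sum(ab), sum(a2), sum(b2)
    # Step 5: dividing by sqrt(sum_a2 * sum_b2) cancels the units and scale of x and y
    return sum_ab / sqrt(sum_a2 * sum_b2)

temps = [14, 16, 18, 20, 22]        # hypothetical temperatures (Celsius)
sales = [200, 260, 300, 330, 410]   # hypothetical ice cream sales

r = correlation_steps(temps, sales)
r_fahrenheit = correlation_steps([t * 9 / 5 + 32 for t in temps], sales)
print(r, r_fahrenheit)   # same value apart from floating-point rounding
```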
Thank you!