Currently I'm working with positively skewed data.
I would like to calculate the range of values that contains x% of the data (the CI). How do I do this with skewed data?
Hi,
Why not post the exact problem?
In mathematics, you don't understand things. You just get used to them.
If it ain't broke, fix it until it is.
Always satisfy the Prime Directive of getting the right answer above all else.
The exact problem is that I'm trying to calculate the income range that covers 98% of the population.
The income distribution is positively skewed.
If the data were normally distributed, I would:
1. Calculate the mean and std dev.
2. Go 2 std dev below the mean to find the answer (i.e., roughly 98% of the population, more precisely about 97.7%, earns mean - 2 std dev or more).
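To make the normal-theory rule concrete, here is a small sketch using Python's statistics.NormalDist (Python 3.8+). The mean of 30,000 and std dev of 8,000 are made-up figures for illustration only. It checks what share of a normal population lies above mean - 2 std dev, and compares that with the exact cutoff that has 98% above it (the 2nd percentile). Both numbers are only meaningful if the normal assumption holds.

```python
from statistics import NormalDist

# Hypothetical figures for illustration only: mean and std dev of income.
mean, sd = 30_000.0, 8_000.0
income = NormalDist(mu=mean, sigma=sd)

# Step 2 of the rule: the cutoff two standard deviations below the mean.
cutoff = mean - 2 * sd
share_above = 1 - income.cdf(cutoff)  # fraction earning at least the cutoff

# The exact cutoff with 98% above it is the 2nd percentile of the normal.
exact_cutoff = income.inv_cdf(0.02)

print(f"mean - 2 sd = {cutoff:.0f}, share above = {share_above:.4f}")
print(f"2nd percentile = {exact_cutoff:.0f}")
```

The share above mean - 2 std dev comes out at about 97.7%, and the exact 98% cutoff sits roughly 2.05 std dev below the mean.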
Is that all you have? What is the PDF? How can anyone compute the area under a curve that is unknown? Do you have the data?
Sorry bobbym, I don't have the data.
This is just a theoretical question, as my professor couldn't answer my question about using a CI from the normal distribution on non-normally distributed data, i.e., the income distribution.
Am I right to assume that a CI from normal distribution theory cannot be used to describe non-normally distributed data?
If so, how do I calculate the CI for non-normally distributed data?
hi bechau
Welcome to the forum.
The normal distribution is probably the most studied statistical function. It's symmetrical, and just two parameters (mean and standard deviation) are sufficient to completely determine its behaviour. Once you start looking at other distributions, you need data in order to make similar analyses.
The starting point would have to be to gather lots of actual figures for income. Once you have that, you might be able to fit a function to the data, but, you'll probably already have the answer to your question as it will be embedded in the data.
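As a sketch of "the answer will be embedded in the data": once you have raw income figures, the range for a given share of the population comes straight from percentiles, with no normal assumption. The percentile_inc function below is a hypothetical helper that interpolates linearly at position p * (n - 1) in the sorted data; as far as I know this is the same rule as Excel's PERCENTILE.INC and Python's statistics.quantiles(method="inclusive"). The incomes are simulated from a log-normal distribution with random.Random.lognormvariate, not real data.

```python
import random

def percentile_inc(values, p):
    """Percentile by linear interpolation between sorted values, 0 <= p <= 1.

    Uses the 'inclusive' rule: position p * (n - 1) in the sorted list.
    """
    xs = sorted(values)
    if not xs or not 0 <= p <= 1:
        raise ValueError("need data and 0 <= p <= 1")
    pos = p * (len(xs) - 1)
    lo = int(pos)                      # index at or below the position
    hi = min(lo + 1, len(xs) - 1)      # next index (clamped at the end)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

# Simulated, right-skewed "incomes" (log-normal), NOT real data.
rng = random.Random(1)
incomes = [rng.lognormvariate(10, 0.6) for _ in range(10_000)]

low_98 = percentile_inc(incomes, 0.02)        # 98% earn at least this
low_95 = percentile_inc(incomes, 0.025)       # middle 95%: lower end
high_95 = percentile_inc(incomes, 0.975)      # middle 95%: upper end
print(f"98% earn at least {low_98:,.0f}")
print(f"middle 95%: {low_95:,.0f} to {high_95:,.0f}")
```

For a one-sided question ("98% earn at least X") only the 2nd percentile is needed; for a two-sided range covering 95%, take the 2.5th and 97.5th percentiles.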
Also, bear in mind that anything you can calculate will only be valid for that 'population'. Income values vary enormously around the world.
There's a small article at
http://www.mathsisfun.com/data/skewness.html
and a longer one at
http://en.wikipedia.org/wiki/Skewness
Bob
Children are not defined by school ...........The Fonz
You cannot teach a man anything; you can only help him find it within himself..........Galileo Galilei
Sometimes I deliberately make mistakes, just to test you! …………….Bob
Am I right to assume that a CI from normal distribution theory cannot be used to describe non-normally distributed data?
Without knowing more I can do little. Skewed curves can look very different from the standard normal curve (SNC).
You mention the income distribution. Is that the distribution you want to compute the percentages from?
Read the links given by Bob and see if they help.
Thanks guys.
Actually, the short article is the one that got me thinking and asking the professor that very question.
In his class, he showed a theoretical distribution of income that looks exactly like the one in the short article: a positively skewed distribution (the peak is towards the lower incomes, with a long tail to the right). There was no real data behind it, and we were just debating how to estimate the CI. I said that since the data is skewed, we cannot use Excel's STDEV value to mark the CI range, and he agreed. But when I asked him how to define the CI for skewed data, he couldn't answer.
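Here is a quick check of why STDEV alone misleads for skewed data. For illustration I assume a log-normal shape (log of income is normal with parameters mu and sigma), a common textbook model for right-skewed incomes; the choice mu = 0, sigma = 1 is arbitrary. The sketch compares the "normal" cutoff mean - 2 sd with the true 2nd percentile of the same distribution.

```python
import math
from statistics import NormalDist

# Assumed model for illustration: log(income) ~ Normal(mu, sigma), a log-normal.
mu, sigma = 0.0, 1.0

# Mean and standard deviation of the log-normal itself (standard formulas).
mean = math.exp(mu + sigma**2 / 2)
sd = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))

# The "normal" rule versus the real 2nd percentile of this distribution.
naive_cutoff = mean - 2 * sd
true_cutoff = math.exp(NormalDist(mu, sigma).inv_cdf(0.02))

print(f"mean - 2 sd    = {naive_cutoff:.3f}")
print(f"2nd percentile = {true_cutoff:.3f}")
```

Here mean - 2 sd comes out negative, which no income can be, while the actual 2nd percentile is a small positive number (about 0.128). The percentile, not the std dev, is what answers the question.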
As for the long article, I read it but didn't find the answer I was looking for, i.e., what income range 95% of the population falls into.
If I had the real data, I could calculate it from the cumulative numbers of people in each income interval and find the answer to my question.
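The cumulative-count approach can be sketched like this, assuming the data comes as a frequency table of income intervals and that people are spread evenly within each interval (the usual grouped-data median formula, generalised to any percentile). The function name grouped_percentile and the frequency table are hypothetical, made up for illustration only.

```python
def grouped_percentile(edges, counts, p):
    """Estimate the p-th quantile (0 <= p <= 1) from a frequency table.

    edges:  interval boundaries, len(counts) + 1 values, increasing
    counts: number of people in each interval [edges[i], edges[i+1])
    Assumes people are spread evenly within each interval.
    """
    total = sum(counts)
    target = p * total          # how many people lie below the answer
    cumulative = 0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= target:
            # fraction of this interval needed to reach the target
            frac = (target - cumulative) / c
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cumulative += c
    return edges[-1]

# Made-up frequency table, for illustration only (incomes in thousands).
edges = [0, 10, 20, 30, 50, 100, 200]
counts = [50, 300, 250, 250, 120, 30]
print(grouped_percentile(edges, counts, 0.02))   # 98% earn at least this
print(grouped_percentile(edges, counts, 0.50))   # grouped median
```

With this table, 20 of the 1,000 people fall below the 2nd percentile, which lands 40% of the way through the first interval, i.e. at 4 (thousand).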
However, what I really want to know is how to use Excel on a set of sample, non-normally distributed data to answer this question.
I'll check out the Excel function. Back later.
Last edited by Bob (2014-04-07 23:11:42)
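The Excel function SKEW calculates

$$\text{SKEW} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where \bar{x} is the sample mean and s is the sample standard deviation.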
This is only one of a number of formulas for calculating skewness.
In one of my examples I had a frequency distribution (i.e., a column of x values and a column of frequencies). I calculated the skew from the formula above and also using SKEW, and got wildly different answers. Then I realised that the Excel SKEW function has no input for frequencies, so it was treating each of my x values as if it occurred just once. So it cannot be used in these circumstances. I haven't found a 'formula' that will act as a generating function. You probably have to create a scatter graph of income against frequency and then try to fit a function by trial and improvement.
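The frequency problem can be worked around by putting the frequencies into the SKEW formula directly: n becomes the total frequency, and each term is weighted by how often its x value occurs. This is a sketch; skew_from_frequencies is a hypothetical helper, and the small table is made up for illustration.

```python
import math

def skew_from_frequencies(xs, freqs):
    """Skewness of data given as values xs with frequencies freqs.

    Applies the SKEW formula n/((n-1)(n-2)) * sum(((x - mean)/s)^3)
    as if each x were repeated freq times.
    """
    n = sum(freqs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(x * f for x, f in zip(xs, freqs)) / n
    # sample standard deviation (n - 1 denominator)
    s = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, freqs)) / (n - 1))
    cubes = sum(f * ((x - mean) / s) ** 3 for x, f in zip(xs, freqs))
    return n / ((n - 1) * (n - 2)) * cubes

xs, freqs = [1, 2, 3, 10], [5, 3, 2, 1]
print(skew_from_frequencies(xs, freqs))       # every observation counted
print(skew_from_frequencies(xs, [1] * 4))     # like SKEW on the x column alone
```

The second call treats each x value as occurring once, which is what SKEW does when given only the x column, so the two results differ for the same reason Bob's did.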
Bob
This is just a theoretical question, as my professor couldn't answer my question about using a CI from the normal distribution on non-normally distributed data, i.e., the income distribution.
A smooth kernel distribution is the theoretical answer to your theoretical question. With it you can compute the mean, variance and other moments. Also, the PDF can be integrated to give the areas under the curve, which are the probabilities you want.
However, what I really want to know is how to use Excel on a set of sample, non-normally distributed data to answer this question.
A smooth kernel distribution is possible provided you have the data points. This should be useful because we can then treat it like any other PDF. If you have a picture of the curve, post it and the data points can be retrieved.
The trouble is that, although many programming languages can create this smooth kernel distribution, Excel is not one of them.
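Outside Excel, a smooth kernel distribution (a kernel density estimate) takes only a few lines. This sketch uses a Gaussian kernel: the estimate is an equal-weight mixture of normal curves centred on each data point, so the area under it up to x is just the average of their normal CDFs, and a percentile can be found by bisection. The helper names, the simulated log-normal sample and the bandwidth rule (the normal-reference rule of thumb, 1.06 * sd * n^(-1/5)) are all assumptions for illustration; other kernels and bandwidths are possible.

```python
import random
from statistics import NormalDist, stdev

def kde_cdf(data, h, x):
    """Area under a Gaussian kernel density estimate (bandwidth h) up to x."""
    return sum(NormalDist(xi, h).cdf(x) for xi in data) / len(data)

def kde_quantile(data, h, p, tol=1e-6):
    """Find x with kde_cdf(x) = p by bisection (the CDF is increasing)."""
    lo, hi = min(data) - 10 * h, max(data) + 10 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(data, h, mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Simulated, right-skewed sample (NOT real income data).
rng = random.Random(2)
sample = [rng.lognormvariate(0, 0.5) for _ in range(500)]

# Normal-reference rule-of-thumb bandwidth (an assumed choice).
h = 1.06 * stdev(sample) * len(sample) ** (-1 / 5)

low = kde_quantile(sample, h, 0.02)
print(f"about 98% of the smoothed distribution lies above {low:.3f}")
```

One caveat: a Gaussian kernel can put a little probability below zero, which real incomes cannot have, so for data near zero a log transform or a different kernel may be more sensible.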