help on correct descriptive statistics of a project
Moderators: Leonid, amiradm, BioTeam

 Garter
 Posts: 27
 Joined: Thu Dec 06, 2007 1:53 am
 Location: Zele, Belgium
help on correct descriptive statistics of a project
Hi!
Quick background information on my project:
24 subjects (animals) divided into 4 groups (each group represents a type of diet)
The research aims to find out whether the type of diet has an effect on coagulation. I know I have to use ANOVA or the Wilcoxon signed-rank test, but our prof is very strict about summarising the data correctly before we run any test, and about reporting only the appropriate summary (mean or median, not both).
So my question is:
As I said, the 24 subjects are divided into 4 groups, which means the number of subjects in each group is small. Should I use the mean or the median?
Greetings from Belgium
A grouped scatter diagram shows all the data and lets the reader see for themselves how skewed the data might be. Usually, you show the median of each group as a small line amid the vertically distributed points of the individual data. Whether you use means or medians depends on how the data seem to be distributed. My first guess would be that your data will not be severely skewed (but that is pure guesswork and could be wrong) and that the usual parametric tests (ANOVA and t-tests) would work just fine. In other words, means and standard deviations would (probably) be the way to go.
Most parametric tests, like standard t-tests and F-tests, assume an underlying normal distribution with random error. Chi-squared and t-statistics, in particular, can be strongly biased by departures from normality. F-tests, on the other hand, are fairly robust toward “deviant” behavior. Nonparametric tests like the Wilcoxon don’t assume a normal distribution and are valid under a wider range of situations. The Wilcoxon, though, is analogous to a t-test, so it compares only two samples at a time. The nonparametric analogue to ANOVA is Kruskal-Wallis. Hollander and Wolfe’s “Nonparametric Statistical Methods” is a great source for nonparametric analysis with lots of how-to examples alongside the theory of the tests.
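To make the ANOVA versus Kruskal-Wallis comparison concrete, here is a minimal sketch that computes both test statistics by hand using only the Python standard library. The coagulation values and group labels are invented for illustration (they are not the poster's data), and the function names are my own. It stops at the statistics themselves; to get p-values you would compare F against an F distribution with (k - 1, n - k) degrees of freedom and H against a chi-squared distribution with k - 1 degrees of freedom, or use `scipy.stats.f_oneway` and `scipy.stats.kruskal` if SciPy is available.

```python
from collections import Counter
from statistics import mean

# Hypothetical coagulation times per diet group (illustrative values only).
groups = {
    "A": [61.0, 59.5, 63.0, 60.5],
    "B": [64.0, 66.5, 70.0, 65.0, 67.5, 63.5],
    "C": [68.0, 66.0, 71.5, 67.0, 69.0, 68.5],
    "D": [57.0, 61.5, 60.0, 62.0, 63.5, 58.5, 62.5, 59.0],
}


def one_way_anova_f(samples):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    k, n = len(samples), len(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H statistic on pooled midranks, with tie correction."""
    all_values = [x for s in samples for x in s]
    n = len(all_values)
    ranks = midranks(all_values)
    total, start = 0.0, 0
    for s in samples:
        rank_sum = sum(ranks[start:start + len(s)])
        total += rank_sum ** 2 / len(s)
        start += len(s)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = Counter(all_values).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    return h / correction


samples = list(groups.values())
print(f"ANOVA F          = {one_way_anova_f(samples):.2f}")
print(f"Kruskal-Wallis H = {kruskal_wallis_h(samples):.2f}")
```

ANOVA works on the raw values and their squared deviations, while Kruskal-Wallis replaces each value with its rank in the pooled sample, which is why a single extreme value moves F much more than it moves H.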

 Garter
 Posts: 27
 Joined: Thu Dec 06, 2007 1:53 am
 Location: Zele, Belgium
First of all, thank you very much for the replies!
To see how skewed the data are per group I used Q-Q plots (advised by our prof). I also used dot plots (dot plots are said to be better than boxplots when there are few subjects, because they show the maximum amount of information in the data).
I'm still not sure whether I have to use the mean or the median, because it's not clear to me whether the data in each group are normally distributed. You can see a few outliers in 3 of the 4 groups, but with so little data I can't tell whether they are systematic. (I'll post the picture with the Q-Q plots.)
Because, as you said, a normal distribution is needed for parametric tests, I had better draw the right conclusions. Thanks for the help!
And by the way, these forums are the best thing I ever found on the internet. Not only to ask questions but the quantity of good and interesting topics is huge!
It’s going to be tough to get terribly good normal probability plots with such a small sample size. They look “OK” to my eye—but I warn you: what I am willing to accept and what your prof is willing to accept for deviations from normality may not be the same. You’d better take your lead from your prof, not from me, though you can have my opinion, for what it’s worth.
I notice there are two extra animals in Group D and two fewer animals in Group A than there should have been by design. There may be a perfectly good reason for that, but is there any chance that two animals in Group D were misclassified?
You might try both a histogram and another normal probability plot with all the data as if they represent a single sample from a single population. Would this ungrouped plot still fall on a straight line or does it look badly segmented? And do the data from any one group tend to fall in the same part of the line, assuming there is one, or are the data from all the groups likely to fall all up and down the same line with no evidence of clustering together in some way? Does the histogram of ungrouped data look like a uniform distribution, or is there any sign of either skewing or tailing or more than one mean or of clustering by groups?
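A rough sketch of that pooled normal probability plot, done with the Python standard library only: sort all 24 values together, pair each with the normal score it would be expected to have, and keep the group label so you can see whether one group's animals bunch together on the line. The data are invented for illustration, the plotting position (i - 0.5)/n is one common convention among several, and the function names are my own.

```python
from statistics import NormalDist, mean

# Hypothetical coagulation times per diet group (illustrative values only).
groups = {
    "A": [61.0, 59.5, 63.0, 60.5],
    "B": [64.0, 66.5, 70.0, 65.0, 67.5, 63.5],
    "C": [68.0, 66.0, 71.5, 67.0, 69.0, 68.5],
    "D": [57.0, 61.5, 60.0, 62.0, 63.5, 58.5, 62.5, 59.0],
}


def normal_scores(n):
    """Standard-normal quantiles at plotting positions (i - 0.5) / n."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def probability_plot(groups):
    """(normal score, value, group label) for the pooled, sorted sample."""
    pooled = sorted((x, label) for label, xs in groups.items() for x in xs)
    scores = normal_scores(len(pooled))
    return [(z, x, label) for z, (x, label) in zip(scores, pooled)]


points = probability_plot(groups)
for z, x, label in points:
    print(f"{z:6.2f}  {x:6.1f}  {label}")

# Straightness of the plot: values close to 1 mean the points lie near a line.
r = pearson_r([z for z, _, _ in points], [x for _, x, _ in points])
print(f"probability-plot correlation r = {r:.3f}")
```

Reading down the label column shows the clustering directly: if all the A's sit at one end and all the C's at the other, the groups differ in location even when the pooled plot looks reasonably straight.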
I haven’t seen anything yet that strongly convinces me to switch from means and parametric statistics to medians and nonparametric statistics, though you can certainly calculate both and compare the results. When you use means as your measure of central tendency, you have a natural indicator of dispersion in the variance or standard deviation. When you use the median, however, you have a bit of a problem with measures of dispersion. Typically one quotes the 95th and the 5th percentiles or the quartiles or some particular interquartile range. You could also just give the total range of the data. The standard deviation has a natural relationship to the distribution function. Percentiles are just percentiles and remain percentiles no matter what the underlying distribution.
These normal probability plots aren't what I was thinking of when I said scatter diagrams, not that you have to do them.
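As a small illustration of the two ways of summarising discussed above, this sketch prints the mean with its standard deviation next to the median with quartiles and range for each group. It uses only the Python standard library. The data are invented for illustration, and the quartiles come from the default "exclusive" method of `statistics.quantiles`, which is one of several conventions, so other software may give slightly different quartiles for groups this small.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical coagulation times per diet group (illustrative values only).
groups = {
    "A": [61.0, 59.5, 63.0, 60.5],
    "B": [64.0, 66.5, 70.0, 65.0, 67.5, 63.5],
    "C": [68.0, 66.0, 71.5, 67.0, 69.0, 68.5],
    "D": [57.0, 61.5, 60.0, 62.0, 63.5, 58.5, 62.5, 59.0],
}


def summarize(values):
    """Parametric (mean, SD) and rank-based (median, quartiles, range) summaries."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }


for label, values in groups.items():
    s = summarize(values)
    print(
        f"{label}: n={s['n']}  "
        f"mean={s['mean']:.2f} (SD {s['sd']:.2f})  "
        f"median={s['median']:.2f} "
        f"(Q1 {s['q1']:.2f}, Q3 {s['q3']:.2f}, "
        f"range {s['min']:.1f} to {s['max']:.1f})"
    )
```

Computing both is only for comparison while you decide; in the write-up you would report whichever pair matches the test you end up using.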