Biology-Online • View topic - help on correct descriptive statistics of a project

## help on correct descriptive statistics of a project

Hi!

Quick background information on my project:

24 subjects (animals) divided into 4 groups (each group represents a type of diet)

The research is to find out whether a difference in type of diet results in a different effect on coagulation. I know I have to use ANOVA or the Wilcoxon signed-rank test, but our prof is very critical about summarising the data correctly before we do any test, using only the appropriate measure (mean or median, not both).

So my question is:

As I said, the 24 subjects are divided into 4 groups, which means the number of subjects in each group is small. Should I use the mean or the median?

greetings from Belgium
Lucanus cervus
Garter

Posts: 27
Joined: Thu Dec 06, 2007 1:53 am
Location: Zele, Belgium

With an even number of animals in a group, the median would be the average of the two middle values. You would be justified in using it if you found that one of the values in a group was an outlier.
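To make that concrete, here is a small sketch in Python. The numbers are made up for illustration (they are not the poster's data), and the group size of six assumes 24 animals split evenly over 4 groups.

```python
from statistics import mean, median

# Hypothetical coagulation values for one group of six animals (made-up numbers).
group = [61, 62, 63, 64, 65, 66]
# Same group, with the last value replaced by an outlier.
with_outlier = [61, 62, 63, 64, 65, 90]

# With an even number of values, the median is the average of the two middle ones.
print(median(group))         # (63 + 64) / 2 = 63.5
print(mean(group))           # 63.5
print(median(with_outlier))  # 63.5, unchanged by the outlier
print(mean(with_outlier))    # 67.5, pulled upward by the outlier
```

The median stays put when a single extreme value appears, while the mean moves, which is why an outlier is an argument for the median.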
Living one day at a time;
Enjoying one moment at a time;
Accepting hardships as the pathway to peace;
~Niebuhr

mith
Inland Taipan

Posts: 5345
Joined: Thu Jan 20, 2005 8:14 pm
Location: Nashville, TN

A grouped scatter diagram shows all the data and lets the reader see for themselves how skewed the data might be. Usually, you show the median of each group as a small line amidst the vertically distributed points of the individual data. Whether you use means or medians depends on how the data seem to be distributed. My first guess would be that your data will not be severely skewed (but that is pure guesswork and could be wrong) and that the usual parametric tests (ANOVA and t-tests) would work just fine. In other words, means and standard deviations would (probably) be the way to go.
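A rough plain-text stand-in for such a grouped scatter diagram can be sketched in Python. This is only an illustration: the group labels and values are hypothetical, `strip_row` is a helper name invented here, and the points run horizontally rather than vertically because that is easier to print.

```python
from statistics import median

def strip_row(values, lo, hi, width=41):
    """Return one text row: 'o' for each data point, '|' at the group median."""
    def col(v):
        # Map a value onto a column index between 0 and width - 1.
        return round((v - lo) / (hi - lo) * (width - 1))
    row = [" "] * width
    for v in values:
        row[col(v)] = "o"
    row[col(median(values))] = "|"  # drawn last so the median stays visible
    return "".join(row)

# Hypothetical data: labels and values are made up for illustration.
groups = {
    "A": [61, 62, 63, 64, 65, 66],
    "B": [63, 64, 66, 67, 68, 71],
    "C": [66, 67, 68, 68, 69, 71],
    "D": [56, 59, 60, 61, 63, 64],
}
lo = min(min(vals) for vals in groups.values())
hi = max(max(vals) for vals in groups.values())

# All rows share one axis, so the groups can be compared by eye.
rows = {name: strip_row(vals, lo, hi) for name, vals in groups.items()}
for name, row in rows.items():
    print(name, row)
```

A plotting library would do this better; the point is only that every observation is shown, with the median marked per group.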

Most parametric tests, like standard t-tests and F-tests, assume an underlying normal distribution with random error. Chi-squared and t-statistics, in particular, can be strongly biased by departures from normality. F-tests, on the other hand, are fairly robust toward “deviant” behavior. Non-parametric tests like the Wilcoxon don’t assume a normal distribution and are valid under a wider range of situations. The Wilcoxon, though, is analogous to a t-test. The non-parametric analogue to ANOVA is Kruskal-Wallis. Hollander and Wolfe’s “Nonparametric Statistical Methods” is a great source for non-parametric analysis with lots of how-to examples alongside the theory of the tests.
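For anyone curious what Kruskal-Wallis actually computes, here is a minimal sketch of the H statistic in plain Python. It assumes the textbook formula without the tie correction, the function names are invented for this example, and the toy data are made up. For real work a tested routine such as `scipy.stats.kruskal` is the safer choice, and with groups this small the p-value is better taken from exact tables than from the chi-squared approximation.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the correction for ties."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Toy example with three completely separated groups (made-up numbers).
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))  # 7.2
```

Only the ranks enter the calculation, which is why the test does not depend on the data being normally distributed.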
blcr11
Viper

Posts: 672
Joined: Fri Mar 30, 2007 4:23 am

First of all, thank you very much for the replies!

To see how skewed the data are per group, I used Q-Q plots (advised by our prof). I also used dot plots (dot plots are said to be better than box plots when there are few subjects, because they use the maximum amount of information in the data).

I'm still not sure whether I have to use the mean or the median, because it's not clear to me whether the data in each group are normally distributed. You can see a few outliers in 3 of the 4 groups, but with so little data I can't tell whether they are systematic (I'll post the picture with the Q-Q plots).
Since, as you said, a normal distribution is needed for parametric tests, I want to be sure I draw the right conclusions. Thanks for the help!

And by the way, these forums are the best thing I ever found on the internet. Not only to ask questions but the quantity of good and interesting topics is huge!

Lucanus cervus
Garter

Posts: 27
Joined: Thu Dec 06, 2007 1:53 am
Location: Zele, Belgium

It’s going to be tough to get terribly good normal probability plots with such a small sample size. They look “OK” to my eye, but I warn you: what I am willing to accept and what your prof is willing to accept for deviations from normality may not be the same. You’d better take your lead from your prof, not from me, though you can have my opinion, for what it’s worth.

I notice there are two extra animals in Group D and two fewer animals in Group A than there should have been by design. There may be a perfectly good reason for that, but is there any chance that two animals in Group D were misclassified?

You might try both a histogram and another normal probability plot with all the data as if they represent a single sample from a single population. Would this ungrouped plot still fall on a straight line or does it look badly segmented? And do the data from any one group tend to fall in the same part of the line, assuming there is one, or are the data from all the groups likely to fall all up and down the same line with no evidence of clustering together in some way? Does the histogram of ungrouped data look like a uniform distribution, or is there any sign of either skewing or tailing or more than one mean or of clustering by groups?
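The coordinates for such a pooled normal probability plot can be sketched in Python as below. The data and group labels are hypothetical, `normal_scores` is a name made up for this example, and the plotting position (i - 0.5)/n is just one common convention.

```python
from statistics import NormalDist

def normal_scores(n):
    """Theoretical standard-normal quantiles for n ordered observations."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Hypothetical data: labels and values are made up for illustration.
groups = {
    "A": [61, 62, 63, 64, 65, 66],
    "B": [63, 64, 66, 67, 68, 71],
    "C": [66, 67, 68, 68, 69, 71],
    "D": [56, 59, 60, 61, 63, 64],
}

# Pool everything into one sample, but remember which group each value came from.
pooled = sorted((value, name) for name, vals in groups.items() for value in vals)
scores = normal_scores(len(pooled))

# Plot value against score; here the pairs are simply printed with the group label.
for (value, name), z in zip(pooled, scores):
    print(f"{z:6.2f}  {value:5.1f}  {name}")
```

If the printed pairs fall roughly on a straight line, the pooled sample looks normal. If one group's label fills a whole stretch of the list, the groups cluster in different parts of the line.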

I haven’t seen anything yet that strongly convinces me to switch from means and parametric statistics to medians and non-parametric statistics, though you can certainly calculate both and compare the results. When you use means as your measure of central tendency, you have a natural indicator of dispersion in the variance or standard deviation. When you use the median, however, you have a bit of a problem with measures of dispersion. Typically one quotes the 95th and the 5th percentiles or the quartiles or some particular interquartile range. You could also just give the total range of the data. The standard deviation has a natural relationship to the distribution function. Percentiles are just percentiles and remain percentiles no matter what the underlying distribution.
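The two styles of summary can be put side by side in a few lines of Python. The values are made up, and the quartiles use the default method of `statistics.quantiles`. Other quartile conventions exist and can give slightly different numbers with small samples.

```python
from statistics import mean, median, quantiles, stdev

# Made-up sample for illustration only.
values = [1, 2, 3, 4, 5, 6, 7]

# Mean-based summary: centre plus standard deviation.
print(mean(values), round(stdev(values), 3))  # 4 2.16

# Median-based summary: centre plus quartiles, interquartile range and total range.
q1, q2, q3 = quantiles(values, n=4)
print(median(values), q1, q3, q3 - q1)        # 4 2.0 6.0 4.0
print(min(values), max(values))               # 1 7
```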

These normal probability plots aren't what I was thinking of when I said scatter diagrams, not that you have to do them.
blcr11
Viper

Posts: 672
Joined: Fri Mar 30, 2007 4:23 am
