Statistics for the Biology IA
For your IA, you should always follow at least a 5x5 rule: have 5 IV conditions and 5 trials of each.
Once you have your data, you will normally want to put it in a raw data table like the ones below. Note that no math has been done anywhere in these tables, and that all units are in the headings.
Next, you will want to process your data. Make sure to include at least one worked example for every calculation you do, even if it is simple or if you are using Excel.
You will then present your data in a processed data table. This table should have the IV and DV as its main axes, and should thus be the place you can find the answer to your RQ.
Finally, you will graph your processed data. Notice that the graph below has axis labels with units, in addition to standard deviation error bars and a line of best fit with an r-coefficient.
Mean/Average
WHEN IS IT USED?
The arithmetic mean is used when you want to determine the central tendency of a group of numbers (the 'middle' of all the numbers). It is useful for processing different trials of an experiment and, given enough trials, can help overcome random error in the data.
BY HAND
The mean is calculated by adding all the values in a set, and dividing by the number of values. For example:
USING EXCEL
In Excel, type =average( into a cell, then highlight the values you would like to have averaged. Hit Enter and you are done!
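If you prefer to check your mean outside Excel, the same calculation can be sketched in Python. The trial values below are made up for illustration and are not taken from any data set in this guide.

```python
from statistics import mean

# Hypothetical trial values (cm) for one IV condition.
trials = [4.2, 4.5, 4.1, 4.4, 4.3]

# By hand: add all the values, then divide by the number of values.
by_hand = sum(trials) / len(trials)

# statistics.mean performs the same calculation.
from_library = mean(trials)

print(round(by_hand, 2), round(from_library, 2))
```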
Range
WHEN IS IT USED?
The range is used to determine how spread out your data is, and is used to help determine the reliability of your data.
Thus, a large range suggests that your data may not be reliable while a small range suggests more reliable results. Remember that 'large' and 'small' are subjective terms, and that the size of the range needs to be compared to the value of your data. For example, a range of "1" would be small if your minimum and maximum were 99 and 100, but would be huge if they were 0.5 and 1.5.
BY HAND
Subtract the smallest number in your data set from the biggest number. For example:
USING EXCEL
In Excel, type =max( into a cell, then highlight all the data in a set/trial. Hit Enter to get the maximum number in your set.
In Excel, type =min( into a cell, then highlight all the data in a set/trial. Hit Enter to get the minimum number in your set.
Subtract the minimum number from the maximum number.
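The max-minus-min steps can also be sketched in Python. Expressing the range as a percentage of the mean is only one possible way to compare the range to the size of your data; it is an assumption made here for illustration, and both data sets are made up.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


def range_as_percent_of_mean(values):
    """Illustrative helper: the range relative to the mean, in percent."""
    average = sum(values) / len(values)
    return data_range(values) / average * 100


# Two hypothetical data sets with the same range of 1.
around_100 = [99.0, 99.5, 100.0]
around_1 = [0.5, 1.0, 1.5]

for values in (around_100, around_1):
    print(data_range(values), round(range_as_percent_of_mean(values), 1))
```

Both sets have the same range, but relative to the mean it is tiny for the first set and very large for the second.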
Standard deviation
WHEN IS IT USED?
The standard deviation is used to measure the spread of data around the mean. This is often a more accurate indicator of the reliability of your data than the range, and shows you how clustered together or far apart your data points were.
USING EXCEL
In Excel, type =stdev( into a cell, then highlight the values whose standard deviation you would like to calculate. Hit Enter and you are done!
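For reference, here is a minimal Python sketch of the sample standard deviation (dividing by n - 1), which is the version Excel's =stdev( calculates. The trial values are made up.

```python
from math import sqrt
from statistics import stdev

# Hypothetical trial values (cm) for one IV condition.
trials = [4.2, 4.5, 4.1, 4.4, 4.3]

# Step 1: find the mean.
average = sum(trials) / len(trials)

# Step 2: add up the squared distance of each value from the mean.
sum_of_squares = sum((value - average) ** 2 for value in trials)

# Step 3: divide by (n - 1) and take the square root.
by_hand = sqrt(sum_of_squares / (len(trials) - 1))

# statistics.stdev gives the same sample standard deviation.
print(round(by_hand, 3), round(stdev(trials), 3))
```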
Line of best fit
WHEN IS IT USED?
A line of best fit is used to visually determine the relationship between two variables on a graph. Thus, this should only be used when the data is continuous.
BY HAND
On your line or scatter-plot graph, simply draw a straight line, using a ruler, that passes as closely through all your data points as possible. For example:
USING EXCEL
After making your graph in Excel, right-click on any data point in the graph and click on Add Trendline. Then, select Linear and click Close.
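Excel's linear trendline is a least-squares line. A minimal Python sketch of that calculation is shown below; the function name and the data points are made up for illustration.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical processed data: IV on x, mean DV on y (exactly y = 2x + 1).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = best_fit_line(xs, ys)
print(slope, intercept)
```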
r-Coefficient
WHEN IS IT USED?
The r-coefficient tells you how well your data matches the line of best fit. This number ranges from -1 to 1. If your line of best fit has a positive slope and goes through every point in your graph, your r-coefficient will be 1, meaning a perfect match. A -1 is also a perfect fit, but for a line of best fit with a negative slope. A value near 0 means there is little or no linear relationship.
Your r-coefficient is great to talk about in your conclusion and evaluation as a measure of how well your data fits a prediction. As a rule of thumb, two variables are only considered correlated if the r-coefficient is at least +/-0.5. More than +/-0.7 implies a fairly strong correlation. Do not expect your correlations to always be 0.99, as real data usually has errors.
Try this surprisingly fun game of correlation to see if you understood: http://guessthecorrelation.com/
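To see where the number comes from, here is a minimal Python sketch of the r-coefficient (Pearson's r). The function name and the data are made up for illustration.

```python
from math import sqrt


def r_coefficient(xs, ys):
    """Return Pearson's r for paired x and y values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # every point on a line with positive slope
falling = [10, 8, 6, 4, 2]   # every point on a line with negative slope

print(r_coefficient(xs, rising), r_coefficient(xs, falling))
```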
USING EXCEL
See the video on how to make a graph below.
t-test
WHEN IS IT USED?
The t-test is used to determine whether two different data sets are significantly different. For example, one could determine whether boys or girls were statistically taller based on collected data from your class. When using the t-test, it is important to state the decision rule in your data processing, which should look like this:
- If p > 0.05 then the two data sets are not statistically different. There is no significant difference between the two data sets.
- If p < 0.05 then the two data sets are statistically different. There is a significant difference between the two data sets.
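The decision rule above can be written as a small Python function. The rule does not say what to do when p is exactly 0.05; treating that case as not significant is an assumption made in this sketch.

```python
ALPHA = 0.05  # significance level from the decision rule


def decision(p_value, alpha=ALPHA):
    """Apply the decision rule to a p-value."""
    if p_value < alpha:
        return "significant difference"
    # Assumption: p exactly equal to alpha is treated as not significant.
    return "no significant difference"


print(decision(0.03))
print(decision(0.20))
```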
USING EXCEL
In Excel, type =ttest( into a cell, then highlight the two sets of data, separated by a comma. Then type , 1, 1) and hit Enter. Thus, your equation should look like this: =ttest("first data set", "second data set", 1, 1). The cell will show the p-value. The third argument is the number of tails (1 or 2) and the fourth is the type of test (1 = paired, 2 = two-sample with equal variance, 3 = two-sample with unequal variance), so change them if your two data sets are not paired.
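The p-value is based on a t statistic. Below is a minimal Python sketch of that statistic, assuming an unpaired two-sample test with equal variances; this is one common form of the test and not necessarily the settings used in the Excel formula. The heights are made up. Turning t into a p-value still needs a t-table, Excel, or a statistics package, which is not shown here.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / standard_error
    return t, na + nb - 2


# Hypothetical heights (cm) for two groups of five students.
group_a = [160, 162, 164, 166, 168]
group_b = [162, 164, 166, 168, 170]

t, df = two_sample_t(group_a, group_b)
print(round(t, 2), df)
```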
ONLINE COPY AND PASTE
http://www.physics.csbsju.edu/stats/t-test_bulk_form.html
ANOVA test
WHEN IS IT USED?
The ANOVA test is used to determine whether more than two data sets are significantly different. This should only be used when the data is continuous. For example, one could determine whether age had any effect on height after collecting this data from 12, 13, 14 and 15-year-olds. When using the ANOVA test, it is important to state the decision rule in your data processing, which should look like this:
- If p > 0.05 then the data sets are not statistically different. There is no significant difference between the data sets.
- If p < 0.05 then the data sets are statistically different. There is a significant difference between the data sets.
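The p-value of a one-way ANOVA comes from an F statistic, which compares the variation between the groups to the variation within them. A minimal Python sketch of that statistic is shown below; the function name and the data are made up, and converting F into a p-value needs an F-table or a statistics tool, which is not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Variation of each value around its own group mean.
    ss_within = sum(
        (value - m) ** 2
        for group, m in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data: three IV conditions with three trials each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f, df_between, df_within = one_way_anova_f(groups)
print(round(f, 2), df_between, df_within)
```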
ONLINE COPY AND PASTE
http://www.physics.csbsju.edu/stats/anova_pnp_NGROUP_form.html
For the data above, the value of p < 0.0001, so there is a significant difference. This means that the mass of sucrose in water does affect the volume of carbon dioxide gas produced after 10 minutes.
Uncertainties
Uncertainties are a measure of the accuracy of your data, and it is important to include them with all your measurements.
UNCERTAINTIES IN RAW DATA
When collecting raw data from an analog source, the uncertainty is half of the smallest unit.
On the thermometer above, the uncertainty would be +/- 0.5 °C
When collecting raw data from a digital source, the uncertainty is exactly the smallest unit.
On the digital scale below, the uncertainty would be +/- 0.1 g
UNCERTAINTIES WHEN ADDING OR SUBTRACTING MEASUREMENTS
When adding or subtracting data, simply add the uncertainties.
For example: (36.3 g. +/- 0.1 g.) + (82.2 g. +/- 0.1 g.) = 118.5 g. +/- 0.2 g
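The same rule can be sketched in Python. The function names are made up for illustration; the numbers are the ones from the example above.

```python
def add_measurements(a, ua, b, ub):
    """Add two measurements; the absolute uncertainties are added."""
    return a + b, ua + ub


def subtract_measurements(a, ua, b, ub):
    """Subtract two measurements; the absolute uncertainties are still added."""
    return a - b, ua + ub


total, total_uncertainty = add_measurements(36.3, 0.1, 82.2, 0.1)
print(round(total, 1), "+/-", round(total_uncertainty, 1), "g")
```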
CONVERTING UNCERTAINTIES INTO PERCENTAGES
Converting numerical uncertainties into percentages is important for any further mathematics with uncertainties. To do this, simply divide the uncertainty by the measured value and then multiply by 100%.
For example: 25.2 g. +/- 0.1 g.
(0.1 / 25.2) * 100% = 0.4%
So, 25.2 g. +/- 0.4%
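As a quick check, the conversion can be sketched in Python. The function name is made up for illustration; the numbers are the ones from the example above.

```python
def percentage_uncertainty(value, uncertainty):
    """Divide the uncertainty by the measured value, then multiply by 100."""
    return uncertainty / value * 100


percent = percentage_uncertainty(25.2, 0.1)
print(round(percent, 1), "%")
```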
UNCERTAINTIES WHEN MULTIPLYING OR DIVIDING MEASUREMENTS
When multiplying or dividing data, simply convert the relevant uncertainties to percentages and then add the percentages. You can then convert the percentage back into a numerical uncertainty.
For example: (36.3 g. +/- 0.1 g.) / (82.2 g. +/- 0.1 g.)
= (36.3 / 82.2) +/- ( (0.1/36.3)*100% + (0.1/82.2)*100% )
= 0.44 +/- ( 0.3% + 0.1% )
= 0.44 +/- 0.4%
= 0.44 +/- (0.44 * 0.4/100)
= 0.44 +/- 0.002
Note that the grams cancel when dividing, so the result has no units.
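The division example can be sketched in Python as follows. The function name is made up for illustration, and the sketch keeps full precision until the end, so intermediate values may differ slightly from the rounded hand calculation.

```python
def divide_measurements(a, ua, b, ub):
    """Divide a by b; return (value, percent uncertainty, absolute uncertainty)."""
    value = a / b
    # Convert each uncertainty to a percentage, then add the percentages.
    percent = (ua / a) * 100 + (ub / b) * 100
    # Convert the combined percentage back into a numerical uncertainty.
    absolute = value * percent / 100
    return value, percent, absolute


value, percent, absolute = divide_measurements(36.3, 0.1, 82.2, 0.1)
print(round(value, 2), "+/-", round(percent, 1), "%")
print(round(value, 2), "+/-", round(absolute, 3))
```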
UNCERTAINTIES WHEN CALCULATING THE AVERAGE
The uncertainty of your average will be the same as that of the raw data it was taken from. If your raw data has varying uncertainties, use percentage instead (see above).