The Ultimate Biostatistics Formula Sheet for Life Sciences
Biostatistics is the backbone of modern biological research, clinical trials, and epidemiology. Whether you are analyzing genetic variations, measuring the efficacy of a new drug, or tracking population ecology, understanding the right mathematical models is crucial. Bookmark this comprehensive, formal reference guide for your academic and research needs.
1. Descriptive Statistics & Data Dispersion
Before jumping into complex models, we must summarize our raw data. Descriptive statistics help researchers understand the central tendency and the variability (spread) of their biological samples.
| Metric | Formula | Biological Application |
|---|---|---|
| Sample Mean (x̄) | Σx / n | Calculates the arithmetic average of collected biological data points (e.g., average plant height). |
| Sample Variance (s²) | Σ(x - x̄)² / (n - 1) | Measures the average squared deviation from the mean. Dividing by (n - 1) instead of n is Bessel's correction, which makes s² an unbiased estimator of the population variance. |
| Standard Deviation (s) | √s² | Expresses the typical distance of data points from the mean, in the same units as the data. A low SD means data points are clustered near the mean. |
| Standard Error (SEM) | s / √n | Estimates how precise the sample mean is relative to the true population mean. Crucial for plotting error bars on graphs. |
| Coefficient of Variation (CV) | (s / x̄) × 100% | Standardized measure of dispersion. Used to compare variability between vastly different measurements (e.g., cell weight vs. organ weight). |
📝 Exemplar: Enzymatic Assay Analysis
Problem: A biochemist runs 4 trials to measure enzyme activity (in IU/L): 120, 130, 115, 135. Calculate the Mean, Standard Deviation, and Standard Error.
- Mean (x̄): (120+130+115+135) / 4 = 125 IU/L
- Variance (s²): [(120-125)² + (130-125)² + (115-125)² + (135-125)²] / 3 = 250 / 3 = 83.33
- Standard Deviation (s): √83.33 = 9.13
- Standard Error (SEM): 9.13 / √4 = 4.56
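The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the coefficient of variation is added from the table even though the problem does not ask for it.

```python
import math
import statistics

# Enzyme activity readings (IU/L) from the exemplar above
activity = [120, 130, 115, 135]

n = len(activity)
mean = statistics.mean(activity)          # Σx / n
variance = statistics.variance(activity)  # sample variance, divides by n - 1
sd = math.sqrt(variance)                  # s = √s²
sem = sd / math.sqrt(n)                   # SEM = s / √n
cv = sd / mean * 100                      # coefficient of variation, in percent

print(f"mean={mean:.2f}  s2={variance:.2f}  s={sd:.2f}  SEM={sem:.2f}  CV={cv:.2f}%")
```

Note that `statistics.variance` already applies Bessel's correction; `statistics.pvariance` would divide by n instead.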
2. Probability Distributions in Biology
Biological events are rarely absolute; they are probabilistic. These models help predict the likelihood of discrete and continuous biological occurrences.
| Distribution Type | Formula | Context |
|---|---|---|
| Binomial Probability P(X = x) | [n! / (x!(n-x)!)] × pˣ × (1-p)ⁿ⁻ˣ | Calculates the chance of exactly x successes in n trials. Heavily used in Mendelian genetics to predict offspring phenotypes. |
| Poisson Probability P(X = x) | (λˣ × e^(-λ)) / x! | Predicts rare events occurring in a fixed interval of time or space, such as the number of mutations on a DNA strand or bacterial colonies on a Petri dish. λ is the mean number of events per interval. |
| Z-Score (Normal Distribution) | Z = (x - μ) / σ | Standardizes a continuous variable. Tells you how many standard deviations a raw score (x) is from the population mean (μ). |
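The three formulas in this table translate directly into code. The sketch below uses only Python's `math` module; the function names and all example numbers (a 1-in-4 recessive phenotype, a mean of 2 mutations per interval, a population with μ = 120 and σ = 5) are hypothetical and chosen only for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) when events occur at a mean rate of lam per interval."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the population mean."""
    return (x - mu) / sigma

# Hypothetical monohybrid cross: each offspring shows the recessive
# phenotype with probability 0.25; chance that exactly 1 of 4 does.
p_one_recessive = binomial_pmf(1, 4, 0.25)

# Hypothetical mutation count with a mean of 2 per interval;
# chance of observing no mutations at all.
p_no_mutation = poisson_pmf(0, 2.0)

# Hypothetical measurement of 130 in a population with mean 120, SD 5.
z = z_score(130, 120, 5)

print(p_one_recessive, p_no_mutation, z)
```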
3. Inferential Statistics & Hypothesis Testing
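To publish research, scientists must show that their results are unlikely to be explained by random chance alone. This requires inferential statistics: computing a test statistic, converting it to a p-value, and comparing that p-value with a pre-chosen significance level (commonly α = 0.05).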
| Statistical Test | Formula | Application |
|---|---|---|
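| Independent t-test | t = (x̄₁ - x̄₂) / √[(s²₁/n₁) + (s²₂/n₂)] | Compares the means of two completely independent groups (e.g., Treatment Group vs. Placebo Group). This unpooled standard error is the Welch form, which does not assume equal variances in the two groups. |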
| Paired t-test | t = d̄ / (s_d / √n) | Compares means from the same group at different times (e.g., blood pressure before and after taking a drug). d̄ is the mean of the paired differences, s_d is their standard deviation, and n is the number of pairs. |
| ANOVA (F-ratio) | F = MSG / MSE | Used when comparing 3 or more groups simultaneously. Compares Mean Square between groups (MSG) to Mean Square Error within groups (MSE). |
| Chi-Square Test (χ²) | χ² = Σ [(O - E)² / E] | Used for categorical (count) data. Checks whether Observed frequencies (O) deviate significantly from Expected frequencies (E). |
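As a rough illustration, each test statistic in the table can be computed from raw data with the standard library alone. The function names and every data set below are hypothetical. The sketch stops at the test statistic: turning it into a p-value requires the matching t, F, or χ² distribution, which is normally taken from a statistics package or a table.

```python
import math
import statistics

def independent_t(a, b):
    """t = (mean_a - mean_b) / sqrt(s_a²/n_a + s_b²/n_b), unpooled form."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(before, after):
    """t = d̄ / (s_d / √n), computed on the per-subject differences."""
    d = [y - x for x, y in zip(before, after)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

def f_ratio(groups):
    """One-way ANOVA F = MSG / MSE for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def chi_square(observed, expected):
    """χ² = Σ (O - E)² / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data, for illustration only
t_ind = independent_t([2, 4, 6], [1, 2, 3])            # treatment vs. placebo
t_pair = paired_t([10, 12, 14], [11, 14, 17])          # before vs. after
f_stat = f_ratio([[1, 2, 3], [2, 3, 4], [3, 4, 5]])    # three groups
chi2 = chi_square([70, 30], [75, 25])                  # observed vs. 3:1 expectation

print(t_ind, t_pair, f_stat, chi2)
```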
4. Correlation and Regression
In life sciences, variables often impact one another. Regression helps us model these relationships, such as how ambient temperature affects the metabolic rate of reptiles.
| Metric | Formula | Application |
|---|---|---|
| Pearson Correlation Coefficient (r) | r = Σ((x-x̄)(y-ȳ)) / √(Σ(x-x̄)²Σ(y-ȳ)²) | Quantifies the strength and direction (-1 to +1) of a linear relationship between two continuous variables. |
| Linear Regression Line | y = β₀ + β₁x | Predicts the value of a dependent variable (y) based on an independent variable (x). β₀ is the intercept, β₁ is the slope. |
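A small sketch of both formulas follows. The table gives the form of the regression line but not how to estimate its coefficients, so the code assumes the usual ordinary least-squares estimates, β₁ = Σ(x-x̄)(y-ȳ) / Σ(x-x̄)² and β₀ = ȳ - β₁x̄. The temperature and metabolic-rate values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares fit."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: ambient temperature (°C) vs. reptile metabolic rate
temperature = [20, 25, 30, 35]
metabolic_rate = [1.1, 1.4, 2.1, 2.4]

r = pearson_r(temperature, metabolic_rate)
intercept, slope = least_squares_line(temperature, metabolic_rate)

# Predicted metabolic rate at 28 °C from the fitted line
predicted = intercept + slope * 28

print(f"r={r:.3f}  intercept={intercept:.3f}  slope={slope:.3f}  y(28)={predicted:.3f}")
```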
5. Population Genetics
These formulas are the foundation of evolutionary biology, helping researchers track changes in gene pools over generations.
| Principle | Formula | Application |
|---|---|---|
| Allele Frequency | p + q = 1 | For a gene with two alleles, p represents the frequency of the dominant allele and q represents the frequency of the recessive allele. |
| Hardy-Weinberg Equilibrium | p² + 2pq + q² = 1 | Predicts the genotype frequencies in a population that is not evolving. (p² = homozygous dominant, 2pq = heterozygous, q² = homozygous recessive). |
📝 Exemplar: Hardy-Weinberg Allele Tracking
Problem: In a forest, a population of 1000 moths has 160 individuals showing a recessive white phenotype. Assuming the population is in Hardy-Weinberg equilibrium, calculate the dominant allele frequency (p).
- q² (Frequency of recessive phenotype) = 160 / 1000 = 0.16
- q (Recessive allele frequency) = √0.16 = 0.4
- p (Dominant allele frequency) = 1 - 0.4 = 0.6
- Bonus: Heterozygote frequency (2pq) = 2 × 0.6 × 0.4 = 0.48 (480 moths).
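The same steps can be wrapped in a short function. This is a sketch that assumes Hardy-Weinberg equilibrium and a single gene with two alleles; the function name and dictionary keys are illustrative.

```python
import math

def hardy_weinberg(recessive_count, population):
    """Estimate allele and genotype frequencies from the recessive phenotype count."""
    q = math.sqrt(recessive_count / population)  # q² is the recessive phenotype frequency
    p = 1 - q                                    # p + q = 1
    return {
        "p": p,
        "q": q,
        "homozygous_dominant": p * p,    # p²
        "heterozygous": 2 * p * q,       # 2pq
        "homozygous_recessive": q * q,   # q²
    }

# Moth population from the exemplar above: 160 white moths out of 1000
freqs = hardy_weinberg(160, 1000)
expected_heterozygotes = freqs["heterozygous"] * 1000

print(freqs, round(expected_heterozygotes))
```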
6. Clinical Epidemiology & Diagnostic Metrics
In medical research, biostatistics is used to evaluate the accuracy of diagnostic tests (like PCR assays for viruses) and measure disease risks in populations.
| Epidemiological Metric | Formula | Clinical Context |
|---|---|---|
| Sensitivity (True Positive Rate) | TP / (TP + FN) | The ability of a test to correctly identify patients WITH the disease. High sensitivity rules out disease if negative. |
| Specificity (True Negative Rate) | TN / (TN + FP) | The ability of a test to correctly identify healthy patients. High specificity rules in disease if positive. |
| Positive Predictive Value (PPV) | TP / (TP + FP) | If a patient tests positive, what is the probability they actually have the disease? Unlike sensitivity and specificity, PPV depends on how common the disease is in the tested population. |
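| Relative Risk (RR) | [a/(a+b)] / [c/(c+d)] | Used in Cohort Studies. Compares the risk of developing a disease in an exposed group versus an unexposed group. In the standard 2×2 table, a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease. |
| Odds Ratio (OR) | (a × d) / (b × c) | Used in Case-Control Studies. Compares the odds of prior exposure among cases (sick patients) with the odds among healthy controls, using the same a, b, c, d cells. |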
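The relative risk and odds ratio formulas can be sketched as below. The code assumes the conventional 2×2 layout (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease); the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of the 2x2 table."""
    return (a * d) / (b * c)

# Hypothetical cohort: 100 exposed (30 diseased), 100 unexposed (10 diseased)
a, b, c, d = 30, 70, 10, 90

rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)

print(f"RR={rr:.2f}  OR={odds:.2f}")
```

When the disease is rare in both groups, the two measures give similar values; in this hypothetical table it is not rare, so they differ noticeably.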
📝 Exemplar: COVID-19 Rapid Test Accuracy
Problem: A new rapid test is trialed on 200 people. 100 actually have the virus, 100 do not. The test correctly identifies 90 infected people (True Positives) but misses 10 (False Negatives). Calculate Sensitivity.
- Formula: Sensitivity = TP / (TP + FN)
- Calculation: 90 / (90 + 10) = 90 / 100 = 0.90
- Answer: The test has a 90% Sensitivity.
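The three diagnostic formulas from the table can be computed together from the four cells of a confusion matrix. In this sketch TP = 90 and FN = 10 come from the problem above; the problem does not say how the 100 uninfected people tested, so the split into FP = 5 and TN = 95 is an assumption made purely for illustration.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def positive_predictive_value(tp, fp):
    """Probability of disease given a positive test: TP / (TP + FP)."""
    return tp / (tp + fp)

tp, fn = 90, 10   # from the exemplar above
fp, tn = 5, 95    # ASSUMED values; not given in the problem

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
ppv = positive_predictive_value(tp, fp)

print(f"sensitivity={sens:.2f}  specificity={spec:.2f}  PPV={ppv:.3f}")
```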