Don't give me any of that "95 percent" crap!

In this dialogue from the TV show Monk, the characters are discussing how certain they can be that a suspect is the real killer. Captain Stottlemeyer wants to be sure, and Monk claims he is one hundred percent sure, which turns out to mean 95%. The exchange may sound like nonsense, but it illustrates the ideas of probability and confidence intervals in medical research rather well.



Captain Stottlemeyer: Monk... are you sure? I mean, are you really sure? And don't give me any of that "95 percent" crap.

Adrian Monk: Captain, I am one hundred percent sure... that she probably killed him.

Captain Stottlemeyer: What does that mean?

Adrian Monk: 95 percent.


He is one hundred percent sure, which means 95% 😕

In the context of medical research, the 95% that Monk mentions corresponds to the confidence level of a confidence interval. A confidence interval is a range within which a particular parameter, such as the efficacy of a treatment, is likely to fall, and it expresses the uncertainty around our estimate of that parameter's true value. When researchers report a 95% confidence interval, they mean that if the study were repeated 100 times and an interval were calculated each time, about 95 of those 100 intervals would contain the true value.
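As a rough illustration, the sketch below computes a 95% interval for a mean treatment effect using the normal approximation, mean ± 1.96 × standard error. The data and the helper name `mean_ci_95` are hypothetical; with only ten observations, a t-based interval (slightly wider) would normally be preferred.

```python
import math
import statistics

def mean_ci_95(sample):
    """Normal-approximation 95% CI for a population mean: mean +/- 1.96 * SE."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = 1.96  # approximately the 97.5th percentile of the standard normal
    return mean - z * se, mean + z * se

# Hypothetical pain-score reductions (points on a 0-10 scale), invented for illustration
reductions = [1.2, 0.4, 2.1, 1.5, 0.9, 1.8, 0.3, 1.1, 1.6, 0.7]
low, high = mean_ci_95(reductions)
print(f"mean {statistics.fmean(reductions):.2f}, 95% CI {low:.2f} to {high:.2f}")
```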

In the dialogue, Monk's statement suggests that he is 95% certain that the suspect is the killer, which implies that there is a 5% chance of being wrong. This is similar to how medical researchers present their findings; they acknowledge the possibility of error and do not claim absolute certainty. By using a 95% confidence interval, researchers convey the level of confidence they have in their results while also recognizing the limitations of their study and the inherent variability in the data.


p-Values and Confidence Intervals

Confidence intervals provide a range of values within which a population parameter is likely to fall, while p-values help researchers judge whether results are statistically significant under a given null hypothesis. Both concepts play important roles in quantifying uncertainty and drawing inferences in research studies, but they answer different questions and should be interpreted accordingly.

Returning to Monk, let's consider a hypothetical scenario in which the characters try to determine how likely it is that a particular suspect is the real killer, based on the available evidence. We can use this example to illustrate the differences between confidence intervals and p-values.

P-Values:

Definition: A p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in a study, assuming that the null hypothesis is true.

Purpose: P-values are used to test the significance of a hypothesis, typically in the context of null hypothesis significance testing (NHST). They help researchers determine whether there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis.

Interpretation: A small p-value (e.g., less than 0.05) indicates that the observed results are unlikely to have occurred by chance alone, assuming the null hypothesis is true. In this case, researchers may reject the null hypothesis and conclude that there is a statistically significant effect or relationship. A large p-value suggests that the observed results could have occurred by chance and that there is insufficient evidence to reject the null hypothesis.

Now, let's say the investigators set up a null hypothesis stating that the suspect is not the killer (i.e., the suspect's profile score does not differ from the average score of innocent individuals). They calculate a p-value to test this hypothesis.

If the calculated p-value is less than 0.05 (a commonly used threshold for statistical significance), it would indicate that the observed profile score for the suspect is unlikely to have occurred by chance alone, assuming the null hypothesis is true. In this case, the investigators may reject the null hypothesis and conclude that there is a statistically significant difference between the suspect's profile score and the average score of innocent individuals.
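A minimal sketch of that test, assuming (hypothetically) that the profile scores of innocent people are normally distributed with mean 50 and standard deviation 10, and that the suspect scores 72. All numbers and the function name `two_sided_p_from_z` are invented for illustration.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers: innocent people score 50 on average (SD 10); the suspect scores 72.
innocent_mean, innocent_sd = 50.0, 10.0
suspect_score = 72.0

z = (suspect_score - innocent_mean) / innocent_sd  # distance from the innocent average, in SDs
p = two_sided_p_from_z(z)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the score is unusual for an innocent person")
else:
    print("Insufficient evidence to reject H0")
```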

Wilke J, Vogt L, Niederer D, et al. Short-term effects of acupuncture and stretching on myofascial trigger point pain of the neck: a blinded, placebo-controlled RCT. Complement Ther Med. 2014;22(5):835-841. 

Results: Both acupuncture as well as acupuncture plus stretching increased MPT by five, respectively, 11 percent post treatment. However, only acupuncture in combination with stretching was superior to placebo (p<0.05). There were no significant differences between interventions at 15 and 30 min post treatment. VAS did not differ between treatments at any measurement. Five minutes after application of acupuncture plus stretching, ROM was significantly increased in the frontal and the transversal plane compared to placebo (p<0.05).

Liu J, Wang A, Nie G, Wang X, Huang J. Acupuncture for female depression: a randomized controlled trial. Zhongguo Zhen Jiu. 2018;38(4):375-378.

The improvements of SDS and MADRS scores in the observation group before and after treatment were better than those in the control group (both P<0.05).

However, the p-value doesn't tell us anything about the size or importance of the effect; that's where effect size comes into play. A smaller p-value does not mean that the effect of a treatment, or the relationship between variables, is stronger or better. It merely indicates that the result would be less likely to occur by chance, assuming the null hypothesis is true. Therefore, a small p-value should not be interpreted as indicating a better treatment effect. (Even with the same treatment effect, the p-value becomes smaller as the number of subjects increases.)
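To see the parenthetical point in action, this sketch holds a hypothetical effect fixed (a 0.3-point difference with SD 1.0) and changes only the number of subjects per group. It uses a two-sample z-test with a known SD as a simplifying assumption, and `two_sample_z_p` is an illustrative name; the p-value drops steadily even though the effect never changes.

```python
import math

def two_sample_z_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD
    and equal group sizes (a z-test; a t-test would be used in practice)."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2))

# Same hypothetical effect (0.3 points, SD 1.0) at increasing sample sizes
for n in (20, 50, 100, 200, 400):
    print(n, round(two_sample_z_p(0.3, 1.0, n), 4))
```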

Confidence Intervals:

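The phrase "95% confidence interval from A to B" is often paraphrased as "if we repeat the experiment 100 times, the average will fall between A and B 95 times," but that is not quite right: the 95% describes how often the method for building the interval captures the true value, not how often future averages land inside this particular interval. What the range does tell us directly is precision: the narrower the range from A to B, the more precise (and, in that sense, more reliable) the result; a wider range implies less precision.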

Confidence Interval - Meaning, Statistics, Calculation, CI of 95

If you conduct an experiment 100 times, and each time you calculate a confidence interval from the data of that experiment, then a 95% confidence interval means that about 95 of those 100 calculated intervals would include the true population mean (or whatever other parameter you are measuring).
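This repeated-experiment interpretation can be checked by simulation. The sketch below uses a hypothetical setup (true mean 10, SD 2, 30 subjects per experiment, SD treated as known for simplicity): it repeats the experiment 2,000 times, builds a 95% interval each time, and counts how many intervals contain the true mean. The function name `coverage` is illustrative.

```python
import math
import random
import statistics

def coverage(true_mean=10.0, sd=2.0, n=30, repeats=2000, seed=1):
    """Repeat a hypothetical experiment many times, build a 95% CI each time,
    and return the fraction of intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = statistics.fmean(sample)
        half_width = 1.96 * sd / math.sqrt(n)  # SD treated as known
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repeats

print(coverage())  # should be close to 0.95
```

Each individual interval either contains the true mean or it does not; the 95% is a property of the whole collection of intervals.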

The range of the confidence interval (from A to B) reflects the precision of our estimate. A narrower interval suggests a higher level of precision, meaning we have more confidence in our estimate, whereas a wider interval would indicate less precision and therefore less confidence.
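Under the usual normal approximation, the width of a 95% interval for a mean is 2 × 1.96 × SD / √n, so precision improves with the square root of the sample size: quadrupling n halves the width. A small sketch, with an assumed SD of 10 and an illustrative function name:

```python
import math

def ci_width_95(sd, n):
    """Width of a normal-approximation 95% CI for a mean: 2 * 1.96 * sd / sqrt(n)."""
    return 2 * 1.96 * sd / math.sqrt(n)

for n in (25, 100, 400):
    print(n, round(ci_width_95(10.0, n), 2))
```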

What matters is whether the interval includes the value that means "no effect," and that value depends on the kind of measure. For a difference (such as a difference in means or in risks), no effect is 0; for a ratio (such as a rate ratio, risk ratio, or odds ratio), no effect is 1. If a 95% confidence interval includes this no-effect value, the data are compatible with there being no difference at all. So, when the CI for a rate ratio includes 1 (or the CI for a difference includes 0), the p-value is generally larger than 0.05. This indicates that the difference is not statistically significant: the observed difference could plausibly have occurred by chance alone, and we don't have strong evidence of a meaningful difference between the rates in the two groups being compared.
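The link between "CI includes 1" and "p > 0.05" is easiest to see when both are computed from the same standard error. This sketch uses the common log-scale (Wald) method for a risk ratio; the counts are hypothetical and `risk_ratio_ci_and_p` is an illustrative name, not a library function.

```python
import math

def risk_ratio_ci_and_p(events_a, n_a, events_b, n_b):
    """Risk ratio of group A vs group B with a 95% CI and a two-sided p-value,
    both computed on the log scale (Wald method)."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    low = math.exp(math.log(rr) - 1.96 * se_log)
    high = math.exp(math.log(rr) + 1.96 * se_log)
    z = math.log(rr) / se_log  # H0: RR = 1, i.e. log(RR) = 0
    p = math.erfc(abs(z) / math.sqrt(2))
    return rr, low, high, p

# Two hypothetical trials (counts invented for illustration)
for counts in [(30, 100, 40, 100), (60, 200, 90, 200)]:
    rr, low, high, p = risk_ratio_ci_and_p(*counts)
    print(f"RR {rr:.2f}, 95% CI {low:.2f} to {high:.2f}, p = {p:.3f}, "
          f"CI includes 1: {low <= 1 <= high}")
```

Because the interval and the test share the same standard error, the CI excludes 1 exactly when |z| exceeds 1.96, which is when p falls below about 0.05.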


Hinman RS, McCrory P, Pirotta M, et al. Acupuncture for Chronic Knee Pain: A Randomized Clinical Trial. JAMA. 2014;312(13):1313–1322. doi:10.1001/jama.2014.12660

Compared with control, needle and laser acupuncture resulted in modest improvements in pain (−1.1; 95% CI, −1.8 to −0.4, and −0.8; 95% CI, −1.5 to −0.1, respectively) at 12 weeks, but not at 1 year.

Conclusions and Relevance: In patients older than 50 years with moderate or severe chronic knee pain, neither laser nor needle acupuncture conferred benefit over sham for pain or function. Our findings do not support acupuncture for these patients.

MacPherson H, Tilbrook H, Agbedjro D, Buckley H, Hewitt C, Frost C. Acupuncture for irritable bowel syndrome: 2-year follow-up of a randomised controlled trial. Acupunct Med. 2017;35(1):17-23. 

Results: The overall response rate was 61%. The adjusted difference in mean IBS SSS at 24 months was -18.28 (95% CI -40.95 to 4.40) in favour of the acupuncture arm. Differences at earlier time points estimated from the multivariate model were: -27.27 (-47.69 to -6.86) at 3 months; -23.69 (-45.17 to -2.21) at 6 months; -24.09 (-45.59 to -2.59) at 9 months; and -23.06 (-44.52 to -1.59) at 12 months.

Conclusions: There were no statistically significant differences between the acupuncture and usual care groups in IBS SSS at 24 months post-randomisation, and the point estimate for the mean difference was approximately 80% of the size of the statistically significant results seen at 6, 9 and 12 months.
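A reported 95% CI can also be turned back into an approximate p-value, assuming it was built as estimate ± 1.96 × SE, so that SE is roughly (upper - lower) / 3.92. The sketch applies this to the IBS SSS differences quoted above. Because the published figures are rounded and the authors used their own multivariate model, the resulting p-values are approximations only; `approx_p_from_ci` is an illustrative name.

```python
import math

def approx_p_from_ci(estimate, lower, upper):
    """Approximate two-sided p-value from a reported 95% CI for a difference,
    assuming the CI was built as estimate +/- 1.96 * SE on a normal scale."""
    se = (upper - lower) / (2 * 1.96)
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

# Mean differences in IBS SSS quoted above (MacPherson et al., 2017)
reported = {
    "3 months": (-27.27, -47.69, -6.86),
    "12 months": (-23.06, -44.52, -1.59),
    "24 months": (-18.28, -40.95, 4.40),
}
for label, (est, low, high) in reported.items():
    excludes_zero = not (low <= 0 <= high)
    print(f"{label}: approx p = {approx_p_from_ci(est, low, high):.3f}, "
          f"CI excludes 0: {excludes_zero}")
```

The 24-month interval crosses 0 and gives an approximate p above 0.05, matching the authors' conclusion, even though its point estimate is not far from the earlier, significant ones.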

Vickers AJ, et al. Acupuncture for chronic pain: individual patient data meta-analysis.

The effect sizes in comparison to no-acupuncture controls were 0.55 (95% CI, 0.51-0.58), 0.57 (95% CI, 0.50-0.64), and 0.42 (95% CI, 0.37-0.46) SDs.

Conclusions: Acupuncture is effective for the treatment of chronic pain and is therefore a reasonable referral option. Significant differences between true and sham acupuncture indicate that acupuncture is more than a placebo.
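An effect size "in SDs," as in the meta-analysis above, is a standardized mean difference: the difference between group means divided by a standard deviation. The sketch below uses one common version (Cohen's d with a pooled SD) on invented data; it is not necessarily the exact method that meta-analysis used, and the function name is illustrative.

```python
import math
import statistics

def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation (Cohen's d)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd

# Hypothetical pain improvements (points), invented for illustration
treated = [3.1, 2.4, 4.0, 2.8, 3.5, 1.9, 3.3, 2.7]
control = [2.2, 1.8, 2.9, 1.5, 2.6, 2.0, 1.7, 2.4]
print(round(standardized_mean_difference(treated, control), 2))
```

Unlike a p-value, this number does not grow just because more subjects are enrolled; it describes how large the difference is relative to the spread of the data.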



The phrase "95% confidence interval from a to b" gives a range of plausible values for the true effect, built by a method that captures the true value in about 95% of repeated experiments. It conveys the size and precision of the effect, not just a dichotomous verdict of "significant" or "not significant."

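The phrase "p-value is <0.05" means that, if the null hypothesis were true, results at least as extreme as those observed would occur less than 5% of the time. It gives a dichotomous verdict on statistical significance, but it does not tell us the size of the effect or the precision of the estimate.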
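A small sketch of this contrast, using two hypothetical studies whose differences and standard errors were chosen so that both give z = 2.5 and therefore the same p-value. Only the confidence intervals reveal that one effect is tiny and the other large. The function name `summarize` is illustrative.

```python
import math

def summarize(diff, se):
    """Return the 95% CI and two-sided p-value for a difference with standard error se."""
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    p = math.erfc(abs(diff / se) / math.sqrt(2))
    return ci, p

# Two hypothetical studies with identical p-values but very different effects
for name, diff, se in [("Study A", 0.2, 0.08), ("Study B", 5.0, 2.0)]:
    (low, high), p = summarize(diff, se)
    print(f"{name}: difference {diff}, 95% CI {low:.2f} to {high:.2f}, p = {p:.4f}")
```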

In summary, in this example a confidence interval would provide a range of plausible profile scores for the suspect, reflecting the uncertainty in the estimate based on the available evidence. The p-value, on the other hand, helps the investigators judge whether the suspect's profile score is statistically different from that of innocent individuals. That is evidence bearing on whether the suspect is the real killer, but it is not the probability that she is.


Why does Mr. Monk say he is 95% sure, not 85%?

Scientists often use 95% confidence intervals rather than 80% or 75% ones because 95% strikes a balance between the precision of the estimate and the level of uncertainty. A higher confidence level, such as 95%, means the method captures the true value in a larger share of repeated studies, so conclusions based on it are less likely to be wrong. The price is width: a lower confidence level, like 80% or 75%, gives a narrower range, but one that misses the true value more often.

It is important to note that the choice of a specific confidence level is somewhat arbitrary and depends on the context of the research and the level of certainty required. In many scientific fields, a 95% confidence level has become the standard, primarily because it offers a good balance between precision and uncertainty. However, in some situations, researchers might opt for a 90% or even a 99% confidence level, depending on the level of risk and the potential consequences associated with the decisions made based on the research findings.
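The trade-off is easy to see by computing the critical value for several confidence levels, including Monk's hypothetical 85%. This sketch assumes a normal-approximation interval around an invented estimate of 1.0 with standard error 0.4; a higher confidence level means a larger multiplier and therefore a wider interval. The function name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value of the standard normal for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

# Same hypothetical estimate and standard error at different confidence levels
estimate, se = 1.0, 0.4
for level in (0.75, 0.80, 0.85, 0.90, 0.95, 0.99):
    z = z_critical(level)
    print(f"{level:.0%}: z = {z:.3f}, interval {estimate - z * se:.2f} to {estimate + z * se:.2f}")
```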

In summary, scientists generally use 95% confidence intervals because they provide a balance between precision and uncertainty. A higher confidence level means the interval is more likely to capture the true value, at the cost of a wider range, and weighing that trade-off is crucial for making informed decisions based on scientific research.

Monk's seemingly nonsensical statement reflects the concept of probability and confidence intervals used in medical research. It highlights the importance of understanding that even when researchers express high confidence in their findings, there is always a degree of uncertainty and a margin for error.