Large language models encode clinical knowledge
Nature Portfolio (2023) • Volume 620, Issue 7972, Pages 172-180
Overall Assessment
Adequate Methodological Quality
Assessment created by PaperScorers Medical AI v0.1.0 on Dec 14, 2025
Key Takeaways
- Introduces MultiMedQA and HealthSearchQA with a human evaluation framework.
- Flan-PaLM sets state-of-the-art results on MedQA, MedMCQA, and PubMedQA (Fig. 2).
- Instruction prompt tuning (Med-PaLM) markedly reduces harm and bias relative to Flan-PaLM (Fig. 4).
- Selective prediction shows that model uncertainty tracks accuracy (Fig. 3).
- Model and weights are not released; reproducibility is limited.
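The selective prediction takeaway can be illustrated with a small sketch. It assumes that confidence is estimated as the fraction of sampled answers that agree with the majority answer, and that the model abstains below a threshold. The function names and the data layout are hypothetical and do not come from the paper's code, which was not released.

```python
from collections import Counter


def vote_confidence(samples):
    """Return the majority answer and the fraction of samples that agree with it."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def selective_accuracy(items, threshold):
    """Compute coverage and accuracy when low-confidence questions are deferred.

    items: list of (sampled_answers, gold_answer) pairs (hypothetical layout).
    threshold: minimum agreement fraction required to answer.
    Returns (coverage, accuracy_on_answered); accuracy is None if nothing is answered.
    """
    answered = 0
    correct = 0
    for samples, gold in items:
        answer, confidence = vote_confidence(samples)
        if confidence >= threshold:
            answered += 1
            correct += int(answer == gold)
    coverage = answered / len(items)
    accuracy = correct / answered if answered else None
    return coverage, accuracy
```

If uncertainty tracks accuracy, raising `threshold` should lower coverage while raising accuracy on the questions that are still answered.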
Conclusion
Robust methods and novel contributions, but transparency is limited because neither code nor model weights were released; the results are promising, yet the model is not ready for deployment.
Disclaimer: This assessment is generated by AI and should not be the sole basis for clinical or research decisions. Always review the original paper and consult with domain experts.