Large language models encode clinical knowledge

Karan Singhal, Shekoofeh Azizi, Tao Tu, S. Sara Mahdavi, Jason Lee, Hyung Won Chung, Nathan Scales, Ajay Kumar Tanwani, Heather Cole-Lewis, Stephen Pfohl, Perry W. Payne, Martin Seneviratne, Paul Gamble, Christopher Kelly, Abubakr Babiker...

Nature Portfolio (2023) • Volume 620, Issue 7972, Pages 172-180

Method-Tool • PDF Available • Grade Eligible • ⚠️ Moderate Risk Flags

Overall Assessment

Adequate Methodological Quality

Assessment created by PaperScorers Medical AI v0.1.0 on Dec 14, 2025

Grade: C (60/100)

Key Takeaways

  • Introduces the MultiMedQA benchmark and the HealthSearchQA dataset, together with a human evaluation framework.
  • Flan-PaLM sets a new state of the art on MedQA, MedMCQA, and PubMedQA (Fig. 2).
  • Instruction prompt tuning (Med-PaLM) markedly reduces harm and bias relative to Flan-PaLM (Fig. 4).
  • Selective prediction suggests that the model's uncertainty tracks its accuracy (Fig. 3).
  • Neither the model nor its weights were released, which limits reproducibility.
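
The selective prediction takeaway can be illustrated with a minimal sketch: answers are ranked by a confidence score, the least confident fraction is deferred, and accuracy is measured on what remains. If uncertainty tracks accuracy, accuracy should rise as more answers are deferred. The function name, the confidence scores, and the toy data below are hypothetical and are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def accuracy_at_deferral(
    confidences: Sequence[float],
    correct: Sequence[bool],
    deferral_fraction: float,
) -> float:
    """Accuracy on the answers kept after deferring the least confident ones.

    Hypothetical helper for illustration only: `confidences` is any per-answer
    confidence score (higher means more confident), `correct` marks whether
    each answer was right, and `deferral_fraction` is the share withheld.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one answer is required")
    if not 0.0 <= deferral_fraction < 1.0:
        raise ValueError("deferral_fraction must be in [0, 1)")

    # Rank answers from most to least confident.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)

    # Keep the most confident share; always keep at least one answer.
    n_keep = max(1, round(len(ranked) * (1.0 - deferral_fraction)))
    kept = ranked[:n_keep]

    return sum(1 for _, is_correct in kept if is_correct) / len(kept)


# Toy data (made up): the one wrong answer has low confidence.
toy_confidences = [0.9, 0.8, 0.6, 0.4]
toy_correct = [True, True, False, True]

for fraction in (0.0, 0.25, 0.5):
    acc = accuracy_at_deferral(toy_confidences, toy_correct, fraction)
    print(f"deferral={fraction:.2f} accuracy={acc:.2f}")
```

On real data the resulting curve is only as meaningful as the confidence score behind it, so a flat or falling curve would indicate that the chosen uncertainty measure does not track accuracy.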

Conclusion

The methods are robust and the contributions novel, but transparency is limited because neither code nor weights were released. The results are promising, but the model is not ready for deployment.

Disclaimer: This assessment is generated by AI and should not be the sole basis for clinical or research decisions. Always review the original paper and consult with domain experts.