Friday, May 30, 2025

Preliminary analysis of the impact of lab results on large language model generated differential diagnoses

https://pubmed.ncbi.nlm.nih.gov/40102561/

An AHRQ-funded study published in NPJ Digital Medicine found that including lab results significantly improved the accuracy of differential diagnoses generated by large language models. Researchers tested five models—GPT-4, GPT-3.5, Claude-2, Llama-2-70b, and Mixtral-8x7B—using 50 clinical vignettes based on real patient cases. Each model generated a list of possible diagnoses with and without lab data. Adding lab results improved diagnostic accuracy by up to 30 percent across models. GPT-4 performed best, achieving 55 percent Top-1 accuracy and 79 percent lenient accuracy. The models correctly interpreted common lab tests such as liver function and toxicology panels. These findings underscore the potential of large language models as supplemental diagnostic tools and the importance of structured clinical data in AI-driven decision support.
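For readers unfamiliar with the two metrics, here is a minimal sketch of how they are typically computed. This is an illustration, not the study's actual evaluation code: it assumes "Top-1 accuracy" means the correct diagnosis is ranked first, and "lenient accuracy" means it appears anywhere in the model's ranked list; the function names and sample data are hypothetical.

```python
def top1_accuracy(predictions, gold):
    """Fraction of cases where the first-ranked diagnosis matches the gold label."""
    hits = sum(1 for preds, g in zip(predictions, gold) if preds and preds[0] == g)
    return hits / len(gold)

def lenient_accuracy(predictions, gold):
    """Fraction of cases where the gold diagnosis appears anywhere in the list."""
    hits = sum(1 for preds, g in zip(predictions, gold) if g in preds)
    return hits / len(gold)

# Hypothetical example: three vignettes, each with a ranked differential.
preds = [["sepsis", "pneumonia"], ["gout", "cellulitis"], ["asthma"]]
gold = ["sepsis", "cellulitis", "copd"]

print(top1_accuracy(preds, gold))     # 1 of 3 correct at rank 1
print(lenient_accuracy(preds, gold))  # 2 of 3 present anywhere in the list
```

Under these definitions, lenient accuracy is always at least as high as Top-1 accuracy, which matches the 55 vs. 79 percent figures reported for GPT-4.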
