Evaluating Guideline Adherence in Gemini-Powered Dental Trauma Workflows: Standalone Gemini Chat vs. Document-Grounded NotebookLM

Aim: The aim of this study was to compare the accuracy and inter-account consistency of two Google Gemini–powered, user-facing workflows for dental trauma decision support: standalone Gemini chat and NotebookLM, a document-grounded workflow that generates responses grounded in uploaded European Society of Endodontology and International Association of Dental Traumatology guideline documents, when answering dichotomous (yes/no) clinical questions on the management of traumatized permanent teeth.

Methodology: A cross-sectional simulation was conducted using 99 dichotomous (yes/no) questions derived from the European Society of Endodontology and International Association of Dental Traumatology guidelines. Three academic endodontists submitted each question to Gemini and NotebookLM using three independent Google accounts, generating 297 responses per workflow. Accuracy was defined as exact agreement with guideline-based answers, and consistency as the proportion of identical responses across the three trials. Statistical analyses included Wald and Wilson 95% confidence intervals, Fleiss' kappa for inter-account agreement, and Pearson's chi-squared tests to compare proportions.

Results: Gemini demonstrated an overall accuracy of 83.83% (95% CI: 75.08–90.47) and a consistency of 74.74% (κ = 0.84). NotebookLM showed higher accuracy (92.93%; 95% CI: 85.97–97.11) and perfect consistency (100%; κ = 1.00). While the difference in accuracy did not reach statistical significance (p = 0.076), NotebookLM exhibited significantly greater consistency (p < 0.001).

Conclusions: The responses generated from the guidelines were highly consistent with both workflows. Document grounding may enhance repeatability and alignment with guideline-derived decision points for structured dichotomous inquiries, as evidenced by NotebookLM's ability to achieve complete inter-account consistency and to quantitatively increase accuracy. These results are the outcome of workflow-level benchmarking; therefore, clinical utility cannot be inferred solely from them; professional oversight and additional validation remain necessary before any clinical application.
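
As an illustration of the statistics named in the abstract, the following minimal Python sketch computes a Wilson 95% confidence interval for a proportion and a chi-squared comparison of two proportions. It is not the authors' analysis code. The counts (83/99 and 92/99 correct, 74/99 and 99/99 consistent) are assumptions inferred from the reported percentages, and the use of a Yates continuity correction is likewise an assumption; the function names are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def chi2_yates_2x2(a, b, c, d):
    """Chi-squared test with Yates correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability of the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


# Assumed counts, inferred from the percentages reported in the abstract.
N_QUESTIONS = 99
GEMINI_CORRECT, NOTEBOOKLM_CORRECT = 83, 92
GEMINI_CONSISTENT, NOTEBOOKLM_CONSISTENT = 74, 99

gemini_ci = wilson_interval(GEMINI_CORRECT, N_QUESTIONS)
notebooklm_ci = wilson_interval(NOTEBOOKLM_CORRECT, N_QUESTIONS)

accuracy_stat, accuracy_p = chi2_yates_2x2(
    GEMINI_CORRECT, N_QUESTIONS - GEMINI_CORRECT,
    NOTEBOOKLM_CORRECT, N_QUESTIONS - NOTEBOOKLM_CORRECT,
)
consistency_stat, consistency_p = chi2_yates_2x2(
    GEMINI_CONSISTENT, N_QUESTIONS - GEMINI_CONSISTENT,
    NOTEBOOKLM_CONSISTENT, N_QUESTIONS - NOTEBOOKLM_CONSISTENT,
)

if __name__ == "__main__":
    print("Gemini accuracy, Wilson 95% CI:", gemini_ci)
    print("NotebookLM accuracy, Wilson 95% CI:", notebooklm_ci)
    print("Accuracy comparison (chi2, p):", accuracy_stat, accuracy_p)
    print("Consistency comparison (chi2, p):", consistency_stat, consistency_p)
```

Under these assumed counts, the continuity-corrected comparison of accuracy gives a p-value close to the reported 0.076. The Wilson bounds need not reproduce the published intervals exactly, since the abstract does not state which interval method underlies each reported figure.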

Document type

Article

Document version

Published version

Language

English

Subjects (UDC)

616.3 - Pathology of the digestive system. Dentistry

Keywords

Decision-making

Dental trauma

Google Gemini

Large language models

NotebookLM

Retrieval-augmented generation

Published by

Wiley

Published in

Dental Traumatology

Recommended citation

Dufey-Portilla, Nicolás; Abella Sans, Francesc; Duran-Sindreu, Fernando [et al.]. Evaluating Guideline Adherence in Gemini-Powered Dental Trauma Workflows: Standalone Gemini Chat vs. Document-Grounded NotebookLM. Dental Traumatology, 2026, 0, pp. 1-9. Available at <https://onlinelibrary.wiley.com/doi/10.1111/edt.70065>. Accessed: 5 Mar. 2026. DOI: https://doi.org/10.1111/edt.70065

Note

The author, N. Dufey-Portilla, thanks the National Agency for Research and Development (ANID) for its support through the DOCTORADO BECAS CHILE/2025 - 72250040 Scholarship Program.

This item appears in the following collection(s)

Dentistry [350]

Rights

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made. © 2026 The Author(s). Dental Traumatology published by John Wiley & Sons Ltd.

Except where otherwise noted, this item's license is described as https://creativecommons.org/licenses/by-nc-nd/4.0/