Chatbot responses suggest that hypothetical biology questions are harder than realistic ones

Gregory J. Crowther; Usha Sankar; Leena S. Knight; Deborah L. Myers; Kevin T. Patton; Lekelia D. Jenkins; Thomas A. Knight

doi:10.1128/jmbe.00153-23

Research output: Contribution to journal › Article › peer-review

Abstract

The biology education literature includes compelling assertions that unfamiliar problems are especially useful for revealing students’ true understanding of biology. However, there is only limited evidence that such novel problems have different cognitive requirements than more familiar problems. Here, we sought additional evidence by using chatbots based on large language models as models of biology students. For human physiology and cell biology, we developed sets of realistic and hypothetical problems matched to the same lesson learning objectives (LLOs). Problems were considered hypothetical if (i) known biological entities (molecules and organs) were given atypical or counterfactual properties (redefinition) or (ii) fictitious biological entities were introduced (invention). Several chatbots scored significantly worse on hypothetical problems than on realistic problems, with scores declining by an average of 13%. Among hypothetical questions, redefinition questions appeared especially difficult, with many chatbots scoring as if guessing randomly. These results suggest that, for a given LLO, hypothetical problems may have different cognitive demands than realistic problems and may more accurately reveal students’ ability to apply biology core concepts to diverse contexts. The Test Question Templates (TQT) framework, which explicitly connects LLOs with examples of assessment questions, can help educators generate problems that are challenging (due to their novelty), yet fair (due to their alignment with pre-specified LLOs). Finally, ChatGPT’s rapid improvement toward expert-level answers suggests that future educators cannot reasonably expect to ignore or outwit chatbots but must do what we can to make assessments fair and equitable.

Original language	English (US)
Journal	Journal of Microbiology and Biology Education
Volume	24
Issue number	3
DOIs	https://doi.org/10.1128/jmbe.00153-23
State	Published - Dec 2023
Externally published	Yes

Keywords

artificial intelligence (AI)
Bloom’s taxonomy
cheating
exams
Google Bard
HOCS/LOCS
summative assessment
YouChat

ASJC Scopus subject areas

Education
General Biochemistry, Genetics and Molecular Biology
General Immunology and Microbiology
General Agricultural and Biological Sciences

Cite this

@article{77e3a9213d934429b6c6f680f5799918,
title = "Chatbot responses suggest that hypothetical biology questions are harder than realistic ones",
abstract = "The biology education literature includes compelling assertions that unfamiliar problems are especially useful for revealing students{\textquoteright} true understanding of biology. However, there is only limited evidence that such novel problems have different cognitive requirements than more familiar problems. Here, we sought additional evidence by using chatbots based on large language models as models of biology students. For human physiology and cell biology, we developed sets of realistic and hypothetical problems matched to the same lesson learning objectives (LLOs). Problems were considered hypothetical if (i) known biological entities (molecules and organs) were given atypical or counterfactual properties (redefinition) or (ii) fictitious biological entities were introduced (invention). Several chatbots scored significantly worse on hypothetical problems than on realistic problems, with scores declining by an average of 13\%. Among hypothetical questions, redefinition questions appeared especially difficult, with many chatbots scoring as if guessing randomly. These results suggest that, for a given LLO, hypothetical problems may have different cognitive demands than realistic problems and may more accurately reveal students{\textquoteright} ability to apply biology core concepts to diverse contexts. The Test Question Templates (TQT) framework, which explicitly connects LLOs with examples of assessment questions, can help educators generate problems that are challenging (due to their novelty), yet fair (due to their alignment with pre-specified LLOs). Finally, ChatGPT{\textquoteright}s rapid improvement toward expert-level answers suggests that future educators cannot reasonably expect to ignore or outwit chatbots but must do what we can to make assessments fair and equitable.",
keywords = "artificial intelligence (AI), Bloom{\textquoteright}s taxonomy, cheating, exams, Google Bard, HOCS/LOCS, summative assessment, YouChat",
author = "Crowther, {Gregory J.} and Sankar, Usha and Knight, {Leena S.} and Myers, {Deborah L.} and Patton, {Kevin T.} and Jenkins, {Lekelia D.} and Knight, {Thomas A.}",
note = "Publisher Copyright: Copyright {\textcopyright} 2023 Crowther et al.",
year = "2023",
month = dec,
doi = "10.1128/jmbe.00153-23",
language = "English (US)",
volume = "24",
journal = "Journal of Microbiology and Biology Education",
issn = "1935-7877",
publisher = "American Society for Microbiology",
number = "3",
}

TY - JOUR
T1 - Chatbot responses suggest that hypothetical biology questions are harder than realistic ones
AU - Crowther, Gregory J.
AU - Sankar, Usha
AU - Knight, Leena S.
AU - Myers, Deborah L.
AU - Patton, Kevin T.
AU - Jenkins, Lekelia D.
AU - Knight, Thomas A.
PY - 2023/12
Y1 - 2023/12
AB - The biology education literature includes compelling assertions that unfamiliar problems are especially useful for revealing students’ true understanding of biology. However, there is only limited evidence that such novel problems have different cognitive requirements than more familiar problems. Here, we sought additional evidence by using chatbots based on large language models as models of biology students. For human physiology and cell biology, we developed sets of realistic and hypothetical problems matched to the same lesson learning objectives (LLOs). Problems were considered hypothetical if (i) known biological entities (molecules and organs) were given atypical or counterfactual properties (redefinition) or (ii) fictitious biological entities were introduced (invention). Several chatbots scored significantly worse on hypothetical problems than on realistic problems, with scores declining by an average of 13%. Among hypothetical questions, redefinition questions appeared especially difficult, with many chatbots scoring as if guessing randomly. These results suggest that, for a given LLO, hypothetical problems may have different cognitive demands than realistic problems and may more accurately reveal students’ ability to apply biology core concepts to diverse contexts. The Test Question Templates (TQT) framework, which explicitly connects LLOs with examples of assessment questions, can help educators generate problems that are challenging (due to their novelty), yet fair (due to their alignment with pre-specified LLOs). Finally, ChatGPT’s rapid improvement toward expert-level answers suggests that future educators cannot reasonably expect to ignore or outwit chatbots but must do what we can to make assessments fair and equitable.
KW - artificial intelligence (AI)
KW - Bloom’s taxonomy
KW - cheating
KW - exams
KW - Google Bard
KW - HOCS/LOCS
KW - summative assessment
KW - YouChat
UR - http://www.scopus.com/inward/record.url?scp=85181822748&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181822748&partnerID=8YFLogxK
U2 - 10.1128/jmbe.00153-23
DO - 10.1128/jmbe.00153-23
M3 - Article
AN - SCOPUS:85181822748
SN - 1935-7877
VL - 24
JO - Journal of Microbiology and Biology Education
JF - Journal of Microbiology and Biology Education
IS - 3
ER -
