Study: AI-Assisted Tutoring Boosts Students’ Math Skills
First randomized controlled trial of an AI system that helps human tutors offers a middle ground in what’s become a polarized debate.
An AI-powered digital tutoring assistant designed by Stanford University researchers shows modest promise at improving students’ short-term performance in math, suggesting that the best use of artificial intelligence in virtual tutoring for now might be in supporting, not supplanting, human instructors.
The open-source tool, which researchers say other educators can recreate and integrate into their tutoring systems, made the human tutors slightly more effective. And the weakest tutors became nearly as effective as their more highly rated peers, according to a new study.
The tool, dubbed Tutor CoPilot, prompts tutors to think more deeply about their interactions with students, offering different ways to explain concepts to those who get a problem wrong. It also suggests hints or different questions to ask.
The new study offers a middle ground in what’s become a polarized debate between supporters and detractors of AI tutoring. It’s also the first randomized controlled trial (the gold standard in research) to examine a human-AI system in live tutoring. In all, about 1,000 students got help from about 900 tutors, and students who worked with AI-assisted tutors were four percentage points more likely to master the topic after a given session than those in a control group whose tutors didn’t work with AI.
Students working with lower-rated tutors saw their performance jump more than twice as much, by nine percentage points. In all, their pass rate went from 56% to 65%, nearly matching the 66% pass rate for students with higher-rated tutors.
The cost to run it: just $20 per student per year, an estimate of what it costs Stanford to maintain accounts on OpenAI’s GPT-4 large language model.
The study didn’t probe students’ overall math skills or directly tie the tutoring results to standardized test scores, but Rose E. Wang, the project’s lead researcher, said higher pass rates on the post-tutoring “mini tests” correlate strongly with better results on end-of-year tests like state math assessments.
Wang said the study’s key insight was looking at reasoning patterns that good teachers engage in and translating them into “under the hood” instructions that tutors can use to help students think more deeply and solve problems themselves.
“If you prompt ChatGPT, ‘Hey, help me solve this problem,’ it will typically just give away the answer, which is not at all what we had seen teachers do when we were showing them real examples of struggling students,” she said.

Essentially, the researchers prompted GPT-4 to behave like an experienced teacher and generate hints, explanations and questions for tutors to try out on students. By querying the AI, Wang said, tutors have “real-time” access to helpful strategies that move students forward.
“At any time when I’m struggling as a tutor, I can request help,” Wang said.
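The study doesn’t publish its prompts, but the approach Wang describes can be sketched in a few lines. Everything below (the system prompt wording, the function name, and the sample problem) is a hypothetical illustration, not the study’s actual implementation: the point is that the model is instructed to coach like an experienced teacher and to withhold the final answer.

```python
def build_copilot_messages(problem: str, student_work: str) -> list[dict]:
    """Assemble a chat request asking the model to respond like an
    experienced teacher: hints and guiding questions, never the answer."""
    system_prompt = (
        "You are assisting a math tutor. Respond as an experienced teacher: "
        "suggest a hint, a guiding question, or an alternative explanation "
        "suited to the student's grade level. "
        "Never state the final answer to the problem."
    )
    user_prompt = (
        f"Problem: {problem}\n"
        f"Student's attempt: {student_work}\n"
        "Suggest what the tutor could say next."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

# Example: a tutor requests help mid-session with a fractions problem.
messages = build_copilot_messages(
    problem="What is 3/4 + 1/8?",
    student_work="The student answered 4/12.",
)
# The messages list could then be sent to a chat-completion endpoint
# such as GPT-4; the reply is shown to the tutor, not the student.
```

The key design choice is in the system prompt: without the "never state the final answer" constraint, Wang notes, a chat model typically just gives the answer away.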
She said the system as tested is “not perfect” and doesn’t yet emulate the work of experienced teachers. While tutors generally found it helpful, particularly its ability to provide “well-phrased explanations,” clarify difficult topics and break down complex concepts on the spot, in a few cases tutors said the tool’s suggestions didn’t align with students’ grade levels.
A common complaint among tutors was that Tutor CoPilot’s responses were sometimes “too smart,” requiring them to simplify and adapt for clarity.
“But it is much better than what would have otherwise been there,” Wang said, “which was nothing.”
Researchers analyzed more than half a million messages generated during sessions, finding that tutors who had access to the AI tool were more likely to ask helpful questions and less eager to simply give students answers, two practices aligned with high-quality teaching.
Amanda Bickerstaff, a co-founder and CEO, said she was pleased to see a well-designed study on the topic focused on economically disadvantaged students, minority students, and English language learners.
She also noted the benefits to low-rated tutors, saying other industries like consulting are already using generative AI to close skills gaps. As the technology advances, Bickerstaff said, most of its benefit will be in tasks like problem solving and explanations.
Susanna Loeb, executive director of Stanford’s National Student Support Accelerator and one of the report’s authors, said the idea of using AI to augment tutors’ talents, not replace them, seems a smart use of the technology for the time being. “Who knows? Maybe AI will get better,” she said. “We just don’t think it’s quite there yet.”
At the moment, there are lots of essential jobs in fields like tutoring and health care where practitioners “haven’t had years of education, and they don’t go to regular professional development,” she said. This approach, which offers a simple interface and immediate feedback, could be useful in those situations.
“The big dream,” said Wang, “is to be able to enhance the human.”
Benjamin Riley, a frequent AI-in-education skeptic who leads an AI-focused think tank and writes on the topic, applauded the study’s rigorous design and the tool’s approach, which he said prompts “effortful thinking on the part of the student.”
âIf you are an inexperienced or less-effective tutor, having something that reminds you of these practices â and then you actually employ those actions with your students â that’s good,â he said. âIf this holds up in other use cases, then I think you’ve got some real potential here.â
Riley sounded a note of caution about the tool’s actual cost. It may cost Stanford just $20 per student to run the AI, but he noted that tutors received up to three weeks of training to use it. “I don’t think you can exclude those costs from the analysis. And from what I can tell, this was based on a pretty thoughtful approach to the training.”
He also said students’ modest overall math gains raise the question, beyond the efficacy of the AI, of whether a large tutoring intervention like this has “meaningful impacts” on student learning.
Similarly, Dan Meyer, who writes on education and technology and co-hosts a podcast on teaching math, noted that the gains “don’t seem massive, but they’re positive and at fairly low cost.”
He said the Stanford developers “seem to understand the ways tutors work and the demands on their time and attention.” The new tool, he said, seems to save them from spending a lot of effort to get useful feedback and suggestions for students.
Stanford’s Loeb said the AI’s best use is determining what a student knows and needs to know. But people are better at caring, motivating and engaging, and at celebrating successes. “All people who have been tutors know that that is a key part about what makes tutoring effective. And this kind of approach allows both to happen.”