Everyone Highlights
In every university library on the planet, students drag fluorescent markers across textbook pages, reread their notes before exams, and write summaries of chapters they have already read once. It all feels productive in the moment. Pages look worked over, and the brain registers a warm glow of recognition that masquerades as understanding. Recognition is not learning. Never has been.
In 2013, five researchers at four major universities published a review of 10 commonly recommended learning techniques in Psychological Science in the Public Interest, a journal reserved for research with broad societal implications. Each technique was assigned a utility rating based on the accumulated evidence. Five of the most popular strategies received the lowest possible rating. Only two earned top marks, and neither is widely taught in schools.
Ten Techniques, Four Dimensions
John Dunlosky of Kent State University led the evaluation alongside Katherine Rawson, Elizabeth Marsh at Duke, Mitchell Nathan at Wisconsin-Madison, and Daniel Willingham at Virginia. A technique earned high utility only if its benefits generalized across four dimensions: learning conditions (both laboratory and real classrooms), student characteristics (all ages, all ability levels), materials (text, mathematics, vocabulary, scientific concepts), and criterion tasks (recall, comprehension, transfer to novel problems). A technique that generalized across only two or three of those dimensions did not reach the top tier.
Highlighting occasionally helped simple recall in controlled lab settings but failed to transfer to comprehension tasks and showed no benefit across longer retention intervals. Rereading produced marginally better short-term recall than a single read, but that small advantage disappeared within days. Both amount to busy work dressed up as studying. Summarization required substantial training to execute well, and even trained summarizers showed inconsistent gains across different materials and task types. Keyword mnemonics and imagery for text also fell into the low-utility category. Elaborative interrogation, self-explanation, and interleaved practice earned moderate ratings: promising under specific conditions, but lacking the broad generalizability for the top tier.
What Actually Works
Practice testing and distributed practice were the only strategies rated high utility, and nothing else came close.
Practice testing means quizzing yourself on material rather than reviewing it passively. Students who took practice tests recalled substantially more a week later than students who spent equivalent time rereading, because retrieval forces the brain to reconstruct a memory trace rather than merely recognize one. Christopher Rowland's 2014 meta-analysis in Psychological Bulletin synthesized 159 experiments comparing testing to restudy and found a medium effect size of approximately d = 0.50. A separate 2017 meta-analysis of 72 classroom-based studies by Adesope and colleagues found a comparable effect of d = 0.56. Recognizing something on a page is a fundamentally different cognitive operation from producing it from memory, and that difference explains why passive review leaves students confident but unprepared.
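For readers unfamiliar with the d statistic quoted above, the following is a minimal sketch of how a standardized mean difference (Cohen's d) is computed from two groups. The scores are invented for illustration and do not come from any of the cited studies, and the function name cohens_d is a hypothetical helper, not part of any library.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical final-test scores (percent correct), invented for illustration.
practice_test_scores = [72, 65, 80, 58, 75, 69, 83, 61]
reread_scores = [66, 60, 74, 55, 70, 62, 78, 57]

d = cohens_d(practice_test_scores, reread_scores)
print(f"d = {d:.2f}")
```

A d of 0.50 means the average student in the practice-testing group scored half a standard deviation above the average student in the rereading group.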
Distributed practice means spreading study across time rather than cramming. Cepeda, Pashler, Vul, Wixted, and Rohrer synthesized 839 observations from 317 experiments in 2006 and found consistent retention gains from spacing sessions apart. For an exam one week away, a one-day gap between sessions is effective; for a test 30 days out, gaps of 10 to 14 days work better. In one representative experiment, students who divided vocabulary study into two sessions separated by 24 hours recalled 35% more words after a month than students who studied the same material for the same total time in a single block.
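As a rough sketch of how the spacing advice could be turned into a calendar, the snippet below lays out study dates a fixed gap apart and checks that they finish before the exam. The function plan_sessions, its parameters, and the dates are all hypothetical; the 12-day gap is simply one value inside the 10-to-14-day range mentioned above, not a recommendation from the cited research.

```python
from datetime import date, timedelta


def plan_sessions(first_session, exam_day, gap_days, sessions=2):
    """Return study dates spaced gap_days apart, all falling before exam_day."""
    dates = [first_session + timedelta(days=gap_days * i) for i in range(sessions)]
    if dates[-1] >= exam_day:
        raise ValueError("sessions run past the exam; start earlier or shorten the gap")
    return dates


# Hypothetical example: an exam roughly a month out, two sessions 12 days apart.
exam = date(2025, 5, 30)
for day in plan_sessions(date(2025, 5, 1), exam, gap_days=12):
    print(day.isoformat())
```

The same total study time is used either way; only its distribution across the calendar changes.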
How Big Is the Waste?
Karpicke, Butler, and Roediger surveyed college students in 2009 and found that 84% listed rereading as their primary study strategy. Only 11% reported self-testing, meaning almost nobody does the thing that works best. Education textbooks used to train teachers barely mention either practice testing or distributed practice, which means the two most effective techniques identified by four decades of research have been almost entirely excluded from teacher preparation curricula.
If 84% of roughly 21 million American college students rely primarily on rereading, about 17.6 million students are affected. With a performance gap between rereading and practice testing of approximately d = 0.50, the aggregate cost is enormous. A half-standard-deviation improvement in exam performance translates to roughly 19 percentile points for a student at the median, assuming normally distributed scores. At the scale of 17.6 million students, that unrealized potential shows up as avoidable course failures, extended time to degree, and billions in tuition spent on study hours that produce minimal lasting retention.
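The arithmetic behind these figures can be checked in a few lines. This is a back-of-the-envelope sketch that assumes exam scores are normally distributed, which is an assumption added here to convert d into percentiles; the variable names are illustrative.

```python
from statistics import NormalDist

students = 21_000_000   # rough enrollment figure from the text
reread_share = 0.84     # share relying on rereading, per the survey cited above
effect_size_d = 0.50    # approximate testing-versus-restudy gap from the text

# 84% of 21 million is where the 17.6 million figure comes from.
affected = students * reread_share

# Under a normal distribution, a median student shifted up by d standard
# deviations lands at the percentile given by the standard normal CDF at d.
percentile_after = NormalDist().cdf(effect_size_d) * 100
gain = percentile_after - 50

print(f"{affected / 1e6:.1f} million students")        # 17.6 million students
print(f"new percentile: {percentile_after:.0f}")       # new percentile: 69
print(f"gain: {gain:.0f} percentile points")           # gain: 19 percentile points
```

The percentile gain applies only to a student starting at the median; the same shift in standard deviations moves students near the top or bottom of the distribution by fewer percentile points.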
Strongest Counterargument
A 2022 meta-analysis by Ponce, Mayer, and Méndez reexamined 36 experiments specifically on highlighting and found a meaningful positive effect on memory recall, with a pooled d = 0.36. When instructors pre-marked the important passages rather than leaving selection to students, the effect rose to d = 0.44 and extended to comprehension. Dunlosky's blanket low-utility verdict may have been overly severe for highlighting used as a targeted curation tool rather than an indiscriminate habit, and several subsequent classroom studies that pair highlighting with active retrieval at a later session have found meaningful gains that the original 2013 review did not separately model.
Fair enough, but limited. Even the most charitable reading leaves highlighting's benefit below practice testing's (d = 0.36 to 0.44 against roughly 0.50 to 0.56), and highlighting still shows no reliable gains on deep learning tasks: applying concepts to novel problems, generating arguments, diagnosing cases. Better than nothing? Yes. Close to optimal? Not remotely.
What We Didn't Prove
Dunlosky's review is a narrative synthesis, not a preregistered experiment, and the utility ratings were expert judgments rather than outputs of a formal meta-analytic algorithm with prespecified inclusion criteria. Different researchers weighing the same body of evidence could reach different conclusions. Practice testing has been studied overwhelmingly in Western, English-speaking university populations using artificial materials like word lists and paired associates, so whether identical effect sizes hold in K-12 classrooms in developing countries with different pedagogical traditions remains unknown. Self-testing also requires a baseline level of content knowledge that complete beginners may lack, which could limit its utility for truly novel material encountered for the first time.
The Bottom Line
Most students default to strategies that produce minimal lasting learning. Two methods backed by hundreds of experiments feel counterintuitively difficult, and that difficulty is the mechanism, not the failure mode. Students interpret retrieval effort as evidence that the method is failing, when the struggle to produce an answer from memory is precisely what builds durable knowledge.
What You Can Do
After finishing a chapter, close the book and write down everything you can recall; the gap between what you thought you knew and what you could actually produce is where learning happens. Space sessions across days rather than compressing them into a single marathon block, because two one-hour sessions separated by three days will outperform one continuous two-hour session almost every time. Use flashcard apps that implement spaced repetition algorithms automatically. Treat difficulty as a positive signal rather than a warning. If you teach, introduce five minutes of retrieval questions at the start of every lecture. And if you cannot stop highlighting, use it strictly as a selection tool for later self-testing, not as a learning activity in its own right.
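To make the idea of a spaced repetition algorithm concrete, here is a minimal sketch of a Leitner-style box scheduler, one simple and well-known scheme. It is not the algorithm used by any particular flashcard app, and the interval values, class, and function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Days to wait before the next review, by box. Illustrative values only.
INTERVALS = [1, 3, 7, 14, 30]


@dataclass
class Card:
    prompt: str
    answer: str
    box: int = 0                 # index into INTERVALS
    due: Optional[date] = None   # None means never reviewed, so due immediately


def review(card, recalled, today):
    """Promote the card one box on success, reset it to box 0 on failure."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if recalled else 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards, today):
    """Cards that have never been reviewed or whose due date has arrived."""
    return [c for c in cards if c.due is None or c.due <= today]


# Hypothetical usage: one successful recall pushes the next review out.
deck = [Card("retrieval practice", "self-quizzing instead of rereading")]
today = date(2025, 1, 1)
for card in due_cards(deck, today):
    review(card, recalled=True, today=today)
print(deck[0].box, deck[0].due)
```

Cards you recall move to longer intervals, while cards you miss return to the shortest one, so the schedule combines practice testing with distributed practice.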