Memory systems and problem-solving practice
Added 2023-04-01 05:08:53 +0000 UTC
One problem with most discussion around memory systems is: the real goal isn’t to remember answers on flashcards; it’s to expand your capacity to think and act in the world. Sure, your app says you can remember this set of cards for months. But what does that mean in terms of what you can do, thoughts you can think? The connection is far from clear. If our goal is to produce real-world capacity, rather than rote recall, how should memory systems be used, designed, redefined? I’ve wanted to dig into these questions for years, but I’ve found it quite tough to establish an effective experimental context. Happily, this month, I’ve been able to watch closely as a student struggles to transfer knowledge from his memory system practice to complex problem solving.
I’ve been acting as a “personal learning assistant” for “Alex”, an adult learner studying physics in service of a meaningful project. We talk every day or two about problems and progress in his learning journey; I listen in on his tutoring sessions; I coach him on handling challenges which arise; and I prototype interventions which might help. This month, Alex has been studying electrostatics from a classic textbook by Young and Freedman. I’ve written memory prompts to reinforce the content—about 70 for each chapter we’ve worked through, covering the material on declarative, procedural, and conceptual levels.
Now here’s the trouble. After Alex read each chapter and completed a few rounds of memory practice, he still found the book’s exercises very difficult. I’m sure the memory practice helped: he’s able to recall long equations and definitions from memory when solving problems. Yet detailed memory practice wasn’t enough, on its own, to let him solve complex practice problems independently. He got stuck and made significant errors.
Then, after many hours of problem-solving, Alex found that the answers came more easily. Something important happened when solving those exercises, above and beyond what occurred when he read the textbook and did detailed memory practice. It certainly wasn’t cheap: I’d guess he spent at least four times as long solving problems as he did reading and reviewing the text.
On some level, this isn’t surprising. Sure, of course, you can’t just read about a topic. You can’t even just answer lots of questions about a topic. You have to do a topic. Fine. But what’s happening, cognitively, during that problem set? Can we cause those changes more effectively or efficiently, e.g. through some kind of targeted practice? What are the implications for topics which don’t come ready-made with problem sets? And: how can we ensure that whatever insights are acquired during this period are retained, like the other material reinforced through memory practice?
Transfer-appropriate processing
For one set of answers, we can look to a theory called “transfer-appropriate processing”, which suggests that our ability to remember information depends in part on how well the processing involved in encoding matches what's involved in retrieval. It's not necessarily enough to just practice recalling some information: practice should require processing that information in the same way you expect to be processing it when you want to use it later.
I saw something like this firsthand with Alex. He could fluidly explain to me how electric fields relate to electric forces, but he struggled to apply that knowledge in a problem where he needed to find the force that a given field would exert on a given charge. Once I demonstrated how he could do that, he quickly saw the connection to the conceptual explanation he’d just given. But that connection was a separate thing from the explanation itself—not one and the same as the explanation.
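For concreteness, here is the kind of single step Alex needed, with illustrative numbers rather than the ones from his exercise. The conceptual explanation defines the field as force per unit test charge; the problem-solving move is to run that definition backwards for a particular charge:

```latex
\vec{E} = \frac{\vec{F}_0}{q_0}
\quad\Longrightarrow\quad
\vec{F} = q\,\vec{E},
\qquad
F = (3.0\times10^{-6}\,\mathrm{C})\,(2.0\times10^{4}\,\mathrm{N/C}) = 6.0\times10^{-2}\,\mathrm{N}.
```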
Cognitive psychologist Garrett O’Day recently ran a relevant series of experiments. He aimed to explore the impact of transfer-appropriate processing on retrieval practice, in the context of complex problem-solving. In one experiment, he gave undergraduates a brief tutorial on Poisson processes. The “practice” group was tested on recalling the procedural steps to solving a kind of Poisson process problem. The “control” group spent the same amount of time reading more worked examples. Then both groups were asked to solve problems like the ones they’d been studying. This is the sort of situation which would normally produce a “testing effect”: active retrieval usually produces better performance than passive re-reading. But it didn’t. The groups performed about the same, and poorly. O’Day ran a follow-up experiment in which the “practice” group repeatedly solved similar practice problems with feedback, rather than just recalling the procedural steps. This time, on a post-test one week later, the practice group substantially outperformed the control group.
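For readers who haven’t met Poisson processes, here is a minimal sketch of the kind of two-step procedure such problems drill. It’s an illustrative assumption, not O’Day’s actual materials: first compute the expected count for the interval, then apply the Poisson probability mass function.

```python
import math


def poisson_count_probability(k: int, rate: float, duration: float) -> float:
    """P(exactly k events in an interval of length `duration`) for a Poisson
    process with the given `rate` (events per unit time)."""
    # Step 1: the count in the interval is Poisson with mean rate * duration.
    mean = rate * duration
    # Step 2: apply the Poisson probability mass function.
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Example: calls arrive at 3 per hour; chance of exactly 2 calls in 20 minutes.
# The mean is 1, so this is e^-1 / 2, about 0.184.
print(poisson_count_probability(2, rate=3.0, duration=1 / 3))
```

Recalling “compute the mean, then apply the pmf” is the kind of procedural retrieval the first practice group did; setting up and working a fresh instance like the example is closer to what the follow-up group did.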
Studying worked examples wasn’t enough; rehearsing the procedural steps wasn’t enough. Both of those groups performed poorly on a post-test. O’Day’s experiments suggest that to become good at solving problems, you need to practice solving problems, ideally with feedback. Theoretically, transfer-appropriate processing suggests that you don’t need to solve problems per se; you just need to practice doing something which involves a similar kind of cognitive processing. I’m not immediately sure what that would be. Maybe it would be enough to set up a problem or to take a single “step” in a problem-solving sequence. I’m not yet aware of any experiments along those lines.
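One way to prototype the single-step idea, sketched under assumptions: the function name and prompt format are hypothetical, and no experiment I know of has tested this. Split a worked solution into steps, and make one prompt per step that shows the problem and the steps so far, asking only for the next move.

```python
def next_step_prompts(problem: str, steps: list[str]) -> list[tuple[str, str]]:
    """Turn a worked solution into (question, answer) prompts, one per step.

    Each question shows the problem plus all preceding steps and asks only for
    the next step, so a review exercises a single move in the solution.
    """
    prompts = []
    for i, step in enumerate(steps):
        lines = [problem]
        lines += [f"{n}. {s}" for n, s in enumerate(steps[:i], start=1)]
        lines.append(f"What is step {i + 1}?")
        prompts.append(("\n".join(lines), step))
    return prompts


# Hypothetical usage with an electrostatics problem.
for question, answer in next_step_prompts(
    "Find the force on a 3.0 μC charge in a uniform 2.0e4 N/C field.",
    [
        "The field is force per unit charge, so F = qE.",
        "Substitute: F = (3.0e-6 C)(2.0e4 N/C) = 0.06 N.",
    ],
):
    print(question, "->", answer, sep="\n")
```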
Schema acquisition: building problem-solving flexibility
O’Day’s findings highlight another challenge. When test problems were similar to practice problems, practice produced good performance. But when the test problems required small changes to the procedure, performance fell. The gains of practice didn’t transfer.

Cognitive psychologists Yeo and Fazio observed a similar result across several experiments using similar materials. But they observed something else of interest: in an experiment where students struggled with the practice problems (< 50% correct), students were better off studying worked examples, rather than solving practice problems. Then, when the materials were changed in later experiments to produce better problem-solving performance during the learning phase, the testing effect returned.
Yeo and Fazio suggest that we’re really observing multiple interacting processes. On the one hand, students need to move what they’re learning into long-term memory and build fluency. Practice and testing support that goal. But to tackle the transfer problems on the test, students also need to generalize what they’re learning. The literature calls this “schema induction”. The claim is that when experts solve problems, they lean heavily on schemas, structures “that permit problem solvers to categorize a problem as one which allows certain moves for solution”. To build flexible problem-solving capacity, you need to acquire these schemas. That’s often done through induction—noticing patterns across a set of problems, noticing that a particular set of moves seems to help.
Another cognitive psychologist, John Sweller, suggests that problem-solving practice poses an important trade-off. Difficult problems may create stronger memory encodings, but they also tax your working memory, creating “cognitive load”. You may end up with little remaining capacity for noticing patterns in problem structure, that is, for “acquiring schemas”.
In Yeo and Fazio’s first experiment, the problems were quite difficult, and students were better off studying worked examples than solving practice problems. The authors suggest that’s because students experienced less cognitive load when reading worked examples than when solving problems. And so those readers would have spare capacity for schema acquisition, though the penalty was that they forgot what they learned more quickly.
In their second experiment, Yeo and Fazio made the practice problems easier by making the problems’ surface features identical (only the numbers were changed). That helped: students who practiced solving problems were better off than students who studied example problems. But their transfer performance was poor, which makes sense because they practiced a bunch of identical problems. Students built durable memory of what was involved in the problems they practiced, but they didn’t acquire general schemas.
So, in a final experiment, Yeo and Fazio made the practice problems easier, but varied their surface structure. Students’ transfer performance improved enough that it no longer differed significantly from their performance on identical problems. The authors suggest that this experiment’s practice problems were varied enough to generalize over, and the cognitive load was light enough that schema acquisition was possible. Unfortunately, their experiments weren’t really designed to test this particular hypothesis, so we’re left making cross-experiment comparisons for now.
The rough implication here matches common sense. Schema acquisition and memory are at least somewhat independent. You can notice patterns, but fail to remember them durably; you can reinforce your memory of problem-solving, but in a brittle fashion which won’t transfer to unfamiliar problems. Cognitive psychologists often talk of desirable difficulties—productive struggle which induces more complex processing—but students do face a real possibility of undesirable difficulties. If you want both flexibility and durable fluency, then you should practice, but along a gentle slope. You don't want to overload your working memory so much that your mind can’t reorganize its representations of the relevant concepts. And, when you’re getting started, that will often mean that you’re better off carefully studying a worked example, rather than solving a problem yourself. To put it too reductively (because memory is of course involved in building flexibility), you want to be able to solve problems in the first place, and then you want to build fluency so that you can solve them for the long term.
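As a rough sketch of what “a gentle slope” could mean inside a scheduler: while a learner’s recent success rate on a problem type is low, alternate worked examples with attempts; once it rises, switch to problem-solving. The 50% cutoff loosely echoes Yeo and Fazio’s struggling condition and the window size is arbitrary; both are assumptions, not findings.

```python
from collections import deque
from typing import Optional


class PracticeSelector:
    """Hypothetical heuristic: alternate worked examples and problems while a
    learner is struggling with a problem type; switch to problems once not."""

    def __init__(self, threshold: float = 0.5, window: int = 6):
        self.threshold = threshold          # assumed cutoff, not from the papers
        self.recent = deque(maxlen=window)  # outcomes of the latest attempts
        self.last: Optional[str] = None

    def record(self, solved: bool) -> None:
        """Record whether the learner solved the most recent practice problem."""
        self.recent.append(solved)

    def next_activity(self) -> str:
        struggling = (not self.recent
                      or sum(self.recent) / len(self.recent) < self.threshold)
        # While struggling, pair each worked example with an attempt, so there is
        # fresh evidence about whether the struggle has eased.
        if struggling and self.last != "worked example":
            self.last = "worked example"
        else:
            self.last = "practice problem"
        return self.last
```

The alternation matters: a selector that only served worked examples to a struggling learner would never collect the evidence needed to move them onto problems.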
None of the papers I’ve mentioned has anything to say about maintenance—that is, keeping this kind of transferable problem-solving performance durably accessible for a long time. My guess would be that once these flexible schemas are acquired, you could reinforce them through ordinary distributed retrieval practice. As we discussed in the last section, you’d want to actually solve problems, rather than just retrieve the procedure. Maybe you could set up problems with “friendly numbers” so that they could always be readily solved in your head. And to reinforce those flexible schemas, you’d want the problem to vary each time. One could probably use a language model to generate pretty good variations of that kind, but it’s probably not necessary to have a truly bottomless pool: if practice is widely distributed, you could probably get away with practicing a small handful of structural variations, particularly if the numbers were randomly generated.
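A minimal sketch of that idea, assuming a hypothetical template format: a small handful of structurally different problem templates, each drawing fresh “friendly” numbers at review time so the arithmetic stays mental. The templates are illustrative electrostatics examples, not prompts from Alex’s deck.

```python
import random

# Hypothetical templates: (question text, friendly values per variable, answer).
TEMPLATES = [
    (
        "A {q} μC charge sits in a uniform {E} N/C field. "
        "What force acts on it, in μN?",
        {"q": [1, 2, 3, 4], "E": [10, 20, 50]},
        lambda v: v["q"] * v["E"],  # F = qE; μC × N/C = μN
    ),
    (
        "A {q} μC charge feels a {F} μN force from a uniform field. "
        "What is the field strength, in N/C?",
        {"q": [1, 2, 4, 5], "F": [20, 40, 100]},
        lambda v: v["F"] / v["q"],  # E = F / q
    ),
    (
        "One of two point charges is multiplied by {a}, and their separation "
        "is divided by {b}. By what factor does the Coulomb force change?",
        {"a": [2, 3], "b": [2, 3]},
        lambda v: v["a"] * v["b"] ** 2,  # F ∝ q1 q2 / r^2
    ),
]


def make_variant(rng: random.Random) -> tuple[str, float]:
    """Pick a template, fill it with randomly chosen friendly numbers,
    and return the question along with its answer."""
    text, choices, answer = rng.choice(TEMPLATES)
    values = {name: rng.choice(options) for name, options in choices.items()}
    return text.format(**values), answer(values)


question, answer = make_variant(random.Random())
print(question)
print(answer)
```

Each review draws a structural variation and new numbers, so widely spaced reviews rarely repeat exactly, even from a pool this small.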
It’s worth noting that at this point, we’re approaching the territory of intelligent tutoring systems (ITS), another branch of educational technology research. Heavily inspired by the same theory of cognitive load, these systems are laser-focused on finding the smoothest problem-solving slope to any desired destination. They’re usually less concerned with flexibility and durability as goals, but it’s interesting to consider how memory systems might be adapted with ITS techniques, or how ITS might be adapted to support flexibility and durability.
Incomplete absorption
In these past two sections, I’ve begun to address some of what’s happening, cognitively, during problem sets, and how we might make some of that activity more effective. But there’s another simple explanation for some of the struggle I’ve been seeing: Alex hadn’t actually understood parts of what the textbook was saying, and he hadn’t noticed.
Unfortunately, the exercises didn’t clearly reveal those holes. Problems relying on missing pieces just felt confusing and difficult, in a diffuse and undirected way. But in think-aloud video of his memory practice, quite a few questions elicited remarks like “I remember that the answer is X, but I don’t see why,” or “I’m confused that the answer is X.” Just as an example, one question was: “Why does the electric flux through a box containing a proton not change if you double the size of the box on each side?” There are many valid ways to answer that question, but the problem here was that no answer really made sense to Alex.
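The physics behind that prompt can be checked numerically. Gauss’s law says the flux through any closed surface equals q_enc/ε₀, independent of the surface’s size; intuitively, the field weakens as 1/r² while the area grows as r². Below is a minimal sketch, assuming the proton sits at the cube’s center (Gauss’s law doesn’t require that, but it makes the sum easy) and working in units where q/ε₀ = 1. It estimates the flux through one face with a midpoint-rule sum and multiplies by six.

```python
import math


def cube_flux(side: float, q_over_eps0: float = 1.0, n: int = 200) -> float:
    """Estimate the electric flux out of a cube of the given side length,
    with a point charge at its center, using a midpoint-rule sum."""
    kq = q_over_eps0 / (4 * math.pi)  # k*q = q / (4*pi*eps0)
    half = side / 2
    h = side / n                      # width of each grid cell on a face
    face_flux = 0.0
    for i in range(n):
        x = -half + (i + 0.5) * h
        for j in range(n):
            y = -half + (j + 0.5) * h
            r_squared = x * x + y * y + half * half
            # Normal (z) component of the point-charge field on the face z = +half
            e_normal = kq * half / r_squared ** 1.5
            face_flux += e_normal * h * h
    # By symmetry, all six faces carry the same flux.
    return 6 * face_flux


# Doubling the side length leaves the flux unchanged (about 1, i.e. q/eps0).
print(cube_flux(1.0), cube_flux(2.0))
```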
With this question and the others like it, I wasn’t trying to help Alex generate new understandings or acquire general schemas. These questions are really just intended to be straightforward memory reinforcement. They directly recapitulate some important explanation from the text, with similar wording. If Alex can’t produce an answer, but the solution immediately makes sense once he reads it, then more retrieval practice will probably help. On the other hand, if the answer doesn’t make sense, rote repetition isn’t the right fix. If he doesn’t understand why the answer is what it is, then he doesn’t understand the concept we’re trying to reinforce. There’s not much point in just memorizing the answer.
This is an important distinction to understand, and I fear it’s one that memory system designers consistently ignore: the appropriate intervention for a “wrong” answer will vary enormously with the nature of the prompt. If you’ve forgotten something you’re supposed to memorize, then it’s probably fine to review it again until it sticks. If you never understood the answer in the first place, then you need some very different intervention. And there are other meaningful categories. If it’s a generative task (“hum a melody in pentatonic minor”), and you find it too difficult, you might want to browse some examples, or tone down the difficulty of a task (“hum a final bar for this pentatonic minor melody”). If it’s a problem-solving task, you might want to read a worked solution, then add focused prompts about important patterns or procedural steps. Or, ITS-style, you might want to read an explanation of the misconception which your answer suggests, and then queue up some simpler problems which focus on that confusion.
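To make the distinction concrete, here is a minimal sketch of how a review system might route a missed prompt. The categories and intervention strings are hypothetical, paraphrasing the examples just described; they aren’t drawn from any existing memory system.

```python
from enum import Enum


class PromptKind(Enum):
    RECALL = "recall"          # reinforce something already understood
    GENERATIVE = "generative"  # e.g. "hum a melody in pentatonic minor"
    PROBLEM = "problem"        # solve a problem


class Miss(Enum):
    FORGOT = "forgot"                  # the answer makes sense once seen
    NOT_UNDERSTOOD = "not understood"  # the answer doesn't make sense
    TOO_HARD = "too hard"              # couldn't perform the task at all


def intervention(kind: PromptKind, miss: Miss) -> str:
    """Suggest a next step for a missed prompt (hypothetical policy)."""
    if miss is Miss.NOT_UNDERSTOOD:
        # Rote repetition won't fix a missing understanding.
        return "flag for re-reading the source or discussion; don't reschedule"
    if kind is PromptKind.RECALL:
        return "review again on a shorter interval"
    if kind is PromptKind.GENERATIVE:
        return "browse examples, or switch to an easier variant of the task"
    return "read a worked solution, then add prompts for its key steps"
```

The key design choice is that the learner’s kind of miss, not just a right/wrong bit, selects the response.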
In Alex’s case, I think the most appropriate next step is to simply reread the relevant passage from the textbook. I don’t think the book’s explanations were too confusing; probably his attention just lapsed during those passages. (I think this happens to everyone, more or less constantly, but we just don’t notice.) I’ve linked each prompt directly to a source location, so re-reading is relatively easy mechanically, but it’s more difficult practically. It feels bad to interrupt the smooth flow of a review session to go read a textbook. Also, Alex prefers to review on his phone, and it’d be pretty awkward to pull up textbook pages on that tiny screen. So it’s probably best to flag these questions and to establish a process to work through that queue during study sessions. If the book’s explanation still feels confusing, then the right next step is probably to discuss it together. Or, for some concepts, it may be better to just set it aside for a while and see if it makes sense later.
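A minimal sketch of such a queue, assuming each prompt carries a link to a source location (represented here as a hypothetical page number). Flagging during a phone review is cheap; the study session later works through flags in reading order, so nearby passages get re-read together.

```python
from dataclasses import dataclass, field


@dataclass
class FlaggedPrompt:
    prompt_id: str
    source_page: int  # hypothetical stand-in for the prompt's linked source location
    note: str = ""    # e.g. "I remember the answer is X, but I don't see why"


@dataclass
class ConfusionQueue:
    items: list[FlaggedPrompt] = field(default_factory=list)

    def flag(self, item: FlaggedPrompt) -> None:
        """Called from a review session instead of rescheduling the prompt."""
        if all(f.prompt_id != item.prompt_id for f in self.items):
            self.items.append(item)

    def study_session_order(self) -> list[FlaggedPrompt]:
        """Flags sorted by source location, for re-reading in order."""
        return sorted(self.items, key=lambda f: f.source_page)

    def resolve(self, prompt_id: str) -> None:
        """Remove a flag once the passage makes sense (or has been discussed)."""
        self.items = [f for f in self.items if f.prompt_id != prompt_id]
```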
In Quantum Country, I think our design helped with this sort of situation. You first review every prompt in the context of reading the book, rather than in the context of a memory review session. So if you notice that you’re confused by some answer, it’s much less disruptive to scroll back up a little and re-read. And because the reviews occur every few minutes of reading, the relevant passage won’t be too far away. Readers we interviewed often remarked that they felt an unusual sort of confidence while reading, since they knew the interleaved reviews would ensure they absorbed everything they were “supposed to”.
Why didn’t we use a design like Quantum Country’s for Alex’s physics review? At first, it was because he was reading a physical copy of the book! He later switched to a digital edition. I’ve implemented a PDF reader with integrated memory prompts, but PDF rendering is obnoxious enough that I haven’t yet managed to integrate the interstitial reviews. Right now, prompts are presented in the margins, like last summer’s prototype.

In this discussion, confusing memory prompts play an unusual but important role. Instead of doing the job of reinforcing long-term memory, they act as backstops. They make sure that if you didn’t understand that part of the text, or if you weren’t paying attention, you’ll notice! I think this is quite valuable. The usual alternative is that you try to solve exercises and end up confused or stuck because you’re missing some important conceptual understanding. As I mentioned earlier, the trouble there is that it’s not clear what you’re missing. Even reading a worked solution may not make the missing concept nearly so obvious as the simple prompts we’ve been discussing.
Learning in conversation
In the introduction to his Lectures, Feynman writes:
Problems give a good opportunity to fill out the material of the lectures and make more realistic, more complete, and more settled in the mind the ideas that have been exposed.
I think, however, that there isn’t any solution to this problem of education other than to realize that the best teaching can be done only when there is a direct individual relationship between a student and a good teacher—a situation in which the student discusses the ideas, thinks about the things, and talks about the things. It’s impossible to learn very much by simply sitting in a lecture, or even by simply doing problems that are assigned.
Alex and I have met roughly once a week to discuss the material and to solve problems together. I’m not sure how true Feynman’s claim is, but Alex has told me that these sessions have felt extremely valuable. I don’t think these sessions replace the role of a memory system: we get more out of our time together because we don’t have to review the fundamentals, and he’s not looking things up all the time. And the memory system ensures that much of whatever progress we make will stick.
But I can imagine that in an ideal world, Alex would have me (or a more effective tutor) by his side during 100% of his problem solving time. This isn’t cheating; I’m not making his work easier by giving the answers away. I let him struggle, and when he’s stuck, I ask probing questions which might unblock progress. I can supply raw information where it’s needed. If I notice a misconception, I can ask a question which might create some revealing conversation. I’ll call attention to patterns or connections he might have missed.
It’s hard not to start wondering: could a large language model do some or all of this? I don’t think I’m doing anything terribly special, though I’m certainly drawing on my subject matter experience in both physics and education. A sufficiently good model could also help with the question of how to handle topics which don’t come with ready-made problem sets and discussion questions. There are now six zillion “AI tutor” startups in flight, but none I’ve seen have yet felt like they’re on the right track. I can’t yet articulate my complaint precisely, but roughly: they feel too instructional, too interested in “right answers”, and not interested enough in discussion and sense-making.
There is one element of our collaborative problem solving sessions which I’d guess these systems would have trouble providing: social energy. Alex is motivated and disciplined, but like all of us, his gumption ebbs and flows. On some days, it can be hard to muster the energy to pick up another chapter. But he’s observed that when we’re working together, it’s much easier to get and stay excited about the learning process. I’ve certainly had that experience when collaborating with others on… well, basically everything!
I’ll close by noting that while I’ve introduced a lot of problems in this essay, I’m thrilled about what I’m learning here. This feels like a basically ideal context for my research. Alex is highly motivated to deeply understand. There are strong pressures on that understanding. He’s struggling, so there’s real opportunity for augmentation. I’m getting an intimate view of his learning process, ample opportunity to intervene, and tight feedback loops on anything I try. It’s exhilarating! Now I just have to deliver.
Comments
Eric, thank you so much for this very detailed answer and all the example screenshots. I spent a lot of time with this, and I learned a great deal! It's extremely instructive to see how people use these systems for larger and more complicated tasks.
Andy Matuschak
2023-04-22 23:40:02 +0000 UTC
@Andy: here is some more detail which I think gets at your "encoding" question! I often process them fairly lightly, but sometimes end up making several cards for one practice problem. I've got some examples of the "lightly processed" variety of card posted here (scroll down for math): https://imgur.com/a/7dpMHhc *Selecting* what to learn is probably the most significant part of my encoding process. Most domains have a lot of basic properties that you are expected to know by heart when solving problems—I like to create cards to prove a lot of those properties. If a worked example (or proof in the wild) uses a property I don't know about (say, the tail-sum-formula for probability theory, or an infinite sum I am uncomfortable with), I stop and learn those first with their own cards. In this sense, creating cards from worked examples works a little bit like vocabulary-building: I start by making sure each step makes sense and filling any holes in my knowledge at that level, before moving on to how the full proof works. —— But then, in the simple case, I state the theorem/problem to be solved on the front (possibly re-writing it in my own words if the original is complicated). And then the proof/solution simply goes on the back. Before I have worked through it myself, I might start with a photo/screenshot of the solution. Once I've solved it "in my own words," though, I replace the textbook solution with a mixture of LaTeX & MathJax that I feel clearly expresses (my variation of) the solution. This works well for straightforward problems such as "find the velocity of a particle moving vertically at earth's surface if wind resistance is proportional to speed" (which boils down to posing and solving a differential equation). I find that I will often tweak the solution text months later, as my understanding of a problem improves.
Often there is a gap in my knowledge I don't realize at first (as you pointed out in the OP!)—but after the 4th or 5th time of doing a problem I will realize it myself, or else have learned more from my readings so I can go back and improve a solution (or correct it, in the occasional case that I really bungled and memorized something incorrect!). —— For more complex cases—which either have a great many steps, or casually use concepts that I am not very familiar with (so my brain can't easily "chunk" the solution into intuitive pieces), I sometimes follow a strategy of 1. breaking a card that has a long solution into several cards for each part, and 2. still keeping the original large card (which pulls all the pieces together). This is often the strategy I need to use SRS to learn proofs I encounter "in the wild." It gives me some pressure toward the "minimum information principle"—practice cards are more complex than flash cards, and synthesize more, but smaller is still better IMO. Example: I recently read a classic paper that proved that estimation of distribution algorithms (a kind of optimization method) converges with various kinds of selection operators. I wanted to learn one of those proofs: but since the sub-domain was new to me mathematically, my practice cards looked something like: Q. In the mathematical theory of EDAs, what expression is used to formalize truncation selection for an infinite population model? A. [big hairy equation here] <— this one is just a declarative recall card, but I consider it a "practice card" when the answer is long enough that doing it with pen and paper is much easier than doing it verbally Q. When proving the convergence of an EDA under truncation selection, how do you show that the expected fitness at t + 1 is always less than at t (E(t+1) < E(t))? A. [proof of that property here] Q. When proving the convergence of an EDA under truncation selection, how do you show convergence once you know that E(t+1) < E(t)? 
So I end up with four cards for one proof. —— The main advantage here is that a) I get most of my practice through smaller cards, so my deck doesn't become unbearable to review, and b) the scheduler will be better at having me focus on the parts of proofs that are *hard for me,* rather than busywork (repeating full proofs). Incidentally, this same strategy of breaking long exercises into pieces, but also practicing the full ensemble from time to time, is also how I approach piano repertoire practicing with Anki (ex. "Play the Moonlight Sonata 1st Movement" becomes "Play the Moonlight... part A, the C#m introduction," "... starting from part C, the fun A♭ run"). I've been pretty happy with that strategy for piano as well (having to know how to start a piece from multiple points in its middle is useful too!).
Eric 'Siggy'
2023-04-10 13:01:42 +0000 UTC
Yes, expertise reversal is another implication from the CLT theory family! In this context, it would suggest that you'd want to adjust the balance between worked examples and problem-solving over time according to expertise.
Andy Matuschak
2023-04-10 03:51:57 +0000 UTC
Thank you for sharing! Can you say anything about how you encode the worked examples from your readings?
Andy Matuschak
2023-04-10 03:50:49 +0000 UTC
To clarify, I was referring to 25 or 50 within a *subject* (like “chess,” or “optics”) rather than a single concept (which rarely deserves more than a liberal handful of cards). Mastering a subject (at least with SRS as the primary learning strategy) ultimately requires hundreds of cards at least, so 25 is a drop in the bucket. For my process, I distinguish sharply between “flash cards” and “practice cards.” I configure the latter to roughly follow a Leitner walk, and (unlike flash cards) I very often bury them: because practice involves procedural (not just declarative) memory, repeating a card I got wrong (or just found difficult) for several days in a row seems beneficial (especially for, say, piano practice, but also for math proofs). Creating practice cards is easy, because I can just harvest exercises from a textbook (or from the wild, for more advanced material). Keeping up with the reviews is a different story. Not only are the cards more time consuming, but they usually require special tools or conditions: ex. “Pencil and paper to do a physics problem,” “quiet space to recite a poem/read a passage in Spanish,” or “a martial arts dummy to demonstrate a technique.” This means they don’t satisfy the traditional Anki paradigm of “do your reviews while standing in line at the grocery store,” and I have to be intentional about keeping up with them to avoid a backlog. I still make flash cards for anything they are a good fit for, since flash cards are always easier to maintain.
Eric 'Siggy'
2023-04-04 03:10:45 +0000 UTC
Really interesting! The Yeo and Fazio paper is awesome, the finding of a 'switch' for novices and experts vividly reminded me of the Expertise Reversal Effect from Kalyuga and Sweller (2003). I hadn't seen the transfer-appropriate processing research before, it seems really relevant! Having a tight feedback loop and tons of points of observation for generating and testing your model sounds awesome, I'm so glad to hear you have this process!
Robert Cobb
2023-04-04 02:14:21 +0000 UTC
I'm curious about your approach to SRS problem practice: how easily does it scale? Writing 25+ cards for a single concept is daunting.
2023-04-03 23:48:38 +0000 UTC
I appreciate you sharing this summary and analysis of your experience and findings. It's interesting how much of this reminds me of the reasons I enjoy being a software engineer as I find it to be an almost endless source of challenge, learning, and feedback loops with an ever-changing set of schemas to keep the challenge alive. Your discussion about learning in conversation also reminds me a lot of the practice of rubber ducking and pairing with other software engineers which is a practice I normally employ a few times per day when I get stuck and the challenge is too much.
Adam Zolyak
2023-04-01 15:13:16 +0000 UTC
Ooh, what a relevant review & discussion! The hard distinction the studies you summarize make between "practicing problems" and "reading worked examples" surprises me. As one who does use SRS to schedule problem practice, I find that my favorite way to study is to hoover up all the "worked examples" in a chapter into Anki, and then practice solving them along the SRS schedule. My hunch is that it comes back to the over-emphasis our educational culture places on "inventing" or "finding" the solution for yourself, instead of using memory: in fact, over long intervals, IMO recalling a solution you first read (or saw a teacher demonstrate) is a great way to cement problem-solving ability and a genuine understanding of the problem. Particularly since at longer intervals, you necessarily rely less on rote memorization and more on conceptual understanding. > "it’s probably not necessary to have a truly bottomless pool: if practice is widely distributed, you could probably get away with practicing a small handful of structural variations." This has definitely been my experience with math/physics, chess exercises, and go exercises (tsumego). A bank of just 5 or 10 exercises suffers from overly recognizable prompts that induce silly pattern recognition ("oh, if the queen is in the bottom-left, that means there's a checkmate in the top-right"; "if there's a differential equation, it can always be solved with an integrating factor")---just as memory-system skeptics would fear. But I find that this problem vanishes very quickly as the pool grows. By 25 or 50 problems (even just loosely from the same domain), I almost never have the problem of "remembering the right solution for the wrong reasons." This can be accelerated by adding problems that are designed to be similar, but differ in a critical way.
For example, a chess position that differs by just one piece (such that the solution totally changes); or a go problem that is unsolvable (so the answer is "nothing can be done: the group is dead!"). The latter are hard to find, though, as most books focus on solvable problems! How well all of this prepares me for *far* transfer I can't say. But it has given me pleasant progress with self-study! > "one that memory system designers consistently ignore: the appropriate intervention for a “wrong” answer will vary enormously with the nature of the prompt" Love this point. It definitely applies to "simpler" domains like language learning too. I often say that if I get grammar wrong on a card, but I don't understand *why* I got it wrong, then that means it's a bad/difficult card (in the context of my knowledge—I talk about this in a post here with some examples: https://medium.com/euthyphroria/when-the-inductive-method-fails-learning-very-complex-grammar-with-anki-4ea4c4174ee4). In any domain, the epiphany we're looking for when we get a card wrong is "oh! I got that wrong—but I see why!" rather than "oh! I got that wrong! Darn, guess I'll try harder next time."
Eric 'Siggy'
2023-04-01 13:22:52 +0000 UTC
Argh, oops, I always worry I'm going to miss one or two. Hopefully fixed now for future listeners—thanks for telling me. :)
Andy Matuschak
2023-04-01 05:54:07 +0000 UTC
Very informative. I wasn’t even bothered to learn about how you deal with mistakes in narration by making a pause and saying “cut”, then rereading the passage. Accidental teaching and learning! Thanks!
Pavol Vaskovic
2023-04-01 05:39:03 +0000 UTC