Andy Matuschak

patreon


Finding research–context fit

The life of an early startup revolves around a desperate search for “product–market fit”—a state in which you’ve found a solution so compelling in some market that the world starts yanking the product out of you faster than you can make it. That’s when the exponential flywheels can start spinning and a startup can start to make good on its rocket-fueled ambitions.

Aspiring inventors of “tools for thought” aren’t chasing rapid customer growth, but their research velocity does critically depend on a related phenomenon: research–context fit. I’ll illustrate what I mean by describing my own struggles to find this fit.

Quantum Country’s poor research–context fit

I’ve been trying to create environments that make it much easier for readers to engage with complex ideas—environments that aim to substantially expand what people can think and do. In 2019, Michael Nielsen and I published Quantum Country, a textbook on quantum computation written in an experimental “mnemonic medium.” The medium integrates powerful ideas from cognitive science intended to make it much easier for people to remember what they read.

Now imagine that most of Quantum Country’s readers are new graduate students. They’re about to embark on original research in a new field; they’re trying to find themselves as independent thinkers; they’re desperately trying to learn a notoriously challenging new field; they’re perhaps more than a little overwhelmed. They’d viscerally feel the successes or failures of a resource like Quantum Country in their day-to-day experience of the most important activity in their lives. Conversation with students would drive forward the research on the medium. The qualitative impact on students’ understanding would suggest important new research questions. In this world, I’d have research–context fit.

But that’s not the situation. Most of Quantum Country’s readers—even readers who keep up with its review sessions for months—are simply curious people, interested to learn the basics of quantum computing. Quantum Country is a more elaborate version of some other dense non-fiction they might read on a Sunday morning with their coffee. These readers might enjoy the book, but the success or failure of the medium isn’t viscerally apparent in their experience.

This context doesn’t create enough pressure on the medium or its ideas. Where are its biggest deficiencies? Along what axes can it meaningfully expand readers’ capacity? When can a person perfectly recall hundreds of details but still fail to understand or act? It’s hard to say. Worse, the questions which naturally arise in this undemanding context tend to emphasize a weak framing for the medium, one focused on helping casual readers enjoy the feeling of learning. I aspire to a much more powerful framing: to develop a medium which significantly expands readers’ capacity to do whatever they find most meaningful. The best way to develop a system like that is in a context which would ravenously metabolize its benefits. Intense relief and deep frustration would drive both questions and answers.

The problem I’m describing is common in the “tools for thought” space, particularly for inventors focused mostly on augmenting other people, rather than themselves. As Michael and I noted in How can we develop transformative tools for thought?:

There’s a lot of work on tools for thought that takes the form of toys, or “educational” environments. Tools for writing that aren’t used by actual writers. Tools for mathematics that aren’t used by actual mathematicians… It’s very easy to slip into a cargo cult mode, doing work that seems (say) mathematical, but which actually avoids engagement with the heart of the subject. Often the creators of these toys have not ever done serious original work in the subjects for which they are supposedly building tools. How can they know what needs to be included?
Suppose you want to build tools for subject X… Unless you are deeply involved in practicing that subject, it’s going to be extremely difficult to build good tools. It’ll be much like trying to build new tools for carpentry without actually doing any carpentry yourself. This is perhaps part of why tools like Mathematica work quite well – the principal designer, Stephen Wolfram, has genuine research interests in mathematics and physics.
There’s a general principle here: good tools for thought arise mostly as a byproduct of doing original work on serious problems. They tend either to be created by the people doing that work, or by people working very closely to them, people who are genuinely bought in. Furthermore, the problems themselves are typically of intense personal interest to the problem-solvers. They’re not working on the problem for a paycheck; they’re working on it because they desperately want to know the answer.

It’s worth emphasizing: this principle seems to explain so much of the failure among aspiring inventors of tools for thought! I find it incredibly difficult to heed, myself. Like so many other technologists, I have a natural tendency towards tool-fixation. If I leave that tendency unchecked, I slip into exactly the kind of failure mode that passage describes.

As a tool for authors, Quantum Country honored this principle reasonably well. My co-author, Michael, also co-wrote the standard textbook in quantum computing. He’s quite serious about the challenges of being an effective writer, both generally and about this topic specifically. Likewise, in my work to expand the medium to other domains with Orbit, I’ve spent a lot of effort building relationships with potential authors for the mnemonic medium, so that their problems become my problems.

But I only recently realized that I’d been neglecting the same principle as it applies to readers. Those of us interested in creating communications media (like the mnemonic medium) must create a close working context with both the authors and the readers/consumers, or else we must authentically inhabit one of these roles ourselves.

Finding a better context

It’s not enough to just work much more closely with readers: I need to find readers in a context strong enough to support my research. Here are a few properties which seem important for good research–context fit, not only for the mnemonic medium but for tools for thought in general.

  1. Strong signal. You want to be able to run experiments and make observations with enough clarity to actually answer your primary research questions. With Quantum Country and Orbit right now, it’s quite hard to answer basic questions necessary to developing the medium. Very broadly, this might mean: what’s working? What’s not working? What is the actual impact of the medium on readers’ capacities? And more narrowly: how do specific changes to the medium affect its impact on readers, and on what it enables them to do? What sorts of understanding can different kinds of interactions support, in what topics?
  2. Real stakes. You want a context where the impact of your work has at least the potential to be transformative. You can’t expect instant success, but to guide your work, you need to be able to see at least glimmers of non-linear returns. One way to think about the goal of a project like the mnemonic medium is to create an environment that’s radically more enabling for some significant set of readers—to over-simplify, a 10x reading environment. But no amount of medium-level improvement can “transformatively enable” readers who aren’t interested in engaging with the subject beyond indulging casual curiosity.
  3. Drives questions. Perhaps most importantly, you want more than a context which can help answer your research questions: you want a context immersive and demanding enough that it defines its own compelling research questions. Ideally, you’re not coming with your own strongly-held research questions at all. Instead, they should arise naturally in response to the context.

One confusing aspect of this discussion is that I’m actually quite interested in augmenting curiosity-driven readers. Sunday morning reading might start casual, but it’s often the seed for later meaningful creative work. How can I reconcile this observation with the concerns I’ve been describing? Very roughly: I think there’s significant path dependence in this research space. Like a biologist without a microscope, I need higher amplification to understand the phenomena I’m studying. This seems like a common story in tools for thought. Tools are initially developed in some critical context, intense enough to foment particularly powerful ideas, and then they’re later deployed more broadly in less demanding contexts. Of course, that last step isn’t automatic. Wolfram created Mathematica to support his needs as a researcher; I’m sure it needed extensive adaptation to support more casual tinkerers. Likewise, an environment developed to augment deep reading will require careful modification to augment informal reading. But I suspect it’s probably much more difficult, or impossible, to evolve a medium in the opposite direction—to evolve Calca into Mathematica.

At the moment, my best-guess “ideal” reader context looks like: serious people trying to enter a difficult new field (probably technical) for some purposeful creative project, like original research or a startup. Because I’m developing a communications medium, I need to find the intersection of such readers and a highly-motivated author. To get enough signal, it would be best to find a tight, energetic community of such people, and to immerse myself in it.

Simultaneously, I’ll need to find domain-expert authors eager to help new entrants—and to collaborate deeply with them. It’s not enough to give an author a pre-made tool and to answer their questions as they try to write. Their sense of what the topic needs, and of their readers’ challenges, must guide the medium’s evolution. A powerful new communications medium must be radically enabling not just for readers, but also for authors.

More concretely, the two most promising author/reader contexts I’m exploring:

  1. A professor’s monograph/textbook/notes written to get new grad students in their department up-to-speed on key topics necessary for their research.
  2. An industry leader’s book meant to help people start or join companies in a challenging new field (e.g. biotech, machine learning). Readers might have either a scientist’s or a technologist’s background.

To illustrate the issues involved in choosing a context, I’ll discuss a few other contexts which seemed promising initially, but which I now fear present serious problems.

University courses

One key question I’d like to answer is: how does the fluency you build through retrieval practice relate to your capacity for understanding and creative problem-solving? University courses have built-in structures (like exams, projects, essays) which could help me explore that question.

But readers in this context have a different problem than the one I’m trying to solve. Most of them are not learning as part of some broader meaningful creative activity. Most of them are responding (appropriately!) to external incentives: learn this set of things you’re supposed to know; pass your classes; get good enough grades; etc.

I remember what it was like being an undergraduate. I was taking five classes. I was somewhat interested in half of them, and the others were requirements. I wanted to learn, sure, but campus life held lots of other fascinations. Besides, most undergraduate courses have low expectations of their students’ fluency. If a professor had told me “use this system for two hours throughout the semester to reliably remember everything from my class,” I might have believed their claim, but I probably wouldn’t have followed their advice. My existing practices seemed to work “well enough,” even though of course I knew that much of what I learned wasn’t “sticking.” And indeed, this attitude seems to match our experiences in an experiment with an undergraduate class this spring.

Lots of people have made systems meant to help people get better grades in their classes more efficiently and reliably. By contrast, I’m interested in systems which expand people’s capacity for thought and action, around whatever they find meaningful. The two goals are related, but they’re not the same. Test scores are a proxy for a certain kind of capacity, but the pressures they apply to the medium are unlikely to shed light on the problems I want to solve. I have spent enough time in the field to understand that “education” is a mighty force. It is much more likely to subvert my work than I am to subvert it.

[Medical / law / business] students

Part of the problem with the undergraduate context is that those students don’t actually need fluency to achieve their proximate goals. But medical students sure do. There’s already a huge community of medical students using spaced repetition to internalize the huge body of knowledge they need to learn for their work. I suspect that law and business students may be in a similar situation.

But as I read the medicalschoolanki subreddit, my sense is that these students are primarily driven by a desire to get a good grade on their high-stakes examinations, and secondarily by an abstract pleasure in accumulating knowledge which might someday be needed. My wife’s a physician, so I asked her what she thought drove medical students’ study practices. Her instantaneous response: “fear!” Fear of not passing, fear of a grade too low for the residency you want, fear of embarrassment in front of an authority figure. But not fear of harming patients (there are several layers of supervision); not fear of lacking knowledge essential to research projects; not fear of failing to understand something you desperately want to understand.

I’m sure that some students in these environments feel differently. Perhaps I can figure out how to work with them. But my growing suspicion is that these contexts won’t supply the right pressures for my research.

Onboarding

If you’ve just joined a new company, you’re (hopefully) eager to be productive as quickly as possible. But there’s usually a huge amount of basic knowledge necessary before that can happen. New employees at Stripe, for instance, must rapidly learn a huge volume of company- and industry-specific terms, concepts, and procedures. On top of that, a new developer will need to learn all kinds of details about the company’s internal infrastructure. This could make for quite an interesting context for the mnemonic medium.

One challenge here is that formal corporate learning systems are almost always soul-sucking monstrosities that feel like the worst of school. What banal horrors come to mind when you hear “compliance training” or “reskilling”? You can tell that every action you take when you use these systems is being compiled into a “reports dashboard” for some administrator somewhere. There’s a whole industry around this stuff called “enterprise learning and development.” Cynicism aside, I know of several well-intentioned startups (some dead, some still trying) which have applied spaced repetition in this space. I worry that they’re making enormous sacrifices to the medium’s potential in order to appease their buyer. One huge challenge is that the buyer in this instance is not in fact the user, and so the expected principal–agent problems prevail.

The trick in this space would be to find a company which is big enough for better onboarding to seem quite important, but small enough for it not to be awful-by-default. My instinct here is to reframe the interaction so that it’s an employee-centric tool: point your magic wand at anything which seems important to you while you’re coming onboard, and you’ll remember it effortlessly! The tool serves you, not some enterprise dashboard. This is more or less the opposite of the usual employer-centric curriculum-on-rails which “feeds” employees tasks. Framed in terms of a startup, such an effort would be an exercise in “disrupting” the enterprise learning and development space by introducing a “grassroots” tool which employees excitedly adopt on their own, before advocating for broader adoption within the organization. A bit like Slack’s path, I suppose.

This all sounds quite miserable to me—of course I’m trying to find a research context, not to “win” a “market segment”—but I still believe this path could be very interesting with the right partner.

High-stakes life changes

Certain key life events tend to be associated with a spree of book-buying: having your first child, founding your first company, building a house, grieving a loss, and so on. Many less-technical topics like these are nevertheless quite high-stakes, and they’re connected to deeply meaningful activities. Since many people in these situations are already buying books, perhaps there’s potential impact in making those books much more effective?

I hesitate here because while all these things are difficult, I’m skeptical that “learning complex ideas” is really the most important limiting factor for these situations. Of course, there are other interesting opportunities for better books in these domains, opportunities which aren’t about learning complex ideas. But I’ve spent a lot of time thinking about how to use the mnemonic medium to help people learn complex ideas, and I feel it would be a shame to shift focus without pushing harder on that problem.

I have similar concerns when considering the potential of augmenting books about personal development. For example, I’ve argued that a book like Atomic Habits could be much more powerful if the authored experience were extended over time, to help readers integrate its ideas into their lives. But my instinct is that taking such a challenge seriously would mean developing a very different medium. To give another example, studying the Buddhist dharma does involve internalizing a fair amount of precise knowledge. Maybe the mnemonic medium could help with that. But retrieval practice for the eightfold path and the four noble truths is probably not the most important opportunity to augment such books.

————————

Part of me is worried that I’m overthinking this and letting it stop me from building momentum within some “good-enough” context. Another part of me is worried that I’m not worrying about this nearly enough—that it’s actually by far the most important problem facing my work, and it should be the exclusive focus of my attention until it’s resolved.

I’ve had enough experience with the problems of poor research–context fit in my work at Khan Academy that I’m inclined to believe good fit is an essential condition to good work. It’s subtle. It doesn’t show up as an obvious blocker like missing equipment or skills. But if you aspire to augment other people by building systems for them, your domain of insight is substantially determined by their domain of use.

Comments

Another off-the-cuff idea: corporate trainings. Not the most exciting domain, but companies are often looking for measurable changes in knowledge or behaviors based on trainings. The RQ could be: does a mnemonic training outperform traditional trainings when surveys are used to test employee knowledge a year later? Or in terms of some measurable behavior improvement across the org?

Eric 'Siggy'

Thank you for these very generative ideas, Mickey! Authoring tools have indeed been a great venue for tools for thought, I think—stretching all the way back to Sutherland's Sketchpad. The idea of augmenting screencasts to create an accelerated communications medium is really fascinating to me, and makes me also wonder about augmenting Twitch. Business consultancies do seem like a solid potential context. I spoke with a former BCG-er, and his concern was that most consultants are only interested in getting "up to speed" enough to finish the project, and that they're not interested in long-term fluency. But I should probably get some more data points there! Your suggestion about coaching makes me wonder about the value of "take-aways." A beloved former piano teacher asked me to write a summary of each lesson after we met, then to email it to him for comments. He'd elaborate, emphasize, or correct my summaries as necessary. I could imagine augmenting that with a mechanism to make sure that the takeaways really stayed with me…

Andy Matuschak

I'm an amateur student of Buddhism, so I may be totally off base here, but when I think about augmenting the learning process in that domain, I imagine that retention is useful and important, but rarely the limiting factor for aspirants. In my own limited experience, progress has mostly come through sudden insights, often prompted by a teacher phrasing some instruction just slightly differently, so that things click. If I were building a system in this space, I'd want to understand that better and try to use the ritual of meditation practice as a "delivery vehicle" in a similar manner to SRS sessions.

Andy Matuschak

Thank you! These are both great posts. I've read the DARPA project reports, and it's an interesting continuation of much past work on intelligent tutoring systems, a related and promising (though historically somewhat fraught) field. I enjoyed reading the teacher's piece, including all the details about the challenges of making new systems work in practice. Those conclusions are consistent with my sense of formal educational environments, but I believe they're quite different from self-motivated learning (i.e. for some project or meaningful application).

Andy Matuschak

Thanks—I totally agree! This is a potentially incredibly powerful application for the medium. One of the things I have to keep straight for myself as I think about this stuff is the distinction between prospects for applications of the medium vs. prospects for answering research questions about the medium. For example, I think this is a potentially very exciting context for application, but boy would it be hard for me to use it to answer my current research questions (I think!). Instead of understanding the relationship between retention and understanding in the context of one book, I'd have to shoot for "coverage" of a large span of material, then understand how that holistically enables… I hope we get there!

Andy Matuschak

This post makes a lot of sense! Essentially, tools are probably best developed in the context of difficult real-world problems. And part of the trick is to pick a few promising such difficult problems. Ideally, a few things could be tried. For example, one that focuses more on memorization, and a different one which focuses more on understanding.

I like this framing and feel it viscerally at times in similar soul-crushing ways. That said, it makes me wonder where I’ve found research–context fit. Below are some examples, and I wonder how you’d analyze them for your work. It’d help me zero in on where you think fruitful soil might be found. When you’re working in CAD (my experiences at Autodesk), a project arrives to your consultancy or team (build a new campus, build a bioflame camp stove, or the new sequel to Avatar, etc.), you likely won’t the work from your portfolio but there are new aspects you or your team members have to (and/or want to) get up to speed on fast. AutoCAD (the classic monster) has over 1500 tools in the 2019 edition. The combination of the tools into tool flows can make or break whether you end up with an efficient and/or creative or buildable solution. If you’ve ever used Photoshop, Alias Maya is like that tool times a million. It is overwhelming. So I wonder if classic (rich) authoring tools would be a good venue for tools for thought. Autodesk Research has done a significant amount of work in this area; however, I don’t recall anything like Orbit or spaced repetition. The screencast utility (formerly Project Chronicle) is used by CAD users to do peer learning. Make a screencast and share (it captures telemetry like mouse clicks, construction history, tool frame in use, etc., far beyond screen scraping). I wonder if that would be a useful medium. Another example (one I am considering trying Orbit in but haven’t made progress on yet) is onboarding of new BCG employees explicitly about an emerging practice area that looks at the lowering barriers of atoms, bits, minds, and joules. These new hires may be thrown into projects where they are doing an analysis or building a pitch or being a contributing member to a multi-week collaborative session with sponsoring orgs, scientists, designers, business analysts, startups and incumbents from classic industries.
The ultimate goal might be to catalyze an ecosystem of startups, investment, and classic research or big business to move from extractive economics to generative ones (the deep tech thesis). The end users need to learn and apply their learning fairly quickly (consultancies like BCG are up-or-out-style pressure cookers at times). That said, classic hires (MBAs) don’t cut it alone, so orgs like that are hiring scientists, designers, economists, complex system thinkers, etc. How do you help an ad hoc, cognitively diverse group learn enough about each other’s domains to extend the work via analogy, metaphor, etc., and find novel pathways to pilot? I could imagine startups in that realm will also need to become more Feynman-like extenders of other diverse knowledge. Lastly, I think of the emerging focus on leadership and peer-to-peer coaching. I’ve found that helping a leader try new things and build new habits is tricky. There is a vast collection of literature and how-tos, yet remembering the right framing and questions at the right time to help unlock the coachee is just plain hard work. You have to get thousands of hours of targeted practice and make sure you’re doing things that do no harm and don’t run afoul of voodoo or wishful thinking of shamanism. The learners in this case are often third-chapter “students” over 55 and curious about how they take on their next role and use their lifetime of knowledge without being dogmatic.

A couple of possible research avenues: DARPA digital tutor project. They created a bespoke system to teach problem solving for Navy engineers, apparently with great success! https://www.lesswrong.com/posts/vbWBJGWyWyKyoxLBe/darpa-digital-tutor-four-months-to-total-technical-expertise Teacher who has experimented thoroughly with SRS in classrooms. Current opinion is that it is best for memorising vocab and similar repetitive tasks, but not for conceptual stuff. (Has also moved to the belief that forgetting things is not the main issue in teaching in classrooms.) https://www.lesswrong.com/posts/F6ZTtBXn2cFLmWPdM/seven-years-of-spaced-repetition-software-in-the-classroom-1

Lovkush Agarwal

Beautifully articulated---having spent much of this year struggling with "research-context fit" in my own work, I can really relate. One factor I think you've glossed over in the prospective contexts is *speed*. If Quantum-Country-like resources were available for all the topics that become briefly relevant to me during my daily technical work before I have to move on to a new task (ex. radar fundamentals, Kalman filters, Kubernetes, time series analysis, propensity score matching, etc., etc., etc.), I'd be able to learn 10x more and 10x faster on the job. Maybe that's an opportunity, rather than a burning customer "pain point," but it's still a heck of a story. My employer has nice subscriptions to giant libraries of tech books. Access isn't the problem---it's the time involved in digesting it into a form with landmarks and scaffolding that can be remembered and accumulated. People who aren't Anki-True-Believers would take convincing, but if the numbers show that enhanced mediums speed up knowledge acquisition, then a whole new educational market could be born overnight. Medium matters tremendously for speed. An example in my hobby life right now is that it's easily 10x faster for me to acquire Arabic knowledge than ancient Egyptian. I'd like to pick up both for an upcoming vacation, but Arabic resources are common and high quality (just open Duolingo and hoover it up into Anki), whereas Egyptian resources are written for "scholars" with all the anti-patterns that culture brings with it (ex. 300 pages of exhaustive discourse on the minutiae of nouns before getting around to how to say something trivial like "The pharaoh walks fast." If I had a dollar for every so-called "textbook" that waits 10 chapters to introduce the first verb... like, what are they even thinking? So much technical writing is built on the assumption that rote memorization is good enough, with no scaffolding or conceptual links for effective prompt-writing).
In one, I can make 100 new cards in an hour easy peasy. Maybe more. In the other, it takes me 30 minutes to figure out how to extract 1 memorable card from a confused and challenging table or exercise. We're talking a legit 50-fold difference. That's arguably the same as the difference between Quantum Country and a typical technical textbook.

Eric 'Siggy'

