Taylor Rogalski and I have continued our work (first described here) on… what to call it? A new programmable attention system? A garden for glimmers of interest? Taylor's striking phrase, "a prosthesis for your will"? Well: at least the video's only eight minutes. I console myself with Kevin Kelly's advice to "work on something that no one has a word for".
This video closes the book on the second major iteration of our system. We're looking forward to sharing more about the third, when it's ripe!
2025-09-02 01:40:32 +0000 UTC
Thank you all for your kind notes these past months as I've supported my mother through her final stages of cancer. She passed away a few weeks ago, and I'm glad that I was able to be with her to make her final days as meaningful as possible.
I'm still very much recovering. I expect I will be for some time. So I felt deeply disoriented when I was notified a few days ago that I've been awarded a large grant I proposed earlier this summer. It would make it possible for me to hire a full-time team of several collaborators for a year, and to build out a studio space for our work. For this month's letter, I'd like to share the main section of the proposal with you all.
I confess I'm feeling quite ambivalent about this award. The funding is timeboxed for a year starting from October, and ideally I'd have the team already recruited by that start date. I am—to put it mildly—not feeling like a creative leader at the moment, and certainly not a high-capacity recruiter. I may try to defer the award. Also, I've been really enjoying my collaboration with Taylor on programmable attention, and this project would take me in a different direction. (We have another report on that project coming up.) We'll see. More as I have it…
The awe at root
A few thousand years ago, our ancestors didn’t have written language. We’re almost the same as those humans, biologically, but what goes on in our minds is alien, incomprehensible. Growing up in a world with paper and pen has changed us. We can follow a chain of thought beyond our working memory. We can think abstractly, analytically. We can build on ideas of people we’ve never met, and record our own for posterity. Natural selection didn’t give us those powers. Other people did. We inherit the astounding power to shape tools which then shape us, the power to transcend our own mental capacities.
Written language was the work of whole cultures. But what excites me most as a designer is that small groups and even individuals can invent transformative tools for thought. I think of Playfair and statistical graphics in the eighteenth century, or Mendeleev and the periodic table of elements in the nineteenth. Tools like these are discovered as much as invented: their power lies in distilling deep domain insights (like periodic law) into external forms we can use to think and communicate.
In the twentieth century, the computer enabled a new class of these forms: dynamic media which behave and respond. When the modern architect drafts a site plan, his medium comes alive to actively support his creative thought. He sees in real time how his choices affect sunlight, material strain, and building cost. When the modern music producer mixes a track, spectrum visualizers let him quickly spot and fix muddy segments; automation curves let him express and manipulate the sound consistently. We can see similar stories in climate modeling, genomics, and other quantitative domains. These new media absorb part of the cognitive load, so that we can explore and refine ideas closer to the speed of thought. In so many arts and sciences, dynamic media have once again created alien minds.
The inadequacies of scholarly media
Meanwhile, the basic media of scholarship have changed remarkably little. By that, I mean reading, writing, sketching, presenting—the communications media we think with and through. These media are astounding triumphs of human civilization. And yet: they’re profoundly inadequate. We often fail to absorb key ideas from what we read, without noticing. We forget details we found fascinating last week. We find our working memory constantly overloaded. We write notes in margins, then struggle to see the big picture, to make sense of the thread connecting various texts and our own ideas. We drown in an undifferentiated backlog of browser tabs and citations.
We nominally use computers for all these tasks, but we’re not really using them. We’re mostly just working with pictures of paper on screens. The miracles of dynamism are mostly confined to simple retrieval: search, databases, rapid access. In other domains, we have dynamic media which dance with us, which provide active support for our difficult creative work. Meanwhile, in scholarship, the paper lies inert, and we shoulder the full cognitive load. Echoing Engelbart: bricks are still strapped to our pencils.
Where is our AutoCAD for scholarship? Where is our Logic for scholarship?
You could say: maybe scholarly media are special. Maybe there’s something about reading or writing or presenting which prevents the transformative tools for thought we’ve seen in other domains. But in fact, modern science has uncovered a great deal about human cognition, sensemaking, and insight. Surely we can incorporate those powerful ideas into new media.
For example, cognitive psychologists understand the dynamics of memory formation quite well. There’s something like a recipe one can use to reliably and efficiently remember anything—the trouble is that books don’t naturally perform it. With collaborator Michael Nielsen, I invented a new kind of book which incorporates that insight, so that the act of reading naturally performs that recipe. We published a quantum computing textbook in that new “mnemonic medium” and demonstrated that with a small time overhead, our readers reliably remember hundreds of detailed points from the text, long after reading. Meanwhile, without support, most readers forgot many key details. So, clearly, it’s possible to radically augment reading. Some of my other recent experiments have demonstrated promising directions in sensemaking, writing, attention, and other central practices of scholarship.
Today’s software industry is enormous. Why hasn’t it pursued this problem more seriously? We could quibble about whether the market for architects and musicians is larger than the market for scholarship. But I favor another explanation. Powerful new tools for thought require both imaginative design insight and deep domain insight. I first learned design at Apple during the early days of the iPad. I watched designers with long-held passions for music and film pour creative ideas into Logic and Final Cut, tools that have transformed their respective industries. Meanwhile, I helped build Apple’s most innovative contribution to reading interfaces: the 3D “page curl” effect that follows your finger when you turn a page. I don’t think it’s an accident that Apple designers’ ideas about reading interfaces were limited to window dressing. None of my colleagues read seriously. That’s not specific to Apple; it’s been my experience with industry software designers in general. Scholarship is just not part of the culture. Meanwhile, academics in cognitive psychology and computer-supported cooperative work have domain insight, but bold design work is rare in their culture.
The project at hand
And so, I propose a program of augmented scholarship. We’ll invent tools and media which will transform the intellectual work scholars do through communications media—reading, writing, sketching, presenting. If paper and pen made us aliens to prehistoric humans, I want to make a surface for thought which is to paper as paper is to no paper. I want to make future scholars unrecognizably more capable.
My primary approach is to repeat what has worked in so many other domains transformed by dynamic media, but which has rarely been attempted for scholarship: to combine deep insights about the nature of human cognition and scholarly work with skillful and imaginative design work. While we catch scholarship up with the last revolution in media, we’ll also participate in the next one. Modern language and vision models offer dizzying opportunities to make scholarly workspaces contextually aware, not only of what a user is reading presently, but of how that connects to everything else the user is exploring. Now, I’m not interested in automating the creative process. I locate the creative center within the scholar. But just as a pen can become an extension of the hand and mind, I believe AI can become a supportive extension of sensemaking and discovery.
I’ve spent the last few years building prototypes towards this vision of augmented scholarship, mostly alone or in dyads, and with minimal resources. Now that my work is accumulating, and frontier models have become more capable, it’s time to take a larger swing. I’d like to assemble a team of talented designers and technologists in a studio environment for an intense year of invention and experimentation. We’ll draw on my past few years of foundational prototypes and develop new ideas together. Because this project’s success or failure will depend on the richness of our domain and design insights, we’ll kick off the year with a collaborative workshop, inviting some of the most inspiring inventors in the world—people one could never ordinarily recruit for a full-time role. Embedded within [the institute], we’ll create a living laboratory, our prototypes serving [the institute's] scholars, and our observation of their use serving our creative process. We’ll publish our design work continuously, and we’ll end the year by delivering open-source demonstrations of new approaches to augmented scholarship, validated by internal and external usage observation.
2025-08-01 03:16:17 +0000 UTC
A few months ago, I wrote A startling glimpse of malleable software: LLM-generated Obsidian plugins. That essay demonstrated that idiosyncratic personal Obsidian plugins could be generated by Claude Code. It's janky, yes, but it's an interesting prototyping context which suggests some properties one might want from a malleable creative environment. In the essay I reimplement a bunch of examples from a classic paper on malleable software.
Now Claude Code has an SDK. I couldn't resist: I made a plugin-generating plugin called Trough. (You know—for your slop.) It gives you an in-Obsidian UI for creating idiosyncratic behavior on the fly. In the video above, I show Trough replicating the "Cash Register" document behavior from the earlier essays.
I've attached a copy of Trough's sources below. See the Readme for documentation. You'll need some developer experience to get it set up properly. Please note that this is a very quick sketch, and it may eat your data. I'd suggest trying it out with a test vault.
A personal note: Thank you to the many of you who sent kind notes and condolences regarding my mother's worsening condition. It's certainly been a difficult couple of months. I'm grateful to be staying with her in Portland, in what are now likely her final few days. Life is beautiful and precious. May we all savor it.
2025-07-01 01:50:20 +0000 UTC
Hi, all. Some other human-computer interaction researchers are trying to create a community for people thinking about dynamic abstractions. That is, interactive and computational representations. I think of that as things like CAD and digital audio workstations, Mathematica, etc.
One of the organizers asked me to host a session of their reading group. You're welcome to join if you'd like:
A key dream for computing pioneers like Alan Kay and Seymour Papert was that we'd replace inert textual or symbolic explanations with real-time interactive simulations. As a result, people would understand complex topics much more easily. How has this dream fared? We'll examine that broad question in general, and by way of one of the most elaborate exemplars, Earth Primer by Chaim Gingold (contributor to Will Wright's Spore, author of Building SimCity, and former member of Bret Victor's lab at HARC). We'll also discuss the debate in cognitive psychology about the role of these abstractions in understanding.
Reading: Earth Primer (focus on "Interior" and "Surface" chapters), Why Minimal Guidance During Instruction Does Not Work
Date & Time: 2025-06-27 (Fri) @ 12 pm EDT (9 am PDT)
Zoom Link: https://utoronto.zoom.us/j/85283871524
Bring your questions and observations! (But please do only join if you've done the reading.) I expect the discussion will be recorded.
2025-06-23 03:54:16 +0000 UTC
A personal note: my mother has a rapidly advancing cancer, and I've been quite distracted these past few weeks coordinating her care. I ask your patience with slow(er than usual) responses and inconsistent access to audio recording gear.
I've been spending a lot of time recently trying to articulate the perspective at the center of my research. These disparate projects all feel like part of one larger vision to me, but I haven't done a great job of explaining that. It's not just a matter of communicating to others: I haven't articulated it well for myself, which can make it hard to steer.
So this month I'd like to share some preparatory notes towards some clearer articulation of what I'm doing here. They're incomplete and wrong in many ways, but in many places they do represent a real increment in my thinking. And one thing I've learned about research is that often all I can ask for is to improve my understanding, one increment at a time.
The awe at root
My central obsession is with humanity’s ability to transcend itself: to invent outward forms which become part of us and then redefine our inward capacities. We are the same, biologically, as our pre-literate, pre-iconic ancestors, but our minds are utterly alien.
What’s even more directly inspiring for me is that single individuals and small groups can create such forms. The invention of written language feels abstract and diffuse. I’m tremendously inspired as a designer by the likes of Playfair, Mendeleev, the Lumière brothers, Sutherland, etc. I want to lay bricks in that cathedral. I want to continue the arc of human transformation begun by written language and cave paintings.
The frustration at root
Language, paper, ink, graphics, video—astounding, yes. And yet: not good enough. I feel a profound frustration at the inadequacies of our mediums for knowledge work. I feel the excise and waste. My central complaint is that the mediums are inert. I have to drive everything: orchestrating my attention, understanding, internalizing, practicing, sensemaking, drawing connections, creating anew. The words are dead on the page; the environment is largely unaware of the practical mechanics of cognition, scholarship, and creation.
So I strain to understand complex material, remember little of it, and deeply internalize even less. I lose track of papers and videos in a bottomless “to-read” pile, just as I lose track of notes on my own ideas. Working on a topic, I’ll read and mark up dozens of articles, but those marks end up lost in the margins; there’s enormous excise in seeing the whole, making sense of it, connecting it to ongoing conversations and notes over time.
I want materials and workspaces which actively support knowledge work, designed to reify our deepest understandings of human cognition, scholarship, and creative work. If a paper and pen can make me an alien to prehistoric humans, I want to make a surface which is to paper as paper is to no paper. When I strain at the page, I want it to be because of the essential complexity of the ideas, not because I’m needlessly taxing my working memory or metacognition.
“Books don’t work”, broadly construed
It’s not just “books don’t work” in the sense that I forget much of what I read. I also fail to absorb key points without even realizing it. My understanding too often remains very shallow. I strain to visualize and manipulate complex dynamic systems described with inert words and symbols. I fail to spot connections to other things I’m reading. I fail to connect my reactions to ongoing conversations with friends and colleagues. I make notes in the margins, and they usually stay there lost in the folds, useless. I have a hundred papers lost in open tabs, but I end up reading a paper linked in a recent tweet instead of the ones I’d endorse as most relevant/meaningful. I have mental and practical routines which can help with most of these problems, but they’re costly and unreliable, so I use them inconsistently.
It’s not just “books don’t work” for the purpose of learning. What I worship is the feeling of understanding something complex and beautiful, then making it one’s own, making something new of it. My interest is in creative engagement with ideas, in enablement, in doing. And so the problem is not with the isolated inertness of individual books but with my whole creative and scholarly workspace: the connections between the materials, the index cards strewn on the table, my own writing in progress, the computational systems I create from my writing.
It’s not just “books don’t work” in the sense of literal books. When I say “reading and writing”, I include static figures, videos, audio lectures, whiteboard drawings, and so on. The real project is cognitively convivial communications environments which proactively support the consumption, synthesis, and production of creative thought.
A sense of possibility
The point isn’t digital reading or digital knowledge work for its own sake. Like many of the personal computing pioneers, I see computers as a means to an end. Sensors to collect context and expressive input. Computation which can decide what to do with that context and input in real-time. Displays which can present dynamic information that proactively supports the knowledge work. Networks that ground the information in social and collaborative contexts.
I’ve seen what happens when we take those raw ingredients seriously. We get environments like modern digital audio workstations and CAD tools, which have utterly transformed the nature of creative work in those fields. My sense of possibility here comes from a parallel frustration: that almost no one competent has even tried to reproduce those successes for knowledge/scholarly work.
Many academics have tried, of course. But I also have a subtler belief—that success here involves inventive interface design work. Academics don’t really do that. I believe what’s needed is the imaginative design energy which created the iPhone and Facebook Paper, but directed towards enablement. I worked on Apple’s most inventive contribution to reading interfaces: the 3D “page curl” interaction when turning a page. It’s sad. Almost no one I met there read or cared about reading. But I did see designers creating Final Cut Pro and Logic, with tremendous imagination and passion. That’s what’s needed here. That’s what gives me such a strong sense of possibility.
The vision of the Primer
My quest is against the dead page. We can think of that as two targets: the words themselves—the content—and the page or desk they sit upon—the workspace. I’ve mostly talked about the latter: extra tools and layers which augment memory/learning, surface connections, orchestrate attention. That stuff sits on top of and between the content.
But I’m also deeply frustrated with the inert content itself. That’s one of my deep connections to the vision of the Primer. It’s a vision of situated communication, the “multimedia” we deserve. Words woven seamlessly into environments of use. Directly manipulatable responsive representations.
I don’t want to watch Grant Sanderson explain quaternions with a beautiful 3D visualization I can’t actually use myself. I don’t want to watch a video about EQing an audio track in Logic, constantly pausing and switching windows to compare the knobs in the picture to my own running copy. I don’t want to read a paper about a novel interface design, zoom into the tiny dead figures in the PDF, then try to line that explanation up with the live demo in my browser. I don’t want to look at a statistic in an ML paper and rummage through endless Python scripts to find the details of the evaluation which produced it.
I want universal deixis. I want to break down barriers between discussion about a dynamic representation and the thing itself. It’s “don’t separate by mode of production”, not just in terms of producing a static explanation, per Tufte, but in terms of the full creative process.
Part of this vision is that it needs to be much easier to create and even improvise dynamic representations. But others (including Bret) are working on that; it feels less centrally like my own personal quest.
Differences with pioneers
Relative to Bush in “As We May Think”. The memex is mostly a database. It’s primarily about instant access. Its main write-oriented verb is to create a coarse connection between documents. I think this is a good metaphor for a lot of what computing has done for the core elements of knowledge work and scholarly work: unlimited storage, instant access, a search feature. The machine can present whatever information you want. It’s up to you to decide what it should present, and to make something of it.
Relative to J. C. R. Licklider in “Man-computer symbiosis”. For him, computers are for calculating and processing information. We outsource costly computation to them but generate and explore the hypotheses ourselves. I get the image of a very elaborate desk calculator when I read him. I’m interested in a different kind of symbiosis. I want a communications media environment which provides active cognitive support, which is aware of how thinking, learning, sensemaking, and creation happen (both for humans in general, and for me in particular), and which presents affordances, adaptations, and information layers accordingly. It’s not a desk calculator; it’s more like magic paper which transforms your engagement with what’s inscribed upon it.
Relative to Alan Kay. His vision is of a new universal literacy—that we should all be able to read and write dynamic systems. I’m interested in that project, but my focus is less on fundamentally computational content (like simulations) and more on using computation to make communications media more active participants in understanding and creation. His instinct was to focus on children, so that the new literacy could shape their whole understanding of the world. My instinct is to focus on adult experts, because that’s where I see the most compelling levers. Also because trying to impact children means navigating corrosive condescension/paternalism (in oneself) and enormous entrenched sociocultural institutions.
The role of AI
My sense of possibility in this space was plenty strong before the current wave of AI. Most people interested in AI for knowledge work are focused on generating output, on automating parts of the creative process. The user plays a manager driving agents which do the substantive work. I’m happy to automate the mundane, but I locate the creative center within the user. I want to support the complex processes which happen inside the user’s mind—in understanding, sensemaking, ideating, expressing. This is more a model of symbiosis and extended cognition than outsourcing.
That said, I want my cognitive workspace to have awareness and (to some degree) agency—to provide active support as I understand, synthesize, and create. I want that support to be richly grounded in my context, in everything I’m reading and writing, in my own aims and activities. I think AI is likely to be an important instrumental ingredient in that.
For example, suppose I’m writing notes about a new idea. A forgotten paper buried deep in my “to-read” list contains some relevant background. It’s suddenly quite relevant to my interests in the moment, upgraded from “hm, maybe I should check this out someday.” I want my environment to notice that connection and to unobtrusively surface it to me. Most “reading lists” present the interface of an endless abstract to-do list. I want a “reading list” that deepens my orientation to what I care about in the moment, presenting entries not as a static database query but responsively to my context.
Another example: suppose I’ve written some flashcards about details I found particularly interesting. I want to engage in retrieval practice on the substance of those cards, not their form. But after a while, I pattern match to their form. AI can vary the surface language to avoid that—and, perhaps, can helpfully vary and deepen the substance over time as well.
Lots more relevant examples: multi-level summarization of texts, re-surfacing old notes which might have relevance to new work, marginalia which point out the parts of texts which might be most relevant to my work, automatically synthesized dynamic representations of static descriptions, etc etc…
The project, overall
Hard to summarize. Transcending paper, ink, written language, static figures and videos. An environment tailored for human cognition, which actively augments reading–writing–thinking. (Like a full-body exoskeleton for the mind?) “Paper” that works on your behalf to support understanding and creation.
---
Thanks to Ozzie Kirkby, Geoffrey Litt, Michael Nielsen, and Sara LaHue for helpful conversations as I've pulled on these threads.
2025-05-31 04:02:32 +0000 UTC
Inspired by last month's letter, I recently wrote this grant proposal for a new project to begin late this year. I decided not to submit it after all, at least in this form, but I thought you all might find it stimulating, despite its brevity. (The grant had a very tight length limit!)
---
If you watch over the shoulder of an experienced programmer, you’ll see an environment exquisitely tuned for an expert doing serious work. Dense overlays surface rich information for understanding the system. Sharp tools allow the programmer to rapidly manipulate their material. This environment can fluidly shift with the needs of the work. And if you watch a different engineer, you’ll see different interfaces, reflecting different needs.
Now, watch a scientist reading papers or books on their computer. You’ve moved from a dynamic airplane cockpit to Fisher-Price. Digital reading environments are impoverished, both in comparison to their physical equivalents, and in terms of what computers make possible. These systems give us a dictionary, search, and perhaps some clunky annotation tools. Serious readers deserve serious reading tools.
People have made unusual digital reading tools before. One system makes it easy to follow citations. Another adds LLM chat. Yet another lets readers manipulate clippings on a spatial canvas. But these projects are siloed, incompatible. Each must reimplement a complete environment before it can add its special tool. Readers can’t use one system’s citation tool and another’s annotation tool. The work can’t accumulate.
And so I propose a malleable reading substrate—one which can be flexibly adapted and extended like a programmer’s tools, or as writers have done with Emacs for decades. Unlike those systems, which require tedious programming to extend, this substrate will be designed with LLMs in mind, so that idiosyncratic workflows can be improvised on the fly.
This project has two interlocking workstreams: creating the first malleable digital reading environment, and using that environment as a lab to invent and develop novel interfaces for scholarly reading. The latter lends itself to parallelism, to a small team of designers exploring in a focused studio environment.
A small sample of the reading interactions I’d like to explore:
Programmable lenses and highlighters. e.g. Surface contradicting evidence from your library in the margins. Shade citations which do not support the claim being made. Highlight a key term in purple to add its definition to your spaced repetition system.
Multi-device workflows. e.g. Do a quick pass over several papers using a stylus and tablet. Then your desktop simultaneously displays all the passages you marked, so you can draw connections across documents.
Social connections. e.g. Discussions about this paper from your team’s chat system appear in the paper’s margins, and vice versa.
Multiscale representations. e.g. The scroll bar displays visual information about where you’ve spent time, the density of your markup, etc. Seeing the structure of the whole text while viewing a part.
Deep integration with writing tools. Reading, writing, and thinking are inseparable. Your reading tool should be bidirectionally connected to your writing tool.
---
Thanks to Ozzie Kirkby for helpful comments on an earlier draft.
2025-04-30 21:07:02 +0000 UTC
One of the great dreams of the personal computing revolution was that we’d all have not only our own computers, but also our own software, exquisitely tuned to our own purposes. Instead, we got the application model. Software is expensive to build, so developers try to capture as large a market as possible, creating one-size-fits-all packages. You can only change what the developer gives you knobs to change; you can only combine apps at junctions the developer has provided. In short, our workflows often don’t work the way we wish they did.
People have been trying for decades to break us free of the siloed application model (for a review, I recommend Marcel Borowski’s doctoral thesis, p31-44). One approach has been to let people make their own computational tools, without any complicated programming. That has turned out to be difficult. Digital spreadsheets’ computed cells are usually cited as the major success story, but it’s hard to find others.
Now we have large language models (LLMs). Everyone can build their own software, right? Well—maybe, eventually. In spring 2025, my impression is that most software built end-to-end by LLMs is either a shiny demo or a low-stakes component at the periphery of the real work. Filmmakers aren’t using LLMs to personalize their editing workflow. Musicians aren’t using LLMs to personalize their composition workflow.
I want to see the personal dynamic media dream fulfilled for experts’ central work in their domains. Part of the trouble here seems to be that those activities often take place in extremely complex “mega-apps”. If people want to customize a few design details for their personal workflow, they usually need to recreate all that complexity. Neither today’s LLMs nor their typical users are prepared to rewrite Photoshop on the fly to fit their personal needs.
So: malleable software needs a more malleable substrate than today’s zero-to-one “text-to-app” workflow. Researchers working on this problem know that, and they’ve proposed many principled architectures for flexibly extending software behavior. Unfortunately, none of these approaches is as versatile and accessible as the “text-to-app” workflow. Can we find some happy middle—enough structure to permit adaptation without rewriting well-honed functionality from scratch, but with enough versatility and accessibility for typical users in serious, real-world contexts?
Some surprising experiments
A startling question occurred to me a few weeks ago: can it be possible that Obsidian is the only deeply extensible WYSIWYG word processor that has ever reached production? By “deeply extensible”, I don’t just mean theming, macros, or even custom sidebars (e.g. Google Apps Script’s). I mean that third parties can alter the interface of the text editor itself—for instance, with custom interactive elements interleaved into the document, custom behavior in response to interactive input, or custom interpretation of the data. Now, we’ve had deeply extensible plaintext editors (like Emacs) for decades. OpenDoc and OLE aspired to some part of this flexibility, but were abandoned before maturity. As far as my limited awareness goes, we got our first deeply extensible WYSIWYG word processor in 2022.
Deep extensibility doesn’t automatically translate into the kind of personal adaptation we might want. Obsidian plugins require programming. Extensions which modify the interaction design of the WYSIWYG text editor itself require surprisingly complex programming. So, in practice, a typical Obsidian user must rely on someone else to create the functionality they want. The standard pressures of software development still exist: because plugins are costly to make, the focus is on plugins which can serve a wide audience; because these coarse-grained plugins become somewhat one-size-fits-all, they often become bloated with settings and complexity.
Yet now: LLMs. As part of my project with Taylor Rogalski, we’ve been building a very complicated Obsidian plugin—basically using Obsidian as a prototyping substrate for a speculative tool for thought. I began with a traditional software engineering approach, augmented by line-level LLM suggestions. Meanwhile, Taylor enthusiastically experimented with having LLM agents build features for our plugin, end-to-end. There are limitations there, of course, but I was surprised by how much they can handle.
Inspired by Taylor’s experiences, I started asking LLM agents to produce simpler Obsidian plugins end-to-end, with no expert intervention on my part. I just cloned a plugin sample project, installed it into a test Obsidian vault, and let the agent do the rest.
My first attempt was actually to fulfill a workflow idea that Taylor had jotted down. His design was based on The Email Game, a gamified and timed two-pass inbox triage. Taylor wanted a sidebar listing all incomplete to-dos in the current file. Once he selects a few to-dos and clicks “Start”, the interface should show one task at a time, with a 2-minute timer and a text box for jotting notes. When the timer’s up, or when a “Done” button is pressed, the view should advance to the next task and add his text notes as a bullet in the document, beneath the task just completed. Cursor’s Agent (with claude-sonnet-3.7-thinking) built that on the first try, without any extra input on my part. I was honestly quite stunned. Taylor’s desired workflow is idiosyncratic enough that no general plugin exists for it, but the agent could make it in just a few minutes. And, critically, this isn’t a separate app; it’s integrated into an environment where Taylor does serious creative work.
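The core of that triage workflow is a pair of text operations: collect the unchecked to-dos, then write a note beneath the one just finished. Here is a minimal sketch of those two steps, assuming Markdown task syntax ("- [ ]") and hypothetical function names. It is an illustration only, not the code the agent generated, and it leaves out the sidebar and the timer.

```typescript
// Hypothetical helper types and names; not taken from the generated plugin.
interface TodoItem {
  line: number; // 0-based line index in the document
  text: string; // task text after the checkbox
}

// Collect unchecked Markdown tasks such as "- [ ] reply to Sam".
function findIncompleteTodos(doc: string): TodoItem[] {
  const todos: TodoItem[] = [];
  const lines = doc.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = /^\s*[-*] \[ \] (.+)$/.exec(lines[i] || "");
    if (m) todos.push({ line: i, text: m[1] || "" });
  }
  return todos;
}

// Insert the jotted note as a nested bullet directly beneath the task.
function addNoteBelow(doc: string, todo: TodoItem, note: string): string {
  const lines = doc.split("\n");
  const target = lines[todo.line] || "";
  // Reuse the task's own leading whitespace so the bullet nests under it.
  const indent = target.slice(0, target.length - target.replace(/^\s+/, "").length);
  lines.splice(todo.line + 1, 0, indent + "  - " + note);
  return lines.join("\n");
}
```

Line indices go stale once a note is inserted, so a real implementation would need to rescan the document (or adjust indices) between tasks.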
Next, I asked for a plugin which continuously tallies up all the new words I write, records them in a CSV (updating the current day’s entry as necessary), and shows an unobtrusive celebratory banner every time I add another 250 words. One shot—done. No manual edits or iteration needed.
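The bookkeeping behind that request is small. This sketch covers the three pieces the description implies: counting words, updating the current day's CSV row, and detecting when another 250-word boundary is crossed. All names and the CSV layout (a "date,words" header) are assumptions made for illustration; the generated plugin may do this differently.

```typescript
// Hypothetical data shape: one row per day.
interface DailyCount {
  date: string; // ISO date, e.g. "2025-03-31"
  words: number;
}

// Count whitespace-separated words in a piece of text.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Add newly written words to the given day's row, creating the row if needed.
// Returns a new array and leaves the input untouched.
function recordWords(rows: DailyCount[], date: string, added: number): DailyCount[] {
  const updated = rows.map((row) => ({ date: row.date, words: row.words }));
  for (let i = 0; i < updated.length; i++) {
    const row = updated[i];
    if (row && row.date === date) {
      row.words += added;
      return updated;
    }
  }
  updated.push({ date: date, words: added });
  return updated;
}

// Serialize rows as CSV with a header line.
function toCsv(rows: DailyCount[]): string {
  const lines = rows.map((row) => row.date + "," + row.words);
  return ["date,words"].concat(lines).join("\n");
}

// True when the running total has crossed into a new multiple of `step`.
function crossedMilestone(before: number, after: number, step: number): boolean {
  return Math.floor(after / step) > Math.floor(before / step);
}
```

A plugin would call crossedMilestone with the totals before and after each edit and show the banner when it returns true.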
These examples begin to show how far it’s possible to tailor my writing environment without any programming. But I’m interested in malleability at the level of the writing surface itself. You could implement my examples in Emacs, or with a graphical sidebar via Google Apps Script. What’s so interesting about Obsidian is that the editor itself can be modified to include custom behaviors and interface elements. The trouble is that those plugins are much more complicated to build. I’m an experienced engineer, and I find them very difficult to write correctly by hand.
Potluck was one admirable attempt to let users add their own interactive behavior to text editors, without traditional programming. It introduced a shared representation for the desired behavior, riffing on spreadsheet cells:

Of course, since we’re still stuck in the app-centric world, this solution required the authors to build their own custom text editor. You can’t bring Potluck’s behavior to your own preferred editor. You can’t bring your preferred editor’s behavior to Potluck. And you’re constrained by the primitives they provide: a few basics like buttons, sliders, timers, etc.
The Potluck report includes many interesting examples of custom interactive behavior. So I decided to test the AI-generated Obsidian plugin workflow by replicating the first four featured examples. To my great surprise, Claude Code successfully implemented all four examples with zero or minor feedback, and with no expert intervention (see more details in Appendix 1).
Plant tracker lets you track when your plants need to be watered. It colors red any plants which are due for watering. When you’ve watered a plant, you can click a button to update the date in the document.

Agenda lets you plan in terms of agenda item durations, then see how those turn into actual times once you set a start time. Note that the blue text is just an interface decoration—your durations are revealed when you move your cursor into them.

Cash register lets you specify items for sale as emojis with prices, then “ring up” transactions and show a total. Again, this blue text is a non-editable dynamic interface decoration.

Timekeeping lets a freelancer track their different projects as separate headings in a text document. It adds buttons to the editor for starting and stopping billable work. Again, this colored text is a non-editable dynamic interface decoration.

Inspired by these successes, I attempted two more difficult plugins, neither of which Claude Code could implement:
Footnotes as marginalia: Display the contents of Markdown footnotes in the margins next to their reference points, adjusting the vertical positions to avoid collisions as necessary.
Claude Code couldn’t get close on the layout. I think that if it had a tool which let it interactively query the DOM and rendered output, it would have been able to do this. That intuition is based on watching it build the project, look at the build errors, and make appropriate changes to fix those errors.
Book covers: If I add the “book” tag to a note, find its cover (using the filename as the query) and display it as a “cover image” at the top of the editor when the note is opened.
I deliberately didn’t specify an API to use for finding those images. Claude Code stumbled here a bit until I insisted that it just pick one. But then it didn’t know the correct structure of the response data. I think that it could have succeeded if it had been permitted to search the web for suitable APIs and their documentation.
The agent’s solution also struggled to interpret the varieties of filenames I might use to specify a book: e.g. with or without an author, with or without a dash to separate the author from the title, maybe using an abbreviation for the author’s first name, etc. I think it could have fixed this with some iteration if I gave it a bunch of examples, or if one could find a “fuzzy” book search API.
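For the marginalia task above, the collision-avoidance step on its own is a short greedy pass, sketched here under stated assumptions. It takes each note's desired vertical position and height as given inputs. In a real plugin those would have to be measured from the rendered editor, which is the information Claude Code lacked. The names and the pixel units are hypothetical.

```typescript
// Hypothetical input: where each footnote reference sits, and how tall its note is.
interface MarginNote {
  id: string;
  anchorTop: number; // vertical position of the reference point
  height: number;    // rendered height of the note
}

interface PlacedNote {
  id: string;
  top: number; // final vertical position in the margin
}

// Place each note beside its reference, pushing it down just enough
// to keep `gap` units of space below the previous note.
function layoutNotes(notes: MarginNote[], gap: number): PlacedNote[] {
  const sorted = notes.slice().sort((a, b) => a.anchorTop - b.anchorTop);
  const placed: PlacedNote[] = [];
  let nextFree = -Infinity; // first position not occupied by an earlier note
  for (let i = 0; i < sorted.length; i++) {
    const note = sorted[i];
    if (!note) continue;
    const placedTop = Math.max(note.anchorTop, nextFree);
    placed.push({ id: note.id, top: placedTop });
    nextFree = placedTop + note.height + gap;
  }
  return placed;
}
```

This only ever pushes notes downward, so a dense cluster of footnotes can drift far from its references. Balancing notes above and below is a harder layout problem that this sketch does not attempt.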
Some lessons from Obsidian
I’m not saying that if we rub some AI agents on Obsidian, we’ve suddenly solved this longstanding problem. Rather, I think it’s worth examining what makes Obsidian such a suitable substrate for this kind of idiosyncratic personalization. I hope that can inform future work that’s more explicitly focused on complex surfaces (like Obsidian’s editor) central to serious work.
The most decisive architectural decisions here weren’t made by the Obsidian team. Its extensible editor is an open-source library called CodeMirror; Obsidian exposes that editor for plugins to modify. CodeMirror has an unusually complex—and unusually thoughtful—programming interface for modifying its behavior. Its author, Marijn Haverbeke, has refined that interface over many years of working with developers trying to use it in unusual ways. It’s very difficult to design a system like CodeMirror. You have to find a way to express the behavior of your system in an extremely general and composable way, so that extensions can modify that behavior consistently and without interfering with each other. But now that CodeMirror’s programming interface has been refined, I can absolutely imagine translating its core concepts to, say, a Figma-like canvas or an Excel-like spreadsheet.
Malleable software researcher Clemens Klokmose proposed this list of six properties we might like in our truly personal dynamic media:
malleable: users can adapt tools to fit idiosyncratic preferences and uses
shareable: multiple users can work on the same data, using their own views and tools
distributed: the data and tools can be used on different kinds of devices, in different situations (desktop, tablet, phone)
composable: the user’s preferred tools can be combined into workflows well adapted to their activity
interoperable: different tools can interact with the same data, so that the user can always use the “right tool for the job”
computational: the system behaves and responds dynamically
Obsidian provides key infrastructure which allows it to perform surprisingly well as this kind of substrate:
malleable: An extensive plugin programming interface, combined with Claude Code, provides rich malleability.
shareable: Obsidian has built-in support for collaborative editing, including plugin sharing. Plugins sync, but each user can choose which ones they enable. It’s monolithic, though: custom behavior (e.g. the plant tracker) is attached to the “vault”; there’s no way to “attach” it to the plant tracking document and share just that document. The collaborative editing support is pretty simple—it doesn’t respect any semantics of rich syntax or plugin syntax. But it’s good enough for a lot.
distributed: Obsidian runs on tablets and phones. The programming interface includes ways to customize behavior for those devices, and it applies reasonable defaults. Plugins run on mobile devices without any extra intervention. I was startled when the first plugin I built “just worked” on my phone: we don’t usually see rich user-defined behavior in apps on our locked-down phones.
composable: Obsidian’s interface language of tabs, panes, sidebars, command strips, and so on provides a lot of latitude for composability. But there’s still a significant boundary between Obsidian and the other applications (Figma, Slack, etc) you might use in a single activity. Its support for non-text documents is limited.
interoperable: Obsidian vaults are just folders of Markdown files. A huge ecosystem of tools supports that format.
computational: Obsidian plugins can dynamically behave and respond in complex ways, as the examples I’ve given illustrate.
Of course, I want to be able to work this way not just with my text documents, but with my design work, and with my sheet music, and with my books, and so on. I think the interesting thought experiment here is to imagine if one’s whole operating system behaved more like Obsidian in this way. Actually, like its spiritual predecessor Emacs, Obsidian already does many operating-system-like things. Note that it also provides for all plugins:
a shared data layer to read from and write into
syncing across devices
a custom metadata layer for documents (Markdown “frontmatter”)
a themeable style language (semantic colors, type, spacing, etc)
standard interface concepts and implementations (modals, tooltips, popovers, sidebars, notifications, the command palette, keyboard shortcuts, settings)
a system for configuring, updating, and sharing plugins
Important missing conceptual pieces
Potluck includes quite a lot of beautiful conceptual design work to make it plausible for non-expert users to create the examples I showed above. I replicated its featured examples without any of that machinery—but that doesn’t mean all that design work is unnecessary. On the contrary, I believe the biggest problem with the agent-driven route to malleability is the lack of a shared representation of system behavior.
When I was demonstrating the plant tracker to my wife, I created a sample entry which read “every 2 weeks, last watered on 03/01/2025”. The plugin didn’t work. I knew I must have made a mistake in the formatting, but I wasn’t sure where. It turned out that the plugin only supported “every N days”—not “weeks”. I figured that out by copying and pasting the original Potluck example document into my editor and poking around. By contrast, if I’m confused about the behavior of a document in Potluck, I can view this representation of the system’s behavior:

This interface makes the situation much clearer: the plugin looks for “every {number} days”. I can see right away that it won’t support “every 1 day” or “every day”. In the agent-based workflow, if I want to support weeks, I could ask the agent to add that, but I could only understand whether it’s working (and whether the “days” format still works after that change) by trying it out in the editor. It would be better if, as Potluck co-author Geoffrey Litt has suggested, as you asked the model to make that change, you could see the impact directly on a shared representation of the system’s behavior.
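One way to make the idea of a shared representation concrete: keep the pattern as a readable template, and derive the matcher from that same template, so what the user sees is exactly what the system runs. This is a minimal sketch of that idea only. The "{number}" placeholder syntax is borrowed from the description above, but the function names and the compilation approach are assumptions, not Potluck's actual mechanism.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a readable template such as "every {number} days" into a RegExp.
function compileTemplate(template: string): RegExp {
  const parts = template.split("{number}").map(escapeRegExp);
  return new RegExp(parts.join("(\\d+)"));
}

// Return the numbers captured by the template, or null if the text doesn't match.
function matchNumbers(template: string, text: string): number[] | null {
  const m = compileTemplate(template).exec(text);
  if (!m) return null;
  const found: number[] = [];
  for (let i = 1; i < m.length; i++) {
    found.push(parseInt(m[i] || "", 10));
  }
  return found;
}
```

With the template "every {number} days", the text "every 5 days" matches, while "every 2 weeks", "every 1 day", and "every day" do not. Those are the limits a reader could predict from the template itself.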
I don’t think this is a nice-to-have. I think that our understanding bounds the complexity of systems we can create—even if an LLM is doing the programming. If we can’t really understand a system’s behavior, except through trial and error every time a change is made, problems and confusions will pile up. It will become more and more difficult to change the system predictably.
In a recent talk at Stanford, Haijun Xia made a similar argument. He proposes that AI-augmented malleable systems should have user-modifiable representations of both the activity (e.g. freelance time tracking with separate projects and totals) and also the interface (e.g. the start/stop buttons, the styling on the computed hours display).
In the talk, Haijun demonstrates a project called Jelly led by Yining (Rima) Cao, one of the students in his lab. Jelly shows the user an explicit shared representation of the activity, visually indicates how it maps onto a generated interface, and highlights changes to that specification over time.
A key challenge here is that if we want our custom interfaces to involve behavior, then in the limit, this specification must be Turing-complete. I can ask for a button that does anything. How can we clearly represent arbitrary behavior to the user? Aren’t programming languages created to do exactly that? If so, are we back to solving the problem of helping non-programmers understand how to program?
Perhaps one meaningful difference is in the precision that these users need. Programming languages are designed to let engineers specify exactly what should happen. “Higher-level” languages leave more details to the computer; “lower-level” languages take control of more details. Perhaps LLMs can enable an even higher-level language which specifies “just the important details”—an admittedly undecidable category—in English. A user would give a short description of what they want; the model would synthesize this more detailed shared English representation; then it would generate actual code which implements that intermediate representation. Conceivably, a user could read, understand, and modify the details of that shared representation. (See Appendix 2 for an example of that for one of the Potluck samples.)
But such a representation is still abstract. For the Agenda example, it might include “Identifies lines formatted as "[duration] minutes [description]" (e.g., "30 minutes Introduction")”. But suppose that you wrote “30-45 minutes for questions”, and you were confused about why that agenda item didn’t get included properly. I think the ideal here is that you could directly manipulate specific instances of system behavior to contextualize changes to the system. For example: selecting that text in your document, and asking “Why doesn’t this show up as an agenda item?” or “Make time ranges work too.”
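To see why “30-45 minutes for questions” would silently drop out, it helps to write the quoted rule down literally. The sketch below implements "[duration] minutes [description]" as a pattern and lays matching items end to end from a start time. It is an assumed reading of the rule with hypothetical names, not the generated plugin's code.

```typescript
interface AgendaItem {
  minutes: number;
  description: string;
}

// Matches "[duration] minutes [description]", e.g. "30 minutes Introduction".
function parseAgendaItem(line: string): AgendaItem | null {
  const m = /^(\d+) minutes (.+)$/.exec(line.trim());
  if (!m) return null;
  return { minutes: parseInt(m[1] || "", 10), description: m[2] || "" };
}

// Format minutes since midnight as a 12-hour clock time.
function formatClock(totalMinutes: number): string {
  const h24 = Math.floor(totalMinutes / 60) % 24;
  const mins = totalMinutes % 60;
  const h12 = h24 % 12 === 0 ? 12 : h24 % 12;
  return h12 + ":" + (mins < 10 ? "0" : "") + mins + " " + (h24 < 12 ? "AM" : "PM");
}

// Lay agenda items end to end from a 24-hour start time such as "19:00".
// Lines that don't match the pattern are skipped without any warning.
function buildSchedule(start: string, lines: string[]): string[] {
  const parts = start.split(":");
  let cursor = parseInt(parts[0] || "0", 10) * 60 + parseInt(parts[1] || "0", 10);
  const rendered: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const item = parseAgendaItem(lines[i] || "");
    if (!item) continue;
    const end = cursor + item.minutes;
    rendered.push(formatClock(cursor) + " - " + formatClock(end) + " " + item.description);
    cursor = end;
  }
  return rendered;
}
```

Given a 19:00 start and the lines "30 minutes Introduction", "15 minutes Snacks", and "30-45 minutes for questions", only the first two produce output. The third fails the pattern at the hyphen, and nothing in the rendered result says so.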
My instinct here is that if one looks back through the decades of work in end-user programming and reconsiders those ideas with an eye to the new capabilities of LLMs, one would find many interesting possibilities to try.
————————
Thanks most importantly to Taylor Rogalski for enthusiastically trying to make LLMs write Obsidian plugins. I’m so engineer-brained that I doubt I’d have seen this letter’s central insight without his example.
Thanks to Geoffrey Litt for influencing much of how I think about malleability and LLMs, and for his paean to browser extensions, which I’m sure shaped my thinking here quite a lot. Thanks to Haijun Xia and Adam Wiggins for conversations which helpfully prodded my thinking on this topic. Thanks to Marijn Haverbeke for his remarkable work on CodeMirror’s API design, and to the Obsidian team for building a uniquely flexible substrate around it.
————————
Appendix 1: Potluck example replication details
Cursor Agent failed to implement my first two tests, so I switched to Claude Code, which did very well. They’re both using sonnet-3.7-thinking, so it’s possible that this is just a matter of luck. I haven’t tested that thoroughly.
Some minor details and caveats on the replications:
Plant tracker worked perfectly in one shot.
Agenda needed to be told to fix the build errors, to be told that a particular subfeature wasn’t working, and to be given some console logs to help it fix that subfeature. Future models will surely see build errors and console error logs, so the only real user intervention would be saying “this bit doesn’t work”. More speculatively, UI automation could solve that too.
Cash register needed to be told that a subfeature wasn’t working (but no technical detail).
Timekeeping needed to be told about a bug (but no technical detail).
I should also note that Claude’s implementation isn’t styled nearly so well as the original authors’!
I've published the source code for the replications here.
For those interested, I’ve reproduced below the Claude Code prompts which produced the Potluck replica interfaces. I’ve also included all of the “bug reports” and feedback I gave. Some of these prompts include a fair amount of detailed guidance. I haven’t tried reducing them to see if that guidance was necessary. It would be interesting to find the minimal guidance needed.
Plant tracker
Implement an Obsidian plugin called "Plant tracker".
* It will use a CodeMirror editor extension.
* If a line in the editor begins with the 🌱 emoji, then search that line for a string like "every 5
days, last watered on 10/31/2022". Notice the interval ("every 5 days") and the date
("10/31/2022").
* Underline the "every N day(s)" string and the date.
* If the date is recent enough (i.e. it's fewer than N days ago), color it green. If it's too old,
color it red.
* Insert a small inline button widget after the date. The button's text is the 🚿 emoji. When I
click the button, it changes the date to be today's date.
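The logic this prompt asks for, separated from any editor integration, might look like the following sketch. It assumes US-style MM/DD/YYYY dates, as in the prompt's example, and accepts both "day" and "days". Every name is hypothetical, and this is not the code Claude Code produced.

```typescript
interface PlantSchedule {
  intervalDays: number;
  lastWatered: Date;
}

// Matches e.g. "every 5 days, last watered on 10/31/2022".
const PLANT_PATTERN = /every (\d+) days?, last watered on (\d{1,2})\/(\d{1,2})\/(\d{4})/;

// Parse a line that begins with the plant emoji; return null for anything else.
function parsePlantLine(line: string): PlantSchedule | null {
  if (line.indexOf("🌱") !== 0) return null;
  const m = PLANT_PATTERN.exec(line);
  if (!m) return null;
  const month = parseInt(m[2] || "", 10);
  const day = parseInt(m[3] || "", 10);
  const year = parseInt(m[4] || "", 10);
  return {
    intervalDays: parseInt(m[1] || "", 10),
    lastWatered: new Date(year, month - 1, day),
  };
}

// Whole calendar days between two local dates, ignoring time of day.
function daysBetween(from: Date, to: Date): number {
  const a = Date.UTC(from.getFullYear(), from.getMonth(), from.getDate());
  const b = Date.UTC(to.getFullYear(), to.getMonth(), to.getDate());
  return Math.round((b - a) / 86400000);
}

// "green" if watered fewer than N days ago, otherwise "red".
function plantColor(schedule: PlantSchedule, today: Date): "green" | "red" {
  return daysBetween(schedule.lastWatered, today) < schedule.intervalDays ? "green" : "red";
}

// The date string the 🚿 button would write back into the document.
function formatUsDate(d: Date): string {
  const pad = (n: number) => (n < 10 ? "0" + n : "" + n);
  return pad(d.getMonth() + 1) + "/" + pad(d.getDate()) + "/" + d.getFullYear();
}
```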
Agenda
> Implement an Obsidian plugin called "Agenda".
* It will use a CodeMirror editor extension.
* Look for a line which contains a time, like "19:00". We'll call this the start time. Underline the
time in blue.
* Look for a line which begins with a duration, like "30 minutes Introduction". We'll call this an
agenda item. Without altering the document, visually replace that string with the times it would
take place if the meeting started at the start time.
* Write that replacement in blue text.
* So, for a start time of 19:00, if the first item is "30 minutes Introduction", it would display
as "7:00 PM - 7:30 PM Introduction". If the next item is "15 minutes Snacks", it would display as
"7:45 PM - 8:00 PM Snacks". Note that the times accumulate.
* If the user's cursor moves into an agenda item duration, display the original duration as an
underlined string--so the user can edit it directly--and move the timing preview which replaced it
into a floating tooltip above the duration.
---
app.js:1 Plugin failure: obsidian-agenda TypeError: Cannot read properties of undefined (reading 'decorations')
at Object.provide (plugin:obsidian-agenda:216:39)
at e.define (app.js:1:277778)
at anonymous (plugin:obsidian-agenda:198:53)
at e.<anonymous> (app.js:1:2267896)
at app.js:1:239667
at Object.next (app.js:1:239772)
at a (app.js:1:238490)
---
Try to build; fix the errors.
---
> OK. The start time and time range previews work. But when I try to move
the cursor into a duration, I get this console error:
Uncaught Error: Ranges must be added sorted by `from` position and
`startSide`
at e.addInner (app.js:1:300017)
at e.add (app.js:1:299791)
at createDecorations (plugin:obsidian-agenda:182:15)
at e.update [as updateF] (plugin:obsidian-agenda:206:14)
at Object.update (app.js:1:278099)
at e.computeSlot (app.js:1:288949)
at We (app.js:1:282043)
at new e (app.js:1:287918)
at e.applyTransaction (app.js:1:288904)
at e.get (app.js:1:284047)
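The second console error above reflects an ordering requirement: CodeMirror expects ranges to be added in ascending order of their `from` position, with `startSide` breaking ties. One plausible fix is to collect the ranges first and sort them before adding. That is an assumption, since this letter doesn't record what Claude Code actually changed. The sketch uses plain objects with a hypothetical shape in place of real CodeMirror decorations.

```typescript
// Plain objects standing in for editor decorations (hypothetical shape).
interface PendingRange {
  from: number;      // start offset in the document
  to: number;        // end offset
  startSide: number; // tie-breaker for ranges that start at the same offset
  label: string;
}

// Order ranges by `from`, then by `startSide`. Returns a sorted copy.
function sortRanges(ranges: PendingRange[]): PendingRange[] {
  return ranges.slice().sort((a, b) => a.from - b.from || a.startSide - b.startSide);
}

// True if the ranges are already in an order that satisfies the requirement.
function isSorted(ranges: PendingRange[]): boolean {
  for (let i = 1; i < ranges.length; i++) {
    const prev = ranges[i - 1];
    const cur = ranges[i];
    if (!prev || !cur) continue;
    if (cur.from < prev.from) return false;
    if (cur.from === prev.from && cur.startSide < prev.startSide) return false;
  }
  return true;
}
```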
Cash register
Implement an Obsidian plugin called "Cash register".
* It will use a CodeMirror editor extension.
* Look for strings like "<emoji> = $<price>", e.g. "🍰 = $4". These strings specify the prices of
items. Underline these strings and track the mapping between emoji and dollar values.
* Look for lines consisting of only one or more emoji in sequence. These lines represent sales.
Without modifying the document, display the total dollar value of all the emoji on the line in blue
text as a decoration at the end of the line.
* Look for a line which says "Total: ". Without modifying the document, display the total dollar
value of all sales lines in blue text as a decoration after that string.
---
Great! The price and sales lines seem to work, but the "Total: " line doesn't show a decoration. My test document:
🌱 = $3
🍰 = $4
🌱🍰🍰
Total:
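For reference, here is a sketch of the arithmetic this prompt describes, run against the test document above. It assumes price lines have exactly the form "<emoji> = $<price>" and treats any non-space token before the equals sign as the item symbol, a simplification. The names are hypothetical and this is not the generated plugin's code.

```typescript
type PriceTable = { [symbol: string]: number };

// Matches price definitions such as "🍰 = $4" or "🌱 = $3.50".
const PRICE_LINE = /^(\S+) = \$(\d+(?:\.\d+)?)$/;

// Build the symbol-to-price mapping from all price lines.
function readPrices(lines: string[]): PriceTable {
  const prices: PriceTable = {};
  for (let i = 0; i < lines.length; i++) {
    const m = PRICE_LINE.exec((lines[i] || "").trim());
    if (m && m[1] !== undefined && m[2] !== undefined) {
      prices[m[1]] = parseFloat(m[2]);
    }
  }
  return prices;
}

// Total for a line made up only of known symbols; null if anything else appears.
function saleTotal(line: string, prices: PriceTable): number | null {
  const text = line.trim();
  if (text === "") return null;
  const keys = Object.keys(prices);
  let total = 0;
  let i = 0;
  while (i < text.length) {
    let matched = false;
    for (let k = 0; k < keys.length; k++) {
      const key = keys[k];
      if (key === undefined) continue;
      // Compare by substring so multi-code-unit emoji are handled whole.
      if (text.slice(i, i + key.length) === key) {
        total += prices[key] || 0;
        i += key.length;
        matched = true;
        break;
      }
    }
    if (!matched) return null;
  }
  return total;
}

// Per-sale totals plus the grand total shown after "Total: ".
function registerTotals(doc: string): { sales: number[]; total: number } {
  const lines = doc.split("\n");
  const prices = readPrices(lines);
  const sales: number[] = [];
  for (let i = 0; i < lines.length; i++) {
    const t = saleTotal(lines[i] || "", prices);
    if (t !== null) sales.push(t);
  }
  return { sales: sales, total: sales.reduce((sum, n) => sum + n, 0) };
}
```

For the four-line test document, this yields one sale of 11 and a grand total of 11.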
Timekeeping
> Implement an Obsidian plugin called "Timekeeping".
* It will use a CodeMirror editor extension.
* It's an interactive timekeeping tool for plaintext documents.
* Each H1 heading in a document represents a separate project.
* Without modifying the document, add a "start" button widget to the end
of each H1 heading line.
* When the start button is pressed, it inserts a line immediately below
the heading like "13:11 -" (where 13:11 is the current time).
* Anytime an open time range like that ("HH:MM -") appears, display the
current time in blue text as a decoration after the hyphen, without
modifying the document. Display a "stop" button after that. When the
stop button is pressed, the current time is inserted with a space after
the hyphen, so that it will read e.g. "13:11 - 13:30".
* Anytime a closed time range ("HH:MM - HH:MM") appears, display the
total number of hours in the time range (rounded to at most two decimal
places) in blue text as a decoration after the time range, without
modifying the document.
* Anytime the string "Total:" appears, display the total amount of time
in the closed time ranges between that position and the previous H1
heading--i.e. the total time worked on that project.
* When you think you're done, build the plugin and fix any errors.
---
> Looking good. Just one bug: if I have two projects, and I click the
first project's "start" button, the new entry is inserted below the
second project's heading.
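The computations in this prompt can be sketched without any editor code. The version below sums closed ranges under each H1 heading and rounds to two decimals. It also shows the insertion step from the bug report, keyed to the line of the heading whose button was clicked. All names are hypothetical, ranges that cross midnight are ignored, and this is not the code Claude Code produced.

```typescript
// Minutes in a closed range like "13:11 - 13:30"; null for anything else.
function rangeMinutes(line: string): number | null {
  const m = /^(\d{1,2}):(\d{2}) - (\d{1,2}):(\d{2})$/.exec(line.trim());
  if (!m) return null;
  const start = parseInt(m[1] || "", 10) * 60 + parseInt(m[2] || "", 10);
  const end = parseInt(m[3] || "", 10) * 60 + parseInt(m[4] || "", 10);
  return end >= start ? end - start : null; // crossing midnight is out of scope here
}

// Convert minutes to hours, rounded to at most two decimal places.
function hoursFromMinutes(minutes: number): number {
  return Math.round((minutes / 60) * 100) / 100;
}

// Total hours of closed ranges under each H1 heading.
function projectHours(doc: string): { [project: string]: number } {
  const minutesByProject: { [project: string]: number } = {};
  let current = "";
  const lines = doc.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] || "";
    const heading = /^# (.+)$/.exec(line);
    if (heading) {
      current = (heading[1] || "").trim();
      minutesByProject[current] = 0;
      continue;
    }
    const minutes = rangeMinutes(line);
    if (minutes !== null && current !== "") {
      minutesByProject[current] = (minutesByProject[current] || 0) + minutes;
    }
  }
  const hours: { [project: string]: number } = {};
  Object.keys(minutesByProject).forEach((project) => {
    hours[project] = hoursFromMinutes(minutesByProject[project] || 0);
  });
  return hours;
}

// Insert an open range directly below the heading on `headingLine` (0-based).
function startTimer(doc: string, headingLine: number, now: string): string {
  const lines = doc.split("\n");
  lines.splice(headingLine + 1, 0, now + " -");
  return lines.join("\n");
}
```

Passing the clicked heading's own line index to startTimer is what keeps the new entry under the right project.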
Appendix 2: Intermediate-level English specification example
For the Agenda example, I asked Claude Code: “Think carefully to produce a detailed outliner-style explanation of this system's behavior. It should specify the behavior clearly enough that a user can understand its functionality and interaction design in detail. But assume the user is non-technical, a non-programmer, so don't surface implementation details. Keep it at the level of conceptual and interaction design.”
I’ve included its result below. It’s perhaps a decent start, in that it seems complete. But it’s quite a lot of fairly undifferentiated text to read, even for this relatively small plugin. I don’t know that this would scale very well to a system with moderate complexity.
Purpose
- A visual agenda time calculator that automatically transforms meeting
agenda items into a scheduled timeline
- Helps you plan and visualize meeting schedules without manual time
calculations
Core Functionality
1. Start Time Detection
- Recognizes a time in 24-hour format (like "14:30" or "19:00") as the
meeting start time
- Visually highlights the start time with a blue underline
2. Agenda Item Processing
- Identifies lines formatted as "[duration] minutes [description]"
(e.g., "30 minutes Introduction")
- Automatically calculates start and end times for each agenda item
- Builds a sequential timeline based on the durations
3. Visual Transformation
- When viewing notes: replaces duration text with calculated time
ranges (e.g., "7:00 PM - 7:30 PM")
- Shows times in 12-hour format with AM/PM indicators
- Maintains the original description text
4. Interactive Elements
- When editing (cursor on a duration): shows the original text (e.g.,
"30 minutes")
- Displays a helpful tooltip with the calculated time range
- Highlights the duration with a blue underline while editing
User Experience
1. Creating Agendas
- Write a start time in 24-hour format anywhere in your note
- List agenda items with durations (e.g., "45 minutes Discussion")
- The plugin automatically calculates and displays the time schedule
2. Editing Agendas
- Original text remains when editing for easy duration adjustments
- Timeline automatically recalculates when durations change
- Non-destructive approach preserves your original markdown
3. Viewing Benefits
- Clear visual representation of meeting flow
- Easy identification of when activities begin and end
- Helps with time management and meeting planning
2025-03-31 22:10:01 +0000 UTC
This talk gives a high-level overview of my work on tools for thought, and some of my design principles for working on such tools. It's adapted from an "overview" talk of sorts I was invited to give last month for a local group.
Much of the work shown here will likely be familiar to those who have followed my work for years; for those who are newer to my work, this will provide a quick survey. But for both groups, there will be new material in the discussion of design principles.
2025-02-28 15:40:14 +0000 UTC
Joe Walker has a wonderful podcast doing deeply-researched interviews, mostly with people who haven't done long-form interviews before. To prepare for one of these interviews, Joe will spend hundreds of hours reading, studying, and writing. Then, a few weeks later, he'll shift to a totally different domain. So along some strange axis, he's a sort of Olympic athlete of learning.
In this video, I interview him about his learning process. We look at some of his knowledge management systems. There's a lot of spaced repetition discussion. I hope this surfaces some useful tacit knowledge!
2025-02-09 19:00:09 +0000 UTC
Something a bit different this month: a look inside a piece of my design process. In this video I talk through a recent design for looking at an overview of all my annotations on a book. I discuss its influences, criticize it, and show an iteration.
2025-02-01 07:06:30 +0000 UTC
Since many of you are new here (hello! thank you!), and since Patreon's navigation is quite cumbersome, I thought I'd gather my big 2024 projects in one place. In roughly descending order of fidelity:
How Might We Learn?, a vision talk—on making it possible to learn anything by "just diving in", rather than through abstract study segregated from authentic use.
Latticework, an augmented sensemaking system with Matthew Siu, winner of my first research fellowship. New design ideas for transclusion and backlinks.
Exorcising us of the Primer, on the deeply tantalizing vision of Neal Stephenson's Young Lady's Illustrated Primer, why our field must move beyond it, and what that could look like. I've been trying to understand this well enough to write this essay for many years—very gratifying.
What does spatial computing want to become? The Vision Pro was released in February. I spent the month building many prototypes, then wrote this essay about the medium and hosted a workshop on it with Gray Crawford.
And two large new projects well underway:
I'm in the airport now, returning from a reflective and energizing retreat. I hope 2025 is beginning beautifully for you all!
2025-01-12 08:07:06 +0000 UTC
A few months ago, at a conference dinner filled with academics and founders, my table-mates were gamely trying to understand my work. One asked: what am I aiming for with my output? Am I trying to influence others—e.g. by publishing, like the academics? Or am I working towards practical impact—e.g. by making ideas directly usable, like the founders?
Somewhat petulantly, I replied that I’m mostly aiming for the transcendent satisfaction of creation. There’s an expressive element to the work. I have some instinctive yearning; I want things to be otherwise in some particular way. It’s extremely gratifying to condense an idea from the ether which manifests those impulses. There’s also a real thrill of discovery. There’s a meaningful pattern hiding in the texture of reality; I’ve traced its seams, made it dance.
But my answer wasn’t quite true. If I had all that creative joy in a secluded cabin, permanently disconnected from the outside world, I wouldn’t be satisfied. I want to help bring alien cognitive and creative powers into the world—not just into my own imagination. If my ideas don’t eventually become widely usable, that doesn’t seem like success. It seems more like carefully planting a beautiful orchard, then failing to water it. Or like watching those trees grow, then discovering that they can’t survive the local climate after all. Neither outcome would be gratifying.
These answers highlight a difficult tension. The work I find most satisfying focuses on invention and discovery. Yet invention and discovery aren’t enough. I want my ideas to impact people’s lives—which involves a great deal of a very different kind of work. As I look back at this year and plan the new one, I’m wondering anew: what should I be aiming for with my output? What is my theory of change?
SuperMemo and Sketchpad
Consider SuperMemo, the first computational spaced repetition system. In practical impact, it has been largely eclipsed by Anki and Duolingo. But SuperMemo got the ball rolling. I’d be very satisfied if I were its inventor, Piotr Wozniak.
SuperMemo began with a few years of paper prototypes as part of Wozniak’s master’s thesis. Then in 1987, he implemented a simple initial version in DOS that he and his friends could use. Wozniak wasn’t entrepreneurially minded, but two of his schoolmates were. They would eventually develop and market SuperMemo to the world. If Wozniak had stopped with his master’s thesis and those early prototypes, I don’t think that Anki, the mnemonic medium, or any other modern memory system would exist. It’s true that the testing effect and spacing effect are widely studied in educational psychology, but to this day there has been little effort to translate those effects outside the approach that Wozniak established. I would not be satisfied as this alternate-world Wozniak, with the system deployed only to me and a few friends.
Consider Sketchpad, Ivan Sutherland’s groundbreaking 1963 CAD system, and the first realtime interactive GUI. It was his PhD thesis. Sutherland didn’t develop or market Sketchpad much further. But I’d certainly be satisfied if I were in his shoes. Sketchpad was wildly influential. It’s one of the most important works in the history of personal computing.
What’s the difference here? Why was merely creating a prototype and publishing about it enough for Sketchpad, but not for SuperMemo? One answer is that Sketchpad’s ideas were much more general, and they slotted nicely into a swell of activity which was already taking place. Another answer is that Sutherland was at the MIT Lincoln Laboratory, along with many of the best minds in computing. He was supervised by Claude Shannon and Marvin Minsky. Meanwhile, Wozniak wrote his thesis at a Polish university where it was hard for him to regularly access a computer—the impoverished country was struggling under communist rule.
These examples illustrate that if my theory of change centers on influence, my success will depend a great deal on my environment, on who’s getting influenced, and on synergy between our work.
Latticework and Quantum Country
Consider Latticework, the augmented sensemaking system that Matthew Siu and I published earlier this year. That work isn’t Sketchpad or SuperMemo, by any means—but I’m proud of it, and I’d like to see its ideas get into people’s hands. Latticework hasn’t yet been terribly influential. I don’t think that’s because of the quality of its ideas: they have limitations, but other more limited designs have gotten more attention.
I think the main problem is that it’s not real enough. We built a prototype, but its edges are very rough, so we put it in a footnote and haven’t encouraged people to use it. If we want those ideas to go anywhere, the people we need to influence are mostly the kind of tinkerers who will be motivated by something they can get their hands on. That doesn’t mean our article is useless: if one of those tinkerers was inspired to improve upon our prototype, they’d benefit enormously from all that background thought and explanation. But we have to get someone to that point first.
Quantum Country makes for a good contrast. It presents a new idea in a tightly-scoped context. It’s easy to try the system out and get a feel for its potential. And if you’re interested in the more elaborate design thinking behind the system, we made that available too. Quantum Country has been relatively influential—at least half a dozen founders have told me that their projects were inspired by it. I think the system’s tangibility has been key to its relative success.
A lesson I take here is that my systems are much more likely to be influential if they’re instantiated in a prototype that people can meaningfully use.
Quantum Country and Orbit
Quantum Country also illustrates an anti-pattern for my work. Since the mnemonic medium seemed to be working well, I wondered: does this generalize? Can I apply these ideas to a wide range of texts? So I built Orbit, a generalized platform for the mnemonic medium. Then I built tools so that authors can easily add Quantum Country’s interleaved flashcards to any online book.
This was a mistake. Once I had the system working, and once I’d built relationships with authors to write a few texts using it, it didn’t take long to find serious flaws in the core design of the mnemonic medium. But because I pushed to generalize, it took me a year to get to that point. I could have identified those problems in a fraction of the time if I’d just built another one-off or two like Quantum Country.
I think this was a matter of absorbing too much San Francisco culture. As Michael Nielsen has put it to me, there’s an oppressive “cloud of people wanting startup answers to questions.” I wasn’t thinking about a startup, but I was thinking about scale, about generality, about “platforms”. It meant a lot of wasted effort, and technical debt that I still carry.
A lesson I take here is that I must carefully separate the tasks of discovery, influence, and impact. Quantum Country was neatly self-contained and legible enough to influence people. It was a one-off, so it could never have direct impact—but figuring out how to generalize it was a problem of discovery which, by building scalable infrastructure, I approached with the tools of impact.
When to build for impact?
Suppose that I followed my own advice, and I built one-offs to discover the various problems with generalizing the mnemonic medium. Then suppose I built new prototypes which solved those problems in a legible and influential way, like Quantum Country. What then? Would I want to switch to building generalized infrastructure?
My instinct is to try hard to get the impact I want without doing that. I sympathize with Kevin Kelly’s policy: try hard to give away an idea—to get others to pursue it—before committing to it myself. I’ve built mass-market products before. That work often has little in common with the practice of discovery and invention that I find so rewarding, and that I can more uniquely do. My ideal would be to give a concept just enough clarity and momentum that it takes on a life of its own, with the help of others of course, and snowballs into something impactful.
I can imagine exceptions, of course. If I’m convinced that I’ve invented something transformative, but no one’s deploying it properly, I may need to push it over the finish line myself. Some ideas—like a social network—can’t really be properly explored without lots of production work. Some ideas—like Edison’s power grid—are about scaling, and so can only be developed in the context of scaling. And maybe, for the right idea and the right group of people, building a version for the world might seem so fun that I’d be happy to dive in despite my usual preferences.
————————
Thanks to Taylor Rogalski for a suggestion congruent to the “cabin in the woods” thought experiment in the introduction.
2025-01-01 04:51:31 +0000 UTC
I was traveling during publication of November's letter, without my recording equipment. Now that I'm back home, I've recorded the audio version.
Kyoto is very beautiful in the fall.
2024-12-08 02:07:15 +0000 UTC
(audio version here)
I want to create a new kind of spaced repetition system—a new kind of enabling environment—with a different central primitive.
The need for a new central primitive
The central primitive of existing spaced repetition systems is the flashcard. If you want to remember some fact, you transform it into flashcard form and add it to your system. The system doesn’t know about the fact you want to remember, and it certainly doesn’t know how that fact connects to other things you’re thinking about. The system knows about flashcards.
When you want to learn a simple fact, like the capital of a country or some foreign-language vocabulary, that arrangement works well enough. But spaced repetition systems are important because they can be used to support all kinds of learning, to help you internalize all kinds of ideas more deeply. The trouble is that—in my experience—the further you get from that simple fact–flashcard correspondence, the less well these systems work, and the more difficult they are to use.
Michael Nielsen and I have argued that “memory system” is a more powerful framing for this problem space than “spaced repetition system”. The latter phrase emphasizes one tactic. Cognitive psychologists have characterized a wide variety of deep facts about human memory, and we should exploit as many as possible. But another reason to prefer “memory system” is that this phrase is closer to what you actually want. You want a system which causes you to robustly remember. (And in fact, you want even more, as we’ll soon discuss.)
For robust memory of vocabulary, mere spaced repetition may be enough. But for more complex ideas, I find that flashcards alone often produce a brittle memory. Once I’ve seen a flashcard a few times, the specific wording will often act as a strong cue. I’ll remember the answer not because I’m actually processing what the words mean, but through kneejerk pattern matching. An effective memory system would help me build robust memory by presenting the idea with different cues, from different angles, through different connections, so that I encode the memory in many ways.
We can zoom the frame back again. Most of the time, I don’t just want to remember; I want to learn. I want those topics I’m practicing to be alive and functional. I want to be able to apply the material flexibly and fluently. I want my understanding to deepen over time. I want to see new implications and have new ideas. Piotr Wozniak, inventor of the modern spaced repetition system, has written that you should learn before you memorize. But as I see it—both from experience and from my understanding of cognitive architecture—this is an ongoing parallel process. As we exercise and elaborate material we’ve learned, we make new connections and understand more deeply. Memorization is intertwined with that process, not something that happens before or after.[1] And so, even the term “memory system” is inadequate. Our aspirational system needs a broader name—one reason I’ve often used the term “enabling environment”.
All this is awfully cognitive. Much of what I put in my spaced repetition system isn’t. I’ll capture an insightful observation from a friend over dinner, or a beautiful quote, or something someone did that surprised me. The point here isn’t really to memorize. It’s to be changed—to metabolize an experience so that I feel or act differently in the future. But that point is also true of much of my more traditional learning. When I study music theory, I don’t just want to learn; I want to feel and act differently when I play music. When I study a historical figure, I don’t just want to learn; I want to add their way of looking at the world to my own set of lenses, so that I experience the world differently. So that, in some small but real way, I become a different person.[2] A system focused on this would be an “enabling environment” in a much deeper sense.
An alternative primitive: situated ideas
With those lofty goals articulated, I can now state my central complaint about flashcards: they’re static. They’re “dead fish”. Robust memory requires varied cues and connectivity; robust learning requires rising depth and complexity; robust metabolization requires contextuality and vividness.
To bring dynamism to these systems, I think we need new central primitives. Today’s spaced repetition systems are structured around flashcards—adding them, organizing them, scheduling them. I think we need to move upstream. If we want our review sessions to vary and deepen and connect over time, we can’t just supply a static task. In fact, if the goal is to support transfer learning, we can’t write the task ourselves at all: transfer requires surprise. We need to somehow point to the idea which inspired that task, situated within the context which inspired us, so that a stream of varying and deepening tasks can emanate from it over time.[3]
If I’m reading a book, a natural way to point at an idea is by literally pointing at some part of the prose, perhaps with a highlighter, and perhaps with some marginal comments about what we found meaningful. If I’m reflecting on a conversation or experience, the same approach might work for my journal or notes.
More concretely, instead of question/answer fields, the primitive I have in mind (a “situated idea”?) would store:
a pointer to relevant context, with full text; e.g. a book, a journal, etc.
a range (or ranges?) within that context representing the idea to be metabolized; much like a highlight you made in your book
an optional extra comment clarifying your intent or interest; much like marginalia you wrote next to your highlight
And then the system would synthesize appropriate activities over time, based on that input, and on connections with other situated ideas in related contexts.
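To make the shape of this primitive a little more tangible, here is a minimal sketch of how a situated idea might be stored. It is only an illustration: the class names, field names, and the choice of character offsets for ranges are all assumptions, not a description of any existing system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Context:
    """A source the idea lives in, with its full text (a book, a journal, etc.)."""
    title: str
    full_text: str


@dataclass
class SituatedIdea:
    """Hypothetical record for one situated idea; all names are illustrative."""
    context: Context
    # Character ranges (start, end) into context.full_text, end-exclusive.
    ranges: List[Tuple[int, int]]
    # Optional marginalia clarifying intent or interest.
    comment: Optional[str] = None

    def highlighted_text(self) -> List[str]:
        """Return the text covered by each highlighted range."""
        return [self.context.full_text[start:end] for start, end in self.ranges]


journal = Context(
    title="Example journal",
    full_text="Robust memory requires varied cues. Flashcards are static.",
)
idea = SituatedIdea(context=journal, ranges=[(0, 35)], comment="Why do cues matter?")
print(idea.highlighted_text())
```

Note what is absent: there is no question or answer field. Under this sketch, the tasks would be synthesized later from the range, the comment, and the surrounding text.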
All this roughly mimics the work that professional instructional designers do: given a set of “knowledge points” introduced in a text, they construct a series of activities (worked examples, exercises, reflections) and present them in varying and deepening ways over time. In some cases, that sequence may even respond to your performance, though few courses will reinforce your memory as effectively as a spaced repetition system.
The key difference in the system I’m proposing is that it shifts the locus of control from the instructional designer to the user. That was the most important lesson from my work with the mnemonic medium these past few years: self-motivated adult readers rarely want to passively study whatever an author tells them. People want different things from a text. They have different goals. One learner wants to learn the theorems; another wants to be able to prove them. They have different backgrounds. One learner will need a lot of reinforcement in one spot; another in a different spot. And they’re interested in different subtopics. One learner will skim a section which another will eagerly devour, and vice versa.
Those insights led me to last year’s experiments with a “magic” highlighter, and the delightful frame: what if we could make highlighters actually do what people wish they did? When students are polled about their favorite study practices, the most common responses are usually re-reading and highlighting. Meanwhile, if you make a list of the most effective study practices, those two methods are usually at the bottom. But highlighting feels great: it’s a way of indicating interest, a way of participating, of literally making your mark on the text. People imagine that highlighting will help them internalize the material. It (mostly) doesn’t. But maybe it could: you could use a special highlighter to add “situated ideas” to your library, and then the system would ensure that you’d internalize that material.
Alright, okay, fine: machine-generated retrieval practice tasks
When I started writing about this idea in 2022, I thought about it very differently. Large language models (LLMs) couldn’t generate good retrieval practice tasks for ranges of a source text. In fact, as of late 2024, LLMs still can’t generate good retrieval practice tasks for ranges of a source text. But I’m now collaborating with Ozzie Kirkby on a project to fix that.
Lots of people have made small attempts on this problem. None of the solutions I’ve tried performs well enough to be interesting. I’ve resisted the problem myself because I don’t like the dominant motivating frame: efficiency, ease, accessibility. Those aims feel too much like premature optimization to me. I want a difference in kind. It would often be nice to avoid the cost of flashcard-writing, yes. But the main thing I want is for these systems to work better—to produce more robust memory, deeper learning, richer metabolization.
The opportunity for a new primitive is a much more interesting frame. I want something like a spaced repetition system, but where review activities vary and deepen and connect over time. Such a system would necessarily require extremely expensive content-by-content labor, or machine-generated tasks. And only machine-generated tasks afford the possibility of activities tailored to idiosyncratic personal contexts.
So, machine-generated tasks it is! Ozzie and I have been working together on this for about six weeks, so it’s still quite nascent, but I’d like to share a bit about our approach and learnings so far.
First: today’s models aren’t automatically good at generating effective tasks for content beyond simple facts, at least with mere prompt engineering and N-shot examples. This makes sense. Very few people use memory systems for anything other than simple facts; there would be few examples in the training set. Problem sets and exercises, yes, but retrieval practice tasks are a very different dialect. And, even with a specific highlight and the full source context, the models are quite bad at targeting—deciding what a reader might want tasks about within that highlighted range.
That said, the models sometimes generate well-constructed, well-targeted tasks. So we’ve begun by training a classifier to score the quality of tasks. The theory is that it’s fine—at least initially—if the models rarely produce good output, so long as we can filter out the bad tasks. And then, with a way to evaluate different models and architectures over time, we can “scale” our relatively rare task-writing taste.
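The generate-then-filter idea can be sketched in a few lines. This is a toy under stated assumptions: `filter_tasks`, the threshold value, and especially `toy_score` are hypothetical stand-ins, and a real scorer would be a trained classifier rather than a string check.

```python
from typing import Callable, List, Tuple


def filter_tasks(
    candidates: List[str],
    score: Callable[[str], float],
    threshold: float = 0.8,  # assumed cutoff; would need tuning in practice
) -> List[Tuple[str, float]]:
    """Keep only candidates whose score meets the threshold, best first."""
    scored = [(task, score(task)) for task in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def toy_score(task: str) -> float:
    """Placeholder scorer (NOT a real quality model): it merely favors questions."""
    return 0.9 if task.strip().endswith("?") else 0.1


candidates = [
    "Why does repeated wording make a flashcard a weak cue?",
    "The author discusses flashcards.",
]
good = filter_tasks(candidates, toy_score)
print(good)
```

The point of this arrangement is that the generator can be wasteful: if only one candidate in ten is good, the filter can still hand the reader something worth reviewing.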
As with most machine learning pipelines, data is a limiting factor. We began by hand-constructing a data set of several hundred “situated ideas”—source content, a highlight range, a retrieval practice task, and a classification score. We’ve augmented that data with manually-scored LLM-generated tasks (both good and bad), so that now we have more than two thousand samples. So far, the classifier performs well within the same source text (on held-out samples) but generalizes much less well to out-of-sample source texts.
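One way to measure that gap is to hold out entire source texts rather than random samples, so the test set contains only sources the classifier has never seen. The sketch below assumes each sample is a dictionary with a `source` key; that shape, the function name, and the default fraction are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Dict[str, str]  # assumed shape: {"source": ..., "task": ..., "label": ...}


def split_by_source(
    samples: List[Sample], held_out_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Hold out whole source texts, so test samples come from unseen sources."""
    sources = sorted({s["source"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_held_out = max(1, round(len(sources) * held_out_fraction))
    held_out = set(sources[:n_held_out])
    train = [s for s in samples if s["source"] not in held_out]
    test = [s for s in samples if s["source"] in held_out]
    return train, test


samples = [
    {"source": f"text-{i}", "task": f"task {i}.{j}", "label": "good"}
    for i in range(4)
    for j in range(3)
]
train, test = split_by_source(samples)
print(len(train), len(test))
```

Comparing accuracy on this split against accuracy on a random split of the same data would give a rough measure of how much the classifier leans on source-specific patterns.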
And so, we’re collecting more data. We’ve created a nice workflow:
as we read a PDF or web article, we highlight it with Hypothes.is
a bot replies to the highlights with proposed tasks
we reply to each of those replies with a machine-readable “grade”; freeform critical comments; (optionally) a “corrected” rewrite of the same task; and (optionally) tags for failure modes we’ve noticed (e.g. focusing on trivial details)
when none of the machine-generated tasks is good, we reply to the highlight ourselves with the tasks we would have wanted
The classification scores and hand-written tasks from this workflow are then fed back into improving the classifier, which in turn improves the output. We’ll also feed the critical comments and hand-improved examples into N-shot or fine-tuning material for task generation. Our hope is that over time, we’ll accept more of the model’s outputs, which will make our manual feedback cheaper, which will in turn let us collect more data, and so on. And once the output is somewhat reliable, we can crowdsource labels to accelerate the process more.
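As a rough sketch of how one graded reply from this workflow might be turned into training material, consider the record below. Everything in it is an assumption made for illustration: the field names, the 0 to 1 grade scale, and the tag strings are hypothetical, and it covers only graded replies, not the tasks written by hand when every proposal fails.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TaskFeedback:
    """One graded reply to a machine-proposed task (hypothetical shape)."""
    highlight: str
    proposed_task: str
    grade: float  # assumed to lie in [0, 1]; the real scale is not specified
    comments: str = ""
    corrected_task: Optional[str] = None
    failure_tags: List[str] = field(default_factory=list)


def to_training_data(
    feedback: List[TaskFeedback],
) -> Tuple[List[Tuple[str, str, float]], List[Tuple[str, str]]]:
    """Split feedback into classifier samples and task-generation examples."""
    classifier_samples = []   # (highlight, task, grade)
    generation_examples = []  # (highlight, corrected task)
    for fb in feedback:
        classifier_samples.append((fb.highlight, fb.proposed_task, fb.grade))
        if fb.corrected_task is not None:
            generation_examples.append((fb.highlight, fb.corrected_task))
    return classifier_samples, generation_examples


feedback = [
    TaskFeedback(
        highlight="Once I've seen a flashcard a few times...",
        proposed_task="How many times has the author seen the flashcard?",
        grade=0.1,
        comments="Focuses on a trivial detail.",
        corrected_task="Why can familiar wording undermine a flashcard?",
        failure_tags=["trivial-detail"],
    ),
    TaskFeedback(
        highlight="Robust memory requires varied cues.",
        proposed_task="What does robust memory require of its cues?",
        grade=0.9,
    ),
]
classifier_samples, generation_examples = to_training_data(feedback)
print(len(classifier_samples), len(generation_examples))
```

Every graded reply becomes a classifier sample, good or bad, while only the corrected rewrites become examples for the generator.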
What’s next
My long-term goal here is to move us towards a more dynamic learning system as I’ve described. But in the short term, I expect this project will be quite useful to users of existing spaced repetition systems. Our plan is to work towards an interface which would allow users to conveniently highlight texts and import the resulting tasks into Anki, Mnemosyne, and other tools. That’s assuming our pipeline ever performs well enough. In any case, we’ll certainly open-source our data set and code.
Even within the limited frame of traditional flashcard generation, good integration with existing systems will eventually require more invasive changes. In an ideal integration, the user wouldn’t evaluate the machine-generated tasks while they’re reading. They would just read, and highlight, and then later review. The trouble here is that sometimes a given highlight could reasonably point to several distinct ideas—and you probably don’t want all of them. Users will need to give the system feedback on targeting, on their desired level of depth, and so on.
The easy way to implement that is to make the user evaluate machine-generated tasks during their reading session, but I can tell you from experience: that’s unpleasant. It would be better to provide feedback at review time. That will require more complex integration. And longer-term, if I get my way, it wouldn’t make sense to have the user evaluate and approve the machine-generated tasks, because the task will change with each review. It’s a different conceptual model.
For this kind of dynamic review, machine-generated tasks are necessary but not sufficient. Some early experiments suggest that LLMs can pretty reliably generate simple surface variations of known-good tasks, to avoid the pattern-matching problem. But we want tasks which deepen, connect, and recontextualize over time. Those will need separate investigations and pipelines.
————————
My thanks to Ozzie Kirkby for joining me on this adventure, for insightful conversation on these topics, and for comments on a draft. Thanks, also, to David Holz for pushing me to think about classifiers.
————————
[1] I’m probably over-simplifying Piotr’s view here to make the point. I expect he’d agree in broad strokes with what I’m saying. He might (fairly) say that those new connections and deeper understandings are actually different knowledge atoms—so, what’s actually happening is a sequence of related learn/memorize phase pairs. At a high level, from the perspective of system design, my point stands: learning and memorization are intertwined as part of an ongoing process; a good learning system should support that process.
[2] This frame has some clear connections to the project I described in last month’s letter, Towards scalable blip cultivation.
[3] This idea has been percolating for a while; for earlier related discussions, see:
2024-11-29 01:17:49 +0000 UTC
I’ve been collaborating with my friend Taylor Rogalski on an experimental new programmable attention system. Our work is still quite early, so in lieu of screenshots and prototypes, I’d like to share some of our high-level conceptual design thinking.
The problem
My life is filled with neglected blips of salience. They need my attention to thrive, but they don’t fit into a to-do list or my main creative process. So, mostly, they get lost. A few examples:
Questions: I re-watched Bret Victor’s “Inventing on Principle” and wondered: insofar as I’m inventing “on principle”, what are those principles?
References: On a stroll, Gena Gorlin suggested that psychological research on the “stages of change theory” might be relevant to my work on timeful texts.
Insights: While journaling, I noticed my implicit belief that great work needs righteous conviction and grand vision. But my friend Rob Ochshorn has a playful, loose attitude and yet consistently inspires me. Maybe it’s safe to be more playful.
Ideas: Chatting with Matthew Siu about use cases for Rewind-style continuous screen recorders, an idea emerged: a button to show the recorded moment before any browser tab was opened, so you can see “where it came from”.
Quests: I’ve stumbled a few times on drawabox.com, a website for learning to draw. Each time I think: this looks cool, but also enormous and overwhelming. I don’t immediately know how to engage.
Yearnings: I want some regular venue to sing harmony with talented people, but I don’t want to perform gigs or maintain a repertoire—just singing new music together, each time.
I know that if I put these blips in my to-do list, they’re probably just going to rot. Maybe I could move these blips along if I gave them my deep creative attention, but that limited resource is usually committed to more coherent projects.
Part of the trouble is that I often don’t know how much I care about these blips. A quickly-jotted scrap like “look into stages of change theory” doesn’t (yet) mean much to me, so I’m hesitant to give it deep attention. But I don’t understand my relationship to this blip enough to just drop it, either. It might become important, if I engaged a bit. Meanwhile, it’s trapped in a frustrating zone of ambiguity.
Complicating matters, many of these blips aren’t really “completable” in a single sitting, no matter how focused. For example, consider “what are my principles?”, or “maybe it’s safe to be playful” or “try drawabox.com”. These call for repeated engagement over time. But it’s tough to effectively orchestrate my attention that way.
In summary, blips are hard to handle because they often:
lack clear next steps
require creative brainpower
may or may not be important to me
aren’t relevant now, but may be in some future context
can’t be addressed in one sitting
I have some attentional systems that work well: I know how to do big, focused creative projects; I know how to maintain routinized habits and interests. But where do those things come from? My sense is that they often grow out of the blips we’ve been discussing—an idea, a question, a reference, a yearning. But it’s remarkably haphazard. When the blip arrives with enough clarity, or coincides with a stretch of free time, or repeats itself many times, it can grow into something real. Otherwise, it ends up haphazardly dropped, but not because I decided to drop it. All this leaves me with a sense of loss.
One obvious parallel here is to spaced repetition memory systems, which address the haphazard nature of remembering. If I add something to my memory system, it’s no longer up to chance. I can be quite confident that I’ll remember it. Likewise, for well-specified operational tasks, Getting Things Done-style systems (usually) give me confidence in place of what feels like chance.
So: how might we construct a working attentional system for these blips?
Our approach
We propose:
a feed which resurfaces blips over time
with lightweight activities to move them forward
in low-energy Twitter-scrolling moments on your phone
until you have enough clarity to “do” the blips, or let them go
I’ll illustrate with an example. Suppose I quickly capture “look into stages of change theory”. Then:
The next day, I see that blip in my feed with a task suggesting I set a timer for two minutes and ramble into the microphone about how this came up, and why it might be interesting. My speech is transcribed, cleaned up, and saved with the blip.
A couple days later, that blip resurfaces with a summary of the theory from Wikipedia and links to a few articles. I read the summary and an abstract. Then I note that I’m particularly interested in the interventions, and how they might apply to spaced-repetition systems that support self-help books.
A few days later, that blip resurfaces with a list of popular interventions. It suggests that I mark any that seem particularly relevant.
Over the next few weeks, during idle periods, I read little relevant bits of papers describing those interventions, ramble observations into the microphone, and slowly form a high-level understanding of how this theory might fit into my work.
Finally, I build enough clarity that I feel comfortable letting go. I tell the blip to snooze for six months, or until I mention “timeful texts” in another blip.
It might seem strange that this example ends with inaction. But this is actually a very real success. In idle moments of time across a few weeks, I came to understand enough about this theory to see how it might fit into my work, and to draw connections in the future. I accumulated that understanding in notes I can use later. I know that I’ll see this blip again if it’s likely to be relevant, so I feel safe letting it go. It’s not dead: it may well turn into a key part of a future project. By contrast, if I’d added this blip to my to-do list, I doubt any of this would have happened.
Let’s see how the system addressed our earlier list of problems with blips. We said they:
lack clear next steps
require creative brainpower
may or may not be important to me
aren’t relevant now, but may be in some future context
can’t be addressed in one sitting
Though this example could be addressed in one sitting, I probably wouldn’t want to—it’s not important enough.
So the system smeared the task out over time.
Compared to naive spaced repetition
I’ve tried to solve this kind of problem before. For example, I’ve created a spaced repetition-style writing inbox, so that I can capture a blip like “what principles am I ‘inventing on’?” and repeatedly return to it over time. It’s a nice idea. But in practice, I found that when a prompt like this came up, I’d rarely want to engage.
One problem was the sameness of the prompt. I’d think “ugh, this again.” Faced with the same blank page I saw last time, I often don’t have anything new to say.
Another problem was decontextualization. I’d capture some idea or question while I was thinking about a particular problem. But they’d often resurface when I wasn’t thinking about that problem anymore. I’d feel emotionally disconnected, and wouldn’t feel like engaging.
A related final problem was the “endgame”: I’d develop my idea a bit, and then I’d reach the point where I was done with it for the moment. I didn’t want to see it regularly, but it didn’t feel safe to just archive my note, because then I felt I’d never see it again.
Two key ideas in our new system help address these problems.
The first is the system’s scaffolded microtasks. When a blip resurfaces, it’s not just showing you the same old prompt again. The system offers some suggested next actions, and possibly some new information to react to. As much as possible, we make those actions have time limits (“set a timer for two minutes and…”), a clear sense of the progress you’ll be making (“let’s make a list of examples…”), and clear goals (“add at least three examples…”). We’re accumulating a library of microtask patterns, and connecting them into playbooks that make sense for different types of blips.
The second key idea we call contextual resurfacing. When a blip shows up in your feed, it includes links to other related blips—possibly old blips which wouldn’t otherwise appear for a long time. Sometimes those related blips will actually be part of a suggested microtask (“how are these related?”). You can also explicitly specify topics which should trigger a snoozed blip to resurface. These elements aim to help the decontextualization and endgame problems by making it easy to double down on a juicy context, or to snooze a blip until you’re thinking about something in that context again. The net effect is a feeling of safety: you see each blip resurface many times, and you know you’ll see it again.
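The snoozing half of contextual resurfacing can be sketched quite simply. The code below is a toy under loud assumptions: the `Blip` class and its method names are hypothetical, and plain substring matching stands in for whatever relatedness judgment a real system would make.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional, Set


@dataclass
class Blip:
    """Hypothetical blip record with a snooze date and topic triggers."""
    text: str
    snoozed_until: Optional[date] = None
    wake_topics: Set[str] = field(default_factory=set)

    def snooze(self, today: date, days: int, topics: Set[str]) -> None:
        """Hide the blip for `days`, unless one of `topics` comes up sooner."""
        self.snoozed_until = today + timedelta(days=days)
        self.wake_topics = {topic.lower() for topic in topics}

    def should_resurface(self, today: date, new_blip_text: str = "") -> bool:
        """True if the snooze has expired or new text mentions a wake topic."""
        if self.snoozed_until is None or today >= self.snoozed_until:
            return True
        lowered = new_blip_text.lower()
        # Substring matching is a crude stand-in for real topic detection.
        return any(topic in lowered for topic in self.wake_topics)


blip = Blip("look into stages of change theory")
blip.snooze(date(2024, 11, 1), days=180, topics={"timeful texts"})
print(blip.should_resurface(date(2024, 11, 15)))
print(blip.should_resurface(date(2024, 11, 15), "An idea about timeful texts"))
```

The two exits matter together: the date guarantees the blip eventually returns, and the topic trigger lets it return early when the context is warm again.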
Both of these ideas depend on recent advances in large language models (LLMs). Still, we take a different approach to LLMs from most “productivity”-adjacent systems. Such systems tend to focus on using LLMs to do the task. But LLMs can’t really “do” any of the example blips I mentioned. As Taylor has charmingly put it: where other systems are about you prompting the LLM, this system is about the LLM prompting you.
What’s next
Taylor and I have been prototyping this system in classic Wizard-of-Oz style: we send our blips to a special email address. Then we each “play AI” for the other person, sending replies to resurface blips with microtasks. The recipient can “do” the microtask by replying, and so on. This has been remarkably generative! We can identify and reuse patterns which we imagine building into an LLM pipeline, but we can break out of those patterns whenever we want to experiment with something new.
I mention our approach here because I meet many technologists who immediately jump to building LLM infrastructure, then get stuck. Unless your domain needs realtime interaction, there’s no point in automating something until you know it actually solves the problem. It’s much easier to iterate on the conceptual design with manual human interaction. Once you have something which works so well that you want more of it than you can provide manually, that’s a good time to try introducing automation.
We’ve now reached that point with our prototype system. In many cases, we’re quite excited about how the system’s design has helped us push our blips forward. But as the “database” grows, and as we want to throw more material into it, we’re having trouble keeping up with the increasingly intricate scheduling and contextual resurfacing demands. We’ll soon need to introduce elements of automation.
We also have a huge number of open conceptual design problems to address, including many relating to system details not described in this letter. It’s all a bit overwhelming, but I’m thrilled that we’ve finally broken through to some promising uncharted territory for programmable attention.
————————
My thanks to Taylor for his generous spirit of collaboration on this project, and for helpful suggestions on this text.
2024-11-01 00:28:19 +0000 UTC
A 1994 interviewer asked Alan Kay what he’s found most surprising about technology in the classroom. He complained that “computers are treated much more like toasters, [with] predefined functions… or running packaged software, and less as a material to be shaped by students and teachers.” He’d originally envisioned the personal computer not as a vessel for monolithic tools, but as a new medium for expressing and manipulating dynamic systems. Thirty years later, app stores and cloud services have further entrenched the “toaster” paradigm. The functions are predefined, as in 1994, but most data now lives in sandboxes and proprietary cloud databases, where it can’t easily be manipulated across app boundaries.
What would it be like if your computer were, instead, more like material you could “shape”? One metaphor is that it would be more like a wood shop. You’d assemble materials and tools from various sources, arrange them just as you like, and combine them to express what you have in mind. Note that in a workshop, tools can be modified: if a tool’s grip is too small, you might wrap some tape around it. New tools can be improvised: you might make a jig to repeatedly execute a tricky angle. And if you like, you can bootstrap your own machine shop from a simple charcoal foundry.
Apps rarely work like this. If you like Zotero’s metadata inference feature, but prefer EndNote’s inline citation interaction, there’s no good way to combine those features from the two apps. Each has its own database. You need to pick a silo. If you prefer Notion’s editing interface, but your collaborator prefers Google Docs, there’s no good way for each of you to write with your preferred tool. You need to pick a silo. That silo is generally produced by an industrial manufacturer, and it cannot be modified except—if you’re lucky—through proprietary and inevitably limited plugin systems.
There have been many efforts to break us out of these silos. People sometimes complain that we wouldn’t be in this mess if we’d stuck with Smalltalk’s original vision, or if we weren’t beholden to business incentives. But I don’t think it’s so simple. There are some serious unsolved conceptual problems at the foundation of personal computing—in the language of the medium. I don’t mean that in the sense of programming languages, as we usually think about them. Rather, how can we express dynamic systems so that they’re easy to read, write, modify, and combine—even when collaborating with others? A “language” for expressing dynamic systems doesn’t have to (just) mean programming: consider the classic UNIX philosophy of small programs connected through text streams.
Two (very different) modern attempts
In 2015, Clemens Klokmose published a remarkable paper which has instigated a new wave of effort in this space. His system, Webstrates, was built on a key practical insight: the web’s Document Object Model (DOM) already contains primitives for expressing rich content and computation; could we modify those elements to create the kind of fluid medium we want? He presented a provisional answer through a working system and several dramatic demonstrations, which have since led to dozens of elaborations and conceptually inspired variants.
To me, those last two sentences demonstrate why Klokmose’s work has catalyzed a generation of young researchers: the ideas are pragmatic, and very real. This movement—which I’ll imprecisely label “malleable software”[1]—is not interested in boiling oceans. They want a practical solution which can evolve gracefully from computers, operating systems, and software paradigms as they exist today. They build systems you can download and use, right now.
Meanwhile, Bret Victor and colleagues at Dynamicland would like very much to boil the computing oceans—to give us a glorious, if less familiar, new start. Their system is called Realtalk, and their aspirations are about much more than breaking us out of siloed applications. Realtalk is a physical, communal computing environment. It’s about interacting with real, tangible objects, with your real senses and body, in a real space, side by side with other real people. Realtalk programs are malleable, composable, and interoperable, but that’s in part a matter of convenience. The true aspiration is to enable universal literacy in a new medium: all users should be able to author anything in the environment, from scratch, including the operating system. Indeed, these users should have no need for a “software industry”. No few sentences can adequately summarize this project. This paragraph’s job is only to convince you to spend some time with the material on the (new!) Dynamicland web site, if you haven’t already.
Realtalk does exist, but it is not at all pragmatic. It is pointedly uninterested in a smooth evolution from software workflows as they exist today. You can’t download it or use it. In fact, there isn’t (yet) a published technical description of the system. I sympathize with Clemens Klokmose when he wonders aloud “how I (and others) could contribute to the Dynamicland vision without spending a lifetime creating a poor imitation.”
But I want to suggest that it’s worth inverting the question and engaging in some deliberate sacrilege. How can Dynamicland’s ideas contribute to much more pragmatic malleable software efforts? What if we ignore Realtalk’s noble contributions to tangible computing, social computing, liberatory computing, etc etc, and—playfully relishing the hideous myopia—strip-mine it for ideas[2] we might use to bring malleability to more traditional software environments?
This letter is an attempt to get the conversation started with a few initial sketches.
Loose coupling with a global reactive blackboard
One weak class of “malleable software” many of us use every day is the browser extension. For example, you might install an extension which hides Twitter’s “like” and “retweet” metrics, or which adds Spotify links to YouTube videos which include music. These extensions work by injecting custom JavaScript or stylesheets into a web page. The trouble here is that the modification usually depends in very precise ways on the deep structure of the web page. And so, if the web site changes, the browser extension can easily break. For the same reasons, you can’t straightforwardly use the Twitter-metric-hiding extension on Mastodon.
In their thesis on malleable software (p86), Memphis Tchernavskij points out that these same kinds of issues apply to Webstrates and many related projects. For example, suppose that you’re using a digital sticky note board to manage a project. You’d like to make a tool which lets you point to a sticky note and quickly add links to any related email conversations you’ve had. Ideally, you should be able to use that same tool with, say, papers you’re reading.
With Webstrates and other typical malleable software techniques, you’d generally need to deeply couple your tool to each piece of software you want to use it with. That is, you’d need to teach your tool how to read and write a sticky note in the sticky note app, and a paper in the paper reading app. In Webstrates, you’d do that by directly encoding assumptions about the interface’s DOM structure. As with browser extensions, these coupling details are often not only brittle but invisible, so it can be difficult to understand what’s going on when things don’t work as expected.
Realtalk programs use an unusual strategy to communicate with each other: a global reactive blackboard. Programs don’t connect directly with each other. Instead, they make claims, wish for things to happen, and describe their behavior through game-like rules. The result is a flexible and loosely coupled ecosystem, in which it’s surprisingly easy to write composable tools. Programs don’t have to “reach into each other” when they can read and write from a shared blackboard. Here’s a brief overview created by Tabitha Yong; see this article from Omar Rizwan for much more detail.

Let’s return to the “related emails” tool for a moment. Here’s how you might implement that tool in a loosely coupled fashion:
Sticky notes and PDFs and emails claim that their “textual content” is whatever it is.
When the tool is used on an object with “textual content”, it finds related emails using those claims and wishes that links to certain emails be displayed alongside the object.
When something wishes that related emails are displayed on an object, wish that a particular overlay interface be drawn alongside it. (That wish is handled by a system-level graphics layer). Note that this program can be swapped for another to alter the visual representation.
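To make those three steps concrete, here is a minimal sketch in Python. It is not Realtalk's actual language or API: the `Blackboard` class, the `Statement` shape, the word-overlap notion of "related", and all object ids are assumptions invented for illustration. The point is only that the sticky note, the tool, and the overlay program never reference each other directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    kind: str        # "claim" or "wish"
    subject: str     # id of the object the statement is about
    attribute: str   # e.g. "textual content"
    value: object    # must be hashable so statements can live in a set


class Blackboard:
    """A toy shared blackboard: programs post statements and react to them."""

    def __init__(self):
        self.statements = set()
        self.rules = []

    def when(self, kind, attribute, rule):
        """Run `rule` for every matching statement, existing or future."""
        self.rules.append((kind, attribute, rule))
        for statement in self.query(kind, attribute):
            rule(self, statement)

    def post(self, kind, subject, attribute, value):
        statement = Statement(kind, subject, attribute, value)
        if statement in self.statements:
            return  # already on the board; don't re-fire rules
        self.statements.add(statement)
        for rule_kind, rule_attribute, rule in list(self.rules):
            if (rule_kind, rule_attribute) == (kind, attribute):
                rule(self, statement)

    def query(self, kind, attribute):
        return [s for s in self.statements
                if s.kind == kind and s.attribute == attribute]


def words(text):
    # Crude stand-in for "relatedness": longer words, lowercased.
    return {w.strip(".,!?:").lower() for w in text.split() if len(w) > 3}


def use_related_emails_tool(board, target):
    """Step 2: read "textual content" claims, then wish for related emails."""
    texts = {s.subject: s.value for s in board.query("claim", "textual content")}
    emails = {s.subject for s in board.query("claim", "kind") if s.value == "email"}
    if target not in texts:
        return
    related = tuple(sorted(
        e for e in emails if words(texts.get(e, "")) & words(texts[target])))
    if related:
        board.post("wish", target, "related emails", related)


def show_related_as_overlay(board, wish):
    """Step 3: turn a "related emails" wish into an overlay wish."""
    board.post("wish", wish.subject, "overlay", "Related: " + ", ".join(wish.value))


drawn = []  # stands in for the system-level graphics layer


def graphics_layer(board, wish):
    drawn.append((wish.subject, wish.value))


board = Blackboard()
board.when("wish", "related emails", show_related_as_overlay)
board.when("wish", "overlay", graphics_layer)

# Step 1: objects claim their textual content (and emails claim their kind).
board.post("claim", "sticky-1", "textual content", "Book venue for launch party")
board.post("claim", "paper-1", "textual content", "Notes on venue acoustics")
board.post("claim", "email-1", "kind", "email")
board.post("claim", "email-1", "textual content", "Re: venue options for the launch")
board.post("claim", "email-2", "kind", "email")
board.post("claim", "email-2", "textual content", "Quarterly tax paperwork")

use_related_emails_tool(board, "sticky-1")
use_related_emails_tool(board, "paper-1")
```

Swapping `show_related_as_overlay` for a different rule would change the visual representation without touching the tool or the sticky note.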
There’s still a sort of brittleness problem here: what if the sticky note program uses a different label, claiming its “body” is what it is, not its “textual content”? But this is easy to fix without actually modifying the sticky note program. You can just make another program which translates claims about a sticky note’s “body” into claims about a sticky note’s “textual content”. This strategy is similar to Tchernavskij’s “entanglers” (p91), but generalized to adapting any kind of data relationship between programs.
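A translating program of this kind can be very small. The sketch below is a hypothetical illustration, not Realtalk code: claims are modeled as plain `(subject, attribute, value)` tuples, and `translate` and `textual_content` are names made up for this example.

```python
def translate(claims, source_attribute, target_attribute):
    """Return the claims plus derived ones that restate `source_attribute`
    under `target_attribute`. The original claims are left untouched."""
    derived = {(subject, target_attribute, value)
               for subject, attribute, value in claims
               if attribute == source_attribute}
    return claims | derived


def textual_content(claims, subject):
    """What a tool would do: look up an object's "textual content" claim."""
    for claimed_subject, attribute, value in claims:
        if claimed_subject == subject and attribute == "textual content":
            return value
    return None


claims = {
    # This sticky note program uses the label "body" ...
    ("sticky-1", "body", "Book venue for launch party"),
    # ... while this paper reader already uses "textual content".
    ("paper-1", "textual content", "Notes on venue acoustics"),
}

# The adapter sits between the two vocabularies; neither program changes.
adapted = translate(claims, "body", "textual content")
```

The sticky note's original "body" claim stays on the board, so programs which already depend on that label keep working.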
This approach seems to require boiling oceans and rewriting everything. But one could write, say, a Trello-specific adapter which translates its API into claims and wishes. Such adapters are cumbersome and somewhat limiting, but at least the ugliness would be (hopefully) isolated to the adapter, and the tool wouldn’t need to worry about it.
Composability through spatial arrangement
Another way that Realtalk achieves loose coupling is by making the spatial arrangement of programs a first class object. Instead of “variables” and “arguments” and “pointers”, programs and data often refer to each other by physically pointing to each other. This often lets users describe systems in much simpler and more direct ways.

By way of example, Omar Rizwan demonstrates a dynamic map with multiple “layers”, e.g. for demographics, transit, major roads, etc. He points out that if you were implementing this kind of feature in normal software, you’d make a “layer list” panel with visibility checkboxes, a rearrange interaction, etc. But instead, with Realtalk, each layer is a physical object, and he just rearranges them in physical space to get the result he wants. He points out that “you automatically get the operations of the physical world: placing and picking up objects, moving and grouping objects in space, pointing objects at each other, and so on.” I really value Realtalk’s tactility, but I think Omar’s points mostly apply to virtual objects on a spatial canvas.
Composability in Webstrates often means describing a path in the DOM from one program to another. Perhaps those technical and coupling-laden references could be replaced, at least in part, with spatial arrangement.
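As a rough illustration of what replacing a DOM path with spatial arrangement could mean on a virtual canvas, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the `CanvasObject` fields, the 15-degree tolerance, and the rule that the nearest object along the heading wins. It is not how Realtalk or Webstrates resolve references.

```python
import math
from dataclasses import dataclass


@dataclass
class CanvasObject:
    name: str
    x: float
    y: float
    heading: float = 0.0  # degrees, counterclockwise from the +x axis


def pointed_at(source, objects, tolerance=15.0):
    """Return the nearest object lying within `tolerance` degrees of the
    direction `source` is facing, or None if nothing is in that direction."""
    best = None
    best_distance = math.inf
    for candidate in objects:
        if candidate is source:
            continue
        dx, dy = candidate.x - source.x, candidate.y - source.y
        bearing = math.degrees(math.atan2(dy, dx))
        # Smallest absolute difference between the two angles, in [0, 180].
        off_axis = abs((bearing - source.heading + 180) % 360 - 180)
        distance = math.hypot(dx, dy)
        if off_axis <= tolerance and distance < best_distance:
            best, best_distance = candidate, distance
    return best


tool = CanvasObject("related-emails tool", 0, 0, heading=0)
sticky = CanvasObject("sticky", 10, 1)
email = CanvasObject("email", 20, 0)
paper = CanvasObject("paper", 0, 10)
canvas = [tool, sticky, email, paper]

target = pointed_at(tool, canvas)  # the sticky: in line and closer than the email
```

Rotating the tool (setting `tool.heading = 90`) retargets it to the paper. The "reference" is just the arrangement the user can see.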
Giving programs spatial location might also help with another of Tchernavskij’s criticisms: that in web apps (and so in Webstrates), there’s no consistent relationship between the structure of an interface and the structure of its code. That is, if you want to modify something you see on the screen, it’s hard to find the code responsible for it. One way to solve that is to colocate visual representations with the programs which create them.
Physical objects are naturally malleable
One key “move” in Realtalk’s evolution has been to move knowledge out of the computer system, and into physical space. If you want to build a digital sticky note app, you’d typically make a database, where each row has some text, and a position, and then some code to display those items in particular locations, and to allow users to modify the text and the position.
But with physical sticky notes, you can just put them in a position, and you can just write on them. No code is required for that. You just need code for the parts which are necessarily dynamic—like, say, highlighting stickies that you haven’t touched recently. And then, if the user feels inspired to draw a little diagram on the sticky, you don’t need to add an “attachments database” and extra interface elements for drawing. The user can just start drawing on the sticky note. Paper is naturally malleable, in terms of its physical representation. Objects—in the sense of object-oriented programming—are usually not.
Translating this idea to a purely virtual context, I wonder about adapting something like FigJam—or perhaps something like FigJam with the dynamic ideas from Apparatus and Cuttle. Rather than writing code to draw an interface and its data, perhaps users can directly draw the interface and its data, and then enliven them with computation.
If you draw a bunch of sticky notes on a spatial canvas, you don’t need a separate database of sticky notes: the sticky notes are just objects, there on the canvas. They’re just rectangles with a fill color and a child text element, or a child vector drawing, or whatever else the user happened to add. If the user wants to add a program which highlights stickies that haven’t been touched recently, they can do that without a formal database concept of a sticky note: perhaps they could add a program to the canvas which matches top-level rectangles with an old modification date, and adds an extra highlight overlay to them.
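Here is one way such a program might look, as a hedged Python sketch. The `Shape` and `Highlight` types, the two-week threshold, and the idea that the canvas is a flat list of top-level shapes are all assumptions for illustration. Note that nothing in it mentions "sticky notes": it only matches on what is drawn.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Shape:
    kind: str                 # "rectangle", "text", "frame", ...
    modified: datetime        # last time the user touched this shape
    fill: str = "none"
    children: list = field(default_factory=list)


@dataclass
class Highlight:
    target: Shape             # the shape this overlay is drawn on top of
    color: str = "orange"


def highlight_stale(canvas, now, max_age=timedelta(days=14)):
    """Return overlays for top-level rectangles not modified within `max_age`.

    Only shapes directly on the canvas are considered; rectangles nested
    inside other shapes are ignored.
    """
    return [Highlight(shape) for shape in canvas
            if shape.kind == "rectangle" and now - shape.modified > max_age]


now = datetime(2024, 10, 1)
fresh = Shape("rectangle", datetime(2024, 9, 28), fill="yellow",
              children=[Shape("text", datetime(2024, 9, 28))])
stale = Shape("rectangle", datetime(2024, 8, 1), fill="yellow",
              children=[Shape("vector drawing", datetime(2024, 8, 1))])
frame = Shape("frame", datetime(2024, 7, 1),
              children=[Shape("rectangle", datetime(2024, 7, 1))])

canvas = [fresh, stale, frame]
overlays = highlight_stale(canvas, now)
```

Whatever the user has added inside a sticky (text, a drawing) comes along for free, because the program never needed to know about it.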
Practical malleability requires small programs
Many end-user programming systems pride themselves on universal editability: the user can modify any piece of the system, in realtime. But just because the user has the technical capacity to modify any piece of the system, that doesn’t mean they have the practical capacity. Leaving aside the barrier of programming itself, the central barrier is that even simple modern applications are often many thousands of lines of code. Bret observes:
C++ and Rust advertise "zero-cost abstractions". Their cost metric is tied to execution speed. But the cost we're concerned with here is related to simplicity, transparency, understandability, a grasp of the whole. The confidence of the user, the absence of myth and superstition. What are "zero-cost abstractions" here? What material are they built from?
Realtalk feels as malleable as it does in part because its programs are relatively short. One key reason that Realtalk programs are short is that the system moves much traditional interface complexity out of the computer and into the real world. We’ve discussed a few ways it does that. But there’s also some more traditional systems design insight: Realtalk’s reactive blackboard pattern simplifies much everyday event handling and state management.
Because Realtalk programs are short, the system can make it the norm that an object’s program is always visible. You can’t make Chrome’s program always visible because it’s 23 million lines long. Visibility is a key component of understandability and practical malleability. I don’t know that we can translate that property literally to screens—they’re just so much smaller than a typical Realtalk workspace—but I do think that malleable software efforts must take these observations seriously.
In particular, I don’t see how one can get programs small enough using a typical JavaScript-based programming system, as Webstrates does. Maybe it’s a matter of finding the right API, or creating a dialect. Or maybe one can get surprisingly far by transliterating the Realtalk programming language into a virtual spatial canvas. It probably won’t work, but I’d guess one would learn a lot by trying.
————————
Thanks to my friends from Dynamicland. Please forgive the blasphemy. Thanks also to my malleable software friends. I’m sorry if this letter reads as critical: I admire your work a great deal!
————————
1: Sometimes “malleable software” labels a broader effort which also emphasizes the importance of realtime collaboration, flexible movement across various device form factors, data sovereignty, and other priorities. Sometimes “malleability” denotes the idea that a user can modify or reprogram a piece of software to suit their preferences, while separate terms like “composability” and “interoperability” describe the ability to flexibly recombine different tools, across application boundaries, to work on the same data. Sometimes the phrase isn’t used at all, but the cultural connection is clearly present.
2: Bret is quite concerned about this kind of behavior. He’s observed that because PARC’s ideas about the personal computer were brought into the mainstream in a partial fashion, with surface resemblance, most technologists can’t even see the richer foundational ideas and aims we’ve inadvertently discarded. He’s also expressed concern about third-party partial reimplementations of Realtalk for this reason, among others. I’m sympathetic. But I think—maybe naively—Realtalk will still look like an alien fantasy next to any “malleable software” projects which narrowly adopt its relevant ideas. I don’t think anyone can mistake software trapped in screens for “basically” Realtalk.
2024-10-01 05:17:01 +0000 UTC
(Please don't share this video publicly.)
Thank you all for a lovely discussion today! I feel that I understand a notch better after our chat.
Here are links from the chat transcript:
00:20:22.523,00:20:25.523 Clemens Nylandsted Klokmose: https://varv.projects.cavi.au.dk
00:45:39.541,00:45:42.541 Andy Matuschak: https://dynamicland.org/2023/Improvising_cellular_playgrounds_in_Realtalk/#
00:55:49.200,00:55:52.200 Andy Matuschak: On "why not AR?" https://dynamicland.org/2024/Why_not_head-mounted_displays.pdf
2024-09-16 03:17:43 +0000 UTC
Links: Google Meet and GCal
Last weekend, over brunch, a non-design friend innocently asked: "So, Andy, have you seen this Dynamicland thing? What do you think?" Oh, goodness. Where to start? [1]
If you're here, you're probably aware that there's a bottomless new Dynamicland archive available, with one zillion new documents, images, and videos. I've been digging through them; probably many of you have, too. There's a lot that's totally new.
If you've never seen Dynamicland before, the new introduction compresses the ideas well. It's an attempt to invent a new kind of computing medium—one that's physical, social, and authorable by anyone.
Anyway: let's discuss! Bring obscure findings from deep in the stacks. Bring questions. We'll meet (via Google Meet) this Sunday at 9AM PDT [GCal]. I'll record the discussion and publish it here afterwards.
[1] "Like many designers, one of my most salient creative problems right now is: how to deeply engage with all the new Dynamicland material without crippling myself with feelings of inadequacy?"
2024-09-11 04:36:42 +0000 UTC
I’ve spent a lot of time these past two years talking with serious adult learners. That is, people independently studying technical subjects (e.g. physics) in support of some creative or professional project they really care about (e.g. their startup). I’ve noticed a surprising pattern to these conversations. We’ll spend quite a while discussing how to make learning efficient and effective: curriculum, comprehension, memory, practice, fluency, and so on. But then, an hour or two into the conversation, it will often emerge that despite their earnestness, the learner has trouble actually sitting down to do the learning.
All that earlier talk about comprehension and memory doesn’t really matter if the books aren’t getting cracked open. As I’ve run into this situation more and more, I’ve become convinced that it’s a seriously under-appreciated problem. Tools and resources often assume that the main thing a serious learner needs is clear explanation, or feedback, or effective practice. But for so many of these learners, the most immediate thing they need is help arranging their life and mindset so that they spend time learning what they intend to learn. In educational psychology, we’d say they need help with self-regulation.
I’m not talking about disaffected teenagers blowing off their history homework. These are smart, driven, creative adults, trying to learn topics of their own choosing, in support of their own creative goals. They intend to study. But then, week after week, they don’t. What’s going on here? Why is this so hard? And what can a designer do to help?
Where is it easier?
We can understand the problem better by looking at some situations where adult learners seem to have less trouble with self-regulation.
Synchronous courses. Even as they’re telling me about their struggles with independent study, learners will often tell me that they’ve enrolled in a course in some other subject that’s really working for them. Key elements seem to include: a regularly scheduled meeting time, a syllabus with clear scope and expectations, an energizing social environment, a respected facilitator, light accountability (via enrollment fee, social norms, etc). On the other hand, these courses usually suffer from functional learning issues (memory, comprehension, fluency); and they can often feel disconnected from the learner’s true aspiration.
1:1 tutoring. This seems to be a higher-variance strategy than a traditional course. The stronger relationship, personalized feedback, and sharper accountability often inspire more gumption. But for better or worse, most tutors have no syllabus. Scope, outcomes, and expectations are often vague and improvised, which leaves many learners feeling adrift or disconnected.
Peer study groups. Here in San Francisco, adult learners often gather with peers for regular evening study groups. The most common structure I hear is people studying separate topics individually, together in the same room, with a bit of socializing before or after. Helpful elements include the regular schedule, social accountability, and the pleasure of seeing friends. This structure seems to be even more effective when everyone’s working through the same text, because the social setting provides a meaningful way to engage with the material through discussion. But I do notice that these groups tend to fall apart after a few weeks or months. Stable long-term peer study groups seem rare.
Collaborative community groups. Stepping away from technical material for a moment, consider community music groups, sports teams, and acting troupes. These are like peer study groups in that they provide structure, social accountability, and community. But they’re also collaborative: the team or band succeeds or fails together. Members will feel each other’s growth, or lack thereof. When one member grows, that can inspire others to grow, too. When one member is struggling, the rest of the group is incentivized to help. These sorts of groups are common in the performing arts, but it’s less clear how to construct them for students of technical subjects.
Project-based learning / apprenticeship. When “just diving in” works, it seems to be most learners’ favorite approach. The learning is grounded and scoped by a real purpose; tinkering can replace textbooks; the project itself provides feedback and reinforcement. This approach also lends itself to a social setting, either with peers, or with more experienced mentors. Unfortunately, when conceptual load is high, this kind of on-the-fly learning often doesn’t penetrate deeply enough into conceptual foundations.
Scheduled performance. A reliable (if stressful) way to learn something is to volunteer to give a talk on the topic, or to teach a short course. I’ve seen a related structure in conversations with podcasters Dwarkesh Patel and Joe Walker. Both are known for deeply researched interviews, which often involve dozens of hours of study for each guest. In all these situations, there’s an impending date for a high-stakes performance, which creates urgency. The date also creates finitude: learners know when they’ll be “done”.
Stepping back, let’s look at some of the common elements these approaches provide:
Default schedule: Many of these structures construct a specific time for study. It’s often consistent and not easily changed. This inverts the default: you don’t need to keep deciding to study; if it’s on your calendar, you need to decide to cancel.
Clarity: What should I do next? How should I distribute my effort? How much is “enough”? Many of these structures offer answers.
Sense of progress: e.g. through checking off parts of a syllabus, producing work products, or conversations with peers or mentors.
Immediate reward: It may take a long time to learn enough for the real project you want to do, but these structures can give short-term meaning to your engagement with the material, e.g. through discussion with peers or practice projects. Those contexts may also make the material more immediately vivid and interesting.
Assurance: Many of these structures say in some way: “Don’t worry. This is the right thing to be doing. If you keep showing up, you’ll get where you need to go.”
Permission/legibility: “I can’t go Tuesday; I have [class/study group/tutoring] that night” is much easier to say than “I can’t go Tuesday; that’s one of my two free nights this week, and I’m trying to study physics two nights a week.” It’s also easier to tell yourself. The vessel offers a kind of narrative permission. Likewise, you might feel fine asking your boss to leave work early for a class or a study group on some relevant topic, but much less fine just sitting at your office desk reading a textbook.
Timeliness: As Taylor Rogalski has put it, you might have a good reason to learn that topic, but you don’t have a good reason to do it this Saturday. Even when these structures don’t include explicit scheduling, they create a reason for immediate action, generally through a short-term threat: study in a timely fashion, or face social cost. I’m wary of relying primarily on negative reinforcement—more on that later—but learners do need a good reason to study today, as opposed to someday.
When adult learners try to study independently, outside of one of the structures I’ve mentioned, they often end up with none of these elements. No default schedule, no clarity, no sense of progress, no immediate reward, no assurance, no permission, no timeliness. From this perspective, we can see just how much self-regulation is demanded of independent learners. They need to create and sustain enough of these elements to keep themselves going, all while also learning a difficult topic. No wonder they struggle so often!
Starting with time
Given all the discussion above, an independent learner should strongly consider joining or creating some structure like the ones I’ve discussed. Find a local class, study group, or tutor; identify a project you can use to drive the learning; invite your friends to a lecture you’ll give; etc. But many of the adult learners I’ve talked to have good reasons why these might not work, or might not be enough. The courses are too disconnected from their actual context of use; they’ve tried several tutors but none seem to click; their only available time is very early morning; they don’t know how to prepare to give a lecture; etc. What should such learners do?
In these conversations, the first thing I usually focus on is time. Often these learners have curated elaborate curricula and study tools for themselves, but they’ve neglected to set aside time to actually sit down and learn. One approach that often works is to create “study date nights”. That is: block a few hours on one or two nights per week for dedicated study. But then learners must defend those blocks against a never-ending barrage of social and practical encroachments. That’s often quite a challenge. If they succeed, learners create a vessel which can supply a few of the self-regulation elements we’ve discussed: a default schedule, timeliness, and even permission/legibility (“I can’t go; I set aside Tuesday nights to study math.”)
One common roadblock with the time conversation is that the learner simply doesn’t have any. They’re fully booked with professional and personal obligations. Sometimes it doesn’t seem that way—they have a few hours at night here and there—but every time they try to study at those times, the learner is too tired to focus properly. Happily, this situation is pretty straightforward. If the learner hopes to make meaningful progress, they must cut back on other activities enough to create a few hours of high-quality time on a regular basis. When exhaustion is the problem, the temptation is to somehow “figure out how not to be exhausted” during those late-night slots, but I’ve never seen that work. Learners are more successful when they secure permission from their workplace to study during the day, shift their schedule to study in the morning before work, or focus their study on weekends. There’s no point in talking about other more complex self-regulatory issues or study strategies until this problem is solved.
Contra “discipline”
After we sort out time, another issue often emerges in my conversations with adult learners. They’ve set aside time, but it’s not really working. They find themselves watching YouTube or doing email instead of reading the book they intended. Or they let themselves get pulled into some conversation or task around the house. Learners often express a lot of shame here. The sense is something like: “if only I were more disciplined!”
I think these learners usually make the wrong diagnosis when they prescribe themselves “more discipline”. But they’re right that something is wrong and needs to be fixed. Part of the trouble is that we use the word “discipline” in too many distinct ways. Let me get more specific.
Let’s start with a related classical virtue: temperance. Someone whose reason is ruled by ignoble appetites has a temperance problem. In the context of learning, the problem would be that the learner is so addicted to social media or Netflix that those appetites override their reason. In this case, yes, the learner needs to increase their control over their appetites, perhaps through some kind of corrective training. But I don’t think most of the learners I’ve talked to have this problem. The giveaway is that they don’t ordinarily lose hours to social media—it’s only when they sit down for a learning session that this problem flares up. Their appetites are usually in reasonable harmony.
For these adults, there’s something about independent learning, in particular, which makes their appetites more likely to dominate, or which makes them dissociate. Maybe what these learners need is more like grit—the tenacity to stick with a painful task? Classic stories of heroic grit might involve soldiers trudging through difficult terrain for days, or a single mother waking up before dawn every morning to finish a few hours of writing. In these stories, the task is necessarily painful, but the hero endures that pain in service of some higher cause. I don’t think this fits our adult learners very well. The same learner who’s struggling to focus on their own often has no real trouble when they’re in a course or study group.
I don’t think these learners need to become better at enduring necessarily painful tasks. Instead, we should ask why so much pain (or anticipated pain) surrounds independent learning for them. When I dig into that question, I find that learners usually have good reasons to avoid their studies:
Low confidence in plan. The learner lacks conviction that studying the materials they’ve chosen will actually help them with their real goal. Have they picked the right topics to focus on? The right texts? They fear they’re wasting their time.
Disconnect from authentic goal/interest. The learner is excited about working on brain-computer interfaces, but they’re mired in undergraduate calculus, and their excitement for BCIs doesn’t translate into excitement for epsilon–delta proofs. The material feels like an obstruction to be cleared before the learner can “get to the good stuff.”
Unclear progress. The learner shows up to study, but they can’t tell whether their efforts are producing any real results. Their goal is so far away that they can’t feel themselves getting any closer.
Sense of futility. The learner feels that they’re moving too slowly, that they’re not really understanding the material, that they’re not good enough. Feelings of incompetence, inadequacy, doom.
Lack of permission. The learner feels that they’re on borrowed time; they’re starting from a deficit. In some sense, they’re already supposed to know what they’re studying. This work “doesn’t count” as an achievement socially or professionally, and that’s a problem because the learner only feels secure when producing a steady stream of legible achievements.
“More discipline” isn’t the right antidote to these problems. If what you’re doing is useless, or futile, or meaningless, you shouldn’t persevere. In the last case—the feeling of insecurity—persevering might be a good idea, rationally, but railroading this felt threat will likely only produce more internal conflict over time.
A few healing themes
In all these cases, I think it’s worth taking the feeling seriously, and trying to address it. Why relate to independent learning as something to be endured through discipline and gritted teeth? Why set up a situation that depends on summoning willpower each time we sit down to study? The goal should be that we’re excited for the time we’ve blocked off on the calendar, this gift of curiosity and growth we’ve given to ourselves. I think it’s an attainable goal—though not necessarily an easy one.
Park ranger
Adult learners often struggle to find a good tutor for every topic they’d like to learn. Beyond undergraduate-level topics, the candidate pool shrinks considerably, and most technical experts aren’t eager to spend time tutoring on a regular basis. Most of them aren’t likely to be terribly good at it, either. But if we look at the list of problems above, or at the list of common elements in successful structures, I’m not sure tutoring is really what’s necessary.
A better metaphor might be the park ranger. A conversation with a park ranger can help you feel confident that you’re taking appropriate trails to get you where you’re going, taking into account your fitness and equipment. They can assure you that many other people in similar condition have happily made this hike without incident. If you tell the ranger that you’re particularly excited about waterfalls, they might be able to chain together a route which features them. You can ask why the trail takes that big ugly detour, and then feel better knowing that you’re avoiding a rockslide zone. The ranger can give you a map with their suggestions, and that’ll help you see your progress relative to the larger route.
The park ranger relationship demands much less from experts. These are questions you can ask in a short coffee chat, perhaps with a follow-up after a month or two. There’s no need to secure a weekly commitment. And this approach is more tolerant of an expert’s teaching inexperience. I’ve played this role to adult learners, connecting the dots between their authentic goal and learning materials, and my impression is that it’s been quite helpful, while demanding relatively little from me.
This concept relates to a proposal I made in “How Might We Learn”: that perhaps a large language model could offer this kind of guidance. I find that prospect quite exciting for more detailed information: not just suggesting which books to read, but which subsections, given the learners’ specific goals. But the confidence-building element is more emotional and relational—probably best left to humans. It seems to me that human “park rangers” are available in sufficient supply for higher-level guidance.
Recalibrating expectations
Many of the adult learners I’ve met are studying upper-level math or physics for the first time. These learners’ problems often stem at least in part from unrealistic expectations. First and foremost, they’re all experts in something, and they’re used to reading material in their own field. But for many of these learners, explanations in math and physics texts are just much more dense (especially in symbols and abstraction) than content in their own field.
So these learners often feel like they’re failing. They’re making progress too slowly, or else they try to read at their usual pace and then feel like they can’t understand anything. This creates a sense of futility and doom. But it’s an unnecessary sense of futility and doom. Modern information culture may make this worse: we’re constantly receiving the message that anything can be learned in a twenty-minute video or a short blog post. Slowing down doesn’t pose any practical problem. If the learner works steadily, they’ll make plenty of progress. But the sense of effort, of really grappling with each page, is often unfamiliar and scary.
Sheldon Axler gives an appropriate warning in his introduction to Linear Algebra Done Right:
You cannot read mathematics the way you read a novel. If you zip through a page in less than an hour, you are probably going too fast. When you encounter the phrase “as you should verify”, you should indeed do the verification, which will usually require some writing on your part. When steps are left out, you need to supply the missing pieces. You should ponder and internalize each definition. For each theorem, you should seek examples to show why each hypothesis is necessary. Discussions with other students should help.
Unfortunately, I’ve met several people who have been frustrated by their slow progress in this very book. None of them could recall this bit from the introduction. I’m not sure it would have been enough if they had. At least here in San Francisco, the urge to speed-run everything is strongly embedded in the culture. If something can’t be speed-run, that’s often taken as a sign that it’s not the right thing to do. Courses and study groups do seem to counteract this effect, since they create common social knowledge that everyone is moving more slowly.
It’s worth noting that the medium of the book isn’t helping matters here. In the quote above, Axler lists all kinds of actions that the learner is supposed to take—on the side, outside the boundaries of the book, orchestrated by themselves. Surely the book could do more to make this methodical behavior feel like the natural default, perhaps through interactions which support comprehension. Books could also do more to help learners viscerally feel their progress, for instance by surfacing some representation of their accumulated learning, and by projecting when they’ll reach key milestones, as Math Academy does.
Enjoying the process
I’m talking to adult learners who are driven by a real goal: they’re studying in support of a project or question that’s really important to them. Their relationship to the material is totally different from that of a typical student in school, studying to check a box on some dreary form or to avoid punishment.
But the fixation on that authentic purpose can be so strong that many of these adult learners don’t allow themselves to sink into the material along the way. All the prerequisites are just “in the way” of the project they want to do. If it emerges that a new topic must be added to the queue, that’s a new imposition added. I can understand this sentiment, of course, but it does mean that the learner is condemning themselves to countless hours trudging through obstacles.
Alternatively: once a learner sees that a topic will be necessary to their project, they could adopt a stance of cultivating curiosity about it. If the material’s not just an obstruction between the learner and their goals, its natural beauty can start to enchant. Concretely, I find it helps to start with an engaging video explanation of the topic, just to stoke enthusiasm, before diving into the denser materials which will actually build understanding. It also often helps people to talk about what they’re learning and what they’re wondering with peers, or on social media. Per Agnes Callard, “Twitter is a way to care about anything.”
Another approach, which I proposed in “How Might We Learn?”, would enrich the prerequisite texts with meaningful connections to the learner’s project. I’ve done this for students in conversation, and it seems to have helped a great deal. I suspect today’s large language models could go a long way here, though I haven’t experimented there yet.
For some learners, I think a real barrier here is a sense of permission. These people are quite curious by temperament. They’d love to really immerse themselves in the prerequisite topics. But they don’t feel like they’re allowed to enjoy the learning process. They have to start producing value; everything else is a frivolous luxury. This is a tough stance to untangle.
For some learners, the belief may be true. Their startup’s runway may really be quite short. In this case, I often suggest a combined approach: scrappily dash through whatever’s needed, just-in-time, while slowly and steadily backfilling a strong foundation for future projects.
But quite often, the belief isn’t actually true—it just feels true. One common story here is overriding concern for others’ perceptions of the learner and their productivity. All I can really suggest is that it’s a topic worth exploring with one’s therapist. Is there anything for a designer to do here? I’m not sure.
————————
If you’re an independent adult learner, studying material in support of something that really matters to you, I’d be curious to hear about your experiences with this issue.
————————
Thanks to Alec Resnick and Sara LaHue for helpful discussions, and to the many adult learners who have shared their adventures with me.
2024-08-31 20:19:58 +0000 UTC
Last month, I published an essay that I'd started writing five years earlier. When I began it in 2019, I knew I didn't understand the topic nearly well enough to produce the work I wanted. I still wanted to make progress, though—to articulate and refine what little I could, as a stepping stone.
So, largely inspired by Niklas Luhmann’s Zettelkasten, I started experimenting with note-writing methods. I don’t care about notes for the sake of notes: my aim is a powerful context for thinking. I wanted to create an environment where I could gather my ideas over time, where those scraps could accumulate and evolve with my thinking on the topic, until I could synthesize it into some meaningful whole.
The usual approach to notes doesn’t work very well for this kind of accretion. Notes are typically an ever-unfurling scroll of scratchwork, with new material appended to the end each day. Promising nuggets of insight are too often buried by the next day’s stream of consciousness, rather than being iteratively refined into crisp ideas.
In my approach, I still keep a daily notebook with lots of scratchwork. But when I stumble on something juicy, I try to write a sharp note about that one claim, question, or idea. (You can read some examples here.) The aim is to make each note clear enough to stand on its own, but I fill them with links to other small notes, so I can move around quickly and see how my ideas relate. Each note is aspirationally an “evergreen” entry on its topic, so that I can make progress by accumulating and improving the notes over time, individually and as a network.
Five years later, my recent essay represents a kind of success for this approach—but it also demonstrates some important failures. Now seems like a good time to reflect on my experiments in note-writing environments.
Divergence and maintenance in notes
Why “evergreen” notes? The (aspirational) idea is that each note in your library reflects your current best thinking about that issue. So, as you develop your ideas on a topic over months or years, you can draw upon, and extend, strong foundations. And because the notes are densely linked, when you improve one note, there’s a sense in which you improve all the nearby notes in the network.
The downside of all this is a maintenance burden and a kind of “conceptual debt”. It reminds me of a trope in software engineering. Novice engineers often get excited about how many lines of code they’ve managed to write, how big their projects have gotten. But experienced engineers often view code as more liability than asset. They’d rather have less of it, because they know that every line contains more moving parts to maintain, more pieces which can break. Over time, one part of the system typically evolves in a way which makes another part inconsistent or outdated—creating “technical debt”.
This May, I felt I had finally accumulated enough understanding to write the essay I wanted to write. My work on “How might we learn?” had given me some new perspective on old problems, and I had a few new ideas ready to articulate. But I didn’t write evergreen notes on any of that new material. I felt ready to dive straight into the essay; writing notes would be an unnecessary detour. So I developed that new material directly in the manuscript.
Having now written that essay, I still have many dozens of notes on this topic. Many of them no longer reflect my most recent thinking: as usual, my views shifted and sharpened while I wrote the essay. Some of those old notes actually seem wrong to me now. And perhaps a third of the ideas in the essay are new, not captured in notes at all.
So, my notes on this topic now seem like more liability than asset. I could update them, of course, but I have no appetite for that right now. It would feel like bureaucratic busywork. The old notes have an air of staleness to them which makes me unenthusiastic about revisiting them. At the same time, I don’t like the idea of just throwing them out: I’m sure I’ll return to this topic someday, and there’s plenty of material in the notes which didn’t make it into the essay.
In a sense, the system served its purpose—helping me prepare, over years, to write this essay. But now my note library has an unmaintained corner which subtly pollutes the experience of using it.
What to do about this? One approach would be immediate visual indications of staleness, like yellowed old paper. It’s tempting to imagine that an AI system could help me automatically maintain these old notes. Unfortunately, that AI would need information that’s hidden inside my head. For instance, if I no longer find an old note convincing, I won’t necessarily denounce it anywhere visible to an AI: I may simply not mention it in recent work. But not all unmentioned ideas are bad—many just didn’t fit into the themes I was discussing.
Notes as scaffolding
Developing new ideas is hard. There are often too many moving pieces to juggle, too many interlocking problems to solve simultaneously. The miracle of writing is that it reduces the number of threads you have to hold in your head simultaneously, because the paper can do some of that work.
Unfortunately, writing is also hard. Note-writing acts as a kind of scaffolding. The environment says: you don’t need to write a ten-thousand word essay. You don’t need to make everything fit into a clear narrative. You don’t need to decide yet what’s important and what’s not. Just focus on that one idea, and express it as clearly as you can. Progress is much more tractable when it comes in freestanding chunks of a few hundred words. Scaffolding like this is precious when tackling tasks beyond your abilities.
Shortly after I started my note-writing practice, I began publishing monthly essays for my patrons. This created another context for writing and thinking, with higher stakes and higher fidelity than my working notes, but still much less demanding than my more serious major articles. Over the past five years, I’ve published well over 150,000 words in these monthly patron essays. And through all those words, I’ve become a much more confident, capable writer. When I started, I needed scaffolding to develop ideas beyond a limited scope. Now I routinely develop new ideas directly in the context of an essay I’m writing. Preparatory notes often feel less necessary. And if I can develop the ideas directly in an essay, I certainly want to do so: the work will generally be more lively, immediate, true.
Like all creative tools, my note-writing practice imposes both costs and benefits. As I become a more confident writer, its marginal benefits do seem to have declined. And as I accumulate more notes—more to link and maintain—I do feel the cost rising somewhat. Perhaps I’ll look back on this note-writing system as a “training” exoskeleton of sorts, one which I’ll someday have fully outgrown. I’ll still be grateful to have had it as a stepping stone.
Notes as public artifacts
In early 2020, basically on a whim, I started sharing my working notes publicly. My original motivation was that in Twitter and chat conversations, I’d often find myself saying “oh, I’ve written a note about that issue”. I wanted to be able to share pointers to individual notes as a way to quickly enrich discussion.
Much to my surprise, this practice ended up creating many more new discussions. People wander my notes, stumble onto something they find interesting, then write to me (or their friends) about it. That network effect has created a lot of new connections and opportunities for me.
One problem with my note-writing practice is that it often lacks a strong emotional context. When I’m squirreling away material for some abstract unknown future, rather than responding to the needs of some specific live project, the experience can feel dull and lifeless. Worse: it can feel like a duty—something I “should” do, rather than something I’m excited to do. But if I know that others will read even my working notes, before they’re subsumed into some more coherent project, that can create a meaningful creative context. That context becomes even stronger when I have particular people or conversations in mind.
The social context can also create some short-term rewards, which I certainly appreciate when my projects stretch on for years. It’s nice to receive warm emails about my notes. In one astounding instance, a college student introduced herself at an event and told me that she’s in a group chat of students at her university that started to discuss my ongoing work, including my notes.
All that said, a socially-motivated context—especially a totally public one—isn’t ideal for rough working notes. If I’m grappling with a fragile idea, that’s already hard enough. I don’t want to be simultaneously modeling and second-guessing what my hypothetical reader will think. That’s its own creative challenge, one better addressed when I have a stronger handle on the material myself. And I don’t want to reward myself too much for working notes. I want to orient myself around producing major essays and projects; the notes are just stepping stones to get there. So, practically speaking, when writing my working notes, I don’t draw much on the social-emotional connection. I pretend the notes are just for me, unless the idea is something I already understand pretty well.
The public availability of my notes makes staleness a bigger problem. Over the years, lots of people read my preparatory notes for last month’s essay and wrote to me about them. Those notes are still out there. Probably lots more people will still read them, even though many of the notes no longer reflect my views. This feels like a double bind: if I don’t update those old notes, they’ll subtly pollute the information environment, and people won’t get the benefit of my best work; if I do update them, I’m signing myself up for chores. One simple half-solution would be to stick a notice banner across all the relevant notes. (Ironically, my notes about note-writing need such a banner.) Unsatisfying.
Notes and spaced repetition
This unorthodox note-writing practice lives alongside my unorthodox spaced repetition practice. It’s been interesting to observe how these two practices have coevolved. Each practice creates a powerful context for honing my understanding, and each aims to combat the tides of forgetting as I work over years. Each practice also has an associated medium—the evergreen note and the spaced repetition prompt—but those two mediums have very different grains.
Evergreen notes are prose, and even though I try to make each note focused on a single idea, they’re better suited for longer, more holistic discussions. Spaced repetition prompts are more suited to laser-focusing on a single detail. A collection of prompts can dissect a complex system, but only once I understand it quite well—and sometimes that understanding can only come through prose writing. The two practices are focused on different outcomes, too: the evergreen notes make ideas available in prose for future writing and thinking; spaced repetition prompts make ideas available in memory for those future tasks. The latter is more ideal—but it’s more costly, and doesn’t work as well when the ideas aren’t so precise.
One useful guiding heuristic has emerged: I write evergreen notes mostly about my own ideas, or to develop the relationship between others’ ideas and my own; I write spaced repetition prompts mostly to internalize others’ ideas, or knowledge about the world.
Of course, the lines here are blurred. When I sit down with a book, I’ll often begin by ensuring I understand what the author means; then I’ll find myself making connections to my own ideas, shifting away from what the author intends for a while. This isn’t a one-way movement: the two modes will often interleave. To support fluidly switching stances, I’ve integrated spaced repetition prompts into my notes, so that I can jot a handful of prompts alongside paragraphs of extended prose.
The integrated note/prompt surface makes my coexistent practices a bit more natural, but I notice there’s still a significant amount of duplicated work. I’ll often use prose writing to figure out exactly what I think an author means by a certain line of argument, then I’ll turn that conclusion into a few spaced repetition prompts. In many cases, this is basically just busywork: writing the prompts is a matter of translating my prose sentences into “prompt-speak”, almost a kind of style transfer. The resulting duplicative text adds line noise to my notes and makes them less usable. Observations like this make me enthusiastic about an “idea-centric” memory system in which items are prose insights-in-context, and practice tasks are adaptively generated from those highlights.
One last collision between note-writing systems and spaced repetition systems is the possibility of somehow placing the notes themselves on a spaced repetition schedule. I was quite enthusiastic about this a few years ago, but I’ve tried many permutations of this idea without compelling results. For instance, perhaps my environment should surface random notes from the past each day, to foster serendipity. Or perhaps, if I have some spark of an idea but I don’t know what to do with it, I should make it reappear a month later. On a couple occasions this has led to generative writing sessions, but the rest of the time it’s just seemed like noise. The key issue here seems to be emotional connection: I get interested in a topic, so I write a note… but when it’s algorithmically resurfaced, I’m probably thinking about something else, and that old note doesn’t seem meaningful. I’ll often find myself returning again and again to old topics, but that rekindled interest doesn’t happen on command.
Future prospects
I’m somewhat baffled by most of the suggestions I’ve seen for new note-writing tools, particularly in the recent age of LLMs. I don’t need my writing assistant to polish the prose in my working notes: I’ll develop those ideas further through the process of writing them into a coherent whole. Basic issues of language, grammar, and syntax are not bottlenecks for me. I don’t need related notes to be automatically resurfaced: search and links have worked fine for me, and I rarely notice myself “losing” notes. I don’t need meeting or book summary generation in my notes: writing a précis is an important part of how I solidify my understanding. I don’t need help shifting the tone of my language around. And so on.
So, where might I benefit from help?
One angle would be to avoid writing these “evergreen notes” at all. If I could develop ideas directly in stream-of-consciousness scratchwork, but they could still meaningfully accumulate over time, maybe I could avoid the overhead involved in creating and maintaining “clean” notes. Latticework gestures in this direction, but I think a tool specialized for this purpose could go further. Ideally, I could ramble about the topic in an unstructured format, but with quick and legible handles back to the “good stuff”. I’d like to be able to incrementally refine those rough ideas over time, without the ceremony of “creating and naming a note”.
Another angle is to notice that I get stuck a lot when writing. Often the approach I’m taking isn’t working, but I don’t have the meta-cognitive wherewithal to step back and choose a different strategy. Worse, I often fail to even notice that I’m in this situation when I’m in it. In the moment, it just feels like aversion and dissatisfaction. What’s interesting here is that if I complain about my stuckness to creative peers, the conversation almost always leads me out of the swamp, usually through utterly familiar suggestions. Is it possible to create a writing assistant which can do something similar, and which could even notice when I’m stuck in this way—without it feeling like Clippy?
While I’m not excited about AI-generated suggestions for connections within my notes, I’m quite interested in suggestions for outside resources I should consult. I notice an enormous advantage that my academic friends have: their environment is full of walking bibliographies. They go to lunch with some other faculty or students, mention what they’re thinking about, and their counterpart says: “Oh, have you seen…?” It’s hard to search for something when you don’t know it exists, or that you should search for it. I’d be grateful for a tool which watches my reading and writing, notices my mind’s grasping frontier, and surfaces keen suggestions.
2024-08-01 00:20:15 +0000 UTC
Longtime patrons may remember the research fellowship I launched at the start of 2023. I spent a year working with the winner, Matthew Siu, on a new sensemaking tool. Today I'd like to share a preview of our report on that work!
✨ Read report preview ✨
Comments and feedback are welcome.
I've excerpted the introduction below, so you can more easily decide whether you'd like to click through:
Introduction
You’re in the middle of a large project, and you’re stuck. You’re staring at a pile of long, unstructured documents (journals, lab notebooks, meeting notes) with a sense that the clues you need are buried somewhere within. But there’s simply too much to keep in your head as you scroll. Instead, you hatch a plan. If you could find the important snippets and gather them into one place, it would be easier to see everything at once, to notice patterns, and to figure out what to do next.
So you create a new working document, and as you read through your unstructured notes, you copy and paste key passages into it. But this process is cumbersome and disorienting: you’re constantly moving both your attention and your cursor between these documents, deciding where to put each new snippet, losing your place in your unstructured notes. The snippets in your working document are isolated from their original context, so it can be hard to remember what each means and why it matters. And as snippets pile up, your fresh document can easily produce the same kind of overwhelm you were trying to escape.
If you’d been working with books or academic papers, rather than your own notes, you might have used annotation tools instead. It’s easy to enter a satisfying flow with these tools. When some phrase strikes you as important, a highlighter lets you capture that reaction immediately, without shifting your focus. Likewise, marginal comments let you record observations as they occur to you, in context. The trouble with these tools is that your annotations end up trapped in the pages of the original documents. But when you’re trying to make sense of a confusing situation, you need to get everything into one place, where you can see, rearrange, and elaborate the pieces into a new whole.
Annotation tools feel great to use, but they don’t support the follow-up thinking you need to do. Text editors give you a flexible canvas for making sense of snippets, but their design is often cumbersome and disorienting when used in this way. If you could move fluidly between these tools, you could use each where it excels and, perhaps, get the best of both worlds. In this paper, we present Latticework, a system which unifies annotation with freeform text editing, in the context of personal knowledge management tools.
2024-06-30 23:58:02 +0000 UTC
Now publicly available here.
If you want to make an educational technologist’s eyes sparkle, just mention “The Young Lady’s Illustrated Primer”. It’s a futuristic interactive schoolbook, described in Neal Stephenson’s The Diamond Age, where it lifts a young girl out of poverty and into sovereign power. It’s my field’s most canonical vision of a wildly powerful learning environment. If you ask a technologist interested in learning what they dream of achieving, most will answer: “building the Primer.”
Fifteen years ago, I’d have given the same answer. With my weak skills and understanding, the Primer was far enough away that I couldn’t see its details properly, but it was still enchanting enough to drive me emotionally. As my practice grew, and as I earnestly considered what it would mean to build the Primer, I started to notice the vision’s serious flaws. Gradually, I came to see it as fundamentally unworkable, even while it still deeply compelled me. Now I feel haunted by the Primer. I know it’s not what I want to build, but some part of my mind won’t let go of that vision until it has something else it can grab onto.
In fact, I think my whole field is haunted by the Primer. That’s not Stephenson’s fault: it’s ours. Our shared canonical vision remains a plot device from a science fiction novel because we haven’t managed to articulate something better ourselves.
I want to exorcise myself of the Primer. I want to clearly delineate what makes its vision so compelling—what I want to carry in my heart as a creative fuel. But I also want to sharply clarify the lessons we shouldn’t take from the Primer, and what it simply ignores. Then I want to reconstitute all that into something new, a vision I can use to drive my work forward.
(Audience note: I’m writing this essay for technologists entranced by the Primer’s vision, so I assume familiarity with The Diamond Age. If you haven’t read the book, don’t expect to follow this discussion in detail.)
What I want to take from the Primer
In Nell’s adventures with the Primer, I recognize something precious and scarce from my own life. Her interactions with the book remind me of my most rewarding, highest-growth learning experiences—except that the Primer produces those rare experiences basically all the time, for every topic. That’s my central connection to the book. The Diamond Age compels me because it depicts a girl who spends most of her life in those states I cherish.
With this framing in mind, I think the right approach to extracting ideas from the Primer is a selfish one. When I think about “building the Primer”, I’m not thinking about an educational intervention, a schoolbook to help children learn a curriculum; I’m not thinking about its morality tales. I’m thinking about what I want, as a capable, curious adult. I want to use its ideas to invent enabling environments for myself—to help me better think, learn, and create.
Immersion
My most cherished learning experiences have involved diving into a topic, trying things, getting my hands dirty, living and breathing it. In the real world, this kind of learning is somewhat rare. Many topics just seem too abstract. Or, when I try to “just dive in”, I’ll struggle to find an effective way to engage. But this is mostly how we see Nell learn: through immersive, hands-on action.
The main way the Primer makes this possible is through dynamic media. Its interactive representations and simulations expand the range of topics where immersion is possible. When it wants to introduce a conceptual topic, it begins with concrete hands-on projects: Turing machines, microeconomics, and mitosis are presented through binary-coding iron chains, the cipher’s market, and Nell’s carrot garden. Then the Primer introduces extra explanation just-in-time, as necessary.
That’s not how learning usually works in these domains. Abstract topics often demand that we start with some necessary theoretical background; only then can we deeply engage with examples and applications. With the Primer, though, Nell consistently begins each concept by exploring concrete instances with real meaning to her. Then, once she’s built a personal connection and some intuition, she moves into abstraction, developing a fuller theoretical grasp through the Primer’s embedded books.
We needn’t (and shouldn’t!) think of these dynamic media from a purely didactic perspective. The Primer’s nanomolecular microscope helps Nell dip her toes into cellular biology, but it’s also a tool which makes expert biologists much more capable. It’s not a toy representation, a cognitive dead end which Nell must discard as soon as she builds an intuition. It’s a tool which can grow with her into legitimate practice, a tool which in fact expands the frontiers of practice for the entire field. My collaborator Michael Nielsen has long argued that this is true of all our most powerful representations. If you make experts more capable, similar ideas will often also help novices; but if you focus on educational use, you’re unlikely to transform real work in the field. Mathematica is a great modern example of this: it was invented to support frontier research in cellular automata; happily, it also allows novices to more easily build intuition for complex mathematical ideas.
Responsiveness
Have you ever stumbled on a stairwell or lobby which felt acoustically spring-loaded—where when you hum or clap, you feel like you’re plucking an instrument, and the whole room vibrates back at you? That’s what my favorite intellectual and creative experiences have felt like. My every move is absorbed, reflected, amplified back at me through the work or the people around me. The whole situation is alive, responsive. It’s a rare and precious feeling, and the Primer offers Nell something like it all the time.
This responsiveness rewards curiosity. It’s not just that the Primer patiently answers Nell’s endless childlike “why?” questions. When she expresses interest in anything, the book finds tractable ways for her to start meaningfully engaging. Then it helps her go deeper over time, as in this passage:
In the plot, Nell planted some carrots, thinking about her friend Peter who had vanished so long ago, and in the window boxes she planted some geraniums. The Primer taught her how to do it and also reminded her to dig up a carrot sprout every few days and examine it so that she could learn how they grew. Nell learned that if she held the Primer above the carrot and stared at a certain page, it would turn into a magic illustration that would grow larger and larger until she could see the tiny little fibers that grew out of the roots, and the one-celled organisms clinging to the fibers, and the mitochondria inside them. The same trick worked on anything, and she spent many days examining flies' eyes, bread mold, and blood cells that she got out of her own body by pricking her finger.
Responsiveness also manifests through rich feedback, which the Primer provides continuously and often immediately. When learning somersaults, for instance, the Primer shows Nell a recording of her movements and points out where her form could be improved. But it does the same trick with abstract topics, too: Nell learns Boolean logic by manipulating waterways, and punchcard programming by manipulating an organ. The Primer chooses (or invents) representations which offer live interaction and feedback, minimizing the distance between idea and action, between action and mental update. It’s true that immediate feedback can sometimes subdue reflective contemplation, but I generally find I want more of it than I can easily get in most topics.
Assurance
My highest-growth experiences weren’t easy or stress-free. They were often quite demanding. But critically, in these periods, I felt my hard work paying off. I had confidence that if I kept pushing, I could reach whatever I was chasing. Likewise, part of what makes memory systems so rewarding is a sense of total assurance: if I want to remember a fact, I can add it to my memory system and be quite sure that I’ll retain it indefinitely. That certainty drives a positive feedback loop: learning works, which feels good, so I’m eager to learn more, and so on.
The Primer creates this kind of assurance for Nell. It lets her struggle—she’s often depicted attempting a puzzle many times—but it never lets her fail. She grows to trust that she can handle anything the book throws at her, because she never encounters any topic she can’t eventually learn. My memory system makes me feel that way about remembering things; I’d love to feel that kind of certainty about learning any topic or skill at all, as Nell does.
One way the Primer achieves this feeling is through dynamic scaffolding. In my highest-growth experiences, I’ve managed to find a perfect route through the difficult terrain I’m traversing. At every moment, I have the support I need to make progress, but not so much that what I’m doing feels boring or fake. And as I become more skilled, that support smoothly fades away, so that my frontier of mastery keeps expanding. These routes are rare. Much of the time, when I “just dive in”, I end up diving into a wall, or stuck in trivialities.
But the Primer provides and adjusts this kind of scaffolding for everything Nell learns. Sometimes it does that over a span of years. For instance, it initially reads aloud to her, then supports her as she sounds out unfamiliar words, then leaves her to read fine print silently. For some topics, the Primer scaffolds over days and weeks, as in the Castle Turing sequence in which Nell progresses from unary alphabetic codes to Turing machines. The Primer even provides dynamic scaffolding within individual sessions, as the interventions in this passage illustrate (my comments in brackets):
Harv and Nell were trying to build a fire. There was a pile of wet logs Harv had chopped up. Harv also had a rock, which he was striking against the butt of a knife. Sparks flew out and were swallowed up by the wet logs.
“You start the fire, Nell,” Harv said, and left her alone.
Then the picture stopped moving, and Nell realized, after a few minutes, that it was fully ractive now. She picked up the rock and the knife and began to whack them together. … Sparks flew, but there was no fire.
She kept at it for a while, getting more and more frustrated, until tears came to her eyes. But then one of the sparks went awry and landed in some dry grass. A little curl of smoke rose up and died out. [This was not an accident, but a scaffolded hint provided in response to her frustration.]
She experimented a bit and learned that dry yellow grass worked better than green grass. Still, the fire never lasted for more than a few seconds.
A gust of wind came up and blew a few dry leaves in her direction. [Again: not an accident, but a response to repeated failure.] She learned that the fire could spread from dry grass to leaves. The stem of a leaf was basically a small dry twig, so that gave her the idea to explore a little grove of trees and look for some twigs. The grove was densely overgrown, but she found what she was looking for beneath an old dead bush.
“Good!” Harv said, when he came back and found her approaching with an armload of small dry sticks. [Note that Nell finding the twigs triggers his return.] … Soon they had built up a roaring bonfire.
Emphasis on the emotional
My highest-growth experiences are all driven by obsession: I’ll find an idea utterly beautiful, or I’ll fall in love with a community, or I’ll feel an almost righteous need to realize some captivating vision. It’s not a cold utilitarian calculus; it’s a hot emotional fuel. It can be hard to stoke these fires, though, and easy to accidentally lose the spark if I make the wrong moves. One reason the Primer’s vision sticks with me is that it emphasizes the emotional.
The Primer's full name is The Young Lady’s Illustrated Primer. It recognizes that beautiful imagery can enchant. The Primer isn’t an encyclopedia; instead, it uses mythic stories, vivid characters, and immersive environments to create and maintain emotional connection. And the dialogue is performed by a professional who devotes her life to expressing care for Nell through her voice.
As much as emotions can draw me in, they can also hold me back. The biggest bottlenecks to intellectual and creative work are often emotional, and the Primer prepares Nell for that. It meters out setbacks and new challenges strategically. Its scenarios are designed to create resilience, self-efficacy, and independence, as much as to help Nell learn specific topic knowledge.
Another important emotional move: the Primer places Nell in a fundamentally active stance, framed as the author of her experiences. There’s a built-in narrative structure, but Nell’s choices and actions turn each page. It feels to her as though she’s causing the story, not passively consuming it. My own highest-growth experiences all have this authorial emotional texture.
The Primer also recognizes that learning is an act of identity construction. When I learn about a topic, I’m not just learning facts and abstractions. I’m becoming, at least partially, the kind of person who practices that discipline. I’m taking on some of that domain’s values and perspectives. I’m changing the way I view myself and the world.
Much of what the Primer does emotionally is too patronizing and manipulative—more on that soon. But the main lesson I want to take here is that it’s right to care enormously about these issues. Anyone looking to extend human cognition must engage just as deeply with emotion.
Wrong lessons to take from the Primer
The Diamond Age is a novel, not a research paper, so the Primer is presented as a single whole, rather than a list of system design properties. It’s too easy, then, to treat that vision as a single whole when saying “I want to build the Primer”—there are no other handles to grab onto. This is a mistake. The concept has serious and foundational flaws. So I want to be very clear about the lessons I don’t want to take from the Primer.
Authoritarianism
The Primer has an agenda. It is designed to instill a set of values and ideas, and while it’s supportive of Nell’s curiosities, those are “side quests” to its central structure. Each of the twelve “Lands Beyond” focuses on different topics, but they’re not specific to Nell, and Nell didn’t choose them. In fact, Nell doesn’t even know the Primer’s goals for her—she’s never told. Its goals are its own privileged secret. Nell is manipulated so completely by the Primer, for so much of her life, that it’s hard to determine whether she has meaningful goals or values, other than those the Primer’s creators have deemed “good for her”.
The Primer is built on a foundation of what Ivan Illich has called “our pedagogical hubris”: that is, “our belief that man can do what God cannot, namely, manipulate others for their own salvation.” Its design is not only patronizing and infantilizing, but (I believe) immoral. I wouldn’t want to be manipulated in this way, so why would I want to do this to others?
Ideology aside, there’s a central contradiction here. The Primer’s main goal isn’t to teach any specific domain knowledge. Finkle-McGraw commissioned it to develop creative, independent thinkers. The problem here is that Nell spends her entire intellectual life thinking about what the Primer tells her to think about. We never see her embark on a substantial creative project with her own initiative or responsibility. I simply don’t believe this can work. If you spend your whole life solving well-defined problems with well-defined solutions, you’re unlikely to suddenly pose interesting problems or find startling solutions.
These problems force us to reject much of the Primer’s structure. Its central spine, the story of Princess Nell, relies on Nell’s essential passivity. The same is true for the way that the Primer covertly weaves lessons into the adventure, so that Nell learns important skills without realizing she’s learning. Control must be inverted, so that while guidance and support are offered, learners have ultimate responsibility for their own agenda.
Isolation
When I ask people about their most rewarding high-growth experiences, they usually tell me about times when they were immersed in something that really mattered to them, like a startup, or an art project, or a competition. Lots of learning happened, but learning wasn’t the point: it was subsidiary to some other meaningful purpose, often pursued in community with others. By contrast, Nell is immersed in a fantasy world where the primary aim is learning. It’s isolated from any external meaningful purpose: Nell doesn’t have any, as we’ve discussed. She’s collecting trophies (keys), waiting for her real life to begin.
Worse, the Primer is also isolated from other real people. Its characters exist only to manipulate or to be manipulated by Nell. Miranda, her ractor, reads from a script and can’t even see Nell’s responses. Insofar as The Diamond Age comments on AI as a substitute for human connection, it suggests an enormous difference between Nell’s devoted experience with Miranda and the Mouse Army’s experiences with an AI-generated voice. But that misses the more important point: in both cases, the relationship is fully scripted by an AI designed for manipulation. Differences in voice acting hardly seem to matter. Outside the Primer’s world and at school, the girls do make real personal connections. But the book doesn’t involve itself in any of their authentic social interactions, other than some protective interventions in Nell’s youth.
The problems with both kinds of isolation become vivid when I think about the kind of enabling environment I want for myself, rather than what I think might be “good for” others. I’m suspicious of isolated learning—learning for learning’s sake. I most enjoy learning when it’s part of some larger meaningful activity, and when it helps connect me more deeply with other people I enjoy. That doesn’t mean I think learning needs to be utilitarian or instrumental. The passionate pursuit of a curiosity can be a larger meaningful activity, and it’s one I most enjoy pursuing with or for others. When I study philosophy, for instance, the material becomes meaningful when I debate it with others or when I bring the ideas into some vivid question in my life.
As with the Primer’s authoritarianism, we can’t change this without inverting many of the book’s central structures. Instead of isolating people within a virtual world, we must weave support into life as people participate ever more richly in the real world.
Gamification
Many of my most rewarding high-growth experiences were tremendously fun, even as they were stressful and challenging. But (particularly in school) learning is often an unpleasant grind. Many technologists have wondered: why can’t learning be fun all the time? Wonderful games like Portal and The Witness do reliably cause players to learn complex ideas without explicit instruction—so couldn’t we teach everything through games? For that matter, could we make a game so fun that people will play it without even realizing that they’re learning?
Apart from the issues of authoritarianism and isolation we’ve already discussed, I believe this aspiration cannot succeed for most topics. This perspective doesn’t take seriously how games work as a form. Advocates of this approach are satisfied with making mediocre games: games which can’t compete on their own merits as games, games which are doomed to fail without coercion.
Games are designed first and foremost to be fun—or beautiful, or engrossing, or exhilarating. Games are an aesthetic medium, and (generally speaking) they compel our participation insofar as they compel us aesthetically. It’s true that in some games, players end up developing certain skills or understandings along the way. But that doesn’t mean we can make a great game that teaches anything. You’re seeing the survivors. These games’ designers tried and discarded dozens of gameplay ideas in search of something aesthetically compelling. Then, only after they’d satisfied the primary constraint of making something fun, or beautiful, or whatever, the designers figured out how to ensure people would learn what they need as they play. Most mechanisms are not fun. Good games come from a demanding selection process which works the other way around: first, find the fun. There’s no reason at all to believe that for any arbitrary abstract topic, one can always “find the fun” which implicitly teaches it.
Even if you could make an effective educational game for any topic, you’d be competing along the wrong axis. Educational technologists often aspire to make games which are so fun that people will play them despite their educational content, or maybe even without awareness of their educational content. But if you remove coercion from the picture, and we’re imagining that people will choose to play whatever they find most fun, that means you’re in competition to out-fun Fortnite, or League of Legends, or Stardew Valley, or whatever—games exquisitely optimized to be as viscerally enjoyable as possible. This is the Primer’s strategy, and it’s only plausible because Nell’s life is otherwise so utterly impoverished of meaningful activity.
What if you don’t compete on pure fun, but on some other aesthetic value? After all, intellectual art house films and mindless Hollywood blockbusters coexist in the market. Rather than “most viscerally fun”, you could create interactive learning environments which compete to be the most profound, or awe-inspiring, or joyfully perplexing. But even in that framing, you’re still saddled with a demanding extra constraint, one that your competitors don’t have. You need to produce an irresistible aesthetic experience while also ensuring that the player thoroughly learns some disciplinary topic. Sophisticated films and novels have carved out a successful niche, but how many of these works also manage to teach an abstract topic? It’s hard enough to make a profound aesthetic experience, and hard enough to convey an abstract idea thoroughly. Doing both simultaneously requires rare genius, no matter the aesthetic axis.
I’m absolutely not arguing that learning can’t be fun or profound. In fact, my most cherished learning experiences were more engaging than my favorite games. The difference is that those experiences weren’t structured first and foremost around an aesthetic goal—or around learning, for that matter. Those experiences were primarily about doing something that I really cared about: creating something, participating in a community, answering a question, helping a friend. Learning and aesthetic pleasures came as a happy byproduct.
I’m also not arguing that these experiences can’t be explicitly designed, or scaled to many people. Y Combinator, the startup accelerator program, uses a huge amount of authored structure to convey a set of ideas and values: talks, dinners, office hours, rituals, feedback, deadlines, and so on. This program isn’t a game; these people aren’t “playing founder”. But for most participants, these structured activities are incredibly engaging and rewarding. I claim that’s because YC’s structure is about engaging more deeply and capably in something that matters enormously to these founders.
There are many important lessons to take from games as we try to transcend the Primer. It’s possible to design interactive environments which help people understand complex ideas through simulation and realtime feedback. It’s right to care enormously about motivation and aesthetic experience. Scaffolding and dynamic support can help people reach further. Designed environments can support social connection and community participation. And so on. But I believe the central task here is to draw on all these lessons as we construct a new medium, one centered on supporting meaningful action, rather than on producing aesthetic experiences.
Discovery learning is all you need
In some of the Primer’s most vivid passages, Nell learns not by being instructed, but through inquiry and experimentation in a highly structured environment. That’s how she learned to start a fire in a passage I quoted earlier. When many technologists talk about “building the Primer”, they paint a world in which we learn primarily or exclusively through that kind of minimally guided exploration.
I understand the appeal. Inquiry learning implies hands-on immersion, rather than dull lectures. And those moments of discovery can be quite thrilling. It’s also true that you’re more likely to remember insights you produced yourself. Perhaps you’ll even understand the idea more thoroughly, since you had to construct all the pieces which produced it. Still: while I’m quite fond of discovery learning, we shouldn’t conclude that it’s all we need.
Outside The Diamond Age, this issue has been debated at great length in the educational psychology literature (see e.g. Lee and Anderson’s 2013 review or Kirschner et al.’s infamous 2006 critique). I’ll quickly summarize a few of the key issues.
Discovery learning methods often ignore learners’ extremely limited working memory capacity. In many experiments, we’ve found that when people solve a problem by straining their cognitive resources to their limits, they aren’t left with enough capacity to actually learn from the problem.
Likewise, these methods often fail to consider the realities of long-term memory. We forget new knowledge in unfamiliar topics quite quickly unless it’s reinforced through subsequent retrieval. And if we’d like learners to apply new skills or to recognize new patterns fluently, they’ll need practice, or some other intervention which will let them consolidate those memories for automatic use.
Discovery learning emphasizes the concrete. That’s great for intuition-building and engagement. But much domain knowledge relies on abstraction, and it’s often not clear how to introduce that kind of understanding without explanation. When material is introduced without abstraction, we often find that students struggle to transfer what they’ve learned from one concrete case to another. Relatedly, concreteness tends to emphasize informality and imprecision. That’s often helpful initially, but the precision and power of formal notations and concepts are important to meaningful action in most fields.
It’s worth noting that the Primer isn’t pure discovery learning! When Nell is young, the book and its characters provide guided instruction through storytelling. When Nell grows old enough to read on her own, she gains access to books within the Primer, which we’re told she studies avidly. There’s an important pattern to the books. Nell usually encounters a subject first through a discovery-oriented activity (like the chains in Turing’s castle), then once she’s built intuition and meaning through concrete interaction, she finds a book which details the topic more formally and in greater depth. It’s hard to imagine her developing her skills without those books. Of course, the books don’t suffice to make Nell’s mastery totally plausible. Outside of martial arts and survival skills, there’s no mention of explicit practice, and few instances of elaboration over time. But even in the wildly optimistic world of the Primer, discovery learning alone is not enough.
Constructing a new vision
I want a new vision to chase, something which subsumes what enchants me about the Primer: hands-on immersion, vivid responsiveness, a sense of total assurance, and devotion to the emotional heart of learning and creation.
Even as I leave the Primer behind, I still see dynamic media as central to each of those properties. Interactive representations and simulations make immersion possible for more topics. Realtime feedback enables vivid responsiveness. Dynamic scaffolding and adaptive practice can help give me assurance that I can learn what I want. Adaptive metacognitive support and vibrant audiovisuals support my emotional experience.
But we do need to leave the Primer behind. It works its magic through manipulation, through isolation from people or projects with real meaning, and through a misguided reliance on aesthetic pleasures. I see no way to incrementally rescue its structure from these flaws.
Instead, I think we should invert the Primer’s structure. Our system should be designed to help us engage in the projects and interests we find meaningful, rather than to “achieve learning goals”. Our system should be woven into the world around us, where the things we care about are happening, instead of in a fantasy world confined to a screen. Our system can support engagement by helping people participate more fully in their interests, and by amplifying the intrinsic satisfaction which results—rather than by aiming for entertainment, and hoping for meaning as a side effect, as the Primer does.
What kind of thing is this system? If it’s woven into our involvement with the world, as I think it must be, it’s not quite a book anymore. It seems to be a medium in the sense that air is a medium for sound: it’s an ever-present conduit, providing the support, structure, and representations we need to do things we care about. This probably means a kind of ubiquitous computing. My sense of that shape is still quite hazy, but you can see a first sketch of what a system like this might look like in my recent presentation, “How might we learn?”. Clearly, there is much more to do. I hope that this essay can help free some of my colleagues who have been likewise captivated by the Primer—and that they can help me design the system which will supersede it.
————————
The ideas in this essay owe much to years of exchanges with Alec Resnick, Bret Victor, May-Li Khoe, Taylor Rogalski—and, especially, Michael Nielsen, whose conversation helped incubate so many of these views. My thanks to all.
2024-06-06 04:32:18 +0000 UTC
Thanks to all who joined!
(As always, please don't share this recording publicly.)
2024-05-20 20:50:28 +0000 UTC
One of the great things about publishing a big talk or paper is that you can use it to create high-context conversations. Instead of rambling at your conversation partner about your ideas, you can direct them to a talk which compresses several months of thinking into 40 minutes!
So: let's discuss "How Might We Learn?"! Do your desires for a future of learning diverge in any interesting ways? How might we shore up or tear down the weakest parts of this vision? What first steps seem most compelling?
We'll meet (via Google Meet) this Sunday, May 19th, at 9 AM PDT. I'll record the discussion and publish it here afterwards.
2024-05-17 00:38:19 +0000 UTC
(Talk starts at 4:05. Sorry, Patreon doesn't let me embed videos with a starting timestamp!)
When people talk about the most rewarding high-growth periods of their lives, two themes commonly emerge.
First: they learned a lot, but learning wasn’t the point. Instead, they were immersed in some purpose with real personal meaning—like a startup, a research project, or a burning question—and they learned whatever was important along the way.
And second: the learning actually worked. People emerged feeling transformed, newly capable, filled with insight and understanding that has stayed with them years later.
If these experiences are so rewarding, why are they so rare? Why can’t we learn everything by “just diving in”? Why does learning so often fail to work as we hope, leaving us with brittle, fragmentary understanding?
This is a (proto-?)vision talk of sorts—an attempt at a broader picture of the future of learning we should create, particularly given developments in AI. It's a first step, but there are a lot of new ideas here, and I'm proud of it.
There isn't really room for an "acknowledgments" section in this sort of venue, so I'll share a note of thanks here to Alec Resnick, Bret Victor, Dan Meyer, Michael Nielsen, and May-Li Khoe for years of conversation that has molded my views on these topics. And thanks to Jim Hollan and Haijun Xia at the UCSD Design Lab for inviting me and unknowingly creating a good context for me to develop these ideas!
This is a public YouTube link; feel free to share.
2024-05-10 03:01:16 +0000 UTC
I’ve devoted the last few months’ letters to conceptual essays, so I’ll return this month to updates on my various projects. Things are happening!
“How might we learn?”
Haijun Xia and Jim Hollan at UCSD’s Design Lab kindly invited me to give a Design@Large lecture in May. I decided that I’d use this talk as a forcing function to figure out a compelling vision for how learning might be transformed by powerful AI. Or, well, at least to try.
Two forces inspired this effort. First, as I mentioned last month, I simply haven’t updated sharply enough on the remarkable growth of model capabilities. I keep stumbling over heuristics and conclusions which need to be re-evaluated, given where things seem to be headed. It’s unsettling each time I find one of these—a sign that I’m fighting the last war. So I want to aggressively invert the situation and ask: if I take these capabilities very seriously, what do I find myself excited to create? All of this is complicated by my deep moral concerns around AI, but I think there’s a path through.
The second force motivating this work: I find myself irritated at almost everything that almost everyone says about the intersection of learning and AI. But that’s good! That kind of annoyance is a sign that I might have something interesting to say, if I can tease apart the differences in my view.
If I do a good job, I’ll end up looking at my research in a very different way. My agenda may include some of the same projects, but they’ll be subsumed into a different whole, viewed from a new, and hopefully more powerful, perspective.
The title is “How Might We Learn?”. Here’s the description:
When people talk about the most rewarding, high-growth periods of their lives, a pattern emerges: they learned a lot, but learning wasn’t the point. Instead, they were immersed in some purpose with real personal meaning—like a startup, a research project, or a burning question—and they learned whatever was important along the way.
If these experiences are so rewarding, why are they so rare? Why can’t we learn everything by “just diving in”? Why does learning so often fail to work as we hope, leaving us with brittle, fragmentary understanding? In this talk, I’ll propose some paths forward and suggest how AI could help us create powerful new kinds of enabling environments.
The talk is in a week, and I’m deep into hermit mode, working unsustainably. I feel a great deal of pressure—all of it utterly self-induced. I haven’t even told my hosts at UCSD what I’m talking about. Thankfully, I’m not desperate for the approval of academic human-computer interaction researchers. But I do feel the weight of my own expectations, which I set much too high, given the scope of the topic and how little material I started with. I’m slowly internalizing a correction, but that takes some time.
Of interest to some of you: I’m using this opportunity to (hopefully) exorcise myself of the Young Lady’s Illustrated Primer, which has haunted my aspirations here for almost half my life. I understand much better what I want to take from it, and what I want to transcend. I also think I understand how it’s possible that so many technologists have been inspired to “build the Primer”, and yet most are so ignorant of what Stephenson’s book actually says about it. I doubt I’ll actually talk about any of this in the presentation—it all just shapes my point of view—but I’d like to write an appendix essay of sorts on this subject afterwards, for others likewise haunted.
The talk will be recorded and published next week. I’ll share a link here when it’s available. If you’re in San Diego, I believe you can attend in person.
Highlight-driven practice and comprehension support
Late last year, I designed and built a new prototype reading environment that centered on the question: what if highlighting actually did what people seem to wish it did? That is, what if it actually helped you absorb and retain what you read? In this prototype, I give readers a special highlighter that marks material for later practice and review. I also use the highlights as a signal for a reading comprehension support interaction. I adapted a linear algebra textbook to use my design and met in person with 14 readers to watch them study the text.
To summarize my report on that first series of tests: those first sessions were very promising; the interaction mostly worked as I’d hoped. But it was hard to get past a positive first impression in one-hour reading sessions. To really understand how the memory and comprehension support affect learning, I need to observe it in use over time.
So I switched to a depth-first testing mode. I met with one student, Tara (name and gender randomized), as she studied more of the same linear algebra text. We spent another 12 hours together over 6 weeks, working through several more sections of the text and quite a lot of problems.
Keeping in mind that this was an initial uncontrolled experiment, my qualitative impression is that the prototype meaningfully helped Tara become able to solve those problems. In the course of working through a problem, she’d often recite verbatim some detail that was on a prompt she’d practiced. Where she had trouble, it was almost never due to forgetting, and rarely due to comprehension lapses. It was usually due to problems of transfer and schema acquisition. Months later, I’d guess that Tara still remembers most of the conceptual material. And, having seen some lapses in our weeks together, I think she’d probably have forgotten much of it without support.
On the other hand, I wouldn’t say that my prototype’s impact was transformative. Tara still began each section’s problem set with quite a lot of confusion. The good news is that she was consistently able to push through that confusion and emerge with a pretty solid understanding. But it was problem-solving which proximally made that transition happen, not her review sessions. What I can’t tell is whether the review sessions and comprehension support made it possible for problem-solving to have that effect.
By way of comparison, another student I worked with (“Alex”) struggled enough in problem-solving that he usually wasn’t able to push through to the same kind of clarity. Those struggles led to this new design, and with this design, Tara was able to achieve some degree of fluency. But we don’t know if her relative success was because of this design. If so, that would be a big impact, even if Tara’s process still involved a lot of fumbling and difficulty.
I think the next step here is to do a more controlled experiment, one that would surface the counterfactual impact of the design more clearly. Maybe readers who don’t have this extra support are just as likely to end up fluent. Maybe most of them will end up utterly stuck. We’ll have to see.
A first attempt at automation
As a reminder, here’s how the prototype works. I played “expert” with this linear algebra textbook, highlighting all the details I thought were particularly important, and writing one or more practice prompts for each highlight. Then I pulled a wizard-of-oz: as readers highlighted the text, I’d manually map their highlights onto my expert-curated highlights, in realtime. When readers finished a section, all the prompts associated with their highlights would be added to their collection, and the comprehension support interaction would surface any expert-curated highlights which they skipped.
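As a rough illustration of that flow, here is a minimal Python sketch of what happens when a reader finishes a section. The names (`ExpertHighlight`, `finish_section`, `mapped_ids`) are hypothetical and not taken from the prototype. The mapping from reader highlights to expert highlights is assumed to have already been done by hand, as in the wizard-of-oz sessions.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertHighlight:
    """One expert-curated highlight and the practice prompts written for it."""
    id: str
    text: str
    prompts: list[str] = field(default_factory=list)


def finish_section(expert_highlights, mapped_ids):
    """Return (prompts to add to the reader's collection, skipped highlights).

    `mapped_ids` is the set of expert highlight ids that the reader's own
    highlights were mapped onto (by hand, in this hypothetical sketch).
    """
    prompts = []
    skipped = []
    for highlight in expert_highlights:
        if highlight.id in mapped_ids:
            # The reader marked this detail: queue its prompts for practice.
            prompts.extend(highlight.prompts)
        else:
            # The reader passed over this detail: surface it afterwards.
            skipped.append(highlight)
    return prompts, skipped


# Illustrative data only; the real text and prompts are not shown in the post.
section = [
    ExpertHighlight("e1", "span of a set of vectors", ["What is a span?"]),
    ExpertHighlight("e2", "definition of a basis", ["Define a basis."]),
]
new_prompts, skipped = finish_section(section, mapped_ids={"e2"})
```

Here `new_prompts` would go into the reader's review collection, and `skipped` would feed the comprehension support interaction.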
This procedure requires me to be present in realtime with test readers, to make the mapping between their highlights and mine. If I want to run a controlled experiment involving more readers, that would rapidly become quite time-consuming. So I spent some time early this year trying to automate the mapping process.
My initial round of test sessions with readers gave me a conveniently labeled data set of 130 highlight mappings. I used DSPy to bootstrap a 16-shot chain-of-thought prompt for GPT-4, but my best F1 was 0.26. Then I tried fine-tuning long-t5, but my best F1 there was 0.04. At that point, I paused this work to begin my Vision Pro prototyping month.
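For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score could be computed for this kind of task. It assumes each mapping is a (reader highlight, expert highlight) pair and that predictions are compared with the hand-labeled pairs as sets. The post does not say exactly how F1 was computed, so treat this framing and the example ids as assumptions.

```python
def f1_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over sets of mapping pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # how many predictions were right
    recall = true_positives / len(gold)          # how many labels were recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical pairs of (reader highlight id, expert highlight id).
gold = {("r1", "e3"), ("r2", "e5"), ("r3", "e7"), ("r4", "e9")}
predicted = {("r1", "e3"), ("r2", "e4"), ("r3", "e8")}
score = f1_score(predicted, gold)  # precision 1/3, recall 1/4
```

With one correct pair out of three predictions and four labels, the score works out to 2/7, or about 0.29, which is in the same range as the best GPT-4 result above.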
There are lots more obvious things to try here with more time:
map via embedding and cosine similarity—dumber but maybe better?
train a special-purpose classifier rather than repurposing a general language model
detailed evaluation of the selections—i.e. maybe it’s not choosing the same highlights that I chose, but its choices are close enough?
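The first idea in that list can be sketched in a few lines of Python. To keep the example self-contained, `embed` below is a stand-in that uses plain word counts. A real attempt would presumably call an embedding model instead. The function names, the example highlights, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_highlight(reader_text, expert_highlights, threshold=0.3):
    """Return the id of the most similar expert highlight, or None if none reaches the threshold."""
    reader_vec = embed(reader_text)
    best_id, best_score = None, threshold
    for expert_id, expert_text in expert_highlights.items():
        score = cosine_similarity(reader_vec, embed(expert_text))
        if score >= best_score:
            best_id, best_score = expert_id, score
    return best_id


# Illustrative expert highlights, keyed by a hypothetical id.
expert = {
    "e1": "A vector space is closed under addition and scalar multiplication",
    "e2": "The determinant of a matrix is zero exactly when the matrix is singular",
}
match = map_highlight("closed under addition and scalar multiplication", expert)
```

The threshold matters because a reader's highlight may correspond to no expert highlight at all, in which case returning `None` is the right answer.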
I expect I’ll pick this up again in the coming months.
Writing with Matthew Siu
For the last year, I’ve been collaborating with research fellowship winner Matthew Siu on a system to help with a familiar kind of sensemaking. It’s for when you’re stuck in a creative project. You have huge piles of raw, unstructured text (notes from meetings, brainstorms, journals, lab notebooks). You’re confident that you could figure out how to proceed if you could extract the right pieces, juxtapose them, and start seeing the whole. Unfortunately, all you have are these enormous text files and some scroll bars.
I’m happy to report that we’re wrapping up this project! We’re now two months into the report-writing phase, and the narrative is coming together. This is my first time coaching someone else through their first big research write-up (Matthew’s the lead author), and that’s been a rewarding learning experience for me.
Our solution is a new kind of transclusion design—one which lets you freely move back and forth between in-context annotation-style interactions (highlighting/commenting) and open-ended text editing for incremental organization and elaboration. I know that description won’t mean much on its own, but it’s hard to say a little more without saying a lot more. I’ll save it for the report.
From BookBridge to BookShots
Last December I shared BookBridge, a design concept I created in collaboration with Derrek Chow. The main idea is to bridge physical and digital reading environments. Using a document camera, we capture your reading sessions with a physical book. You can point at things on the page, gesture expressively, and talk aloud; we record all of that and link it in time and space. Then we produce a high-density digital representation of the reading session for later reference and synthesis.
We’ve done scrappy, smoke-and-mirrors prototypes, but the natural next thing to do is to implement some part of this to a high enough fidelity that I can try using it for some serious reading. I was stuck, for a while, because our prototype has so many complex interconnected parts that every next step I considered was too large to comfortably bite off. The tension is that the next prototype needs to be featureful enough to be legitimately useful and to push forward the design concept, but not so featureful that it becomes an implementation sinkhole.
But thanks to helpful conversations with Rob Ochshorn and Andrew Sutherland, I now have a next step I like, which I’m calling “BookShots”. The idea is that we often take screenshots of books and articles we’re reading on our phones, and share those in discussion on Twitter or privately with friends. The social setting creates a context to engage more deeply with the book and to relate it to other topics on our minds. So, just as I sometimes send voice messages instead of text messages, because they’re more expressive or intimate, what if I could send a little augmented video clip of me pointing at and talking about a passage of a book?
The video would include a spatially-aligned speech transcript, and some time-flattened representation of my gestures. This concept would push me to flesh out the gestural and speech transcript elements of the BookBridge concept, without requiring all the extra affordances necessary to navigate long sessions across the full length of the book. I like that this concept emphasizes expressiveness and intimacy (my hands, my vocal tone, the physical environment surrounding the book) over utility (searching, indexing, density). That feels like it’s pushing on an important difference in my approach versus other projects in this space. I also like that the BookShots notion would naturally prioritize drawing connections to my own ideas outside the book; by comparison, BookBridge focuses more on capturing reactive comments I’d make as marginalia.
All I’ve got so far are some sketches made before other projects took over my attention. I have a lot on my plate right now, and I’m not sure when I’ll circle back to this! Still, it’s nice to have a next step teed up.
————————
This year is feeling extremely dense already! My thanks, as always, for your support.
2024-05-01 05:01:01 +0000 UTC
GPT-4 recently reached its first birthday. Yet I confess I’ve still not metabolized the changed world we live in. I keep stumbling into plans and beliefs I formed in a pre-LLM world. Rather than the piecemeal updates I’ve been doing, I’d like to step back and understand: what transformational enabling environments become possible in a world with very powerful AI?
But to explore this question, I first need to understand something more fundamental: what is the purpose of learning and growth in a world with very powerful AI? What kinds of knowledge and capacity remain meaningful—or become so?
Contingency as a boundary condition
One common answer is that we’ll still need to know enough to instruct, supervise, and coordinate AI systems. As usually framed, that’s a moving target, one which depends on the AI’s capabilities. What will models be able to do in three years? In my lifetime? These questions rapidly degrade into guessing games. We can ask prediction markets, or consult scaling laws. But we can find more durable answers by exploring whatever bright lines we can find—places where it seems we can never completely outsource a task to AI, no matter how capable they become.
A few weeks ago, I asked GPT-4 to translate a fairly complex JavaScript algorithm into Python. The output seemed mostly right, but it contained serious bugs which arose from nuanced differences in the two languages’ numeric primitives. I needed to understand each language well enough to detect the error and diagnose it. As it happens, even after I explained the mistake to GPT-4, it continued to emit variations with the same root problem, so I needed enough programming ability to write a patch myself.
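The post doesn't say which numeric differences caused the bugs. As an illustrative assumption, here are three well-known places where a line-by-line translation can silently change behavior. The helper names `js_int32` and `js_remainder` are hypothetical, written only to show what the JavaScript side would compute.

```python
import math


def js_int32(x: int) -> int:
    """Wrap an integer to signed 32 bits, as JavaScript's bitwise operators do."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def js_remainder(a: int, b: int) -> int:
    """Remainder with the sign of the dividend, like JavaScript's % operator.

    Goes through floats, so it is only exact for small integers.
    """
    return int(math.fmod(a, b))


# 1. Bitwise operators: JavaScript wraps to 32 bits, Python ints never overflow.
print(1 << 31)             # Python: 2147483648
print(js_int32(1 << 31))   # what JavaScript's (1 << 31) gives: -2147483648

# 2. Remainder of a negative number.
print(-7 % 3)               # Python: 2
print(js_remainder(-7, 3))  # what JavaScript's (-7 % 3) gives: -1

# 3. Large integers: JavaScript numbers are doubles, so 2**53 + 1 rounds to 2**53.
print(2**53 + 1 == 2**53)                # Python ints: False
print(float(2**53 + 1) == float(2**53))  # compared as doubles: True
```

In each case the translated code looks identical and runs without error, which is why catching the problem takes some knowledge of both languages.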
But all that is a temporary state of affairs. I gave the model everything it needed to produce a correct solution. It could have formally verified that the translated program had identical execution semantics to the original. The model didn’t need any special information or clarification to improve its answer: it just needed the two languages’ documentation, and enough raw reasoning capacity to perform the translation correctly. So, we don’t need prediction markets or scaling laws to conclude that at some point—maybe next year, maybe in the distant future—I won’t need the knowledge I used to supervise that translation. That fact is a consequence of the task and its fundamental tractability[1].
Not all tasks are like this. For example, suppose that you’re a composer working on a cello concerto. Could you outsource this to an AI someday, like you can outsource that program translation task?
The first problem you have is one of communication. You have some inchoate sense of what you’d like to express, and you’ll need some knowledge to make those ideas legible to the model. Today’s generative audio models let you describe what you want in words. To control the output, you might need knowledge of vocabulary like “staccato”, or of concepts like altered chord structures. Words aren’t a good medium for communicating musical ideas, so you might also need knowledge of representations like lead sheets, or ADSR envelopes, or spectrograms.
But the main problem with outsourcing your cello concerto to the AI isn’t that it’s hard to precisely communicate what you want to the model. It’s that you don’t yet know what you want. Composing is a process of discovery—what Donald Schön has called “reflecting-in-action”. As you try different themes, you notice how each seems to interact with the musical landscape you’d imagined. These reactions guide the search process, but they also clarify and transform your own intent. You can’t know in advance what knowledge you’ll need, or what context the AI must have, because your attempts uncover that. You understand what the piece wants to become through the process of composing it. That doesn’t mean the AI can’t help you along the way. In fact, it will surely expand your creative reach. But you can’t outsource the process without producing a very different result.
In this way, the composition task is in a different category from the program translation task. But that category isn’t just for the arts. It also encompasses much of what Herb Simon has called “sciences of the artificial”: engineering, architecture, city planning, business, education, medicine, economics, and other sciences concerned “not with how things are but with how they might be”. For many tasks in these domains, the “right” solution must be iteratively negotiated, navigating tradeoffs in a tangled web of factors which neither you nor an AI can specify in advance. The important work often lies not in solving problems but in determining what the problem actually is, as it intersects our evolving and interacting preferences.
For example, people often wonder: why is software so hard to build? Why is it so often buggy and delivered far behind schedule? Why can’t we just specify what we want, and mechanically turn that description into predictably perfect software? Well, those tools do exist, and people sometimes use them to produce or verify especially crucial components. But they don’t get used much in practice, and I believe that’s because in most situations, we can’t specify in advance precisely how we want software to behave. Now, AI demands much less precision than traditional formal modeling tools. In simple cases, we can let an AI fill in the details of a high-level description like “write a Python script to combine these CSVs, removing all the empty rows.” But for more complex software, we discover how we want it to behave in the iterative process of building and reflecting, much as we discover what a cello concerto wants to be in the process of composing it.
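To make the simple case concrete, here is one minimal sketch of what such a script might look like. The choices listed in the docstring are assumptions that the one-line description leaves open, and the function name `combine_csvs` is hypothetical.

```python
import csv
import io


def combine_csvs(inputs, output):
    """Concatenate CSV text streams into one, skipping empty rows.

    Assumptions (not stated in the one-line description): every input shares
    the same header row, the header is written only once, and a row counts as
    "empty" when every cell is blank after stripping whitespace.
    """
    writer = csv.writer(output)
    header_written = False
    for stream in inputs:
        reader = csv.reader(stream)
        header = next(reader, None)
        if header is None:
            continue  # this input had no rows at all
        if not header_written:
            writer.writerow(header)
            header_written = True
        for row in reader:
            if any(cell.strip() for cell in row):
                writer.writerow(row)


# In-memory streams stand in for real files opened with newline="".
first = io.StringIO("name,score\nAda,3\n,\n")
second = io.StringIO("name,score\n\nGrace,5\n  , \n")
combined = io.StringIO()
combine_csvs([first, second], combined)
print(combined.getvalue())
```

Even here there were decisions to make about headers and about what "empty" means. In a small script those are easy to settle, which is part of why the high-level description is enough.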
So, if inherently contingent tasks like these need us to stay in the loop, to supervise an AI as it performs an increasing share of the work, what kinds of knowledge and capacity will we require?
Learning to steer
Tools like Midjourney or DALL-E are great for casually generating whimsical pictures. “Illustration of two golden retrievers in Central Park playing chess, at golden hour.” I don’t need any special knowledge or capacity to get a satisfying result. But in this scenario, I don’t have a strong expressive intent; I’m not trying to find a perfect fit for a complex situation. I just want something cute to use in a party invitation.
But I can also use these image models in more sophisticated ways. If I’m creating images to support some of my ideas in a high-stakes presentation, I’ll need to exert more control. I’ll iterate on the image over multiple rounds, noticing and expressing what’s working and not working in each image. After a dozen images, I’ll often realize I’d taken the wrong approach in the first place (“You know, what this really needs is…”), and I’ll reinterpret my past ideas in a new light.
If I’m a professional visual artist, these image models may still meaningfully expand my creative capacity. But to use these tools with complex and personal expressive intent, I’ll need to exert even more control. I’ll have stronger opinions about what I want, visually, and I’ll need to express those nuances to tightly constrain the output space.
To get a controlled result, I can’t just say “I like this image” or “I don’t like that image.” The output space is too large. I’ll need to steer, and for that I’ll need some special expertise.
My mental repertoire of visual ideas and techniques determines how I can frame what I’m trying to do, and the breadth of “moves” I can imagine making. To make good suggestions, I need a strong model of how different framings and moves might impact the output, relative to my intent, as well as how those choices might constrain future moves. With each new image—each new experiment—I need to perceive the consequences clearly, both in the individual elements and in their contributions to the whole. Did the move have the result I expected? What other consequences did it produce, and how do they interact with my aims? Those observations will often shift my sense of what I’m trying to do, or push me to reframe my approach completely.
As a shorthand, I’ll call this “taste”: my repertoire of domain ideas and techniques; my model of how different frames and moves will impact the situation; my ability to perceive and evaluate the results. I’ve been talking about image generation in this section, but I think the same arguments apply to the professional domains I described earlier, like software design, architecture, business, medicine, and so on. Domains outside the arts might call this concept “professional judgment” or “decision-making expertise”, but I claim there’s a unity here in one’s capacity to steer in highly contingent domains.
In a future with powerful AI, I believe we’ll still need this kind of taste to work deeply in these domains[2]. When the task depends on humans’ messy situations and ill-defined preferences, and when the action space is so high-dimensional that guess-and-check is intractable, this is the capacity you’ll need to discover what image (or software, or whatever) you want to create. Use of AI will expand what you can imagine and what you can reach, but you’ll still need to steer. In fact, for some tasks, I expect the kind of taste I’ve described will become much more important, as AI increasingly handles less contingent elements.
What about technical expertise?
So far I’ve mostly discussed our ongoing need for fuzzy heuristics and instincts. But today, most learning focuses on acquiring information, detailed conceptual understanding, and technical fluency. How might that change in a future with powerful AI? If I’m an inventor of electronic gadgets, should I still study physics? If I’m a composer, does music theory help me? If I’m an architect, should I still hone my drafting skills?
I find myself quite uncertain here—I think answers will vary quite a lot—but a few considerations come to mind:
Media of communication. Abstract terms and concepts should generally remain important for communication, both with AI systems executing on our behalf and with human collaborators. You’ll need to describe your situation’s constraints; express your idiosyncratic intent; understand and supervise tradeoffs. All this will often benefit from technical language. Domain-specific representations (like music notation) are likely to remain important for high-bandwidth expression.
Media of purpose. Your intent—even internally expressed—will often be framed in terms of complex domain-specific ideas. What kind of understanding would you need to invent Bitcoin, if you had powerful AI to help you? One important element would be a clear sense of what you were trying to achieve. Quite a few digital currencies had been invented, prior to Bitcoin, but they each had some important problem. Often these problems were quite technical. To conceive of something like Bitcoin as a goal, you’d need to understand issues like sybil attacks and tradeoffs around deflationary monetary policy. Another way to look at this is that your evaluation function—your sense of a given solution’s value—will depend in part on your domain knowledge.
Repertoire-building concepts. I’ve suggested that one component of taste is your repertoire of domain ideas, moves, and frames. You can build this repertoire by collecting a lifetime of one-off experiences, but it’s awfully inefficient. Conceptual understanding lets you move up the ladder of abstraction, unifying these isolated elements and suggesting others. For example, in cooking, many people make lots of one-off recipes without building any creative flexibility. But if you study the concept of braising rather than making a few recipes which happen to use that technique, you’ll see the unity in those dishes, and more clearly see their differences. You’ll see braising as an abstract “move” you can deploy in many contexts. Say that your AI cookbook suggests a steamed fennel dish. If it’s a cool night, you might realize that you feel more like braising it—but only if you can think in terms of braising, rather than individual recipes.
Bootstrapping ingredients. Another component of taste is your model of how different moves might impact what you’re working on. When you reach instinctively for one move over another, you’re leaning on patterns observed over long experience. But if an AI solves every problem for you start-to-finish, how will you ever learn to recognize these patterns? This suggests that you need technical knowledge just to bootstrap yourself—to apply basic moves in different contexts so that you can internalize their behavior and, perhaps, defer more to an AI in the future.
Loss of agency. Consider the negation: what would be lost if we outsourced all technical knowledge to AI systems? Here I think of Neil Postman’s Technopoly and Ivan Illich’s Tools for Conviviality. When we understand our tools, we can shape our environment and our society according to our values, rather than (exclusively) being shaped by our tools/environment. By contrast, when tools and systems we depend on are opaque, and few understand them, that can concentrate power among technocratic elites and risks disenfranchising individuals from personal and community autonomy. In a world with strong AI, we might want to maintain detailed technical knowledge just to maintain participation in our own destiny.
Fine: these observations suggest you’ll still have need for some technical knowledge in a future with powerful AI. But which kinds of knowledge? Surely some become less useful and others more so.
One shift I expect is in the value of knowledge involved in certain kinds of well-defined execution—parts of tasks where the expected inputs and outputs can be clearly specified. In a world with Mathematica, there may be value in learning how to factor friendly quadratics into binomials, since that helps illuminate the concept of zeros. But I’m not sure how much value remains in memorizing the quadratic formula. Likewise, with today’s AI (and with my broad domain knowledge) I don’t really need to know the details of a programming language’s syntax to work effectively in it.
Work in highly contingent domains may be difficult to outsource, but it will often include straightforward subtasks with few important tradeoffs. If there’s knowledge like language syntax that’s confined to those subtasks, without affecting the more contingent overall task, we may find we no longer need it. The same observation applies to procedural skills involved only in those subtasks. My optometrist used to operate a complex machine to assess my myopia; now an automated laser does that work. But that’s just one part of my appointments, and I’m glad I can still talk with her about the messy tradeoffs involved in choosing a prescription.
It would be interesting to audit curricula in various domains with this notion in mind. How much could be removed? What else should be added?
One concern I have here is that insights and revelations can appear where they’re not expected. Sometimes when I’m doing fairly mechanical interface design work, I notice an opportunity I hadn’t seen before. A physician doing a routine exam might notice something subtle, something that wasn’t even supposed to be tested. I’m not sure how to think about these tradeoffs.
Meaning in doing
Here’s another bright line we can draw: you’ll still want knowledge and capacity for activities where the meaning comes from doing it yourself.
Even if an AI can synthesize a perfect rendition of a melody performed on cello, lots of people will still want to learn to play cello because there’s tremendous pleasure in producing the sound. Many writers might be happy to use automated spell-check, but wouldn’t want to give up the satisfaction of honing a beautiful phrase.
If you read Aristotle or the Bible now, I expect you’ll still want to read them in a future with powerful AI. The meaning in philosophical and spiritual contemplation comes in large part from within. Some part of that meaning may also come from discussion with others, and perhaps one day an AI can help with that, though for the moment I find that I can’t emotionally connect to its responses in discussions on these topics.
I’d be very happy if our AI-powered future allowed more individuals to experience the pleasures of deep action in more domains. I see glimmers of that in non-artists’ experiences with Midjourney, and in non-programmers’ accounts of using GPT-4 to build software.
Some remaining questions
Hard sciences. So far I’ve been talking about arts and “sciences of the artificial”. What about the hard sciences? What do cellular biologists of the future need to know? Here I find myself quite uncertain. If you’re curious about how a particular cellular mechanism works, that’s a complex, ill-defined question, but I’m not sure how contingent it is. It’s a question about objective reality. It’s true that the best answer may depend on who’s asking and why, but there’s plenty of precedent in the scientific literature for what a good characterization of a cellular mechanism looks like. The cellular biologist needs to know enough to ask meaningful questions. Perhaps that requirement alone entails the rest of a typical syllabus; I’m not sure. As Michael Nielsen points out, AI-driven science will likely be bottlenecked on experiment in domains like biology “where historically contingent facts about the world crucially impact a phenomenon”, but then—is our role only in facilitating the experiment?
The limits of taste. Are my claims about taste always true? Can’t a sufficiently powerful AI learn my taste, then produce exactly what I’d want, given all the messy details of my situation? I think the answer will depend on just how personal, illegible, and demanding my intentions are. If I want my cello concerto to precisely capture the ineffable details of my inner emotional world, it’s hard to imagine an AI producing that unless it can accurately simulate my subjective experience. This seems less true in many professional situations, even ones which currently seem quite contingent. Diagnosticians need to use what I’ve called taste to steer their examinations and reflect on their findings. But a multimodal AI should in principle be able to reach the same diagnosis. This situation is pretty objective. An architect’s work is something of a middle ground—there are objective problems to be solved, but the solutions will also express the architect’s somewhat illegible creative preferences.
Expanding the pie. In this essay I’ve implicitly assumed the present world of professions and activities and kinds of knowledge. But in a world with powerful AI, I expect we’ll spend our time in very different ways. We’ll have new kinds of hobbies, new kinds of art, new scientific fields, new professions, new institutions. Without knowing what those things are, can we say anything about what kinds of learning and growth might be valuable for all these new pursuits?
————————
Thanks to Catherine Olsson, Jason Crawford, Joe Edelman, Laura Deming, Michael Nielsen, Sara LaHue, and Sebastian Garren for helpful conversations. Thanks also to David Chapman for introducing me to the work of Donald Schön.
————————
[1] Yes, fine: some kinds of program translation are theoretically intractable in the general case. For example, suppose you’re translating from Python to a language that is only defined over total functions. We might like our AI to tell us when an input is untranslatable, but by Rice’s Theorem this is impossible in general, because totality is undecidable. Our AI must sometimes emit an invalid program or fail to halt. I can instead claim that we know of no principled reason why there should be a difference between the set of program translation tasks a human can perform and the set which a sufficiently capable AI can perform.
[2] Of course, one of the wonderful things about AI is that it will radically lower the floor to these domains. If you don’t have much taste, generative models will expand your capacity quite a lot. They’ll let you access some coarse result, one you couldn’t have gotten otherwise, and in many shallower situations that’ll be enough.
2024-04-01 05:49:41 +0000 UTC
Thank you all for a very interesting discussion!
Please don't share this recording publicly.
And here's the chat transcript with some links which were mentioned:
00:12:09.188,00:12:12.188
Fabien (Utopiah): ref to the VUB research I mentioned https://wise.vub.ac.be/index.php/publication/towards-framework-dynamic-data-physicalisation
00:12:18.256,00:12:21.256
Fabien (Utopiah): specifically https://wise.vub.ac.be/index.php/topic/tangible-holograms-tangho
00:12:25.369,00:12:28.369
Andy Matuschak: Gray, Xoromancy: https://www.graycrawford.com/xoromancy
00:12:44.491,00:12:47.491
Fabien (Utopiah): (2017 ... so a long while ago)
00:19:11.043,00:19:14.043
Fabien (Utopiah): on notation that could be useful for XR https://lig-microglyph.imag.fr/ but "just" for motions
00:22:34.017,00:22:37.017
Silka Sietsma: This is also interesting for spoken language translation. What if the tonal/emotional inclination could be translated from a British speaker to an Italian translation.
00:26:10.273,00:26:13.273
Ian Stewart: What are some sources you could recommend about the idea of writing making it easier to think in abstractions?
00:28:41.175,00:28:44.175
Fabien (Utopiah): on writing to think, one of my favorite quotes trying to illustrate
00:28:45.012,00:28:48.012
Fabien (Utopiah): No, it's not a record, not really. It's working.
You have to work on paper, and this is the paper. Okay?
Richard Feynman in his discussion with Charles Weiner (Gleick, 1992)
Discovered in Supersizing the Mind by Andy Clark
00:28:55.811,00:28:58.811
Ian Stewart: If I had maxed out my gaming PC, it would have been $3500 or more
00:31:27.837,00:31:30.837
Fabien (Utopiah): visualizing and editing my home assistant rules, can literally do that today
00:31:46.120,00:31:49.120
Silka Sietsma: Point at your pet & say feed & and the cat feeder releases food.
00:33:56.173,00:33:59.173
Ian Stewart: the touch bar is back... and this time, it's EVERYWHERE!
00:36:11.723,00:36:14.723
Fabien (Utopiah): "just" a keyboard version https://www.crowdsupply.com/polykybd/polykybd with one screen... per key
00:40:39.007,00:40:42.007
Andy Matuschak: https://folk.computer
00:46:31.775,00:46:34.775
Bill Roberts: @Andy have you been using any of your prototypes for piano practice consistently? Or were they more one-off efforts / experiments?
00:46:38.976,00:46:41.976
Harley Turan: I've spoken to folks who've been typing on the Vision Pro pre-release, and they've said they are able to hit pretty high WPM by rapidly tapping their fingers like a crab with both hands whilst glancing at characters.
00:47:53.374,00:47:56.374
Ian Stewart: I'm trying to learn guitar, and that means I've got six pianos on a stick!
00:48:59.905,00:49:02.905
Harley Turan: Move controller / wands
00:51:41.990,00:51:44.990
Silka Sietsma: do you think voice and gestures is a solution for mode change
00:52:47.987,00:52:50.987
Ian Stewart: like playing a theremin!
00:52:55.574,00:52:58.574
Ozzie: this post on the issues with volumes was helpful: https://www.lunarskydiving.com/blog/volume-limitations/
00:54:22.281,00:54:25.281
Ian Stewart: thank you!!!
2024-03-04 03:00:03 +0000 UTC
I spent February trying something new: a creative holiday of sorts, occasioned by the Apple Vision Pro’s release. I gave the month over to experimentation in this new medium. To free myself to play, I made a rule that I wasn’t allowed to work on any past ideas or projects. Very coarsely, I wanted to know: what new questions and ideas does this system provoke in me? What new space of mediums becomes possible with high-quality passthrough, world tracking, and gaze-based input? What other affordances must be added—or discarded—to produce a medium of my dreams? A few weeks later, I can’t yet answer these questions to my satisfaction, but I’ll share what I’ve observed so far as a newcomer to this area.
Masks, glasses, and ubiquitous computing
A sacrificial form factor for living in the future
The Vision Pro uses a bristling camera array to show the outside world on the inside. This technique is called “passthrough”. It’s a huge improvement on older headsets, which completely blocked out the world. Still, like many, I don’t like the idea of spending hours each day with an opaque computer on my face. That said: I do wear glasses every waking hour of every day. If all this technology could someday be compressed into normal-ish glasses, I’d happily wear them everywhere.
So, one way to view passthrough-based devices is as disposable stepping-stones to design and prototype for the glasses they’re supposed to become. Today’s transparent headsets, like Microsoft’s HoloLens, are so limited that I find it tough to get inspired as a designer. Passthrough lets me pretend to live in a possible future with ultra-high-quality transparent displays, and to explore what I might create in that future.
But there’s another possible path, arguably more compelling: the future computer’s not on your face, covering the world in private hallucinations. It’s out there in your environment, through some combination of pervasive chips, screens, sensors, projectors, and actuators. This is the vision of ubiquitous computing (“ubicomp”) and the agendas which sprung from it, with Dynamicland the most recent exemplar:

In this world, you’re surrounded by objects which can sense and respond. The blueprint lets you try and compare alternatives; the sticky notes remember their prior arrangements; your kitchen prep bowls show you what should go where. You’re interacting with physical objects, so you recover the subtle feedback of tactility and the pleasure of embodiment. This vision promotes social richness, not social isolation: the computational material is out there in the world, not hidden on your face, so it can be co-viewed and co-created just like any physical object. In this vision, computers recede into the background, like the electricity running invisibly through your walls, rather than a screen directly mediating your view of reality.
Those who favor the ubicomp agenda usually scorn headsets. In the ’91 paper originating that term, Mark Weiser writes:
Perhaps most diametrically opposed to our vision is the notion of virtual reality, which attempts to make a world inside the computer. Users don special goggles that project an artificial scene onto their eyes; they wear gloves or even bodysuits that sense their motions and gestures so that they can move about and manipulate virtual objects.
…
Even today, people holed up in windowless offices before glowing computer screens may not see their fellows for the better part of each day. And in virtual reality, the outside world and all its inhabitants effectively cease to exist. Ubiquitous computers, in contrast, reside in the human world and pose no barrier to personal interactions.
Mark’s talking about traditional virtual reality headsets, but much of his opposition still applies to augmented reality glasses. He wants a world where people move around and interact with each other and with physical objects: “only when things disappear in this way are we freed to use them without thinking and so to focus beyond them on new goals.” Tactility, sociality, and physical fidelity are all missing in the world of magic glasses.
It’s ironic, then, that one way to view the Vision Pro and its ilk is as a way to prototype for a future along the lines of ubicomp—one where computational surfaces are too cheap to meter, and no one’s wearing anything on their face.
Simulating ubicomp with the headset requires a little more imagination than simulating augmented glasses. Virtual objects have no tactility, for instance. This leads me in different design directions: instead of floating windows, prototypes in this mindset involve tracking physical objects and projecting simulated behavior onto them using the headset display. You could call that “mixed reality”, but the difference in framing really is powerful. I have different ideas when I think of my explorations in terms of future augmented reality glasses, versus a ubicomp future without headsets.
What passthrough can do that glasses and ubicomp can’t
Now, let’s reverse our examination. In what sense is the passthrough-based form factor not just a “worse-but-tractable” version of future augmented reality glasses or ubicomp environments?
One obvious answer is that only opaque headsets can block out the world completely, whether to render a fully virtual scene, or just to obscure your unpleasant surroundings when in a plane or open-floor-plan office.
More interesting, to me, is that passthrough-based headsets can distort reality. They’re capturing the external world and reprojecting it internally, and so they can change what is reprojected. The Vision Pro already does this in subtle ways: it relights your hands when using immersive environments; it simulates dimmed lighting when viewing media; it casts virtual emissive light from videos onto your walls.
How might it be useful to more dramatically warp the external world?
I play the piano. Recently I’ve been trying some honky tonk, a style which features lots of big jumps in the left hand. The trouble is that I need to look at my hand to aim those leaps while also reading the sheet music. I end up whiplashing my head back and forth. Often I’ll need to memorize a passage—so that I don’t need to see the sheet music anymore—before I can really work on the hand mechanics.
So I built this prototype, which “warps” a view of my hands on the keyboard, superimposing them just under the sheet music:

(view full clip)
It’s not quite workable: the hand tracking is a little too imprecise and the world tracking a little too unstable. But it’s right on the edge. In this prototype, I use hand tracking data to render virtual “hands” and a virtual keyboard, but if I had access to the device’s cameras, I could superimpose the downward view directly into the forward view, simulating a strange sort of prism lens.
Now, some pianists would say that instead of helping myself see my hands, I should practice knowing where they are without looking. One of my teachers built a device for that purpose: a long board which would cover the keyboard, with side pieces to lift it a few inches above my hands. Playing with this really did force me to build stronger proprioception. The trouble was that every ten seconds or so, I’d lose my place on the keyboard. I’d need to reorient myself, but a five-foot board is cumbersome to shift for a quick peek. And of course, to move the board, I needed to use at least one hand, “losing” its position.
Another kind of distortion available to passthrough-based devices is subtraction: dynamically redacting portions of the scene. In this prototype, my view of the keyboard is obscured, but if I gaze at my hands, they become visible as I lean towards them:

(view full clip)
I imagine that subtractive interactions like this could be useful for learning to operate other machines by touch, or by sound. One could also use this technique to create progressive scaffolding for physical objects. Suppose you bought a fancy digital camera with many knobs and buttons, but you don’t know how to shoot manual exposures. A training system could blur out everything but the shutter button and the focus dial. Then, once you’re comfortable, it could reveal the aperture dial, and so on.
Actually, subtraction seems more possible for augmented glasses than warping. Maybe you could implement it with thin-film transistors, like those fancy conference rooms with glass walls that become opaque when you flip a switch. I don’t know if those films can be made optically transparent enough when sitting so close to your eyes.
What headsets can do that ubicomp can’t do
It’s tempting to think of all headsets—even futuristic glasses—as a worse version of ubicomp visions, which would replace elaborate private hallucinations with physical dynamic media that we can touch and share, together, out in the world. So it’s helpful to ask: in what ways is this not true? What good can headsets do that ubicomp systems can’t, even if we have holographic displays and computers too cheap to meter?
One obvious answer is privacy. Headsets enforce this as a default much more aggressively than I’d usually want, but privacy does have its place. One ed-tech founder has told me that when students are trying something for the first time, they feel much more comfortable when they can keep their confused work private from their classmates.
A more stimulating consequence of privacy is asymmetry. That is, the same environment can present you and me with different dynamic representations. In card games, you often hide your hand from others; with a headset, you can hide big objects from each other. Asymmetry can make for interesting collaboration: in Keep Talking and Nobody Explodes, one player needs to defuse a complex bomb; the others can’t see or touch it, but they have the information needed to defuse it in a big technical manual; shouting ensues. More practically, if you and I are collaborating on a scale model of an architectural plan, but we have different specialties, each of us might want to show or hide different layers of the plan (framing, plumbing, electrical, etc), even as we work in the same physical space. If two trainees are collaborating on a procedure in an industrial plant, but one is more experienced than the other, it may be important to show different levels of scaffolded overlays on the machines.
Switching gears: gargantuan interface elements are more natural to headsets (though they can sometimes be implemented in ubicomp). For example, in the following sketch with Gray Crawford, we explored his idea of presenting a physical log of a multi-person conversation in an enormous strip stretching up through the ceiling and floor.

Another feature of that prototype is free-floating dynamic 3D elements. Those also seem difficult to achieve in a ubicomp system, unless Star Trek-style projected holograms become physically possible.
Headsets make it possible to anchor interface elements to a user’s head pose. These could be heads-up displays (a clock, a reminder of my next task), or more elaborate elements (keeping my sheet music visible no matter where I look). These kinds of interfaces often appear in science fiction designs, but I’m not sure how useful they actually are. A smartwatch can handle many of these needs, often more naturally. For other cases, in a ubicomp world, you could just make a fancy hat, like a bicyclist’s helmet with attached mirrors.
Interactions in space
Revolutions in computing often coincide with new methods for input (light pen, mouse, touch, gaze?) or output (teletype, bitmap display, smart earbuds?, head-mounted displays?). Spatial computing involves big changes on both sides. We’ve focused on output so far in this discussion; now let’s consider input.
Gaze and hand tracking
The detail which most surprised me at the Vision Pro launch announcement was Apple’s decisive dependence on gaze tracking: the system’s central interaction is look-and-pinch.
This model is more analogous to the mouse’s point-and-click than it is to touch: much of the time it feels like indirect interaction, like moving a pointer (with your eyes) and clicking it (with your hand), rather than iOS’s direct interaction, which feels like reaching out and directly manipulating an element. I think this is because we don’t directly act on objects with our eyes in the world. If I want to press a physical button, I reach out and press it. On iOS, I do the same. On visionOS, I look at it and, with my hand at my side, make a gesture. It’s a method of indirect action, like clicking a mouse at my side. Likewise, if I want to slide a physical sheet of paper, I might move it with my finger. I may or may not be looking at it. Scrolling on iOS works the same way, with 1:1 tracking. By contrast, if I want to scroll a visionOS view of a sheet of paper, I look at it and move my wrist, which forms a sort of loose elastic connection with the content—looser than scrolling on a trackpad.
In terms of creating a direct connection between intention and action, the indirection feels like a step backwards to me. Though, of course, I understand the decision ergonomically. The consequence is that even though the device is “hand controlled”, I’m not really using my hands in the rich sense that I use them in the physical world, or even in the dextrous sense that I use them on a smartphone. The feeling is more like an assistive peripheral, using gaze and hand tracking to implement a Bluetooth mouse. It’s astounding in its straightforwardness; it’s an astounding feat of engineering. Yet I’m left wondering what else we might do with these miraculous sensors.
Might gaze tracking unlock some more alien possibilities?
As a reader, I’d be very interested in leaving traces of my gaze on the pages. If I needed to re-read a sentence multiple times, that would jump right out. If I skipped over some material completely, that would also be clear. Perhaps ideas along these lines could be incorporated into the BookBridge project I’ve previously discussed.
In many types of meditation, one is instructed to adopt a soft, unfocused gaze. I can imagine creating a biofeedback interface which would help meditators enter and remain in that posture. One could also use gaze information to provide biofeedback around certain states of distraction.
Or, imagine falling into a fractal: whenever your gaze holds still on any part of it for a moment, you zoom into that part. The more you look, the more you fall into that spot. Yet anywhere you look, there’s always more detail, always unfolding, forever.
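To make the fractal idea a little more concrete, here is a minimal sketch in Python of the dwell-then-zoom logic. It is not taken from any prototype: the class names, the thresholds, and the assumption that some tracker supplies a gaze point in normalized screen coordinates once per frame are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class View:
    cx: float = 0.0     # view center, in fractal coordinates
    cy: float = 0.0
    scale: float = 1.0  # half-width of the view, in fractal units


@dataclass
class DwellZoom:
    # All thresholds below are illustrative guesses, not measured values.
    radius: float = 0.05   # gaze drift (normalized screen units) still counted as holding still
    delay: float = 0.3     # seconds of stillness before any zoom starts
    ramp: float = 1.0      # seconds over which zoom speed builds to its maximum
    max_rate: float = 1.5  # maximum zoom speed, in e-folds of scale per second
    anchor: tuple = (0.0, 0.0)
    held: float = 0.0

    def update(self, view: View, gaze: tuple, dt: float) -> None:
        """Advance one frame; gaze is (x, y) in normalized screen coords, [-1, 1]."""
        gx, gy = gaze
        if math.hypot(gx - self.anchor[0], gy - self.anchor[1]) > self.radius:
            # Gaze moved: restart the dwell timer at the new spot.
            self.anchor, self.held = (gx, gy), 0.0
            return
        self.held += dt
        if self.held <= self.delay:
            return
        # The longer the dwell, the faster the fall.
        rate = self.max_rate * min(1.0, (self.held - self.delay) / self.ramp)
        k = math.exp(-rate * dt)
        # Keep the fractal point under the gaze fixed on screen while the view shrinks.
        px, py = view.cx + gx * view.scale, view.cy + gy * view.scale
        view.scale *= k
        view.cx, view.cy = px - gx * view.scale, py - gy * view.scale
```

Calling `update` once per frame with the latest gaze sample would shrink the view around whatever point the eye is resting on, and any saccade beyond `radius` resets the timer, so a brief glance elsewhere stops the fall.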

Ken Pfeuffer, lead author on the paper originating the core look-and-pinch interaction in 2017, has proposed a variety of other interesting multi-modal interactions. In “Gaze-Shifting”, he and co-authors suggest how a pen and gaze can be combined to permit complex indirect interactions without “losing” the pen’s location:

And in “PalmGazer”, he demonstrates a gaze-driven menu anchored in the user’s hand, freeing the other hand to perform simultaneous actions in space (like drawing or sculpting).

Head tracking
The headset makes continuous realtime estimates of the user’s head pose in 3D space. This information is mostly used to render virtual interface elements within the environment. If you turn your head, the virtual elements stay anchored in place, as physical objects would.
But your head motion is also itself an input channel on these devices. What interesting new interactions does that make possible?
I was tickled by Matt Webb’s 2022 suggestion that leaning might make a great interaction for headsets. We naturally lean forward when we want to see an object in detail. For virtual objects, that could mean not just optical zoom, as it does in the physical world, but also semantic zoom. That is, objects can change their form as you lean closer to show more or different channels of information.
When I came up with the piano keyboard cover concept I showed earlier, I was at first unsure how I would allow the user to toggle the keyboard cover. A system standard pinch-tap would mean “losing” one hand’s pose. I thought briefly about connecting it to a foot pedal I use to flip pages in sheet music. Then I realized: leaning! To see your hands through the cover, just lean towards them. It feels incredibly natural and direct—much more “native to the medium” than look-and-pinch (which is of course much more flexible).

Leaning is also a continuous interaction. As I move closer to the piano, my hands fade in. If I just need a subtle hint, I can subtly lean. If I’m totally lost, I lean more. I get continuous feedback at every frame. There’s something ideological here for me: I think continuous interactions are fascinating and underexplored. One of my favorite interface designs ever is Alessandro Sabatelli’s magnificent 2013 leveling tool, now integrated into the iPhone’s “Measure” app (see GIF below). Chan Karunamuni’s 2017 gesture-based multitasking system is a work of astounding originality and craftsmanship. Of my own collaborations at Apple, my favorite projects involved continuous interaction design: parallax, the back navigation gesture, the 3D page curl, etc. visionOS is almost entirely discrete, and I want more of the continuous.
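One way to picture the lean-to-reveal mapping is as a smooth function from lean distance to opacity. The sketch below is in Python and is not the prototype's code: the function names and both thresholds (5 cm before anything shows, 25 cm for full visibility) are made up for illustration, and it assumes the headset reports how far the head has moved toward the keyboard from a resting pose.

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite ease from 0 to 1 as x goes from edge0 to edge1, clamped outside."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def hands_opacity(lean_m: float, start_m: float = 0.05, full_m: float = 0.25) -> float:
    """Opacity of the hands for a given lean (meters toward the keyboard).

    start_m and full_m are hypothetical tuning values: below start_m the
    cover stays opaque, and at full_m the hands are fully visible.
    """
    return smoothstep(start_m, full_m, lean_m)
```

Evaluated every frame, a mapping like this gives a small hint for a small lean and a full view for a big one, with no discrete toggle anywhere in the loop.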

(view full clip)
The other subtle miracle of head tracking is that it trivializes locomotion in mixed reality spaces. Virtual reality designers and researchers have spent decades trying to figure out how to let users move around in virtual environments. If you tie motion to a joystick, like in a video game, most people quickly become nauseous: the disagreement between the eyes’ perception of motion and the body’s (lack of) physical motion is very uncomfortable. So there’s a zoo of discrete locomotion systems. Often you point to where you want to go, then teleport with a momentary fade-to-black for comfort. Some continuous solutions have managed to subdue nausea by linking motion of the arm to motion of the body (for example, see Gray Crawford’s “waft” locomotion).
My point here is that moving around in VR is a surprisingly non-trivial design problem. But: with a working passthrough-based headset, you can walk around by just physically walking around. Yes, there’s a huge amount of computational complexity hiding there. From a design perspective, though, this “trivial” solution gracefully resolves a decades-long research question for most cases I care about.
In the traditional VR framing, the goal is to allow exploration of an arbitrarily large virtual world. So your home’s walls quickly become a problem, unless you have a baroque omnidirectional treadmill. And you’re blocking out the real world, so you’ll probably trip over furniture. So most VR experiences have you sit physically still, or move only within a small radius, while using indirect interaction to let you virtually move further when necessary.
The miracle of passthrough is not that it solves these problems. Arbitrarily large virtual worlds still require elaborate solutions. But, it turns out, most of what I want to do with head-mounted displays can be happily confined to the bounds of my living room. I’m happy viewing and positioning dynamic media elements in my home: I don’t (usually) need to be transported to some fully virtual environment. I can adapt the virtual environment to the constraints of my physical environment. All this reframes the problem. I don’t need arbitrary locomotion. I just need to be able to move about my home, and position elements within it.
Then we can take advantage of the user’s location within the room as an input channel of its own. The prototype below is of an immersive choir experience.

(view full clip)
If you stand in the center of the room, you’re surrounded by voices from all parts. (This, on its own, is a remarkable experience!) But as you move towards any one part, it will become louder and the others much quieter, as if you’re off in a corner with your fellow tenors learning the part together. If there’s a place where your part has an important duet with another, you can move towards them while singing your own part to hear how the harmony shifts. The whole thing is a continuous interaction: you can sing along with the full choir, then shift a little to hear the sopranos better for one bar, then move back, then leap over to the tenors for a tricky passage.
This is the kind of demo you’d often see in historical virtual reality papers. Because headsets were opaque, the designer would have to create a virtual environment for you to move around in. People called this “room-scale VR”, and you’d traditionally prepare for it by piling up your furniture against the walls of your living room, so you didn’t run into anything. But there’s no reason this demo needs to be in a virtual environment. By situating it in my real living room, I don’t need to move any furniture. I naturally avoid any obstacles in my path.
(Incidentally, this demo is another example of warping external reality. Earlier, in the prototype which displayed a view of my hands alongside my sheet music, I warped visual space. In this demo, I warp acoustic space: even though the parts are only a few feet from each other, I exaggerate their effective acoustic distance, so that when you’re near one, it’s as if the others are very far away.)
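The acoustic warping can be sketched as a per-part gain that falls off with an exaggerated distance. This Python example is an assumption-laden illustration, not the demo's implementation: the falloff formula, the `exaggeration` and `ref` parameters, and the normalization so the nearest part plays at full volume are all hypothetical choices.

```python
import math


def part_gains(listener, parts, exaggeration=4.0, ref=0.5):
    """Return {part name: gain in (0, 1]} for a listener at (x, y), in meters.

    parts maps a name to its (x, y) position. Each real distance is
    multiplied by `exaggeration` before an inverse-distance falloff is
    applied, so parts a few feet away sound much farther off. `ref` is the
    distance scale (meters) of the falloff. Both are illustrative values.
    """
    raw = {}
    for name, (px, py) in parts.items():
        d = math.hypot(listener[0] - px, listener[1] - py)
        raw[name] = 1.0 / (1.0 + exaggeration * d / ref)
    loudest = max(raw.values())
    # Normalize so the nearest part is at full volume; in the center of
    # the room every part is equally near, so all of them play at 1.0.
    return {name: g / loudest for name, g in raw.items()}


# Hypothetical layout: four sections in the corners of a 3 m square.
CHOIR = {
    "soprano": (-1.5, 1.5),
    "alto": (1.5, 1.5),
    "tenor": (-1.5, -1.5),
    "bass": (1.5, -1.5),
}
```

With this layout, `part_gains((0.0, 0.0), CHOIR)` weights all four parts equally, while a listener standing near the tenors' corner hears the tenors at full gain and the other parts at a small fraction of it. Raising `exaggeration` deepens that isolation without moving anyone.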
Cameras and object tracking
Apple assiduously avoids the terms “virtual reality”, “augmented reality”, and “headset”, preferring “spatial computing”. What do Apple’s designers think that phrase means? I think the charitable interpretation is that they aren’t sure yet, or maybe that Apple simply hasn’t shown us yet.
What we see, so far, is that spatial computing is computing as you know it—except situated in your physical space. You can open iPad apps and position them as glowing panels around your home. There’s little meaningful relationship between the app and the physical environment, except the world-tracked position. We could think of this as “spatial” in the sense of “having a lot more space”, but in practice, with the display resolution and gaze precision we have today, it’s tough to productively arrange more interface real estate on this device than on my Mac.
What does “spatial computing” want to become? My provisional answer: I want to break down the walls between the dynamic world and the physical one. I want to imbue objects in my environment with the magic of computation. I want the opposite of arbitrarily-positioned floating windows with no semantic relationship to my environment. I want to interact with dynamic media in physical space, mostly by way of physical objects, using my full body and sensorium, alongside other people. That is, I more or less subscribe to the ubicomp vision, but I don’t have a strong opinion about whether I need to wear a headset to get there.
Very concretely, I’d like to create doing-centric explanatory mediums which involve objects and tasks in the physical world. I’d love to learn how to use machine tools by way of dynamic explanations which respond interactively to my actions, as in AdapTutAR:

Here’s where the cameras come in. Obviously, they’re what make the hand gestures and the world tracking possible. But they also enable the device—in principle—to understand the objects in my environment. They can provide the basic pose information I need to imbue physical objects with dynamic hallucinations. This is “object tracking”, and Apple doesn’t yet offer that functionality on the Vision Pro, though other devices do at various levels of sophistication. My sense is that frontier transformer models like Cutie are close to basically solving this problem, if not quite yet in realtime.
But the cameras can do more than track objects’ poses. In principle, they can see which keys I’m playing on the piano, infer what I’m cutting on my cutting board, spot which page I have open in my book. I want to be able to write on my whiteboard, with a marker, as a colleague writes on their whiteboard a thousand miles away, and to see their marks alongside mine as if we’re writing on the same wall.
For now, these deeper analyses all require purpose-built computer vision pipelines. That doesn’t align well with Apple’s privacy strategy, which tries to limit the need to expose users’ live camera data by offering generic models for analyzing the user’s hands, 2D image tracking, plane detection, and so on. Ideally, the kind of contentful sensing I’ve described would be possible without deep computer vision expertise. That way, we’ll get a more interesting range of experiments. Perhaps future large multimodal models will be able to perform realtime video analysis tasks with a developer prompt like “detect which key I’m touching on the piano.”
Given my interest in deeply fusing physical objects with dynamic media, the Apple Vision Pro is a surprisingly limited platform for experimentation. It just doesn’t expose the data I need to explore most concepts. Happily, I’ve learned enough this month to form a much clearer sense of my interests in this space. I find myself full of new ideas and curiosity—I’ll call that a successful creative holiday.
————————
My thanks to Bryan Clark, Gray Crawford, Laura Deming, Luke Miles, and Michael Nielsen for helpful conversation, and to the Dynamicland team for its deep influence on my thinking in this space (however much they might dislike headsets!)
2024-03-01 06:42:59 +0000 UTC
As you'll see shortly in this month's essay, I've spent the month experimenting with the Vision Pro, trying to understand the spatial computing medium and what it might want to become.
I thought it would be nice to create a space here for us to discuss observations and possibilities. What questions and ideas does it provoke for you? Or, if you prefer, you can discuss any reactions you might have to my upcoming essay.
Join me (via Google Meet) this Sunday, March 3rd, at 9 AM PST [GCal]. I'll record the discussion and publish it here afterwards.
2024-02-29 00:28:46 +0000 UTC