Andy Matuschak

A spring flood of projects

Added 2024-05-01 05:01:01 +0000 UTC

I’ve devoted the last few months’ letters to conceptual essays, so I’ll return this month to updates on my various projects. Things are happening!

“How might we learn?”

Haijun Xia and Jim Hollan at UCSD’s Design Lab kindly invited me to give a Design@Large lecture in May. I decided that I’d use this talk as a forcing function to figure out a compelling vision for how learning might be transformed by powerful AI. Or, well, at least to try.

Two forces inspired this effort. First, as I mentioned last month, I simply haven’t updated sharply enough on the remarkable growth of model capabilities. I keep stumbling over heuristics and conclusions which need to be re-evaluated, given where things seem to be headed. It’s unsettling each time I find one of these—a sign that I’m fighting the last war. So I want to aggressively invert the situation and ask: if I take these capabilities very seriously, what do I find myself excited to create? All of this is complicated by my deep moral concerns around AI, but I think there’s a path through.

The second force motivating this work: I find myself irritated at almost everything that almost everyone says about the intersection of learning and AI. But that’s good! That kind of annoyance is a sign that I might have something interesting to say, if I can tease apart the differences in my view.

If I do a good job, I’ll end up looking at my research in a very different way. My agenda may include some of the same projects, but they’ll be subsumed into a different whole, viewed from a new, and hopefully more powerful, perspective.

The title is “How Might We Learn?”. Here’s the description:

When people talk about the most rewarding, high-growth periods of their lives, a pattern emerges: they learned a lot, but learning wasn’t the point. Instead, they were immersed in some purpose with real personal meaning—like a startup, a research project, or a burning question—and they learned whatever was important along the way.
If these experiences are so rewarding, why are they so rare? Why can’t we learn everything by “just diving in”? Why does learning so often fail to work as we hope, leaving us with brittle, fragmentary understanding? In this talk, I’ll propose some paths forward and suggest how AI could help us create powerful new kinds of enabling environments.

The talk is in a week, and I’m deep into hermit mode, working unsustainably. I feel a great deal of pressure—all of it utterly self-induced. I haven’t even told my hosts at UCSD what I’m talking about. Thankfully, I’m not desperate for the approval of academic human-computer interaction researchers. But I do feel the weight of my own expectations, which I set much too high, given the scope of the topic and how little material I started with. I’m slowly internalizing a correction, but that takes some time.

Of interest to some of you: I’m using this opportunity to (hopefully) exorcise myself of the Young Lady’s Illustrated Primer, which has haunted my aspirations here for almost half my life. I understand much better what I want to take from it, and what I want to transcend. I also think I understand how it’s possible that so many technologists have been inspired to “build the Primer”, and yet most are so ignorant of what Stephenson’s book actually says about it. I doubt I’ll actually talk about any of this in the presentation—it all just shapes my point of view—but I’d like to write an appendix essay of sorts on this subject afterwards, for others likewise haunted.

The talk will be recorded and published next week. I’ll share a link here when it’s available. If you’re in San Diego, I believe you can attend in person.

Highlight-driven practice and comprehension support

Late last year, I designed and built a new prototype reading environment that centered on the question: what if highlighting actually did what people seem to wish it did? That is, what if it actually helped you absorb and retain what you read? In this prototype, I give readers a special highlighter that marks material for later practice and review. I also use the highlights as a signal for a reading comprehension support interaction. I adapted a linear algebra textbook to use my design and met in person with 14 readers to watch them study the text.

To summarize my report on that first series of tests: those first sessions were very promising; the interaction mostly worked as I’d hoped. But it was hard to get past a positive first impression in one-hour reading sessions. To really understand how the memory and comprehension support affect learning, I need to observe it in use over time.

So I switched to a depth-first testing mode. I met with one student, Tara (name and gender randomized), as she studied more of the same linear algebra text. We spent another 12 hours together over 6 weeks, working through several more sections of the text and quite a lot of problems.

Keeping in mind that this was an initial uncontrolled experiment, my qualitative impression is that the prototype meaningfully helped Tara become able to solve those problems. In the course of working through a problem, she’d often recite verbatim some detail that was on a prompt she’d practiced. Where she had trouble, it was almost never due to forgetting, and rarely due to comprehension lapses. It was usually due to problems of transfer and schema acquisition. Months later, I’d guess that Tara still remembers most of the conceptual material. And, having seen some lapses in our weeks together, I think she’d probably have forgotten much of it without support.

On the other hand, I wouldn’t say that my prototype’s impact was transformative. Tara still began each section’s problem set with quite a lot of confusion. The good news is that she was consistently able to push through that confusion and emerge with a pretty solid understanding. But it was problem-solving which proximally made that transition happen, not her review sessions. What I can’t tell is whether the review sessions and comprehension support made it possible for problem-solving to have that effect.

By way of comparison, another student I worked with (“Alex”) struggled enough in problem-solving that he usually wasn’t able to push through to the same kind of clarity. Those struggles led to this new design, and with this design, Tara was able to achieve some degree of fluency. But we don’t know if her relative success was because of this design. If so, that would be a big impact, even if Tara’s process still involved a lot of fumbling and difficulty.

I think the next step here is to do a more controlled experiment, one that would surface the counterfactual impact of the design more clearly. Maybe readers who don’t have this extra support are just as likely to end up fluent. Maybe most of them will end up utterly stuck. We’ll have to see.

A first attempt at automation

As a reminder, here’s how the prototype works. I played “expert” with this linear algebra textbook, highlighting all the details I thought were particularly important, and writing one or more practice prompts for each highlight. Then I pulled a Wizard of Oz: as readers highlighted the text, I’d manually map their highlights onto my expert-curated highlights, in real time. When readers finished a section, all the prompts associated with their highlights would be added to their collection, and the comprehension support interaction would surface any expert-curated highlights which they skipped.
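
The bookkeeping in that procedure can be sketched in a few lines of Python. This is only an illustration of the flow described above, not the prototype's actual code: the names `ExpertHighlight` and `finish_section` are hypothetical, the sample text is made up, and the mapping itself (the part done by hand) is represented as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpertHighlight:
    """One expert-curated highlight and the practice prompts written for it."""
    id: str
    text: str
    prompts: List[str] = field(default_factory=list)


def finish_section(expert_highlights, mapping):
    """Resolve a finished section.

    `mapping` maps each reader highlight (any hashable key) to the set of
    expert highlight ids it was matched to; an empty set means no match.
    Returns (prompts to add to the reader's collection,
             expert highlights the reader skipped).
    """
    covered = set().union(*mapping.values())
    prompts = [p for h in expert_highlights if h.id in covered for p in h.prompts]
    skipped = [h for h in expert_highlights if h.id not in covered]
    return prompts, skipped


# Made-up sample data: two expert highlights, one reader highlight.
expert = [
    ExpertHighlight("e1", "A vector space is closed under addition.",
                    ["What does closure under addition mean?"]),
    ExpertHighlight("e2", "The zero vector is unique.",
                    ["How many zero vectors does a vector space have?"]),
]
mapping = {"reader-1": {"e1"}}  # the reader's highlight was matched to e1 only
prompts, skipped = finish_section(expert, mapping)
```

Here `prompts` holds the one prompt attached to `e1`, and `skipped` holds `e2`, which is what the comprehension support interaction would surface.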

This procedure requires me to be present in real time with test readers, to make the mapping between their highlights and mine. If I want to run a controlled experiment involving more readers, that would rapidly become quite time-consuming. So I spent some time early this year trying to automate the mapping process.

My initial round of test sessions with readers gave me a conveniently labeled data set of 130 highlight mappings. I used DSPy to bootstrap a 16-shot chain-of-thought prompt for GPT-4, but my best F1 was 0.26. Then I tried fine-tuning long-t5, but my best F1 there was 0.04. At that point, I paused this work to begin my Vision Pro prototyping month.
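
For reference, F1 is the harmonic mean of precision and recall. How the scores above were computed isn't specified, so the sketch below assumes one plausible setup: a set-based F1 over (reader highlight, expert highlight) pairs. The function name and the data are hypothetical.

```python
def f1_score(predicted, expected):
    """Set-based F1 over (reader_highlight, expert_highlight) pairs."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Made-up example: three labeled mappings, two predictions, one of them right.
expected = {("r1", "e1"), ("r2", "e3"), ("r3", "e4")}
predicted = {("r1", "e1"), ("r2", "e2")}
score = f1_score(predicted, expected)  # precision 1/2, recall 1/3
```

Under this definition, a model has to both find most of the labeled mappings and avoid proposing spurious ones to score well.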

There are plenty more obvious things to try here, given more time:

map via embedding and cosine similarity—dumber but maybe better?
train a special-purpose classifier rather than repurposing a general language model
detailed evaluation of the selections—i.e. maybe it’s not choosing the same highlights that I chose, but its choices are close enough?
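
The first idea in that list could look roughly like this. It is a minimal sketch, not anything from the prototype: `embed` is a toy bag-of-words stand-in so the example runs without a model (a real attempt would presumably swap in a sentence-embedding model), and `map_highlight`, the `threshold` value, and the sample strings are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_highlight(reader_text, expert_highlights, threshold=0.5):
    """Return ids of expert highlights similar enough to the reader's highlight.

    `expert_highlights` maps id -> text. `threshold` is an arbitrary cutoff
    that would need tuning against labeled mappings.
    """
    reader_vec = embed(reader_text)
    scores = {
        hid: cosine_similarity(reader_vec, embed(text))
        for hid, text in expert_highlights.items()
    }
    return sorted(hid for hid, s in scores.items() if s >= threshold)


# Made-up sample data.
expert_highlights = {
    "e1": "the span of a set of vectors is the set of all their linear combinations",
    "e2": "a basis is a linearly independent set that spans the space",
}
matches = map_highlight(
    "span of a set of vectors is all linear combinations", expert_highlights
)
```

With this sample data, only `e1` clears the threshold. The same labeled mappings used to score the other approaches could be used to tune `threshold` and to compare this baseline against them.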

I expect I’ll pick this up again in the coming months.

Writing with Matthew Siu

For the last year, I’ve been collaborating with research fellowship winner Matthew Siu on a system to help with a familiar kind of sensemaking. It’s for when you’re stuck in a creative project. You have huge piles of raw, unstructured text (notes from meetings, brainstorms, journals, lab notebooks). You’re confident that you could figure out how to proceed if you could extract the right pieces, juxtapose them, and start seeing the whole. Unfortunately, all you have are these enormous text files and some scroll bars.

I’m happy to report that we’re wrapping up this project! We’re now two months into the report-writing phase, and the narrative is coming together. This is my first time coaching someone else through their first big research write-up (Matthew’s the lead author), and that’s been a rewarding learning experience for me.

Our solution is a new kind of transclusion design—one which lets you freely move back and forth between in-context annotation-style interactions (highlighting/commenting) and open-ended text editing for incremental organization and elaboration. I know that description won’t mean much on its own, but it’s hard to say a little more without saying a lot more. I’ll save it for the report.

From BookBridge to BookShots

Last December I shared BookBridge, a design concept I created in collaboration with Derrek Chow. The main idea is to bridge physical and digital reading environments. Using a document camera, we capture your reading sessions with a physical book. You can point at things on the page, gesture expressively, and talk aloud; we record all of that and link it in time and space. Then we produce a high-density digital representation of the reading session for later reference and synthesis.

We’ve done scrappy, smoke-and-mirrors prototypes, but the natural next thing to do is to implement some part of this to a high enough fidelity that I can try using it for some serious reading. I was stuck, for a while, because our prototype has so many complex interconnected parts that every next step I considered was too large to comfortably bite off. The tension is that the next prototype needs to be featureful enough to be legitimately useful and to push forward the design concept, but not so featureful that it becomes an implementation sinkhole.

But thanks to helpful conversations with Rob Ochshorn and Andrew Sutherland, I now have a next step I like, which I’m calling “BookShots”. The idea is that we often take screenshots of books and articles we’re reading on our phones, and share those in discussion on Twitter or privately with friends. The social setting creates a context to engage more deeply with the book and to relate it to other topics on our minds. So, just as I sometimes send voice messages instead of text messages, because they’re more expressive or intimate, what if I could send a little augmented video clip of me pointing at and talking about a passage of a book?

The video would include a spatially-aligned speech transcript, and some time-flattened representation of my gestures. This concept would push me to flesh out the gestural and speech transcript elements of the BookBridge concept, without requiring all the extra affordances necessary to navigate long sessions across the full length of the book. I like that this concept emphasizes expressiveness and intimacy (my hands, my vocal tone, the physical environment surrounding the book) over utility (searching, indexing, density). That feels like it’s pushing on an important difference in my approach versus other projects in this space. I also like that the BookShots notion would naturally prioritize drawing connections to my own ideas outside the book; by comparison, BookBridge focuses more on capturing reactive comments I’d make as marginalia.

All I’ve got so far are some sketches made before other projects took over my attention. I have a lot on my plate right now, and I’m not sure when I’ll circle back to this! Still, it’s nice to have a next step teed up.

————————

This year is feeling extremely dense already! My thanks, as always, for your support.