onivim

Onivim 2 - Update #3: A re-architecture...

Added 2019-06-13 23:53:51 +0000 UTC

Hey all! Been a while since the last update, and was busy with ReactEurope. But I'm back into full Onivim 2 development mode now.

I want to start by saying thank you to everyone who has pre-ordered or supported the project here on Patreon!

Apologies in advance for the wall of text... the TL;DR is that downloadable builds are pushed out to the end of July.

Prior to ReactEurope, I was just beginning to use Onivim 2 as my daily editor. There were a few blockers, but the most critical one, and the topic of this post, is about an intermittent crash when switching files.

A key value of this project is Quality - a higher quality bar than we had with our v1 - and intermittent crashes certainly aren't a part of that.

As a refresher... this is the current architecture of Onivim 2:

- Neovim is run as a separate process.
- We communicate with Neovim via its msgpack RPC protocol.
- Commands and input are sent and queued on Neovim's event loop, and responses are sent back to Onivim 2, again via the same msgpack protocol.
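
For reference, here is a minimal sketch of what that round trip involves on the client side. The message shapes follow the msgpack-RPC convention (a request is `[0, msgid, method, params]`, a response is `[1, msgid, error, result]`); the `RpcClient` class and its fields are hypothetical names for illustration, not Onivim 2's actual code, and the msgpack encoding and the transport are left out.

```python
# Message type tags used by msgpack-RPC.
REQUEST, RESPONSE, NOTIFICATION = 0, 1, 2


class RpcClient:
    """Hypothetical client-side bookkeeping; encoding and transport are omitted."""

    def __init__(self):
        self._next_id = 0
        self.pending = {}  # msgid -> method name, still waiting for a response
        self.results = {}  # msgid -> result, filled in when the response arrives

    def make_request(self, method, params):
        msgid = self._next_id
        self._next_id += 1
        self.pending[msgid] = method
        return [REQUEST, msgid, method, params]

    def handle_message(self, message):
        if message[0] != RESPONSE:
            return  # notifications ([2, method, params]) carry no msgid
        _, msgid, error, result = message
        if msgid not in self.pending:
            raise KeyError(f"response for unknown msgid {msgid}")
        del self.pending[msgid]
        if error is not None:
            raise RuntimeError(error)
        self.results[msgid] = result


client = RpcClient()
request = client.make_request("nvim_command", ["edit foo.txt"])
client.handle_message([RESPONSE, 0, None, None])
```

If the response for a given `msgid` never arrives, that entry simply stays in `pending`, and anything waiting on it waits forever.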

In investigating the crash - the root cause of the problem is that we ask Neovim for something (run an Ex command, etc), and we expect a response... but we simply don't get anything. The code path impacted is here.

This happens intermittently, and there wasn't a clear set of repro steps - it would happen after some time using the editor. Occasionally, it would be immediate - even on launch - other times, it would happen later on. The worst kind of bug!

I was thinking through some potential fixes and additional investigation for this:

- Where in the pipeline is it breaking down? Is it a deadlock on Neovim's side? Or, more likely, is it a bug in our msgpack handling?
- As a workaround, what can we do to unblock?

But in the back of my mind - this wasn't the first bug we've had of this class - we've encountered issues in our msgpack handling before, like #284. Crashes like #296 manifest in our unpacking of msgpack payloads. In our current state, we prefer treating Neovim commands as synchronous - so to get a suitable abstraction, we sometimes 'pretend' calls are synchronous by spin-waiting until we get a result - certainly not ideal.
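
To illustrate the spin-waiting pattern, here is a rough sketch under my own assumptions: `spin_wait` and its `poll` callback are hypothetical names, and the timeout is added so the example terminates (the text above does not say whether the real code has one).

```python
import time


def spin_wait(poll, timeout_s=0.05, interval_s=0.001):
    """Call poll() repeatedly until it returns a non-None value.

    Raises TimeoutError if nothing arrives within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        value = poll()
        if value is not None:
            return value
        time.sleep(interval_s)
    raise TimeoutError("no response received")


# A one-slot "mailbox" standing in for the asynchronous response.
mailbox = ["normal"]
mode = spin_wait(lambda: mailbox[0])
```

The caller gets something that looks like a synchronous call, but it burns time polling, and if the response is lost the only choices are to hang or to give up.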

I could continue to investigate this issue, and find a one-time fix - but I felt it was important to step back and take a look at the bigger picture. I don't want to battle this layer to get a quality product. Why was this layer even necessary for what we are building? Is there a more holistic fix we could investigate?

This isn't the first time we've had challenges with this model in the broader sense... the asynchronicity of the RPC model was at the root of several complex interactions in Onivim 1:

- For v1, a feature called Typing Prediction was added to make typing in insert mode feel faster - as soon as a character is typed in insert mode, we'd render it - even before we got a response from Neovim across the RPC layer. This smooths over latency from the RPC round-trip... but it was tough to reliably map the 'input' we sent to Neovim with the result (in other words - to know when the prediction could be cleared). For example, there could be phantom characters left over. An ideal fix for this would be to just process the input quickly and immediately - and not need prediction.
- One of the slowest parts of the Onivim 1 codebase was the NeovimWindowManager - it would need to make several RPC calls in sequence to gather enough information about the window metrics to position certain kinds of overlays. This was a high-frequency code-path, being called after most user inputs. Gathering all these metrics was about building a function that lets us map a bufferPosition (line, column) to a screenPosition (characterX, characterY) (and pixelPosition). This was necessary to support some of the UI integrations that we wanted - but in the RPC model - was very expensive and not tenable in a performant way.
- We tried to implement features like auto-closing pairs in Onivim 1. However, the asynchronous behavior caused problems for us here too - the feature worked unreliably, because it required multiple RPC calls to facilitate the interop - and there was no guarantee other RPC calls - like input - couldn't come from elsewhere and blow things up.
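
To make the window-metrics item concrete, here is a simplified sketch of the bufferPosition to screenPosition mapping. The `WindowMetrics` fields are my own assumptions about what such a function needs, and the sketch ignores line wrapping, folds, and wide characters. The mapping itself is cheap; the cost in the RPC model comes from gathering the inputs.

```python
from typing import NamedTuple, Optional, Tuple


class WindowMetrics(NamedTuple):
    """Hypothetical snapshot of one window's layout."""
    top_line: int     # first buffer line visible in the window
    left_column: int  # first buffer column visible (horizontal scroll)
    height: int       # window height in character cells
    width: int        # window width in character cells
    screen_row: int   # row of the window's top-left cell on the screen
    screen_col: int   # column of the window's top-left cell on the screen


def buffer_to_screen(m: WindowMetrics, line: int, column: int) -> Optional[Tuple[int, int]]:
    """Map (line, column) to (characterX, characterY), or None if off-screen."""
    row = line - m.top_line
    col = column - m.left_column
    if not (0 <= row < m.height and 0 <= col < m.width):
        return None
    return (m.screen_col + col, m.screen_row + row)


metrics = WindowMetrics(top_line=100, left_column=0, height=40, width=80,
                        screen_row=1, screen_col=20)
position = buffer_to_screen(metrics, line=110, column=5)
```

With the metrics available in-process, this is a handful of integer operations per lookup, which is what a high-frequency code path needs.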

Part of the problem is that we try to strong-arm Neovim into a scenario it wasn't intended for - the ideal model for Neovim is to feed it input, and then it sends back the state of the screen via grid/redraw updates. However, Onivim 2 is unique among the Neovim GUIs in that it manages the entire view state itself - and that necessitates a tighter coupling with the internal state.

My Ideal World

Reflecting on this, I've settled on my ideal abstraction for working with Vim as the foundation of an editor - and that is to treat it as a state machine modeled by a pure, synchronous function, agnostic of any terminal dependencies. [1][2]

vim: (currentState, input) => newState

Vim, at its core, is simply a state machine - and each key-press moves it from one state to another. An 'i' key-press in normal mode switches to insert mode, etc. I'd like to be able to model it exactly that way - as a state machine that I can feed key-presses, and then get a new state. For example, after sending an 'i' key into the state machine, I should be able to immediately ask what mode we are in, what the current state of the buffer is, etc.
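
As a toy illustration of that shape (not libvim's API, and nowhere near real Vim behavior), here is a two-mode state machine written as a pure function. `VimState` and the `"<esc>"` key name are made up for this sketch.

```python
from functools import reduce
from typing import NamedTuple


class VimState(NamedTuple):
    mode: str = "normal"
    text: str = ""


def vim(state: VimState, key: str) -> VimState:
    """Pure transition function: (currentState, input) => newState."""
    if state.mode == "normal":
        if key == "i":
            return state._replace(mode="insert")
        return state  # every other normal-mode key is ignored in this toy
    if key == "<esc>":
        return state._replace(mode="normal")
    return state._replace(text=state.text + key)


after_i = vim(VimState(), "i")
final = reduce(vim, ["i", "h", "i", "<esc>"], VimState())
```

Right after feeding `"i"`, `after_i.mode` can be read directly; there is no round trip and nothing to wait for. Folding the whole key sequence through `vim` leaves the state in normal mode with the text `"hi"`.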

This model isn't doable with our current architecture. With Neovim, the RPC is inherently asynchronous. To introspect any state also relies on asynchronous calls - and these calls can add up in performance and complexity of code.

What if, instead, we simplified? Instead of RPC - why not just make direct synchronous function calls to interface with the 'vim engine'? Essentially - we could eliminate that entire failure point described above.

In other words, this architecture:

In this world - we'd link that pure, synchronous Vim function directly in the binary - removing the RPC cost and removing the added complexity of dealing with an asynchronous API. Calling C code via the C FFI in OCaml/Reason has relatively low overhead, and can be optimized to have almost no overhead. The core Onivim 2 editor would be a single process.

This has always been on my mind - this idea of integrating a libvim/libnvim directly in the executable. Initially, it was purely for performance reasons - the RPC calls tended to be fast on POSIX, but on Windows I saw a variance of ~4ms for these calls. If there is back-and-forth - this can add up to a significant cost and easily miss render deadlines. In addition, spinning up a process on Windows during the startup is expensive.

You might think, though, that trading an asynchronous model for a synchronous one could be detrimental for performance - but keep in mind that Vim and clients like gVim handle input synchronously, and are considered very fast. Modelling operations as asynchronous is only beneficial when they are not on the critical path for rendering - otherwise, it is just overhead. With Onivim 2's architecture, we have the possibility to even go beyond that - and tuck that synchronous input in a thread that is run in parallel with our rendering.
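
A minimal sketch of that idea, assuming a simple queue-based hand-off between threads (the real design may differ): the state thread applies each key synchronously and publishes a snapshot, and the consumer, standing in for the renderer, reads snapshots at its own pace. The names here are hypothetical.

```python
import queue
import threading


def run_state_thread(keys, snapshots):
    """Apply keys one at a time and publish a snapshot after each."""
    text = ""
    for key in iter(keys.get, None):  # None is the shutdown sentinel
        text += key                    # stand-in for the synchronous Vim update
        snapshots.put(text)            # strings are immutable, so sharing is safe
    snapshots.put(None)                # tell the consumer we are done


keys = queue.Queue()
snapshots = queue.Queue()
worker = threading.Thread(target=run_state_thread, args=(keys, snapshots))
worker.start()

for key in "abc":
    keys.put(key)
keys.put(None)
worker.join()

# The "render" side drains whatever snapshots were published.
rendered = list(iter(snapshots.get, None))
```

Each input is still processed synchronously and in order; only the hand-off to rendering is decoupled.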

However, that performance isn't even our bottleneck at the moment (rendering is - we need Skia!) - what's more important to me at this time is reducing complexity - simplifying our code and the surface area of potential problems. I even glossed over this in our MOTIVATION. It was something, though, that I thought could wait - I hadn't planned on taking on this work until much further down the road.

The fact that we had this intermittent crash necessitated revisiting this. So I started down this path. I wanted to use Neovim as the base for this work. I very much appreciate the work the Neovim team has done, and believe that several enhancements in Vim like terminal or jobs would not be here without their hard work. And purely aside from the technical aspects... they've been incredibly supportive of the Onivim 1/2 projects.

The build system we use for OCaml/Reason today, though, is rocky on Windows - it's a Cygwin environment with the MinGW cross-compiler toolchain. This can make building dependencies tricky, if they don't account for it correctly. And this was unfortunately the case for Neovim - I estimated it would take ~3 weeks to unblock the set of dependencies and get it building in that environment. Several of the needed dependencies (libuv, lua, etc) didn't build 'out-of-the-box', and also didn't handle that environment correctly. This is no fault of Neovim - its choice of modern C (C99) and modern dependencies is a good one - rather, it is a challenge of the toolchain we're using to build. It's the unfortunate reality that plain, dependency-light C code is easier to build cross-platform in our OCaml build environment, today.

Another difficulty with reconciling this synchronous, 'pure functional' Vim model with Neovim is that Neovim has an event loop at its core. In essence, when you send input to Neovim, it gets put on the event loop, and processed at some later point - this would need to be modified and refactored to fit that synchronous model. So even getting Neovim building isn't enough to fit that model - we'd also need to look at modifying the core application lifecycle to get the desired synchronous, functional API.

I decided to try building Vim proper instead - it ended up being very fast to get building in our toolchain (I built it on Windows with our Cygwin/MinGW environment in ~5 minutes). I also experimented with what it would take to get a forked Vim to follow this 'pure functional' model. It's a bit challenging, because Vim's model is based on blocking for user input - so there are several places we need to invert control flow to 'feed' it input instead.
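
To show what inverting the control flow means, here is a toy contrast. Neither half is Vim or libvim code, and `blocking_editor`, `FedEditor`, and the `"<esc>"` key name are invented for the example. In the first version the editor owns the loop and blocks on input; in the second the caller owns the loop and feeds keys in.

```python
def blocking_editor(read_key):
    """Blocking style: the editor owns the loop and waits for each key."""
    text = []
    while True:
        key = read_key()  # blocks until a key is available
        if key == "<esc>":
            return "".join(text)
        text.append(key)


class FedEditor:
    """Inverted style: the caller owns the loop and feeds keys one at a time."""

    def __init__(self):
        self.text = []
        self.done = False

    def input(self, key):
        if self.done:
            return
        if key == "<esc>":
            self.done = True
        else:
            self.text.append(key)


# Blocking version: hand it a source of keys and wait for it to finish.
key_source = iter(["a", "b", "<esc>"])
blocking_result = blocking_editor(lambda: next(key_source))

# Fed version: state can be inspected between any two keys.
editor = FedEditor()
editor.input("a")
editor.input("b")
midway = "".join(editor.text)
editor.input("<esc>")
```

The catch is that every local variable the blocking loop kept on its stack has to become explicit state in the fed version. In a large codebase with many nested input loops, that is where the refactoring work goes.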

The decision to switch gears wasn't taken lightly; it was necessitated by our architectural direction. I wanted to fix the crash in a robust way by switching to this 'pure-functional' Vim abstraction, and remove the complexity of asynchronous RPC to get editor state. Either route required refactoring - if I went with Neovim, I'd need to swap out the event loop. If I went with Vim, I'd need to refactor the blocking UI. There was work either way!

The deciding factor, in this case, was the build challenges - and to that end, I forked Vim and created a C library called libvim - as well as Reason bindings for it - reason-libvim. Documentation is sparse at the moment, but the libvim.h header file is a good place to start, and there is some simple documentation for reason-libvim here.

libvim is intended to be a minimal abstraction of Vim - a buffer-editing engine without any concern about terminals - it's the piece of Vim that handles commands and manipulates buffers (the state machine). Stuff like syntax highlighting, rendering, spellcheck, completion is meant to be left to the consumer (ie, Onivim 2). It gets us pretty close to the pure-functional model described above... (well, the 'state' is still global, so not quite pure... but closer...)

The idea of a minimal Vim abstraction could be useful for other scenarios besides Onivim 2 - it could potentially be the interface for a readline-style program to emulate Vim input, or make WASM builds easier to produce... I'd really like to get our Onivim 1 tutorials working on the web!

I always knew that forking Neovim/Vim would eventually be necessary; we would need to do this to maximize our VimL compatibility. For example, since Onivim 2 manages window splits - we'd like to forward ':vsp' calls to the front-end - doing this without forking is pretty hacky! However, I didn't expect to consider this before our MVP release.

Having direct access to Vim's C API opens up some exciting possibilities. We can leverage Vim's logic for line wrapping, or create hooks for indentation to call back into Reason code. Some things that were tricky in our current architecture, like getting search results to highlight, or bracket matching, are now just an API call away, like vimSearchGetMatchingPair. Down the road, we can integrate with other Vim features like signs and marks... lots of potential.

Every technical decision has trade-offs, and this is no exception. There were, of course, downsides to taking this on:

- We needed to switch from Neovim -> Vim.
- There is work and a bug tail in the Vim -> libvim refactoring - we will need help testing the builds! There can be bugs in the refactoring that still cause crashes, or hanging input in codepaths that haven't been modified to reflect this inverted control path. The plus side is these crashes tend to be reproducible and deterministic.
- It's a lot of work... Vim uses a blocking I/O model, so refactoring it to be 'fed' keys is not always straightforward.

The net result is that the downloadable builds are pushed out to the end of July while we stabilize this work. I'm getting close to having libvim-powered Onivim 2 in master - but I'd like to spend some time testing it before dropping builds.

But there are also upsides:

- We're removing a problematic layer from the code: the RPC/msgpack layer.
- We now have a synchronous model for input and state queries, which simplifies our logic.
- Potential for faster runtime performance (no overhead of RPC - instead we use direct C calls).
- Potential for faster startup (no need to spin up a separate process).
- No versioning issues with various versions of installed Neovim.
- Opens the door for better integration with some Vim features (signs, marks, search highlights, etc).

So certainly a wild ride the past couple weeks: learning the Vim codebase, creating libvim, and hooking it up to Onivim... but this foundational work is tremendously important for us to deliver on our vision of a fast, high-quality, modern Vim-based code editor: Onivim.

In the wake of this, I've created a few new projects:

- libvim - The core C abstraction of Vim. The documentation is limited, but libvim.h and test cases like normal_mode_curswant.c might help show how it can be used.
- reason-libvim - ReasonML bindings for libvim, which also has some cursory documentation.

And finally, the work being done to integrate reason-libvim into Onivim 2 is in PR 326.

I'm really sorry about the delay of builds - it's never fun to push a deadline back. In the meantime, though, if you want to follow our progress, you can build Onivim 2 from source. (and we'd love to have your help testing & stabilizing!)

As an FYI, the plan is, as before, to raise the pre-order price once the builds drop at the end of July; until then, we'll keep running the pay-what-you-want promotion - if you've pre-ordered or contributed any dollar amount to the project (via pre-order/Patreon/etc), you have a lifetime license. I'll update more as we get closer to having the builds.

Next steps for the project are:

- June: Stabilize and integrate libvim/reason-libvim into Onivim 2 master
- July: Stabilization + downloadable builds
- Aug - Oct: VSCode Extension Host integration push

Cheers & thank you for reading this! Hit us up on twitter, discord, or leave a comment here if you have any questions/feedback/ideas.

- Bryan

Notes:

[1] Note that having Vim modeled as synchronous doesn't necessitate the entire app being single-threaded. The plan for Onivim 2 is to have a 'state thread' and a 'render thread', where the vim commands would be run on the state thread, in parallel with other operations like rendering and syntax highlighting.
[2] The description above is simplified and doesn't account for side effects - like user messages, but those would need to be modeled as well (ie, via callbacks).