Lies I was told about collaborative editing, Part 2: Why we don't use Yjs (moment.dev)
271 points by antics 5 days ago | hide | past | favorite | 161 comments



I'm actually in the middle of rewriting the y-prosemirror binding with Kevin Jahns as we speak. We hope to address a number of the fundamental design choices that were made 6 years ago. I did a presentation on this at FOSDEM this year if anyone is interested in the specifics of the approach we are taking: https://fosdem.org/2026/schedule/event/8VKQXR-blocknote-yjs-...

I feel like this post is overly hyperbolic about choices that an open-source maintainer made years ago, when no one seemed to care enough to pay him to rewrite it.


I watched the presentation, but I'm getting the impression that you're still using a hierarchical XML-like structure for the document, which sounds like it still cannot merge or split nodes (without recreating the content after the merge or split point). Will issues like https://github.com/yjs/y-prosemirror/issues/205 be addressed by this? (Which not only breaks ProseMirror's position mapping, but also your own, and which I consider a serious enough bug that recommending y-prosemirror for anything feels unserious.)

Yes, we are still using XML to represent the document content. Representing the document as a flat list would require some research into something like Automerge's block markers https://automerge.org/docs/reference/under-the-hood/rich-tex... which we don't have the funding to look into right now, but hope to work towards in the future.

We will have better support for ProseMirror's position mapping, but I'm not confident that we will make any progress on this specific issue, since it requires a much larger refactor than what we are contracted for (suggestions & versioning). For selection mapping specifically, we can of course make an effort here to fix it, but the underlying limitations of the split node operations would still be there.


I believe that node-splitting in y-prosemirror can be solved.

But I hope that you also can appreciate that mapping prosemirror to a CRDT structure is a very complex thing to do. Your schema implementation and node-splitting behaviors are extremely hard to map to existing conflict resolution libraries.

Other editors, like Quill, map better to CRDT structures.

This is, of course, not your problem. But if you - or any editor author for that matter - ever end up creating another editor library, I'd love to work with you on adaptability to existing conflict resolution libraries. There is a lot to gain from being compatible with the wider ecosystem.

After all, we shouldn't expect our users to set up a tailor made backend for each single collaborative component.

Encrypted editing apps like Proton Mail Docs wouldn't be possible with your solution.


I'll be the first to admit that ProseMirror's change representation is a bit messy, but I don't think that's the root of the issue here. The problem is about generalized tree-structured document representation (as opposed to the flat-string model of Quill). I believe such a representation has value, and is the appropriate choice for a system like ProseMirror or its successor. Joining and splitting blocks is not a weird quirk of ProseMirror—it is a basic, essential editing operation. But there doesn't appear to be a known appropriate way to map that structure to a CRDT in a way that can properly express such splitting and joining. And that's fundamentally limiting the approach taken in y-prosemirror.

This is not a fundamental issue. Node splitting can be represented in CRDTs. It's just really hard to map correctly to ProseMirror as the merge logic is complex and bound to a schema.

I don't blame you for ProseMirror being how it is. I'm just offering my feedback for your next editor library.


But Kevin, you never really answered the question of the article. Unless I need a truly-masterless p2p topology, why would I do all this stuff, including throw away editor intent around things like block split, just to use Yjs? prosemirror-collab and prosemirror-collab-commit already seem to do all the things the Yjs docs claim to do (unbounded offline writes that reconcile automatically, optimistic updates, tolerant of all kinds of failures), and they work with 100% fidelity to the underlying model. AFAICT, the only thing that you need Yjs for, is true p2p editing.

This is a serious question, and the question of the article. I am here to learn what you mean, please explain.


Yjs is about making things easy. It is a good abstraction to make anything collaborative (not everyone can implement something like prosemirror-collab).

I'd take the slight performance overhead any day if I get guaranteed syncs. Network protocols are not as reliable as you think they are. Detecting random drops of messages is hard. At scale, you are going to appreciate the sync guarantees.

prosemirror-collab doesn't give you offline editing either. Because, guess what - if there is no central server you can't edit the same doc from multiple tabs.

I once had a customer that accidentally deleted part of their database containing Yjs docs. Few of his users noticed, because their docs synced through y-indexeddb.

And it's fun. You can run Yjs on anything. There is a company that syncs Ydocs through QR codes.

As a generic collab library, it does a very good job. CRDTs really are a fun thing to use. A lot of people feel that way.

If you want to use something else, that's totally fine! Write an article about how great prosemirror-collab is.


But Kevin, prosemirror-collab does give me offline editing. I use it literally every day, entirely without issue. I write offline on different devices, and whenever I come online, it all syncs up. No issues.

It does not give me p2p topology. Is that what you mean?


The y-prosemirror rewrite is super exciting!

To speak on yjs: We use yjs over at LegendKeeper. We're not a huge app, but our users do worldbuilding for D&D, and have amassed over 30 million collaborative documents, ranging from rich text to fantasy maps to fantasy timelines. Is yjs technically overkill when you have a central tie-breaker? Sure, but the DX is fantastic, and personally I love the idea of my application being truly local-first, even if our core value prop is not necessarily tied to being offline. It also gives me a legacy support plan for our users in case I ever get hit by a bus. :)

On the tech side, you save a lot of cognitive overhead when you can just do:

  applyUpdate(docA, update1)
  applyUpdate(docB, update1)

and now docA and docB are in the same state, no matter what the context. For centralization, adding in a "well, they'll converge once we add a third party" absolutely increases the cognitive complexity of reasoning about your code, and limits your ability to write clean tests. Centralization buys you a lot, too. I don't think one is correct over the other.
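The order-independence being described here can be sketched with a toy CRDT: a grow-only set standing in for Yjs's far richer types (the `GSet` class below is invented for illustration and is not the yjs API). Updates can arrive in any order, and even be delivered twice, and replicas still converge:

```javascript
// Toy stand-in for Yjs-style merge semantics (NOT the real yjs API):
// a grow-only set (G-Set) CRDT. Applying updates is commutative and
// idempotent, so replicas converge regardless of delivery order.
class GSet {
  constructor() { this.items = new Set(); }
  applyUpdate(update) { for (const x of update) this.items.add(x); }
  state() { return [...this.items].sort(); }
}

const docA = new GSet();
const docB = new GSet();
const update1 = ['hello'];
const update2 = ['world'];

docA.applyUpdate(update1); docA.applyUpdate(update2);
docB.applyUpdate(update2); docB.applyUpdate(update1); // reversed order
docB.applyUpdate(update1);                            // duplicate delivery

console.log(docA.state()); // [ 'hello', 'world' ]
console.log(docB.state()); // same state, no matter the context
```

A real Yjs update is of course far more than a set of elements, but the property that makes the testing story simple is exactly this one: merge order and duplicates don't matter.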

There are tradeoffs. There's a memory and CPU cost, and yes, sometimes the "technically merged state" of a y-prosemirror document is not what's expected. Over seven years and 150,000+ users, we've never had a single person complain about it.


Hi, the author of Yjs here. Thanks, Nick, for chiming in!

As this article is blowing up now, I want to address a few points.

I, too, feel the need for simplicity over overly complex solutions - and I found it in CRDTs. They beautifully allow me to reason about conflicts - so that my users don't have to. Very few people can design a custom conflict resolution algorithm for an application. Yjs is a general-purpose framework that enables you to make EVERYTHING collaborative. That's the goal.

It's fine if you want to explore different solutions. I don't understand the need to put down one framework in favor of another. It doesn't have to be "OT vs CRDT". Hey, if you found something that works for you - great! But let me tell you that neither solution magically makes everything simpler. There is still a lot to learn.

Different solutions to conflict resolution have different tradeoffs. It's unfortunate that the author of the article attributes all complexity to Yjs. It's just that collaborative editing is a very complex problem and requires a lot of attention to detail. In many regards, Yjs has done very well for the larger ecosystem. In other regards there is room for improvement.

The only thing I acknowledge from the article is the criticism about y-prosemirror "replacing the whole document". Unfortunately, the author extrapolated some false assumptions. This is not a performance issue. y-prosemirror runs at 60 fps even on large documents. It's like arguing React is slow because it replaces the whole document with every edit. We leverage ProseMirror's behavior to do identity checks on the nodes before updating the DOM. However, it's true that this breaks positions for some plugins (e.g. a comment plugin).

Instead of Prosemirror positions, we encourage plugins to use Yjs-based positions, which are more accurate in case of conflicts. Marijn talks about this as well [1]. The collab implementation in Prosemirror does not guarantee that positions always converge. That means, comments could end up in different places for different collaborators. This works in most cases, but in some it doesn't - which is one of the reasons why I prefer CRDTs as a framework to think about conflicts.
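The idea behind anchoring to Yjs-based positions instead of integer offsets can be shown with a toy model (invented for illustration; the real API is yjs's `createRelativePositionFromTypeIndex` / `createAbsolutePositionFromRelativePosition`). Each character carries a stable identity, and a comment anchors to that identity rather than to an index:

```javascript
// Toy sketch of identity-based positions (NOT the yjs API).
// A plain integer index breaks as soon as a concurrent edit shifts
// content; anchoring to a stable per-character id survives it.
let nextId = 0;
const chars = [...'hello world'].map(ch => ({ id: nextId++, ch }));

// Anchor a comment to the "w" of "world" by identity, not by index 6.
const comment = { anchorId: chars[6].id, text: 'nice word' };

// A concurrent edit inserts "big " at the front of "world",
// shifting every subsequent index by 4.
chars.splice(6, 0, ...[...'big '].map(ch => ({ id: nextId++, ch })));

// Resolving by id still finds the right character; index 6 alone would
// now point into the middle of "big ".
const resolved = chars.findIndex(c => c.id === comment.anchorId);
console.log(resolved); // 10
```

Real CRDT positions also have to handle the anchor character being deleted, which is where tombstones and left/right origins come in; the sketch above ignores that.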

But as Nick said, we are currently working on a new y-prosemirror binding that works better with existing plugins.

I'm curious about the section "CRDTs are much, much harder to debug" which ironically talks about how hard prosemirror-collab is to debug. You won't find any such bugs in Yjs. The conflict-resolution algorithm is quite simple and has been battle tested. Before every release, Yjs undergoes extensive fuzz testing for hours in simulated scenarios. I'm very happy to show anyone how to debug a CRDT. It requires some background information, but it ultimately is easier.

To address another unfounded claim by the author: I bet OP $1000 that the GC algorithm in Yjs is correct even in offline-editing scenarios. He won't be able to reproduce the issues he is talking about.

[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...


> I don't understand the need to put down one framework in favor of another.

I didn't take the article as "putting down Yjs", just suggesting it's not the best solution for ProseMirror-backed product use cases patterned after their own.

> y-prosemirror runs at 60 fps even on large documents

Did the OP claim Yjs was slow? Have you created a ProseMirror-backed product of the complexity of Confluence's editor with 16ms frame time targets? The challenge isn't the collab algorithm as much as the CPU time of the plugins, "smart" nodes, and other downstream work triggered by updates. It's incredibly useful to have control over the granularity of updates, and that is IMHO easier when dealing more closely with ProseMirror steps and transactions.

> I bet OP $1000 that the GC algorithm in Yjs is correct even in offline-editing scenarios.

Did the OP claim the Yjs algorithm is incorrect?

> You won't find any such bugs in Yjs. The conflict-resolution algorithm..

I didn't take the debugging section to indicate an issue with Yjs convergence. Nor do most people encounter "bugs" with prosemirror-collab; last update 3 years ago? The debugging challenge is typically around "how did we arrive at this document state" (what steps got us here, where did they come from) and how steps/updates interact with plugins. IMHO discarding the original steps and dealing with a different unit of change complicates that greatly. Especially when dealing with a product in production and a customer's broken document that needs to be fixed and root-caused.


Hi Kevin, author here, are you sure we are on the same page about what I'm saying in this article?

For example, the debugging section is NOT about debugging problems inside Yjs, it's about debugging one's own bugs while one is using something like Yjs or prosemirror-collab. I'm normally a happy gambler but I actually think the Yjs GC algorithm probably does do what it says... indeed, my complaint is that I specifically do not want that behavior. :)

Likewise, the point I'm making about performance is NOT that Yjs is "slow". I'm saying that it's empirically very challenging to meet the 16ms perf budget even in the simplest possible realistic collab scenario... and that because of this, it is (1) very unappealing and (2) in our experience, very difficult to attempt when you also have to do a pile of extra things unrelated to the task at hand, like translating `Transaction`s to and from operations on an XML doc, and dealing with all the consequences of messed-up positions passed to plugins. I do understand you have your own benchmarks that give you the confidence to claim (without qualifications) that y-prosemirror "runs at 60fps". Are you really not curious why we think that's not the case?

If we can't get to a shared understanding of what is being said here, it's going to be very hard to talk about it at all. And on the two material points at stake in your response, I believe you have the precise opposite understanding of what was written. I'm happy to keep discussing, but it feels like we're starting from scratch after this response, again.


Am I correctly understanding that you (Moment) have chosen to use Prosemirror and that with that, using Yjs was the hard part? Or did you mean to say in the article that you used Yjs directly? It would be less prone to misunderstanding if it read "why we don't use y-prosemirror", though you would lose a lot of potential audience for the post.

I tried to understand what was wrong in Yjs, as I'm using it myself, but your point is not really with Yjs it seems but with how the interaction with Prosemirror goes in your use case. I can see why you're bringing up your points against Yjs, and I'm having a hard time understanding why you don't consider alternatives to Prosemirror directly. Put another way, "because this integration was bad the source system must also be bad". I do not condone this part of your article. Seems like a sunk cost fallacy to me, and reasoning about it at another's expense, but perhaps not. Hoping to hear back from you.


So, we are basically making two points.

First, the fact that Yjs bindings, by design (for ~6 years), replace the entire document does, in my opinion, indicate a fundamental misunderstanding of what rich text editors need to perform well in any circumstance, not just collaborative ones. As I say in the article... I hope to be able to write another article saying that this has changed and they now "get" it, but for now I do not think it is appropriate to trust Yjs with this task for production-grade editors. I'm sorry to write this, but I do think it's true! I'm not trying to bag on anyone!

Second, and more material: to deploy Yjs to production in the centralized case, I think you are very much swimming against the current of its architecture. Just one example is permissions. There is no established way to determine which peers in a truly-p2p architecture have permission to add comments vs edit, so you will end up using a centralized server for that. But that's not free; CRDTs are mechanically much more complicated! For example, you have to figure out how to disallow a user from making mark-only edits if they have "commenter" access, but allow editing the whole doc for "editor" access. This is trivial in `prosemirror-collab` (say) but it's very hard in Yjs because you have to map it "through" their XML transformations model.

I'm happy to talk more about this if it's helpful. But yes, we are trying to say some stuff about Yjs specifically, and some stuff about CRDTs generally.


You misunderstand how the "document replacement" in y-prosemirror works. It's like arguing that React is bad because it performs a complete document replacement on every change. The diffing part makes it fast.

That said, it's not without problems - I acknowledge that. But it's not as bad as you make it sound. You didn't list one concrete case where you had issues with it.

I'm very happy that Nick and I finally found funding to make a rewrite happen. It really did take 6 years to make this happen, because it's hard to find funding for open source projects.


Just tying up loose ends here, in this other comment I suggest that Kevin and I are not on the same page about the point I was making in the article: https://news.ycombinator.com/item?id=47422455

I will additionally note that I'm not making any point about performance in this comment either, though. "Perform" in this context is not about 60fps, it's about whether, like, plugins work.


FWIW, I'm literally working on rewriting the y-prosemirror binding today with Kevin Jahns, the creator of Yjs, who wrote the initial binding. Yes, the current binding has its flaws, but we hope to flush out the most egregious of them with a completely different design, which I made a presentation about at FOSDEM this year: https://fosdem.org/2026/schedule/event/8VKQXR-blocknote-yjs-...

Thanks Nick. I am aware. The blog post directly links to the PR you merged some days ago that (as I understand it) kicks off the effort. I also mention specifically I know you're working on it.

https://github.com/disarticulate/y-webrtc/blob/master/src/y-... has a validMessage function passed into the room. This allows you to validate any update and reject it. It might be "costly", but it lets you inspect the next object. Since Yjs doesn't care about order of operations, it doesn't really matter how long validation takes.

Not sure what the error conditions look like, but you could probably bootstrap message hashes in a metadata array in the object, along with encryption signatures to prevent unwanted updates to objects.


Just use OT like normal people, it’s been proven to work. No tombstones, no infinite storage requirements or forced “compaction”, fairly easy to debug, algorithm is moderate to complex but there are reference open source implementations to cross check against. You need a server for OT but you’re always going to have a server anyway, one extra websocket won’t hurt you. We regularly have 30-50k websockets connected at a time. CRDTs are a meme and are not for serious applications.

Author here, I did not specifically mention OT in the article, since our main focus was to help people understand the downsides of the currently-most-popular system, which is built on CRDTs.

BUT, since you mention it, I'll say a bit here. It sounds like you have your own experience, and we'd love to hear about that. But OUR experience was: (1) we found (contrary to popular belief) that OT actually does not require a centralized server, (2) we found it to be harder to implement OT exactly right vs CRDTs, and (3) we found many (though not all) of the problems that CRDTs have are also problems in practice for OT - although in fairness to OT, we think the problems CRDTs have in general are vastly worse for the end-user experience.

If there's interest I'm happy to write a similar article entirely dedicated to OT. But, for (3), as intuition, we found a lot of the problems that both CRDTs and OT have seem to arise from a fundamental impedance mismatch between the in-memory representation of the state of a modern editor, and the representation that is actually synchronized. That is, when you apply an op (CRDT) or a transform (OT), you have to transform the change into a (to use ProseMirror as an example) valid `Transaction` on an `EditorState`. This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.

With all of that said, OT is definitely much closer to what modern editors need, in my opinion at least. The less-well-known algorithm we ended up recommending here (which I will call "Marijn Collab", after its author) is essentially a very lightweight OT, without the "transformation" step.


I feel we were quite lucky back in 2015 when we started rewriting CKEditor with RTC as a core requirement. At the time, OT seemed like the only viable option, so the choice was simple :)

What definitely helped too was having a very specific use case (rich-text editing) which guided many of our decisions. We focused heavily on getting the user experience right for common editing scenarios. And I fully agree that it's not just about conflict resolution, but also things like preserving position mappings. All these mechanisms need to work together for the experience to make sense to the end user.

This is an older piece (from 2018), but we shared more details about our approach here: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri...

One clear issue with OT that we still face today is its complexity. It's nearly 8 years since we launched it, and we still occasionally run into bugs in OT – even though it sits at the very core of our engine. I remember seeing a similar comment from the Google Docs team :D


> (1) we found (contrary to popular belief) that OT actually does not require a centralized server

In theory, yes, but in practice, any OT that operates without a central server (or master peer) essentially ends up being a CRDT. A CRDT is a subset of OT, specifically one that adds the requirement of P2P support.

> (2) we found it to be harder to implement OT exactly right vs CRDTs

I would say that each has its own complexity in different areas. CRDT's complexity lies in its data structure and algorithm, while OT's lies in its sync engine (since it must handle race conditions and guarantee deterministic ordering). In my opinion, OT is simpler overall. Hopefully DocNode and DocSync will make OT even easier.

> (3) we found many (though not all) of the problems that CRDTs have, are also problems in practice for OT

Oh, definitely not! OT has many benefits[1]. I think the misconception stems from the common belief that OT should be positional, rather than id-based. In the first case, operations are transformed on other operations. In the second case, operations can also be transformed on the current document (O(1)), eliminating the problems commonly associated with OT. This is the approach I use in DocNode.

> the problems CRDTs have in general are vastly worse to the end-user experience.

This is 100% correct.

____

[1] https://www.docukit.dev/docnode#how-does-it-compare-to-yjs


Having a central server is not necessary, but we have one anyway and we use it, especially since we have a permissions system. It lets us use the "Google Wave" algorithm, which vastly simplifies things.

https://svn.apache.org/repos/asf/incubator/wave/whitepapers/...

> This is not always easy in either case, and to do it right you might have to think very hard about things like "how to preserve position mappings," and other parts of editor state that are crucial to (say) plugins that manage locations of comment marks or presence cursors.

Maintaining text editor state is normal. Yes you do need to convert the OT messages into whatever diff format your editor requires (and back), but that's the standard glue code.

The nice thing about OT is that you can just feed the positions of marks into the OT algorithm to get the new positional value. Worst case, you just have the server send the server side position when sending the OT event and the client just displays the server side position.


Josh eloquently explains how Google Wave's DACP (Distributed Application Canceling Protocol) works:

https://www.youtube.com/watch?v=4Z4RKRLaSug


I always mentally slotted prosemirror-collab/your recommended solution in the OT category. What’s the difference between the “rebase” step and the “transformation” step you’re saying it doesn’t need?

Great question. Matt has a comment about this here, and he has an actual PhD on the subject! So rather than doing a worse job explaining I will leave it to him to explain: https://news.ycombinator.com/user?id=mweidner

One way to minimize impedance mismatch is to work with DOM-like or JSON-like structures mostly immune to transient bugs, which I am doing currently in the librdx project. It has full-CRDT RDX format[1] and essentially-JSON BASON[2] format. It does not solve all the problems, more like the set of problems is different. On the good side, it is really difficult to break. On the bad side, it lacks some of the rigor (esp BASON) that mature CRDT models have. But, those models are way more complex and, most likely, will have mismatching bugs in different implementations. No free lunch.

[1]: https://github.com/gritzko/librdx/tree/master/rdx [2]: https://github.com/gritzko/librdx/tree/master/json


> The less-well-known algorithm we ended up recommending here (which I will call "Marijn Collab", after its author) is essentially a very lightweight OT, without the "transformation" step.

You talk about "rebasing" changes in the article. Does that not imply a "transformation" step?


The rebasing step is indeed a transformation. Some info in the "rebasing" link here [1].

Unlike traditional Operational Transformation, though, there are no "transformation properties" [2] that this rebasing needs to satisfy. (Normally a central-server OT would need to satisfy TP1, or else users may end up in inconsistent states.) Instead, the rebased operations just need to "make sense" to users, i.e., be a reasonable way to apply your original edit to a slightly-further-ahead state. ProseMirror has this sort of rebasing built in, via its step mappings, which lets the collaboration-specific parts of the algorithm look very simple - perhaps deceptively so.

[1] https://prosemirror.net/docs/guide/#collab [2] https://en.wikipedia.org/wiki/Operational_transformation#Tra...
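A drastically simplified sketch of what such a step mapping computes (the function below is invented for illustration and is NOT the prosemirror-transform `StepMap` API): a mapping answers "my edit targeted position p in the old document; where is that position after someone else's step was applied?"

```javascript
// Simplified position mapping in the spirit of ProseMirror's step maps.
// Only handles a single insertion; real step maps cover deletions,
// replacements, and bias (assoc) at the boundary.
function mapThroughInsert(pos, insertAt, insertLen) {
  // Positions at or after the insertion point shift right by the
  // inserted length; earlier positions are unaffected.
  return pos >= insertAt ? pos + insertLen : pos;
}

// Doc: "hello world". My concurrent edit targets position 6 ("world").
// A collaborator's step inserts "big " (4 chars) at position 6 first.
const rebased = mapThroughInsert(6, 6, 4);
console.log(rebased); // 10, my edit now lands after "big "

// Positions before the insertion are untouched.
console.log(mapThroughInsert(3, 6, 4)); // 3
```

Rebasing a whole local edit then amounts to mapping each of its positions through every step it hasn't seen, which is why ProseMirror can make the collaboration-specific code look so small.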


Author here, just chiming in to say that Matt has an actual PhD on the subject so rather than explain it worse, I will just let him say the probably-actually-correct thing here.

> CRDTs are a meme and are not for serious applications.

You don't think Figma is a serious application?

By all means, use OT. I worked on OT software for many years - and my work on OT types, ShareJS and ShareDB is still in production all over the place. But I don't think there's anything you can do with OT that you can't do just as well with CRDTs.

The only real benefit of OT is that it's simpler to reason about. Maybe that's enough.


> You don't think Figma is a serious application?

I don't know where this popular belief came from. The Figma blog literally says "Figma isn't using true CRDTs"[1].

> The only real benefit of OT is that it's simpler to reason about.

That's incorrect. When you free yourself from the P2P restriction that CRDTs are subject to, there's a huge amount of metadata you can get rid of, just to mention one benefit.

[1] https://www.figma.com/blog/how-figmas-multiplayer-technology...


> I don't know where this popular belief came from.

It may be worth reading the whole paragraph of the blog you referenced...

> Figma isn't using true CRDTs though. CRDTs are designed for decentralized systems where there is no single central authority to decide what the final state should be. There is some unavoidable performance and memory overhead with doing this. Since Figma is centralized (our server is the central authority), we can simplify our system by removing this extra overhead and benefit from a faster and leaner implementation.

> It’s also worth noting that Figma's data structure isn't a single CRDT. Instead it's inspired by multiple separate CRDTs and uses them in combination to create the final data structure that represents a Figma document (described below).

So, it's multiple CRDTs, not just one. And they've made some optimizations on top, but that doesn't make it not a CRDT?


Nowhere does it say it's multiple CRDTs. It says "isn't a single CRDT" and that "it's inspired by multiple separate CRDTs." A bit confusing, I agree.

By the way, I work at Figma.


> When you free yourself from the P2P restriction that CRDTs are subject to...

CRDT stands for "conflict-free replicated data type". They're just any data type with an idempotent, commutative merge function defined over the entire range of inputs. Technically, CRDTs have nothing to do with networks at all.

The simplest "real" CRDT is integers and the MAX function. Eg, you think of an integer. I think of an integer. We merge by taking the max of our integers. This is a CRDT. Would it have eventual consistency? Yes! Would it work in a server-to-client setup? Of course! Does it need any metadata at all? Nope! It doesn't even need versions.

There's no such thing as a "P2P restriction". Anything that works p2p also works server-to-client. You can always treat the server and client as peers.

> there's a huge amount of metadata you can get rid of, just to mention one benefit.

Can you give some examples of this metadata? In my experience (10-15 years with this stuff now), I've found you can get the same or better performance out of CRDTs and OT based systems if you're willing to make the same set of tradeoffs.

For example, OT has a tradeoff where you can discard old operations. The cost of doing so is that you can no longer merge old changes. But, you can do exactly the same thing in CRDTs, with the same cost and same benefits. Yjs calls this "garbage collection".

In my own eg-walker algorithm, there are three different ways you can do this. You can throw away old operations like in OT based systems - with the same cost and benefit. You can keep old metadata but throw away old data. This lets you still merge, but you can't see old versions. Or you can keep old edits on the server and only lazy-load them on the client. Clients are small and fast, and you have full change history.

CRDTs generally give you more options. But more options = more complexity.

I'm no p2p idealist. Central servers definitely make some things easier, like access control. But CRDTs still work great in a centralised context.


Ok, replace "P2P restriction" with "idempotent, commutative restriction".

> For example, OT has a tradeoff where you can discard old operations. The cost of doing so is that you can no longer merge old changes.

Why wouldn’t you be able to? My server receives operations, applies them to the document, and discards them. It can receive operations as old as it wants.

___

> Can you give some examples of this metadata?

Yes, it depends on the CRDT, but if we're talking about lists or tree structures with insert and delete operations, these can come in the form of tombstones, or operation logs, or originRight or originLeft, or a DAG. Even with a garbage collector, the CRDT needs to retain some of this metadata.

Yes, you can optimize by not bringing it into memory when it’s not needed. But they’re still there, even though they could be avoided entirely if you assume a central server that guarantees a deterministic ordering of operations.


> Why wouldn’t you be able to? My server receives operations, applies them to the document, and discards them. It can receive operations as old as it wants.

OT needs to do operation transformation. If you get passed old operations, you need to transform them before they can be applied. This requires keeping extra data around. I mean, this is entirely the point of the metadata you describe in the second part of your comment:

> these can come in the form of thombstones, or operation logs, or originRight or originLeft, or DAG. Even with a garbage collector, the CRDT needs to retain some of this metadata.

> But they’re still there, even though they could be avoided entirely if you assume a central server that guarantees a deterministic ordering of operations.

A central server doesn't remove the need for this data though! Let's assume a completely deterministic ordering of operations on the server, like Jupiter. If the server is at version 1000, and I send the server an operation at version 900, the server needs to use information about operations 900-1000 to be able to apply the change. This is true in Jupiter OT - which uses the actual operations. Or in Yjs or Diamond Types or any other collaborative text editing system. We need some of that information to figure out how to transform the incoming change and order it correctly.

At least, I've never seen or heard of any scheme which doesn't need some data like this.
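As a toy illustration of that point (plain character inserts only, nothing like a real OT implementation), here is a sketch of a server transforming an op generated against an older version:

```typescript
// Toy illustration: a server must retain recent ops so it can
// transform an incoming op that was generated at an older version.
type Insert = { pos: number; ch: string };

// Shift an insert position past concurrent inserts the server already
// applied (ops the sending client had not yet seen).
function transformInsert(op: Insert, applied: Insert[]): Insert {
  let pos = op.pos;
  for (const prior of applied) {
    if (prior.pos <= pos) pos += prior.ch.length;
  }
  return { pos, ch: op.ch };
}

// History the server must keep around to accept the stale op:
const history: Insert[] = [{ pos: 0, ch: "x" }, { pos: 5, ch: "y" }];
const incoming: Insert = { pos: 3, ch: "!" };
const transformed = transformInsert(incoming, history);
// Shifted past the concurrent insert at pos 0, but not the one at pos 5.
```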


> A central server doesn't remove the need for this data though!

Yes, it's possible. The problem is that we're using different definitions of what OT means. This conversation has converged on the same point we started in another thread; I suggest we continue it there:

https://news.ycombinator.com/item?id=47436249


Are there any major libraries for OT? I've been looking into this recently for a project at work, and OT would be completely sufficient for our use case, and does look simpler overall, but from what I could tell, we'd need to write a lot of stuff ourselves. The only vaguely active-looking project in JS at least seems to be DocNode (https://www.docukit.dev/docnode), and that looks very cool but also very early days.

Author here. I think it depends what you're doing! OT is a true distributed systems algorithm and to my knowledge there are no projects that implement true, distributed OT with strong support for modern rich text editor SDKs like ProseMirror. ShareJS, for example, is abandoned, and predates most modern editors.

If you are using a centralized server and ProseMirror, there are several OT and pseudo-OT implementations. Most popularly, there is prosemirror-collab[4], which is basically "OT without the stuff you don't need with an authoritative source for documents." Practically speaking that means "OT without T", but because it does not transform the ops to be order-independent, it has an extra step on conflict where the user has to rebase changes and re-submit. This can cause minor edit starvation of less-connected clients. prosemirror-collab-commit[5] fixes this by performing the rebasing on the server... so it's still "OT without the T", but also with an authoritative conflict resolution pseudo-T at the end. I personally recommend prosemirror-collab-commit, it's what we use, and it's extremely fast and predictable.

If you just want something pedagogically helpful, the blessed upstream collaborative editing solution for CodeMirror is OT. See the author's blog post[1], the @codemirror/collab package[2], and the live demo[3]. In general this implementation is quite good and worth reading if you are interested in this kind of thing. ShareJS and OTTypes are both very readable and very good, although we found them very challenging to adopt in a real-world ProseMirror-based editor.

[1]: https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...

[2]: https://codemirror.net/docs/ref/#collab

[3]: https://codemirror.net/examples/collab/

[4]: https://github.com/ProseMirror/prosemirror-collab

[5]: https://github.com/stepwisehq/prosemirror-collab-commit
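For a rough feel of the rebase-and-resubmit loop described above, here is an illustrative sketch. Names like `tryCommit` are made up for this example, not the prosemirror-collab API:

```typescript
// Illustrative rebase-and-resubmit loop: the server only accepts commits
// based on its current version. Names are invented, not prosemirror-collab.
type Step = { pos: number; ch: string };

interface CollabServer { version: number; log: Step[] }

function tryCommit(
  server: CollabServer,
  baseVersion: number,
  steps: Step[]
): { ok: true } | { ok: false; missing: Step[] } {
  if (baseVersion !== server.version) {
    // Stale base: the client must pull the missing steps, rebase its
    // local steps over them, and resubmit.
    return { ok: false, missing: server.log.slice(baseVersion) };
  }
  server.log.push(...steps);
  server.version += steps.length;
  return { ok: true };
}

const server: CollabServer = { version: 2, log: [{ pos: 0, ch: "a" }, { pos: 1, ch: "b" }] };
const rejected = tryCommit(server, 1, [{ pos: 2, ch: "c" }]); // stale base, must rebase
const accepted = tryCommit(server, 2, [{ pos: 2, ch: "c" }]); // up to date, accepted
```

The "commit" variant moves the rebasing into `tryCommit` itself instead of bouncing the client, which is what avoids the edit-starvation problem.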


When I was starting my research into collaborative editing as a PhD student 20+ years ago, rebase-and-resubmit was well known. It was used in one Microsoft team collab product (I forgot the name). It is a 100% legit algo, except intermittently-connected clients may face challenges (screw them then).

Unless you have to support some complicated scenarios, it will work. I believe Google Docs initially used something of the sort (diff-match-patch based). It annoyed users with alerts ("let's rebase your changes"), esp on bad WiFi. So they borrowed proper OT from Google Wave and lived happily since (not really).

One way to think about it: how many users will your product have and how strange your data races / corner cases can get. At Google's scale, 0.1% users complaining is a huge shit storm. For others, that is one crazy guy in the channel, no biggie. It all depends.

TLDR: people invented OT/CRDT for a reason.


First of all, thanks for chiming in! I wish someone would collect stuff like this and write it down in some sort of "oral history of collab editing."

Second of all, I actually think we're more aligned than it seems here. What we're really advocating for is being super clear about what your end-user goals are, and deriving technology decisions from them, instead of the reverse. Our goals for this technology are (1) users should be able to predict what happens to their data, (2) the editor always runs at 60fps, and (3) we are reasonably tolerant of transient periods of disconnection (up to, say, 30s-1m).

Because of (1) in particular, a lot of our evaluation was focused on understanding which situations users would be unable to predict what was going to happen to their data. This is only our own experience, but what we found (and the impetus for part 1 of this series) is that almost 100% of the time, when there is a direct editing conflict, users interpret the results of the dominant CRDT and OT implementations as silently corrupting their data. So, the name of the game is to decrease the likelihood of direct editing conflicts, e.g. presence carets in the live-collab case. In particular, we did not notice a meaningful difference between how users view reconciliations of OT and CRDT implementations.

Since our users could not tell the difference, and in fact viewed all options as equally bad ("disastrous" as one user said), this freed us up to consider a much broader class of algorithms, including prosemirror-collab and prosemirror-collab-commit.

I know there is a rich history of why OT is OT, but our final determination was made pretty simple by the fact that, in our view, the majority of race conditions come from the difficulty of integrating CRDTs and OT directly into modern editing stacks, like ProseMirror. As far as I am aware, prosemirror-collab-commit behaves as well as or better on every dimension than, say, an OTTypes implementation would... and mostly that is because it is native to the expressive `Transaction` model of the modern editor. If we had to do interop I think we would have shipped something noticeably worse, and much slower.

If you have a different experience I would love to hear about it, as we are perennially in the market for new ideas here.


> I wish someone would collect stuff like this and write it down in some sort of "oral history of collab editing."

I'd be very happy to contribute to this if someone wanted to do some storytelling.

I also interviewed Kevin Jahns, the author of Yjs several years ago to get his take on how Yjs works and how he thinks about it all [1]. The conversation was long but I very much enjoyed it.

> This is only our own experience, but what we found (and the impetus for part 1 of this series) is that almost 100% of the time, when there is a direct editing conflict, users interpret the results of the dominant CRDT and OT implementations as silently corrupting their data.

That's not been my experience. Have edits been visible in realtime? I think about it as if there's essentially 2 use cases where collaborative editing shows up:

1. Realtime collab editing. So long as both users' cursors are visible on screen at the same time, users are often hesitant to type at the same place & same time anyway. And if any problems happen, they will simply fix them.

2. Offline (async) collab editing. Eg, editing a project in git. In this case, I think we really want conflict markers & conflict ranges. I've been saying this for years hoping someone implements conflicts within a CRDT, but as far as I know, nobody has done it yet. CRDTs have strictly more information than git does about what has changed. It would be very doable for a CRDT to support the sort of conflict merging that git can do. But, nobody has implemented it yet.

[1] https://www.youtube.com/watch?v=0l5XgnQ6rB4


Hi Joseph! Good to see you here.

Sorry, I see why this was confusing. What I'm saying is that because users perceive the results of OT, CRDTs, prosemirror-collab, etc. as data corruption, they require presence carets as a UI affordance, to steer them away from direct edit conflicts.

If you can't have presence carets, yes... likely best to have a diff view. And in our case, we use git/jj for this, rather than CRDTs.

For the "history of" stuff... in spite of the fact that we have our disagreements, I actually think it would be very nice to have Kevin and everyone else on the record long-form talking about this. Just because I am a non-believer doesn't mean it wasn't worth trying!


> For the "history of" stuff... in spite of the fact that we have our disagreements, I actually think it would be very nice to have Kevin and everyone else on the record long-form talking about this. Just because I am a non-believer doesn't mean it wasn't worth trying!

I'd enjoy that too. Hit me up if you wanna chat about this stuff - I'd enjoy the opportunity to convince you over video. Maybe even recorded, if people are interested in that.


> Maybe even recorded, if people are interested in that.

I would be. IIRC you interviewed the Yjs creator years ago in a YT video I watched? This post has been fun, spicy discourse notwithstanding. It's not often that a lot of the people in the collab space are together in the same spot, and the clash of academics and product builders is valuable.

As an aside I'd put Marijn on the product builder side. A lot of people dabble in collab algorithms and have a hobby editor on the side, but he created and maintains the most popular rich text editor framework on the planet (made-up statistic, but it seems true!).


lol ok we can talk and record it. I want to make it clear that this is NOT a debate, though. I actually want to be convinced!

In our case, we're not using a text editor, but instead building a spreadsheet, so a lot of these collab-built-into-an-editor solutions are, like you say, pedagogically useful but less helpful as direct building blocks that we can just pull in and use. But the advice is very useful, thank you!

Interesting! I am building a spreadsheet and the next few months will be building the collaborative side of it. I think many of the things that work for text don't necessarily translate for spreadsheets.

We made a spreadsheet on top of OT several years ago. Most OT related documentation doesn't talk about how to do this. But it worked pretty well for us.

Cheers for plugging prosemirror-collab-commit! Nice to see it's getting used more.


Author of DocNode here. Yes, it’s still early days. But it’s a very robust library that I don’t expect will go through many breaking changes. It has been developed privately for over 2 years and has 100% test coverage. Additionally, each test uses a wrapper to validate things like operation reversibility, consistency across different replicas, etc.

DocSync, which is the sync engine mainly designed with DocNode in mind, I would say is a bit less mature.

I’d love it if you could take a look and see if there’s anything that doesn’t convince you. I’ll be happy to answer any questions.


I've looked through the site, and right now it's probably the thing I'd try out first, but my main concern is the missing documentation, particularly the more cookbook-y kinds of documentation — how you might achieve such-and-such effect, etc. For example, the sync example is very terse, although I can understand why you'd like to encourage people to use the more robust, paid-for solution! Also, general advice on how to use DocNode effectively from your experience would be useful: things like schema design or notes about how each operation works and when to prefer one kind of operation or structure over another.

All that said, I feel like the documentation has improved since the last time I looked, and I suspect a lot of the finer details come with community and experience.


Thanks! I've recently made some improvements to the documentation. I agree the synchronization section could be improved more. I'll keep your feedback in mind. If you'd like to try the library, feel free to ask me anything on Discord and I'll help you.

Agreed. In my limited experience, conflict resolution rules are very domain specific, whereas CRDTs encourage a lazy attitude that "if it's associative and commutative it must be correct".

What is OT?


What does OT stand for so I can learn more?

Operational Transformation

"CRDTs are a meme and are not for serious applications."

That is one hot take!


Let's balance the discussion a bit.

https://josephg.com/blog/crdts-are-the-future/


Great link! Thanks for sharing.

I remember reading Part 1 back in the day, and this is also an excellent article.

I’ve spent 3+ years fighting the same problems while building DocNode and DocSync, two libraries that do exactly what you describe.

DocSync is a client-server library that synchronizes documents of any type (Yjs, Loro, Automerge, DocNode) while guaranteeing that all clients apply operations in the same order. It’s a lot more than 40 lines because it handles many things beyond what’s described here. For example:

It’s local-first, which means you have to handle race conditions.

Multi-tab synchronization works via BroadcastChannel even offline, which is another source of race conditions that needs to be controlled.

DocNode is an alternative to Yjs, but with all the simplicity that comes from assuming a central server. No tombstones, no metadata, no vector clock diffing, supports move operations, etc.

I think you might find them interesting. Take a look at https://docukit.dev and let me know what you think.


Hello again Germán! Since the product we make is, basically, a local-first markdown file editor, I would humbly suggest that the less-well-known algorithm we recommend is thus also local-first. But, I fully believe that you do a ton of stuff that we don't, and if we had known about it at the time, we very definitely would have taken a close look! We did not set out to do this ourselves, it just kind of ended up that way.

Cool! We also build client-server sync for our local-first CMS: https://github.com/valbuild/val Just like your DocSync, it has to both guarantee order and sync to multiple types of servers (your own computer for local dev, a cloud service in prod). The base format is RFC 6902 JSON patches. Read the spec sheet and it is very similar :)

Looks really cool, I would love to use it in my DollarDeploy project. The documentation could be a bit better still; it is not clear whether the content is pure markdown or TypeScript files. Which GitHub repo does it synchronize to? I prefer a monorepo approach.

Awesome feedback! Will update the docs! The content is TS files. You can choose which GitHub repo you want to synchronize to - a monorepo also works!

Should add: you can read more docs here: https://val.build/docs/create

Tiny fail at undo: insert a 1 before E, Ctrl+Z, move left/right: the left editor moves around E, the right editor moves around the nonexistent 1.

And for real "action" there should be a delay/pause button to simulate conflicts like the ones described in the blog


Yes, the undo issue is a known bug in the website demo because it's messing with Lexical's undo functionality. It's not actually a DocNode bug. I'll fix it soon.

The feedback about the delay/pause button is also good, thanks!


Part 1 discussion, December 2024: https://news.ycombinator.com/item?id=42343953

At least Yjs, Loro and Automerge must handle some degree of operation reordering. Either causally-consistent or virtually any order.

Back around 2000 or 2001, I got the idea for a collaborative editor that also would have had some UI fanciness in it. I abandoned it when I couldn't find a GUI toolkit that had an acceptable level of quality for that UI fanciness, without itself becoming a multi-year project. So I never even got to the point of playing with the actual collaborative editing aspects.

Having watched that space now for the last nearly 25 years... of all the projects I've abandoned over the years, that is the one that I am most grateful I gave up on. The gulf between "hey what if we could collaboratively edit live" and what it takes to actually implement it is one of the largest mismatches between intuition and reality I know of. I had no idea.


And let's not forget that the official paper on Yjs is just plain wrong, the "proofs" it contains are circular. They look nice, but they are wrong.

This was my impression as well. If you ignore the paper and just look at the source code - and carefully study Seph Gentle's Yjs-like RGA implementation [1] - I believe you find that it is equivalent to an RGA-style tree, but with a different rule for sorting insertions that have the same left origin. That rule is hard to describe, but with some effort one can prove that concurrent insertions commute; I'm hoping to include this in a paper someday.

[1] https://josephg.com/blog/crdts-are-the-future/


Yes, I think it would be a good paper.

I made a tiny self contained implementation of this algorithm here if anyone is curious:

https://github.com/josephg/crdt-from-scratch/blob/master/crd...

FugueMax (or Yjs) fits in a few hundred lines of code. This approach also performs well (better than a tree based structure). And there's a laundry list of ways this code can be optimised if you want better performance.

If anyone is interested in how this code works, I programmed it live on camera in a couple hours:

https://www.youtube.com/watch?v=_lQ2Q4Kzi1I

This implementation approach comes from Yjs. The YATA (Yjs) academic paper has several problems. But Yjs's actual implementation is very clever and I'm quite confident it's correct.


Could you elaborate on that or share a source? It sounds like it'd be not just interesting but important to learn.

https://dl.acm.org/doi/epdf/10.1145/2957276.2957310

Try to understand 3.1-3.4 in this paper, and you'll find that the correctness proof doesn't prove anything.

In particular, when they define <_c, they do this in terms of rule1, rule2, and rule3, but these are defined in terms of <_c, so this is just a circular definition, and therefore actually not a definition at all, but just wishful thinking. They then prove that <_c is a total order, but that proof doesn't matter, because <_c does not exist with the given properties in the first place.


It's disingenuous to suggest that "Yjs will completely destroy and re-create the entire document on every single keystroke" and that this is "by design" of Yjs. This is a design limitation of the official y-Prosemirror bindings that are integrating two distinct (and complex) projects. The post is implying that this is a flaw in the core Yjs library and an issue with CRDTs as a whole. This is not the case.

It is very true that there are nuances you have to deal with when using CRDT toolkits like Yjs and Automerge - the merged state is "correct" as a structure, but may not match your schema. You have to deal with that in your application (ProseMirror does this for you, if you want it and can live with the invalid nodes being removed).

You can't have your cake and eat it with CRDTs, just as you can't with OT. Both come with compromises and complexities. Your job as a developer is to weigh them for the use case you are designing for.

One area in particular that I feel CRDTs may really shine is in agentic systems. The ability to fork+merge at will is incredibly important for async long running tasks. You can validate the state after an agent has worked, and then decide to merge to main or not. Long running forks are more complex to achieve with OT.

There is some good content in this post, but it's leaning a little too far towards drama creation for my taste.


You can split CRDT libs and compose them however you want, but most teams never get past the blessed bindings, because stitching two moving targets together by hand is miserable even if you know both codebases. Then you're chasing a perf cliff and weird state glitches every time one side revs.

In theory you can write better bindings yourself. In practice, if the official path falls over under normal editing, telling people to just do more integration work sounds a lot like moving the goalposts.


Author here, sorry if this was not clear: that specific point was not supposed to be an indictment of all CRDTs, it was supposed to be much more narrow. Specifically, the Yjs authors clearly state that they purposefully designed its interface to ProseMirror to delete and recreate the entire document on every collab keystroke, and the fact that it stayed open for 6 YEARS before they started to try to fix it does, in my opinion, indicate a fundamental misunderstanding of what modern text editors need to behave well in any situation. Not even a collaborative one. Just any situation at all.

I think it's defensible to say that this point in particular is not indicting CRDTs in general because I do say the authors are trying to fix it, and then I link to the (unpublicized) first PR in that chain of work (which very few people know about!), and I specifically spend a whole paragraph saying I hope I am forced to write an article in a year about how they figured it all out! If I was trying to be disingenuous, why do any of that?


> sorry if this was not clear

It's easy to make that mistake reading your post because of sentences like

> I want to convince you that all of these things (except true master-less p2p architecture) are easily doable without CRDTs

> But what if you’re using CRDTs? Well, all these problems are 100x harder, and none of these mitigations are available to you.

It sure sounds a lot like you're calling CRDTs in general needlessly complex, not just the yjs-prosemirror integration.


To be clear, we ARE arguing that CRDTs are needlessly complex for the centralized server use case. What I am describing in the "delete and replace all on every keystroke" problem is the point at which it became clear to me that the project did not understand what modern text editors need to perform well in any circumstance, let alone a collab one.

I think this is still reasonable to say because the final paragraph in that section is 100% about how they might fix the delete-all problem, and I hope they do, so that I can write about that, too. But also, that the rest of the article is going to be about how you have to swim upstream against their architecture to accomplish things that are either table stakes or trivial in other solutions.


> To be clear, we ARE arguing CRDTs needlessly complex for the centralized server use case.

I've been working in the OT / CRDT space for ~15 years or so at this point. I go back and forth on this. I don't think its as clear cut as you're making it out to be.

- I agree that OT based systems are simpler to program and usually simpler to reason about.

- Naive OT algorithms perform better "out of the box". CRDTs often need more optimisation work to achieve the same performance.

- But with some optimisation work, CRDTs perform better than OT based systems.

- CRDTs can be used in a client/server model or p2p. OT based systems generally only work well in a centralised context. Because of this, CRDTs let you scale your backend. OT (usually) requires server affinity. CRDT based systems are way more flexible. Personally I'd rather complex code and simpler networking than the other way around.

- Operation based CRDTs can do a lot more with timelines - eg, replaying time, rebasing, merging, conflicts, branches, etc. OT is much more limited. As a result, CRDT based systems can be used for both realtime editing and offline asynchronous editing. OT only really works for online (realtime) editing.

(For anyone who's read the papers, I'm conflating OT == the old Jupiter based OT algorithm that's popular in Google Docs and others.)

CRDTs are more complex but more capable. They can be used everywhere, and they can do everything OT based systems can do - at a cost of more code.

You can also combine them. Use a CRDT between servers and use OT client-to-server. I made a prototype of this. It works great. But given you can make even the most complex text based CRDT in a few hundred lines anyway[1], I don't think there's any point.

[1] https://github.com/josephg/egwalker-from-scratch


The algorithm in prosemirror-collab-commit is inspired by Google Wave, and implemented as a slight tweak to the prosemirror-collab system. The tweak is in the name.

I'm not sure about classical OT, and it's been a really long time since I wrote prosemirror-collab-commit, but... On the authority it's more like the nth triangle number, where n is the number of concurrent commits being processed based off the same document version for mapping. So 50 clients sending in commits at the same time, based off the same doc version, would be (50 * 51) / 2. Applying has a different, potentially larger, cost and that's O(n).
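That triangle-number cost can be sketched as (illustrative only):

```typescript
// If n clients commit concurrently off the same document version, the
// k-th accepted commit must be mapped over the k-1 accepted before it,
// so total mapping work is 1 + 2 + ... + n = n(n + 1) / 2.
function mappingCost(n: number): number {
  return (n * (n + 1)) / 2;
}
```

For 50 concurrent commits that is mappingCost(50) = 1275 mappings.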

You don't have to have server affinity, but it'd be cooler if you did. Locks, you need.

It works offline, sure. For some definition of "works". The further histories diverge the greater the chance of losing user intent. But that really depends on the nature of the divergence. Some inspection and heuristics could probably be used to green-light a LOT of "offline" scenarios before falling back on an interactive conflict resolution strategy.

I'm not sure what magic CRDTs exist today, but in the case of Yjs and ProseMirror, allowing histories to drift too far will absolutely risk stomping all over user intent when they are brought back together.


The magic of CRDTs does not prevent this. They are in exactly the same boat as OT, prosemirror-collab and prosemirror-collab-commit. It can't be prevented. The problem is worse with CRDTs because they instantly destroy user intent in the conversion to/from their underlying representation, which is the XML document. See discussion with Marijn about, e.g., splitting blocks above.

Heya, cheers. I'm actually intimately familiar with the node splitting issue. I've created y-prosemirror backends(complete with edit history, snapshots, etc) and "rewrote" y-prosemirror in TypeScript heavily refactoring and modifying it for some crazy use cases.

Those use cases hit a wall with (and I'm a bit fuzzy here) the Yjs data structure and the y-prosemirror diffing algorithm destroying and creating new XML nodes, black-holing anything else that occurred in them or duplicating content.


Actually, I think I agree with most of this... except the part where you think it's not clear-cut, ha ha. I meant that not as a comparison between OT and CRDTs, but as a comparison to prosemirror-collab. My opinion is that in the centralized server case, it is (unfortunately) basically better in every dimension.

> But with some optimisation work, CRDTs perform better than OT based systems.

I read your paper and I think this is a mistake. You assume that OT has quadratic complexity because you're considering classic operation-based OT. But OT can be id-based, in which case operations are transformed directly on the document, not on other operations. This is essentially CRDT without the problems of supporting P2P, and therefore the best CRDT will never perform better than the best OT.

> CRDTs let you scale your backend. OT (usually) requires server affinity. CRDT based systems are way more flexible. Personally I'd rather complex code and simpler networking than the other way around.

All productivity apps that use these tools in any way shard by workspace or user, so OT can scale very well.

If you don't scale CRDT that way, by the way, you'd be relying too much on "eventual consistency" instead of "consistency as quickly as possible."

> (For anyone who's read the papers, I'm conflating OT == the old Jupiter-based OT algorithm that's popular in Google Docs and others.)

Similar to what I said before. I think limiting OT to an implementation that’s over three decades old doesn’t do OT justice.


> I think limiting OT to an implementation that’s over three decades old doesn’t do OT justice.

I haven't kept up with the OT literature after a string of papers "proved correctness" of systems which later turned out to have bugs. And so many of these algorithms have abysmally bad performance. I think I implemented an O(n^4) algorithm once to see if it was correct, but it was so slow that I couldn't even fuzz test it properly.

> You assume that OT has quadratic complexity because you're considering classic operation-based OT. But OT can be id-based, in which case operations are transformed directly on the document, not on other operations.

If you go down that road, we can make systems which are both OT and CRDT based at the same time. Arguably my eg-walker algorithm is exactly this. In eg-walker, we transform operations just like you say - using an in memory document model. And we get most of the benefits of OT - including being able to separately store unadorned document snapshots and historical operation logs.

Eg-walker is only a CRDT in the sense that it uses a grow-only CRDT of operations, shared between peers, to get the full set of operations. The real work is an OT system, that gets run on each peer to materialise the actual document.

> This is essentially CRDT without the problems of supporting P2P, and therefore the best CRDT will never perform better than the best OT.

Citation needed. I've published plenty of benchmarks over the years from real experiments. If you think I'm wrong, do the work and show data.

My contention is that the parts of a CRDT which make them correct in P2P settings don't cost performance. What actually matters for performance is using the right data structures and algorithms.


> Citation needed

It seems to me the burden of proof is on you. You were the one who claimed that “CRDTs perform better than OT-based systems.” I’m simply denying it. My reasoning is that CRDTs require idempotence and commutativity, while OTs do not. What requirement does OT have that CRDT does not? Because if there isn’t one, then by definition your claim can’t be correct. And if there is one, that would be new to me, although I suspect you might be using a very particular definition of OT.


> It seems to me the burden of proof is on you. You were the one who claimed that “CRDTs perform better than OT-based systems.”

Ah, I assumed we were talking about Jupiter based OT systems - which are outperformed by their newer cousins (like eg-walker). Like you say, these use a different data structure to transform changes and that's why they're faster.

> My reasoning is that CRDTs require idempotence and commutativity, while OTs do not.

The only property not required by a centralized OT system is the OT TP2 property. Ie, T(op3, op1 + T(op2, op1)) == T(op3, op2 + T(op1, op2)). Central servers also give you a single global ordering.

If you discard TP2 and add global ordering, does that open the door to new optimisations? I don't know, and I certainly can't prove the absence of any such optimisations. So I think the burden of proof is on you.


The root of our misunderstanding or debate is clear: although CRDT is fairly well defined, I don’t think the same is true for OT.

What I have in mind is what I mentioned earlier:

> OT can be id-based, in which case operations are transformed directly on the document, not on other operations.

This is exactly what I do in my library DocNode[1], which I describe as “id-based OT”.

With this model, it’s not even necessary to satisfy TP1. In fact, the concept T(o1, o2) doesn’t exist, because operations aren’t “transformed” against other operations, but against the document. Maybe the word “transform” is a bit misleading, and “apply” would be more appropriate. The problem is that there is still a slight transformation. For example, if a client inserts a node between nodes A and B, but by the time it reaches the server B has been deleted, the effective operation might become “insert between A and C”.

The server is append-only. The client has several options to synchronize with the server: rebasing, undo-do-redo, or overwriting the document.

Maybe I’m the one who shouldn’t describe DocNode as “[id-based] OT” and should instead coin a new term. Operational Application (OA)? Operations Without Transformation (OWT)? Operations Directly Transformed (ODT)? Operational Rebasing (OR)? Not sure. What would you recommend?

[1] https://www.docukit.dev/docnode


> the fact that it stayed open for 6 YEARS before they started to try to fix it...

This is all open-source software, provided for free by volunteers. If you want better bindings, go write them. Or pay someone else to do so.


Just to be extremely clear: we pay for a lot of OSS software. We pay one individual project more than $10,000 a year. We would have paid for Yjs too, if we thought it was a good use of resources!

The biggest evidence against collaborative editing working and being useful is that programmers don't use it. We go through the pain of having git branches and manual merges.

We're the nerdiest bunch in the world, absolutely willing to learn and adapt the most arcane stuff if it gives us a real or perceived advantage, yet the fact that Google Docs style CRDTs have completely eluded the profession speaks volumes about their actual usefulness.


> The biggest evidence against collaborative editing working and being useful is that programmers don't use it. We go through the pain of having git branches and manual merges.

Hmm -- this seems a bit apples and oranges to me: collaborative editing is sync; git branches, PRs, etc. are all async. This is by design! You want someone's eyes on a merge, that's the whole rationale behind PRs. Collab editing tries to make merges invisible.

Totally different use case, no?


Collaborative coding is a niche but possibly interesting use case. I’m thinking of notebook cells with reactive inputs and outputs. Actually not dissimilar to a spreadsheet in many ways.

The biggest evidence for collaborative editing is the immense popularity of Google Docs, Notion and Figma.

Just because programming code isn't a good use case for automated conflict resolution doesn't mean everything else isn't.

Just imagine non-technical people using git to collaborate on a report, essay, or blog post. It's never going to happen.


> The biggest evidence against collaborative editing working and being useful is that programmers don't use it. We go through the pain of having git branches and manual merges.

But git branches are collaborative editing! They're just asynchronous collaborative editing.

It would be possible to build a git clone on top of CRDTs, which had the same merge conflict behaviour. The advantage of a system like that would be that you could use the same system for both kinds of collab. editing - realtime collab editing and offline / async collab editing. It's just that nobody has built that yet.

> the fact that Google Docs style CRDTs have completely eluded the profession speaks volumes about their actual usefulness.

Software engineers still rely on POSIX files for local cross-app interoperability. Eg, I save a file in my text editor, then my compiler reads it back and emits another file, and I run that. IMO the real problem is that this form of IPC is kind of crappy. There's no good way to get a high fidelity change feed from a POSIX file. To really use CRDTs we'd need a different filesystem API. And we'd need to rewrite all our software to use it.

That isn't happening. So we're stuck with hacks like git, which have to detect and reconstruct all your editing changes using diffs every time you run it. This is why we don't have nice things.


> There's no good way to get a high fidelity change feed from a POSIX file.

Personally, my main point of frustration with git is the lack of a well-supported AST-based diff. Most text in programming is actually just a representation of a graph. I'm sure there is a good reason why it hasn't caught on, but I find that line-based diffs diverging from what could be a useful semantic diff is the main reason merge conflicts happen, and the main reason why I need to stare hard at a pull request to figure out what actually changed.


>Google Docs style CRDTs

Google Docs is OT though.


Normal documents don't have broken builds when lines are incomplete. It's a completely different situation and makes sense why manually controlling it in chunks is better.

Also note that our use case is much simpler. The programming language tells you whether your merge created a valid document.

I've never seen that information actually being used in any merge tool, with the notable exception of Visual Studio/C# (where you get symbol resolution for the merged doc, but even there the autogenerated result is a bit hit and miss)

I think the reason is that the algorithms want to be content-agnostic.

But it's of course weird — as a user — to see a conflict resolution tool confidently return something that's not even syntactically valid.


Fantastic article. I was particularly interested because WordPress has been working to add collaborative editing and the implementation is based on yjs. I hope that won't end up being an issue...

It would have been nice if the article compared yjs with automerge and others. Jsonjoy, in particular, appears very impressive. https://jsonjoy.com/


The transport for collaborative editing in Wordpress 7.0 is HTTP polling. Once per second, even if no one else is editing. It jumps to 4 requests/sec if just two people are editing. And it's enabled by default on all sites, though that might not be the case when it leaves beta.

The transport is a completely different concern... (though there's also a websocket implementation).

They use Yjs: https://make.wordpress.org/core/2026/03/10/real-time-collabo...


The PowerSync folks and I worked on a different approach to ProseMirror collaboration here: https://www.powersync.com/blog/collaborative-text-editing-ov... It is neither CRDT nor OT, but does use per-character IDs (like CRDTs) and an authoritative server order of changes (like OT).

The current implementation does suffer from the same issue noted for the Yjs-ProseMirror binding: collaborative changes cause the entire document to be replaced, which messes with some ProseMirror plugins. Specifically, when the client receives a remote change, it rolls back to the previous server state (without any pending local updates), applies the incoming change, and then re-applies its pending local updates; instead of sending a minimal representation of this overall change to ProseMirror, we merely calculate the final state and replace with that.

This is not an inherent limitation of the collaboration algorithm, just an implementation shortcut (as with the Yjs binding). It could be solved by diffing ProseMirror states to find the minimal representation of the overall change, or perhaps by using ProseMirror's built-in undo/redo machinery to "map" the remote change through the rollback and re-apply steps.
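A stripped-down sketch of that rollback / re-apply loop, using a toy string document and hypothetical helpers rather than the real PowerSync or ProseMirror APIs (note it also skips the position mapping a production implementation would need):

```typescript
// Illustrative step model: insert text at a position.
type Step = { pos: number; insert: string };

function applyStep(doc: string, s: Step): string {
  return doc.slice(0, s.pos) + s.insert + doc.slice(s.pos);
}

// On receiving a remote step: roll back to the last server state (dropping
// pending local steps), apply the remote step, then re-apply the pending
// local steps. The final state is computed wholesale, which is why the
// editor sees a full document replacement rather than a minimal change.
function receiveRemote(
  serverDoc: string,
  pending: Step[],
  remote: Step
): { serverDoc: string; localDoc: string } {
  const nextServer = applyStep(serverDoc, remote); // new authoritative state
  let local = nextServer;
  for (const s of pending) local = applyStep(local, s); // naive re-apply
  return { serverDoc: nextServer, localDoc: local };
}
```

The naive re-apply here does no rebasing of pending positions against the remote change, which is exactly the kind of corner a real implementation has to handle.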


Hi Matt! Good to see you here. For those who don't know, Matt also wrote a blog about how to do ProseMirror sync without CRDTs or OT here: https://mattweidner.com/2025/05/21/text-without-crdts.html and I will say I mostly cosign everything here. Our solution is not 100% overlap with theirs, but if it had existed when we started we might not have gone down this road at all.

Your part 1 post was one of the inspirations for that :)

Specifically, it inspired the question: how can one let programmers customize the way edits are processed, to avoid e.g. the "colour" -> "u" anomaly*, without violating CRDT/OTs' strict algebraic requirements? To which the answer is: find a way to get rid of those requirements.

*This is not just common behavior, but also features in a formal specification [1] of how collaborative text-editing algorithms should behave! "[The current text] contains exactly the [characters] that have been inserted, but not deleted."

[1] http://www.cs.ox.ac.uk/people/hongseok.yang/paper/podc16-ful...


Great article - you mentioned the "two most popular families of collab editing [...] OT and CRDT". One thing you should look into is the work of https://braid.org - it's combining CRDT with OT. Work inspired by that forms the basis of Loro, which allows pruning history (helping with the tombstone issue you mentioned)

Couldn't agree more with the gist of the argument, especially in the context of ProseMirror.

That's why I created prosemirror-collab-commit.


Alternatively, a much simpler CRDT solution is to flatten our tree and build a LWW underneath it. This makes it easy to debug, save, and delete the history. { "id:1": { "parent_id": "root", "type": "p" }, "id:2": { "parent_id": "id:1", "type": "text", "content": "text", "position": 1 } }

Or internally: [ [HLC, "id:2", "parent_id", "id:1"], [HLC, "id:2", "type", "text"], ... ]

Merging is easy, and it allows for atomic modifications without rebuilding the entire tree, as well as easy conflict resolution. We add the HLC (clock, peer id). If the time difference between the two clocks is significant, we create a new field [HLC, id, "conflict:" + key, old_value]
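As a sketch of the merge rule described above (illustrative names, not a production HLC implementation): each (nodeId, field) cell keeps the value with the highest hybrid logical clock, tie-broken by peer id, so merging is order-independent:

```typescript
// Last-writer-wins over a flat log of (clock, id, key, value) entries.
type HLC = { ts: number; peer: string };
type Entry = { clock: HLC; id: string; key: string; value: string };

function newer(a: HLC, b: HLC): boolean {
  return a.ts !== b.ts ? a.ts > b.ts : a.peer > b.peer; // peer id breaks ties
}

function merge(entries: Entry[]): Map<string, { clock: HLC; value: string }> {
  const cells = new Map<string, { clock: HLC; value: string }>();
  for (const e of entries) {
    const cellKey = `${e.id}/${e.key}`;
    const cur = cells.get(cellKey);
    if (!cur || newer(e.clock, cur.clock)) {
      cells.set(cellKey, { clock: e.clock, value: e.value });
    }
  }
  return cells; // LWW is commutative and idempotent, so merge order is irrelevant
}
```

The commutativity and idempotence are what make this trivially mergeable; the trade-off is that concurrent edits to the same cell silently lose one side unless you add the conflict fields described above.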


Hi folks, author here. I thought this was dead! I'm here to answer questions if you have them.

EDIT: I live in Seattle and it is 12:34, so I must go to bed soon. But I will wake up and respond to comments first thing in the morning!


Hi Alex, I'm the author of prosemirror-collab. I agree with your point that CRDTs are not the solution they often claim to be, and that CRDT editor integrations (at least the ones for Yjs and Automerge) are often shockingly sloppy.

But, seeing how I've had several people who read your article write me asking about this miraculous collab implementation, I want to push back on the framing that ProseMirror's algorithm is 'simple' or '40 lines of code'. The whole document and change model in ProseMirror was designed to make something like prosemirror-collab possible, so there is a lot of complexity there. And it has a bunch of caveats itself—the server needs to hold on to changes as long as there may be clients that need to catch up, and if you have a huge amount of concurrent edits, the quadratic complexity of the step mapping can become costly, for example. It was designed to support rich text, document schemas, and at least a little bit of keeping intentions in mind when merging (it handles the example in the first post of your series better, for example), but it's not a silver bullet, and I'd hate for people to read this and go from thinking 'CRDT will solve my problems' to 'oh I need to switch to ProseMirror to solve my problems'.


Ok Marijn I understand. I'm sorry I caused you an inconvenience. Of course, I know that implementing server-side prosemirror-collab is not entirely without problems (since we have done it) and take your point, which is correct. If I was to do this again I'd find a different way to say this than "40 lines of code."

With that said... I do not agree it is not "simple." Or at least, I think it is about as simple as it can possibly be.


Just wanted to say thanks! This is a great write up and resonates with issues I encountered when trying to productionise a yjs backed feature.

Yjs works perfectly. I've used it for years on PlayCode. But you are talking about the specific plugin for ProseMirror.

Yes, here I agree: yjs core is well written, while plugins are “nice to have”.


From the "40 line CRDT replacement":

    const result = step.apply(this.doc);
    if (result.failed) return false;
I suspect this doesn't work.

Author here. I'll actually defend this. Most of the subtlety of this part is actually in document schema version mismatches, and you'd handle that at client connect, generally, since we want the server to dictate the schema version you're using.

In general, the client implementation of collab is pretty simple. Nearly all of the subtlety lies in the server. But it, too, is generally not a lot of code, see for example the author's implementation: https://github.com/ProseMirror/website/tree/master/src/colla...
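The core of that server loop really is small. A toy sketch of the authority (illustrative only, not the actual prosemirror-collab server code; steps are opaque strings here):

```typescript
// Central authority: accept a batch of steps only if the client is
// up to date; otherwise the client pulls newer steps, rebases, retries.
class Authority {
  steps: string[] = [];

  // The version is simply how many steps have been accepted so far.
  get version(): number {
    return this.steps.length;
  }

  submit(lastSeenVersion: number, steps: string[]): boolean {
    if (lastSeenVersion !== this.version) return false; // stale: rebase & retry
    this.steps.push(...steps);
    return true;
  }

  // Steps a lagging client needs in order to catch up.
  stepsSince(version: number): string[] {
    return this.steps.slice(version);
  }
}
```

The subtlety Marijn points out lives elsewhere: in what a "step" is, how rebasing maps positions, and how long the authority must retain old steps for lagging clients.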


Sorry that I am too stupid to understand what Moment is.

It is a collaborative markdown file that also renders very fast. So far so good.

And then... it somehow adds Javascript? And React? And somehow AI is involved? I truly don't understand what it is, and I am (I think) the end customer...

edit: I tried it and I just get "Loading..." forever. So, anyway, next time.


Hey karel-3d, I'm one of the engineers working on Moment and would love to help figure out the issue you're running into. Would you mind reaching out via our Discord or email (trey@moment.dev)?

I would like to know if you plan to open source anything, and how much. https://github.com/orgs/moment-eng/ looks a bit empty

OK I will be happy to help. I didn't mean to be dismissive! Will ping you tomorrow

unfortunately I cannot reproduce the "Loading..." issue. Now everything works. (I still don't fully understand Moment. But reading Agents.md ironically helps me understand it a bit.)

I think Y.js 14 and the new y-prosemirror binding fix a lot of the encountered issues

It might fix the replace-everything bug. It definitely does not fix any of the other issues I mentioned. Even just taking the permissions problem: Yjs is built for a truly p2p topology, so as a baseline you will have a very hard time establishing which peers are and aren't allowed to make which edits. You can adopt a central server, but then the machinery that makes Yjs amenable to p2p is uselessly complicated. And if you cross that bridge, you'll still have to figure out how to let some clients make mark-only edits to the document for things like comments, while others can edit the whole text. That can be done, but it's not at all straightforward, because position mapping is very complicated in the Yjs world.

It should be noted that this is about text editing specifically; for other use cases Yjs uses other code paths/algorithms, but you have to be careful how you design your data structure for atomic updates.

I'm curious how these approaches compare with MRDTs implemented in Irmin

https://gowthamk.github.io/docs/mrdt.pdf


Collaborative editing looks deceptively simple until you deal with real-world concurrency and network issues. Operational transforms and CRDTs both introduce their own tradeoffs.

we're about to implement collaborative editing at Mintlify and were considering yjs so this couldn't have come at a better time

Author here, my personal mission is for people implementing this to have clear, actionable advice. Which is something we did not when we started. If you want to chat about it I'm happy to help, just email me: clemmer.alexander@gmail.com

Replacing CRDT with 40 lines of code. Amazing.

It appears Moment is producing "high-performance, collaborative, truly-offline-capable, fully-programmable document editor" - https://www.moment.dev/blog

There seems to be a conflict of interest in describing Yjs's performance, since Yjs (along with Automerge) does basically the same thing.


Author here. To be clear, we do not in ANY WAY compete with Yjs! We are a potential customer of Yjs. This article explains why we chose not to be a customer of Yjs, and why we don't think most people building real-time collaborative text editors should be, either.

You have an amazing tagline. This is the first time I read a tagline and thought: this is exactly what I was looking for.

But the product seems much more narrow than an actual tool to run the whole business in markdown. I was hoping to see Logseq on steroids, and it feels like a tool builder primarily. I love the tool-building aspect, but the fundamentals of simply organizing docs (docs, presentations, assets, etc., the basics of a business) are either not part of the core offering or not presented well at all.

I love the idea of building custom tools on top of MD and it's part of my wishlist, but I feel a little deceived by your tagline so I wanted to share that :)


This is great feedback, thank you. I will say that IS our goal... but we only really launched last week and are still figuring out what resonates with people and what they really want! It sounds like you're saying that the organization aspects are not there, which is very helpful to know... I am not quite sure I understand if you also think the toolbuilding is lacking?

If you are open to it, I'd love the opportunity to hear more. Here or email (alex@moment.dev) or our Discord (bottom right of our website) or Twitter/X... or whatever you prefer.


No, the tool building looks very sophisticated and powerful and I love that it hinges very much on the new era of building your own custom tools with the help of agents. The live collaboration on top of md files is also exactly what I was looking for!

If you're saying that Logseq on steroids is what you're aiming for, then my immediate feedback would be to emphasize more:

- the writing experience: at the end of the day, writing and taking notes will be the most common activity

- the file organisation: tags, templates, media files, does it do the basics?

- the sharing and access mechanism: can I easily share a doc with a partner / client?

Those are the basics of daily business tasks for my consultancy, and so the first thing I'm looking for. I really wish to get off Google drive, but those points need to be solved for that to sound feasible.

As for the tool building it looks very powerful, but the first example you presented (on-call dashboard), was a bit too much from the get go to wrap my head around the building blocks of your system. I've been building custom tools/wrappers of varied complexity on top of markdown for my team, from a custom revealJS skill that follows our design guide, to a form builder to a project/client DB that wraps duckdb (for yaml frontmatter parsing) with a semantic layer. I've watched your intro video but I'm still not sure whether your service would help me more closely integrate those tools to my company's knowledge base or not.

But once again, if your vision matches your tagline, then I'm really looking forward to hear more from you


That doesn't make sense. If you are a customer that implies you pay for it, so people can be users of Yjs which is free and open-source, but not customers.

The logic that makes sense is that you are using your own framing (Moment.dev will later be paid and people will be customers) to interpret Yjs.

Moreover, the 'social proof' posted later on by 'auggierose' and 'skeptrune': - https://news.ycombinator.com/item?id=47396154 - https://news.ycombinator.com/item?id=47396139

Appears, to me, to be manufactured. I've noticed a degree of consolidation in this 'SF/Bay Area tech cult' (though I am unsure if others are aware of it): members help each other at the expense of quality, growing network wealth through favoritism rather than adherence to quality, which runs counter to the interests of users who want high-quality software without capture.

While you may not like me describing this, it is not in your own interest to do this because it catabolizes the base layer that would sustain you. Social media catabolizes actual social networks, as AI catabolizes those who write information online. Behavior like this ruins the public commons over time.


I'm not sure I fully understand, but to be clear, we actually do voluntarily pay for the Free and OSS software we use. For example, we support `react-prosemirror` directly with monetary compensation. And if we used Yjs, we would have paid for that too. So in that sense, I do think of us as customers!

It's hard to tell, but I think you also might be saying that criticizing the FOSS foundations of our product actually hurts the ecosystem. I actually am very open to that, and it's why we took so much time writing it since part 1 came out. But the Yjs-alternative technology we use is all also F/OSS, and we also do directly support it, with actual money from our actual bank account. All I'm recommending here is that others do the same. Sorry if that was not clear.

The rest of your reply, I'm not sure I grok. I think you might be suggesting that we are sock-puppeting `auggierose` or `skeptrune`, and that we are part of some (as you put it) "cult" of the Bay area! Let me be clear that neither of these things is true. I don't know anyone at Mintlify personally, and in any event we are from Seattle, not the Bay!


No, you're not sock-puppeting it yourself. But you are all probably friends and cross-promoting. It's a common business strategy these days, but it seems somewhat underhanded compared to more straightforward approaches.

Anyhow, we just have different norms of being. I still stand by my above statements and observations, which you reject but which retain plausible deniability, so we'll just leave it as is.


Reminds me a bit of google-mobwrite. I wonder why that fell out of favour.

I just read part 1 as well as part 2, for me it raises an interesting question that wasn't addressed. I correctly guessed the question posed about the result of the conflict, and while it's true that's not the end result I'd probably want, it's also important because it gives me visibility of the other user's change. Both users know exactly what the other did - one deleted everything, the other added a u. If you end up with an empty document, the deleting user doesn't know about the spelling correction that may need to be re-applied elsewhere. Perhaps they just cut and pasted that section elsewhere in the document.

But there's another issue that the author hasn't even considered, and possibly it's the root cause why the prosemirrror (which I'd never heard of before btw) does the thing the author thinks is broken... Say you have a document like "请来 means 'please go'" and independently both the Chinese and English collaborators look at that and realise it's wrong. One changes it to "请走 means 'please go'" and the other changes it to "请来 means 'please come'". Those changes are in different spans, and so a merge would blindly accept both resulting in "请走 means 'please come'" which is entirely different from the original, but just as incorrect. Depending on how much other interaction the authors have, this could end up in a back and forth of both repeatedly changing it so the merged document always ended up incorrect, even though individually both authors had made valid corrections.

That example seems a bit hypothetical, but I've experienced the same thing in software development where two BAs had created slightly incompatible documents stating how some functionality should work. One QA guy kept raising bugs saying "the spec says it should do X", the dev would check the cited spec and change the code to match the spec. Weeks later, a different QA guy with a different spec would raise a bug saying "why is this doing X? The spec says it should do Y", a different dev read the cited spec, and changed the code. In this case, the functionality flip-flopped about 10 times over the course of a year and it was only a random conversation one day where one of them complained about a bug they'd fixed many times and the other guy said "hey, that bug sounds familiar" and they realised they were the two who'd been changing the code back and forth.

This whole topic is interesting to me, because I'm essentially solving the same problem in a different context. I've used CRDT so far, but only for somewhat limited state where conflicts can be resolved. I'm now moving to a note-editing section of the app, and while there is only one primary author, their state might be on multiple devices and because offline is important to me, they might not always be in sync. I think I'm probably going to end up highlighting conflicts, I'm not sure. I might end up just re-implementing something akin to Quill's system of inserts / deletes.


I see someone has downvoted my actually relevant post. Not sure why, but anyway.

I also tried out the behaviour of their example. Slowing the sync time down to 3 seconds, and then typing "Why not" and then waiting for it to sync before adding " do this?" on client A and " joke?" on client B. The result was "Why not do this? joke?" when I'd have hoped that this would have been flagged as a conflict. Similarly, starting with "Why not?" and adding both " do this" and " joke" in the different clients produced "Why not do this joke?" even though to me, that should have been a conflict - both were inserting different content between "t" and "?".

Finally, changing "do" to "say" in client A and THEN changing "do" to "read" in client B before it updated, actually resulted in a conflict in the log window and the resultant merge was "Why not rayead this joke?" Clearly this merge strategy isn't that great here, as it doesn't seem to be renumbering the version numbers based on the losing side (or I've misunderstood what they're actually doing).


Component library page in the docs gives 404

I read both parts. Well written, I agree with a lot of stuff.

I am a long-time CKEditor dev; I was responsible for implementing real-time collaboration in the editor and the OT implementation.

Regarding the first part of your article. Guess what - CKEditor would output "" :). And even better, if the user who deleted everything does an undo, you'd get "u" where it was typed originally.

However, I fully agree that for every algorithm you will be able to find a scenario where it fails to resolve a conflict in the way the user expects. But we cannot ask the user to resolve a conflict manually every time one happens.

Offline editing, as you correctly observed, is more difficult, because the conflicts pile up, and multiple wrong decisions can result in a horrifying final result. I fully agree that this is not only an algorithmic problem but also a UX problem. Add to this that in many apps you will also have other (meta)data that has to be synced too (besides the document data).

CKEditor is, in theory, ready for offline editing. From the algorithm's POV, offline is no different from a very, very, very slow connection (*). In the end, you receive a set of operations to transform against another set of operations. However, currently we put the editor in a read-only state when the connection breaks. We are aware that even if all transformations resolve as expected, the end result may still be "weird". And even if the end result is actually as expected, the amount of changes may be overwhelming to a person who just got their connection back, so it still may be good to provide some UI/UX to help them understand what happened.

(*) - that is, unless the editing session on the server ended already, and, simply saying, you don't have anything to connect to (to pull operations from).

Regarding OT. I have a feeling that one mistake most people make is that they take OT as it is described in some paper or article, and don't want to iterate over the idea. To me, this is not just one algorithm, but rather an idea of how to think about and manage changes happening to the data.

For CKEditor, from the very beginning, we were forced to innovate over typical OT implementations. First of all, we focused on users' intentions. Second of all, we needed to adapt it to a tree data structure. These challenges shaped my way of thinking - OT is "an idea"; you need to adapt it to your project. Someone here asked if there's a library for OT, because they want to use it for spreadsheets. I'll say -- write it on your own and adapt it to spreadsheets. You'll discover that maybe you don't need some operations, or maybe you need new operations dedicated to spreadsheets. This is what we ended up doing. @Reinmar already posted this link here, but we describe our approach here: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri....

Circling back to your example with typing inside a sentence being removed. This is how you innovate over OT. To us, such a deletion is not deleting N singular characters starting from position P. The intention is to remove some continuous range of text. If someone writes inside the range, it just changes the boundary of the stuff to remove, but surely we don't want to show some random letters after the deletion happens. We account for that and make modifications in our OT implementation.
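A toy sketch of that intention-preserving rule (illustrative only, not CKEditor's implementation): model the deletion as a range, and let a concurrent insert inside the range be swallowed rather than left behind as stray characters:

```typescript
// Transform a range-deletion against a concurrent insert.
type DeleteRange = { start: number; end: number }; // half-open [start, end)
type Insert = { pos: number; text: string };

function transformDeleteAgainstInsert(del: DeleteRange, ins: Insert): DeleteRange {
  if (ins.pos <= del.start) {
    // Insert before the range: shift the whole range right.
    return { start: del.start + ins.text.length, end: del.end + ins.text.length };
  }
  if (ins.pos < del.end) {
    // Insert inside the range: grow the range so the inserted text
    // is deleted too, preserving the "remove this whole span" intention.
    return { start: del.start, end: del.end + ins.text.length };
  }
  return del; // insert after the range: nothing to do
}
```

With the per-character model the concurrent "u" would survive the deletion; with the range model it is absorbed, which is the behavior described above.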

Similarly with positions in the document. In CKEditor, you can use LivePositions and LiveRanges, which are basically paths in the tree data structure. Every position is transformed by operations too. Many features we have are based on that.

So, my take here is -- don't bash OT because you based your experience on some simple implementations. Possibly the same is true of Yjs. Don't bash CRDTs because Yjs is doing something badly?

And some final words regarding the second part.

We also follow the same pattern your diagram shows in the "How the simple thing works" section. As I was reading through the article and looking at the provided examples, it was hard for me not to think that what's happening is some kind of OT variant, maybe simplified, or maybe adapted to some specific cases. There are strong similarities between what you described and CKEditor 5, and we use OT. Looking at this from a top-level view, I could say, "well, we do the same". We have the same loop with conflict resolution; we just call "rebase" a "transformation", and instead of "steps" we have "operations".

Also, you say it is 40 LOC, but how much magic happens in `step.apply()`? How much of the architecture was designed to make it possible? Even Marijn makes this comment here: https://news.ycombinator.com/item?id=47409647.

For comparison, this is CKEditor's file that includes the OT functions to transform operations: https://github.com/ckeditor/ckeditor5/blob/master/packages/c.... It's 2600 LOC (!), but at least most of it is comments :). Again, the basic idea of OT is very simple (and this implementation could be simpler; we also learned a lot in the process). It's up to you how deep you want to go in solving "user intention" issues.


> Also, you say it is 40LOCs, but how much magic happens in `step.apply()`?

Right, but if you are already using ProseMirror, that infrastructure is in place whether you use it directly or bolt Yjs on top.


(Xpost from my lobsters comment since the Author's active over here):

I really disagree with this article - despite protestation, I feel like their issue is with Yjs, not CRDTs in general.

Namely, their proposed solution:

    1. For each document, there is a single authority that holds the source of truth: the document, applied steps, and the current version.
    2. A client submits some transactional steps and the lastSeenVersion.
    3. If the lastSeenVersion does not match the server’s version, the client must fetch recent changes(lastSeenVersion), rebase its own changes on top, and re-submit.
    (3a) If the extra round-trip for rebasing changes is not good enough for you, prosemirror-collab-commit does pretty much the same thing, but it rebases the changes on the authority itself.
This is 80% of the way to a CRDT all by itself! Step 3 there, "rebase its own changes on top", is doing a lot of work and is essentially the core merge function of a CRDT. Also, the steps needed to get the rest of the way to a full CRDT are the solution to their logging woes: tracking every change and its causal history, which is exactly what is needed to exactly re-run any failing trace and debug it.

Here's a modified version of the steps of their proposed solution:

    1. For each document, every participating member holds the document, applied steps, and the current version.
    2. A client submits (to the "server" or p2p) some transactional steps and the lastSeenVersion.
    3. If the lastSeenVersion does not match the "server"/peer’s version, the client must fetch recent changes(lastSeenVersion). The server still accepts the changes. Both the client and the "server" rebase the changes of one on top of the other. Which one gets rebased on top of the other can be determined by change depth, author id, real-world timestamp, "server" timestamp, whatever. If it's by server timestamp, you get the exact behavior from the article's solution.
If you store the causal history of each change, you can also replay the history of the document and how every client sees the document change, exactly as it happened. This is the perfect debugging tool!
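The key to the p2p variant is that every peer must agree on which side gets rebased on top. A minimal sketch of that idea, assuming each change carries an author id and a timestamp (field names here are made up for illustration): sort the union of both sides' concurrent changes by a deterministic total order, so every peer arrives at the same sequence regardless of arrival order.

```typescript
// Illustrative: a deterministic total order over concurrent changes, so
// every peer rebases in the same direction without a central authority.
type Change = { authorId: string; timestamp: number; op: string };

function mergeOrder(a: Change[], b: Change[]): Change[] {
  // Order by timestamp, breaking ties by author id; as long as
  // (timestamp, authorId) pairs are unique, the result is identical
  // no matter which side you call it from.
  return [...a, ...b].sort(
    (x, y) => x.timestamp - y.timestamp || x.authorId.localeCompare(y.authorId)
  );
}
```

Swapping the comparator for "server timestamp when available, local timestamp otherwise" gives the fallback behavior described above.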

CRDTs can store this causal history very efficiently using run-length encoding: diamond-types has done really good work here, with an explanation of their internals here: https://github.com/josephg/diamond-types/blob/master/INTERNA...
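The intuition behind that run-length encoding: most editing is one author typing characters at consecutive positions, and a run of such inserts compresses into a single record. A toy illustration (the real diamond-types encoding is far more involved and also run-length-encodes the causal graph itself):

```typescript
// Toy run-length encoding of sequential character inserts.
type Insert = { author: string; pos: number; char: string };
type Run = { author: string; start: number; text: string };

function encodeRuns(ops: Insert[]): Run[] {
  const runs: Run[] = [];
  for (const op of ops) {
    const last = runs[runs.length - 1];
    // Extend the current run when the same author types at the next position.
    if (last && last.author === op.author && op.pos === last.start + last.text.length) {
      last.text += op.char;
    } else {
      runs.push({ author: op.author, start: op.pos, text: op.char });
    }
  }
  return runs;
}
```

A user typing a paragraph of n characters collapses to O(1) runs instead of n individual operations, which is why the storage overhead of keeping full history can stay so reasonable.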

In conclusion, the article seems to be really down on CRDTs in general, whereas I would argue that they're really down on Yjs and have written 80+% of a CRDT without meaning to, and would be happier if they finished the remaining 20%. You can still have the exact behavior they have now by using server timestamps when available and falling back to local timestamps that always sort after server timestamps when offline. A 100% causal-history CRDT would also give them much better debugging, since they could replay whatever view of history they want over and over. The only downside is extra storage, which I think diamond-types has shown can be very reasonable.


I know it seems that way, but it's actually not 80% of the way to a CRDT, because rich text CRDTs are an open research problem. Yjs instead models the document as an XML tree and then attempts to recreate the underlying rich text transaction. This is much, much harder than it looks, and it's inherently lossy; this fundamental impedance mismatch is one of the core complaints of the article. Some progress is being made on rich text CRDTs, e.g., Peritext[1], but that only happened a few years ago.

Another important thing is that CRDTs by themselves cannot give you a causal ordering (by which I mean this[2]), because definitionally causal ordering requires a central authority. Funnily enough, `prosemirror-collab` and `prosemirror-collab-commit` do give you this, because they depend on an authority with a monotonically increasing clock. They are also MUCH better at representing user intent, because they express the entirety of the rich text transaction model. This is very emphatically NOT the case with CRDTs, which have to pipe your transaction model through something vastly weaker and less expressive (XML transforms) and force you to completely reconstruct the `Transaction` from scratch.

Lastly, the algorithm you propose is sort of what `prosemirror-collab-commit` is doing.

[1]: https://www.inkandswitch.com/peritext/

[2]: https://www.scattered-thoughts.net/writing/causal-ordering/


The actual point of the post: Y.js is slow and buggy.

Very likely AI slop, very hard to read. Too many telltale signs. HN should have another rule: explicitly mention if an article was written (primarily) by AI.

I'm the author. Literally 0% of this was written with AI. Not an outline, not the arguments, not a single word in any paragraph. We agonized over every aspect of this article: the wording, the structure, and in particular, about whether we were being fair to Yjs. We moved the second and third section around constantly. About a dozen people reviewed it and gave feedback.

EDIT: I will say I'm not against AI writing tools or anything like that. But, for better or worse, that's just not what happened here.


Apologies. Was it at all edited by an AI?

I do not know how to say this another way. No. Nothing in this article was created, edited, outlined, pre-drafted, suggested, or in any other way influenced by AI. AI did not write the words, it did not review the words, it was not a part of a pre-writing discussion that influenced the words. There is NO AI IN THIS ARTICLE OR AS AN INPUT TO THIS ARTICLE. If the article is an apple, then 0% of the apple is AI or AI influenced.

And let this be a lesson to you. You apparently do not have any ability to distinguish between these two kinds of thing.


It doesn’t strike me as AI. The writing is reasonably information-dense and specific, logically coherent, a bit emotional. Rarely overconfident or vague. If it is AI then there was a lot more human effort put into refining it than most AI writing I’ve read.

Funnily enough I had 2 HN tabs open, this one and https://news.ycombinator.com/item?id=47394004


