Note that CRDT stands for Conflict-free Replicated Data Type, and there are two more specific expansions: Convergent Replicated Data Type (state-based) and Commutative Replicated Data Type (operation-based).
[EDIT]
I wanted to add that CRDTs were mentioned here because, in order to provide the kind of guarantees they do, their merge operation must be associative (so the states form a semigroup), commutative, and idempotent. So, technically, you have to go further than semigroups, to commutative, idempotent semigroups (semilattices).
[/EDIT]
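To make the three laws concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest state-based CRDTs. It is only an illustration, not what we shipped: the function names (`increment`, `merge`, `value`) and the replica IDs are made up for the example.

```python
from typing import Dict

# A G-Counter maps each replica ID to the count that replica has contributed.
GCounter = Dict[str, int]


def increment(counter: GCounter, replica_id: str, amount: int = 1) -> GCounter:
    """Return a new counter with replica_id's slot bumped by amount."""
    updated = dict(counter)
    updated[replica_id] = updated.get(replica_id, 0) + amount
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Join two counters by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The observable value is the sum of every replica's contribution."""
    return sum(counter.values())


# Three replicas (hypothetical IDs) each increment independently.
a = increment({}, "lightpad-a", 2)
b = increment({}, "lightpad-b", 3)
c = increment({}, "lightpad-c", 1)

assert merge(a, b) == merge(b, a)                      # commutative
assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associative
assert merge(a, a) == a                                # idempotent
```

Because `merge` satisfies all three laws, replicas can exchange state in any order, any number of times, and still end up with the same value.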
At Plum we made heavy use of CRDTs on top of an eventually consistent whole-state replication technology called Plumtree (no relation to the company's name, just coincidence) within the internet-connected dimmer we built called the Lightpad.
The primary design goal was to enable advanced configurations: arbitrary groups of Lightpads could be controlled from any other Lightpad on the same network by binding the group to a specific finger gesture. We also did not want this configuration to depend on any single master; the Lightpads had to be truly master-less.
In general, eventual consistency is pretty scary when you're deploying it to embedded internet-connected devices where you can't immediately shell onto the device like you can with a server in a data center. I have lots of stories here that I'll write about sometime, covering the pains I encountered while developing this solution and how I ended up on CRDTs.
Strong Eventual Consistency (typically, CRDTs deployed on top of an eventually consistent substrate), though, is very safe and provides a lot of guarantees about the concurrent behavior of the algorithm as it acts upon your data.
Using a CRDT (specifically the ORSWOT, the Observed-Remove Set Without Tombstones) eliminated the majority of our pains and concerns with data loss, conflicting concurrent writes, and integrity in the face of flaky consumer-grade home networks. Plumtree is highly available and partition tolerant, which is perfect for our needs: we have to be able to keep moving forward even if a majority or more of the Lightpads go offline and never come back up.
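For anyone curious how an ORSWOT avoids tombstones, here is a heavily simplified sketch. It is not our production code and not any particular library's implementation; the class and function names (`ORSWOT`, `seen`, `merge`) and the element names are assumptions for illustration. The idea is that each element carries the "dots" (replica, counter) of the adds that put it there, and a version vector records every dot a replica has seen. An element missing on one side is treated as removed only if that side's clock has already seen its dots.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Dot = Tuple[str, int]  # (replica ID, per-replica event counter)


@dataclass
class ORSWOT:
    clock: Dict[str, int] = field(default_factory=dict)        # version vector
    entries: Dict[str, Set[Dot]] = field(default_factory=dict)  # element -> dots

    def add(self, replica: str, element: str) -> None:
        # A new add gets a fresh dot that replaces the element's older dots.
        counter = self.clock.get(replica, 0) + 1
        self.clock[replica] = counter
        self.entries[element] = {(replica, counter)}

    def remove(self, element: str) -> None:
        # No tombstone: drop the entry; the clock still remembers its dots.
        self.entries.pop(element, None)

    def value(self) -> Set[str]:
        return set(self.entries)


def seen(clock: Dict[str, int], dot: Dot) -> bool:
    replica, counter = dot
    return clock.get(replica, 0) >= counter


def merge(a: ORSWOT, b: ORSWOT) -> ORSWOT:
    clock = {r: max(a.clock.get(r, 0), b.clock.get(r, 0))
             for r in a.clock.keys() | b.clock.keys()}
    entries: Dict[str, Set[Dot]] = {}
    for element in a.entries.keys() | b.entries.keys():
        da = a.entries.get(element, set())
        db = b.entries.get(element, set())
        # Keep dots both sides hold, plus dots the other side has never seen.
        # A dot the other side has seen but no longer holds was removed there.
        kept = ((da & db)
                | {d for d in da - db if not seen(b.clock, d)}
                | {d for d in db - da if not seen(a.clock, d)})
        if kept:
            entries[element] = kept
    return ORSWOT(clock, entries)


a = ORSWOT()
a.add("a", "kitchen")
a.add("a", "hallway")
b = merge(ORSWOT(), a)   # replica b catches up with a
b.remove("kitchen")      # b removes both elements...
b.remove("hallway")
a.add("a", "hallway")    # ...while a concurrently re-adds "hallway"
merged = merge(a, b)     # the removal of "kitchen" sticks; the re-add wins
```

After the merge only "hallway" survives: b's clock had seen the original dots, so its removals apply, but it had not seen a's newer add, so that add wins.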
It's always good to hear about IoT companies using CRDTs, because they seem like the perfect
fit for it, yet you don't hear very much about it. It's also interesting to hear you mention you
were using Plumtree for your replication backend.
To elaborate on your point, CRDTs are super helpful when distributing state across a lot of devices,
but they only get you so far. While building Lasp[0][1], a distributed data-flow language using CRDTs,
we ran into a lot of scalability problems with naive usage of CRDTs. We are aiming to reach 10k-20k nodes
in the near future, so we are focusing a lot on reducing network usage.
State-based CRDTs send their full state to all their peers, which works OK when your states are small, but
introduces a lot of overhead in any other case. Operation-based CRDTs only send the actions performed
on them (add 1, or rmv 2, for example), but these are not idempotent and require stronger guarantees from
the underlying distribution backend (typically exactly-once, causally ordered delivery).
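A small sketch of why this matters: if the network delivers a message twice, a naive operation-based counter double-counts, while a state-based merge absorbs the duplicate. The names below (`apply_op`, `merge_state`, `node-a`) are made up for the example and are not Lasp's API.

```python
from typing import Dict, Tuple

Op = Tuple[str, int]  # ("add", n) or ("rmv", n)


def apply_op(total: int, op: Op) -> int:
    """Operation-based: apply a single shipped operation to the local total."""
    kind, amount = op
    return total + amount if kind == "add" else total - amount


def merge_state(local: Dict[str, int], remote: Dict[str, int]) -> Dict[str, int]:
    """State-based: join two full states with a per-replica maximum."""
    return {r: max(local.get(r, 0), remote.get(r, 0))
            for r in local.keys() | remote.keys()}


# The same "add 1" operation delivered twice is counted twice.
op = ("add", 1)
once = apply_op(0, op)       # 1
twice = apply_op(once, op)   # 2, although only one add actually happened

# The same full state delivered twice changes nothing the second time.
remote = {"node-a": 1}
merged_once = merge_state({}, remote)
merged_twice = merge_state(merged_once, remote)
assert merged_once == merged_twice
```

So the operation-based version needs the transport to deduplicate, whereas the state-based version pays for its robustness by shipping the whole state every time.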
We are focusing on using Delta CRDTs, which combine the low network usage of operation-based structures
with the idempotence of state-based approaches.
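As a rough illustration of the delta idea (a sketch under my own naming, not Lasp's implementation): a mutation returns a small delta that is itself a valid state fragment, and peers fold it in with the ordinary join, so re-delivery is harmless. `join` and `increment_delta` are hypothetical names.

```python
from typing import Dict

GCounter = Dict[str, int]


def join(a: GCounter, b: GCounter) -> GCounter:
    """The usual state-based join: per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def increment_delta(state: GCounter, replica: str) -> GCounter:
    """Return only the entry that changed, instead of the whole state."""
    return {replica: state.get(replica, 0) + 1}


state_a = {"node-a": 41, "node-b": 7, "node-c": 12}
state_b = dict(state_a)

# node-a increments locally and ships just the one-entry delta.
delta = increment_delta(state_a, "node-a")
state_a = join(state_a, delta)

# node-b applies the delta; a duplicate delivery has no further effect.
state_b = join(state_b, delta)
state_b_again = join(state_b, delta)
assert state_b == state_a
assert state_b_again == state_b
```

The message carries one entry rather than the whole map, yet it is merged with the same idempotent join as a full state.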
Using Plumtree for your backend makes it resilient to network failures, but using the default full
membership protocol makes it almost unusable when you're dealing with a large number of nodes. Using
alternative protocols like HyParView greatly reduces the number of protocol messages in your network.
Finally, since Lasp is a data-flow language, we are applying control-flow analysis to select and remove
unused or intermediate values in the program, thus also reducing network usage.
Yeah I'm excited to see what you guys come up with on delta CRDTs.
Our node cluster sizes are fairly manageable, so we can tolerate the inefficiency right now, but it will be nice to have a solution in place when we want to optimize.
Can you point me to a nice tutorial or a foundational paper? Your comment just opened up a lot of questions; my distributed systems course did not talk about any of this :)
I would check out Christopher Meiklejohn's website. He's a researcher working on this stuff and doing some cool OSS work: https://christophermeiklejohn.com/