Most of his examples have an identity, giving a monoid.
* (string, concat): identity ''
* numbers with addition/multiplication: identity 0/1
* numbers with min/max: identity INT32_MAX / INT32_MIN. Numbers in computers are
finite, which is exactly what makes these identities available; stdint.h defines
INT32_MAX, INT32_MIN, etc.
* booleans with && / || : true / false
* function composition: identity function f(x) = x
* frequency map: {} or {A: 0, B:0, ...} depending on
how you define it
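A quick sketch of the frequency-map case in Python, using collections.Counter as the frequency map (combining by summing counts per key, with the empty map as identity):

```python
from collections import Counter

# Frequency maps combine by adding counts key-by-key; the empty map is the identity.
def combine(a: Counter, b: Counter) -> Counter:
    return a + b  # Counter addition sums counts per key

x = Counter("abb")  # {'a': 1, 'b': 2}
y = Counter("bc")   # {'b': 1, 'c': 1}
z = Counter("c")

# Associativity: grouping doesn't matter.
assert combine(combine(x, y), z) == combine(x, combine(y, z))
# Identity: the empty Counter changes nothing.
assert combine(x, Counter()) == x == combine(Counter(), x)
```

(One caveat: Counter's `+` drops non-positive counts, so this is a monoid on maps with positive counts, which matches the `{}` reading above.)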
Comparators: Is this a bug in the post? It doesn't seem to follow the pattern because the arity of the functions is different. In a semigroup or monoid all the elements are supposed to be drawn from the same set.
Other examples I can think of:
* Python's dict with respect to .update()
(if you think of it as an operator rather than
a mutating method). Identity is {}
* Unix Pipes: identity 'cat' (note that the shell
supports point-free programming)
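The dict example can be made concrete. Here's a sketch using a non-mutating merge (dict unpacking) in place of the mutating .update(), per the "think of it as an operator" framing:

```python
# Right-biased dict merge (later values win) is associative, with {} as identity.
def merge(a: dict, b: dict) -> dict:
    return {**a, **b}

a = {"x": 1}
b = {"x": 2, "y": 3}
c = {"y": 4}

assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associativity
assert merge(a, {}) == a == merge({}, a)               # {} is the identity
```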
As you say, most semigroups are just monoids where you forget the identity. Here's an MO thread on examples of monoids that aren't trivial restrictions:
If I take everyone's meanings correctly, what you describe (a monoid with the identity removed) is different from what cottonseed (https://news.ycombinator.com/item?id=12110936) describes, which is a monoid with the identity forgotten. That is, it's still there, but we don't require it to be there axiomatically.
Note that every monoid is a semigroup (because it satisfies a stronger collection of axioms), but not every monoid with identity removed is a semigroup (because it's possible to combine two non-identity elements to get the identity element): consider, for example, the integers without 0.
Comparators are not composed by ordinary function composition, but pointwise: you combine the comparators' results. If you can read Haskell:

    instance Monoid a => Monoid (b -> a) where
      f <> g = \x -> f x <> g x

Here the type a is Haskell's Ordering, i.e. the set `{ LT, EQ, GT }`, which is in turn combined in a left-biased way with `EQ` as the identity.
If you have two comparators, `lastNameComparator` and `firstNameComparator` and compose them into a comparator that compares first by firstName, then by lastName, surely that is a closed operation. It is also associative, and comparators do have an identity, so, yes they are also monoids.
For the comparators I think that what he means is that for any given set X the set of comparators on X forms a semigroup where the composition is "compare using the first comparator and then the second". Of course this too is a monoid with the identity being the comparator that says all things are equal.
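A sketch of the same idea in Python (comparators returning -1/0/1, combined left-biased, with the "all things are equal" comparator as the identity):

```python
# A comparator takes (a, b) and returns -1, 0, or 1.
def combine(f, g):
    """Left-biased: use f's verdict unless it's 0 (a tie), then fall back to g."""
    return lambda a, b: f(a, b) or g(a, b)

def by_first(a, b):
    return (a[0] > b[0]) - (a[0] < b[0])

def by_last(a, b):
    return (a[1] > b[1]) - (a[1] < b[1])

identity = lambda a, b: 0  # says all things are equal

cmp = combine(by_first, by_last)
assert cmp(("Ann", "Zed"), ("Ann", "Ada")) == 1   # tie on first, decided by last
assert combine(cmp, identity)(("A", "B"), ("A", "B")) == 0
```

The `or` trick works because 0 is falsy in Python, so a tie falls through to the next comparator, exactly the "compare using the first, then the second" composition.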
I'm not sure how useful this is in practice. For example, addition in actual computer languages isn't necessarily associative and compilers often can't treat them as associative.
Floating-point numbers added in different orders can give different results (and in the worst case suffer catastrophic cancellation). So, for example, as far as JavaScript is concerned, addition of numbers isn't associative.
Integers may overflow. Two's-complement addition is associative, but only because you get the same bogus answer in the case of overflow. If you actually detect overflow, you may see overflow or not depending on the order in which you add the numbers.
Oftentimes we ignore these issues, but it seems like if you're talking about taking advantage of associative rules to reorder calculations, you have to think about it. Thinking in terms of semigroups instead of concrete datatypes seems like a likely source of bugs when your model makes assumptions that aren't actually true.
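To make the floating-point point concrete, here's a sketch in Python (whose floats are the same IEEE 754 doubles JavaScript uses):

```python
# IEEE 754 doubles: the grouping of additions changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
assert left != right  # associativity fails
```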
At a certain level you are correct. A compiler certainly should not be reordering FP calculations, for instance. However, at the level these abstractions are typically used they are absolutely fine. For example, if you want to add up the number of hits your web site has received you will be absolutely fine to use a 64-bit integer and treat addition as associative. There is no possibility of overflow. Similarly, if you train a machine learning model using gradient descent you can compute your gradients in parallel and then sum them, taking advantage of associativity, and this will work out fine. (See Hogwild! for some proofs that #YOLO is an acceptable strategy for SGD https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pd...)
So, basically, semigroups and other abstractions are a-ok where they are typically used.
Integer addition is associative, even for fixed-bitwidth integers (which constitute "the integers modulo 2^n" for some bitwidth n; e.g. 8-bit bytes under addition form the group of integers modulo 2^8 (i.e., integers modulo 256)).
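A sketch of the wraparound point in Python, simulating 8-bit unsigned addition (the integers modulo 2^8):

```python
MASK = 0xFF  # simulate 8-bit unsigned integers (arithmetic modulo 256)

def add8(a, b):
    return (a + b) & MASK

x, y, z = 200, 100, 250  # these sums overflow 8 bits
# Wraparound addition stays associative even through overflow:
assert add8(add8(x, y), z) == add8(x, add8(y, z))
```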
Note that both the full integers as well as the integers modulo n are groups under addition. As groups are more restrictive than semigroups, all groups are also semigroups (groups are semigroups that also have an identity element and where every element is invertible).
Integer multiplication also forms a semigroup (and in fact a monoid, due to identity), but not a full group due to zero not having an inverse. So, it's usually said that "the non-zero integers" form a group.
> Integer multiplication also forms a semigroup (and in fact a monoid, due to identity), but not a full group due to zero not having an inverse.
Wait, what? Very few integers have multiplicative inverses (in the integers). If you meant to speak of modular inverses, then even that is false for composite moduli like 2^n (with n > 1), which may be why im4w1 (https://news.ycombinator.com/item?id=12113934) asked if you meant to work modulo a prime.
I really wish the applications section had more detail or perhaps an "editorial". I don't really follow the consequence of a frequency map being associative.
Yeah, the application section is pretty poor. The number one large-scale practical application of semigroups (and, more often, their extension the monoid) is in parallel and concurrent systems. Associativity says that order of operations doesn't matter, which in turn implies you can distribute computations however you like across multiple machines and still get the correct result. This is the big idea leveraged in systems like Summingbird and other big data frameworks (e.g. https://arxiv.org/abs/1304.7544) and in CRDTs.
Note that CRDT stands for Conflict-free Replicated Data Type, and there are a few sub-acronym expansions: Convergent Replicated Data Type and Commutative Replicated Data Type.
[EDIT]
I wanted to add that CRDTs were mentioned here because, in order to provide the kind of guarantees they do, their merge must satisfy associativity (i.e., be a semigroup), commutativity, and idempotency. So, technically, you have to go further than plain semigroups, to commutative idempotent ones.
[/EDIT]
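A minimal sketch in Python of a state-based CRDT merge with all three properties, using the simplest one, a grow-only set (G-Set), whose merge is just set union:

```python
# Grow-only set (G-Set): the simplest state-based CRDT.
# Merge is set union, which is associative, commutative, and idempotent,
# so replicas can exchange states in any order, any number of times.
def merge(a: frozenset, b: frozenset) -> frozenset:
    return a | b

r1 = frozenset({"a", "b"})
r2 = frozenset({"b", "c"})
r3 = frozenset({"d"})

assert merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))  # associative
assert merge(r1, r2) == merge(r2, r1)                        # commutative
assert merge(r1, r1) == r1                                   # idempotent
```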
At Plum we made heavy use of CRDTs on top of Plumtree, an eventually consistent, whole-state replication technology (no relation to the company's name, just coincidence), within the internet-connected dimmer we built, the Lightpad.
The primary design goal was to enable advanced configurations: Arbitrary groups of Lightpads could be controlled from any other one on the same network by binding it to a specific finger gesture. We also did not want this configuration to be dependent upon any single master, they had to be truly master-less.
In general, Eventual Consistency is pretty scary when you're deploying it to embedded internet-connected devices, where you can't immediately shell onto the device like you can with a server in a data center. I have lots of stories I'll write about sometime of the pains I encountered while developing this solution and how I ended up on CRDTs.
Strong Eventual Consistency (typically, CRDTs deployed on top of an eventually consistent substrate), though, is very safe and provides a lot of guarantees about the concurrent behavior of the algorithm as it acts upon your data.
Using a CRDT (specifically the ORSWOT, Observed-Remove Set Without Tombstones) eliminated the majority of our pains and concerns about data loss, conflicting concurrent writes, and integrity in the face of flaky consumer-grade home networks. Plumtree is highly available and partition tolerant, perfect for our needs, where we have to be able to keep moving forward even if a majority or more of the Lightpads go offline and never come back.
It's always good to hear about IoT companies using CRDTs, because it seems like the perfect fit, yet you don't hear very much about it. Also interesting to hear you mention you were using Plumtree for your replication backend.
To elaborate on your point, CRDTs are super helpful when distributing across a lot of devices, but they only get you so far. While building Lasp[0][1], a distributed data-flow language using CRDTs, we ran into a lot of scalability problems with naive usage of CRDTs. We are aiming to reach 10k-20k nodes in the near future, so we are focusing a lot on reducing network usage.
State-based CRDTs send their full state to all their peers, which works fine when your states are small but introduces lots of overhead in any other case. Operation-based CRDTs only send the actions performed on them (add 1, or rmv 2, for example), but these are not idempotent and require a lot of guarantees from the underlying distribution backend.
We are focusing on delta CRDTs, which combine the low network usage of operation-based structures with the idempotence of state-based approaches.
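A rough sketch in Python (hypothetical names, a simplified model) of the state-based vs. delta distinction for a grow-only counter (G-Counter): instead of shipping the full per-replica state after an increment, you ship only the small delta it produced:

```python
# G-Counter: one slot per replica; merge takes the pointwise max.
def merge(a: dict, b: dict) -> dict:
    keys = a.keys() | b.keys()
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in keys}

def increment(state: dict, replica: str):
    """Returns (new_state, delta): the delta is a tiny state that, merged
    into a peer, has the same effect here as merging the full new state."""
    new = dict(state)
    new[replica] = new.get(replica, 0) + 1
    return new, {replica: new[replica]}

s1 = {"r1": 3, "r2": 5}
s1, delta = increment(s1, "r1")          # delta is just {"r1": 4}
peer = {"r1": 3, "r2": 5, "r3": 2}

# Shipping the small delta is equivalent to shipping the whole state:
assert merge(peer, delta) == merge(peer, s1)
# Merge stays idempotent, so re-delivered deltas are harmless:
assert merge(merge(peer, delta), delta) == merge(peer, delta)
```

(Real delta CRDTs also need delivery guarantees this toy omits; it's only meant to show why deltas can be both small and safely re-applied.)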
Using Plumtree for your backend makes it resilient to network failures, but using the default full-membership protocol makes it almost unusable when you're dealing with a big number of nodes. Using alternative protocols like HyParView greatly reduces the number of protocol messages in your network.
Finally, since Lasp is a data-flow language, we are applying control-flow analysis to select and remove unused or intermediate values in the program, also reducing network usage.
Yeah I'm excited to see what you guys come up with on delta CRDTs.
Our node cluster sizes are fairly manageable so we can tolerate the inefficiency right now but it will be nice to have a solution in-place when we want to optimize.
Can you point me to a nice tutorial or a foundational paper? Your comment just opened up a lot of questions, my distributed systems course did not talk about any of this :)
I would check out Christopher Meiklejohn's website; he's a researcher working on this stuff and doing some cool OSS work: https://christophermeiklejohn.com/
I think summingbird and scalding introduced, for me, the notion of having functional types that make it easier to combine things together for distributed computing/big data.
I've never read that paper before; it's really good. I like how the author's objective is to formalize map-side aggregations. But I'm wondering if that's a natural extension of map-reduce itself, i.e., don't move data if it's not required, given that's the constraint?
"The form of parallelism induced by the associativity of the semigroup operation is heavily relied upon in the Map-Reduce programming model. We’ll expound on this in more detail in a later article, as this is more easily developed algebraically after defining the concept of a Monoid."
noelwalsh also included a bit about CRDTs, which he didn't expand upon, but is pretty different from "the Map-Reduce programming model" in a few ways. I expanded on the CRDT comment above.
Might be slightly off topic, but in the recently released Glasgow Haskell Compiler, semigroups are now built into the base library. No more dependencies when using them.
No. Guaranteeing typeclass laws like associativity is hard. People have looked into various ways to do that, but haven't found anything completely satisfying. It's either hard for the programmer (dependent typing) or hard to implement (automatic theorem proving). I hope that'll change, but for now Haskell does not have any facilities for specifying or enforcing typeclass laws.
As it is, it works more or less exactly like interfaces in other languages. When you implement Comparable in Java you implicitly expect certain laws (i.e., that it's a valid ordering), but there is likewise no way to specify or verify them.
Clueless suggestion: What about the programmer (who implements the semigroup) also providing a proof (that can be automatically checked) that it satisfies associativity? Is that method perhaps already covered by one of the examples you mentioned?
OK. I'm afraid I don't know enough to make the connection. Actually, I never understood the point of dependent types. They seem to do away with the neat order that types introduce. ¯\_(ツ)_/¯
Incidentally, the typical mathematical curriculum doesn't even bother with monoids and semigroups in the initial introduction. The introductory material jumps right into the definition of a group and then mentions in passing that removing a few key pieces leaves you with a monoid or a semigroup. Same with fields, rings, rigs, etc. The most constrained structure is introduced first, and dropping requirements leaves you with something less constrained for which you can prove fewer theorems.
I learned groups and fields first and then learned about the more relaxed versions. I'm not sure which approach is better though. Groups are certainly more fun to play with but monoids probably show up in more contexts.
http://mathworld.wolfram.com/Monoid.html