Most of his examples have an identity, giving a monoid.
* (string, concat): identity ''
* numbers with addition/multiplication: identity 0/1
* numbers with min/max: identity INT32_MAX / INT32_MIN. Numbers in computers are
finite, which is exactly what makes these identities available; stdint.h defines
INT32_MAX, INT32_MIN, etc.
* booleans with && / || : true / false
* function composition: identity function f(x) = x
* frequency map: {} or {A: 0, B:0, ...} depending on
how you define it
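A quick sketch of the frequency-map case in Python, using collections.Counter as the frequency map (combining by summing counts per key, with the empty map as identity):

```python
from collections import Counter

# Frequency maps combine by adding counts key-by-key; the empty map is the identity.
def combine(a: Counter, b: Counter) -> Counter:
    return a + b  # Counter addition sums counts per key

x = Counter("abb")  # {'a': 1, 'b': 2}
y = Counter("bc")   # {'b': 1, 'c': 1}
z = Counter("c")

# Associativity: grouping doesn't matter.
assert combine(combine(x, y), z) == combine(x, combine(y, z))
# Identity: the empty Counter changes nothing.
assert combine(x, Counter()) == x == combine(Counter(), x)
```

(One caveat: Counter's `+` drops non-positive counts, so this is a monoid on maps with positive counts, which matches the `{}` reading above.)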
Comparators: Is this a bug in the post? It doesn't seem to follow the pattern because the arity of the functions is different. In a semigroup or monoid all the elements are supposed to be drawn from the same set.
Other examples I can think of:
* Python's dict with respect to .update()
(if you think of it as an operator rather than
a mutating method). Identity is {}
* Unix Pipes: identity 'cat' (note that the shell
supports point-free programming)
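The dict example can be made concrete. Here's a sketch using a non-mutating merge (dict unpacking) in place of the mutating .update(), per the "think of it as an operator" framing:

```python
# Right-biased dict merge (later values win) is associative, with {} as identity.
def merge(a: dict, b: dict) -> dict:
    return {**a, **b}

a = {"x": 1}
b = {"x": 2, "y": 3}
c = {"y": 4}

assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associativity
assert merge(a, {}) == a == merge({}, a)               # {} is the identity
```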
As you say, most semigroups are just monoids where you forget the identity. Here's an MO thread on examples of monoids that aren't trivial restrictions:
If I take everyone's meanings correctly, what you describe (a monoid with the identity removed) is different from what cottonseed (https://news.ycombinator.com/item?id=12110936) describes, which is a monoid with the identity forgotten. That is, it's still there, but we don't require it to be there axiomatically.
Note that every monoid is a semigroup (because it satisfies a stronger collection of axioms), but not every monoid with identity removed is a semigroup (because it's possible to combine two non-identity elements to get the identity element): consider, for example, the integers without 0.
Comparators are not composed by ordinary function composition, but pointwise: you combine the comparators' results. If you can read Haskell:

    instance Monoid a => Monoid (b -> a) where
      f <> g = \x -> f x <> g x

Here the type a is Haskell's Ordering, i.e. the set `{ LT, EQ, GT }`, which is in turn combined in a left-biased way with `EQ` as the identity.
If you have two comparators, `lastNameComparator` and `firstNameComparator` and compose them into a comparator that compares first by firstName, then by lastName, surely that is a closed operation. It is also associative, and comparators do have an identity, so, yes they are also monoids.
For the comparators I think that what he means is that for any given set X the set of comparators on X forms a semigroup where the composition is "compare using the first comparator and then the second". Of course this too is a monoid with the identity being the comparator that says all things are equal.
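A sketch of the same idea in Python (comparators returning -1/0/1, combined left-biased, with the "all things are equal" comparator as the identity):

```python
# A comparator takes (a, b) and returns -1, 0, or 1.
def combine(f, g):
    """Left-biased: use f's verdict unless it's 0 (a tie), then fall back to g."""
    return lambda a, b: f(a, b) or g(a, b)

def by_first(a, b):
    return (a[0] > b[0]) - (a[0] < b[0])

def by_last(a, b):
    return (a[1] > b[1]) - (a[1] < b[1])

identity = lambda a, b: 0  # says all things are equal

cmp = combine(by_first, by_last)
assert cmp(("Ann", "Zed"), ("Ann", "Ada")) == 1   # tie on first, decided by last
assert combine(cmp, identity)(("A", "B"), ("A", "B")) == 0
```

The `or` trick works because 0 is falsy in Python, so a tie falls through to the next comparator, exactly the "compare using the first, then the second" composition.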
I'm not sure how useful this is in practice. For example, addition in actual computer languages isn't necessarily associative and compilers often can't treat them as associative.
Floating-point numbers added in different orders can give different results (and in the worst case suffer catastrophic cancellation). So, for example, as far as JavaScript is concerned, addition of numbers isn't associative.
Integers may overflow. Two's-complement addition is associative, but only because you get the same bogus answer in the case of overflow. If you actually detect overflow, you may see overflow or not depending on the order in which you add the numbers.
Oftentimes we ignore these issues, but it seems like if you're talking about taking advantage of associative rules to reorder calculations, you have to think about it. Thinking in terms of semigroups instead of concrete datatypes seems like a likely source of bugs when your model makes assumptions that aren't actually true.
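To make the floating-point point concrete, here's a sketch in Python (whose floats are the same IEEE 754 doubles JavaScript uses):

```python
# IEEE 754 doubles: the grouping of additions changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
assert left != right  # associativity fails
```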
At a certain level you are correct. A compiler certainly should not be reordering FP calculations, for instance. However, at the level these abstractions are typically used they are absolutely fine. For example, if you want to add up the number of hits your web site has received you will be absolutely fine to use a 64-bit integer and treat addition as associative. There is no possibility of overflow. Similarly, if you train a machine learning model using gradient descent you can compute your gradients in parallel and then sum them, taking advantage of associativity, and this will work out fine. (See Hogwild! for some proofs that #YOLO is an acceptable strategy for SGD https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pd...)
So, basically, semigroups and other abstractions are a-ok where they are typically used.
Integer addition is associative, even for fixed-bitwidth integers (which constitute "the integers modulo 2^n" for some bitwidth n; e.g. 8-bit bytes under addition form the group of integers modulo 2^8 (i.e., integers modulo 256)).
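A sketch of the wraparound point in Python, simulating 8-bit unsigned addition (the integers modulo 2^8):

```python
MASK = 0xFF  # simulate 8-bit unsigned integers (arithmetic modulo 256)

def add8(a, b):
    return (a + b) & MASK

x, y, z = 200, 100, 250  # these sums overflow 8 bits
# Wraparound addition stays associative even through overflow:
assert add8(add8(x, y), z) == add8(x, add8(y, z))
```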
Note that both the full integers as well as the integers modulo n are groups under addition. As groups are more restrictive than semigroups, all groups are also semigroups (groups are semigroups that also have an identity element and where every element is invertible).
Integer multiplication also forms a semigroup (and in fact a monoid, due to identity), but not a full group due to zero not having an inverse. So, it's usually said that "the non-zero integers" form a group.
> Integer multiplication also forms a semigroup (and in fact a monoid, due to identity), but not a full group due to zero not having an inverse.
Wait, what? Very few integers have multiplicative inverses (in the integers). If you meant to speak of modular inverses, then even that is false for composite moduli like 2^n (with n > 1), which may be why im4w1 (https://news.ycombinator.com/item?id=12113934) asked if you meant to work modulo a prime.
I really wish the applications section had more detail or perhaps an "editorial". I don't really follow the consequence of a frequency map being associative.
Yeah, the application section is pretty poor. The number one large-scale practical application of semigroups (and, more often, their extension the monoid) is in parallel and concurrent systems. Associativity says that order of operations doesn't matter, which in turn implies you can distribute computations however you like across multiple machines and still get the correct result. This is the big idea leveraged in systems like Summingbird and other big data frameworks (e.g. https://arxiv.org/abs/1304.7544) and in CRDTs.
Note that CRDT stands for Conflict-free Replicated Data Type, and there are a few sub-acronym expansions: Convergent Replicated Data Type and Commutative Replicated Data Type.
[EDIT]
I wanted to add that CRDTs were mentioned here because, in order to provide the kind of guarantees they do, their merge must satisfy associativity (i.e., be a semigroup), commutativity, and idempotency. So, technically, you have to go further than plain semigroups, to commutative idempotent ones.
[/EDIT]
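A minimal sketch in Python of a state-based CRDT merge with all three properties, using the simplest one, a grow-only set (G-Set), whose merge is just set union:

```python
# Grow-only set (G-Set): the simplest state-based CRDT.
# Merge is set union, which is associative, commutative, and idempotent,
# so replicas can exchange states in any order, any number of times.
def merge(a: frozenset, b: frozenset) -> frozenset:
    return a | b

r1 = frozenset({"a", "b"})
r2 = frozenset({"b", "c"})
r3 = frozenset({"d"})

assert merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))  # associative
assert merge(r1, r2) == merge(r2, r1)                        # commutative
assert merge(r1, r1) == r1                                   # idempotent
```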
At Plum we made heavy use of CRDTs on top of Plumtree, an eventually consistent, whole-state replication technology (no relation to the company's name, just coincidence), within the internet-connected dimmer we built, the Lightpad.
The primary design goal was to enable advanced configurations: Arbitrary groups of Lightpads could be controlled from any other one on the same network by binding it to a specific finger gesture. We also did not want this configuration to be dependent upon any single master, they had to be truly master-less.
In general, Eventual Consistency is pretty scary when you're deploying it to embedded internet-connected devices, where you can't immediately shell onto the device like you can with a server in a data center. I have lots of stories I'll write about sometime of the pains I encountered while developing this solution and how I ended up on CRDTs.
Strong Eventual Consistency (typically, CRDTs deployed on top of an eventually consistent substrate), though, is very safe and provides a lot of guarantees about the concurrent behavior of the algorithm as it acts upon your data.
Using a CRDT (specifically the ORSWOT, Observed-Remove Set Without Tombstones) eliminated the majority of our pains and concerns about data loss, conflicting concurrent writes, and integrity in the face of flaky consumer-grade home networks. Plumtree is highly available and partition tolerant, perfect for our needs, where we have to be able to keep moving forward even if a majority or more of the Lightpads go offline and never come back.
It's always good to hear about IoT companies using CRDTs, because it seems like the perfect fit, yet you don't hear very much about it. Also interesting to hear you mention you were using Plumtree for your replication backend.
To elaborate on your point, CRDTs are super helpful when distributing across a lot of devices, but they only get you so far. While building Lasp[0][1], a distributed data-flow language using CRDTs, we ran into a lot of scalability problems with naive usage of CRDTs. We are aiming to reach 10k-20k nodes in the near future, so we are focusing a lot on reducing network usage.
State-based CRDTs send their full state to all their peers, which works fine when your states are small but introduces lots of overhead in any other case. Operation-based CRDTs only send the actions performed on them (add 1, or rmv 2, for example), but these are not idempotent and require a lot of guarantees from the underlying distribution backend.
We are focusing on delta CRDTs, which combine the low network usage of operation-based structures with the idempotence of state-based approaches.
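A rough sketch in Python (hypothetical names, a simplified model) of the state-based vs. delta distinction for a grow-only counter (G-Counter): instead of shipping the full per-replica state after an increment, you ship only the small delta it produced:

```python
# G-Counter: one slot per replica; merge takes the pointwise max.
def merge(a: dict, b: dict) -> dict:
    keys = a.keys() | b.keys()
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in keys}

def increment(state: dict, replica: str):
    """Returns (new_state, delta): the delta is a tiny state that, merged
    into a peer, has the same effect here as merging the full new state."""
    new = dict(state)
    new[replica] = new.get(replica, 0) + 1
    return new, {replica: new[replica]}

s1 = {"r1": 3, "r2": 5}
s1, delta = increment(s1, "r1")          # delta is just {"r1": 4}
peer = {"r1": 3, "r2": 5, "r3": 2}

# Shipping the small delta is equivalent to shipping the whole state:
assert merge(peer, delta) == merge(peer, s1)
# Merge stays idempotent, so re-delivered deltas are harmless:
assert merge(merge(peer, delta), delta) == merge(peer, delta)
```

(Real delta CRDTs also need delivery guarantees this toy omits; it's only meant to show why deltas can be both small and safely re-applied.)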
Using Plumtree for your backend makes it resilient to network failures, but using the default full-membership protocol makes it almost unusable when you're dealing with a big number of nodes. Using alternative protocols like HyParView greatly reduces the number of protocol messages in your network.
Finally, since Lasp is a data-flow language, we are applying control-flow analysis to select and remove unused or intermediate values in the program, also reducing network usage.
Yeah I'm excited to see what you guys come up with on delta CRDTs.
Our node cluster sizes are fairly manageable so we can tolerate the inefficiency right now but it will be nice to have a solution in-place when we want to optimize.
Can you point me to a nice tutorial or a foundational paper? Your comment just opened up a lot of questions, my distributed systems course did not talk about any of this :)
I would check out Christopher Meiklejohn's website; he's a researcher working on this stuff and doing some cool OSS work: https://christophermeiklejohn.com/
I think summingbird and scalding introduced, for me, the notion of having functional types that make it easier to combine things together for distributed computing/big data.
I've never read that paper before; it's really good. I like how the author's objective is to formalize map-side aggregations. But I'm wondering if that's a natural extension of map-reduce itself, i.e., don't move data if it's not required, given that's the constraint?
"The form of parallelism induced by the associativity of the semigroup operation is heavily relied upon in the Map-Reduce programming model. We’ll expound on this in more detail in a later article, as this is more easily developed algebraically after defining the concept of a Monoid."
noelwalsh also included a bit about CRDTs, which he didn't expand upon, but is pretty different from "the Map-Reduce programming model" in a few ways. I expanded on the CRDT comment above.
Might be slightly off topic, but in the recently released Glasgow Haskell Compiler, semigroups are now built into the base library. No more dependencies when using them.
No. Guaranteeing typeclass laws like associativity is hard. People have looked into various ways to do that, but haven't found anything completely satisfying. It's either hard for the programmer (dependent typing) or hard to implement (automatic theorem proving). I hope that'll change, but for now Haskell does not have any facilities for specifying or enforcing typeclass laws.
As it is, it works more or less exactly like interfaces in other languages. When you implement Comparable in Java you implicitly expect certain laws (i.e., that it's a valid ordering), but there is likewise no way to specify or verify them.
Clueless suggestion: What about the programmer (who implements the semigroup) also providing a proof (that can be automatically checked) that it satisfies associativity? Is that method perhaps already covered by one of the examples you mentioned?
OK. I'm afraid I don't know enough to make the connection. Actually, I never understood the point of dependent types. They seem to do away with the neat order that types introduce. ¯\_(ツ)_/¯
Incidentally, the typical mathematical curriculum doesn't even bother with monoids and semigroups in the initial introduction. The introductory material jumps right into the definition of a group and then mentions in passing that removing a few key pieces leaves you with a monoid or a semigroup. Same with fields, rings, rigs, etc. The most constrained structure is introduced first, and dropping requirements leaves you with something less constrained for which you can prove fewer theorems.
I learned groups and fields first and then learned about the more relaxed versions. I'm not sure which approach is better though. Groups are certainly more fun to play with but monoids probably show up in more contexts.
http://mathworld.wolfram.com/Monoid.html