Evaluated it, and ultimately rejected it for a couple of reasons: - You must pic...

sabraham · on Jan 19, 2016

We use Riemann at Two Sigma to monitor/alert/heal our Mesos cluster [1], precisely because of the reasons given above for rejecting it.

>- You must pick up Clojure to understand and configure Riemann (we're not a Clojure shop, so this is a non-trivial requirement)

>- Config file isn't a config file, it's an executed bit of Clojure code

This is actually great -- static files quickly become their own franken-languages, with code generating config files.

>- Riemann is not a replacement for an alerting mechanism, it's another signal for alerting mechanisms (though since it's Clojure and the configuration file is a Clojure script, you can absolutely hack it into becoming an alerting system)

>- Riemann is not a replacement for a trend graphing mechanism.

You probably don't want another alerting mechanism; you probably already have PagerDuty or something else -- what you want is a rich way to create the alert.

[1] https://github.com/twosigma/satellite

Moocar · on Jan 19, 2016

> You probably don't want another alerting mechanism; you probably already have pagerduty or something else -- what you want is a rich way to create the alert

This is the heart of why we use Riemann. When we first started using it 2 years ago, we had thousands of different types of error emails per day (due to monitoring thousands of retail stores, all with their quirks). Because Riemann config is just code, we were able to build systems and abstractions on top of it for describing the various error types and their semantics. E.g., if 500s are being returned from service A, only alert us if > 1% of those requests failed in the last 2 minutes. You can get these kinds of rules in something like Nagios, but if you want customization, you have to deal with plugins. Here, it's just code. If we don't like it, we change it. The result is that there's no excuse to set up Gmail filters. You can ensure that all errors are actionable.
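The "> 1% of requests failed in the last 2 minutes" rule can be sketched outside Riemann as a sliding-window check. This is a minimal Python illustration, not Riemann's API; the class name `ErrorRateRule` and its parameters are assumptions made up for the example.

```python
from collections import deque


class ErrorRateRule:
    """Fire only when the share of 5xx responses in a sliding window exceeds a threshold.

    Hypothetical helper for illustration; not part of Riemann.
    """

    def __init__(self, window_seconds=120, threshold=0.01):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, failed) pairs, oldest first
        self.failures = 0      # number of failed events currently in the window

    def observe(self, timestamp, status):
        """Record one request and return True if the rule should alert."""
        failed = 500 <= status <= 599
        self.events.append((timestamp, failed))
        self.failures += failed

        # Expire events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_failed = self.events.popleft()
            self.failures -= old_failed

        return self.failures / len(self.events) > self.threshold


rule = ErrorRateRule(window_seconds=120, threshold=0.01)
for i in range(199):
    rule.observe(i * 0.1, 200)      # healthy traffic
print(rule.observe(20.0, 500))      # 1 failure in 200 requests: below 1%
```

A single 500 among a couple of hundred requests stays quiet; the rule only fires once failures make up more than 1% of the window.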

dozzie · on Jan 19, 2016

> Config file isn't a config file, it's an executed bit of Clojure code

For stream processing engines, configuration will be code. Unfortunate, but unavoidable.

> Riemann is not a replacement for an alerting mechanism

> Riemann is not a replacement for a trend graphing mechanism.

Indeed it is not. It's misadvertised as a monitoring solution, while it's a stream processing engine.

What I think of it is that you're supposed to build a monitoring system on top of a stream processing engine. It's a pity Riemann doesn't let you subscribe to its streams from the outside, so to add any message destination you need to update its config.

AdamN · on Jan 19, 2016

It seems like having a small tool to turn YAML files into basic Clojure code for easy rulesets would be an easy extension. It might encourage bad behavior and of course it couldn't do everything ... just an idea.

retrogradeorbit · on Jan 20, 2016

That would actually be quite easy to do in Clojure without the need for an external tool, by writing a Clojure macro that reads the other format and emits the s-expressions that represent it.
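For comparison, the external-tool variant of this idea can be sketched in a few lines of Python. It reads a declarative ruleset (JSON here, to stay within the standard library) and emits s-expression text. The rule fields (`service`, `op`, `threshold`, `action`) and the `notify-oncall` symbol are hypothetical, and the emitted form is only modelled on Riemann's `where` style; it has not been checked against any Riemann version.

```python
import json

ALLOWED_OPS = {">", "<", ">=", "<="}


def rule_to_sexpr(rule):
    """Render one declarative rule as an s-expression string (illustrative shape only)."""
    op = rule["op"]
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    # json.dumps yields a double-quoted, escaped string literal.
    service = json.dumps(rule["service"])
    threshold = float(rule["threshold"])
    return f"(where (and (service {service}) ({op} metric {threshold})) {rule['action']})"


def ruleset_to_sexprs(text):
    """Parse a JSON list of rules and render each one."""
    return [rule_to_sexpr(rule) for rule in json.loads(text)]


ruleset = '[{"service": "api", "op": ">", "threshold": 0.9, "action": "notify-oncall"}]'
for form in ruleset_to_sexprs(ruleset):
    print(form)
```

As noted above, a generator like this only covers the rule shapes it was written for; anything else still has to be written as code.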

falcolas · on Jan 19, 2016

> For stream processing engines, configuration will be code. Unfortunate, but unavoidable.

I honestly don't think it's unavoidable, so long as you separate the configuration (i.e. hosts, thresholds, outputs, etc) from your processing logic. Of course, this requires additional development work from within the "configuration" file.

dozzie · on Jan 19, 2016

But for a stream engine the processing logic is configuration.

There aren't many examples of code being a configuration parameter for a service (a generic RPC server for sysadmins being another example I've encountered), but there are some.

yid · on Jan 20, 2016

> For stream processing engines, configuration will be code. Unfortunate, but unavoidable.

How so?

Kafka is a stream processing engine that uses plain old Zookeeper data structures for config.

Edit: Kafka also seems to have the features you said Riemann is missing, if Riemann is to be taken seriously as a general-purpose stream processing engine.

brian_cloutier · on Jan 20, 2016

I would argue Kafka isn't a stream processing engine so much as a stream shipping engine. Kafka barely looks at the content of your messages.

I'd also argue that Zookeeper nodes are anything but "plain" :)

agentgt · on Jan 19, 2016

That was my problem with Riemann. I love its core but I really want something built on top of it. Basically a Jenkins of monitoring (since Clojure runs on the JVM). I contemplated building it (i.e. taking the Jenkins plugin system as inspiration) but it was just way too much work.

rhizome · on Jan 20, 2016

Go any further than that and you have Yahoo Pipes or IFTTT

lmm · on Jan 19, 2016

IME configs often end up being Turing-complete; if so, better to have them in a real programming language where you at least have tools available to manage the complexity.

falcolas · on Jan 19, 2016

The problem with configs (and this is with my operations hat on) is that they are rarely as well secured (or reviewed) as regular code, so code-based configuration files pose a significant privilege escalation threat on production servers.

With my programmer's hat on, they're also harder to populate programmatically, so I have a hard time justifying their use.

lmm · on Jan 19, 2016

> The problem with configs (and this is with my operations hat on), is that they are rarely as well secured (or reviewed) as regular code, so code based configuration files pose a significant privilege escalation threat on production servers.

Any complex config file runs that kind of risk though, whether it's in a well-known programming language or an ad-hoc DSL. My preferred approach is to include most of the config in the regular code (subject to the normal review/release process), with the only thing on the server being a one-line "which config to use" setting (e.g. dev/stag/prod). Of course that has its own problems.

> With my programmer's hat on, they're also harder to populate programmatically, so I have a hard time justifying their use.

Not at all true in the case of Clojure - it's just S-expressions, very easy to write, parse or modify programmatically. I agree that a config structure should have good programmatic access, but to my mind that's an argument for using a language with a good metamodel rather than anything else.
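To give a rough sense of how little machinery S-expressions need, here is a minimal Python sketch that parses one form into nested lists, changes a threshold, and writes it back out. It handles only parentheses, double-quoted strings and bare atoms, and does no error reporting on unbalanced input. It is not a Clojure reader, and the example form (including the `notify` symbol) is made up for illustration.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a run of non-space, non-paren characters.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse(text):
    """Parse a single s-expression into nested Python lists of token strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    return read()


def render(expr):
    """Turn nested lists back into s-expression text."""
    if isinstance(expr, list):
        return "(" + " ".join(render(e) for e in expr) + ")"
    return expr


tree = parse('(where (> metric 0.9) (notify "ops@example.com"))')
tree[1][2] = "0.95"  # modify the threshold programmatically
print(render(tree))
```

In Clojure itself the reader already provides the parsing step, which is the "good metamodel" argument: the config is a data structure the language can manipulate directly.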

falcolas · on Jan 19, 2016

> Any complex config file runs that kind of risk though, whether it's in a well-known programming language or an ad-hoc DSL.

The major difference is that Clojure (Python, Lua, Perl, et al) gives you all the tools right out of the box, whereas with a DSL you should be severely restricted from doing things like reading/writing to disk, making network calls, or executing other binaries.

Granted, there are possibly ways to break out of the sandbox, but it's the difference between giving the thief a set of master keys and $50 for a U-Haul and making them work to enter every safe you have on the premises.

/me takes off the tinfoil hat
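The restricted-DSL side of this argument can be sketched in Python: rule expressions are parsed, and only a whitelist of syntax (names, constants, comparisons, `and`/`or`) is evaluated, so there is no way to express a file open, a network call, or a subprocess. The function name `evaluate` and the variable names in the example are assumptions for illustration, and this is a sketch of the idea rather than a hardened sandbox.

```python
import ast
import operator

# Comparison operators the mini-DSL accepts.
ALLOWED_COMPARE = {
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Eq: operator.eq,
}


def evaluate(expr, variables):
    """Evaluate a rule expression using only whitelisted syntax.

    Anything else (calls, attribute access, imports, ...) raises ValueError.
    """

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise ValueError(f"unknown name: {node.id}")
            return variables[node.id]
        if isinstance(node, ast.BoolOp):
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = ALLOWED_COMPARE.get(type(node.ops[0]))
            if op is None:
                raise ValueError("disallowed comparison")
            return op(walk(node.left), walk(node.comparators[0]))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))


event = {"error_rate": 0.02, "host": "web1"}
print(evaluate('error_rate > 0.01 and host == "web1"', event))

try:
    evaluate('__import__("os").system("id")', event)
except ValueError as err:
    print("rejected:", err)
```

The trade-off is the one discussed in this thread: the whitelist limits what a malicious or careless config can do, but every new capability has to be added to the DSL by hand.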

lmm · on Jan 19, 2016

That sounds like a security-through-obscurity approach to me. (No doubt others would call it defense in depth).

jsmthrowaway · on Jan 19, 2016

How is "don't give a config file an arbitrary writable open() call" security by obscurity? What is being hidden? That's not really how that term works. I also don't understand your invocation of defense in depth or the (wrong) comparison you are trying to make. Can you reframe your rebuttal without loaded security terms that don't fit what you're saying?

The point GP is making, and with which I agree, is that executable configurations can be dangerous if not sandboxed and even then still carry an elevated risk versus a parser. We are speaking relatively; it is absolutely still a risk to parse user input as a config, but less so than a full programming environment being immediately available to a malicious config writer.

Stepping back and identifying the malicious vector is worth it here, though, as there's a case to be made that configurations are the domain of administrators and should be secured accordingly via external means. Then the problem is recentered.

xorcist · on Jan 19, 2016

If it's code, the only way to evaluate it is to run it. That makes it very hard to reason about at scale. ("Which URLs go to load balancer x with SSL".) Not a great idea, in my opinion.
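The contrast can be illustrated with a small Python sketch: when routing config is plain data, a question like the one above is a filter over records and nothing has to be executed. The `ROUTES` records, their field names and the `urls_for` helper are all hypothetical.

```python
# Hypothetical declarative routing config: plain data, nothing to execute.
ROUTES = [
    {"url": "/checkout", "load_balancer": "lb-x", "ssl": True},
    {"url": "/static", "load_balancer": "lb-x", "ssl": False},
    {"url": "/api", "load_balancer": "lb-y", "ssl": True},
]


def urls_for(routes, load_balancer, ssl):
    """Answer "which URLs go to this load balancer with/without SSL" by inspection."""
    return [
        r["url"]
        for r in routes
        if r["load_balancer"] == load_balancer and r["ssl"] == ssl
    ]


print(urls_for(ROUTES, "lb-x", ssl=True))
```

If the same routing were produced by arbitrary code, answering the question would mean running that code, or reasoning about every path through it.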

sciurus · on Jan 19, 2016

I'd be really hesitant to recommend Skyline, since Etsy has declared it and the rest of Kale a failure.

https://vimeo.com/131581331

falcolas · on Jan 19, 2016

We use it (well, a derivative of it) to great success. The trend monitoring has proven invaluable at early detection of problems, at a level where pure thresholds would produce much more noise than signal.

23david · on Jan 19, 2016

What derivative are you using?

falcolas · on Jan 19, 2016

The falcolas derivative. Working on getting it open sourced and released... gotta love bureaucracy.

agentgt · on Jan 19, 2016

We rejected Riemann as well because there was just too much overlap with other tools and the user interface was not very good.

I actually think Clojure is a huge selling point.. seriously you should see the crap that Rackspace has https://www.rackspace.com/knowledge_center/article/alarm-lan... which I'm ashamed to say we use (the Lua monitors though are cool and it's free monitoring infrastructure).. and yes it's not the same as Riemann, as Riemann is not exactly just an alerting tool.

And that is sort of the problem.. Riemann is a tool that does one thing really well but doesn't have that good a UI.. sadly we want prettier graphs and a less granular tool.. a better Nagios.

djhworld · on Jan 19, 2016

I spent over a day wrestling to get Skyline to work, but it's so outdated, and all of its dependencies have moved on since it was launched, that it's an absolute nightmare to even get running.

To be fair, though, it does say it's no longer maintained.

avodonosov · on Jan 19, 2016

> Config file isn't a config file, it's an executed bit of Clojure code

What exactly here is the problem for you? That config is not a config, or that it's specifically Clojure?

pram · on Jan 19, 2016

skyline is abandoned though lol