The dedicated server deployments I worked on at smallish software companies 10+ years ago wound up being really annoying. I enjoy sysadmin type stuff and this idea has tempted me, but I think it’s a false economy in most cases.
The incremental cost of being on a cloud is totally worth it to me to have managed databases, automatic snapshots, hosted load balancers, plug and play block storage etc.
I want to worry about whether our product is good, not about middle of the night hardware failures, spinning up a new server and having it not work right because there was no incentive to use configuration management with a single box, having a single runaway task cause OOM or full disk and break everything instead of just that task’s VM, fear of restarting a machine that has been up 1000 days etc.
100%. There are definitely cases where managing your own dedicated server infra (not hardware, just the software part) makes sense.
For me, the allure of the cloud is not some FAANG scalability fad when it's not required, but automatic OS patching, automatic load balancing, NAT services, centralized, analyzable logs, database services that I don't have to manage, out of the box rolling upgrades to services, managing crypto-keys and secrets, and of course, object storage. (These are all the things we use in our startup.)
I'd go out on a limb and say dedicated servers may be very viable/cost effective once we reach a certain scale, instead of the other way round. For the moment, the cloud expenses are worth it for my startup.
The key is to use some kind of config management, even for a single box. The problem you need to solve is basically to separate your configuration from the system defaults and make it easy to bring the system into a known good state.
I do think there's something in there about the immediacy of a replacement if a currently-live box falls over. Rough rule of thumb says it's hours for a physical box, minutes for a VM, seconds for a k8s container, a hopefully smaller number of seconds for a serverless cold start. Each of those has ergonomics different enough that it might matter even with config management that rhymes across the options.
How so? You should still be doing those things with cloud deployment anyway. The alternative is (the pretty terribly named) "click ops" where devops folks have to fiddle with some cloud UI settings to make production changes or go through wizards to attempt to replicate an environment.
If you are doing your environment config as code, it ultimately shouldn't matter if your target is a dedicated server, a VM or some cloud-specific setup configured via an API.
But most cloud deployments have config management too. Using Ansible/Terraform/Kubernetes on bare metal isn't much different than using Ansible/Terraform/Kubernetes in the cloud.
Yes, or Puppet, Aviary, Shell, CFEngine, Salt, Capistrano, etc.
Doesn't really matter which one, the point is that you have one that can build you a new box or bring a misbehaving box back to a known good state by simply running a command.
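To make "back to a known good state by running a command" a bit more concrete, here is a minimal sketch of the idea at the scale of one file. The helper name `ensure_file` and the path are made up for illustration; real tools like Ansible or Puppet cover packages, services, users and much more. The point is only that the operation is idempotent: run it as often as you like, and it changes something only when the box has drifted.

```python
import os
import tempfile
from pathlib import Path


def ensure_file(path: Path, desired: str) -> bool:
    """Make `path` contain exactly `desired`. Return True if a change was made.

    Running it twice in a row is safe: the second run finds nothing to do.
    """
    if path.exists() and path.read_text() == desired:
        return False  # already in the known good state
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file and rename it into place, so a crash in the
    # middle never leaves a half-written config behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(desired)
    os.replace(tmp_name, path)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        conf = Path(root) / "etc" / "myapp.conf"   # hypothetical path
        print(ensure_file(conf, "workers = 4\n"))  # first run creates the file
        print(ensure_file(conf, "workers = 4\n"))  # second run is a no-op
        conf.write_text("workers = 99\n")          # someone edits it by hand
        print(ensure_file(conf, "workers = 4\n"))  # the drift is corrected
```

The return value is what lets a tool report "changed" versus "ok" per resource, which is most of what you want to see after running it against a misbehaving box.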
Aaand that's why k8s seems like a good step in the right direction. It has all - okay, absolutely not all, but a good number of the important ones - of these concepts "built-in".
Sure, the actual pile of Go(o) that drives it can be improved (and it is indeed improving).
That said, the hard truth is that your approach is the correct one, almost all businesses (startups!) overbuild instead of focusing on providing value, identifying the actual niche, etc. (Yes, there's also the demand side of this, as the VC money inflated startups overbuild they want to depend on 3rd parties that can take the load, scale with them, SLAs and whatnot must be advertised.)
> k8s seems like a good step in the right direction
Kubernetes is fantastic! Whenever I see a potentially competing business start using Kubernetes, I am immediately relieved, as I know I don't have to worry about them anymore. They are about to disappear in a pile of unsolvable technical problems, trying to fix issues that can't be traced down in a tech stack of unbelievable complexity and working around limitations imposed by a system designed for businesses with traffic two-three orders of magnitude larger. Also, their COGS will be way higher than mine, which means they will need to price their product higher, unless they are just burning through VC money (in which case they are a flash in the pan).
k8s is almost the polar opposite of this idea for most companies. At minimum, a half dozen daemons spread across multiple hosts for.. what? Making the easy things built-in and everything else much more complex than it needs to be?
(I haven't used it in a few years, when I tried to deploy OpenStack with it, so if this is outdated, please, correct me!)
Ansible is imperative, it can work toward a static state, and that's it. If it runs into a problem it throws up its SSH connection, and cries tons of python errors, and gives up forever. Even with its inventory it's far from declarative.
k8s is a bunch of control loops that try to make progress toward their declared state. Failure is not a problem, it'll retry. It'll constantly check its own state, it has a nice API to report it, and thanks to a lot of reified concepts it's harder to have strange clashes between deployed components. (Whereas integrating multiple playbooks with Ansible is ... not trivial.)
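A toy version of that control-loop idea, as a sketch only: the `Cluster` class is a made-up stand-in for real infrastructure (it is not the Kubernetes API), and `fail_next` simulates a flaky backend. What it tries to show is that a failed step is not fatal. The loop just compares desired and observed state again on the next pass.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for real infrastructure; `fail_next` simulates a flaky API."""
    running: dict = field(default_factory=dict)  # service name -> replica count
    fail_next: int = 0

    def set_replicas(self, name: str, count: int) -> None:
        if self.fail_next > 0:
            self.fail_next -= 1
            raise RuntimeError("transient failure")
        if count == 0:
            self.running.pop(name, None)
        else:
            self.running[name] = count


def reconcile_once(desired: dict, cluster: Cluster) -> bool:
    """One pass of the loop. Returns True when observed state matches desired."""
    converged = True
    for name in set(desired) | set(cluster.running):
        want = desired.get(name, 0)
        have = cluster.running.get(name, 0)
        if want == have:
            continue
        try:
            cluster.set_replicas(name, want)
        except RuntimeError:
            converged = False  # not fatal: the next pass will try again
    return converged


def reconcile(desired: dict, cluster: Cluster, max_passes: int = 10) -> int:
    """Repeat passes until converged; returns the number of passes used."""
    for attempt in range(1, max_passes + 1):
        if reconcile_once(desired, cluster):
            return attempt
    raise TimeoutError("did not converge")
```

For example, `reconcile({"web": 3, "worker": 1}, Cluster(running={"old": 1}, fail_next=2))` needs two passes: the first pass hits both simulated failures, the second finishes the job and removes the undeclared `old` service.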
Yes, ... and? Those are much more "standardized" than whatever else any team cooks up. (And k8s along with Go is steadily improving, so I don't see this as "let's use WordPress because it's the platform that has the most answers on StackOverflow".)
And even if k8s accumulates too much legacy baggage, there are upcoming slimmer manifestations of the core ideas. (E.g. https://github.com/aurae-runtime/aurae )
You need to solve a bunch of SRE/DevOps problems: fearless deploys, rollbacks, canary rollouts, blue/green deployment, HA/LB, backups, dev environments, supporting multiple teams, yadda-yadda.
So of course you can implement a minimal complexity solution or you can use something "off the shelf".
k8s is complexity and some of it is definitely unneeded for your situation, but it's also rather general, flexible, well supported, etc.
> you’re suggesting that people use a “simpler” non “standard” implementation?
What I suggested is that if k8s the project/software gets too big to fail, then folks can switch to an alternative. Luckily the k8s concepts and API are open source, so they can be implemented in other projects, and I gave such an example to illustrate that picking k8s is not an Oracle-like vendor lock-in.
I think there's a lot of room to bring the best of both worlds together. Some are moving on this already, for example Hetzner Cloud load balancers can also have dedicated servers as targets.
If you've got your configuration properly automated, nothing prevents you from redeploying your bare-metal nodes every week. The opposite can actually also be true: it may become risky at some point to reboot a VM with a high uptime.
I'm completely clueless about any server administration beyond a LAMP stack on a single machine serving a few dozen people
I always hear about all the great stuff you get practically for free from cloud providers. Is that stuff actually that easy to set up and use? Any time I tried to set up my LAMP stack on a cloud service it was such a confusing and frightening process that I ended up giving up. I'm wondering if I just need to push a little harder and I'll get to Cloud Heaven
It's a mixed bag. It's more standardised, so once you know how to set up one app, you can mostly just repeat it multiple times as needed. Even for custom things, I can't remember the last time I started a CloudFormation stack from scratch - it's mostly copy&customise.
Being able to say "I want a clustered MySQL here with this kind of specs" is much better (time-wise) than doing it on your own. The updates are also nicely wrapped up in the system, so I'm just saying "apply the latest update" rather than manually ensuring failovers/restarts happen in the right order.
So you pay to simplify things, but the gain from that simplification only kicks in with larger projects. If you have something in a single server that you can restart while your users get an error page, it may not be worth it.
The big problem for server operations is not the damned servers but all of the infrastructure you need to keep them running, like power, cooling, network/storage and internet connectivity.
The cloud is not easy, but damn, getting the cooling and power efficiency of a small server room anywhere near the efficiency levels most big data centers publish is next to impossible, as is multi-vendor internet connectivity.
With the cloud all of that kind of goes away, as it's managed by whatever data-center operator that cloud is running on. But what people forget is that the same is true for old-fashioned colocation services, which often offer better cost/value than cloud.
And while it's definitely harder to manage stuff like AWS or Azure, because it leaks a lot of abstractions that small-scale VPS providers hide from you or that you don't really get with a single home server, it's not hard on the scale of having to run a couple of racks' worth of VMware servers with SAN-based storage.
I think the problem is configuration. Every thing that has its own configuration is basically a new language you must be able to speak perfectly. Configuration languages are not statically typed, so when you make an error it typically does not tell you that you made a mistake, where it is, and why it is a mistake.
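One way to claw some of that back, sketched under assumptions: validate the parsed config against a small typed schema before anything uses it, and make the error say where and why. The `ServerConfig` fields and the `load_config` helper below are invented for illustration and are not any particular tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int
    workers: int = 2


def load_config(raw: dict) -> ServerConfig:
    """Validate a parsed config mapping; fail with a message that says where and why."""
    allowed = {"host": str, "port": int, "workers": int}
    errors = []
    for key in raw:
        if key not in allowed:
            errors.append(f"{key}: unknown key (allowed: {sorted(allowed)})")
    for key, expected in allowed.items():
        if key not in raw:
            continue
        value = raw[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key}: expected {expected.__name__}, got {value!r}")
    for key in ("host", "port"):
        if key not in raw:
            errors.append(f"{key}: required key is missing")
    if not errors and not 1 <= raw["port"] <= 65535:
        errors.append(f"port: {raw['port']} is outside 1-65535")
    if errors:
        raise ValueError("invalid config:\n  " + "\n  ".join(errors))
    return ServerConfig(**raw)
```

With this, a typo like `{"host": "a", "prot": 80}` fails immediately with both "prot: unknown key" and "port: required key is missing", instead of silently falling back to some default.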
With Cloud stuff you have more configuration to do because it is about configuring virtual servers etc. Instead of carrying the PC in a box to your room you must "configure it" to make it available.
Laravel Forge + Digital Ocean (or Hetzner, your own machine, Vultr ...) is a pretty good solution. You can up & downgrade as you see fit or spin up a new server in a few min (any specs you need, from scratch or using an image of one of your existing machines, it installs what you need and you can of course add whatever is still missing).
DO databases have backups you can configure to your liking, store them on DO Spaces (like S3). DB user management is easy. There's also cache servers for Redis.
You can add a load balancer and connect it to your various web servers.
I think it took me about 30 min to setup 2x web servers, a DB server, cache server, load balancer, a storage server and connect them all as needed using a few simple forms. Can't really beat that.
By your own estimate -- how confident do you feel that these servers and services are secure? "Setting up" web servers to perform their function is rather easy, in my experience. Ensuring those servers can withstand standard-issue hacking attempts, not so much, especially within just 30 minutes.
The added value of these types of service, I think, is that they're fairly well set up with their provisioning script. You're paying for this service after all and if it appears they can be easily compromised "by default" then ... there would be many problems.
If you have any more info or opinions then please do share.
I'm no expert on Linux security but what I've seen when provisioning a VPS from anybody is a system with the base OS installed and that's it. SSH on port 22. Root account active. Accounts not needing a private key to login. No firewall. Known vulnerabilities in base packages unpatched.
Lots of articles around the internet about hardening a Linux server, the ones I've tried take a bit more than 30 min to follow the steps, a lot longer if I'm trying to actually learn and understand what each thing is doing, why it's important, what the underlying vulnerability is, and how I might need to customize some settings for my particular use case.
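As a small illustration of turning two items from that list into something checkable, here is a sketch that inspects the text of an `sshd_config` file. `PermitRootLogin` and `PasswordAuthentication` are real OpenSSH options; which values to flag is my assumption rather than an official baseline, and the sketch ignores `Match` blocks and everything else a real hardening guide covers (firewall, patching, and so on).

```python
from typing import Dict, List

# The keywords are real sshd_config options. Treating "yes" as risky is this
# sketch's assumption, not an official security baseline.
RISKY = {
    "permitrootlogin": {"yes": "root can log in directly, including with a password"},
    "passwordauthentication": {"yes": "accounts can log in without a private key"},
}


def parse_global_settings(text: str) -> Dict[str, str]:
    """Collect keyword/value pairs that appear before the first Match block."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()      # keywords are case-insensitive
        if keyword == "match":
            break                       # conditional blocks are out of scope here
        if len(parts) == 2:
            # sshd uses the first value it obtains for a keyword, so keep the first.
            settings.setdefault(keyword, parts[1].strip())
    return settings


def audit_sshd_config(text: str) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    settings = parse_global_settings(text)
    findings = []
    for keyword, bad_values in RISKY.items():
        value = settings.get(keyword)
        if value is None:
            findings.append(f"{keyword}: not set, so the built-in default applies")
        elif value.lower() in bad_values:
            findings.append(f"{keyword} {value}: {bad_values[value.lower()]}")
    return findings
```

Unset options are reported rather than judged, because the compiled-in defaults differ between OpenSSH versions and distro builds.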
As someone who also runs a Laravel site on DO (but doesn't use Forge so can't comment on whether it handles any additional security configurations) buying a cheap DO droplet is very much handling your own infrastructure, it comes with reasonable enough defaults for most people out of the box but if you're expecting any decent amount of traffic it definitely needs to be hardened a little more. Also other administrative tasks like adding swap space if you're like me and trying to get as much performance as possible from as little spend as possible.
A bash script that auto-installs everything will do. It has to be maintained though. You also lock down and harden the server. Running a stable OS helps.
I've run a few Debian servers for years with few issues. Set up auto-update (very easy) for security updates. There's really little to go wrong.
I'm sure you can find example setup scripts online (configure autoupdates, firewall, applications, etc.), should be a matter of running 'curl $URL' and then 'chmod +x $FILE' and 'bash $FILE'. I didn't need configuration management (I do use my provider's backup service which is important I guess).
Totally, though I think in a fast-paced/under-resourced environment incentives matter, and ephemeral cloud VMs incentivize admins to use configuration management, whereas it's easy to take a shortcut in the moment and quickly manually install something/edit config on a rarely-changing dedicated host.
Obviously the same can be said for long running VMs, and this can be solved by having a disciplined team, but I think it's generally more likely in an environment with a single long running dedicated machine.
This here is the trick. I like to do blue/green deployments. This workflow forces you to script (and implicitly test) the full server+application deployment. With everything scripted, it just becomes a matter of deciding to rebuild a new machine every week/month/n-deploys, whichever makes you comfortable. Plus, you get a hot spare for free.
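Roughly, the bookkeeping behind that workflow looks like the sketch below. Everything here is a simplified assumption: setting `versions[target]` stands in for the full scripted server build, flipping `live` stands in for repointing a load balancer or DNS, and the `healthy` callback is whatever smoke test you trust.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BlueGreen:
    """Tracks which of two environments is live and what version each runs."""
    versions: dict               # e.g. {"blue": "v1", "green": None}
    live: str = "blue"
    spare_healthy: bool = False  # True only when the idle side is known good

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Build `version` on the idle side; switch traffic only if it is healthy."""
        target = self.idle
        self.spare_healthy = False       # the idle side is being rebuilt
        self.versions[target] = version  # stands in for the scripted server build
        if not healthy(target):
            return False                 # live side untouched, users see nothing
        self.live = target               # stands in for repointing the load balancer
        self.spare_healthy = True        # the old live side is now the hot spare
        return True

    def rollback(self) -> None:
        """Switch back to the hot spare, if there is a known-good one."""
        if not self.spare_healthy:
            raise RuntimeError("no known-good spare to roll back to")
        self.live = self.idle
```

A failed deploy never touches the live side, and after a successful one the previous environment is the hot spare that `rollback()` returns to.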
> The incremental cost of being on a cloud is totally worth it to me to have managed databases, automatic snapshots, hosted load balancers, plug and play block storage etc.
> With booked daily backup or the backup included in the type of server, all data is backed up daily and retained for a maximum of 14 days. Recovery of backups (Restore) is possible via the konsoleH administration interface.
But I get the impression that the databases on managed servers are intended for use by apps running on that server, so there isn't really a concept of failover.
EC2 or on-premise you need monitoring to detect issues before they become issues.
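The simplest possible instance of that, as a sketch: warn on disk usage well before the disk is actually full. `shutil.disk_usage` is in the Python standard library; the 80% threshold and the function names are arbitrary choices for illustration, and a real setup would ship the result to whatever alerting you already have.

```python
import shutil
from typing import Optional


def usage_warning(used: int, total: int, warn_at: float = 0.80) -> Optional[str]:
    """Return a warning string when the used fraction reaches `warn_at`, else None."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = used / total
    if fraction >= warn_at:
        return f"disk {fraction:.0%} full (threshold {warn_at:.0%})"
    return None


def check_disk(path: str = "/", warn_at: float = 0.80) -> Optional[str]:
    """Check the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage_warning(usage.used, usage.total, warn_at)


if __name__ == "__main__":
    # Run from cron or a systemd timer; prints nothing when all is well.
    message = check_disk("/")
    if message:
        print(message)
```

The same script runs unchanged on an EC2 instance and on a box in a rack, which is rather the point.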
A single drive on a single server failing should never cause a production outage.
A lot of configuration issues can be avoided with self-contained deployments. Does that font file or JRE really need to be installed on the whole server, or can you bundle it into your deployment?
Our deployments use on-premise and EC2 targets. The deployment script isn't different, only the IP for the host is.
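In sketch form, that looks something like the following. The addresses, user, paths and service name are all invented, and the commands are only built as lists here, not executed; the rsync and systemctl invocations are just one plausible shape for such a script, not the one we actually run.

```python
from typing import List

TARGETS = {
    "on-prem": "10.0.0.12",  # made-up private address
    "ec2": "203.0.113.7",    # documentation-range address, not a real host
}


def deploy_commands(host: str, user: str = "deploy", app: str = "myapp") -> List[List[str]]:
    """Build the command lines for one target. Nothing here depends on where the host lives."""
    remote = f"{user}@{host}"
    return [
        # Copy the build output, deleting files that no longer exist locally.
        ["rsync", "-az", "--delete", "build/", f"{remote}:/srv/{app}/"],
        # Restart the service so it picks up the new files.
        ["ssh", remote, "sudo", "systemctl", "restart", app],
    ]


if __name__ == "__main__":
    for name, host in TARGETS.items():
        for command in deploy_commands(host):
            print(name, " ".join(command))
```

To actually run them you would pass each list to `subprocess.run(command, check=True)`.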
Now, I will say if I can use S3 for something I 100% will. There is not an on-premise alternative for it with the same feature set.
HackerNews went down because two SSDs in a mirror went down at the same time. This was due to an SSD power-on-hours bug. RAID-Z2 works and provides great reliability; two replicated servers, both with RAID-Z2, plus a nightly offsite backup is a solid plan, as long as you are testing your backups.
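"Testing your backups" can start very small. A sketch, assuming a single-file backup: record a checksum when the backup is taken, then periodically restore into a scratch directory and compare. The `copyfile` calls stand in for your real backup and restore procedures, and the function names are made up.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def take_backup(source: Path, dest: Path) -> str:
    """Copy `source` to `dest` and return the checksum to store alongside it."""
    shutil.copyfile(source, dest)  # stand-in for the real backup job
    return sha256_of(source)


def restore_test(backup: Path, expected_sha256: str) -> bool:
    """Restore into a scratch directory and check the result against the record."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copyfile(backup, restored)  # stand-in for the real restore procedure
        return sha256_of(restored) == expected_sha256
```

A checksum only proves the bytes came back intact. For a database you would also want to load the restored dump and run a query against it.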
What about simply running k8s on metal? You wouldn’t be able to auto-scale (unless you automated buying and selling the hardware provision from the DS) but it would be nice.
What PaaS providers / platforms do you like? I want to move some personal shell scripted boxes to PaaS, but dropped the idea with the decline of Heroku. Tried Dokku and struggled, but it seems like I should give it another try.
I'd love if you jumped into our Discord/Slack and brought up some of the issues you were seeing so we can at least make the experience better for others using Dokku. Feel free to hit me up there (my nick is `savant`).
Thank you for the kind words and I'll check out your Discord when I next take a pass at it.
Let me preface and say that I'm an application dev with only a working knowledge of Docker. I'm not super skilled at infra and the application I struggled with has peculiar deployment parameters: It's a Python app that at build-time parses several gigs of static HTML crawls to populate a Postgres database that's static after being built. A Flask web app then serves against that database. The HTML parsing evolves fast and so the populated DB data should be bundled as part of the application image(s).
IIRC, I struggled with structuring Dockerfiles when the DB wasn't persistent but instead just another transient part of the app, but it seemed surmountable. The bigger issue seemed to be how to avoid pulling gigs of rarely changed data from S3 for each build when ideally it'd be cached, especially in a way that behaved sanely across DigitalOcean and my local environment. I presume the right Docker image layer caching would address the issue, but I pretty rapidly reached the end of my knowledge and patience.
Dokku's DX does seem great for people doing normal things. :)
I’m in a weird position where I work with both extremely large enterprises and very small startups and not much in between. The enterprises in my area lean towards Azure so it’s still my cloud provider of choice. I haven’t had occasion to look for PaaS providers outside of the major cloud vendors yet. On the startup side, most of these companies want to be cloud first so it’s much easier to start them the right way and avoid the legacy mess that most enterprises deal with today.
Anything of note between the two which you found favorable? I am early-days into designing a Dokku backed system. Dokku has a lot going for it that makes it appealing to me, but happy to learn what I am missing.
The only problem with this is I don’t want something running on my own server that I barely understand. I’d rather use say fly.io that way if there is a problem I can get support from them.