> One cool thing we learned was how quickly you can Hibernate and wake up x86 EC2 instances. That ended up being a game-changer for us.
Could you talk more about it? Are you keeping a cache of hibernated EC2 instances and re-launching them per request? What sort of relaunch latency profile do you see as a function of instance memory size?
A specific EC2 instance is always serving one customer max. And builds are highly cacheable, so the EC2 instance has an EBS volume on it with a big cache that Earthly uses to prevent rework.
That instance is just sitting around waiting for gRPC requests that tell it to run another build. If it's idle for 30 minutes, it hibernates, and then if another call comes in, a gRPC proxy wakes it back up.
I don't know whether the wake-up time increases with the size of the cache in memory. I can check with Brandon, but it's much faster than starting up an instance cold, mainly because buildkit is designed for throughput and not a quick startup.
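To make the idle/wake flow concrete, here is a minimal sketch of the proxy logic, not our actual implementation. The names `BuilderProxy`, `Cloud`, `handle`, and `tick` are hypothetical; on AWS the two `Cloud` methods could plausibly wrap boto3's `stop_instances(..., Hibernate=True)` and `start_instances(...)`, but that wiring is an assumption and is left out so the example runs on its own.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Protocol

IDLE_LIMIT_SECONDS = 30 * 60  # the 30-minute idle window described above


class Cloud(Protocol):
    """Hypothetical interface to whatever hibernates and resumes the VM."""

    def hibernate(self, instance_id: str) -> None: ...
    def wake(self, instance_id: str) -> None: ...


@dataclass
class BuilderProxy:
    """Sits in front of one customer's instance and tracks its idle time."""

    instance_id: str
    cloud: Cloud
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    hibernated: bool = False
    last_request: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.last_request = self.clock()

    def handle(self, request: str) -> str:
        """Wake the instance if needed, then forward the request."""
        if self.hibernated:
            self.cloud.wake(self.instance_id)
            self.hibernated = False
        self.last_request = self.clock()
        # A real proxy would forward the gRPC call here.
        return f"forwarded {request} to {self.instance_id}"

    def tick(self) -> None:
        """Called periodically; hibernates the instance once it has idled long enough."""
        idle = self.clock() - self.last_request
        if not self.hibernated and idle >= IDLE_LIMIT_SECONDS:
            self.cloud.hibernate(self.instance_id)
            self.hibernated = True
```

The point of the design is that the caller never sees the hibernation: the proxy absorbs the wake-up and the cache on the EBS volume is still there when the instance resumes.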
How do you prevent build 1 from modifying the VM in a way that impacts build 2?
If build 1 happens to install a specific libc, do you uninstall that libc before running build 2?
If you just say that stuff is the responsibility of the user, okay, but then the artifacts produced by this system aren't deterministic, which seems like a problem?
Good question, so the builds are specified in Earthfiles, which are run by our buildkit backend.
Buildkit runs the builds in runC, so basically containers are used to keep things deterministic, but the buildkit backend isn't shared; each one is in its own EC2 instance.