
> The direct corollary is that any successful compromise of the host can give an attacker access to the complete memory of every VM running on that node. Keeping the host secure is therefore critical.

> In that context, hosting a web service that is directly reachable from any guest VM and running it on the secure host side created a significantly larger attack surface than I expected.

That is quite scary



It is kind of a fundamental risk of IMDS: the guest VMs often need some metadata about themselves, and the host has it. A hardened, network-gapped service running host-side is acceptable, possibly the best solution. I think the issue is when your IMDS is fat and vulnerable, which this article kind of alludes to.

There’s also the fact that Azure’s implementation doesn’t require auth, so it’s very vulnerable to SSRF.


You could imagine hosting the metadata service somewhere else. After all, there is nothing a node knows about a VM that the fabric doesn’t. And things like certificates come from somewhere anyway; they are not on the node, so that service is just a cache.


Hosting IMDS on the host side is pretty much the only reasonable way to provide stability guarantees. It should still work even if the network is having issues.

That being said, IMDS on AWS is a dead-simple key-value store. A competent developer should be able to write it in a memory-safe language in a way that can't be easily exploited.
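
For illustration, a minimal sketch of such a service in (memory-safe) Go, stdlib only; the keys, values, and link-local bind address here are placeholders, not the real AWS implementation:

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "strings"
    )

    // Read-only metadata for this instance, populated at provisioning time.
    // Keys and values are hypothetical placeholders.
    var metadata = map[string]string{
        "instance-id": "i-0123456789abcdef0",
        "region":      "us-east-1",
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        if r.Method != http.MethodGet {
            http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
            return
        }
        key := strings.TrimPrefix(r.URL.Path, "/latest/meta-data/")
        val, ok := metadata[key]
        if !ok {
            http.NotFound(w, r)
            return
        }
        fmt.Fprint(w, val)
    }

    func main() {
        http.HandleFunc("/latest/meta-data/", handler)
        // Bind only to the link-local address reachable from the guest.
        log.Fatal(http.ListenAndServe("169.254.169.254:80", nil))
    }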


“No, there is another”—Yoda, The Empire Strikes Back :)

What you describe carries the risk that secrets end up in crash dumps and get exfiltrated.

Imagine an attacker owns the host to some extent and can do that. The data then ends up on disk first, and from there can be stored somewhere else.

You probably need per-tenant/per-VM encryption in your cache (see the sketch below), since you can never prevent someone with elevated privileges from crashing or dumping your process, memory-safe or not.

Then someone can try to DoS you, etc.

Finally, it’s not good practice to mix tenants’ secrets in hostile multi-tenancy environments, so you probably need a cache per VM in separate processes…
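
A rough Go sketch of that per-VM sealing idea; key management (one key per VM, held outside this process) is assumed and not shown:

    package main

    import (
        "crypto/aes"
        "crypto/cipher"
        "crypto/rand"
        "fmt"
        "log"
    )

    // seal encrypts one VM's metadata blob with that VM's own key, so a
    // crash dump of the cache leaks only ciphertext for other tenants.
    func seal(perVMKey, plaintext []byte) ([]byte, error) {
        block, err := aes.NewCipher(perVMKey)
        if err != nil {
            return nil, err
        }
        gcm, err := cipher.NewGCM(block)
        if err != nil {
            return nil, err
        }
        nonce := make([]byte, gcm.NonceSize())
        if _, err := rand.Read(nonce); err != nil {
            return nil, err
        }
        return gcm.Seal(nonce, nonce, plaintext, nil), nil
    }

    func main() {
        key := make([]byte, 32) // hypothetical per-VM key
        if _, err := rand.Read(key); err != nil {
            log.Fatal(err)
        }
        ct, err := seal(key, []byte(`{"secret":"per-vm-cert"}`))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("cached ciphertext: %x...\n", ct[:16])
    }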

IMHO, an alternative is to keep the VM's private data inside the VMs, not on the host.

Then the real wtf is the unsecured HTTP endpoint, an open invitation for “explorations” of the host (or the SoC when they get there) on Azure.

An eBPF+signing agent helps authenticate legitimate requests but does nothing against attacks on the server itself: say, you send malformed requests hoping to hit a bug. It does not matter whether they are signed or not.

This is a path to own the host, an unnecessary risk with too many moving parts.

Many VM escapes abuse a device driver, and I trust the kernel guys who write them a lot more than the people who write hostable web servers running in-process on the host.

Removing these was a subject of intense discussions (and pushbacks from the owning teams) but without leaking any secret I can tell you that a lot of people didn’t like the idea of a customer-facing web server on the nodes.


Of course, putting the metadata service into its own separate system is better. That's how Amazon does it with the modern AWS. A separate Nitro card handles all the networking and management.

But if you're within the classic hypervisor model, then it doesn't really matter that much. The attack surface of a simple plain HTTP key-value storage is negligible compared to all other privileged code that needs to run on the host.

Sure, each tenant needs to have its own instance of the metadata service, and it should be bound to listen on the tenant-specific interface. AWS also used to set the max TTL on these interfaces to 1, so the packets would be dropped by routers.
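
Roughly how that TTL trick could look in Go, using golang.org/x/net/ipv4 to set TTL=1 on each accepted connection (the bind address and handler are hypothetical):

    package main

    import (
        "log"
        "net"
        "net/http"

        "golang.org/x/net/ipv4"
    )

    // ttlListener sets IP TTL=1 on every accepted connection, so response
    // packets are dropped by the first router they hit.
    type ttlListener struct{ net.Listener }

    func (l ttlListener) Accept() (net.Conn, error) {
        c, err := l.Listener.Accept()
        if err != nil {
            return nil, err
        }
        if err := ipv4.NewConn(c).SetTTL(1); err != nil {
            c.Close()
            return nil, err
        }
        return c, nil
    }

    func main() {
        // Hypothetical tenant-specific interface address.
        ln, err := net.Listen("tcp4", "169.254.169.254:80")
        if err != nil {
            log.Fatal(err)
        }
        // Placeholder handler; the real service would serve the key-value data.
        log.Fatal(http.Serve(ttlListener{ln}, http.NotFoundHandler()))
    }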


> negligible attack surface of a simple plain HTTP…

…unless you use a general-purpose web server, with its own set of challenges as far as policies and configuration go. I’ll leave it there.


Ah yes, great point. Awesome article by the way: thought-provoking, shocking, really crazy stuff. Hopefully some good comes of it, godspeed.


This is well documented: https://learn.microsoft.com/en-us/azure/virtual-machines/ins...

Why would an Azure customer need to query this service at all? I was not aware this service even exists, because I never needed anything like it. As far as I can tell, this service tells services running on the VM what SKU the VM is. But how is this useful to the service? Could any Azure users tell me how they use IMDS? Thanks!


> Why would an Azure customer need to query this service at all? I was not aware this service even exists, because I never needed anything like it.

The "metadata service" is hardly unique to Azure (both GCP & AWS have an equivalent), and it is what you would query to get API credentials to Azure (/GCP/AWS) service APIs. You can assign a service account² to the VM¹, and the code running there can just auto-obtain short-lived credentials, without you ever having to manage any sort of key material (i.e., there is no bearer token / secret access key / RSA key / etc. that you manage).

I.e., easy, automatic access to whatever other Azure services the workload running on that VM requires.

¹and in the case of GCP, even to a Pod in GKE, and the metadata service is aware of that; for all I know AKS/EKS support this too

²I am using this term generically; each cloud provider calls service accounts something different.
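
Concretely, on Azure that is a single local HTTP call against the documented IMDS managed-identity endpoint; a Go sketch:

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
    )

    func main() {
        // Documented Azure IMDS managed-identity token endpoint.
        url := "http://169.254.169.254/metadata/identity/oauth2/token" +
            "?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F"
        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
            log.Fatal(err)
        }
        // IMDS rejects requests without this header (a weak SSRF mitigation).
        req.Header.Set("Metadata", "true")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body)) // JSON with access_token, expires_on, ...
    }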


Mainly for getting managed-identity access tokens for Azure APIs. In AWS you can call it to get temporary credentials for the EC2 instance’s attached IAM role. In both cases, you use IMDS to get tokens/creds for identity/access management.

Client libraries usually abstract away the need to call IMDS directly by calling it for you.
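
For example, with the Azure SDK for Go, azidentity’s ManagedIdentityCredential does the IMDS call for you; a sketch:

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/Azure/azure-sdk-for-go/sdk/azcore/policy"
        "github.com/Azure/azure-sdk-for-go/sdk/azidentity"
    )

    func main() {
        // ManagedIdentityCredential talks to IMDS under the hood.
        cred, err := azidentity.NewManagedIdentityCredential(nil)
        if err != nil {
            log.Fatal(err)
        }
        tok, err := cred.GetToken(context.Background(), policy.TokenRequestOptions{
            Scopes: []string{"https://management.azure.com/.default"},
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("token expires:", tok.ExpiresOn)
    }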


Thank you, and everyone else who responded. So this type of service is used by other cloud providers (AWS) as well. What makes the Azure service so much more insecure than its AWS equivalent?

Thanks again!

[edited phrasing]


Having it run on the host (!), the metadata for all guest VMs stored and managed by the same memory/service (!!), and no clear security boundary (!!!).

It's like storing all your nuke launch codes in the same vault, right in the middle of the National Mall in Washington, DC. Things are okay, until they are not okay.


Lovely explanation :)


I use GCP, but it also has the idea of a metadata server. When you use a Google Cloud library in your server code like PubSub or Firestore or GCS or BigQuery, it is automatically authenticated as the service account you assigned to that VM (or K8S deployment).

This is because the metadata server provides an access token for the service account you assigned. Internally, those client libraries automatically retrieve the access token and therefore auth to those services.
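Under the hood it’s just a local HTTP GET against the documented metadata endpoint, with a Metadata-Flavor header; a Go sketch:

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
    )

    func main() {
        // Documented GCP metadata endpoint for the default service account.
        url := "http://metadata.google.internal/computeMetadata/v1" +
            "/instance/service-accounts/default/token"
        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
            log.Fatal(err)
        }
        // The metadata server rejects requests without this header.
        req.Header.Set("Metadata-Flavor", "Google")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body)) // JSON: access_token, expires_in, token_type
    }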


There are a bunch of things a VM needs when first starting from a standard image. Think certificates and a few other things.


Managed identity is enabled via that endpoint, for example.


We run a significant amount of stuff on spot instances (AKS nodes) and use the service to detect, monitor, and gracefully handle imminent shutdowns on the Kubernetes side.

https://learn.microsoft.com/en-us/azure/virtual-machines/lin...
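
A rough Go sketch of that polling loop (endpoint per the linked Scheduled Events docs; the eviction handling is hypothetical):

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
        "time"
    )

    type scheduledEvents struct {
        Events []struct {
            EventId   string
            EventType string // "Preempt" signals spot eviction
            NotBefore string
        }
    }

    func main() {
        url := "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
        for {
            req, _ := http.NewRequest(http.MethodGet, url, nil)
            req.Header.Set("Metadata", "true")
            resp, err := http.DefaultClient.Do(req)
            if err == nil {
                var ev scheduledEvents
                if json.NewDecoder(resp.Body).Decode(&ev) == nil {
                    for _, e := range ev.Events {
                        if e.EventType == "Preempt" {
                            // Hypothetical handling: cordon/drain the node here.
                            log.Printf("eviction not before %s", e.NotBefore)
                        }
                    }
                }
                resp.Body.Close()
            }
            time.Sleep(10 * time.Second)
        }
    }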


To have a new VM configure itself at boot.


What happens when someone asks an AI model to fuzz test that...


Scary is the understatement of the day. I can't imagine the environment where someone thinks that architecture is a good idea.


And yet, there we are.


Instead of zero trust, it is 110% trust.


Well I have zero trust in Microsoft, so they've achieved that at least.


Like, what did the OP expect?



