Hacker News
Running Cosmos-Reason2-2B on 8GB Jetson Orin Nano (huggingface.co)
1 point by Embedl-Wilhelm 67 days ago | 2 comments


NVIDIA released Cosmos-Reason2 last month, targeting physical AI workloads (video reasoning, robotics planning, event detection), with official support for DGX Spark, H100, GB200 and Jetson AGX Thor.

We quantized the 2B model to W4A16 and optimized it further to run across the full Jetson lineup, including the most memory-constrained device, the Jetson Orin Nano Super (8 GB).
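A back-of-envelope estimate shows why 4-bit weights matter on an 8 GB board that also hosts the OS and KV cache. The numbers below are illustrative approximations (dense weights only, no activations, cache, or runtime overhead), not measured figures:

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB for a dense model."""
    return n_params * bits_per_weight / 8 / 2**30

n = 2e9  # ~2B parameters

fp16 = weight_memory_gib(n, 16)  # full-precision weights
w4 = weight_memory_gib(n, 4)     # W4A16: 4-bit weights, 16-bit activations

print(f"FP16 weights:  {fp16:.1f} GiB")  # ~3.7 GiB
print(f"W4A16 weights: {w4:.1f} GiB")    # ~0.9 GiB
```

On a unified-memory Jetson, the ~2.8 GiB saved on weights is what leaves room for the vision encoder, KV cache, and the rest of the system.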

Model, setup instructions, and benchmarks: https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16

Interested in feedback from others deploying VLMs on Jetson, especially around serving stacks (vLLM vs TensorRT-LLM vs other approaches) and practical bottlenecks!


Quickstart (vLLM Jetson container):

--gpu-memory-utilization and --max-num-seqs should be adapted to your system's specifications (i.e., available RAM).

    docker run --rm -it \
      --network host \
      --shm-size=8g \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      --runtime=nvidia \
      --name=vllm-serve \
      ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
      vllm serve "embedl/Cosmos-Reason2-2B-W4A16" \
        --max-model-len 8192 \
        --gpu-memory-utilization 0.75 \
        --max-num-seqs 2
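Once the container is up, vLLM exposes an OpenAI-compatible HTTP API (port 8000 by default). A minimal client sketch using only the Python standard library; the prompt is just an illustration, and the model name must match the one passed to vllm serve:

```python
import json
import urllib.request

# Build an OpenAI-style chat-completions payload for the served model.
payload = {
    "model": "embedl/Cosmos-Reason2-2B-W4A16",
    "messages": [
        {"role": "user",
         "content": "Describe what happens when a ball rolls off a table."}
    ],
    "max_tokens": 128,
}
body = json.dumps(payload).encode("utf-8")
print(json.dumps(payload, indent=2))

# Sending the request requires the quickstart container to be running locally:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

The same endpoint also accepts image/video content parts in the messages for the multimodal reasoning tasks the model targets.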



