LLM Serving Pack

Add LLM serving to your Nebari cluster so your team can run large language models behind a managed API. The pack handles model downloading, serving, routing, and per-model access control, with rate limiting and token counting included so usage stays accountable.
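To make per-model access control and token accounting concrete, here is a minimal conceptual sketch in Python. It is not the pack's implementation or configuration format: `ModelPolicy`, `UsageGate`, the group names, the model name, and the budget numbers are all hypothetical, chosen only to show how a gateway might decide whether a request is allowed before it reaches a model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelPolicy:
    """Hypothetical per-model policy (not the pack's real schema)."""
    allowed_groups: set[str]  # groups permitted to call this model
    token_budget: int         # illustrative max tokens per user


@dataclass
class UsageGate:
    """Tracks token usage per (user, model) and enforces each model's policy."""
    policies: dict[str, ModelPolicy]
    used: dict[tuple[str, str], int] = field(default_factory=dict)

    def authorize(self, user: str, groups: set[str], model: str, tokens: int) -> bool:
        policy = self.policies.get(model)
        # Access control: unknown model, or user shares no group with the policy.
        if policy is None or not (groups & policy.allowed_groups):
            return False
        # Rate limiting by token count: reject if the request would exceed the budget.
        key = (user, model)
        spent = self.used.get(key, 0)
        if spent + tokens > policy.token_budget:
            return False
        # Record the usage so later requests are counted against the same budget.
        self.used[key] = spent + tokens
        return True
```

In this sketch, a user in an allowed group can spend tokens until the model's budget is exhausted, while users outside the group are rejected regardless of budget. A real deployment would also reset budgets per time window and take identities from the cluster's authentication layer; the upstream README is the place to check how the pack actually does this.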

For install and configuration, see the upstream README.