High-throughput open-source LLM serving library using PagedAttention for efficient inference.