Andreessen-Backed Inferact Emerges from Stealth with $150M to Commercialize vLLM AI Inference Engine

Artificial intelligence infrastructure startup Inferact launched with $150 million in seed funding at an $800 million valuation to commercialize vLLM, the open-source AI inference acceleration framework developed at UC Berkeley.

The round was co-led by Andreessen Horowitz and Lightspeed Venture Partners, with participation from Databricks Ventures, UC Berkeley Chancellor’s Fund, Sequoia Capital, Altimeter Capital, Redpoint Ventures, and ZhenFund.

Inferact’s founding team includes Databricks co-founder and UC Berkeley computer science professor Ion Stoica, who directs the university’s Sky Computing Lab, where vLLM originated in 2023. The project has since attracted contributions from more than 2,000 developers globally.

CEO Simon Mo stated, “We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database.”

vLLM optimizes AI model inference, the production deployment phase in which models generate responses, through innovations such as PagedAttention, a memory management technique that greatly reduces GPU memory fragmentation. The technology also enables models to generate multiple tokens simultaneously rather than one at a time, reducing wait times for users.
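To illustrate the idea behind paged memory management, here is a minimal Python sketch. It is a toy model, not vLLM's actual implementation: the names BlockAllocator, Sequence, append_token, and free_sequence, as well as the block size of 16 tokens, are assumptions made for this example. The point it shows is that each request's cache grows in small fixed-size blocks taken from a shared pool, so memory is not reserved up front for the longest possible response and freed blocks can be reused immediately by other requests.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per block; an illustrative value, not vLLM's setting


class BlockAllocator:
    """Hypothetical pool of fixed-size cache blocks shared by all requests."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("no free cache blocks")
        return self.free_blocks.pop()

    def release(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


@dataclass
class Sequence:
    """One in-flight request: maps its logical blocks to physical block ids."""

    block_table: list = field(default_factory=list)
    num_tokens: int = 0


def append_token(seq: Sequence, allocator: BlockAllocator) -> None:
    # A new physical block is claimed only when every block the sequence
    # already holds is full, so at most one block per sequence is partly empty.
    if seq.num_tokens % BLOCK_SIZE == 0:
        seq.block_table.append(allocator.allocate())
    seq.num_tokens += 1


def free_sequence(seq: Sequence, allocator: BlockAllocator) -> None:
    # Return every block to the shared pool once the request finishes.
    for block_id in seq.block_table:
        allocator.release(block_id)
    seq.block_table.clear()
    seq.num_tokens = 0


allocator = BlockAllocator(num_blocks=4)
request_a = Sequence()
for _ in range(17):  # 17 tokens need two blocks of 16
    append_token(request_a, allocator)

request_b = Sequence()
for _ in range(16):  # exactly one full block
    append_token(request_b, allocator)

blocks_left_while_busy = len(allocator.free_blocks)
free_sequence(request_a, allocator)
blocks_left_after_free = len(allocator.free_blocks)
```

In this toy setup, the blocks held by a request do not need to be contiguous, which is why a finished request's memory can be handed to any other request without leaving unusable gaps.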

Co-founder Woosuk Kwon wrote in the announcement, “The complexity doesn’t disappear; it gets absorbed into the infrastructure we’re building,” describing Inferact’s strategy to provide enterprise-grade managed services atop the free open-source core.

Major technology companies including Amazon Web Services, Meta, Google, and Character.AI already deploy vLLM in production environments. The framework currently supports more than 500 model architectures and runs on more than 200 accelerator types.

Inferact plans to launch a paid serverless version of vLLM that automates administrative tasks like infrastructure provisioning and software updates. The company will continue supporting vLLM as an independent open-source project while building proprietary enterprise features.

The funding follows a broader investment trend toward AI inference infrastructure as industry focus shifts from model training to cost-efficient production deployment at scale.

Shobhit Kalra

Shobhit Kalra is the Chief Sub Editor at Tea4Tech, with over 12 years of experience across digital media, digital marketing, and health technology. He is responsible for editorial review, content structuring, and quality control of articles covering software, SaaS products, and developments across the technology ecosystem.

At Tea4Tech, Shobhit oversees content accuracy, clarity, and adherence to editorial standards, ensuring published stories meet the newsroom’s guidelines for originality, sourcing, and consistency.