Tensormesh Raises $20M from Investors Including AMD Ventures, CoreWeave, NVentures, Launches Tensormesh Inference to Fix AI’s Most Expensive Problem

Tensormesh, the company pioneering caching-accelerated inference optimization for enterprise AI, today announced $20 million in new funding from investors including AMD Ventures, CoreWeave, NVentures (NVIDIA’s venture capital arm), Valley Capital Partners, and Laude Ventures, extending its seed round and bringing its total funding to $24.5 million. Alongside the funding, Tensormesh is announcing the general availability of Tensormesh Inference, its flagship SaaS inference platform, which fixes enterprises’ most expensive AI problem: recomputing what GPUs have already processed.

This press release features multimedia. View the full release here: https://www.businesswire.com/news/home/20260527958597/en/

The Tensormesh team

The Tensormesh team

When every inference request recomputes the same inputs from scratch, it burns GPU cycles and drives up costs regardless of whether that work has been done before. Tensormesh solves this by storing and reusing computed results through KV caching, eliminating redundant computation and delivering up to 10x reductions in latency and GPU spend.

For developers building AI applications, this problem compounds with every request. Each call to a model reprocesses the full context window, including system prompts, conversation history, and tool definitions, from scratch and at full cost. In multi-step agentic workflows, that redundant computation adds up fast. KV caching eliminates this, serving repeated context instantly rather than reprocessing it, cutting token costs, speeding time to first token, and keeping API bills predictable as agents scale.

The strategic participation of AMD Ventures, CoreWeave, and NVIDIA’s NVentures reflects a shared conviction that KV caching is a foundational layer of the AI infrastructure stack, and that Tensormesh is the first company to bring it to market as a fully productized, enterprise-grade platform.

“Tensormesh offers a new vision on the significance of the intermediate data that LLMs generate when processing prompts. Behind the term KV cache is a whole concept of AI interpretation of the question it is asked. This makes it a whole new class of data and a category Tensormesh is uniquely positioned to define. We’re excited to keep building,” said Junchen Jiang, co-founder and CEO of Tensormesh.

“As enterprises scale AI workloads, maximizing every GPU cycle is critical. Software innovations like KV caching are a powerful complement to raw accelerator performance. Paired with AMD Instinct™ GPUs, Tensormesh’s platform can help customers drive value from their infrastructure investments,” said Ramine Roane, corporate vice president, AI at AMD.

“Tensormesh is working to solve infrastructure challenges that will ultimately impact the economics and scalability of AI. Their work advancing KV caching can help make inference faster and more efficient at scale, and it reflects exactly the kind of foundational innovation CoreWeave Ventures is committed to backing,” said Brannin McBee, co-founder and chief development officer at CoreWeave.

“KV caching represents one of the most consequential and underexplored opportunities in AI infrastructure today. Tensormesh has built the only platform that makes this technology production-ready for the enterprise, and we believe it will become a critical part of how every serious AI deployment is run,” said Steve O’Hara, founder and managing partner at Valley Capital Partners and a Tensormesh board member.

Tensormesh Inference Is Now Generally Available

Tensormesh Inference is the first inference platform built from the ground up on caching-accelerated technology. It comes from the team that built the leading open-source KV caching project, LMCache, and is now funded by the investors who build GPUs and AI clouds.

Since emerging from stealth, Tensormesh has worked closely with enterprise customers to harden the platform for production. Today, Tensormesh Inference is available to any team that wants to run AI inference more efficiently, without rebuilding infrastructure from scratch.

“As AI workloads grow, intelligent reuse of cached state has become one of the most powerful levers for performance and cost efficiency,” said Leno Park, vice president of Nand product planning at Samsung Electronics. “Tensormesh’s LMCache is built to take full advantage of next-generation storage, and we look forward to our continued collaboration to push the boundaries of what’s possible across the AI stack.”

Cached Input Tokens at Zero Cost

Tensormesh Inference introduces a pricing model that is a direct reflection of how the technology works. When a request is served from the KV cache, the cached input tokens cost nothing. Across all of Tensormesh’s serverless deployments, cached input tokens are billed at $0, not as a promotional rate, but as a permanent part of how Tensormesh prices its platform.

With Tensormesh Inference, enterprises benefit from:

  • Cost savings: With Tensormesh Inference’s intelligent KV cache reuse, enterprises can dramatically reduce their GPU spend by up to 10x.

  • Transparency: Most inference providers cache tokens on the backend without disclosing it, meaning enterprises have no visibility into what’s being cached, at what rate, or whether any savings are being passed through. Tensormesh provides cost savings, KV cache usage, and token-level cost breakdowns, providing complete transparency.

  • Greater freedom: Built on open source LMCache architecture, the Tensormesh Inference platform gives users more control over their experiences compared to other platforms.

Tensormesh also gives teams direct control over how much cache backend storage is allocated to their deployments and surfaces the metrics they need to understand exactly how that storage is performing. Cache hit rate, KV cache usage ratio, and token-level cost breakdowns are all visible in real time, giving teams the information to continuously tune their cache configuration and maximize the portion of requests served from storage (at zero cost) rather than being recomputed. As cache hit rates grow, savings compound directly against a team’s inference bill, with well-optimized deployments regularly achieving cache hit rates above 70%.

“Inference economics will define what is possible for the next generation of AI products. Tensormesh is tackling one of the most important challenges in AI infrastructure: helping companies reduce GPU spend without requiring changes to application code. The combination of meaningful cost savings and simple deployment is rare, it positions Tensormesh to become a critical layer in the AI infrastructure stack,” said Hui Zhang, CTO and co-founder of Conviva and advisor to Tensormesh.

Deployment Modes Built for AI Adoption

Tensormesh Inference is available across two deployment options.

  • Serverless inference provides immediate API access to a curated catalog of frontier models with no infrastructure to provision or manage. The API is fully OpenAI-compatible, requiring no changes to existing tooling or workflows. Teams can go from signup to their first inference request in minutes.

  • Reserved deployments are designed for enterprises running AI at scale that need dedicated capacity, predictable performance, and custom SLA support. The Tensormesh team works directly with each organization to design the right cluster configuration and pricing for their specific workload.

Real-Time Cost Savings Analytics

Tensormesh Inference includes a Cost Savings Dashboard that makes the financial impact of caching visible in real time. Rather than asking teams to trust that optimizations are working, the dashboard shows exactly how much has been saved, calculated from actual usage. It tracks cache hit rate, the ratio of cached to total prompt tokens, and converts that into a dollar figure updated continuously. Teams can view their savings across any time window and watch efficiency improve as their deployments mature.

The platform also provides a full suite of inference performance metrics across every active deployment, including time to first token, inter-token latency, input and output throughput, and GPU compute utilization, all in real time.

Continued Investment in Open Source

This new funding will be used to accelerate product development, expand hardware-level integrations with AMD, CoreWeave, and NVIDIA, and deepen Tensormesh’s contributions to the open-source ecosystem. Tensormesh’s commitment to open source remains unchanged. The company will continue shipping new capabilities through LMCache, the leading open-source KV caching project with over 8,000 GitHub stars and integrations across vLLM, SGLang, TensorRT, llmp-d, NVIDIA Dynamo, AWS SageMaker, and Oracle OCI Data Science.

“What started as a research project around KV caching is becoming a critical part of the AI stack. Tensormesh understood early that enterprises were paying AI systems to recompute the same work again and again, and built foundational infrastructure to eliminate that inefficiency and dramatically improve price-performance. The team has paired deep systems expertise with real open-source credibility to build infrastructure enterprises can actually rely on,” said Pete Sonsini, co-founder and general partner at Laude Ventures.

Getting Started

Tensormesh Inference is available now at tensormesh.ai. Full documentation, pricing, and deployment guides are available at docs.tensormesh.ai.

About Tensormesh

Tensormesh is the leader in caching-accelerated inference optimization for enterprise AI. Founded by faculty, PhD researchers and alumni from the University of Chicago, UC Berkeley, and Carnegie Mellon, and led by Junchen Jiang, University of Chicago faculty member and co-creator of LMCache, Tensormesh builds on years of academic research in distributed systems and AI infrastructure. The company has raised $24.5 million in total funding and is backed by Valley Capital Partners, NVentures, AMD Ventures, CoreWeave, and Laude Ventures.

Media gallery