Beyond Vector Databases: The Case for Local Semantic Caching
Originally published on Medium.com on November 6, 2025.

When “intelligence” wastes cycles
Most teams building LLM-powered products eventually realize that a large share of their API costs comes not from new insights but from repeated questions.
Support bots, internal assistants, and analytics copilots all encounter thousands of near-identical queries:
“How do I pass the API key to the local model gateway?”
“Why is the dev database connection timing out?”
“How can I refresh the cache without restarting the service?”
Each of these prompts is re-tokenized, re-embedded, and re-sent to an LLM, even when the model answered an equivalent question a minute earlier.
The result? Burned tokens, added latency, and duplicated reasoning.
Vector databases solved storage, not reuse
The industry's first instinct was to throw vector databases at the problem. They excel at persisting embeddings and serving semantic retrieval, but they were not designed for answer reuse. What they typically lack are TTL policies, eviction strategies, and atomic snapshotting of in-flight state. In other words, they store knowledge, not memory.
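To make "atomic snapshotting of in-flight state" concrete, here is a minimal Python sketch, assuming a hypothetical in-memory answer store (`SnapshottingStore` is an illustrative name, not VCAL's API). It copies the state under a lock, writes it to a temporary file in the same directory, and swaps it into place with `os.replace`, so on a POSIX filesystem a reader sees either the previous snapshot or the new one, never a half-written file. Entries whose TTL has lapsed are dropped on reload.

```python
import json
import os
import tempfile
import threading
import time


class SnapshottingStore:
    """Hypothetical in-memory answer store with TTLs and atomic on-disk snapshots."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # key -> {"answer": str, "expires_at": float (Unix time)}

    def put(self, key, answer, ttl_seconds):
        with self._lock:
            self._entries[key] = {"answer": answer, "expires_at": time.time() + ttl_seconds}

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
        if entry is None or entry["expires_at"] <= time.time():
            return None  # missing or expired
        return entry["answer"]

    def snapshot(self, path):
        # Copy the state under the lock so concurrent writers cannot produce a torn view.
        with self._lock:
            state = {k: dict(v) for k, v in self._entries.items()}
        # Write to a temp file in the target directory, fsync it, then atomically
        # rename it over the old snapshot (atomic on POSIX within one filesystem).
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)  # never leave a partial temp file behind
            raise

    @classmethod
    def load(cls, path):
        # Restore from a snapshot, dropping entries whose TTL has already expired.
        store = cls()
        now = time.time()
        with open(path) as f:
            saved = json.load(f)
        store._entries = {k: v for k, v in saved.items() if v["expires_at"] > now}
        return store
```

Expiry times use wall-clock Unix time rather than a monotonic clock because a snapshot has to remain meaningful after a process restart.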
Traditional vector databases follow a key-value paradigm: they persist embeddings indefinitely so they can be queried later, much like records in a datastore.
A semantic cache, by contrast, treats embeddings as dynamic memory, governed by similarity, expiration, and adaptive retention.
Its goal is not to archive information but to avoid redundant reasoning across millions of semantically similar requests.
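The three properties named above (similarity, expiration, retention) fit in a few dozen lines. This is a minimal illustration, not VCAL's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the 0.8 similarity threshold, one-hour default TTL, and least-recently-used eviction are assumed policies chosen for the example.

```python
import math
import re
import time
from collections import Counter, OrderedDict


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Sketch of a semantic cache: similarity lookup + TTL expiration + LRU eviction."""

    def __init__(self, threshold=0.8, ttl_seconds=3600.0, max_entries=1000, clock=time.monotonic):
        self.threshold = threshold      # minimum similarity that counts as a hit (assumed)
        self.ttl_seconds = ttl_seconds  # default lifetime of an entry (assumed)
        self.max_entries = max_entries  # retention budget before eviction kicks in
        self.clock = clock
        self._entries = OrderedDict()   # prompt -> (embedding, answer, expires_at), oldest first

    def get(self, prompt):
        now = self.clock()
        query = toy_embed(prompt)
        best_key, best_score = None, 0.0
        for key, (emb, _answer, expires_at) in list(self._entries.items()):
            if expires_at <= now:
                del self._entries[key]  # expiration: drop stale answers lazily
                continue
            score = cosine(query, emb)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None or best_score < self.threshold:
            return None  # miss: the caller falls through to the LLM
        self._entries.move_to_end(best_key)  # mark as recently used
        return self._entries[best_key][1]

    def put(self, prompt, answer, ttl_seconds=None):
        ttl = self.ttl_seconds if ttl_seconds is None else ttl_seconds
        self._entries[prompt] = (toy_embed(prompt), answer, self.clock() + ttl)
        self._entries.move_to_end(prompt)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # retention: evict the least recently used entry
```

With this setup, "How can I pass the API key to the local model gateway?" hits an entry stored for "How do I pass the API key to the local model gateway?", while "Why is the dev database connection timing out?" misses and would go to the LLM. A production cache would swap the linear scan for an approximate nearest-neighbor index and the toy vectors for real embeddings.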
With a semantic cache such as VCAL, cached answers can stay valid for days or weeks, depending on data volatility and TTL settings. This shifts caching from short-term repetition avoidance to long-horizon semantic reuse, where reasoning itself becomes a reusable resource rather than a recurring cost.
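One way to tie TTL settings to data volatility is a small policy table consulted when an answer is cached. The volatility classes and durations below are hypothetical, picked only to illustrate how stable knowledge can live for weeks while fast-changing answers expire quickly.

```python
from datetime import timedelta
from enum import Enum


class Volatility(Enum):
    """Hypothetical volatility classes for cached answers."""
    HIGH = "high"      # e.g. answers about live system status
    MEDIUM = "medium"  # e.g. answers about configuration that changes occasionally
    LOW = "low"        # e.g. how-to answers about stable APIs


# Illustrative TTLs only: stable knowledge can be reused for weeks,
# volatile answers should expire within minutes.
TTL_BY_VOLATILITY = {
    Volatility.HIGH: timedelta(minutes=5),
    Volatility.MEDIUM: timedelta(days=1),
    Volatility.LOW: timedelta(weeks=2),
}


def ttl_seconds_for(volatility: Volatility) -> float:
    """Return the TTL, in seconds, to attach to a cache entry of this volatility class."""
    return TTL_BY_VOLATILITY[volatility].total_seconds()
```

Keeping the policy in one table makes it easy to tune retention per class without touching the cache logic itself.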
In essence, VCAL bridges the gap between data retrieval and reasoning efficiency, turning answers the model has already computed into faster responses to future questions.
