Engineering Apr 22, 2026 7 min read

RAG without the hype: what actually works

Field notes from twelve production RAG deployments.

SelfAiWizard Engineering

The RAG conversation has produced plenty of vendor pitches and very few field reports. Here is what we have actually seen work, and what has consistently disappointed.

What works: chunking by semantic boundaries (not character count), storing the original document path next to each chunk for citation, and re-ranking after vector search using a small cross-encoder.
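To make those three practices concrete, here is a minimal sketch, not a production implementation. It assumes paragraphs (blank-line breaks) are an acceptable stand-in for semantic boundaries, and it uses a toy word-overlap scorer where a small cross-encoder would go. The names `Chunk`, `chunk_by_paragraph`, `rerank`, and `word_overlap` are ours, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    text: str
    source_path: str  # kept next to the text so an answer can cite its document


def chunk_by_paragraph(text: str, source_path: str, max_chars: int = 800) -> List[Chunk]:
    """Split on blank lines, then pack whole paragraphs together up to max_chars.

    Assumption: paragraph breaks approximate semantic boundaries. A single
    paragraph longer than max_chars is kept whole rather than cut mid-thought.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[Chunk] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(Chunk(current, source_path))
            current = para
        else:
            current = candidate
    if current:
        chunks.append(Chunk(current, source_path))
    return chunks


def rerank(
    query: str,
    candidates: List[Chunk],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Chunk]:
    """Re-order vector-search candidates by a (query, passage) relevance score."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c.text), reverse=True)
    return ranked[:top_k]


def word_overlap(query: str, passage: str) -> float:
    """Toy scorer: fraction of query words found in the passage.

    Placeholder only. In a real pipeline score_fn would wrap a small
    cross-encoder that scores each (query, passage) pair jointly.
    """
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0


doc = (
    "Refunds are issued within 30 days.\n\n"
    "Shipping takes five business days.\n\n"
    "Support is available by email."
)
chunks = chunk_by_paragraph(doc, "policies/returns.md", max_chars=40)
best = rerank("shipping days", chunks, word_overlap, top_k=1)
print(best[0].source_path, "|", best[0].text)
```

Because `source_path` travels with every chunk, whatever survives re-ranking can be cited without a second lookup. The scorer is passed in as a function so the toy version can be swapped for a real model without touching the rest of the pipeline.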

What doesn't: massive context windows as a substitute for retrieval, reaching for the largest model you can afford instead of a focused 8B-parameter model with great retrieval, and per-query embedding refresh.

We almost never see a RAG project fail because the search step itself was bad. We see them fail because the chunking strategy was wrong, the metadata was lost, or the team underestimated the maintenance cost of keeping the index current. Pick a chunking strategy that matches the document genre (long-form policy text gets bigger chunks; FAQ snippets get smaller ones) and budget for nightly re-indexing from day one.
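As a rough sketch of that advice: chunk size can be looked up by genre, and a nightly job can compare content hashes to decide which documents need re-embedding. The sizes below are hypothetical placeholders, not measured recommendations, and `chunk_size_for`, `content_hash`, and `plan_reindex` are illustrative names of our own.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical chunk sizes in characters; tune these against your own corpus.
CHUNK_CHARS_BY_GENRE = {"policy": 2000, "faq": 400}
DEFAULT_CHUNK_CHARS = 1000


def chunk_size_for(genre: str) -> int:
    """Bigger chunks for long-form policy text, smaller ones for FAQ snippets."""
    return CHUNK_CHARS_BY_GENRE.get(genre, DEFAULT_CHUNK_CHARS)


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    documents: Dict[str, str], indexed_hashes: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare current documents with the hashes stored at last indexing.

    Returns (paths to re-chunk and re-embed, paths to delete from the index).
    """
    stale = sorted(
        path
        for path, text in documents.items()
        if indexed_hashes.get(path) != content_hash(text)
    )
    removed = sorted(path for path in indexed_hashes if path not in documents)
    return stale, removed


# Example nightly run: one edited file, one unchanged, one deleted, one new.
indexed = {
    "a.md": content_hash("old text"),
    "b.md": content_hash("same text"),
    "c.md": content_hash("deleted since last run"),
}
current = {"a.md": "new text", "b.md": "same text", "d.md": "brand new"}
stale, removed = plan_reindex(current, indexed)
print("re-embed:", stale, "delete:", removed)
```

Hashing keeps the nightly job cheap, since only changed or new documents are re-embedded, and the deletion list stops the index from serving chunks of documents that no longer exist.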

Tagged: RAG, Vector search, Field notes
