AI Engineering · Featured

How I Built a RAG Chatbot with Vercel AI SDK

This blog post details the creation of a Retrieval-Augmented Generation (RAG) chatbot that answers questions using uploaded documents. The process begins with document uploads that are chunked and transformed into embeddings, which are stored in a PostgreSQL database. When a user asks a question, the system embeds the query, retrieves the most relevant document chunks through cosine similarity, and feeds this context to an OpenAI model to generate accurate responses. Key insights include the importance of optimizing chunk size for context retention and effective prompt design to enhance output reliability, ultimately leading to a chatbot capable of providing precise, document-based answers.

Cary Li

February 21, 2026
3 min read
# AI coding

Test Posts for Cover Image Style


Post 1 — AI Engineering

Title: How I Built a RAG Chatbot with Vercel AI SDK

Content:

In this post, I'll walk through how I built a simple RAG (Retrieval-Augmented Generation) chatbot that can answer questions based on uploaded documents.

What is RAG?

RAG stands for Retrieval-Augmented Generation. Instead of relying purely on the model's training data, it retrieves relevant content from your own documents first, then feeds that context into the model to generate an answer. This makes it much more accurate for domain-specific questions.

Tech Stack

  • Vercel AI SDK for the chat interface and streaming

  • OpenAI text-embedding-3-small for generating embeddings

  • PostgreSQL with pgvector for vector storage and similarity search (schema sketch below)

  • Next.js as the full-stack framework
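
Before wiring anything up, the storage layer needs a table with a vector column. Here's a minimal sketch of that setup; document_chunks and its column names are placeholders I'm using for illustration, and 1536 matches the output dimension of text-embedding-3-small.

```ts
// schema.ts: minimal pgvector setup (table/column names are placeholders)
import { Client } from "pg";

async function createSchema() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  // pgvector must be installed on the server
  await client.query(`CREATE EXTENSION IF NOT EXISTS vector`);

  // 1536 dimensions matches text-embedding-3-small
  await client.query(`
    CREATE TABLE IF NOT EXISTS document_chunks (
      id        SERIAL PRIMARY KEY,
      content   TEXT NOT NULL,
      embedding VECTOR(1536) NOT NULL
    )
  `);

  await client.end();
}

createSchema().catch(console.error);
```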

How It Works

  1. Upload a document: the text gets chunked and converted into embeddings, then stored in the database (ingestion sketch after this list).

  2. User asks a question: the question is also embedded and compared against stored chunks using cosine similarity (retrieval sketch after this list).

  3. Top matching chunks are retrieved and passed as context to the LLM.

  4. The model generates a response grounded in your actual document content.
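
Here's roughly what the ingestion path (step 1) looks like in code. Treat it as a sketch under a few assumptions: chunkText is the naive splitter I describe in the next section, and document_chunks is the placeholder table from the schema sketch above.

```ts
// ingest.ts: chunk a document, embed the chunks, and store them (sketch)
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { Client } from "pg";
import { chunkText } from "./chunk"; // naive splitter, sketched below

export async function ingestDocument(text: string) {
  const chunks = chunkText(text);

  // One batched embedding call instead of one request per chunk
  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: chunks,
  });

  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  for (let i = 0; i < chunks.length; i++) {
    // pgvector accepts a '[...]' text literal, which JSON.stringify produces
    await client.query(
      "INSERT INTO document_chunks (content, embedding) VALUES ($1, $2::vector)",
      [chunks[i], JSON.stringify(embeddings[i])]
    );
  }
  await client.end();
}
```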
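
Steps 2 through 4 live in the answer path: embed the question, rank stored chunks by cosine distance (pgvector's <=> operator), and hand the top matches to the model. Again a sketch rather than my exact route handler; the model name and the LIMIT are illustrative.

```ts
// answer.ts: retrieve relevant chunks and stream a grounded answer (sketch)
import { embed, streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { Client } from "pg";

export async function answerQuestion(question: string) {
  // Step 2: embed the question with the same model used for documents
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: question,
  });

  // Step 3: <=> is pgvector's cosine distance (smaller = more similar)
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(
    `SELECT content
       FROM document_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT 4`,
    [JSON.stringify(embedding)]
  );
  await client.end();

  // Step 4: generate a response grounded in the retrieved chunks
  const context = rows.map((r) => r.content).join("\n---\n");
  return streamText({
    model: openai("gpt-4o"), // illustrative; any chat model works
    system: `Answer only from the context below. If the answer is not there, say "I don't know".\n\nContext:\n${context}`,
    prompt: question,
  });
}
```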

What I Learned

Getting the chunk size right matters a lot. Too small and you lose context; too large and retrieval becomes noisy. I settled on roughly 500 tokens per chunk with some overlap.
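
The splitter itself doesn't need to be clever. A character-based approximation works: roughly 4 characters per token means 2000 characters is about 500 tokens. The overlap value below is an arbitrary illustration, not a tuned number.

```ts
// chunk.ts: naive fixed-size chunking with overlap (sketch)
// ~4 chars/token is a rough heuristic, so 2000 chars ≈ 500 tokens
export function chunkText(
  text: string,
  chunkSize = 2000,
  overlap = 200 // arbitrary; "some overlap" is all I'd claim
): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap;
  }
  return chunks;
}
```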

Prompt design also plays a bigger role than I expected — telling the model to only answer based on the provided context, and to say "I don't know" when the answer isn't there, makes the output much more reliable.
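
For concreteness, the grounding instruction is just a couple of lines in the system prompt. The exact wording below is illustrative, not the only phrasing that works:

```ts
// Grounding rules prepended to the system prompt (wording illustrative)
const GROUNDING_RULES = [
  "Answer ONLY using the context provided below.",
  'If the context does not contain the answer, reply "I don\'t know".',
  "Do not use outside knowledge or guess.",
].join("\n");
```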


Post 2 — Dev in AI Era

Title: From Cursor to Claude Code: My Vibe Coding Journey

Content:

A year ago I started using Cursor. Today I use Claude Code for almost everything. Here's what changed.

Starting with Cursor

Cursor felt like a supercharged VS Code. Autocomplete that actually understood context, inline chat, codebase-aware suggestions. It was a clear upgrade from GitHub Copilot.

But I was still writing most of the code myself. AI was a tool I reached for when stuck, not something I collaborated with continuously.

The Shift

At some point I stopped thinking of AI as an autocomplete tool and started treating it more like a pair programmer. I'd describe what I wanted, review what came back, give feedback, iterate.

The code quality surprised me. Not perfect, but solid enough that my job became more about direction and review than raw implementation.

Why Claude Code

Claude Code lives in the terminal and works across the whole project — not just the file I have open. I can ask it to find something, refactor across files, or explain why something is broken, and it actually has the full picture.

The bigger shift is that I now spend more time thinking about what to build and less time thinking about how to write it. That's a meaningful change.

Where I Am Now

I still review every change. I still need to understand what the code is doing. But the output per hour has gone up noticeably, and the types of problems I can tackle solo have expanded.

Vibe coding isn't about letting AI do everything. It's about raising your own ceiling.
