
XTechnology - Technology is beautiful, let's discover it together!

Building a RAG System in Node.js: Vector Databases, Embeddings & Chunking

Large Language Models (LLMs) are powerful, but they often lack real-time knowledge. Retrieval-Augmented Generation (RAG) bridges this gap by fetching relevant information from external sources before generating responses. In this workshop, we’ll explore how to build an efficient RAG pipeline in Node.js using RSS feeds as a data source. We’ll compare different vector databases (FAISS, pgvector, Elasticsearch), embedding methods, and testing strategies. We’ll also cover the crucial role of chunking—splitting and structuring data effectively for better retrieval performance.

Prerequisites

Code

Agenda

Introduction

Alex Korzhikov

alex korzhikov photo

Software Engineer, Netherlands

My primary interest is self-development and craftsmanship. I enjoy exploring technologies, coding open source and enterprise projects, teaching, speaking and writing about programming - JavaScript, Node.js, TypeScript, Go, Java, Docker, Kubernetes, JSON Schema, DevOps, Web Components, Algorithms 🎧 ⚽️ 💻 👋 ☕️ 🌊 🎾

Pavlik Kiselev

Pavlik

Software Engineer, Netherlands

JavaScript developer with full-stack experience and a passion for frontend. He happily works at ING in the Fraud Prevention department, where he helps protect the finances of ING customers.

About Everything

Generative Artificial Intelligence (GenAI)

Use of Artificial Intelligence to produce human-consumable content in text, image, audio, or video format.

Language Model (LM)

Language Models are Machine Learning models trained on natural language resources that aim to predict the next word based on a given context.

Typical LM tasks - summarization, Q&A, classification, and more.

Large Language Model (LLM)

Modern generative AI is powered by Large Language Models - language models trained on vast amounts of source material and having billions of parameters.

llm as blackbox

Retrieval Augmented Generation (RAG)

RAG is a method that combines information retrieval with language model generation.


Retrieval = Search

Generation = LLM

Use Cases:

async function rag(q) {
  // Retrieval: find documents relevant to the question
  const results = await search(q)
  // Augmentation: combine the question with the retrieved context
  const prompt = makePrompt(q, results)
  // Generation: let the LLM answer based on the augmented prompt
  const answer = await llm(prompt)
  return answer
}
Why do we need RAG?
- Additional, specific knowledge
- Reduce hallucinations
- Control system costs

Why can't we just put all the context into the LLM and ask it directly?
- Cost
- Slow
- Size (context window limits)
- Noise

What do we need to build a RAG?
- Application (backend)
- Search system (vector databases)
- LLM (blackbox)

Langchain 🦜️🔗

LangChain is a Python and JavaScript framework that provides flexible abstractions and an AI-first toolkit for building GenAI applications and integrating them with LLMs. It includes components for abstracting and chaining LLM prompts, configuring and using vector databases (for semantic search), loading and splitting documents (to analyze them and learn from them), parsing output, and more.
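
For example, the prompt-building step of the rag() function above can be abstracted with a prompt template. This is a minimal sketch using PromptTemplate from @langchain/core/prompts; the template wording and sample values are just illustrations:

const { PromptTemplate } = require("@langchain/core/prompts");

// A reusable prompt that injects retrieved context into the question
const prompt = PromptTemplate.fromTemplate(
  "Answer the question using only this context:\n{context}\n\nQuestion: {question}",
);

const text = await prompt.format({
  context: "Node.js releases a new major version every six months.",
  question: "How often does Node.js release a major version?",
});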

Setup

Practice #1 - Hello World

const { Ollama } = require("ollama");

// Assumes a local Ollama server with a pulled model (the model name may differ)
const ollama = new Ollama();
const response = await ollama.chat({
  model: "llama3.2",
  messages: [
    { role: "user", content: "What is retrieval-augmented generation?" },
  ],
});
console.log(response.message.content);

Use Case - Exploring Node.js News

We aim to understand what happened in the Node.js community over the past year. To achieve this, we:

  1. Collect and process news, popular blog posts, and articles
  2. Store the documents in a database
  3. Analyze and query the documents by asking targeted questions

How do we find sources?

How do we scrape articles?

Some scraping and text-quality problems
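
One possible collection step, sketched here with the rss-parser package (the feed URL is a placeholder, and full article text often still requires scraping the linked page):

const Parser = require("rss-parser");

const parser = new Parser();

// Fetch a feed and keep only the fields we need for later processing
async function collectArticles(feedUrl) {
  const feed = await parser.parseURL(feedUrl);
  return feed.items.map((item) => ({
    title: item.title,
    link: item.link,
    published: item.isoDate,
    text: item.contentSnippet, // often truncated - may need scraping for full text
  }));
}

const articles = await collectArticles("https://example.com/nodejs-feed.xml"); // placeholder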

Articles

Chunking

To chunk or not to chunk?

text splitter

Chunking involves breaking down texts into smaller, manageable pieces called “chunks”. Each chunk becomes a unit of information that is vectorized and stored in a database, fundamentally shaping the efficiency and effectiveness of natural language processing tasks.

How do we split documents?

text splitter example

const { CharacterTextSplitter } = require("@langchain/textsplitters");

// Split a document into 100-character chunks with no overlap between them
const textSplitter = new CharacterTextSplitter({
  chunkSize: 100,
  chunkOverlap: 0,
});
const texts = await textSplitter.splitText(document); // `document` is the article text

Practice #2 - Chunking

const { RecursiveCharacterTextSplitter } = require("@langchain/textsplitters");

// The recursive splitter tries to keep paragraphs, sentences, and words intact
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 100,
  chunkOverlap: 0,
});

const chunks = await splitter.splitText(text);

Store & Retrieve

what are embedding models

To embed or not?

An embedding is a vector representation of data in an embedding space (the high-dimensional space of the initial data is projected into a lower-dimensional vector space).

embeddings vs indexing

Vectors are stored in a database, which compares them to find data that is similar in meaning (using the dot product or cosine similarity).

embedding space
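
As a minimal sketch of what the database does under the hood, cosine similarity between two vectors can be computed like this:

// Cosine similarity: close to 1 means similar direction (similar meaning)
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0, 1], [1, 1, 1]); // ≈ 0.82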

Can we exchange embedding models with equal vector dimensions?
- Nope: even with matching dimensions, different models produce incompatible embedding spaces, and embedding models also evolve over time.
obvious reaction

Practice #3 - Store & Retrieve

const { Chroma } = require("@langchain/community/vectorstores/chroma");

// `texts` are the chunks from Practice #2; `embeddings` is the embedding model
const vectorStore = await Chroma.fromTexts(
  texts,
  { id: Array.from({ length: texts.length }, (_, i) => `chunk-${i}`) },
  embeddings,
  { collectionName: "rag_node_workshop_articles" },
);
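
Once chunks are stored, they can be retrieved by semantic similarity. similaritySearch is part of the LangChain vector store interface; the query string below is just an example:

const query = "What changed in the Node.js test runner?"; // example query
const docs = await vectorStore.similaritySearch(query, 4); // top 4 most similar chunks
console.log(docs.map((d) => d.pageContent));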

Reranking

reranking process

Rerankers analyze the initial search output and adjust the ranking to better match user preferences and application requirements.
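
A minimal sketch of the idea, assuming some scoreFn (for example, a cross-encoder model or an LLM call) that rates how well a document answers the query:

// Re-order retrieved documents by a relevance score (higher is better)
async function rerank(query, documents, scoreFn) {
  const scored = await Promise.all(
    documents.map(async (doc) => ({
      doc,
      score: await scoreFn(query, doc.pageContent),
    })),
  );
  return scored.sort((a, b) => b.score - a.score).map((entry) => entry.doc);
}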

Evaluation

RAG Triad

Evaluate the retriever and the generator of a RAG pipeline separately

Retriever

Hyperparameters:

Does your reranker model rank the retrieved documents in the “correct” order?

Metrics:

Generation

Hyperparameters:

Can we use a smaller, faster, cheaper LLM?

Metrics:

Frameworks

Our choice

Practice #4 - Evaluation

const relevantDocs = await getRelevantDocuments(vectorStore, summary);
// A retrieved chunk is relevant if it matches the chunk the summary was made from
const relevance = relevantDocs.map((d) => d.pageContent === chunk);
console.log(hitRate(relevance));
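
getRelevantDocuments and hitRate are workshop helpers. As an assumption about the latter (not necessarily the workshop's exact implementation): a single query "hits" if at least one retrieved chunk is the expected one, and the hit rate is this value averaged over many queries:

// For one query: 1 if any retrieved chunk matched the expected one, else 0;
// averaging these values over many queries gives the overall hit rate
function hitRate(relevance) {
  return relevance.some(Boolean) ? 1 : 0;
}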

Summary

Feedback

Please share your feedback on the workshop. Thank you, and happy coding!

If you like the workshop, you can become our patron, yay! 🙏

Technologies