Retrieval augmented generation (RAG) has gained significant attention in recent years for its ability to enhance AI capabilities. It combines retrieval techniques with generative models to improve performance and produce more accurate results, offering the best of both worlds: factual grounding from retrieval and fluent responses from generation.
In this tutorial, we will walk through setting up a RAG AI pipeline in Node.js using LangChain and MongoDB as vector storage.

What is RAG (Retrieval Augmented Generation)?

Retrieval-Augmented Generation is a hybrid approach that pairs a retrieval model, which fetches relevant documents from a vast collection, with a generative AI model, which interprets and produces a coherent response based on the retrieved documents. This method is ideal for tasks where up-to-date and accurate information is essential, like question answering, code generation, and even customer service applications.

In short, RAG is a hybrid approach that fuses two AI capabilities:

1. Retrieval-based methods: These involve searching for relevant information from external sources, such as databases or document collections.

2. Generative models: These models generate text based on input prompts, providing coherent and context-aware responses.

How RAG Works:

1. Retrieve: The retrieval component queries a vector database to get the most relevant documents or data points based on the user’s input.
2. Generate: The generative model uses the retrieved documents to create a more nuanced and contextually accurate response.
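
To make these two phases concrete, here is a minimal sketch of the flow in JavaScript. The vectorStore and llm objects (and their similaritySearch and generate methods) are hypothetical stand-ins rather than a specific library API; the embedJs-based implementation later in this tutorial wires these pieces up for real.

// Minimal sketch of the two RAG phases (vectorStore and llm are hypothetical helpers).
async function answer(question) {
  // 1. Retrieve: fetch the documents most similar to the question
  //    from the vector database.
  const docs = await vectorStore.similaritySearch(question, 3);

  // 2. Generate: pass the question plus the retrieved context to the LLM.
  const context = docs.map((doc) => doc.text).join("\n");
  const prompt = `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
  return llm.generate(prompt);
}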
The diagram below shows the RAG architecture.

[Image: RAG architecture diagram]
Let’s dive into how RAG operates by using Node.js alongside MongoDB as a vector database, powered by LangChain.

Setting Up RAG in Node.js:

This guide will help you set up a basic RAG pipeline in Node.js using LangChain, MongoDB as a vector store, and a small set of example data from text files.

Prerequisites:

Before starting, ensure you have the following in place:

  • Node.js and npm
  • A MongoDB instance you can connect to (for example, a MongoDB Atlas cluster) to serve as the vector store
  • An OpenAI API key

The diagram below shows what we are going to implement: RAG using embedJs (a wrapper around LangChain for fast and easy integration), MongoDB as the vector store, and a query engine API.

[Image: Implementation architecture with embedJs, MongoDB vector store, and query engine API]

6-Step Guide For Implementing RAG In Node.js:

Implementing Retrieval-Augmented Generation (RAG) in Node.js lets developers integrate advanced retrieval and generative AI techniques seamlessly. Why use RAG? It enhances AI’s ability to deliver accurate, context-aware responses by combining relevant data retrieval with generative capabilities, making it ideal for chatbots, document search, and personalized content applications.

Pro Tip: Before diving in, ensure your development environment is set up with Node.js and the necessary packages.

Step 1: Initialize Your Project:

mkdir rag-nodejs-app
cd rag-nodejs-app
npm init -y

Step 2: Install Required Packages:

We’ll use embedJs (a wrapper around LangChain) for the RAG implementation and MongoDB as our vector store, so install the following packages:

npm install @llm-tools/embedjs
npm install @llm-tools/embedjs-mongodb

Step 3: Create A New .env File And Add The Required Environment Variables:

Set up a .env file in your project directory to securely store credentials such as your OpenAI API key and MongoDB connection string. This file helps manage sensitive data without hardcoding it into your application.

OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
MONGODB_CONNECTION_URI=<YOUR_MONGODB_CONNECTION_URI>

Step 4: Create New Folders In The Current Directory To Store Cache And Data Files:

Create directories in your project for storing cached data and other necessary files. This helps organize your project and ensures efficient data management during RAG implementation.
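
For example, from the project root (the cache and files folder names match the paths used in index.js below):

mkdir cache files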

Step 5: Create A New index.js File And Add The Content Below:

Create a new index.js file in your project directory to implement the main application logic. This file will include the configuration for loading files, setting up the cache, connecting to the vector database, and handling search queries.

import "dotenv/config";
import * as path from "node:path";
import { RAGApplicationBuilder, TextLoader } from "@llm-tools/embedjs";
import { LmdbCache } from "@llm-tools/embedjs/cache/lmdb";
import { MongoDb } from "@llm-tools/embedjs/vectorDb/mongodb";
import * as fs from "fs";

import express from "express";
import cors from "cors";

const app = express();
app.use(express.json());
app.use(cors());

const port = 4000;
app.get("/initLoader", async (req, res) => {
  //From sample file add loaders.
  const llmApplication = await new RAGApplicationBuilder()
    .setCache(new LmdbCache({ path: path.resolve("./cache") }))
    .setVectorDb(
      new MongoDb({
        connectionString:
          "MONGODB_CONNECTION_URI",
      })
    )
    .build();

  const folderPath = "./files";
  // Read all files in the folder
  fs.readdir(folderPath, (err, files) => {
    if (err) {
      return console.error(`Unable to scan directory: ${err}`);
    }

    // Loop through the files
    for (const file of files) {
      const filePath = path.join(folderPath, file);

      // Perform an operation on each file, for example, log file name
      console.log(`Processing file: ${filePath}`);

      // You can read the file contents if needed
      fs.readFile(filePath, "utf8", async (err, data) => {
        if (err) {
          console.error(`Error reading file: ${err}`);
        } else {
          console.log(`File content of ${file}`);
          const fileType = getFileExtension(file);
          switch (fileType) {
            case "txt":
              await llmApplication.addLoader(new TextLoader({ text: data }));
              break;
            case "pdf":
              await llmApplication.addLoader(
                new PdfLoader({
                  filePathOrUrl: path.resolve(filePath),
                })
              );
            default:
              break;
          }
        }
      });
    }
  });

  res.send(200);
});

const getFileExtension = (fileName) => {
  return fileName.split(".").pop(); // Returns the last part after the last '.'
};

app.post("/searchQuery", async (req, res) => {
  const { searchText } = req.body;
  console.log("inside add loader Post call", req.body);
  const llmApplication = await new RAGApplicationBuilder()
    .setCache(new LmdbCache({ path: path.resolve("./cache") }))
    .setVectorDb(
      new MongoDb({
        connectionString:
          "MONGODB_CONNECTION_URI",
      })
    )
    .build();

let result = await llmApplication.query(searchText);
console.log(searchText, " ==> ", result.content);


  res.status(200).json({ result: result.content });
});


app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});

Explanation Of The Above Code:

1. Importing Dependencies: loads environment variables via dotenv, Node’s path and promise-based fs modules, the embedJs builder and loaders, and Express with CORS.
2. Application Setup: creates the Express app, enables JSON body parsing and CORS, and sets the port to 4000.
3. API Endpoints:
  • GET /initLoader:
    1. Initializes the RAG application with an LMDB cache for caching embeddings and connects to a MongoDB vector database.
    2. Reads files from the ./files directory, processes each file based on its extension, and adds it to the RAG application as a loader:
      • .txt files are processed using TextLoader.
      • .pdf files are processed using PdfLoader (imported from @llm-tools/embedjs alongside TextLoader; depending on your embedJs version it may live in a separate loader package).
  • POST /searchQuery:
    1. Takes a search query (searchText) from the request body and creates a new RAG application instance backed by the same MongoDB vector store and LMDB cache.
    2. Runs a query on the RAG application with the searchText and returns the result as JSON.
4. Utility Function: getFileExtension returns the part of a file name after the last dot, so the right loader can be chosen.
5. Server Initialization: starts the Express server listening on port 4000.
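
Once the server is running, you can exercise both endpoints, for example with curl (the sample question below is hypothetical and depends on what your files contain):

node index.js
curl http://localhost:4000/initLoader
curl -X POST http://localhost:4000/searchQuery \
  -H "Content-Type: application/json" \
  -d '{"searchText": "Summarize the sample document"}'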
At the end of these steps, your folder and file structure should look like this:

rag-nodejs-app/
├── cache/
├── files/
├── node_modules/
├── .env
├── index.js
└── package.json

Step 6: Build the UI and Connect the API

Create a UI using React or another preferred library, and integrate the API query to fetch responses based on your data. This will allow users to interact with the backend and get context-specific answers.
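
As a minimal sketch (assuming the backend from Step 5 is running locally on port 4000), the UI can call the query endpoint like this:

// Hypothetical client-side helper that posts a question to the RAG API.
async function search(searchText) {
  const response = await fetch("http://localhost:4000/searchQuery", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ searchText }),
  });
  const { result } = await response.json();
  return result;
}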

Conclusion

This RAG architecture setup enables your AI applications to reference specific information while still generating human-like, accurate responses. It’s particularly valuable for customer support, content generation, and domain-specific knowledge applications. By combining the retrieval and generation capabilities in Node.js, you’re opening the door to a whole new level of AI-driven applications.

Looking ahead, the future of RAGs and LLMs holds immense potential, with advancements enabling even more personalized and context-aware AI models. Applications of Retrieval Augmented Generation (RAG) are expected to expand across industries such as healthcare, finance, e-commerce, and more, offering smarter, more efficient solutions for real-time data retrieval and intelligent content generation.

FAQs

1. What is Retrieval-Augmented Generation (RAG)?

Retrieval Augmented Generation (RAG) is a technique that combines information retrieval with generative AI models. It retrieves relevant data from a database or external source and uses this information to generate more accurate, contextually aware responses. RAG in AI enhances the capabilities of AI models by ensuring the responses are not just generated but are informed by specific, up-to-date data, making it ideal for complex applications like customer support or knowledge management.

2. What Are The Benefits of Retrieval-Augmented Generation (RAG)?

RAG enhances AI’s ability to generate more precise, contextually relevant answers by leveraging external data sources. It improves the accuracy of responses by incorporating up-to-date information, reducing irrelevant information. Additionally, RAG minimizes the need to train large language models with vast amounts of data, making it a cost-effective approach. It enables AI systems to be more adaptable and dynamic, especially in real-time applications like content generation and customer support.

3. How Does Retrieval-Augmented Generation (RAG) Work?

RAG works by combining two processes: Retrieval and Generation. First, the system retrieves relevant information from a large database or knowledge source using techniques like search engines or vector databases. Then, a generative AI model, such as a transformer, uses this retrieved data to generate coherent, human-like responses. This process ensures that responses are not only generated from the model’s training but also informed by the most relevant data available.

4. How Does RAG Differ From Semantic Search?

The key difference between Retrieval Augmented Generation (RAG) and Semantic Search lies in the output. Semantic search focuses on retrieving the most relevant documents or data based on the meaning of the query, whereas RAG not only retrieves the information but also generates a human-like response using that data. RAG combines retrieval with generation, allowing for context-specific, conversational AI, while semantic search is more about finding information rather than generating new content from it.

5. What Are The Diverse Approaches of RAG?

There are several approaches to implementing RAG, primarily based on the retrieval mechanism and generation model.
  • One approach is to use traditional search engines or vector-based retrieval for data access, combined with deep learning-based models for generation.
  • Another approach involves using specialized vector databases, which optimize the retrieval of information.
  • Hybrid methods integrate RAG with other AI techniques, such as reinforcement learning, to further improve accuracy and response generation across diverse use cases.

Radhik Bhojani