Search
Full Text Search
Full-text search refers to matching some or all of a text query against documents stored in a database. Compared to traditional database queries, full-text search returns results even for partial matches. It allows building more flexible search interfaces, enabling users to find accurate results more quickly.
Prefix and infix searching: This allows you to search for parts of words, like finding "apple" by searching "app" or finding "highlight" by searching "light."
Morphology processing: This includes stemming and lemmatization. Stemming reduces different forms of a word, like "running" and "ran," to a common stem, "run." Lemmatization finds the dictionary base form of a word, so "running" becomes "run."
Fuzzy searching: This helps find results even when the query contains typos.
Exact result count: Full-text search provides the total number of documents that match the search criteria.
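Two of these features can be sketched with just the Python standard library: infix matching via substring containment, and fuzzy matching via `difflib`. This is only a toy illustration; production engines such as Elasticsearch use inverted indexes and edit-distance automata instead.

```python
import difflib

documents = ["apple pie recipe", "pineapple cake", "highlight reel"]

def infix_search(query, docs):
    # Infix match: searching "light" also finds "highlight reel"
    return [d for d in docs if query in d]

# Vocabulary of individual words drawn from the documents
vocabulary = {word for doc in documents for word in doc.split()}

def fuzzy_lookup(query, cutoff=0.8):
    # difflib scores candidates by sequence similarity, tolerating typos
    return difflib.get_close_matches(query, vocabulary, n=3, cutoff=cutoff)

print(infix_search("light", documents))
print(fuzzy_lookup("aple"))  # a typo for "apple"
```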
Vector Search

Unlike traditional keyword-based search, vector search retrieves results by analyzing the similarity between vectors rather than matching exact terms.
Vectorization: Machine learning (ML) models, such as sentence transformers or OpenAI embeddings, convert the search query text and the documents into numerical representations. These representations are called vectors or embeddings.
Embedding space: These vectors are plotted in a multi-dimensional space, where the distance between vectors reflects the semantic similarity between the original pieces of text. Documents with similar meanings have vectors that are closer together in this space.
Nearest neighbors: The search engine uses algorithms like k-nearest neighbors (KNN) to find the vectors in the embedding space that are closest to the query vector. These closest vectors represent the documents that are most semantically similar to the search query.
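The steps above can be sketched with toy 3-dimensional vectors; a real system would obtain embeddings from an ML model (hundreds of dimensions) and use an approximate-nearest-neighbor index rather than the brute-force scan shown here.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": semantically similar texts get nearby vectors
doc_vectors = {
    "cat sat on the mat": [0.9, 0.1, 0.0],
    "kitten on a rug":    [0.8, 0.2, 0.1],
    "stock market crash": [0.0, 0.1, 0.9],
}

def knn(query_vector, k=2):
    # Rank all documents by similarity to the query vector, keep the top k
    ranked = sorted(doc_vectors.items(),
                    key=lambda kv: cosine_similarity(query_vector, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

print(knn([0.85, 0.15, 0.05]))  # the two cat-related documents rank first
```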
Comparison
| Feature | Full-Text Search | Vector Search |
| --- | --- | --- |
| Data Type | Structured or semi-structured text | Unstructured or high-dimensional data |
| Query Type | Keyword or phrase matching | Similarity matching |
| Primary Use Case | Exact matches, metadata filtering | Semantic understanding, recommendations |
| Technology Examples | PostgreSQL full-text search, Elasticsearch | pgvectorscale, FAISS |
Full-text search cannot capture the relationships and semantics between words.
Vector search cannot match exact keywords precisely, so some of the precise meaning of the text may be missed.
Hybrid Search
Hybrid search combines the strengths of full-text search and vector search. It builds upon the accessible, search-as-you-type experience of full-text search and integrates the enhanced discovery capabilities that AI search enables.
from langchain.vectorstores import LanceDB
import lancedb
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
# Initialize embeddings
embedding = OpenAIEmbeddings()
# load single pdf
loader = PyPDFLoader("/content/Food_and_Nutrition.pdf")
pages = loader.load_and_split()
# Initialize the BM25 retriever
bm25_retriever = BM25Retriever.from_documents(pages)
bm25_retriever.k = 2 # Retrieve top 2 results
db = lancedb.connect('/tmp/lancedb')
table = db.create_table("pandas_docs", data=[
{"vector": embedding.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")
# Initialize LanceDB retriever
docsearch = LanceDB.from_documents(pages, embedding, connection=table)
retriever_lancedb = docsearch.as_retriever(search_kwargs={"k": 2})
# Initialize the ensemble retriever
ensemble_retriever = EnsembleRetriever(retrievers=[bm25_retriever, retriever_lancedb],
weights=[0.4, 0.6])
# Example customer query
query = "Which foods are needed for building strong bones and teeth? Which vitamins and minerals are important for this?"
# Retrieve relevant documents/products
docs = ensemble_retriever.get_relevant_documents(query)
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
llm = ChatOpenAI(openai_api_key="sk-yourapikey")
# If you want to use open-source models such as Llama or Mistral, check this:
# https://github.com/lancedb/vectordb-recipes/blob/main/tutorials/chatbot_using_Llama2_&_lanceDB
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=ensemble_retriever)
query = "What nutrition is needed for pregnant women?"
qa.run(query)
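Under the hood, the ensemble fuses the two retrievers' ranked lists using weighted Reciprocal Rank Fusion: each document's score is the weighted sum of 1/(c + rank) across retrievers. A minimal, self-contained sketch of that idea (the document IDs here are hypothetical):

```python
def weighted_rrf(rankings, weights, c=60):
    # rankings: one ranked list of document IDs per retriever
    # Each retriever contributes weight / (c + rank) to a document's score
    scores = {}
    for ranked, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]     # keyword retriever's order
vector_ranking = ["doc_c", "doc_a", "doc_d"]   # vector retriever's order
print(weighted_rrf([bm25_ranking, vector_ranking], weights=[0.4, 0.6]))
```

Documents that rank well in both lists (like doc_a) float to the top, while the weights let you favor one retriever over the other, as `weights=[0.4, 0.6]` does above.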
Maximal Marginal Relevance (MMR)
Let’s say your final key phrases are ranked like this:
Good Product, Great Product, Nice Product, Excellent Product, Easy Install, Nice UI, Light Weight, etc.
But there is an issue with this approach: phrases like "Good Product," "Nice Product," and "Excellent Product" are similar, describe the same property of the product, and all rank highly. Suppose we only have space to show 5 key phrases; in that case, we don't want to show all of these near-duplicates. In traditional semantic search, the highest similarity gets the highest ranking, which can fill the results with redundant items.
The idea behind MMR is to reduce redundancy and increase diversity in the results; it is also used in text summarization. MMR selects each phrase for the final key-phrase list according to a combined criterion of query relevance and novelty of information.
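A minimal sketch of MMR on the key-phrase example above, scoring each candidate as λ·sim(query, candidate) − (1−λ)·max sim(candidate, already selected). Simple word-overlap (Jaccard) similarity stands in for embedding similarity here, purely for illustration.

```python
def jaccard(a, b):
    # Word-overlap similarity: a stand-in for real embedding similarity
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def mmr(query, candidates, k=3, lam=0.5):
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(c):
            relevance = jaccard(query, c)
            # Penalty for resembling anything already selected
            redundancy = max((jaccard(c, s) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

phrases = ["Good Product", "Great Product", "Nice Product",
           "Excellent Product", "Easy Install", "Nice UI"]
print(mmr("good quality product", phrases, k=3))
```

Instead of returning three near-identical "… Product" phrases, MMR picks one of them and then favors phrases that add new information, such as "Easy Install" and "Nice UI".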