Natural Language Processing (NLP)
Introduction
A branch of artificial intelligence (AI) that enables computers to understand, interpret, and generate human language.
Uses techniques from computational linguistics, machine learning, and deep learning to process and analyze text and speech, allowing machines to perform tasks such as understanding the meaning of text, translating languages, and generating human-like conversations.
Text Preprocessing
To filter out non-essential data, such as stop words (a, an, the, ...) that carry little meaning on their own.
Common practices:
Lowercasing, removing stop words, regular expressions, lemmatization (ate -> eat, eats -> eat), and N-grams (grouping N consecutive words together).
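A minimal sketch of these steps using NLTK (an assumed library choice; the required NLTK data such as punkt, stopwords, and wordnet must be downloaded first via nltk.download, and the example sentence is made up):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.util import ngrams

text = "The cats ate the fish near the river."

clean = re.sub(r"[^a-z\s]", "", text.lower())       # regular expression: lowercase, keep letters and spaces only
tokens = nltk.word_tokenize(clean)                  # split into word tokens

stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t not in stop_words]   # remove stop words (a, an, the, ...)

# Lemmatize; pos="v" treats tokens as verbs so "ate" -> "eat".
# A fuller pipeline would POS-tag each token first.
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t, pos="v") for t in filtered]

bigrams = list(ngrams(lemmas, 2))                   # N-grams with N = 2
print(lemmas)
print(bigrams)
```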
Vectorizing Text
Text vectorization is the broad process of converting words, sentences, or entire documents into numbers that machine learning models can work with. It’s like creating a translation dictionary between human language and computer language.
One of the simplest ways to vectorize text is the Bag-of-Words (BoW) model. The idea is to use a vector to represent the frequency or presence of each word in a document. Imagine taking all the unique words in your dataset as a vocabulary, then counting how often each one appears in a document; those counts become the document's vector.
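As a minimal sketch (assuming scikit-learn and a few toy documents), CountVectorizer builds exactly this word-count vector for each document:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat ate the fish",
]

vectorizer = CountVectorizer()           # tokenizes, lowercases, and builds the vocabulary
bow = vectorizer.fit_transform(docs)     # one row per document, one column per unique word

print(vectorizer.get_feature_names_out())   # the vocabulary
print(bow.toarray())                        # word counts per document
```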
TF-IDF (Term Frequency-Inverse Document Frequency) improves on this by weighting words based on how important they are in a document compared to a collection of documents.
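A similar sketch with TfidfVectorizer (same toy documents, scikit-learn assumed) shows how words that appear in every document, like "the", are down-weighted while rarer, more distinctive words get higher scores:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat ate the fish",
]

tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(docs)      # TF-IDF weight matrix, one row per document

print(tfidf.get_feature_names_out())
print(weights.toarray().round(2))        # higher value = more distinctive for that document
```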
Topic Modelling

An unsupervised machine learning technique used in natural language processing (NLP) to discover abstract "topics" or semantic themes within a large collection of documents, such as articles or social media posts.
In practice, a model such as Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA) is fitted to the documents to uncover the themes underlying the words.
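A minimal LDA sketch using scikit-learn's LatentDirichletAllocation (the toy documents and the choice of two topics are assumptions for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the stock market fell as investors sold shares",
    "banks raised interest rates to fight inflation",
    "the team won the match after a late goal",
    "the striker scored twice in the final game",
]

vectorizer = CountVectorizer(stop_words="english")
bow = vectorizer.fit_transform(docs)                 # LDA works on word counts (Bag-of-Words)

lda = LatentDirichletAllocation(n_components=2, random_state=0)   # ask for 2 abstract topics
lda.fit(bow)

words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [words[j] for j in topic.argsort()[-5:]]          # 5 most likely words per topic
    print(f"Topic {i}: {top_words}")
```

Each discovered topic is just a weighting over the vocabulary; the printed top words are what a human would read to label the theme (e.g. finance vs. sport in this toy set).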