Natural Language Processing (NLP) is a field concerned with the interaction between computers and human language. NLP is an interdisciplinary topic that has historically been the domain of artificial intelligence researchers and linguists alike. It evolved from computational linguistics, which uses computer science to understand the principles of language. However, rather than developing theoretical frameworks, NLP is an engineering discipline that seeks to build technology that performs useful tasks. NLP can be divided into two overlapping subfields:
1) Natural Language Understanding (NLU), which focuses on semantic analysis, or determining the intended meaning of text.
2) Natural Language Generation (NLG), which focuses on text generation by a computer.

NLP is distinct from speech recognition, which converts spoken language into text, but the two are frequently used together.
NLP is used for a wide variety of language-related tasks, including the following:
- Sentiment analysis is the process of classifying the emotional intent of text (a minimal classification sketch appears after this list).
- Machine translation automates translation between different languages.
- Named entity recognition aims to extract entities from a piece of text and classify them into predefined categories such as personal names, organizations, locations, and quantities (see the spaCy sketch after this list).
- Spam detection is a prevalent binary classification problem in NLP, where the purpose is to classify emails as either spam or not spam.
- Information retrieval finds the documents that are most relevant to a query.
- Topic modeling is an unsupervised text mining task that takes a corpus of documents and discovers abstract topics within that corpus (see the LDA sketch after this list).
- Text generation, more formally known as natural language generation (NLG), produces text that resembles human-written text. Such models can be fine-tuned to produce text in different genres and formats, including tweets, blogs, and even computer code. Text generation has been performed using Markov processes, LSTMs, BERT, GPT-2, LaMDA, and other approaches (a toy Markov-chain sketch appears after this list). It’s particularly useful for autocomplete and chatbots.
- Part-of-speech (POS) tagging assigns a grammatical category (noun, verb, adjective, and so on) to each word in a sentence.
- Parsing analyzes the grammatical structure of a sentence, such as its phrase or dependency structure.
- Question answering produces answers to questions posed in natural language.
- Summarization produces a condensed version of a longer text while preserving its key information.
- ...
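The sentiment-analysis and spam-detection tasks above both boil down to binary text classification. Below is a minimal sketch of that idea using scikit-learn, with a tiny invented dataset purely for illustration; real systems train on thousands of labeled examples and increasingly rely on pretrained language models rather than bag-of-words features.

```python
# Minimal sketch: binary text classification (sentiment or spam detection).
# Assumes scikit-learn is installed; the toy dataset below is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I loved this movie, it was fantastic",
    "Absolutely wonderful experience",
    "This was a terrible waste of time",
    "I hated every minute of it",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF features feed a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["What a wonderful film"]))     # likely [1] (positive)
print(model.predict(["A terrible, boring movie"]))  # likely [0] (negative)
```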
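For named entity recognition and part-of-speech tagging, off-the-shelf pipelines already exist. The sketch below uses spaCy and assumes its small English model has been downloaded (`python -m spacy download en_core_web_sm`); the example sentence is invented.

```python
# Minimal sketch: named entity recognition and POS tagging with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Named entities with their predicted categories (ORG, GPE, MONEY, ...).
for ent in doc.ents:
    print(ent.text, ent.label_)

# Part-of-speech tag for each token.
for token in doc:
    print(token.text, token.pos_)
```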
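Topic modeling is often illustrated with Latent Dirichlet Allocation (LDA). The sketch below uses scikit-learn's implementation on a made-up four-document corpus; a real corpus would need to be far larger for the discovered topics to be meaningful.

```python
# Minimal sketch: topic modeling with LDA on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the stock market rallied as investors bought shares",
    "the team won the match with a late goal",
    "shares fell after the company reported weak earnings",
    "the coach praised the players after the game",
]

# Word-count features, ignoring common English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(corpus)

# Ask LDA to discover two abstract topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words associated with each discovered topic.
words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top)}")
```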
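Finally, the Markov-process approach to text generation mentioned above can be sketched in a few lines of plain Python: build a table of which words follow which, then sample a chain of plausible next words. The training text is a toy example, and the word-level model is only meant to convey the idea; modern text generation relies on large neural language models such as those listed in the bullet.

```python
# Minimal sketch: text generation with a word-level Markov chain.
import random
from collections import defaultdict

text = (
    "the cat sat on the mat and the cat chased the mouse "
    "while the dog sat on the rug and watched the cat"
)
words = text.split()

# Build a table mapping each word to the words that follow it.
transitions = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    transitions[current].append(nxt)

# Generate text by repeatedly sampling a plausible next word.
random.seed(0)
word = "the"
output = [word]
for _ in range(12):
    followers = transitions.get(word)
    if not followers:
        break
    word = random.choice(followers)
    output.append(word)

print(" ".join(output))
```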