Natural Language Processing (NLP) is a field within Artificial Intelligence (AI) that allows machines to parse, understand, and generate human language. This branch of AI can be applied to multiple languages and across many different formats (for example, unstructured documents, audio, etc.).
With the NLP market anticipated to be worth $13.4 billion in 2020, it is worth delving deeper into this field of AI.
This article seeks to explain first how NLP works, followed by how it is used, and what the future looks like for this exciting area of AI.
How Does NLP Work?
Most NLP nowadays is delivered using machine learning (ML) techniques. It is, however, worth looking at the more traditional methods first. Traditional techniques reveal the foundation on which NLP is built and are still applied in specific scenarios today.
Traditional NLP attempts to deconstruct sentences into smaller phrases, which in turn are broken down into parts of speech. To illustrate this concept, take the simple sentence:
“The dog saw a man in the park.”
Traditional NLP techniques would syntactically deconstruct the above sentence as follows:

(S (NP The dog) (VP saw (NP a man) (PP in (NP the park))))
This approach uses a technique called Part of Speech (POS) tagging, which categorizes words into different types (verb, noun, adjective, etc.). Classification in this way is not always straightforward, and supplementary techniques such as Lemmatization¹ and Segmentation², among others, are used. Building on this categorization process, the semantics, or meaning, of the sentence is then derived.
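To make the idea concrete, here is a minimal sketch of dictionary-based POS tagging applied to the example sentence. The tag names and the tiny lexicon are assumptions made for illustration; real traditional systems use far larger lexicons plus disambiguation rules.

```python
# Minimal sketch of lookup-based Part of Speech (POS) tagging.
# The lexicon and tag set below are illustrative assumptions,
# not taken from any specific toolkit.

POS_LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "man": "NOUN", "park": "NOUN",
    "saw": "VERB",
    "in": "PREP",
}

def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each word with its part of speech, defaulting to UNK."""
    words = sentence.lower().strip(".").split()
    return [(word, POS_LEXICON.get(word, "UNK")) for word in words]

print(pos_tag("The dog saw a man in the park."))
```

A real tagger must also disambiguate words like "saw" (verb or noun), which is where rules or statistics come in.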
Deriving semantics can be achieved using a number of techniques, ranging from linguistic rules to approaches such as Named Entity Recognition (NER). NER uses a domain model to separate the domain entities relevant to the semantic process from the non-relevant ones. It usually relies on an extensive database of domain objects and attributes, although advances have been made with statistical approaches to entity recognition.
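The database-driven flavor of NER described above can be sketched as a simple gazetteer lookup. The entity entries and type labels here are invented for illustration; production systems use much larger domain databases or statistical models.

```python
# Minimal sketch of dictionary-based Named Entity Recognition (NER):
# a domain model (here a tiny hand-made gazetteer) filters the entities
# relevant to the domain out of free text. Entries are illustrative
# assumptions only.

DOMAIN_MODEL = {
    "london": "LOCATION",
    "acme corp": "ORGANIZATION",
    "alice": "PERSON",
}

def recognize_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity, type) pairs found in the text, longest entries first."""
    lowered = text.lower()
    found = []
    # Match longer entries first so "acme corp" is preferred over "acme".
    for entity, label in sorted(DOMAIN_MODEL.items(), key=lambda kv: -len(kv[0])):
        if entity in lowered:
            found.append((entity, label))
    return found

print(recognize_entities("Alice flew from London to visit Acme Corp."))
```

Statistical NER replaces the fixed dictionary with a model that infers entity boundaries and types from context.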
While useful for proofs of concept, traditional NLP approaches are very labor-intensive, requiring a lot of developer time and effort. Machine learning has thus become the de facto mechanism for NLP development.
Unlike traditional NLP techniques, machine learning does not need to be explicitly programmed; instead, it is trained on examples (called training data). The learning aspect of machine learning means that solutions improve over time as they are exposed to more and more examples; in other words, they keep getting better as more data becomes available.
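The train-on-examples idea can be illustrated with a toy text classifier: no rules are written by hand, and the model's behavior comes entirely from labelled examples. This bare-bones word-count approach is an assumption made for illustration; real systems use libraries such as scikit-learn or neural networks.

```python
# Toy illustration of learning from examples rather than hand-written
# rules: a word-count classifier built only from labelled training data.
from collections import Counter

def train(examples: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count how often each word appears under each label."""
    counts: dict[str, Counter] = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def predict(model: dict[str, Counter], text: str) -> str:
    """Pick the label whose training words best match the input text."""
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

# Hypothetical training data -- adding more examples improves the model
# without changing a single line of code.
training_data = [
    ("great service and friendly staff", "positive"),
    ("loved the fast delivery", "positive"),
    ("terrible support and slow response", "negative"),
    ("awful experience would not recommend", "negative"),
]
model = train(training_data)
print(predict(model, "friendly staff and fast delivery"))
```

Note that improving this classifier means adding training examples, not rewriting rules, which is exactly the shift that made machine learning the default for NLP.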
1. Reduces a word to its root form; for example, "walking" becomes "walk".
2. A method of further dividing words into smaller functional units.