What is NLP
Natural language is the language humans use in everyday life, both spoken and written. It is full of ambiguity: the same words can be interpreted differently depending on word order and context, as the following examples show.
- "dog ate a bone" and "bone dog a ate"
  - Both sentences contain exactly the same words with the same frequencies, but only the first one is meaningful: the meaning depends on the order of the words.
- "Jack saw Ben with a telescope on a mountain."
  - Who has the telescope, Jack or Ben?
  - Who is on the mountain?
- "I went to the bank."
  - The word "bank" can mean a financial institution or the bank of a river.
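To make the first example concrete, here is a minimal Python sketch that compares the two sentences once as a "bag of words" (word counts only) and once as ordered sequences. The helper `bag_of_words` is hypothetical and splits on whitespace only, which is enough for these two sentences.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count how often each word occurs, ignoring word order."""
    return Counter(sentence.lower().split())

s1 = "dog ate a bone"
s2 = "bone dog a ate"

same_counts = bag_of_words(s1) == bag_of_words(s2)  # identical word frequencies
same_order = s1.split() == s2.split()               # but different word sequences

print(same_counts, same_order)  # True False
```

Any representation that only counts words cannot tell these two sentences apart; word order has to be kept to recover the meaning.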
Natural Language Processing (NLP) is the set of computational techniques used to analyze this ambiguous and complex language that humans use.
NLP terminology
Key terms in NLP are listed in the table below.
Term | Description | Example |
---|---|---|
Corpus | A collection of documents | Text of all pages on Wikipedia |
Document | A single text in the corpus | Text of the "Word2vec" page on Wikipedia |
Sentence | A sentence in a document | First sentence of the Document ("Word2vec is a group of related models that are used to produce word embeddings.") |
Phrase | A group of words within a sentence | First phrase of the Sentence ("Word2vec is a group of related models") |
Token | The smallest unit processed, usually a word | First token of the Phrase ("Word2vec") |
Character | A single character | First character of the Token ("W") |
Vocabulary | The set of unique tokens that appear in a corpus | All distinct tokens in the Wikipedia corpus |
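The hierarchy in the table can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions: the two-document `corpus` is hypothetical, sentences are split naively at ".", "!" or "?", and tokens are runs of word characters with punctuation dropped. Splitting a sentence into phrases needs syntax analysis, so that level is skipped here.

```python
import re

# Hypothetical two-document corpus; the first document starts with the table's example sentence.
corpus = [
    "Word2vec is a group of related models that are used to produce word embeddings. "
    "These models are shallow neural networks.",
    "A corpus is a set of documents.",
]

def split_sentences(document):
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def tokenize(sentence):
    # Naive rule: a token is a run of letters, digits or underscores.
    return re.findall(r"\w+", sentence)

document = corpus[0]                       # Document
sentence = split_sentences(document)[0]    # Sentence
tokens = tokenize(sentence)
token = tokens[0]                          # Token
character = token[0]                       # Character

# Vocabulary: unique tokens across the whole corpus (case-sensitive here).
vocabulary = {tok for doc in corpus for sent in split_sentences(doc) for tok in tokenize(sent)}

print(sentence)          # Word2vec is a group of related models that are used to produce word embeddings.
print(token, character)  # Word2vec W
```

Real tokenizers handle abbreviations, hyphens, and non-English scripts; the regular expressions above are only placeholders for that step.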
Process of NLP
NLP is typically carried out in four main stages:
- Morphological analysis / lexical analysis
- Syntax analysis
- Semantic analysis
- Pragmatic analysis
Morphological analysis
Morphological analysis breaks a sentence down into morphemes, its smallest meaningful units, and assigns each one information such as its part of speech. This allows the meaning of each morpheme in the sentence to be extracted as data.
For example, morphological analysis of the sentence "Jack saw Ben with a telescope on a mountain." gives the following result.
Token | Part of speech |
---|---|
Jack | noun |
saw | verb |
Ben | noun |
with | preposition |
a | article |
telescope | noun |
on | preposition |
a | article |
mountain | noun |
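A minimal Python sketch of this step, assuming a tiny hand-written lexicon (`POS_LEXICON` is hypothetical). It splits the sentence into word tokens and looks up each token's part of speech; real morphological analyzers rely on large dictionaries or statistical models instead.

```python
import re

# Hypothetical lexicon mapping lowercase words to parts of speech.
POS_LEXICON = {
    "jack": "noun", "ben": "noun", "telescope": "noun", "mountain": "noun",
    "saw": "verb",
    "with": "preposition", "on": "preposition",
    "a": "article",
}

def tag(sentence):
    """Split a sentence into word tokens and attach a part of speech to each one."""
    tokens = re.findall(r"\w+", sentence)  # punctuation such as "." is dropped
    return [(tok, POS_LEXICON.get(tok.lower(), "unknown")) for tok in tokens]

for word, pos in tag("Jack saw Ben with a telescope on a mountain."):
    print(f"{word} ({pos})")  # one line per token, e.g. "Jack (noun)"
```

A plain lookup like this ignores ambiguity: "saw" can also be a noun (the tool), and choosing between such tags is exactly what practical analyzers use context for.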
Syntax analysis
Syntax analysis uses the output of morphological analysis to determine the structure of a sentence: how its words group into phrases and which words modify which.
Applied to "Jack saw Ben with a telescope on a mountain.", syntax analysis finds more than one valid structure, for example:
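- Jack saw [Ben with a telescope on a mountain]: Ben has the telescope and is on the mountain.
- Jack saw [Ben with a telescope] [on a mountain]: Ben has the telescope, and the seeing happens on the mountain.
- [Jack saw Ben with a telescope] [on a mountain]: the whole event of seeing Ben with a telescope happens on the mountain.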
All of these structures are grammatically correct, so syntax analysis alone cannot determine which one is intended.
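The ambiguity can be counted with a parser. The following Python sketch uses the CKY algorithm, a standard chart-parsing technique chosen here for illustration (not necessarily what any particular NLP system uses), together with a small hypothetical grammar in Chomsky normal form. Each chart cell stores how many distinct subtrees of each category span that stretch of words.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
LEXICAL_RULES = {
    "Jack": {"NP"}, "Ben": {"NP"},
    "saw": {"V"},
    "with": {"P"}, "on": {"P"},
    "a": {"Det"},
    "telescope": {"N"}, "mountain": {"N"},
}
BINARY_RULES = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # a prepositional phrase can modify the verb phrase...
    ("NP", "NP", "PP"),  # ...or a noun phrase: the source of the ambiguity
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

def count_parses(words, start="S"):
    """Count distinct parse trees for `words` with the CKY algorithm."""
    n = len(words)
    # chart[i][j][label] = number of subtrees of category `label` covering words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICAL_RULES[word]:  # raises KeyError for words outside the toy lexicon
            chart[i][i + 1][label] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k].get(left, 0) * chart[k][j].get(right, 0)
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n].get(start, 0)

sentence = "Jack saw Ben with a telescope on a mountain"
print(count_parses(sentence.split()))  # 5
```

With this toy grammar the sentence has five parse trees rather than three, because "on a mountain" can also attach to "a telescope" (the telescope is on the mountain), a reading the examples above do not list.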
Semantic analysis
Semantic analysis builds on the result of syntax analysis and determines how the words relate to each other in meaning. Consider the following sentence.
"Green-shining aurora and stars are beautiful."
A human reader immediately understands that the aurora glows green. Syntactically, however, the sentence can also be read as saying that both the aurora and the stars glow green.
Semantic analysis checks the relationships between the words against a dictionary. This reveals that auroras often glow green, whereas stars are rarely described as glowing green. The system can therefore conclude that in this sentence only the aurora glows green.
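The dictionary check can be sketched in Python as follows. This is a toy illustration under stated assumptions: `PLAUSIBLE_ATTRIBUTES` is a hypothetical mini-dictionary listing which attributes make sense for which nouns, and the two candidate readings are written out by hand rather than produced by a parser.

```python
# Hypothetical mini-dictionary: attributes that are plausible for each noun.
# A real system would use a lexical resource or a learned model instead.
PLAUSIBLE_ATTRIBUTES = {
    "aurora": {"green-shining", "beautiful"},
    "stars": {"twinkling", "beautiful"},
}

# The two readings that syntax analysis leaves open for
# "Green-shining aurora and stars are beautiful."
READINGS = {
    "green-shining modifies 'aurora' only": ["aurora"],
    "green-shining modifies 'aurora' and 'stars'": ["aurora", "stars"],
}

def is_plausible(modifier, nouns):
    """A reading is plausible if the dictionary allows the modifier on every noun it covers."""
    return all(modifier in PLAUSIBLE_ATTRIBUTES.get(noun, set()) for noun in nouns)

def resolve(modifier, readings):
    """Keep only the readings whose modifier attachments pass the dictionary check."""
    return [name for name, nouns in readings.items() if is_plausible(modifier, nouns)]

print(resolve("green-shining", READINGS))
# ["green-shining modifies 'aurora' only"]
```

Because "beautiful" is listed for both nouns, `resolve("beautiful", READINGS)` keeps both readings: the check only rules out attachments the dictionary considers implausible.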
Pragmatic analysis
Pragmatic analysis examines the relationships between sentences by applying morphological and semantic analysis across multiple sentences. It requires machines to draw on knowledge from many different domains, and it is still an actively developing field.
Examples of NLP applications
NLP has the following applications:
- Text mining
  - Social media (SNS) analysis
  - Survey analysis
- Dialogue systems
  - Siri
  - Alexa
  - Google Home
- Machine translation
  - DeepL
  - Google Translate
- Search engines
  - Yahoo
- Spam detection
- Document summarization