Introduction
The Tokyo Institute of Technology has created and maintains a collection of NLP exercises called "NLP 100 Exercise".
In this article, I present sample answers to "Chapter 1: Warm-up".
00. Reversed string
Obtain the string that arranges letters of the string “stressed” in reverse order (tail to head).
print('stressed'[::-1])
desserts
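A slice with a step of -1 walks the string from tail to head. As an alternative sketch, the built-in `reversed()` yields the characters in reverse order, and `''.join()` turns them back into a string. Both give the same result:

```python
word = 'stressed'
# Omitted start/stop with step -1 walks from the last character to the first
by_slice = word[::-1]
# reversed() yields characters in reverse order; join them back into a string
by_reversed = ''.join(reversed(word))
print(by_slice, by_reversed)  # desserts desserts
```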
01. “schooled”
Obtain the string that concatenates the 1st, 3rd, 5th, and 7th letters in the string “schooled”.
print('schooled'[0:7:2])
shoe
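Since the 7th letter is the last odd-positioned letter of this eight-letter word, the stop index can be omitted. The sketch below checks that, and also shows an explicit version based on 1-based positions, which still works when the positions are not evenly spaced (the `positions` list is only an illustration):

```python
word = 'schooled'
# start=0, step=2 picks indices 0, 2, 4, 6 (the 1st, 3rd, 5th and 7th letters)
same = word[0:7:2] == word[::2]
# Convert 1-based positions to 0-based indices and pick each letter
positions = [1, 3, 5, 7]
picked = ''.join(word[p - 1] for p in positions)
print(same, picked)  # True shoe
```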
02. “shoe” + “cold” = “schooled”
Obtain the string “schooled” by concatenating the letters in “shoe” and “cold” one after the other from head to tail.
char_list = [char1 + char2 for char1, char2 in zip('shoe', 'cold')]
print(''.join(char_list))
schooled
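`zip()` stops at the shorter input. That does not matter here because both words have four letters. If the lengths could differ, `itertools.zip_longest` with `fillvalue=''` keeps the leftover letters. The following is a hedged generalization, and `interleave` is a hypothetical helper name:

```python
from itertools import zip_longest

def interleave(a, b):
    # zip_longest pads the shorter string with '' so no letters are dropped
    return ''.join(x + y for x, y in zip_longest(a, b, fillvalue=''))

print(interleave('shoe', 'cold'))   # schooled
print(interleave('shoes', 'cold'))  # schooleds
```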
03. Pi
Split the sentence “Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics”. into words, and create a list whose element presents the number of alphabetical letters in the corresponding word.
sentence = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
result = [len(word.strip(',.')) for word in sentence.split()]
print(result)
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
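`strip(',.')` only removes commas and periods from the ends of each word. A slightly more general sketch counts only the characters for which `str.isalpha()` is true, so any punctuation is ignored:

```python
sentence = ('Now I need a drink, alcoholic of course, after the heavy '
            'lectures involving quantum mechanics.')
# Count only alphabetical characters in each whitespace-separated word
result = [sum(c.isalpha() for c in word) for word in sentence.split()]
print(result)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```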
04. Atomic symbols
Split the sentence “Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can”. into words, and extract the first letter from the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, 19th words and the first two letters from the other words. Create an associative array (dictionary object or mapping object) that maps from the extracted string to the position (offset in the sentence) of the corresponding word.
from pprint import pprint
sentence = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
result = {word[0] if i in {1, 5, 6, 7, 8, 9, 15, 16, 19} else word[:2]: i
          for i, word in enumerate(sentence.split(), 1)}
# pprint sorts dict keys alphabetically, so print the items ordered by position instead
pprint(sorted(result.items(), key=lambda x: x[1]))
[('H', 1),
('He', 2),
('Li', 3),
('Be', 4),
('B', 5),
('C', 6),
('N', 7),
('O', 8),
('F', 9),
('Ne', 10),
('Na', 11),
('Mi', 12),
('Al', 13),
('Si', 14),
('P', 15),
('S', 16),
('Cl', 17),
('Ar', 18),
('K', 19),
('Ca', 20)]
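Since Python 3.7, dictionaries preserve insertion order, so the mapping is already ordered by position. On Python 3.8 or later, `pprint` accepts `sort_dicts=False`, which prints the dictionary in that order without first building a sorted list of items. A sketch assuming Python 3.8+:

```python
from pprint import pprint

sentence = ('Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations '
            'Might Also Sign Peace Security Clause. Arthur King Can.')
one_letter = {1, 5, 6, 7, 8, 9, 15, 16, 19}
result = {word[0] if i in one_letter else word[:2]: i
          for i, word in enumerate(sentence.split(), 1)}
# sort_dicts=False (Python 3.8+) keeps insertion order, i.e. word position
pprint(result, sort_dicts=False)
```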
05. n-gram
Implement a function that obtains n-grams from a given sequence object (e.g., string and list). Use this function to obtain word bi-grams and letter bi-grams from the sentence “I am an NLPer”.
def ngram(n, seq):
    # Make n copies of the sequence, each shifted by 0, 1, ..., n-1 positions, and zip them.
    # For n=2: zip('I am an NLPer', ' am an NLPer') -> ('I', ' '), (' ', 'a'), ...
    return list(zip(*[seq[i:] for i in range(n)]))
text = 'I am an NLPer'  # avoid shadowing the built-in str
words_bi_gram = ngram(2, text.split())
chars_bi_gram = ngram(2, text)
print('word bi-gram:', words_bi_gram)
print('char bi-gram:', chars_bi_gram)
word bi-gram: [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]
char bi-gram: [('I', ' '), (' ', 'a'), ('a', 'm'), ('m', ' '), (' ', 'a'), ('a', 'n'), ('n', ' '), (' ', 'N'), ('N', 'L'), ('L', 'P'), ('P', 'e'), ('e', 'r')]
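The `zip` approach above builds n shifted copies of the sequence. An equivalent index-based sketch slides a window of length `n` over the sequence. It returns slices instead of tuples, so a string gives substrings and a list gives sublists (`ngram_slices` is a hypothetical name):

```python
def ngram_slices(n, seq):
    # A sequence of length L has L - n + 1 windows of length n
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

text = 'I am an NLPer'
print(ngram_slices(2, text.split()))  # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
print(ngram_slices(3, text)[:3])      # ['I a', ' am', 'am ']
```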
06. Set
Let X and Y be the sets of letter bi-grams from the words “paraparaparadise” and “paragraph”, respectively. Obtain the union, intersection, difference of the two sets. In addition, check whether the bigram “se” is included in the sets X and Y.
def ngram(n, lst):
    return list(zip(*[lst[i:] for i in range(n)]))
str1 = 'paraparaparadise'
str2 = 'paragraph'
X = set(ngram(2, str1))
Y = set(ngram(2, str2))
union = X | Y
intersection = X & Y
difference = X - Y
print('X:', X)
print('Y:', Y)
print('union:', union)
print('intersection:', intersection)
print('difference:', difference)
print("'se' in X:", ('s', 'e') in X)
print("'se' in Y:", ('s', 'e') in Y)
X: {('a', 'r'), ('a', 'p'), ('s', 'e'), ('p', 'a'), ('r', 'a'), ('i', 's'), ('d', 'i'), ('a', 'd')}
Y: {('p', 'h'), ('a', 'r'), ('a', 'p'), ('p', 'a'), ('g', 'r'), ('r', 'a'), ('a', 'g')}
union: {('p', 'h'), ('a', 'r'), ('a', 'p'), ('s', 'e'), ('p', 'a'), ('g', 'r'), ('r', 'a'), ('i', 's'), ('a', 'g'), ('d', 'i'), ('a', 'd')}
intersection: {('p', 'a'), ('r', 'a'), ('a', 'r'), ('a', 'p')}
difference: {('d', 'i'), ('i', 's'), ('a', 'd'), ('s', 'e')}
'se' in X: True
'se' in Y: False
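Sets have no defined order, and string hashing is randomized per process, so the printed order of the sets may change from run to run. Sorting before printing makes the output reproducible. The sketch below uses the same bi-gram definition and shows the named set methods as equivalents of the `|`, `&` and `-` operators:

```python
def ngram(n, seq):
    return list(zip(*[seq[i:] for i in range(n)]))

X = set(ngram(2, 'paraparaparadise'))
Y = set(ngram(2, 'paragraph'))
# The methods are equivalent to the |, & and - operators
union = X.union(Y)
intersection = X.intersection(Y)
difference = X.difference(Y)
# Join each tuple back into a string and sort for stable, readable output
print('intersection:', sorted(''.join(bg) for bg in intersection))  # ['ap', 'ar', 'pa', 'ra']
print('difference:', sorted(''.join(bg) for bg in difference))      # ['ad', 'di', 'is', 'se']
```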
07. Template-based sentence generation
Implement a function that receives arguments, x, y, and z and returns a string “{y} is {z} at {x}”, where “{x}”, “{y}”, and “{z}” denote the values of x, y, and z, respectively. In addition, confirm the return string by giving the arguments x=12, y="temperature", and z=22.4.
def create_sentence(x, y, z):
    # f-strings convert each value with str() automatically
    return f'{y} is {z} at {x}'
print(create_sentence(12, 'temperature', 22.4))
temperature is 22.4 at 12
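Since the exercise is about template-based generation, the standard library's `string.Template` is another option. It substitutes `$`-prefixed placeholders. This is a hedged alternative sketch, and `create_sentence_tpl` is a hypothetical name:

```python
from string import Template

# $y, $z and $x are placeholders; substitute() raises KeyError if one is missing
TEMPLATE = Template('$y is $z at $x')

def create_sentence_tpl(x, y, z):
    return TEMPLATE.substitute(x=x, y=y, z=z)

print(create_sentence_tpl(12, 'temperature', 22.4))  # temperature is 22.4 at 12
```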
08. Cipher text
Implement a function cipher that converts a given string with the specification:
- Every alphabetical lowercase letter c is converted to a letter whose ASCII code is (219 - [the ASCII code of c])
- Keep other letters unchanged
Use this function to cipher and decipher an English message.
def cipher(text):
    # Only ASCII lowercase letters are mapped: 'a' (97) and 'z' (122) sum to 219
    rep = [chr(219 - ord(c)) if 'a' <= c <= 'z' else c for c in text]
    return ''.join(rep)
sentence = 'I am titanium.'
sentence = cipher(sentence)
print('encrypt:', sentence)
sentence = cipher(sentence)
print('decrypt:', sentence)
encrypt: I zn grgzmrfn.
decrypt: I am titanium.
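Because 'a' (97) and 'z' (122) sum to 219, the rule maps 'a' to 'z', 'b' to 'y', and so on. Applying the cipher twice therefore restores the original text. The sketch below builds the same mapping once with `str.maketrans` and applies it with `str.translate` (`cipher_table` is a hypothetical name):

```python
import string

lower = string.ascii_lowercase
# Map each lowercase letter c to chr(219 - ord(c)), i.e. the mirrored letter
TABLE = str.maketrans(lower, ''.join(chr(219 - ord(c)) for c in lower))

def cipher_table(text):
    # Characters not in the table (uppercase, digits, punctuation) are unchanged
    return text.translate(TABLE)

encrypted = cipher_table('I am titanium.')
print(encrypted)                # I zn grgzmrfn.
print(cipher_table(encrypted))  # I am titanium.
```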
09. Typoglycemia
Write a program with the specification:
- Receive a word sequence separated by space
- For each word in the sequence:
  - If the word is no longer than four letters, keep the word unchanged
  - Otherwise,
    - Keep the first and last letters unchanged
    - Shuffle other letters in other positions (in the middle of the word)
Observe the result by giving a sentence, e.g., “I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .”
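import random
sentence = 'I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .'
def shuffle_middle(word):
    # Keep the first and last letters and shuffle the letters in between
    middle = random.sample(word[1:-1], len(word) - 2)
    return word[0] + ''.join(middle) + word[-1]
# Words of four letters or fewer are left unchanged
print(' '.join(shuffle_middle(w) if len(w) > 4 else w for w in sentence.split()))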
I clnuo’dt beevile that I culod aactully unrsntdead what I was rdianeg : the pahenomenl peowr of the human mind .
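The shuffle is random, so the output changes on every run. If you need reproducible results, for example in tests, one option is to pass in a seeded `random.Random` instance. Here is a sketch of that approach. The function name and the seed are illustrative assumptions:

```python
import random

def typoglycemia(sentence, rng):
    words = []
    for word in sentence.split():
        if len(word) > 4:
            # Shuffle only the inner letters; the first and last stay fixed
            middle = list(word[1:-1])
            rng.shuffle(middle)
            word = word[0] + ''.join(middle) + word[-1]
        words.append(word)
    return ' '.join(words)

text = 'I could actually understand what I was reading'
first = typoglycemia(text, random.Random(42))
second = typoglycemia(text, random.Random(42))
print(first == second)  # True: the same seed gives the same shuffle
```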
References