2023-02-03

NLP 100 Exercise ch1: Warm-up

Introduction

The Tokyo Institute of Technology has created and maintains a collection of exercises on NLP called "NLP 100 Exercise".

https://nlp100.github.io/en/ch01.html

In this article, I will present sample answers to "Chapter 1: Warm-up".

00. Reversed string

Obtain the string that arranges letters of the string “stressed” in reverse order (tail to head).

print('stressed'[::-1])

desserts

01. “schooled”

Obtain the string that concatenates the 1st, 3rd, 5th, and 7th letters in the string “schooled”.

print('schooled'[0:7:2])

shoe

02. “shoe” + “cold” = “schooled”

Obtain the string “schooled” by concatenating the letters in “shoe” and “cold” one after the other from head to tail.

char_list = [char1 + char2 for char1, char2 in zip('shoe', 'cold')]
print(''.join(char_list))

schooled

03. Pi

Split the sentence “Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.” into words, and create a list whose elements are the numbers of alphabetical letters in the corresponding words.

sentence = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
result = [len(word.strip(',.')) for word in sentence.split()]
print(result)

[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
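
The answer above strips only ',' and '.' from each word. As a hedged alternative, the sketch below counts ASCII letters with a regular expression, so any other punctuation would be ignored as well. The variable name letter_counts is my own and is not part of the exercise.

```python
import re

sentence = ('Now I need a drink, alcoholic of course, after the heavy '
            'lectures involving quantum mechanics.')

# Count only ASCII letters in each whitespace-separated token,
# so punctuation of any kind does not contribute to the length.
letter_counts = [len(re.findall(r'[A-Za-z]', token)) for token in sentence.split()]
print(letter_counts)
# [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

For this sentence both approaches give the same list, which spells out the first digits of pi.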

04. Atomic symbols

Split the sentence “Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.” into words, and extract the first letter from the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, 19th words and the first two letters from the other words. Create an associative array (dictionary object or mapping object) that maps from the extracted string to the position (offset in the sentence) of the corresponding word.

from pprint import pprint

sentence = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
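# 1-based positions of the words that contribute only their first letter
single_letter_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}
result = {
    (word[0] if i in single_letter_positions else word[:2]): i
    for i, word in enumerate(sentence.split(), 1)
}
pprint(sorted(result.items(), key=lambda item: item[1]))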

[('H', 1),
 ('He', 2),
 ('Li', 3),
 ('Be', 4),
 ('B', 5),
 ('C', 6),
 ('N', 7),
 ('O', 8),
 ('F', 9),
 ('Ne', 10),
 ('Na', 11),
 ('Mi', 12),
 ('Al', 13),
 ('Si', 14),
 ('P', 15),
 ('S', 16),
 ('Cl', 17),
 ('Ar', 18),
 ('K', 19),
 ('Ca', 20)]

05. n-gram

Implement a function that obtains n-grams from a given sequence object (e.g., string and list). Use this function to obtain word bi-grams and letter bi-grams from the sentence “I am an NLPer”

def ngram(n, lst):
  # For n = 2 and lst = 'I am an NLPer':
  # [lst[i:] for i in range(2)] -> ['I am an NLPer', ' am an NLPer']
  # zip(*[lst[i:] for i in range(2)]) -> zip('I am an NLPer', ' am an NLPer')
  return list(zip(*[lst[i:] for i in range(n)]))

# Named text rather than str to avoid shadowing the built-in str type.
text = 'I am an NLPer'
words_bi_gram = ngram(2, text.split())
chars_bi_gram = ngram(2, text)

print('word bi-gram:', words_bi_gram)
print('char bi-gram:', chars_bi_gram)

word bi-gram: [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]
char bi-gram: [('I', ' '), (' ', 'a'), ('a', 'm'), ('m', ' '), (' ', 'a'), ('a', 'n'), ('n', ' '), (' ', 'N'), ('N', 'L'), ('L', 'P'), ('P', 'e'), ('e', 'r')]
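
The zip-based function works by pairing the sequence with shifted copies of itself. The sketch below prints the shifted copies and compares the result with an index-based sliding window. The helper ngram_by_index is a hypothetical name introduced only for this comparison.

```python
def ngram(n, seq):
    # Zip the sequence with its copies shifted by 1, 2, ..., n-1 positions.
    return list(zip(*[seq[i:] for i in range(n)]))

def ngram_by_index(n, seq):
    # Hypothetical reference version: slide a window of width n over seq.
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

words = 'I am an NLPer'.split()

# The shifted copies that zip() walks through in parallel.
shifted = [words[i:] for i in range(2)]
print(shifted)
# [['I', 'am', 'an', 'NLPer'], ['am', 'an', 'NLPer']]

print(ngram(2, words))
# [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]

# Both versions agree for words and for characters.
assert ngram(2, words) == ngram_by_index(2, words)
assert ngram(3, 'NLPer') == ngram_by_index(3, 'NLPer')
```

zip() stops at the shortest input, which is why the trailing elements that cannot form a full n-gram are dropped automatically.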

06. Set

Let $X$ and $Y$ be the sets of letter bi-grams from the words “paraparaparadise” and “paragraph”, respectively. Obtain the union, intersection, and difference of the two sets. In addition, check whether the bigram “se” is included in the sets $X$ and $Y$.

def ngram(n, lst):
  return list(zip(*[lst[i:] for i in range(n)]))

str1 = 'paraparaparadise'
str2 = 'paragraph'
X = set(ngram(2, str1))
Y = set(ngram(2, str2))
union = X | Y
intersection = X & Y
difference = X - Y

print('X:', X)
print('Y:', Y)
print('union:', union)
print('intersection:', intersection)
print('difference:', difference)
print("'se' in X:", ('s', 'e') in X)
print("'se' in Y:", ('s', 'e') in Y)

X: {('a', 'r'), ('a', 'p'), ('s', 'e'), ('p', 'a'), ('r', 'a'), ('i', 's'), ('d', 'i'), ('a', 'd')}
Y: {('p', 'h'), ('a', 'r'), ('a', 'p'), ('p', 'a'), ('g', 'r'), ('r', 'a'), ('a', 'g')}
union: {('p', 'h'), ('a', 'r'), ('a', 'p'), ('s', 'e'), ('p', 'a'), ('g', 'r'), ('r', 'a'), ('i', 's'), ('a', 'g'), ('d', 'i'), ('a', 'd')}
intersection: {('p', 'a'), ('r', 'a'), ('a', 'r'), ('a', 'p')}
difference: {('d', 'i'), ('i', 's'), ('a', 'd'), ('s', 'e')}
'se' in X: True
'se' in Y: False

07. Template-based sentence generation

Implement a function that receives arguments, x, y, and z and returns a string “{y} is {z} at {x}”, where “{x}”, “{y}”, and “{z}” denote the values of x, y, and z, respectively. In addition, confirm the return string by giving the arguments x=12, y="temperature", and z=22.4.

def create_sentence(x, y, z):
    return f'{y} is {z} at {x}'

print(create_sentence(12, 'temperature', 22.4))

temperature is 22.4 at 12

08. Cipher text

Implement a function cipher that converts a given string with the specification:

Every alphabetical lowercase letter c is converted to a letter whose ASCII code is (219 - [the ASCII code of c])

Keep other letters unchanged

Use this function to cipher and decipher an English message.

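def cipher(text):
    # 219 == ord('a') + ord('z'), so each lowercase ASCII letter maps to its mirror.
    # The explicit range check leaves non-ASCII lowercase letters untouched.
    rep = [chr(219 - ord(c)) if 'a' <= c <= 'z' else c for c in text]
    return ''.join(rep)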

sentence = 'I am titanium.'
sentence = cipher(sentence)
print('encrypt:', sentence)
sentence = cipher(sentence)
print('decrypt:', sentence)

encrypt: I zn grgzmrfn.
decrypt: I am titanium.
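
Applying the cipher twice restores the message because 219 = 97 + 122, the sum of the ASCII codes of 'a' and 'z', so each lowercase letter is swapped with its mirror in the alphabet. As a sketch of the same mapping, the version below builds a translation table with str.maketrans. The names ATBASH and cipher_translate are my own.

```python
import string

lower = string.ascii_lowercase

# Map 'a' -> 'z', 'b' -> 'y', ..., 'z' -> 'a'; all other characters are left as-is.
ATBASH = str.maketrans(lower, lower[::-1])

def cipher_translate(text):
    # Equivalent to replacing each lowercase ASCII letter c with chr(219 - ord(c)).
    return text.translate(ATBASH)

message = 'I am titanium.'
encrypted = cipher_translate(message)
print(encrypted)
# I zn grgzmrfn.

# The mapping is its own inverse, so the same function also deciphers.
assert cipher_translate(encrypted) == message
```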

09. Typoglycemia

Write a program with the specification:

Receive a word sequence separated by space

For each word in the sequence:

If the word is no longer than four letters, keep the word unchanged

Otherwise,

Keep the first and last letters unchanged

Shuffle other letters in other positions (in the middle of the word)

Observe the result by giving a sentence, e.g., “I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .”

import random

sentence = 'I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .'
print(' '.join(
    w[0] + ''.join(random.sample(w[1:-1], len(w) - 2)) + w[-1] if len(w) > 4 else w
    for w in sentence.split()
))

I clnuo’dt beevile that I culod aactully unrsntdead what I was rdianeg : the pahenomenl peowr of the human mind .
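
The output changes on every run because the shuffle is random. If a reproducible result is needed, one option is to pass a seeded generator, as in the sketch below. The function name typoglycemia, its rng parameter, and the seed value 0 are assumptions of mine, not part of the exercise.

```python
import random

def typoglycemia(sentence, rng=random):
    # rng can be the random module or a random.Random instance (both offer shuffle()).
    words = []
    for word in sentence.split():
        if len(word) > 4:
            # Keep the first and last letters, shuffle everything in between.
            middle = list(word[1:-1])
            rng.shuffle(middle)
            word = word[0] + ''.join(middle) + word[-1]
        words.append(word)
    return ' '.join(words)

text = 'I could actually understand what I was reading'
shuffled = typoglycemia(text, random.Random(0))
print(shuffled)

# Check the specification: same letters, same first and last letter,
# and words of four letters or fewer are unchanged.
for before, after in zip(text.split(), shuffled.split()):
    assert sorted(before) == sorted(after)
    assert before[0] == after[0] and before[-1] == after[-1]
    if len(before) <= 4:
        assert before == after
```

Calling the function again with random.Random(0) yields the same sentence, which makes the behavior easier to test.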

References

https://nlp100.github.io/en/about.html
https://nlp100.github.io/en/ch01.html


Ryusei Kakujo
