2023-02-03

Hugging Face Transformers: Pipeline

Machine Learning

NLP

Hugging Face

Python

Hugging Face Transformers Pipeline

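With the Pipeline API in Hugging Face Transformers, you can run NLP tasks in just a few lines of code.

When a Pipeline receives raw text, it runs the following three steps internally.

Tokenizer: preprocesses the text into the input format the model expects
Model: the converted input is passed to the model
Post Processing: the model's predictions are post-processed into an easy-to-use form and returned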
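
The three steps above can be sketched by hand. This is a minimal illustration, not the library's internal implementation: the helper names softmax, postprocess and classify are hypothetical, and the checkpoint name is an assumption (it is the sentiment model commonly used as the text-classification default). The heavy imports are done inside classify so the post-processing helpers can be tried on their own.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Step 3: turn logits into a {'label', 'score'} dict like the Pipeline output."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify(text, checkpoint="distilbert-base-uncased-finetuned-sst-2-english"):
    # Assumed checkpoint name; imported lazily because these packages are heavy.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")      # 1. Tokenizer
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()    # 2. Model
    return postprocess(logits, model.config.id2label)  # 3. Post Processing
```

Calling classify("This restaurant is awesome") downloads the checkpoint on first use and should return a dict of the same shape as the Pipeline results shown in this article.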

[Figure: Pipeline flow, from "Behind the pipeline"]

How to use Pipeline

Install Hugging Face Transformers with the following command.

$ pip install transformers

Pass the name of the task you want to perform to pipeline, as in pipeline("question-answering"). Available tasks include the following.

feature-extraction (get the vector representation of a text)
fill-mask
ner (named entity recognition)
question-answering
sentiment-analysis
summarization
text-generation
translation
zero-shot-classification

For details, see the pipelines page of the Hugging Face Transformers documentation.

For example, to classify text, write the following.

from transformers import pipeline

pipe = pipeline("text-classification")
pipe("This restaurant is awesome")

This returns the following result.

[{'label': 'POSITIVE', 'score': 0.9998743534088135}]

If you want to use a specific model from the Hub, you can omit the task name, provided the model on the Hub already defines its task.

from transformers import pipeline

pipe = pipeline(model="roberta-large-mnli")
pipe("This restaurant is awesome")

>> [{'label': 'NEUTRAL', 'score': 0.7313136458396912}]

You can also pass a list as input.

from transformers import pipeline

pipe = pipeline("text-classification")
pipe(["This restaurant is awesome", "This restaurant is awful"])

>> [{'label': 'POSITIVE', 'score': 0.9998743534088135},
>>  {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
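
When a list is passed, the results come back in the same order as the inputs, so they can be paired up with zip. The sketch below assumes that ordering; label_texts and classify_batch are hypothetical helper names, and batch_size is an optional argument of the pipeline call that controls how many texts go through the model at once.

```python
def label_texts(texts, results):
    """Pair each input text with its predicted label and score."""
    if len(texts) != len(results):
        raise ValueError("expected exactly one result per input text")
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]


def classify_batch(texts, batch_size=8):
    # Imported lazily so label_texts can be used without loading a model.
    from transformers import pipeline

    pipe = pipeline("text-classification")
    # One result dict is returned per input text, in input order.
    results = pipe(texts, batch_size=batch_size)
    return label_texts(texts, results)
```

For example, classify_batch(["This restaurant is awesome", "This restaurant is awful"]) would return one (text, label, score) tuple per sentence.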

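You can also define a custom Pipeline.

from transformers import TextClassificationPipeline, pipeline

class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Run the default post-processing first
        result = super().postprocess(model_outputs, **kwargs)
        # Your code goes here: convert each score to a percentage
        # (result is a dict or a list of dicts, depending on the arguments)
        for item in result if isinstance(result, list) else [result]:
            item["score"] = item["score"] * 100
        # And here
        return result

# model and tokenizer must already be loaded
my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)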

Pipeline examples

This section walks through examples of the following NLP tasks.

Zero-shot classification
Text generation
Mask filling

Zero-shot classification

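Zero-shot classification is a task in which you pass the candidate labels directly to the Pipeline and get predictions for those labels, without preparing any labeled text. Annotating text is usually time-consuming and requires domain expertise, and zero-shot classification is useful in exactly those cases.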

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445988297462463, 0.11197440326213837, 0.04342682659626007]}
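
The result is a dict whose labels and scores lists are aligned by position, so the best label can be read off directly. A small sketch follows; top_label and classify_zero_shot are hypothetical helper names. The multi_label argument is an option of the zero-shot pipeline: when it is False the scores are normalized across the candidate labels, and when it is True each label is scored independently.

```python
def top_label(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


def classify_zero_shot(text, candidate_labels, multi_label=False):
    # Imported lazily so top_label can be used without loading a model.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification")
    result = classifier(
        text,
        candidate_labels=candidate_labels,
        multi_label=multi_label,
    )
    return top_label(result)
```

Applied to the output shown above, top_label picks the 'education' entry.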

Text generation

Text generation is a task in which you provide a prompt and the model auto-completes it by generating the rest of the text.
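
A minimal sketch of this task is shown here. The helper names generated_texts and complete are hypothetical, and the default model chosen by pipeline("text-generation") may differ between library versions. Sampling is enabled explicitly because several distinct completions are requested, so the generated text varies from run to run.

```python
def generated_texts(outputs):
    """Collect the generated strings from a text-generation result."""
    return [item["generated_text"] for item in outputs]


def complete(prompt, max_new_tokens=20, num_return_sequences=2):
    # Imported lazily so generated_texts can be used without loading a model.
    from transformers import pipeline

    generator = pipeline("text-generation")
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,              # length of each continuation
        num_return_sequences=num_return_sequences,  # how many completions to return
        do_sample=True,                             # sample so the completions differ
    )
    # Each item is a dict whose 'generated_text' contains the prompt plus the continuation.
    return generated_texts(outputs)
```

For example, complete("In this course, we will teach you how to") would return a list of two completed strings.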


Ryusei Kakujo