2023-01-14

Kedro and Jupyter

You can develop a Kedro project alongside Jupyter Notebook, JupyterLab, and IPython. Each one is started through the kedro command so that the project is loaded into the session.

$ kedro jupyter notebook

$ kedro jupyter lab

$ kedro ipython

In [1]:
In [2]: exit()

Kedro variables

Kedro makes the following variables available in a Jupyter Notebook.

catalog
context
pipelines
session

We will create a sample project from the pandas-iris starter and inspect the variables listed above.

$ kedro new --starter=pandas-iris
$ cd iris
$ kedro jupyter notebook

Click New > Kedro (iris) to create a new notebook.

catalog

catalog gives you access to the project's DataCatalog, which holds the parameters as well as the datasets.

In [1]: catalog.list()

[
    'example_iris_data',
    'parameters',
    'params:train_fraction',
    'params:random_state',
    'params:target_column'
]

In [2]: catalog.load("example_iris_data")

INFO     Loading data from 'example_iris_data' (CSVDataSet)...
	sepal_length	sepal_width	petal_length	petal_width	species
0	  5.1	          3.5	        1.4	          0.2	        setosa
1	  4.9	          3.0	        1.4	          0.2	        setosa
2	  4.7	          3.2	        1.3	          0.2	        setosa
3	  4.6	          3.1	        1.5	          0.2	        setosa
4	  5.0	          3.6	        1.4	          0.2	        setosa
...	...	          ...	        ...	          ...	        ...
145	6.7	          3.0	        5.2	          2.3	        virginica
146	6.3	          2.5	        5.0	          1.9	        virginica
147	6.5	          3.0	        5.2	          2.0	        virginica
148	6.2	          3.4	        5.4	          2.3	        virginica
149	5.9	          3.0	        5.1	          1.8	        virginica

150 rows × 5 columns

In [3]: catalog.load("parameters")

INFO     Loading data from 'parameters' (MemoryDataSet)...
{'train_fraction': 0.8, 'random_state': 3, 'target_column': 'species'}
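
The catalog.list() output above suggests a naming pattern: the whole dictionary is available as parameters, and each top-level key is also listed as params:<key>. The sketch below reproduces only that pattern in plain Python. The helper catalog_entry_names is hypothetical and is not part of Kedro.

```python
# Hypothetical helper (not a Kedro API): mirrors the naming pattern seen in
# catalog.list() for the parameter entries of the iris project.
def catalog_entry_names(parameters):
    """Return 'parameters' plus one 'params:<key>' name per top-level key."""
    names = ["parameters"]
    names.extend(f"params:{key}" for key in parameters)
    return names


# Values copied from the catalog.load("parameters") output above.
iris_parameters = {
    "train_fraction": 0.8,
    "random_state": 3,
    "target_column": "species",
}
entry_names = catalog_entry_names(iris_parameters)
print(entry_names)
```

In the notebook itself, catalog.load("params:train_fraction") is the corresponding way to read a single value, following the entry names shown in the listing.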

context

context provides access to Kedro library components and project metadata.

In [4]: context.project_path

PosixPath('/Users/ryu/iris')

pipelines

Use pipelines to display the pipelines registered in your project.

In [5]: pipelines

{'__default__': Pipeline([
Node(split_data, ['example_iris_data', 'parameters'], ['X_train', 'X_test', 'y_train', 'y_test'], 'split'),
Node(make_predictions, ['X_train', 'X_test', 'y_train'], 'y_pred', 'make_predictions'),
Node(report_accuracy, ['y_pred', 'y_test'], None, 'report_accuracy')
])}

In [6]: pipelines["__default__"].all_outputs()

{'y_test', 'y_train', 'X_test', 'y_pred', 'X_train'}
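
To make the result of all_outputs() easier to read, here is a toy model of the same pipeline built from plain tuples. It is an illustration only and does not use Kedro's Pipeline class. The names NODES, all_outputs, and free_inputs are assumptions introduced for this sketch.

```python
# Toy model of the pipeline printed above: each node is (name, inputs, outputs).
# This uses plain tuples and sets, not Kedro's Pipeline or Node classes.
NODES = [
    ("split", ["example_iris_data", "parameters"],
     ["X_train", "X_test", "y_train", "y_test"]),
    ("make_predictions", ["X_train", "X_test", "y_train"], ["y_pred"]),
    ("report_accuracy", ["y_pred", "y_test"], []),
]


def all_outputs(nodes):
    """Union of every dataset name produced by any node."""
    return {name for _, _, outputs in nodes for name in outputs}


def free_inputs(nodes):
    """Dataset names that are consumed but never produced inside the pipeline."""
    consumed = {name for _, inputs, _ in nodes for name in inputs}
    return consumed - all_outputs(nodes)


produced = all_outputs(NODES)
external = free_inputs(NODES)
print(sorted(produced))
print(sorted(external))
```

The second set shows which names must come from outside the pipeline, here the dataset and the parameters held by the catalog.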

session

session can be used to run the pipeline.

In [7]: session.run()

[01/15/23 09:24:05] INFO     Kedro project iris                                                      session.py:340
[01/15/23 09:24:06] INFO     Loading data from 'example_iris_data' (CSVDataSet)...              data_catalog.py:343
                    INFO     Loading data from 'parameters' (MemoryDataSet)...                  data_catalog.py:343
                    INFO     Running node: split: split_data([example_iris_data,parameters]) ->         node.py:327
                             [X_train,X_test,y_train,y_test]
                    INFO     Saving data to 'X_train' (MemoryDataSet)...                        data_catalog.py:382
                    INFO     Saving data to 'X_test' (MemoryDataSet)...                         data_catalog.py:382
                    INFO     Saving data to 'y_train' (MemoryDataSet)...                        data_catalog.py:382
                    INFO     Saving data to 'y_test' (MemoryDataSet)...                         data_catalog.py:382
                    INFO     Completed 1 out of 3 tasks                                     sequential_runner.py:85
                    INFO     Loading data from 'X_train' (MemoryDataSet)...                     data_catalog.py:343
                    INFO     Loading data from 'X_test' (MemoryDataSet)...                      data_catalog.py:343
                    INFO     Loading data from 'y_train' (MemoryDataSet)...                     data_catalog.py:343
                    INFO     Running node: make_predictions: make_predictions([X_train,X_test,y_train]) node.py:327
                             -> [y_pred]
                    INFO     Saving data to 'y_pred' (MemoryDataSet)...                         data_catalog.py:382
                    INFO     Completed 2 out of 3 tasks                                     sequential_runner.py:85
                    INFO     Loading data from 'y_pred' (MemoryDataSet)...                      data_catalog.py:343
                    INFO     Loading data from 'y_test' (MemoryDataSet)...                      data_catalog.py:343
                    INFO     Running node: report_accuracy: report_accuracy([y_pred,y_test]) -> None    node.py:327
                    INFO     Model has accuracy of 0.933 on test data.                                  nodes.py:74
                    INFO     Completed 3 out of 3 tasks                                     sequential_runner.py:85
                    INFO     Pipeline execution completed successfully.                                runner.py:90
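
The log above follows a repeating pattern for each node: load the inputs, run the function, save the outputs, then report progress. A minimal sketch of that loop is shown below. It assumes the nodes are already listed in dependency order and uses a plain dict in place of in-memory datasets. The function run_sequentially and the stand-in node functions are hypothetical and are not Kedro's runner or the iris starter's code.

```python
# Minimal sketch of a sequential run over (function, inputs, outputs) triples.
# `store` is a plain dict standing in for in-memory datasets.
def run_sequentially(nodes, store):
    """Run nodes in list order, reading from and writing to `store`."""
    messages = []
    for done, (func, inputs, outputs) in enumerate(nodes, start=1):
        args = [store[name] for name in inputs]   # load inputs
        result = func(*args)                      # run the node function
        if len(outputs) == 1:                     # save a single output
            store[outputs[0]] = result
        elif len(outputs) > 1:                    # save several outputs
            store.update(zip(outputs, result))
        messages.append(f"Completed {done} out of {len(nodes)} tasks")
    return messages


# Trivial stand-in functions, used only to exercise the loop.
def split(numbers):
    half = len(numbers) // 2
    return numbers[:half], numbers[half:]


def total(first, second):
    return sum(first) + sum(second)


def report(value):
    print(f"total is {value}")


nodes = [
    (split, ["numbers"], ["first", "second"]),
    (total, ["first", "second"], ["total"]),
    (report, ["total"], []),
]
store = {"numbers": [1, 2, 3, 4]}
messages = run_sequentially(nodes, store)
print(messages[-1])
```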

%reload_kedro

You can reload the Kedro variables by running %reload_kedro.

In [8]: %reload_kedro

[01/15/23 09:25:42] INFO     Resolved project path as: /Users/ryu/iris.         __init__.py:132
                             To set a different path, run '%reload_kedro <project_root>'
[01/15/23 09:25:43] INFO     Kedro project Iris                                                     __init__.py:101
                    INFO     Defined global variable 'context', 'session', 'catalog' and            __init__.py:102
                             'pipelines'
                    INFO     Registered line magic 'run_viz'                                        __init__.py:108

The documentation for %reload_kedro can be displayed with the following command.

In [9]: %reload_kedro?

Docstring:
::

  %reload_kedro [-e ENV] [--params PARAMS] [path]

The `%reload_kedro` IPython line magic. See
 https://kedro.readthedocs.io/en/stable/tools_integration/ipython.html for more.

positional arguments:
  path               Path to the project root directory. If not given, use the
                     previously setproject root.

optional arguments:
  -e ENV, --env ENV  Kedro configuration environment name. Defaults to
                     `local`.
  --params PARAMS    Specify extra parameters that you want to pass to the
                     context initializer. Items must be separated by comma,
                     keys - by colon, example: param1:value1,param2:value2.
                     Each parameter is split by the first comma, so parameter
                     values are allowed to contain colons, parameter keys are
                     not. To pass a nested dictionary as parameter, separate
                     keys by '.', example: param_group.param1:value1.
File:      ~/Program/MLOps/kedro/venv/lib/python3.8/site-packages/kedro/ipython/__init__.py
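
The --params format described in the docstring can be illustrated with a small parser. The sketch below is an assumption about how such a string could be read, not Kedro's own implementation: it splits each item at its first colon (which is what allows values to contain colons), treats '.' in a key as nesting, and leaves every value as a string. The function name parse_params is hypothetical.

```python
# Hypothetical parser for the "key:value,group.key:value" format described in
# the docstring above. Values are kept as strings; no type conversion is done.
def parse_params(text):
    """Parse a comma-separated list of key:value items into a nested dict."""
    params = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Split at the first colon only, so the value may contain colons.
        key, separator, value = item.partition(":")
        if not separator or not key:
            raise ValueError(f"expected 'key:value', got {item!r}")
        # A dotted key such as "group.name" describes a nested dictionary.
        *parents, leaf = key.split(".")
        target = params
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return params


flat = parse_params("param1:value1,param2:value2")
nested = parse_params("param_group.param1:value1")
with_colon = parse_params("url:http://example.com")
print(flat, nested, with_colon)
```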

%run_viz

Run %run_viz to start Kedro-Viz.

In [10]: %run_viz

Converting Jupyter Notebook code to a node

Kedro lets you copy code written in a Jupyter Notebook into a node.

Suppose the following function is written in a Jupyter Notebook.

def some_action():
    print("This function came from `notebooks/my_notebook.ipynb`")

In Jupyter Notebook, click View > Cell Toolbar > Tags and add the tag node to the cell.

Save the notebook as my_notebook and move the file into the notebooks folder with the following command.

$ mv my_notebook.ipynb notebooks

Run the following command.

$ kedro jupyter convert notebooks/my_notebook.ipynb

You can see that the function has been added to src/iris/nodes/my_notebook.py.

$ cat src/iris/nodes/my_notebook.py

def some_action():
    print("This function came from `notebooks/my_notebook.ipynb`")
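
The conversion relies on the tag added to the cell. In the notebook file format, tags are stored under each cell's metadata. The sketch below shows how cells tagged node could be selected from a notebook's JSON using only the standard library. It illustrates the idea and is not the code that kedro jupyter convert runs. The function tagged_sources and the sample notebook are hypothetical.

```python
import json


# Hypothetical helper: collect the source of code cells carrying a given tag.
def tagged_sources(notebook_json, tag="node"):
    """Return the source text of every code cell whose tags include `tag`."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if tag not in cell.get("metadata", {}).get("tags", []):
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


# A tiny in-memory notebook with one tagged and one untagged cell.
sample_notebook = json.dumps({
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["node"]},
            "source": [
                "def some_action():\n",
                "    print(\"This function came from a notebook\")\n",
            ],
        },
        {"cell_type": "code", "metadata": {}, "source": "x = 1\n"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})
extracted = tagged_sources(sample_notebook)
print(extracted[0])
```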

References

https://github.com/kedro-org/kedro-viz

Ryusei Kakujo
