Traffine I/O

English

2022-12-09

Multinomial distribution

Statistics

Probability Distribution

Probability Distribution

Python

What is the multinomial distribution

The multinomial distribution is a probability distribution that follows a probability $K$ of events $X_1, X_2, ..., X_K$ with probability $p_1, p_2, ..., p_K$ for each event in a single trial, and the number of times $X$ each event occurs in $N$ repeated trials.

The multinomial distribution is a probability distribution that extends the binomial distribution to $K$ dimensions. In binomial distribution, there are two events ( $K=2$ ), but when the number of events is six ( $K=6$ ), like the number of dice rolls, i.e. multidimensional, it becomes multinomial distribution.

When $N=1$ , the multinomial distribution is categorical distribution. The distribution is multinomial distribution when $N=1$ and $K=2$ is Bernoulli distribution.

The probability of the multinomial distribution is expressed by the following equation:

P(X=x;N,p_1, p_2, ...,p_K) = \prod_{k=1}^K \frac{N!}{n_k!}p^{x_k}_k

x_k \in \{0,1, ...,N\}, \quad \sum_{k=1}^{K} x_k=N

Multinomial distribution is sometimes denoted as $Multinomial(N,p)$ .

Expected value and variance of multinomial distribution

The expected value and variance of the categorical distribution are respectively:

E(X_k)=Np_k \quad (k=1,2,...,K)

V(X_k)=Np_k(1-p_k) \quad (k=1,2,...,K)

Check the effect of $N

With $K=3$ and event probabilities of 0.2, 0.3, and 0.5, respectively, check the effect of $N$ on the distribution by varying $N$ . The implementation code is as follows.

import numpy as np
from scipy.stats import multinomial
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from matplotlib.animation import FuncAnimation

rc('animation', html='html5')
np.random.seed(5)

# Set up formatting for the movie files
Writer = animation.writers['ffmpeg']
writer = Writer(fps=15, metadata=dict(artist='Me'), bitrate=1800)

p_v = np.array([0.2, 0.3, 0.5])
N_max = 20
p_max = 0.1

plt.style.use('ggplot')
cm = plt.get_cmap('jet')
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(projection='3d')

def update(N):
    # initialize previous frame
    plt.cla()

    N += 1

    x_vals = np.arange(N + 1)
    X1, X2 = np.meshgrid(x_vals, x_vals)

    x1_vals = np.delete(X1.flatten(), obj=(X1 + X2).flatten() > N)
    x2_vals = np.delete(X2.flatten(), obj=(X1 + X2).flatten() > N)
    x3_vals = np.where(x1_vals+x2_vals <= N, N - (x1_vals+x2_vals), 0.0)

    x_points = np.stack([x1_vals, x2_vals, x3_vals], axis=1)

    probability = multinomial.pmf(x=x_points, n=N, p=p_v)

    ax.bar3d(x=x1_vals - 0.45,
             y=x2_vals - 0.45,
             z=np.zeros_like(x1_vals),
             dx=0.9,
             dy=0.9,
             dz=probability,
             color=cm(probability / p_max),
             alpha=0.5,
             shade=True
    )
    ax.set_xlabel('$x_1$')
    ax.set_ylabel('$x_2$')
    ax.set_zlabel('Probability')
    ax.set_title('$p=(' + ', '.join([str(phi) for phi in p_v]) + ')' +
                 ', N=' + str(N) + '$', loc='left')
    ax.set_zlim(0.0, p_max)

anime_prob = FuncAnimation(fig, update, frames=N_max, interval=100)

anime_prob.save('multinomial_dist.gif', writer='pillow', fps=1)

Multinomial distribution

As $N$ increases, the range of possible values for $x$ increases, and we see that the peak of the distribution becomes smaller.

Categorical distribution

Normal distribution

AlloyDB

Amazon Cognito

Amazon EC2

Amazon ECS

Amazon QuickSight

Amazon QuickSight

Amazon RDS

Amazon Redshift

Amazon Redshift

Amazon S3

API

Autonomous Vehicle

Autonomous Vehicle

AWS

AWS API Gateway

AWS API Gateway

AWS Chalice

AWS Control Tower

AWS Control Tower

AWS IAM

AWS Lambda

AWS VPC

BERT

BigQuery

Causal Inference

Causal Inference

ChatGPT

Chrome Extension

Chrome Extension

CircleCI

Classification

Cloud Functions

Cloud Functions

Cloud IAM

Cloud Run

Cloud Storage

Clustering

CSS

Data Engineering

Data Engineering

Data Modeling

Database

dbt

Decision Tree

Deep Learning

Descriptive Statistics

Descriptive Statistics

Differential Equation

Differential Equation

Dimensionality Reduction

Dimensionality Reduction

Discrete Choice Model

Discrete Choice Model

Docker

Economics

FastAPI

Firebase

GIS

git

GitHub

GitHub Actions

Google

Google Cloud

Google Search Console

Google Search Console

Hugging Face

Hypothesis Testing

Hypothesis Testing

Inferential Statistics

Inferential Statistics

Interval Estimation

Interval Estimation

JavaScript

Jinja

Kedro

Kubernetes

LightGBM

Linux

LLM

Mac

Machine Learning

Machine Learning

Macroeconomics

Marketing

Mathematical Model

Mathematical Model

Meltano

MLflow

MLOps

MySQL

NextJS

NLP

Nodejs

NoSQL

ONNX

OpenAI

Optimization Problem

Optimization Problem

Optuna

Pandas

Pinecone

PostGIS

PostgreSQL

Probability Distribution

Probability Distribution

Product

Project

Psychology

Python

PyTorch

QGIS

ReactJS

Regression

Rideshare

SEO

Singer

sklearn

Slack

Snowflake

Software Development

Software Development

SQL

Statistical Model

Statistical Model

Statistics

Streamlit

Tabular

Tailwind CSS

TensorFlow

Terraform

Transportation

TypeScript

Urban Planning

Vector Database

Vector Database

Vertex AI

VSCode

XGBoost

Ryusei Kakujo

researchgate

github

Weave the future of cities through data

Transportation modeling/ Urban planning/ Machine learning/ Computer science/ GIS