2022-12-09

Multinomial distribution

What is the multinomial distribution

The multinomial distribution is a probability distribution that follows a probability K of events X_1, X_2, ..., X_K with probability p_1, p_2, ..., p_K for each event in a single trial, and the number of times X each event occurs in N repeated trials.

The multinomial distribution is a probability distribution that extends the binomial distribution to K dimensions. In binomial distribution, there are two events (K=2), but when the number of events is six (K=6), like the number of dice rolls, i.e. multidimensional, it becomes multinomial distribution.

When N=1, the multinomial distribution is categorical distribution. The distribution is multinomial distribution when N=1 and K=2 is Bernoulli distribution.

The probability of the multinomial distribution is expressed by the following equation:

P(X=x;N,p_1, p_2, ...,p_K) = \prod_{k=1}^K \frac{N!}{n_k!}p^{x_k}_k
x_k \in \{0,1, ...,N\}, \quad \sum_{k=1}^{K} x_k=N

Multinomial distribution is sometimes denoted as Multinomial(N,p).

Expected value and variance of multinomial distribution

The expected value and variance of the categorical distribution are respectively:

E(X_k)=Np_k \quad (k=1,2,...,K)
V(X_k)=Np_k(1-p_k) \quad (k=1,2,...,K)

Check the effect of $N

With K=3 and event probabilities of 0.2, 0.3, and 0.5, respectively, check the effect of N on the distribution by varying N. The implementation code is as follows.

import numpy as np
from scipy.stats import multinomial
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from matplotlib.animation import FuncAnimation

rc('animation', html='html5')
np.random.seed(5)

# Set up formatting for the movie files
Writer = animation.writers['ffmpeg']
writer = Writer(fps=15, metadata=dict(artist='Me'), bitrate=1800)

p_v = np.array([0.2, 0.3, 0.5])
N_max = 20
p_max = 0.1

plt.style.use('ggplot')
cm = plt.get_cmap('jet')
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(projection='3d')

def update(N):
    # initialize previous frame
    plt.cla()

    N += 1

    x_vals = np.arange(N + 1)
    X1, X2 = np.meshgrid(x_vals, x_vals)

    x1_vals = np.delete(X1.flatten(), obj=(X1 + X2).flatten() > N)
    x2_vals = np.delete(X2.flatten(), obj=(X1 + X2).flatten() > N)
    x3_vals = np.where(x1_vals+x2_vals <= N, N - (x1_vals+x2_vals), 0.0)

    x_points = np.stack([x1_vals, x2_vals, x3_vals], axis=1)

    probability = multinomial.pmf(x=x_points, n=N, p=p_v)

    ax.bar3d(x=x1_vals - 0.45,
             y=x2_vals - 0.45,
             z=np.zeros_like(x1_vals),
             dx=0.9,
             dy=0.9,
             dz=probability,
             color=cm(probability / p_max),
             alpha=0.5,
             shade=True
    )
    ax.set_xlabel('$x_1$')
    ax.set_ylabel('$x_2$')
    ax.set_zlabel('Probability')
    ax.set_title('$p=(' + ', '.join([str(phi) for phi in p_v]) + ')' +
                 ', N=' + str(N) + '$', loc='left')
    ax.set_zlim(0.0, p_max)

anime_prob = FuncAnimation(fig, update, frames=N_max, interval=100)

anime_prob.save('multinomial_dist.gif', writer='pillow', fps=1)

Multinomial distribution

As N increases, the range of possible values for x increases, and we see that the peak of the distribution becomes smaller.

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!