Monte Carlo simulation in Python with multiple inputs

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

ID = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
      11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Revenue = [1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800,
           1000, 1200, 1300, 100, 500, 0, 800, 950, 4321, 800]
odds = [0.5, 0.6, 0.33, 0.1, 0.9, 0.87, 0.37, 0.55, 0.97, 0.09,
        0.5, 0.6, 0.33, 0.1, 0.9, 0.87, 0.37, 0.55, 0.97, 0.09]

d = {'ID': ID, 'Revenue': Revenue, 'Odds': odds}
df = pd.DataFrame(d)
df['Expected Value'] = df['Revenue']*df['Odds']
print(df)

num_samples = 100
df['Random Number'] = np.random.rand(len(df))

def monte_carlo_array(df):
    for _ in range(len(df)):
        yield []

mc_arrays = list(monte_carlo_array(df))

# Fill each list with 100 observations (no filtering necessary)
id_1 = []
filter_1 = (df['ID'] == 5)
for _ in range(num_samples):
    sample = df['Revenue'] * np.where(np.random.rand(len(df)) < df['Odds'], 1, 0)
    for l in monte_carlo_array(df):
        for i in l:
            mc_arrays[i].append(sample.sum())
    id_1.append(sample.loc[filter_1].sum())

# Plot simulation results.
n_bins = 10
plt.hist([id_1], bins=n_bins, label=["ID: 1"])
plt.legend()
plt.title("{} simulations of revenue".format(num_samples))

print(mc_arrays)
df['Monte Carlo Mean'] = np.mean(mc_arrays[0])
print(df['Monte Carlo Mean'])

1 Answer

User

#1 · Posted 2024-05-26 14:20:25

IIUC, this is what you're trying to do:

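For each row (which represents one ID), you want num_samples Monte Carlo simulations of whether that row realizes its Revenue.
Whether a given simulated instance realizes its Revenue is decided by comparing a value drawn uniformly at random from [0,1] against that row's Odds (the standard Monte Carlo way).
You want the mean and standard deviation of each row's Revenue across all samples.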

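If so, you can do this with a sampling function for the binomial distribution, rather than drawing from a uniform distribution and then filtering on Odds. I'll give a version of the answer that uses that approach at the end of this post.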

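But since you've already started with the uniform-draw approach: I'd recommend first generating a sampling matrix s_draws of n_rows = len(df) by num_samples (called n_draws in my code below). Then apply the Odds check to all samples in each row of s_draws. Then multiply by Revenue, and take the mean and sd of each row. Like this: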

First, draw the sampling matrix:

np.random.seed(42)

n_rows = len(df)
n_draws = 5
s_draws = pd.DataFrame(np.random.rand(n_rows, n_draws))

# the matrix of random values between [0,1]
# note: only showing the first 3 rows for brevity
s_draws
           0         1         2         3         4
0   0.374540  0.950714  0.731994  0.598658  0.156019
1   0.155995  0.058084  0.866176  0.601115  0.708073
2   0.020584  0.969910  0.832443  0.212339  0.181825
...

Now find which sample instances "realized" the target Revenue:

[The code block that builds s_rev from s_draws was not preserved here.]
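A minimal sketch of what this step could look like, going by the description of the approach (check each draw against the row's Odds, then multiply by Revenue). The three-row df is a stand-in for the first rows of the question's data, and s_hit is a name introduced here for illustration; the exact expression is an assumption, not necessarily the code that was originally posted.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the question's df (assumption).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Revenue": [1000, 1200, 1300],
    "Odds": [0.5, 0.6, 0.33],
})

np.random.seed(42)
n_rows = len(df)
n_draws = 5
s_draws = pd.DataFrame(np.random.rand(n_rows, n_draws))

# True where the uniform draw falls below that row's Odds.
# axis=0 aligns df["Odds"] with the rows of s_draws.
s_hit = s_draws.lt(df["Odds"], axis=0)

# Realized revenue: the row's Revenue on a hit, 0 otherwise.
s_rev = s_hit.astype(int).multiply(df["Revenue"], axis=0)
print(s_rev)
```

With this seed and these three rows, the row means of s_rev come out to 400, 480 and 780, in line with the summary statistics reported for rows 0 to 2.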

Finally, compute the summary statistics for each row/ID:

s_result = pd.DataFrame({"avg": s_rev.mean(axis=1), "sd": s_rev.std(axis=1)})

# the summary statistics of each row of samples
s_result
       avg          sd
0    400.0  547.722558
1    480.0  657.267069
2    780.0  712.039325
...

Here is the version that uses binomial sampling:

draws = pd.DataFrame(
    np.random.binomial(n=1, p=df.Odds, size=(n_draws, n_rows)).T
).multiply(df.Revenue, axis=0)

pd.DataFrame({"avg": draws.mean(axis=1), "sd": draws.std(axis=1)})
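As a rough sanity check on either version: each row's simulated revenue is Revenue times a Bernoulli(Odds) variable, so its mean should approach Revenue * Odds and its sd should approach Revenue * sqrt(Odds * (1 - Odds)) as the number of draws grows. The sketch below compares the two on a three-row stand-in for the question's data; the check table and the use of np.random.default_rng are choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the question's df (assumption).
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Revenue": [1000, 1200, 1300],
    "Odds": [0.5, 0.6, 0.33],
})

rng = np.random.default_rng(42)
n_draws = 100_000  # far more than 5 draws, so the estimates settle down

# One Bernoulli draw per (row, simulation); p is a column so it
# broadcasts across the n_draws simulations of each row.
hits = rng.binomial(n=1, p=df["Odds"].to_numpy()[:, None],
                    size=(len(df), n_draws))
draws = pd.DataFrame(hits).multiply(df["Revenue"], axis=0)

check = pd.DataFrame({
    "sim_avg": draws.mean(axis=1),
    "exact_avg": df["Revenue"] * df["Odds"],
    "sim_sd": draws.std(axis=1),
    "exact_sd": df["Revenue"] * np.sqrt(df["Odds"] * (1 - df["Odds"])),
})
print(check)
```

With only 5 draws per row the simulated values can sit far from the closed-form ones, which is why the summary statistics from a tiny n_draws should be read as a demonstration rather than an estimate.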

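Note: all of this would work slightly differently if an ID were repeated across multiple rows of df. In that case you could use groupby and then get the summary statistics. But in your example the IDs never repeat, so I'll leave the answer as it is for now.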
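For the repeated-ID case, one possible reading is that the revenue of all rows sharing an ID should be summed within each simulation before summarizing. The sketch below does that with groupby; the data (ID 1 on two rows), the names row_rev, id_rev and id_result, and the choice to sum per simulation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which ID 1 appears on two rows.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Revenue": [1000, 500, 1200],
    "Odds": [0.5, 0.9, 0.6],
})

rng = np.random.default_rng(0)
n_draws = 1000

# One Bernoulli draw per (row, simulation).
hits = rng.binomial(n=1, p=df["Odds"].to_numpy()[:, None],
                    size=(len(df), n_draws))
row_rev = pd.DataFrame(hits).multiply(df["Revenue"], axis=0)

# Sum the rows belonging to each ID within every simulation (column),
# then summarize across simulations for each ID.
id_rev = row_rev.groupby(df["ID"]).sum()
id_result = pd.DataFrame({"avg": id_rev.mean(axis=1),
                          "sd": id_rev.std(axis=1)})
print(id_result)
```

If the intent were instead to keep one summary per row and only report them grouped by ID, the groupby would go on the per-row summary table rather than on the simulated revenue.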
