Operations on a DataFrame

Posted 2024-04-25 02:26:48


I have a fairly large DataFrame (about 5000 rows) with a number of variables, say 2, ['max', 'min'], indexed by 4 parameters, ['Hs', 'Tp', 'wd', 'seed']. It looks like this:

>>> data.head()
   Hs  Tp   wd  seed  max  min
0   1   9  165    22  225   18
1   1   9  195    16  190   18
2   2   5  165    43  193   12
3   2  10  180    15  141   22
4   1   6  180    17  219   18
>>> len(data)
4500

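I want to keep only the first 2 parameters, and get the maximum, over 'wd', of the standard deviation taken across all 'seed' values, computed separately for each 'wd'.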

In the end, I am left with the unique (Hs, Tp) pairs, each with the maximum standard deviation for each variable. For example:

>>> stdev.head()
  Hs Tp       max       min
0  1  5  43.31321  4.597629
1  1  6  43.20004  4.640795
2  1  7  47.31507  4.569408
3  1  8  41.75081  4.651762
4  1  9  41.35818  4.285991
>>> len(stdev)
30

The code below does what I want, but since I know little about DataFrames, I wonder whether these nested loops could be written in a different, more DataFrame-like way.

import pandas as pd
import numpy as np

#
#data = pd.read_table('data.txt')
#
# don't worry too much about this ugly generator,
# it just emulates the format of my data...
total = 4500
data = pd.DataFrame()
data['Hs'] = np.random.randint(1,4,size=total)
data['Tp'] = np.random.randint(5,15,size=total)
data['wd'] = [[165, 180, 195][np.random.randint(0,3)] for _ in range(total)]
data['seed'] = np.random.randint(1,51,size=total)
data['max'] = np.random.randint(100,250,size=total)
data['min'] = np.random.randint(10,25,size=total)

# and here it starts. would the creators of pandas pull their hair out if they see this?
# can this be made better?
stdev = pd.DataFrame(columns = ['Hs', 'Tp', 'max', 'min'])
i=0
for hs in set(data['Hs']):
    data_Hs = data[data['Hs'] == hs]
    for tp in set(data_Hs['Tp']):
        data_tp = data_Hs[data_Hs['Tp'] == tp]
        stdev.loc[i] = [
               hs, 
               tp, 
               max([np.std(data_tp[data_tp['wd']==wd]['max']) for wd in set(data_tp['wd'])]), 
               max([np.std(data_tp[data_tp['wd']==wd]['min']) for wd in set(data_tp['wd'])])]
        i+=1

Thanks!

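P.S. In case you are curious, these are statistics of variables that depend on sea waves. Hs is the wave height, Tp the wave period, wd the wave direction, the seeds represent different realizations of an irregular wave train, and min and max are the peaks of my variable during a certain exposure time. After all this, using the standard deviation and the mean, I can fit some distribution to the data, such as Gumbel.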
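For readers unfamiliar with that last step, here is a minimal sketch of how a Gumbel distribution could be fitted from a mean and a standard deviation. It assumes the method of moments and the Gumbel distribution for maxima; the post does not say which fitting method is actually used, and the names `gumbel_from_moments` and `gumbel_cdf`, as well as the input numbers, are hypothetical.

```python
import math

# Euler-Mascheroni constant, which relates the Gumbel mean to its location.
EULER_GAMMA = 0.5772156649015329


def gumbel_from_moments(mean, std):
    """Return (loc, scale) of a Gumbel (maxima) distribution with this mean and std.

    Uses the identities  mean = loc + gamma * scale  and  std = pi * scale / sqrt(6).
    """
    scale = std * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_cdf(x, loc, scale):
    """Cumulative distribution function of the Gumbel (maxima) distribution."""
    return math.exp(-math.exp(-(x - loc) / scale))


# Made-up mean and standard deviation of the 'max' variable, for illustration only.
loc, scale = gumbel_from_moments(mean=180.0, std=43.3)

# Probability that a peak exceeds 250 under the fitted distribution.
p_exceed = 1.0 - gumbel_cdf(250.0, loc, scale)
```

The 'min' variable would presumably need the mirrored (minima) form of the distribution, which this sketch does not cover.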


1 Answer

Anonymous user

#1 · Posted 2024-04-25 02:26:48

If I understand correctly, this can be a one-liner:

data.groupby(['Hs', 'Tp', 'wd'])[['max', 'min']].std(ddof=0).groupby(level=[0, 1]).max()

(You can append reset_index() at the end if you like.)
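As a sanity check, the sketch below rebuilds data in the same format as the question and compares the grouped result with the explicit per-'wd' computation for one (Hs, Tp) pair. The fixed random seed and the names `rng`, `per_wd`, `subset` and `expected` are assumptions added for illustration. The final step is written as `.groupby(level=...).max()`, which also works on pandas versions where `max` does not accept a `level` argument. Note that `ddof=0` matters here: pandas' `std` defaults to `ddof=1`, while `np.std` in the question defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Synthetic data in the same format as the question (fixed seed for reproducibility).
rng = np.random.default_rng(0)
total = 4500
data = pd.DataFrame({
    'Hs': rng.integers(1, 4, size=total),
    'Tp': rng.integers(5, 15, size=total),
    'wd': rng.choice([165, 180, 195], size=total),
    'seed': rng.integers(1, 51, size=total),
    'max': rng.integers(100, 250, size=total),
    'min': rng.integers(10, 25, size=total),
})

# Step 1: population standard deviation (ddof=0) within each (Hs, Tp, wd) group.
per_wd = data.groupby(['Hs', 'Tp', 'wd'])[['max', 'min']].std(ddof=0)

# Step 2: largest of those standard deviations over 'wd' for each (Hs, Tp) pair.
stdev = per_wd.groupby(level=['Hs', 'Tp']).max().reset_index()

# Spot check: recompute the first row the way the nested loops in the question do.
hs, tp = stdev.loc[0, 'Hs'], stdev.loc[0, 'Tp']
subset = data[(data['Hs'] == hs) & (data['Tp'] == tp)]
expected = max(
    np.std(subset.loc[subset['wd'] == wd, 'max'].to_numpy())
    for wd in set(subset['wd'])
)
assert np.isclose(stdev.loc[0, 'max'], expected)
```

Splitting the one-liner into two named steps makes it easier to inspect the intermediate per-'wd' standard deviations before taking the maximum.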
