Is there a pandas function to efficiently create columns from dictionary key-value pairs?

0 votes
2 answers
56 views
Asked 2025-04-12 16:30

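I have a list containing a lot of data in the form [[[key1,value1],[key2,value2],...,500],...,n], i.e. order book data with 500 levels. I want to convert this list into a DataFrame with columns named key_1,...,key_500 and value_1,...,value_500. In other words, each level should get one price column and one volume column, e.g. level_1_volume, level_2_volume, ..., level_500_volume.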

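My current approach is to loop over the existing DataFrame (which holds this list in a single column) with df.iterrows(), extract the values into separate lists, build new DataFrames from those lists, and merge them in as columns. This feels inefficient and is slow compared with the other operations on my dataset. Is there a built-in method that does this more simply?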

List1 = [[key1.1,val1.1],[key2.1,val2.1],...]
List2 = [[key1.2,val1.2],[key2.2,val2.2],...]

            key1_column  val1_column  key2_column  val2_column
Row_List1   key1.1       val1.1       key2.1       val2.1
Row_List2   key1.2       val1.2       key2.2       val2.2

My current solution is below. The "bids" and "asks" columns simply hold dict/JSON objects of the form {key1:value1; key2:value2}, with 500 pairs each.

import numpy as np
import pandas as pd

# Initialize empty lists for prices and volumes
prices_bids = [[] for _ in range(500)]
volumes_bids = [[] for _ in range(500)]
prices_asks = [[] for _ in range(500)]
volumes_asks = [[] for _ in range(500)]

# Iterate through each row
for index, row in df.iterrows():

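    # Bids from highest to lowest price; key=float sorts numerically even if the keys are strings
    attributes = row["bids"]
    result_bids = [[key, attributes[key]] for key in sorted(attributes.keys(), key=float, reverse=True)]

    # Asks from lowest to highest price
    attributes = row["asks"]
    result_asks = [[key, attributes[key]] for key in sorted(attributes.keys(), key=float)]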
    
    for i in range(500):
        prices_bids[i].append(np.float64(result_bids[i][0]))
        volumes_bids[i].append(np.float64(result_bids[i][1]))
        prices_asks[i].append(np.float64(result_asks[i][0]))
        volumes_asks[i].append(np.float64(result_asks[i][1]))
        

# Create DataFrame from lists
for i in range(500):
    # Expressing prices as spreads
    df[f"bid_level_{i}_price"] = pd.Series((df["mid_price"]/pd.Series(prices_bids[i],dtype='float64')-1)*10000,dtype="float64")
    df[f"bid_level_{i}_volume"] = pd.Series(volumes_bids[i],dtype='float64')
    df[f"ask_level_{i}_price"] = pd.Series((df["mid_price"]/pd.Series(prices_asks[i],dtype='float64')-1)*10000,dtype="float64")
    df[f"ask_level_{i}_volume"] = pd.Series(volumes_asks[i],dtype='float64')

2 Answers

0

Using the data provided by @Nick, you can do this:

pd.DataFrame(map(dict, data))

   key1  key2  key3  key4
0   100   101   102   103
1   200   201   202   203
2   300   301   302   303
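
@Nick's data is not reproduced in this thread, so the following is only a sketch with assumed sample data chosen to match the output above. Each row is a list of [key, value] pairs, which `dict` turns into a mapping and `pd.DataFrame` then aligns by key.

```python
import pandas as pd

# Hypothetical sample data (not from the original post):
# one inner list of [key, value] pairs per row
data = [
    [["key1", 100], ["key2", 101], ["key3", 102], ["key4", 103]],
    [["key1", 200], ["key2", 201], ["key3", 202], ["key4", 203]],
    [["key1", 300], ["key2", 301], ["key3", 302], ["key4", 303]],
]

# Each row becomes a dict; the dict keys become the column names
df = pd.DataFrame(map(dict, data))
print(df)
```

Note that this relies on every row sharing the same keys. In an order book the keys are prices that differ from row to row, so the result would have one column per distinct price rather than one per level.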
0

Try running the code below; it should work:

import pandas as pd

# Your list of lists
data = [['key1', 'value1'],
        ['key2', 'value2'],
        ['key3', 'value3'],
        ['key4', 'value4']]

# Convert the list of lists into a dataframe
df = pd.DataFrame(data, columns=['Key', 'Value'])

# Display the original dataframe
print("Original DataFrame:")
print(df)

# Reshape the dataframe by setting the 'Key' column as the index
df.set_index('Key', inplace=True)

# Transpose the dataframe to convert keys into columns
df = df.T

# Reset the index to have numeric index
df.reset_index(drop=True, inplace=True)

# Display the final dataframe
print("\nFinal DataFrame:")
print(df)
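
For the per-level layout asked about in the question, I am not aware of a single built-in that does it. One possible sketch is to sort each book once, stack the results into a NumPy array, and build all columns in a single `pd.concat` instead of assigning them one at a time. The helper name `expand_side`, the sample values, and `n_levels = 2` are assumptions for illustration; the sketch also assumes every row holds at least `n_levels` entries.

```python
import numpy as np
import pandas as pd


def expand_side(side, mid_price, prefix, descending, n_levels):
    """Expand a Series of {price: volume} dicts into per-level columns."""
    # Sort each book numerically by price and keep the first n_levels levels.
    # Assumes every row holds at least n_levels entries.
    levels = [
        sorted(((float(p), float(v)) for p, v in book.items()),
               reverse=descending)[:n_levels]
        for book in side
    ]
    arr = np.array(levels, dtype="float64")  # shape: (rows, n_levels, 2)

    prices = pd.DataFrame(
        arr[:, :, 0], index=side.index,
        columns=[f"{prefix}_level_{i}_price" for i in range(n_levels)])
    volumes = pd.DataFrame(
        arr[:, :, 1], index=side.index,
        columns=[f"{prefix}_level_{i}_volume" for i in range(n_levels)])

    # Same spread formula as in the question: (mid / price - 1) * 10000
    spreads = prices.rdiv(mid_price, axis=0).sub(1).mul(10000)
    return pd.concat([spreads, volumes], axis=1)


# Hypothetical sample data with string keys, as JSON would produce
df = pd.DataFrame({
    "mid_price": [100.5, 200.5],
    "bids": [{"100": "1.5", "99": "2.0", "98": "3.0"},
             {"200": "0.5", "199": "1.0", "198": "2.0"}],
    "asks": [{"101": "1.0", "102": "2.5", "103": "1.0"},
             {"201": "0.7", "202": "1.2", "203": "3.0"}],
})

n_levels = 2  # 500 in the question
result = pd.concat(
    [df[["mid_price"]],
     expand_side(df["bids"], df["mid_price"], "bid", True, n_levels),
     expand_side(df["asks"], df["mid_price"], "ask", False, n_levels)],
    axis=1,
)
print(result)
```

The per-row sort still runs in Python, but this avoids `df.iterrows()` and the 2000 separate column assignments. The columns come out grouped (all prices, then all volumes) rather than interleaved as in the question.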
