Is there a pandas function that efficiently creates columns from dictionary key-value pairs?
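I have a list containing a lot of data in the format [[[key1,value1],[key2,value2],...,500],...,n], i.e. order-book data with 500 levels. I want to turn this list into a DataFrame with columns key_1, ..., key_500 and value_1, ..., value_500. In other words, every level should get its own price column and volume column, e.g. level_1_volume, level_2_volume, ..., level_500_volume.
My current approach loops over the existing DataFrame (whose only column holds this list) with df.iterrows(), extracts the values into separate lists, builds new DataFrames from those lists and then joins them as columns. This feels inefficient and is slow compared with the other operations on my dataset. Is there a built-in way to do this more simply?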
Example input:
List1 = [[key1.1,val1.1],[key2.1,val2.1],...]
List2 = [[key1.2,val1.2],[key2.2,val2.2],...]
Desired output:
           key1_column  val1_column  key2_column  val2_column
Row_List1  key1.1       val1.1       key2.1       val2.1
Row_List2  key1.2       val1.2       key2.2       val2.2
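A minimal sketch of the layout above, assuming every row has the same number of levels and that all keys and values can be cast to float: stacking the rows into one NumPy array and reshaping it puts each pair's key and value side by side without any Python loop over rows. The sample numbers and the two-level depth are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each row is one snapshot given as a list of
# [key, value] pairs (2 levels here instead of 500).
rows = [
    [[101.0, 5.0], [100.5, 3.0]],  # Row_List1
    [[102.0, 4.0], [101.5, 7.0]],  # Row_List2
]

# Shape (n_rows, n_levels, 2); assumes every row has the same number of levels.
arr = np.asarray(rows, dtype="float64")
n_rows, n_levels, _ = arr.shape

# Row-major reshape keeps each pair together: key1, val1, key2, val2, ...
flat = arr.reshape(n_rows, n_levels * 2)

columns = [f"{kind}{i}_column" for i in range(1, n_levels + 1) for kind in ("key", "val")]
wide = pd.DataFrame(flat, columns=columns, index=["Row_List1", "Row_List2"])
print(wide)
```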
In my current solution, the "bids" and "asks" columns simply contain dict/JSON objects of the form {key1: value1, key2: value2}, with 500 pairs each:
import numpy as np
import pandas as pd

# Assumes df has "bids"/"asks" dict columns, a "mid_price" column and a default RangeIndex
# Initialize empty lists for prices and volumes (one list per level)
prices_bids = [[] for _ in range(500)]
volumes_bids = [[] for _ in range(500)]
prices_asks = [[] for _ in range(500)]
volumes_asks = [[] for _ in range(500)]

# Iterate through each row
for index, row in df.iterrows():
    attributes = row["bids"]
    # Bids ordered from highest to lowest key
    result_bids = [[key, attributes[key]] for key in sorted(attributes.keys(), reverse=True)]
    attributes = row["asks"]
    # Asks ordered from lowest to highest key
    result_asks = [[key, attributes[key]] for key in sorted(attributes.keys(), reverse=False)]
    for i in range(500):
        prices_bids[i].append(np.float64(result_bids[i][0]))
        volumes_bids[i].append(np.float64(result_bids[i][1]))
        prices_asks[i].append(np.float64(result_asks[i][0]))
        volumes_asks[i].append(np.float64(result_asks[i][1]))

# Create one column per level from the lists
for i in range(500):
    # Express prices as spreads in basis points relative to the mid price
    df[f"bid_level_{i}_price"] = pd.Series((df["mid_price"] / pd.Series(prices_bids[i], dtype="float64") - 1) * 10000, dtype="float64")
    df[f"bid_level_{i}_volume"] = pd.Series(volumes_bids[i], dtype="float64")
    df[f"ask_level_{i}_price"] = pd.Series((df["mid_price"] / pd.Series(prices_asks[i], dtype="float64") - 1) * 10000, dtype="float64")
    df[f"ask_level_{i}_volume"] = pd.Series(volumes_asks[i], dtype="float64")
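One possible restructuring of the code above, offered as a sketch rather than a tested drop-in: it keeps a single pass over the dict columns but skips iterrows(), sorts the prices numerically (the original sorts the raw keys, which is lexicographic if the keys are strings, so "100.0" would sort below "99.5"), and attaches all level columns with one pd.concat instead of thousands of single-column inserts. It assumes every book has at least N_LEVELS entries and that keys and values can be cast to float; the sample frame and the helper name levels_to_array are hypothetical.

```python
import numpy as np
import pandas as pd

N_LEVELS = 3  # 500 in the question; kept small for the example

# Hypothetical input mirroring the question: "bids"/"asks" hold {price: volume}
# dicts with price strings as keys, and "mid_price" is already computed.
df = pd.DataFrame({
    "bids": [{"99.0": 1.0, "98.0": 2.0, "97.0": 3.0},
             {"100.0": 4.0, "99.5": 5.0, "99.0": 6.0}],
    "asks": [{"101.0": 1.5, "102.0": 2.5, "103.0": 3.5},
             {"100.5": 4.5, "101.0": 5.5, "101.5": 6.5}],
    "mid_price": [100.0, 100.25],
})

def levels_to_array(books, descending, n_levels):
    """Turn a Series of {price: volume} dicts into an (n_rows, n_levels, 2) float array."""
    out = np.empty((len(books), n_levels, 2), dtype="float64")
    for r, book in enumerate(books):
        # Sort numerically via float(price), not lexicographically on the key strings.
        items = sorted(((float(p), float(v)) for p, v in book.items()), reverse=descending)
        out[r] = items[:n_levels]  # assumes at least n_levels entries per book
    return out

bids = levels_to_array(df["bids"], descending=True, n_levels=N_LEVELS)
asks = levels_to_array(df["asks"], descending=False, n_levels=N_LEVELS)
mid = df["mid_price"].to_numpy()[:, None]  # shape (n_rows, 1) for broadcasting

# Same spread formula as the question, computed for all levels at once.
bid_spread = (mid / bids[:, :, 0] - 1) * 10000
ask_spread = (mid / asks[:, :, 0] - 1) * 10000

blocks = {}
for i in range(N_LEVELS):
    blocks[f"bid_level_{i}_price"] = bid_spread[:, i]
    blocks[f"bid_level_{i}_volume"] = bids[:, i, 1]
    blocks[f"ask_level_{i}_price"] = ask_spread[:, i]
    blocks[f"ask_level_{i}_volume"] = asks[:, i, 1]

# Build every level column in one DataFrame and attach it with a single concat.
levels = pd.DataFrame(blocks, index=df.index)
df = pd.concat([df, levels], axis=1)
print(df.filter(like="level_0"))
```

The per-row loop remains because each row holds a Python dict, but the arithmetic and column construction are done once over whole arrays.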
2 Answers
Using the data provided by @Nick, you can do this:
pd.DataFrame(map(dict, data))
   key1  key2  key3  key4
0   100   101   102   103
1   200   201   202   203
2   300   301   302   303
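Since @Nick's sample data is not reproduced here, the snippet below rebuilds a hypothetical input that yields the table above, so the one-liner can be run end to end. dict() turns each list of [key, value] pairs into a dictionary, and the DataFrame constructor uses the dictionary keys as column names.

```python
import pandas as pd

# Hypothetical data: one list of [key, value] pairs per row.
data = [
    [["key1", 100], ["key2", 101], ["key3", 102], ["key4", 103]],
    [["key1", 200], ["key2", 201], ["key3", 202], ["key4", 203]],
    [["key1", 300], ["key2", 301], ["key3", 302], ["key4", 303]],
]

# Each row becomes {key: value}; the keys become the column names.
df = pd.DataFrame(map(dict, data))
print(df)
```

Because the columns come from the keys themselves, this layout fits when every row shares the same keys. For order-book levels, where the price keys change from row to row, the key_i/value_i layout from the question needs the pairs to be indexed by position instead.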
Try the following code, which turns a single list of [key, value] pairs into a one-row DataFrame:
import pandas as pd
# Your list of lists
data = [['key1', 'value1'],
['key2', 'value2'],
['key3', 'value3'],
['key4', 'value4']]
# Convert the list of lists into a dataframe
df = pd.DataFrame(data, columns=['Key', 'Value'])
# Display the original dataframe
print("Original DataFrame:")
print(df)
# Reshape the dataframe by setting the 'Key' column as the index
df.set_index('Key', inplace=True)
# Transpose the dataframe to convert keys into columns
df = df.T
# Reset the index to have numeric index
df.reset_index(drop=True, inplace=True)
# Display the final dataframe
print("\nFinal DataFrame:")
print(df)
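The transpose above handles one list at a time. As a sketch of how the same reshape might extend to n lists in one step, the snippet below first builds a long table of (row, Key, Value) triples and then uses DataFrame.pivot, which does the work of set_index("Key") plus .T for every row at once. The sample data and the "row" column name are hypothetical, and pivot assumes each key appears at most once per row.

```python
import pandas as pd

# Hypothetical data: n snapshots, each a list of [key, value] pairs.
data = [
    [["key1", "value1"], ["key2", "value2"]],
    [["key1", "value3"], ["key2", "value4"]],
]

# Long format: one row per (snapshot, key, value) triple.
long_df = pd.DataFrame(
    [(row_id, key, value) for row_id, pairs in enumerate(data) for key, value in pairs],
    columns=["row", "Key", "Value"],
)

# pivot turns the keys into columns for all snapshots at once.
wide = long_df.pivot(index="row", columns="Key", values="Value")
wide.columns.name = None  # drop the leftover "Key" label on the columns axis
print(wide)
```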