How to create a pandas.DataFrame from a JSON list

3 answers

User

Answer 1 · Edited 2024-04-19 04:14:53

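Suppose your CSV data has a title column and a genres column, where each genres cell holds a JSON list of objects with id and name keys.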

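(I added quotes around the keys in the genres JSON only so that the json package could parse it easily. Since that isn't the main issue, it can be done as a preprocessing step.)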
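If the genres cells really contain unquoted keys (for example `[{id: 18, name: "Drama"}]`, which is an assumption about the data here), one way to do that preprocessing is a regular expression that wraps bare identifiers before a colon in double quotes. This is only a rough sketch: `quote_keys` is a hypothetical helper, and it will misfire on string values that themselves contain `{` or `,` followed by `word:`.

```python
import json
import re

# Assumes keys are bare identifiers such as id/name, e.g. {id: 18, name: "Drama"}
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')

def quote_keys(text):
    """Wrap bare object keys in double quotes so json.loads can parse them."""
    return BARE_KEY.sub(r'\1"\2":', text)

raw = '[{id: 18, name: "Drama"}, {id: 35, name: "Comedy"}]'
fixed = quote_keys(raw)
print(fixed)  # [{"id": 18, "name": "Drama"}, {"id": 35, "name": "Comedy"}]
print(json.loads(fixed))
```

Keys that are already quoted are left alone, because the pattern requires a letter or underscore right after the `{` or `,`.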

You have to iterate over all rows of the input DataFrame:

for index, row in inputDf.iterrows():
    fullDataFrame = pd.concat([fullDataFrame, get_dataframe_for_a_row(row)])

In the get_dataframe_for_a_row function:

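- Prepare a DataFrame with a title column holding the value row['title'].
- Add columns whose names are formed by appending the genre id to "genre", i.e. genre_<id>.
- Assign them the value 1.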

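This builds one DataFrame per row and concatenates them into one full DataFrame: pd.concat() joins the DataFrames obtained from each row, aligning columns that already exist and adding any new ones.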

Finally, fullDataFrame.fillna(0) replaces the NaN values with 0.
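To see what pd.concat and fillna(0) do in this approach, here is a minimal sketch with two hypothetical per-row frames (the titles and genre ids are made up) whose genre columns only partly overlap:

```python
import pandas as pd

# Two hypothetical per-row frames, shaped like get_dataframe_for_a_row's output
row_a = pd.DataFrame({'title': ['Movie A'], 'genre_18': [1], 'genre_35': [1]})
row_b = pd.DataFrame({'title': ['Movie B'], 'genre_35': [1], 'genre_10749': [1]})

# concat takes the union of the columns; cells a frame lacks become NaN
combined = pd.concat([row_a, row_b], ignore_index=True)
print(combined)

# Replace the NaN gaps with 0 so every genre column is a 0/1 indicator
combined = combined.fillna(0)
print(combined)
```

Note that a column which contained NaN becomes float64, so after fillna it holds 1.0/0.0 unless you cast it back with astype(int).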

The final DataFrame has one row per title and one 0/1 column per genre id.

Here is the complete code:

import pandas as pd
import json

inputDf = pd.read_csv('title_genre.csv')

def labels_for_genre(genres):
    # One column name per genre object, formed as 'genre_' + id
    return ['genre_' + str(genre['id']) for genre in genres]

def get_dataframe_for_a_row(row):
    # Parse the JSON list stored in this row's 'genres' cell
    labels = labels_for_genre(json.loads(row['genres']))
    tempDf = pd.DataFrame({'title': [row['title']]})
    for label in labels:
        tempDf[label] = 1  # numeric 1 rather than the string '1'
    return tempDf

# Build one small DataFrame per row and concatenate them once at the end
# (calling concat inside the loop copies the accumulated data every time)
frames = [get_dataframe_for_a_row(row) for index, row in inputDf.iterrows()]
fullDataFrame = pd.concat(frames, ignore_index=True)

# Genres missing from a row come out of concat as NaN; turn them into 0
fullDataFrame = fullDataFrame.fillna(0)

User

Answer 2 · Edited 2024-04-19 04:14:53

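As far as I know, there is no way to deserialize JSON in a pandas DataFrame in a vectorized manner. One way to do it is with DataFrame.iterrows(), which lets you handle it in a loop (although that is slower than most built-in operations).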

import json

df = ...  # ... your dataframe, with a 'genres' column of JSON strings

for index, row in df.iterrows():
    # deserialize the JSON string
    json_data = json.loads(row['genres'])

    # add a new column for each of the genres (Pandas is okay with it being sparse)
    for genre in json_data:
        df.loc[index, genre['name']] = 1  # update the row in the df itself

df.drop(['genres'], axis=1, inplace=True)

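Note that the empty cells will be filled with NaN rather than 0; use DataFrame.fillna() to change this. Here is a simple example with a very similar DataFrame: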

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([{'title': 'hello', 'json': '{"foo": "bar"}'}, {'title': 'world', 'json': '{"foo": "bar", "baz": "boo"}'}])

In [3]: df.head()
Out[3]:
                           json  title
0                {"foo": "bar"}  hello
1  {"foo": "bar", "baz": "boo"}  world

In [4]: import json
   ...: for index, row in df.iterrows():
   ...:     data = json.loads(row['json'])
   ...:     for k, v in data.items():
   ...:         df.loc[index, k] = v
   ...: df.drop(['json'], axis=1, inplace=True)

In [5]: df.head()
Out[5]:
   title  foo  baz
0  hello  bar  NaN
1  world  bar  boo
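Applied to data shaped like the question's (the two rows below are hypothetical), the same iterrows loop followed by fillna(0) on the newly created columns gives 0/1 indicators. As in the loop above, this sketch uses the genre names as column names; the final astype(int) is an addition so the indicators are integers rather than floats.

```python
import json
import pandas as pd

# Hypothetical data: each 'genres' cell is a JSON list of genre objects
df = pd.DataFrame({
    'title': ['Movie A', 'Movie B'],
    'genres': ['[{"id": 18, "name": "Drama"}, {"id": 35, "name": "Comedy"}]',
               '[{"id": 35, "name": "Comedy"}]'],
})

for index, row in df.iterrows():
    for genre in json.loads(row['genres']):
        # Assigning to a missing column via .loc creates it, NaN elsewhere
        df.loc[index, genre['name']] = 1

df = df.drop(columns=['genres'])

# Turn the NaN gaps into 0 and store the indicators as integers
genre_cols = df.columns.drop('title')
df[genre_cols] = df[genre_cols].fillna(0).astype(int)
print(df)
```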

User

Answer 3 · Edited 2024-04-19 04:14:53

A complete working solution without iterrows:

import pandas as pd
import itertools
import json

# read data
movies_df = pd.read_csv('https://gist.githubusercontent.com/feeeper/9c7b1e8f8a4cc262f17675ef0f6e1124/raw/022c0d45c660970ca55e889cd763ce37a54cc73b/example.csv', converters={ 'genres': json.loads })

# get genres for all items
all_genres_entries = list(itertools.chain.from_iterable(movies_df['genres'].values))

# create the list with unique genres
genres = list({v['id']:v for v in all_genres_entries}.values())

# fill genres columns
for genre in genres:
    movies_df['genre_{}'.format(genre['id'])] = movies_df['genres'].apply(lambda x: 1 if genre in x else 0)
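Another way to avoid iterrows, sketched here as an alternative technique rather than taken from this answer, is to explode the parsed lists into one row per (movie, genre) pair and one-hot encode the ids with pd.get_dummies (Series.explode needs pandas 0.25 or later). The sample rows are hypothetical and stand in for the CSV at the gist URL.

```python
import json
import pandas as pd

# Hypothetical sample; 'genres' holds JSON lists as in the question
movies_df = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genres': ['[{"id": 18, "name": "Drama"}, {"id": 35, "name": "Comedy"}]',
               '[{"id": 35, "name": "Comedy"}]',
               '[]'],
})
movies_df['genres'] = movies_df['genres'].map(json.loads)

# One row per (movie, genre) pair; empty lists explode to NaN and are dropped
exploded = movies_df['genres'].explode().dropna()
genre_ids = exploded.map(lambda g: g['id'])

# One-hot encode the ids, then collapse back to one row per original movie
dummies = (pd.get_dummies(genre_ids, prefix='genre')
           .groupby(level=0).max()
           .astype(int)
           .reindex(movies_df.index, fill_value=0))

result = pd.concat([movies_df[['title']], dummies], axis=1)
print(result)
```

Movies with an empty genres list end up with 0 in every genre column because of the reindex with fill_value=0.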
