Importing nested dictionary data into Pandas

Posted 2024-05-23 21:31:06


Suppose my JSON file looks like this:

!head test.json

{"Item":{"title":{"S":"https://medium.com/media/d40eb665beb374c0baaacb3b5a86534c/href"}}}
{"Item":{"title":{"S":"https://fasttext.cc/docs/en/autotune.html"}}}
{"Item":{"title":{"S":"https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf"}}}
{"Item":{"title":{"S":"https://github.com/avinashbarnwal/GSOC-2019/tree/master/AFT/test/data/neuroblastoma-data-master/data/H3K27ac-H3K4me3_TDHAM_BP"}}}

I can import the data into pandas using:

import pandas as pd
df = pd.read_json("test.json", lines=True, orient="columns")

But the resulting data looks like this:

Item
0   {'title': {'S': 'https://medium.com/media/d40e...
1   {'title': {'S': 'https://fasttext.cc/docs/en/a...
2   {'title': {'S': 'https://nlp.stanford.edu/~soc...
3   {'title': {'S': 'https://github.com/avinashbar...

I need all the URLs in a single column.


2 answers

Rewrite test.json as a single valid JSON array:

[{"Item":{"title":{"S":"https://medium.com/media/d40eb665beb374c0baaacb3b5a86534c/href"}}},
{"Item":{"title":{"S":"https://fasttext.cc/docs/en/autotune.html"}}},
{"Item":{"title":{"S":"https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf"}}},
{"Item":{"title":{"S":"https://github.com/avinashbarnwal/GSOC-2019/tree/master/AFT/test/data/neuroblastoma-data-master/data/H3K27ac-H3K4me3_TDHAM_BP"}}}]

Then use this code:

import pandas as pd
df = pd.read_json("test.json")
# pull the URL string out of each nested dict in the Item column
df['url'] = df['Item'].apply(lambda x: x.get('title').get('S'))
print(df['url'])

Output:

0    https://medium.com/media/d40eb665beb374c0baaac...
1            https://fasttext.cc/docs/en/autotune.html
2    https://nlp.stanford.edu/~socherr/EMNLP2013_RN...
3    https://github.com/avinashbarnwal/GSOC-2019/tr...
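
The same lookup should also work on the original line-delimited file, without rewriting it as an array. The following is a minimal sketch under that assumption: it uses an in-memory copy of two of the sample lines (via io.StringIO) instead of test.json so that it runs on its own, and the `url` column name is just an illustrative choice.

```python
import io

import pandas as pd

# Two of the sample lines from the question, kept in memory for illustration.
raw = (
    '{"Item":{"title":{"S":"https://fasttext.cc/docs/en/autotune.html"}}}\n'
    '{"Item":{"title":{"S":"https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf"}}}\n'
)

# lines=True tells pandas to parse one JSON object per line.
df = pd.read_json(io.StringIO(raw), lines=True)

# Each cell in "Item" is a nested dict; index into it to get the URL string.
df["url"] = df["Item"].apply(lambda item: item["title"]["S"])

urls = df["url"].tolist()
print(urls)
```

With the real file, `io.StringIO(raw)` would be replaced by the path "test.json".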
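  • In this case, the easiest approach is to use pandas.json_normalize on the 'Item' column of df.
  • Because you have a column of links, I have also included code to display them as clickable links in a notebook, or to save them to an HTML file.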
import pandas as pd
from IPython.display import HTML  # used to show clickable links in a notebook

# read the file in as you are already doing
df = pd.read_json("test.json", lines=True, orient="columns")

# normalize the Item column; the nested keys become one 'title.S' column
df = pd.json_normalize(df.Item)

# optional steps
# make the link clickable (quote the href so the attribute is well-formed)
df['title.S'] = '<a href="' + df['title.S'] + '">' + df['title.S'] + '</a>'

# display clickable dataframe in notebook
HTML(df.to_html(escape=False))

# save to html file (to_html returns None when given a path, so no HTML() wrapper)
df.to_html('test.html', escape=False)
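
As a variation on the json_normalize approach, the records can be flattened directly and the resulting column renamed to something shorter. This is only a sketch: the sample lines are held in a Python list rather than read from test.json, and the `url` column name is a hypothetical choice, not part of the original answer.

```python
import json

import pandas as pd

# Two of the sample lines from the question, held in a list for illustration.
lines = [
    '{"Item":{"title":{"S":"https://fasttext.cc/docs/en/autotune.html"}}}',
    '{"Item":{"title":{"S":"https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf"}}}',
]

# Parse each line into a dict, then flatten the nesting into dotted column names.
records = [json.loads(line) for line in lines]
flat = pd.json_normalize(records)  # yields a single column named "Item.title.S"

# Rename the dotted column to a friendlier (hypothetical) name.
flat = flat.rename(columns={"Item.title.S": "url"})
print(flat)
```

Normalizing the whole record rather than only the Item column keeps the full key path in the column name, which can help if other top-level keys appear later.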
