How to extract values from JSON-like text

Posted 2024-04-24 08:55:06


I want to extract values from JSON-like text such as the following:

df.head()
    budget  genres  homepage    id  keywords    original_language   original_title  overview    popularity  production_companies    ... runtime spoken_languages    status  tagline title   vote_average    vote_count  movie   cast    crew
0   237000000   [{"id": 28, "name": "Action"}, {"id": 12, "nam...   http://www.avatarmovie.com/ 19995   [{"id": 1463, "name": "culture clash"}, {"id":...   en  Avatar  In the 22nd century, a paraplegic Marine is di...   150.437577  [{"name": "Ingenious Film Partners", "id": 289...   ... 162.0   [{"iso_639_1": "en", "name": "English"}, {"iso...   Released    Enter the World of Pandora. Avatar  7.2 11800   Avatar  [{"cast_id": 242, "character": "Jake Sully", "...   [{"credit_id": "52fe48009251416c750aca23", "de...
1   300000000   [{"id": 12, "name": "Adventure"}, {"id": 14, "...   http://disney.go.com/disneypictures/pirates/    285 [{"id": 270, "name": "ocean"}, {"id": 726, "na...   en  Pirates of the Caribbean: At World's End    Captain Barbossa, long believed to be dead, ha...   139.082615  [{"name": "Walt Disney Pictures", "id": 2}, {"...   ... 169.0   [{"iso_639_1": "en", "name": "English"}]    Released    At the end of the world, the adventure begins.  Pirates of the Caribbean: At World's End    6.9 4500    Pirates of the Caribbean: At World's End    [{"cast_id": 4, "character": "Captain Jack Spa...   [{"credit_id": "52fe4232c3a36847f800b579", "de...
2   245000000   [{"id": 28, "name": "Action"}, {"id": 12, "nam...   http://www.sonypictures.com/movies/spectre/ 206647  [{"id": 470, "name": "spy"}, {"id": 818, "name...   en  Spectre A cryptic message from Bond’s past sends him o...

I tried:

# Parse the stringified features into their corresponding python objects
from ast import literal_eval

features = ['cast', 'crew', 'keywords', 'genres', 'original_language']
for feature in features:
    df[feature] = df[feature].apply(literal_eval)

...which raised:

ValueError: malformed node or string: <_ast.Name object at 0x7f5c5a523358>

Any help would be appreciated.


1 answer
Answer #1 · posted 2024-04-24 08:55:06

I think the problem is caused by malformed values. One possible solution is a custom function that wraps the call in a try-except statement:

import pandas as pd

# The second row is cut off mid-string, so it is not a valid literal
df = pd.DataFrame({'genres':['[{"id": 28, "name": "Action"}]',
                             '[{"id": 28, "name": "Action"}, {"id": 12, "n]']})
print(df)
                                          genres
0                 [{"id": 28, "name": "Action"}]
1  [{"id": 28, "name": "Action"}, {"id": 12, "n]

from ast import literal_eval

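def literal_eval_cust(x):
    """Parse x with literal_eval; return an empty dict if parsing fails."""
    try:
        return literal_eval(x)
    except Exception:
        # Values that are not valid Python literals end up here
        return {}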

features = ['genres']
for feature in features:
    df[feature] = df[feature].apply(literal_eval_cust)

print(df)
                           genres
0  [{'id': 28, 'name': 'Action'}]
1                              {}
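
As a follow-up sketch, not part of the original answer: the `df.head()` output in the question shows `original_language` holding plain strings such as `en`. `literal_eval` parses a bare word like that as a name rather than a literal, which is one likely source of the `ValueError` above. Since the list-valued columns look like JSON, `json.loads` may be a better fit, and the `name` values can then be pulled out of each parsed list. The helpers `parse_json_list` and `names_of` and the sample rows below are hypothetical, introduced only for illustration.

```python
import json
from ast import literal_eval

import pandas as pd


def parse_json_list(text):
    """Parse a JSON array string; return [] when the value is not valid JSON."""
    try:
        parsed = json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return []
    return parsed if isinstance(parsed, list) else []


def names_of(items):
    """Collect the "name" field from a list of dicts."""
    return [item["name"] for item in items
            if isinstance(item, dict) and "name" in item]


# Hypothetical sample rows modelled on the question's data
df = pd.DataFrame({
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
               '[{"id": 28, "name": "Action"}, {"id": 12, "n]'],
    "original_language": ["en", "en"],
})

# A bare word such as en is a name, not a literal, so literal_eval rejects it
try:
    literal_eval("en")
except ValueError as exc:
    print("literal_eval failed:", type(exc).__name__)

# Parse only the columns that actually hold JSON arrays
df["genres"] = df["genres"].apply(parse_json_list)
df["genre_names"] = df["genres"].apply(names_of)
print(df["genre_names"].tolist())
```

Under these assumptions, leaving `original_language` out of the `features` list and parsing only the list-valued columns avoids the error without hiding genuinely malformed rows, which fall back to an empty list.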
