How do I access values in one column based on another column?

Posted 2024-06-16 09:19:31


I have a DataFrame with a column named url:

http://example.com/images/41456gn7L.jpg
http://example.com/images/31mndfg.jpg
http://example.com/images/dsfsdf8587eh.jpg

I also have a meta_data column, where each value is a list of two dictionaries:

[{'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}, {'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}]
[{'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}, {'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}]
[{'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}, {'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}]

I have a list of file names stored in a variable:

file_names = ["41456gn7L.jpg","31mndfg.jpg","dsfsdf8587eh.jpg"] 

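If the file name in the url is in the file_names list, I need to get the score value from the meta_data column (from the first dict in the list).

How can I do that? Here is the sample data: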

import pandas as pd

df = pd.DataFrame({'num': {0: 3234, 1: 3433, 2: 4443},
 'URL': {0: 'http://example.com/images/41456gn7L.jpg',
  1: 'http://example.com/images/31mndfg.jpg',
  2: 'http://example.com/images/dsfsdf8587eh.jpg'},
 'meta_data': {0: "[{'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}, {'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}]",
  1: "[{'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}, {'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}]",
  2: "[{'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}, {'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}]"}})

3 Answers

In [2]: df = pd.DataFrame({'num': {0: 3234, 1: 3433, 2: 4443},
   ...:  'URL': {0: 'http://example.com/images/41456gn7L.jpg',
   ...:   1: 'http://example.com/images/31mndfg.jpg',
   ...:   2: 'http://example.com/images/dsfsdf8587eh.jpg'},
   ...:  'meta_data': {0: "[{'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}, {'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}]",
   ...:   1: "[{'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}, {'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}]",
   ...:   2: "[{'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}, {'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}]"}})
   ...: file_names = ["41456gn7L.jpg","31mndfg.jpg","dsfsdf8587eh.jpg"]
   ...: df
Out[2]: 
    num                                         URL                                          meta_data
0  3234     http://example.com/images/41456gn7L.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...
1  3433       http://example.com/images/31mndfg.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...
2  4443  http://example.com/images/dsfsdf8587eh.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...

In [3]: df['Score'] = df.loc[df.URL.apply(lambda x:x.split("/")[-1]).isin(file_names), :].meta_data.apply(lambda x:x.split(",")[-1]).str.extract(r"([\d]*[.][\d]+)")

In [4]: df
Out[4]: 
    num                                         URL                                          meta_data              Score
0  3234     http://example.com/images/41456gn7L.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...  54.09280014038086
1  3433       http://example.com/images/31mndfg.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...    99.902099609375
2  4443  http://example.com/images/dsfsdf8587eh.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...  97.33160400390625
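
Note that the one-liner above takes the last comma-separated piece of each string, so it reads the score of the last dict, and it stores the result as text. In the sample data both dicts carry the same score, so the result is the same. If the first dict is needed specifically, one possible sketch is to parse each string with ast.literal_eval. This assumes every meta_data string is a valid Python literal; the shortened dicts, the differing second scores, and the non-matching third URL below are made up for illustration.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "URL": [
        "http://example.com/images/41456gn7L.jpg",
        "http://example.com/images/31mndfg.jpg",
        "http://example.com/images/other.jpg",  # hypothetical row, not in file_names
    ],
    # Stringified lists of dicts; assumed to be valid Python literals.
    "meta_data": [
        "[{'id': 0, 'score': 54.09280014038086}, {'id': 1, 'score': 10.0}]",
        "[{'id': 0, 'score': 99.902099609375}, {'id': 1, 'score': 20.0}]",
        "[{'id': 0, 'score': 97.33160400390625}, {'id': 1, 'score': 30.0}]",
    ],
})
file_names = ["41456gn7L.jpg", "31mndfg.jpg", "dsfsdf8587eh.jpg"]

# Rows whose URL ends in one of the wanted file names.
mask = df["URL"].str.rsplit("/", n=1).str[-1].isin(file_names)

# Parse each string into a real list and take the score of the first dict.
first_scores = df["meta_data"].apply(lambda s: ast.literal_eval(s)[0]["score"])

# Keep the score only for matching rows; the others become NaN.
df["Score"] = first_scores.where(mask)
print(df[["URL", "Score"]])
```

Here the scores stay numeric (float) rather than strings, and rows whose file name is not listed end up as NaN.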

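From the sample df you provided, the values in meta_data look like strings. Assuming they are actually lists of dictionaries, as you describe in the question: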

import pandas as pd

file_names = ["41456gn7L.jpg", "31mndfg.jpg", "dsfsdf8587eh.jpg"]
df = pd.DataFrame({
    'url': ['http://example.com/images/41456gn7L.jpg',
            'http://example.com/images/31mndfg.jpg',
            'http://example.com/images/dsfsdf8587eh.jpg'],
    'meta_data': [
        [{'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086},
         {'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}],
        [{'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375},
         {'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}],
        [{'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625},
         {'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}],
    ],
})

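You can select the rows whose file name (the last part of the url) appears in file_names, then take the first element of each list and read the value stored under the 'score' key: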

df['score'] = df.loc[df['url'].str.rsplit('/').str[-1].isin(file_names), 'meta_data'].apply(lambda x: x[0]['score'])

                                          url                                          meta_data      score
0     http://example.com/images/41456gn7L.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...  54.092800
1       http://example.com/images/31mndfg.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...  99.902100
2  http://example.com/images/dsfsdf8587eh.jpg  [{'id': 0, 'imageUrl': 'http://example.com/ima...  97.331604
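
Splitting on '/' works for the URLs shown, but it would keep any query string as part of the file name. As a rough sketch of a more defensive variant, the helper below (file_name is a hypothetical name, as are the query string and the unlisted URL in the data) uses the standard library to take only the path part of each URL. It also shows that rows whose file name is not in the list simply get NaN.

```python
import posixpath
from urllib.parse import urlsplit

import pandas as pd

file_names = ["41456gn7L.jpg", "31mndfg.jpg"]
df = pd.DataFrame({
    "url": [
        "http://example.com/images/41456gn7L.jpg",
        "http://example.com/images/31mndfg.jpg?size=large",  # hypothetical query string
        "http://example.com/images/unlisted.jpg",            # hypothetical non-match
    ],
    # Real lists of dicts here, shortened for illustration.
    "meta_data": [
        [{"id": 0, "score": 54.09280014038086}],
        [{"id": 0, "score": 99.902099609375}],
        [{"id": 0, "score": 97.33160400390625}],
    ],
})


def file_name(url: str) -> str:
    """Return the last path component of a URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(url).path)


mask = df["url"].map(file_name).isin(file_names)

# Assigning a Series that covers only the matching rows aligns on the index,
# so the non-matching row is filled with NaN.
df["score"] = df.loc[mask, "meta_data"].apply(lambda dicts: dicts[0]["score"])
print(df[["url", "score"]])
```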

Hope this helps:

import pandas as pd

metadata = [
[{'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}, 
 {'id': 0, 'imageUrl': 'http://example.com/images/41dY3ASVn7L.jpg', 'score': 54.09280014038086}],
[{'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}, 
 {'id': 0, 'imageUrl': 'http://example.com/images/31mnLrB5IHL.jpg', 'score': 99.902099609375}],
[{'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}, 
 {'id': 0, 'imageUrl': 'http://example.com/images/4189TDx0e0L.jpg', 'score': 97.33160400390625}]
]
# Extract metadata into dataframe
df = pd.DataFrame([a[0] for a in metadata])

# List of filenames NOTE: added last file so match is found
fnList = ["41456gn7L.jpg","31mndfg.jpg","dsfsdf8587eh.jpg","31mnLrB5IHL.jpg"]

# show DF
print(df)
print("\n  \n Matching Filename\n")

# Generate list of matching scores
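for f in fnList:
    # regex=False matches the file name literally ('.' is not a wildcard)
    matches = df.loc[df.imageUrl.str.contains(f, regex=False), 'score']
    # None when no imageUrl contains this file name
    v = matches.iloc[0] if not matches.empty else None
    print(f, v)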
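
The loop above scans every imageUrl once per file name. If an exact match on the file name is enough, a possible alternative is to index the scores by file name once and look them all up with reindex. This is only a sketch: it assumes the file names in imageUrl are unique, and the shortened metadata and the names fn_list, first, by_name are illustrative.

```python
import pandas as pd

metadata = [
    [{"id": 0, "imageUrl": "http://example.com/images/41dY3ASVn7L.jpg", "score": 54.09280014038086}],
    [{"id": 0, "imageUrl": "http://example.com/images/31mnLrB5IHL.jpg", "score": 99.902099609375}],
]
fn_list = ["41456gn7L.jpg", "31mnLrB5IHL.jpg"]

# One row per first dict, as in the answer above.
first = pd.DataFrame([entry[0] for entry in metadata])

# Index the scores by the file name at the end of each imageUrl.
names = first["imageUrl"].str.rsplit("/", n=1).str[-1]
by_name = first.set_index(names)["score"]

# reindex looks up every requested name; names without a match get NaN.
scores = by_name.reindex(fn_list)
print(scores)
```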
