Create a new column (DataFrame) based on the suffix of its items

                           URLS                           Paths
0
1                   www.gene.eu          /news-room/q-a-detail/
2  www.cittametropolitana.me.it            /emergenza-sanitari/
3     www.regione.basilicata.it  /giunta/site/giunta/detail.jsp
4                 www.bbc.co.uk                         /focus/

suffix=['.it','.uk','.eu'] # this should be used as set which includes all the suffix that I want to check
country=['Italy','United Kingdom','Europe'] # values to assign based on the suffix
zipped = list(zip(suffix, country)) # create a connection between suffix and country

country = {k.lower() : v for (k,v) in zipped}
og = {k : v for (k,v) in suffix}
country.update(og)

# (1)
df['value'] = df['URLS'].str.split(".", expand=True).stack().reset_index(1).query( "level_1 == level_1.max()" )[0].map(country)

# (2)
original_domain = {x: y for x, y in zipped}
df['value'] = df['URLS'].apply(lambda sen : original_domain.get( sen[-1], 'Unknown') ) )

# (3)
df['value']=df['URLS'].map(lambda x: x[-3:] in zipped)

# (4)
df['value'] = np.where(df['URLS'].str.endswith(suffix), pd.to_datetime(df['value'])) # it returns me errors and it needs another step to assign country

1 answer

User

Answer #1 · Posted on 2024-04-18 11:23:20

If I understand correctly, this should do it:

import pandas as pd


# Suffixes are stored without the leading dot, matching what split('.') returns.
suffix = ['it', 'uk', 'eu']
country = ['Italy', 'United Kingdom', 'Europe']
mapping = dict(zip(suffix, country))
urls = ['www.gene.eu', 'www.cittametropolitana.me.it', 'www.regione.basilicata.it', 'www.bbc.co.uk']
paths = ['/news-room/q-a-detail/', '/emergenza-sanitari/', '/giunta/site/giunta/detail.jsp', '/focus/']
frame = pd.DataFrame(list(zip(urls, paths)), columns=['urls', 'paths'])
# For each known suffix, fill 'Country' on the rows whose URL ends with it.
for ext in mapping:
    frame.loc[frame['urls'].apply(lambda x: x.split('.')[-1]) == ext, 'Country'] = mapping[ext]
print(frame)

Output:

                           urls                           paths         Country
0                   www.gene.eu          /news-room/q-a-detail/          Europe
1  www.cittametropolitana.me.it            /emergenza-sanitari/           Italy
2     www.regione.basilicata.it  /giunta/site/giunta/detail.jsp           Italy
3                 www.bbc.co.uk                         /focus/  United Kingdom

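Note that for this to work, you need to add every extension you want to cover to the mapping beforehand, and the data must be uniform: every url has to contain a . and end with an extension that is in the mapping, otherwise you will get unwanted NaN values.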
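If the NaN values mentioned in the note are a concern, one possible variation is to extract the suffix with the vectorized string methods and fill unmatched rows with a default. This is only a sketch under assumptions, not the answerer's code: the rows 'www.example.com' and 'localhost' and the 'Unknown' label are made up here to show the fallback.

```python
import pandas as pd

# Same suffix-to-country mapping as in the answer above.
mapping = {'it': 'Italy', 'uk': 'United Kingdom', 'eu': 'Europe'}

# 'www.example.com' and 'localhost' are hypothetical rows added for illustration.
frame = pd.DataFrame({'urls': ['www.gene.eu', 'www.bbc.co.uk', 'www.example.com', 'localhost']})

# Take the text after the last dot (the whole string if there is no dot) and lower-case it.
suffixes = frame['urls'].str.rsplit('.', n=1).str[-1].str.lower()

# Look each suffix up in the mapping; anything not found becomes 'Unknown' instead of NaN.
frame['Country'] = suffixes.map(mapping).fillna('Unknown')
print(frame)
```

This avoids the explicit loop over the mapping, and a url with an unlisted suffix or with no dot at all gets the placeholder label rather than NaN.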
