I have a CSV and want to update it with values from another CSV. What is the most efficient way to do this?

The first CSV:

       Name Species    Country
0    Hobbes   Tiger       U.S.
1  SherKhan   Tiger      India
2   Rescuer   Mouse  Australia
3    Mickey   Mouse       U.S.

And I have a second CSV:

       Continent     Countries Unnamed: 2 Unnamed: 3 Unnamed: 4
0  North America          U.S.     Mexico  Guatemala   Honduras
1           Asia         India      China      Nepal        NaN
2      Australia     Australia        NaN        NaN        NaN
3         Africa  South Africa   Botswana   Zimbabwe        NaN

I want to use the second CSV to update the first file so that the output is:

       Name Species        Country
0    Hobbes   Tiger  North America
1  SherKhan   Tiger           Asia
2   Rescuer   Mouse      Australia
3    Mickey   Mouse  North America

import pandas as pd

# Import my data.
data = pd.read_csv('Continents.csv')
Animals = pd.read_csv('Animals.csv')
Animalsdf = pd.DataFrame(Animals)

# Transpose my data from horizontal to vertical.
data1 = data.T

# Clean my data and update my header with the first column.
data1.columns = data1.iloc[0]

# Drop now duplicated data.
data1.drop(data1.index[[0]], inplace=True)

# Build the dictionary.
data_dict = {col: list(data1[col]) for col in data1.columns}

# Update my csv.
Animals['Country'] = Animals['Country'].map(data_dict)

print(Animals)

       Name Species                     Country
0    Hobbes   Tiger                         NaN
1  SherKhan   Tiger                         NaN
2   Rescuer   Mouse  [Australia, nan, nan, nan]
3    Mickey   Mouse                         NaN

3 Answers

User

Answer 1 · Edited 2024-04-19 02:50:31

You should use DictReader and DictWriter. You can learn how to use them at the link below.

https://docs.python.org/2/library/csv.html
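
As a rough illustration of this suggestion, here is a minimal sketch using only the standard csv module. The in-memory strings stand in for Continents.csv and Animals.csv, and the helper names build_country_to_continent and update_animals are made up for this example. It assumes the continent is always the first cell of each row. The lookup file is read with plain csv.reader rather than DictReader because its header has several empty column names, which would collide as dictionary keys.

```python
import csv
import io

# Hypothetical in-memory stand-ins for Continents.csv and Animals.csv.
CONTINENTS_CSV = """Continent,Countries,,,
North America,U.S.,Mexico,Guatemala,Honduras
Asia,India,China,Nepal,
Australia,Australia,,,
Africa,South Africa,Botswana,Zimbabwe,
"""
ANIMALS_CSV = """Name,Species,Country
Hobbes,Tiger,U.S.
SherKhan,Tiger,India
Rescuer,Mouse,Australia
Mickey,Mouse,U.S.
"""


def build_country_to_continent(continents_file):
    """Map every country in a row to that row's continent (first cell)."""
    reader = csv.reader(continents_file)
    next(reader)  # skip the header row
    mapping = {}
    for continent, *countries in reader:
        for country in countries:
            if country:  # ignore empty trailing cells
                mapping[country] = continent
    return mapping


def update_animals(animals_file, out_file, mapping):
    """Rewrite the Country column; unknown countries are left unchanged."""
    reader = csv.DictReader(animals_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["Country"] = mapping.get(row["Country"], row["Country"])
        writer.writerow(row)


mapping = build_country_to_continent(io.StringIO(CONTINENTS_CSV))
out = io.StringIO()
update_animals(io.StringIO(ANIMALS_CSV), out, mapping)
print(out.getvalue())
```

With real files, the io.StringIO objects would be replaced by files opened with open(path, newline='').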

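User

Answer 2 · Edited 2024-04-19 02:50:31

An intuitive solution is to use a dictionary mapping. The data is taken from @WillMonge's answer.

pd.DataFrame.itertuples works by yielding namedtuples, but their fields can also be accessed by numeric index.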

# create mapping dictionary
d = {}
for row in df.itertuples():
    d.update(dict.fromkeys(filter(None, row[2:]), row[1]))

# apply mapping dictionary
data['Continent'] = data['Country'].map(d)

print(data)

  Country  name Continent
0   China     2      Asia
1   China     5      Asia
2  Canada     9   America
3   Egypt     0    Africa
4  Mexico     3   America
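
One caveat if the lookup table is loaded with pd.read_csv instead of being built by hand: empty cells then arrive as NaN, and filter(None, ...) does not remove NaN because NaN is truthy. A minimal sketch of the same loop using pd.notna, which skips both None and NaN, with the example frames repeated so that it runs on its own:

```python
import pandas as pd

# Example frames copied from @WillMonge's answer.
data = pd.DataFrame({'name': [2, 5, 9, 0, 3],
                     'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', None],
                   'Country3': ['Thailand', None, None]})

# Build {country: continent}; pd.notna skips both None and NaN cells.
d = {}
for row in df.itertuples(index=False):
    continent, *countries = row
    d.update({c: continent for c in countries if pd.notna(c)})

# Apply the mapping dictionary.
data['Continent'] = data['Country'].map(d)
print(data)
```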

User

Answer 3 · Edited 2024-04-19 02:50:31

Here is an updated version of your code; I have tried to add comments to explain it.

import pandas as pd

# Read data in (read_csv also returns a DataFrame directly)
data = pd.DataFrame({'name': [2, 5, 9, 0, 3], 'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', None],
                   'Country3': ['Thailand', None, None ]})

# Unstack to get a row for each country (remove the continent rows).
# df has no index level named 'Continent', so unstack() is called without a
# level name; newer pandas versions may raise a KeyError otherwise.
premap_df = pd.DataFrame(df.unstack().drop('Continent')).dropna().reset_index()
premap_df.columns = ['_', 'continent_key', 'Country']

# Merge the continent back based on the continent_key (old row number)
map_df = pd.merge(premap_df, df[['Continent']], left_on='continent_key', right_index=True)[['Continent', 'Country']]

# Merge with the data now and show the result
result = pd.merge(data, map_df, on='Country')
print(result)
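
For comparison, the same wide-to-long reshaping can also be done with DataFrame.melt. This is a different technique from the unstack-and-merge steps above, shown here as a sketch on the data from the question. The frames are built inline instead of being read from Animals.csv and Continents.csv, and the names lookup and country_to_continent are made up for this example. It assumes each country belongs to exactly one continent.

```python
import pandas as pd

# Inline stand-ins for Animals.csv and Continents.csv from the question.
animals = pd.DataFrame({
    'Name': ['Hobbes', 'SherKhan', 'Rescuer', 'Mickey'],
    'Species': ['Tiger', 'Tiger', 'Mouse', 'Mouse'],
    'Country': ['U.S.', 'India', 'Australia', 'U.S.'],
})
continents = pd.DataFrame({
    'Continent': ['North America', 'Asia', 'Australia', 'Africa'],
    'Countries': ['U.S.', 'India', 'Australia', 'South Africa'],
    'Unnamed: 2': ['Mexico', 'China', None, 'Botswana'],
    'Unnamed: 3': ['Guatemala', 'Nepal', None, 'Zimbabwe'],
    'Unnamed: 4': ['Honduras', None, None, None],
})

# Wide -> long: one (Continent, Country) row per non-empty cell.
lookup = (continents
          .melt(id_vars='Continent', value_name='Country')
          .dropna(subset=['Country'])
          .drop_duplicates(subset='Country'))

# Turn the long table into a Series indexed by country and map it on.
country_to_continent = lookup.set_index('Country')['Continent']
animals['Country'] = animals['Country'].map(country_to_continent)
print(animals)
```

Countries missing from the lookup table would come out as NaN here; chaining .fillna with the original column would keep them unchanged instead.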

For further reference, Wes McKinney's Python for Data Analysis is one of the best books for learning pandas.
