I have a CSV and want to update it with values from another CSV. What is the most efficient way?

Posted 2024-04-19 02:50:31

I have this CSV:

      Name  Species    Country
0    Hobbes   Tiger       U.S.
1  SherKhan   Tiger      India
2   Rescuer   Mouse  Australia
3    Mickey   Mouse       U.S.

And I have a second CSV: 
   Continent     Countries Unnamed: 2 Unnamed: 3 Unnamed: 4
0  North America          U.S.     Mexico  Guatemala   Honduras
1           Asia         India      China      Nepal        NaN
2      Australia     Australia        NaN        NaN        NaN
3         Africa  South Africa   Botswana   Zimbabwe        NaN

I want to use the second CSV to update the first file so that the output is:
      Name  Species        Country
0    Hobbes   Tiger  North America
1  SherKhan   Tiger           Asia 
2   Rescuer   Mouse      Australia
3    Mickey   Mouse  North America

This is the closest I have gotten so far:

import pandas as pd

# Import my data. 
data = pd.read_csv('Continents.csv')
Animals = pd.read_csv('Animals.csv')
Animalsdf = pd.DataFrame(Animals)

# Transpose my data from horizontal to vertical. 
data1 = data.T

# Clean my data and update my header with the first column. 
data1.columns = data1.iloc[0]

# Drop now duplicated data. 
data1.drop(data1.index[[0]], inplace = True)
# Build the dictionary. 
data_dict = {col: list(data1[col]) for col in data1.columns}

# Update my csv. 
Animals['Country'] = Animals['Country'].map(data_dict)

print(Animals)

This produces a dictionary whose values are lists, so what I export is mostly NaN:

      Name  Species                     Country
0    Hobbes   Tiger                         NaN
1  SherKhan   Tiger                         NaN
2   Rescuer   Mouse  [Australia, nan, nan, nan]
3    Mickey   Mouse                         NaN

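I tried switching from lists to tuples, but that did not work. I have tried several ways of looking things up in the dictionary, and so on. I am simply out of ideas.

Sorry if the code is super trashy. I am learning as I go. I figured a project is the best way to learn a new language. I did not expect it to be this hard.

Any suggestions would be much appreciated. I need to be able to do this in code so that, when I get more reference CSVs, I can update my data with the new keys. I hope this is clear.

Thanks in advance.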


3 answers

You should use DictReader and DictWriter. You can learn how to use them at the link below.

https://docs.python.org/2/library/csv.html
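
As a rough illustration of this suggestion, here is a minimal sketch using the standard-library csv module in Python 3 (the linked page documents the Python 2 version). The sample data is copied from the question and held in in-memory strings, which is an assumption made to keep the sketch self-contained; real code would open Continents.csv and Animals.csv instead. The names country_to_continent and updated_csv are made up for this example. A plain csv.reader is used for the reference file because its country columns have no usable header names.

```python
import csv
import io

# Sample data copied from the question; real code would open the CSV files.
continents_csv = """Continent,Countries,,,
North America,U.S.,Mexico,Guatemala,Honduras
Asia,India,China,Nepal,
Australia,Australia,,,
Africa,South Africa,Botswana,Zimbabwe,
"""
animals_csv = """Name,Species,Country
Hobbes,Tiger,U.S.
SherKhan,Tiger,India
Rescuer,Mouse,Australia
Mickey,Mouse,U.S.
"""

# Build a country -> continent lookup from the reference file.
country_to_continent = {}
reader = csv.reader(io.StringIO(continents_csv))
next(reader)  # skip the header row
for continent, *countries in reader:
    for country in countries:
        if country:  # ignore empty trailing cells
            country_to_continent[country] = continent

# Rewrite the animals data row by row with DictReader / DictWriter.
out = io.StringIO()
rows = csv.DictReader(io.StringIO(animals_csv))
writer = csv.DictWriter(out, fieldnames=rows.fieldnames, lineterminator="\n")
writer.writeheader()
for row in rows:
    # Keep the original value when a country is missing from the lookup.
    row["Country"] = country_to_continent.get(row["Country"], row["Country"])
    writer.writerow(row)

updated_csv = out.getvalue()
print(updated_csv)
```

When another reference CSV arrives, the same loop can be run over it to add more keys to country_to_continent before the animals data is rewritten.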

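An intuitive solution is to use a dictionary mapping. The data is taken from @WillMonge's answer.

pd.DataFrame.itertuples works by yielding namedtuples, but their fields can also be accessed with numeric indexers.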

# create mapping dictionary
d = {}
for row in df.itertuples():
    d.update(dict.fromkeys(filter(None, row[2:]), row[1]))

# apply mapping dictionary
data['Continent'] = data['Country'].map(d)

print(data)

  Country  name Continent
0   China     2      Asia
1   China     5      Asia
2  Canada     9   America
3   Egypt     0    Africa
4  Mexico     3   America
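
For the data in the question, the same idea might look like the sketch below. One difference worth flagging: when the reference file is read with pd.read_csv, empty cells become NaN rather than None, and NaN is truthy, so filter(None, ...) would not drop them. The sketch therefore checks pd.notna instead. The in-memory CSV strings and the name country_to_continent are assumptions for illustration only.

```python
import io
import pandas as pd

# Sample data copied from the question; real code would pass file paths.
continents = pd.read_csv(io.StringIO(
    "Continent,Countries,,,\n"
    "North America,U.S.,Mexico,Guatemala,Honduras\n"
    "Asia,India,China,Nepal,\n"
    "Australia,Australia,,,\n"
    "Africa,South Africa,Botswana,Zimbabwe,\n"
))
animals = pd.read_csv(io.StringIO(
    "Name,Species,Country\n"
    "Hobbes,Tiger,U.S.\n"
    "SherKhan,Tiger,India\n"
    "Rescuer,Mouse,Australia\n"
    "Mickey,Mouse,U.S.\n"
))

# Build a flat country -> continent dictionary (one key per country).
country_to_continent = {}
for row in continents.itertuples(index=False):
    continent, *countries = row
    for country in countries:
        if pd.notna(country):  # skip the NaN padding cells
            country_to_continent[country] = continent

# Map each country to its continent; keep the original value if unmapped.
animals["Country"] = (
    animals["Country"].map(country_to_continent).fillna(animals["Country"])
)
print(animals)
```

The key point is that Series.map needs a dictionary keyed by the values being looked up (countries), not one keyed by continent with lists of countries as values.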

Here is an updated version of your code; I have tried to add comments to explain it.

import pandas as pd

# Read data in (read_csv also returns a DataFrame directly)
data = pd.DataFrame({'name': [2, 5, 9, 0, 3], 'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', None],
                   'Country3': ['Thailand', None, None ]})

# Unstack to get a row for each country (remove the continent rows)
premap_df = pd.DataFrame(df.unstack('Continent').drop('Continent')).dropna().reset_index()
premap_df.columns = ['_', 'continent_key', 'Country']

# Merge the continent back based on the continent_key (old row number)
map_df = pd.merge(premap_df, df[['Continent']], left_on='continent_key', right_index=True)[['Continent', 'Country']]

# Merge with the data now 
pd.merge(data, map_df, on='Country')
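
A possibly shorter way to build the same lookup table is DataFrame.melt, which reshapes the wide continent table into one row per country in a single step. This is a sketch of an alternative, not the code from the answer above; it reuses the same sample frames, and it uses how='left' (an assumption on my part) so that rows with an unknown country are kept with NaN rather than dropped.

```python
import pandas as pd

data = pd.DataFrame({'name': [2, 5, 9, 0, 3],
                     'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', None],
                   'Country3': ['Thailand', None, None]})

# Wide -> long: one (Continent, Country) row per non-empty country cell.
map_df = (
    df.melt(id_vars='Continent', value_name='Country')
      .dropna(subset=['Country'])
      [['Continent', 'Country']]
)

# Left merge keeps every row of data, in its original order.
result = data.merge(map_df, on='Country', how='left')
print(result)
```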

For further reference, Wes McKinney's Python for Data Analysis (a PDF version can be found online) is one of the best books for learning pandas.
