
Published 2024-04-19 02:50:31



      Name  Species    Country
0    Hobbes   Tiger       U.S.
1  SherKhan   Tiger      India
2   Rescuer   Mouse  Australia
3    Mickey   Mouse       U.S.

And I have a second CSV: 
   Continent     Countries Unnamed: 2 Unnamed: 3 Unnamed: 4
0  North America          U.S.     Mexico  Guatemala   Honduras
1           Asia         India      China      Nepal        NaN
2      Australia     Australia        NaN        NaN        NaN
3         Africa  South Africa   Botswana   Zimbabwe        NaN

I want to use the second CSV to update the first file so that the output is:
      Name  Species        Country
0    Hobbes   Tiger  North America
1  SherKhan   Tiger           Asia 
2   Rescuer   Mouse      Australia
3    Mickey   Mouse  North America


import pandas as pd

# Import my data. 
data = pd.read_csv('Continents.csv')
Animals = pd.read_csv('Animals.csv')  # read_csv already returns a DataFrame

# Transpose my data from horizontal to vertical. 
data1 = data.T

# Clean my data and update my header with the first column. 
data1.columns = data1.iloc[0]

# Drop now duplicated data. 
data1.drop(data1.index[[0]], inplace = True)
# Build the dictionary. 
data_dict = {col: list(data1[col]) for col in data1.columns}

# Update my csv. 
Animals['Country'] = Animals['Country'].map(data_dict)

print(Animals)


      Name  Species                     Country
0    Hobbes   Tiger                         NaN
1  SherKhan   Tiger                         NaN
2   Rescuer   Mouse  [Australia, nan, nan, nan]
3    Mickey   Mouse                         NaN
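
One way to read the output above: `data_dict` is keyed by continent, while `Series.map` looks up each country as a key. Only 'Australia' happens to be both a continent and a country, so it is the only hit, and it returns the whole list. The sketch below rebuilds the two tables inline (an assumption standing in for the CSV files, with contents copied from the tables in the question) and inverts the dictionary so that countries become the keys. The names `animals`, `continents`, and `country_to_continent` are illustrative only.

```python
import pandas as pd

# Inline stand-ins for Animals.csv and Continents.csv (assumed contents,
# copied from the tables shown in the question).
animals = pd.DataFrame({
    'Name': ['Hobbes', 'SherKhan', 'Rescuer', 'Mickey'],
    'Species': ['Tiger', 'Tiger', 'Mouse', 'Mouse'],
    'Country': ['U.S.', 'India', 'Australia', 'U.S.'],
})
continents = pd.DataFrame({
    'Continent': ['North America', 'Asia', 'Australia', 'Africa'],
    'Countries': ['U.S.', 'India', 'Australia', 'South Africa'],
    'Unnamed: 2': ['Mexico', 'China', None, 'Botswana'],
    'Unnamed: 3': ['Guatemala', 'Nepal', None, 'Zimbabwe'],
    'Unnamed: 4': ['Honduras', None, None, None],
})

# Same construction as in the question: continent -> list of countries.
transposed = continents.T
transposed.columns = transposed.iloc[0]
transposed = transposed.drop(transposed.index[0])
data_dict = {col: list(transposed[col]) for col in transposed.columns}

# The keys are continents, so a country lookup only succeeds for 'Australia'.
continent_keys = sorted(data_dict)

# Invert the mapping: country -> continent, skipping missing cells.
country_to_continent = {
    country: continent
    for continent, countries in data_dict.items()
    for country in countries
    if pd.notna(country)
}

animals['Country'] = animals['Country'].map(country_to_continent)
print(animals)
```

With the inverted dictionary, each country resolves to a single continent string instead of a list or NaN.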










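# Create mapping dictionary: country -> continent.
# In each tuple, row[0] is the index, row[1] the continent, row[2:] the countries.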
d = {}
for row in df.itertuples():
    d.update(dict.fromkeys(filter(None, row[2:]), row[1]))

# apply mapping dictionary
data['Continent'] = data['Country'].map(d)


  Country  name Continent
0   China     2      Asia
1   China     5      Asia
2  Canada     9   America
3   Egypt     0    Africa
4  Mexico     3   America
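
A caveat worth sketching: when the continent table comes from `pd.read_csv`, empty cells arrive as NaN rather than None, and `filter(None, ...)` keeps NaN because it is truthy. The variant below follows the same dictionary idea but skips missing cells with `pd.notna`. The inline frames are assumed sample data mirroring the example above, with `np.nan` in place of None.

```python
import numpy as np
import pandas as pd

# Assumed sample data; np.nan imitates the empty cells read_csv would produce.
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', np.nan],
                   'Country3': ['Thailand', np.nan, np.nan]})
data = pd.DataFrame({'name': [2, 5, 9, 0, 3],
                     'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})

# Build country -> continent, ignoring missing cells.
d = {}
for row in df.itertuples(index=False):
    continent, *countries = row
    for country in countries:
        if pd.notna(country):
            d[country] = continent

# Apply the mapping; countries absent from d would become NaN.
data['Continent'] = data['Country'].map(d)
print(data)
```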


import pandas as pd

# Read data in (read_csv also returns a DataFrame directly)
data = pd.DataFrame({'name': [2, 5, 9, 0, 3], 'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', None],
                   'Country3': ['Thailand', None, None ]})

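# Unstack to get a row for each country (remove the continent rows).
# The index is unnamed, so unstack() is called without a level argument;
# passing 'Continent' as a level can raise a KeyError in recent pandas versions.
premap_df = pd.DataFrame(df.unstack().drop('Continent')).dropna().reset_index()
premap_df.columns = ['_', 'continent_key', 'Country']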

# Merge the continent back based on the continent_key (old row number)
map_df = pd.merge(premap_df, df[['Continent']], left_on='continent_key', right_index=True)[['Continent', 'Country']]

# Merge with the data now 
pd.merge(data, map_df, on='Country')
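
As an alternative to `unstack` followed by a second merge, `DataFrame.melt` can produce the country-to-continent table in one step. This is a swapped-in technique rather than the approach used above, shown as a minimal sketch on the same sample frames; `result` is an illustrative name.

```python
import pandas as pd

data = pd.DataFrame({'name': [2, 5, 9, 0, 3],
                     'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
                   'Country1': ['China', 'Mexico', 'Egypt'],
                   'Country2': ['Japan', 'Canada', None],
                   'Country3': ['Thailand', None, None]})

# Wide -> long: one row per (Continent, Country) pair, dropping empty cells.
map_df = (df.melt(id_vars='Continent', value_name='Country')
            .dropna(subset=['Country'])[['Continent', 'Country']])

# A left merge keeps every row of `data`, in its original order,
# even if a country has no match in map_df.
result = data.merge(map_df, on='Country', how='left')
print(result)
```

The default inner merge used above would instead drop rows whose country is missing from the continent table, so the choice of `how` depends on whether unmatched rows should survive.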

For further reference, Wes McKinney's Python for Data Analysis is one of the best books for learning pandas (here is a PDF version I found online).
