Python: combining multiple CSVs by unique column/row headers

Posted 2024-04-23 20:11:03


First of all, I'm new to Python. I'm trying to combine the data from multiple CSV files into a single CSV. Here is the CSV format:

file1.csv

Country of Residence,2014-04,2015-04
 NORTH AMERICA ,"5,514","6,160"
  Canada ,"2,417","2,864"
  U.S.A. ,"3,097","3,296"
 LATIN AMERICA & THE CARIBBEAN ,281,293
 WESTERN EUROPE ,"37,369","34,964"
  Austria ,893,666
  Belgium ,867,995

file2.csv

^{pr2}$

In the final CSV, I want a unique list of headers, like this:

Country of Residence,2014-04,2015-04,2014-05,2015-05,..2014-11,2015-11
NORTH AMERICA ,"5,514","6,160",NaN,Nan,...
Portugal, Nan,Nan,Nan,Nan,.....,211,261

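I also want the list of countries to be unique, so that I can fill in the numbers as I read through the list of CSVs.

With the code below I get the unique column headers, but I don't know how to make the Country column unique and fill in each number according to its country and its month of the year.

Any help is much appreciated.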

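import csv
import glob
import os

hdrList = []
for filename in glob.iglob(os.path.join('/Documents/stats/csv', '*.csv')):
    # utf-8-sig strips a leading byte-order mark, if present
    with open(filename, newline='', encoding='utf-8-sig') as f:
        csvIn = csv.reader(f)
        hdr = next(csvIn)  # first row holds the column headers
        hdrList.append((len(hdr), hdr))
hdrList.sort()

hdrs = []
template = []

# collect each header once, in the order it is first seen
for t in hdrList:
    for f in t[1]:
        print(f)
        if f not in hdrs:
            hdrs.append(f)
            template.append('')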

2 answers

If you don't care about the logic behind it, you can do it with pandas:

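import pandas as pd

file_list = ['file1.csv', 'file2.csv']  # paths to the input CSVs
dfs = []
for file in file_list:
    # index_col=0 makes the country column the index, so concat aligns rows on it
    df = pd.read_csv(filepath_or_buffer=file, sep=',', index_col=0)
    df.index = df.index.str.strip()  # drop the padding around country names
    dfs.append(df)
result_df = pd.concat(dfs, axis=1)  # countries missing from a file become NaN
result_df.index.name = 'Country of Residence'
result_df.to_csv('result.csv')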

This code should put you on the right track. Note: it is written for Python 3.

import glob
import os
import csv

class CountryData:
    """Data for one country for one period of residence."""
    def __init__(self, val1, val2):
        # XXX: What do these values represent?
        self.val1 = val1
        self.val2 = val2

class ResidenceData:
    """Data for one period of residence."""
    def __init__(self):
        self.start_date = ""
        self.end_date = ""
        self.countries = {}

residence_data_list = []
countries = set()
for filename in glob.iglob(os.path.join('/Documents/stats/csv','*.csv')):
    residence_data = ResidenceData()
    residence_data_list.append(residence_data)
    # utf-8-sig strips a leading byte-order mark, if present
    with open(filename, 'r', newline='', encoding='utf-8-sig') as f:
        csvIn = csv.reader(f)
        for hdr in csvIn:
            if hdr[0] == 'Country of Residence':
                residence_data.start_date = hdr[1]
                residence_data.end_date = hdr[2]
            else:
                country, val1, val2 = hdr
                country = country.strip()
                country_data = CountryData(val1, val2)
                residence_data.countries[country] = country_data
                countries.add(country)

import sys

# csv.writer quotes values such as "5,514" that contain commas
writer = csv.writer(sys.stdout, lineterminator="\n")
header = ["Country of Residence"]
for data in residence_data_list:
    header += [data.start_date, data.end_date]
writer.writerow(header)
for country in sorted(countries):
    row = [country]
    for data in residence_data_list:
        if country in data.countries:
            country_data = data.countries[country]
            row += [country_data.val1, country_data.val2]
        else:
            # this file has no figures for the country
            row += ["NaN", "NaN"]
    writer.writerow(row)

Result:

^{pr2}$
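
The same idea can also be written more compactly with only the standard library. What follows is a minimal sketch, not a definitive implementation: the function name merge_csv_texts is made up for illustration, the inputs are in-memory strings rather than files, file1 is a subset of the question's data, and file2 is invented sample data because the real second file is not shown here. It keys every row by the stripped country name, collects the headers in first-seen order, and lets csv.DictWriter fill the gaps with NaN.

```python
import csv
import io


def merge_csv_texts(texts, key="Country of Residence", missing="NaN"):
    """Merge CSV texts on the key column, keeping the union of all headers.

    Hypothetical helper for illustration only.
    """
    columns = [key]  # unique headers, in first-seen order
    merged = {}      # country -> {header: value}
    for text in texts:
        for row in csv.DictReader(io.StringIO(text)):
            country = row[key].strip()
            record = merged.setdefault(country, {key: country})
            for name, value in row.items():
                # skip the key column and any surplus unnamed fields
                if name is None or name == key:
                    continue
                if name not in columns:
                    columns.append(name)
                record[name] = value
    out = io.StringIO()
    # restval supplies the value for columns a country has no data for
    writer = csv.DictWriter(out, fieldnames=columns, restval=missing,
                            lineterminator="\n")
    writer.writeheader()
    for country in sorted(merged):
        writer.writerow(merged[country])
    return out.getvalue()


# file1 is a subset of the question's data; file2 is made-up sample data
file1 = ('Country of Residence,2014-04,2015-04\n'
         ' NORTH AMERICA ,"5,514","6,160"\n'
         '  Canada ,"2,417","2,864"\n')
file2 = ('Country of Residence,2014-11,2015-11\n'
         '  Canada ,"2,500","2,900"\n'
         '  Portugal ,211,261\n')

print(merge_csv_texts([file1, file2]))
```

With real files, you would read each one with open(filename, newline='', encoding='utf-8-sig') and pass its contents in; that encoding is an assumption based on the byte-order mark the original code strips. Note that the countries come out sorted alphabetically, so the region/country grouping of the input files is not preserved.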
