Deleting rows and columns with the Python csv module

Posted 2024-04-19 14:30:09


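I promise I searched and read several pages of Google results before writing this. I swear I've done my due diligence.

I'm trying to use Python to open a CSV file, read it, make changes to it, and then write out a new file.

I've gotten this far: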

import csv
def water_data ():
    with open('aquastat.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        final_file_name = "final_water.data.csv"
        final_file = open(final_file_name,'w')
        csv_writer = csv.writer(final_file,delimiter="\t")
        for row in csv_reader:
            csv_writer.writerow(row)

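But I'm struggling to get any further. I want to remove certain columns, but I can't figure out how Python knows the difference between rows and columns. For example, the columns are Area, Area ID, Year, Value, and so on. I only want Area, Year, Value. I tried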

^{pr2}$

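But I keep getting the following error: IndexError: list index out of range

[I also want to replace blank cells with *, but the column issue takes priority]

Note: I can't use pandas.

If possible, I'd appreciate it if someone would explain this to me rather than just give me the code, so that I can work out the rest on my own.

TLDR: How do I remove blank rows from a CSV file and write only certain columns to a new file?

Input: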

"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md" 
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","","" 
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","","" 
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""

3 answers

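This line won't raise an IndexError; it writes the row and silently skips any index that doesn't exist in it:

csv_writer.writerow((row[i] for i in (0,4,5) if i<len(row)))

This line won't raise an IndexError either; it writes the row with an asterisk in place of each missing value:

csv_writer.writerow((row[i] if i<len(row) else "*" for i in (0,4,5)))

This line also won't raise an IndexError, but it skips any row that is too short:

if len(row)>5: csv_writer.writerow((row[i] for i in (0,4,5)))

This line won't raise an IndexError either, but it doesn't write any rows at all:

pass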
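The one-liners above can be combined into a small runnable sketch that keeps only the Area, Year and Value columns, skips blank lines, and replaces missing or empty cells with an asterisk. The helper name select_columns and the in-memory sample data are assumptions made for illustration; they are not part of the csv module or of the original file.

```python
import csv
import io

# Hypothetical helper for illustration; not part of the csv module.
def select_columns(rows, indices, filler="*"):
    """Yield only the cells at `indices`, skipping blank rows."""
    for row in rows:
        # csv.reader yields an empty list for a completely blank line
        if not row:
            continue
        # a cell that is missing or empty is replaced by the filler
        yield [row[i] if i < len(row) and row[i] != "" else filler
               for i in indices]

# Made-up sample: one blank line and one row with an empty Value cell
sample = io.StringIO(
    '"Area","Area Id","Variable Name","Variable Id","Year","Value"\n'
    '\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0\n'
    '"Afghanistan",2,"Total area of the country",4100,1982,\n'
)
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
# indices 0, 4 and 5 are the positions of Area, Year and Value
writer.writerows(select_columns(csv.reader(sample), (0, 4, 5)))
result = out.getvalue()
print(result)
```

Each row that csv.reader produces is a plain list of strings, so a "column" is simply the same position in every row. With real files, replace the two StringIO objects with files opened using open(..., newline='').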

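You can use csv.DictReader and csv.DictWriter to selectively modify and write specific columns by their header (column) names.

I'll use io.StringIO to simulate the files.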

import csv
import io

s = '''"Area","Area Id","Variable Name","Variable Id","Year","Value","Symbol","Md" 
"Afghanistan",2,"Total area of the country",4100,1977,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1982,65286.0,"E","","" 
"Afghanistan",2,"Total area of the country",4100,1987,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,1992,65286.0,"E","","" 
"Afghanistan",2,"Total area of the country",4100,1997,65286.0,"E","",""
"Afghanistan",2,"Total area of the country",4100,2002,65286.0,"E","",""'''

f = io.StringIO(s)
g = io.StringIO()

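reader = csv.DictReader(f)
# extrasaction='ignore' drops every key that is not listed in fieldnames
writer = csv.DictWriter(g, fieldnames=["Area","Variable Id","Value"], extrasaction='ignore')

for row in reader:
    # optionally process row values before writing, e.g. rescale Value
    row['Value'] = float(row['Value']) / 1000
    writer.writerow(row)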

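Note that DictWriter's extrasaction parameter must be set to 'ignore', because each row read from the original file contains extra keys/fields that are not listed in fieldnames; with the default value, 'raise', writerow raises a ValueError.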
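To make the effect of extrasaction concrete, here is a minimal sketch that writes the same dictionary with the default setting and with 'ignore'. The row contents are made up for illustration.

```python
import csv
import io

# A made-up row with one key ("Area Id") that is not in fieldnames
row = {"Area": "Afghanistan", "Area Id": "2", "Value": "65286.0"}

# Default extrasaction='raise': unknown keys cause a ValueError
strict = csv.DictWriter(io.StringIO(), fieldnames=["Area", "Value"])
try:
    strict.writerow(row)
    error = None
except ValueError as exc:
    error = exc

# extrasaction='ignore': unknown keys are silently dropped
out = io.StringIO()
lenient = csv.DictWriter(out, fieldnames=["Area", "Value"],
                         extrasaction="ignore", lineterminator="\n")
lenient.writerow(row)

print(type(error).__name__)
print(out.getvalue())
```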

If the CSV file has no header row, you must also pass the field names to DictReader yourself (DictWriter always requires them).
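For a file without a header row, a sketch along these lines should work. The column names and the single data row are assumptions chosen for illustration; the real file's layout may differ.

```python
import csv
import io

# Made-up headerless data: the first line is already a data row
headerless = io.StringIO('"Afghanistan",2,1977,65286.0\n')

# Assumed column names, supplied by hand because the file has no header
names = ["Area", "Area Id", "Year", "Value"]
reader = csv.DictReader(headerless, fieldnames=names)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Area", "Year", "Value"],
                        extrasaction="ignore", lineterminator="\n")
writer.writeheader()            # write the chosen column names first
for row in reader:
    writer.writerow(row)

print(out.getvalue())
```

When fieldnames is passed to DictReader, the first line of the file is treated as data rather than as a header.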


^{pr2}$

I'll try to give you an answer that stays as close as possible to what you have already written.

Prototype:

import csv

with open('aquastat.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    final_file_name = "final_water.data.csv"
    final_file = open(final_file_name,'w')
    csv_writer = csv.writer(final_file,delimiter="\t")
    for row in csv_reader:
        if len(row) >= 6:
            row = [row[0], row[4], row[5]]
            csv_writer.writerow(row)
    final_file.close()

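Explanation:

  • Before the csv_writer.writerow(row) call that writes the row to the output CSV file, I added the line row = [row[0], row[4], row[5]], which overwrites row with a list containing only three cells, taken from the Area, Year, and Value columns respectively.
  • On top of that, I added the condition if len(row) >= 6: to check that the row has at least enough elements to reach the Value column.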
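The positions 0, 4 and 5 used above do not have to be hard-coded. As a sketch, they can be looked up from the header row, which also shows how rows and columns relate: every row is a list, and a column is the same index in each of those lists. The sample data and the variable names here are illustrative only.

```python
import csv
import io

# Made-up sample with the same leading columns as the question's input
sample = io.StringIO(
    '"Area","Area Id","Variable Name","Variable Id","Year","Value"\n'
    '"Afghanistan",2,"Total area of the country",4100,1977,65286.0\n'
)
reader = csv.reader(sample)
header = next(reader)                 # the first row holds the column names
wanted = ["Area", "Year", "Value"]
# position of each wanted column within a row
indices = [header.index(name) for name in wanted]

# keep only rows long enough to contain every wanted column
selected = [[row[i] for i in indices]
            for row in reader if len(row) > max(indices)]
print(indices)
print(selected)
```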

Input:

^{pr2}$

Output:

Area    Year    Value
Afghanistan     1977    65286.0
Afghanistan     1982    65286.0
Afghanistan     1987    65286.0
Afghanistan     1992    65286.0
Afghanistan     1997    65286.0
Afghanistan     2002    65286.0
