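<p>When you are handed a bad dataset with known faults, a good solution is to clean the data and write the good data back to disk. The cleanup code only needs to run once, right after the download, and the rest of your code is not burdened with working around the errors. This is best done with the csv module, which lets us fix the file row by row.</p>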
<p>kaggle_campaign_data_fixer.py</p>
<pre><code>import sys
import csv
from pathlib import Path
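# Path of the downloaded CSV, passed as the first command-line argument
filename = Path(sys.argv[1])
# The repaired copy is written next to the original with a "-fixed" suffix
newname = filename.parent/f"{filename.stem}-fixed{filename.suffix}"
# Rows affected by the known fault end in two empty cells
BADCOLS = ['', '']
with open(filename, newline='') as infile, open(newname, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        if row[-2:] == BADCOLS:
            # Move the two empty cells from the end of the row to index 3
            row[3:3] = BADCOLS
            del row[-2:]
        writer.writerow(row)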
# test it
import pandas as pd
df = pd.read_csv(filename, header=None)
print(df)
print("""
============== FIXED ==================
""")
df = pd.read_csv(newname, header=None)
print(df)
</code></pre>
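<p>To check the row repair without touching any files, the same logic can be pulled into a small function and run on in-memory text. This is only a sketch: <code>fix_row</code>, <code>fix_csv_text</code> and the sample rows are hypothetical names and data made up for illustration, not part of the real dataset.</p>

```python
import csv
import io

# Same marker as in the script above: a broken row ends in two empty cells
BADCOLS = ['', '']


def fix_row(row):
    """Return a repaired copy of row (hypothetical helper for illustration)."""
    fixed = list(row)
    if fixed[-2:] == BADCOLS:
        # Mirror the script: insert two empty cells at index 3,
        # then drop the two trailing empty cells
        fixed[3:3] = BADCOLS
        del fixed[-2:]
    return fixed


def fix_csv_text(text):
    """Apply fix_row to every row of CSV text and return the repaired text."""
    infile = io.StringIO(text, newline='')
    outfile = io.StringIO(newline='')
    writer = csv.writer(outfile, lineterminator='\n')
    for row in csv.reader(infile):
        writer.writerow(fix_row(row))
    return outfile.getvalue()


# Made-up sample: the second row has its last two values shifted left
sample = "a,b,c,d,e,f,g\n1,2,3,6,7,,\n"
print(fix_csv_text(sample))
```

<p>Rows that do not end in two empty cells pass through unchanged, so running the fixer on an already clean file should be harmless, provided no valid row legitimately ends in two empty cells.</p>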