ValueError: invalid literal for int() with base 10: '3"\r'
My CSV file (test.csv) looks like the sample below. Note: my test.csv file is about 60 MB.
"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"
"2545610","21"
"2545611","18"
"2545612","19"
"2545613","21"
"2545614","21"
"2545615","21"
"2545616","21"
"2545617","22"
"2545618","25"
"2545619","25"
My Python code (test.py) is as follows:
#!/usr/bin/python
import sys
txt = open(sys.argv[1], 'r')
out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])
out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')
for line in txt:
    if 'Position' not in line:
        line = line.strip('",\n')
        line = line.split('","')
        line[1] = str(int(line[1])/mil)
        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')
txt.close()
out.close()
My command line:
python test.py test.csv test.igv 5
When I run this command, I get an error:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    line[1] = str(int(line[1])/mil)
ValueError: invalid literal for int() with base 10: '3"\r'
However, if I create a new, empty CSV file, say small.csv, and copy/paste a few lines from test.csv into it (like the sample above), the command runs successfully.
python test.py small.csv small.igv 5
Input small.csv:
"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"
Output small.igv:
chr start end feature small.igv
gi|255767013|ref|NC_000964.3| 2545600 2545600 3.8
gi|255767013|ref|NC_000964.3| 2545601 2545601 3.8
gi|255767013|ref|NC_000964.3| 2545602 2545602 3.8
gi|255767013|ref|NC_000964.3| 2545603 2545603 3.8
gi|255767013|ref|NC_000964.3| 2545604 2545604 4.0
gi|255767013|ref|NC_000964.3| 2545605 2545605 4.0
gi|255767013|ref|NC_000964.3| 2545606 2545606 4.2
gi|255767013|ref|NC_000964.3| 2545607 2545607 4.4
gi|255767013|ref|NC_000964.3| 2545608 2545608 4.2
gi|255767013|ref|NC_000964.3| 2545609 2545609 4.0
That is exactly what I want. So the question is: why can't I do the same with the larger CSV file?
3 Answers
0
As already suggested, the csv module would be more helpful here.
For example:
import csv
f = open("ex.csv")
for line in csv.reader(f):
    print(line)
With this data:
"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
the output is:
['Position', 'Value']
['2545600', '19']
['2545601', '19']
['2545602', '19']
['2545603', '19']
which looks much easier to work with.
The csv module can also be used to write CSV files.
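To illustrate the writing side, here is a minimal sketch assuming Python 3. The helper name rows_to_csv_text is made up for this example; csv.writer, csv.QUOTE_ALL and the lineterminator argument are part of the standard csv module. QUOTE_ALL is used so that every field is quoted like in the sample data.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize rows to CSV text, quoting every field (hypothetical helper)."""
    buf = io.StringIO()
    # QUOTE_ALL reproduces the "..." around every field seen in test.csv.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv_text([["Position", "Value"], ["2545600", "19"]]))
```

When writing to a real file instead of an in-memory buffer, the csv documentation recommends opening it with newline='' so that line endings are not translated a second time.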
4
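Try this:
for line in ..... :
    line = line.strip()
Called without arguments, strip() removes all leading and trailing whitespace, including the trailing '\r' that shows up in your error message.
A better option is to use Python's csv module, which takes care of these issues for you.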
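The failure can be reproduced on a single line. The sketch below assumes the large file uses Windows-style '\r\n' line endings, which is what the '\r' in the traceback suggests; the two function names are invented for the illustration.

```python
def parse_naive(raw):
    # Same calls as in the question: '\r' is not in the strip set,
    # so stripping from the right stops as soon as it reaches '\r'.
    return raw.strip('",\n').split('","')


def parse_stripped(raw):
    # Remove surrounding whitespace ('\r' and '\n') first, then the outer quotes.
    return raw.strip().strip('"').split('","')


crlf_line = '"2545603","19"\r\n'  # assumed CRLF-terminated line

print(repr(parse_naive(crlf_line)[1]))  # the quote and '\r' are still attached
print(parse_stripped(crlf_line))        # clean fields
```

Passing the second field from parse_naive to int() raises the same kind of ValueError as in the question, while the fields from parse_stripped convert cleanly. Copying lines into a fresh file most likely saved them with plain '\n' endings, which would explain why small.csv works.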
1
In this case it is better to use the csv module. Each row read from the CSV file is returned as a list of strings, so you don't have to worry about stripping whitespace, and you can also pass the delimiter as an argument to csv.reader (although that isn't needed here).
import csv
import sys
out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])
out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')
# 'rb': the Python 2 csv module expects the file to be opened in binary mode
with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)  # Consider headers separately
    for line in reader:
        line[1] = str(int(line[1])/mil)
        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')
out.close()
Running python test.py test.csv test.igv 5 && cat test.igv
should show the expected output.
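For readers on Python 3, the same conversion might look roughly like the sketch below. It is an adaptation under assumptions, not the code from this answer: the function name convert and the constant CHROM are introduced here for illustration, and the file is opened in text mode with newline='' as the Python 3 csv documentation recommends.

```python
import csv
import sys

CHROM = 'gi|255767013|ref|NC_000964.3|'  # sequence name taken from the question


def convert(csv_path, igv_path, divisor):
    """Write one IGV line per CSV row, dividing the value by divisor."""
    # newline='' lets the csv module handle '\r\n' and '\n' line endings itself.
    with open(csv_path, newline='') as src, open(igv_path, 'w') as out:
        reader = csv.reader(src)
        next(reader)  # skip the "Position","Value" header row
        out.write('chr\tstart\tend\tfeature\t' + igv_path + '\n')
        for row in reader:
            if not row:  # csv.reader yields [] for blank lines
                continue
            position, value = row
            scaled = str(int(value) / divisor)
            out.write(CHROM + '\t' + position + '\t' + position + '\t\t' + scaled + '\n')


# Only run as a command-line tool when all three arguments are given.
if __name__ == '__main__' and len(sys.argv) == 4:
    convert(sys.argv[1], sys.argv[2], float(sys.argv[3]))
```

It is invoked the same way as before: python test.py test.csv test.igv 5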