ValueError: invalid literal for int() with base 10: '3"\r'

Asked 2025-04-17 13:19

Here is a sample of my csv file (test.csv). Note: test.csv is about 60 MB.

"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"
"2545610","21"
"2545611","18"
"2545612","19"
"2545613","21"
"2545614","21"
"2545615","21"
"2545616","21"
"2545617","22"
"2545618","25"
"2545619","25"

My Python code (test.py):

#!/usr/bin/python
import sys

txt = open(sys.argv[1], 'r')
out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])

out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')

for line in txt:
    if 'Position' not in line:
        line = line.strip('",\n')
        line = line.split('","')

        line[1] = str(int(line[1])/mil)

        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')

txt.close()
out.close()

The command I run:

python test.py test.csv test.igv 5

When I run this command, I get an error:

Traceback (most recent call last):
  File "test.py", line 15, in <module>
    line[1] = str(int(line[1])/mil)
ValueError: invalid literal for int() with base 10: '3"\r'

But if I create a new, empty csv file, say small.csv, and copy/paste a few lines from test.csv into it (like the sample above), the command runs successfully:

python test.py small.csv small.igv 5

Input small.csv:

"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"

Output small.igv:

chr start   end feature small.igv
gi|255767013|ref|NC_000964.3|   2545600 2545600     3.8
gi|255767013|ref|NC_000964.3|   2545601 2545601     3.8
gi|255767013|ref|NC_000964.3|   2545602 2545602     3.8
gi|255767013|ref|NC_000964.3|   2545603 2545603     3.8
gi|255767013|ref|NC_000964.3|   2545604 2545604     4.0
gi|255767013|ref|NC_000964.3|   2545605 2545605     4.0
gi|255767013|ref|NC_000964.3|   2545606 2545606     4.2
gi|255767013|ref|NC_000964.3|   2545607 2545607     4.4
gi|255767013|ref|NC_000964.3|   2545608 2545608     4.2
gi|255767013|ref|NC_000964.3|   2545609 2545609     4.0

This is exactly what I want. So why doesn't it work on the larger csv file?

3 Answers

0

As others have suggested, the csv module is the better tool here.

For example:

import csv

# newline="" lets the csv module handle line endings (\n or \r\n) itself
with open("ex.csv", newline="") as f:
    for line in csv.reader(f):
        print(line)

With this data:

"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"

you get:

['Position', 'Value']
['2545600', '19']
['2545601', '19']
['2545602', '19']
['2545603', '19']

which is much easier to work with.
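The `\r` at the end of `'3"\r'` in the traceback suggests that test.csv has Windows-style `\r\n` line endings, which the question's `strip('",\n')` only partly removes. As a quick check (a sketch; the two input lines below are made up to mimic such a file), csv.reader accepts any iterable of strings and copes with those endings as well:

```python
import csv

# Hypothetical sample: two lines as they would come from a file saved
# with Windows-style (\r\n) line endings.
lines = ['"Position","Value"\r\n', '"2545600","19"\r\n']

# csv.reader treats \r and \n as end-of-line and removes the quotes.
rows = list(csv.reader(lines))
print(rows)  # [['Position', 'Value'], ['2545600', '19']]
```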

The csv module can also be used to write csv files.
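A minimal sketch of writing the tab-separated .igv rows with csv.writer. It writes to an in-memory io.StringIO only so the result is easy to inspect; in the real script you would pass the output file instead (opened with newline=''). csv.writer ends rows with `\r\n` by default, so `lineterminator='\n'` is set here to match the output shown in the question:

```python
import csv
import io

# In-memory stand-in for the output file, used only for illustration.
out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')

# Header row, then one data row; the empty string yields the empty "feature" column.
writer.writerow(['chr', 'start', 'end', 'feature', 'small.igv'])
writer.writerow(['gi|255767013|ref|NC_000964.3|', '2545600', '2545600', '', str(19 / 5.0)])

print(out.getvalue(), end='')
```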

4

Try this:

for line in ..... :
    line = line.strip()

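With no arguments, strip() removes all leading and trailing whitespace, including the trailing newline and any \r left over from Windows-style line endings.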
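To see why the original loop fails, compare the two kinds of stripping on a single line with a `\r\n` ending (the line below is a made-up example that reproduces the value from the traceback):

```python
# Hypothetical input line with a Windows-style (\r\n) ending.
raw = '"2545619","3"\r\n'

# Original approach: '\r' is not in the strip set, so stripping stops there
# and the closing quote survives too.
old = raw.strip('",\n').split('","')
print(old)  # ['2545619', '3"\r']

# strip() first removes all surrounding whitespace (\r and \n), then the quotes go.
new = raw.strip().strip('"').split('","')
print(new)  # ['2545619', '3']
```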

Better still, use Python's csv module, which handles these issues for you.

1

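In this case, the csv module is the better choice. Each row it reads from the csv file is returned as a list of strings, so you don't have to strip quotes, whitespace, or line endings yourself. You can also pass the delimiter to csv.reader as an argument (not needed here, since ',' is the default).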

import csv
import sys

mil = float(sys.argv[3])

# newline='' lets the csv module handle \r\n line endings itself
with open(sys.argv[1], newline='') as f, open(sys.argv[2], 'w') as out:
    out.write('chr\tstart\tend\tfeature\t' + sys.argv[2] + '\n')
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)    # read the header row separately
    for line in reader:
        line[1] = str(int(line[1]) / mil)
        out.write('gi|255767013|ref|NC_000964.3|\t' + line[0] + '\t' + line[0] + '\t\t' + line[1] + '\n')

Running python test.py test.csv test.igv 5 && cat test.igv should then show the expected output.
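To test the conversion without running it over the whole 60 MB file, the loop body can be pulled into a small generator and fed a few sample lines. This is a sketch assuming Python 3; `convert` and `PREFIX` are hypothetical names, and the sample lines are made up with `\r\n` endings:

```python
import csv

PREFIX = 'gi|255767013|ref|NC_000964.3|'


def convert(lines, mil):
    """Hypothetical helper: turn "Position","Value" csv lines into igv rows."""
    reader = csv.reader(lines)
    next(reader)  # skip the "Position","Value" header
    for position, value in reader:
        # chr, start, end, empty feature column, scaled value
        yield '\t'.join([PREFIX, position, position, '', str(int(value) / mil)]) + '\n'


sample = ['"Position","Value"\r\n', '"2545600","19"\r\n', '"2545604","20"\r\n']
for row in convert(sample, 5.0):
    print(row, end='')
```

In the script itself, this could be used as out.writelines(convert(f, mil)) inside the with block, after writing the header line.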
