<p>The problem is that you loop 672,343 × 795,516 = 534,859,613,988 times, which is far too many. You need a smarter solution.</p>
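<p>So the problem is that we look at too much data, and we need to change that. One way is to try to be clever: for example, build a dictionary whose keys are the <code>chr</code> values, so that each input row only has to be checked against the entries with the same <code>chr</code>. But that still doesn't deal with <code>start</code> and <code>end</code>. Maybe there is a clever way for those too.</p>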
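<p>A minimal sketch of the dictionary idea, assuming each comparison row is <code>(chr, type, name, start, end)</code> and each input row starts with <code>(chr, num, ...)</code>, i.e. the column layout used by the SQLite tables below. The function name <code>match_rows</code> and the sample rows are hypothetical. Grouping by <code>chr</code> means each input row is compared only against intervals with the same <code>chr</code>, not against every comparison row.</p>

```python
from collections import defaultdict


def match_rows(comparison_rows, input_rows):
    """Yield (input_row, comparison_row) pairs where the input's num lies in [start, end].

    Hypothetical helper: assumes comparison rows are (chr, type, name, start, end)
    and input rows are (chr, num, ...), with numbers possibly given as strings.
    """
    # Index the comparison intervals by chr, so each lookup only
    # touches the intervals that can possibly match.
    by_chr = defaultdict(list)
    for row in comparison_rows:
        chrom, _type, _name, start, end = row
        by_chr[chrom].append((int(start), int(end), row))

    for in_row in input_rows:
        chrom, num = in_row[0], int(in_row[1])
        # Only scan intervals that share this input row's chr.
        for start, end, comp_row in by_chr.get(chrom, ()):
            if start <= num <= end:
                yield in_row, comp_row


comparison = [
    ("chr1", "gene", "A", "100", "200"),
    ("chr1", "gene", "B", "150", "300"),
    ("chr2", "gene", "C", "100", "200"),
]
inputs = [("chr1", "160", "x", "y"), ("chr2", "50", "x", "y")]
for in_row, comp_row in match_rows(comparison, inputs):
    print(in_row[0], in_row[1], comp_row[2])
```

<p>This still scans every interval for a given <code>chr</code>, so it only helps as far as the rows are spread over many <code>chr</code> values; the <code>start</code>/<code>end</code> comparison is still linear.</p>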
<p>This looks a lot like a database. And if it is a database, maybe we should treat it like one. Python ships with <code>sqlite3</code>.</p>
<p>Here is one solution, but there are countless other possibilities.</p>
<pre><code>import csv
import sqlite3

# create an in-memory database
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# create the tables
c.execute("""CREATE TABLE t1 (
    chr TEXT,
    type TEXT,
    name TEXT,
    start INTEGER,
    end INTEGER
);""")
# if you only have a few columns, just name them all;
# if you have a lot, maybe just put everything in one
# column as a string
c.execute("""CREATE TABLE t2 (
    chr TEXT,
    num INTEGER,
    col3,
    col4
);""")

# create indices on the columns we use for selecting
c.execute("CREATE INDEX i1 ON t1 (chr, start, end);")
c.execute("CREATE INDEX i2 ON t2 (chr, num);")

# fill the tables (Python 3: open CSV files in text mode with newline="")
with open("comparison_file.csv", newline="") as f:
    reader = csv.reader(f)
    # the INTEGER column affinity makes sqlite convert the number-strings to numbers
    c.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", reader)
with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    c.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", reader)

# now let sqlite do its magic and select the matching rows
c.execute("""SELECT t2.*, t1.* FROM t1
JOIN t2 ON t1.chr == t2.chr
WHERE t2.num BETWEEN t1.start AND t1.end;""")

# write the result to disk
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(c)
</code></pre>
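<p>To see what the join produces without the two CSV files, here is a self-contained sketch of the same tables and query filled from Python lists. The sample rows and the function name <code>interval_join</code> are made up for illustration, and an <code>ORDER BY</code> is added only so the output order is deterministic.</p>

```python
import sqlite3


def interval_join(comparison_rows, input_rows):
    """Return each input row joined to every comparison interval that contains it.

    Hypothetical helper: same schema and query as the answer's code,
    but filled from in-memory lists instead of CSV files.
    """
    conn = sqlite3.connect(":memory:")
    c = conn.cursor()
    c.execute("CREATE TABLE t1 (chr TEXT, type TEXT, name TEXT, start INTEGER, end INTEGER);")
    c.execute("CREATE TABLE t2 (chr TEXT, num INTEGER, col3, col4);")
    c.execute("CREATE INDEX i1 ON t1 (chr, start, end);")
    c.execute("CREATE INDEX i2 ON t2 (chr, num);")
    # Strings are fine here: the INTEGER affinity of start/end/num converts them.
    c.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", comparison_rows)
    c.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", input_rows)
    c.execute("""SELECT t2.*, t1.* FROM t1
                 JOIN t2 ON t1.chr == t2.chr
                 WHERE t2.num BETWEEN t1.start AND t1.end
                 ORDER BY t2.num, t1.name;""")
    rows = c.fetchall()
    conn.close()
    return rows


comparison = [
    ("chr1", "gene", "A", "100", "200"),
    ("chr1", "gene", "B", "150", "300"),
    ("chr2", "gene", "C", "100", "200"),
]
inputs = [
    ("chr1", "160", "x", "y"),
    ("chr1", "250", "x", "y"),
    ("chr2", "50", "x", "y"),
]
for row in interval_join(comparison, inputs):
    print(row)
```

<p>Position 160 on <code>chr1</code> falls inside both A and B, 250 only inside B, and 50 on <code>chr2</code> matches nothing, so three joined rows come back. SQLite uses the indices to avoid comparing every pair of rows.</p>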
<h2>Python coding tips</h2>
<p>Here is how I would write your original code.</p>
<h3>Remark 1</h3>
<pre><code>line = line[0:len(line) - 1]
</code></pre>
<p>can be written as</p>
<pre><code>line = line[:-1]
</code></pre>
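<p>A quick check that <code>line[0:len(line) - 1]</code> and <code>line[:-1]</code> select the same slice (the sample strings are made up). Note that <code>line[:-1]</code> drops the last character unconditionally; if the last line of a file may lack a trailing newline, <code>line.rstrip("\n")</code> is a safer way to strip it.</p>

```python
line = "chr1,160,x,y\n"

# The explicit end index and the negative index select the same slice.
assert line[0:len(line) - 1] == line[:-1] == "chr1,160,x,y"

# Slicing removes the last character whether or not it is a newline...
last_line = "chr2,50,x,y"  # e.g. a final line without "\n"
print(last_line[:-1])          # chr2,50,x,
# ...whereas rstrip("\n") only removes trailing newline characters.
print(last_line.rstrip("\n"))  # chr2,50,x,y
```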
<h3>Remark 2</h3>
<p>Instead of</p>
<pre><code>my_list = [1, 2, 3]
for i in range(len(my_list)):
    print(my_list[i])  # do something with my_list[i]
</code></pre>
<p>you should write:</p>
<pre><code>my_list = [1, 2, 3]
for item in my_list:
    print(item)  # do something with item
</code></pre>
<p>If you also need the index, combine the loop with <code>enumerate()</code>.</p>
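<p>For example, a minimal sketch of a loop that needs both the position and the value:</p>

```python
my_list = [1, 2, 3]

# enumerate() yields (index, item) pairs, so there is no need for range(len(...)).
for i, item in enumerate(my_list):
    print(i, item)

# The optional start argument shifts the counter, e.g. for 1-based numbering.
numbered = list(enumerate(my_list, start=1))
print(numbered)  # [(1, 1), (2, 2), (3, 3)]
```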