<p>Let's try solving this the <code>pandas</code> way. First, read the <code>csv</code> files into <code>pandas</code> DataFrames:</p>
<pre><code>import pandas as pd

f1 = pd.read_csv('file1.csv', header=None)  # no header row: columns are labelled 0..6
f2 = pd.read_csv('file2.csv')               # header row: chr, name, start, end
>>> f1
0 1 2 3 4 5 6
0 chr1 3073253 3074322 gene_id ENSMUSG00000102693.1 gene_type TEC
1 chr1 3074253 3075322 gene_id ENSMUSG00000102693.1 transcript_id ENSMUST00000193812.1
2 chr1 3077253 3078322 gene_id ENSMUSG00000102693.1 transcript_id ENSMUST00000193812.1
3 chr1 3102916 3103025 gene_id ENSMUSG00000064842.1 gene_type snRNA
4 chr1 3105016 3106025 gene_id ENSMUSG00000064842.1 transcript_id ENSMUST00000082908.1
>>> f2
chr name start end
0 chr1 linc1320 3073300 3074300
1 chr3 linc2245 3077270 3078250
2 chr1 linc8956 4410501 4406025
</code></pre>
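<p>Now we can <code>merge</code> the two frames on the chromosome column and filter the merged rows down to those where the <code>f2</code> interval lies inside the <code>f1</code> interval. Then we can <code>join</code> the filtered rows back onto <code>f1</code>:</p>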
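<pre><code># Pair each f1 row with every f2 row on the same chromosome, keep the pairs
# whose f2 interval lies inside the f1 interval, and restore f1's row labels
m = (
    f1.reset_index()
    .merge(f2, left_on=0, right_on='chr')
    .loc[lambda x: x[1].le(x['start']) & x[2].ge(x['end'])]
    .set_index('index')[['name', 'start', 'end']]
)
# Left join: f1 rows without a contained f2 interval get NaN in the new columns
f3 = f1.join(m)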
</code></pre>
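<p>For reference, here is a self-contained sketch of the same merge, filter and join steps. It rebuilds <code>f1</code> and <code>f2</code> inline from the sample rows shown above (assuming the real files parse to the same frames), and filters with boolean indexing so that non-matching pairs are dropped rather than filled with NaN. The helper name <code>annotate_contained</code> is hypothetical.</p>

```python
import pandas as pd

# Inline copies of the sample rows shown above (assumed to match the CSV files).
# A list of lists gives integer column labels 0..6, like header=None does.
f1 = pd.DataFrame([
    ["chr1", 3073253, 3074322, "gene_id", "ENSMUSG00000102693.1", "gene_type", "TEC"],
    ["chr1", 3074253, 3075322, "gene_id", "ENSMUSG00000102693.1", "transcript_id", "ENSMUST00000193812.1"],
    ["chr1", 3077253, 3078322, "gene_id", "ENSMUSG00000102693.1", "transcript_id", "ENSMUST00000193812.1"],
    ["chr1", 3102916, 3103025, "gene_id", "ENSMUSG00000064842.1", "gene_type", "snRNA"],
    ["chr1", 3105016, 3106025, "gene_id", "ENSMUSG00000064842.1", "transcript_id", "ENSMUST00000082908.1"],
])
f2 = pd.DataFrame({
    "chr": ["chr1", "chr3", "chr1"],
    "name": ["linc1320", "linc2245", "linc8956"],
    "start": [3073300, 3077270, 4410501],
    "end": [3074300, 3078250, 4406025],
})


def annotate_contained(genes: pd.DataFrame, lincs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: left-join onto `genes` each `lincs` row whose
    [start, end] lies inside the gene's [column 1, column 2] on the same chromosome."""
    # Pair every gene with every linc on the same chromosome; reset_index()
    # keeps the gene's original row label in an 'index' column.
    pairs = genes.reset_index().merge(lincs, left_on=0, right_on="chr")
    # Keep only pairs where the linc interval is contained in the gene interval.
    contained = pairs[pairs[1].le(pairs["start"]) & pairs[2].ge(pairs["end"])]
    # Restore the gene row labels so the matches align with `genes` on join.
    matches = contained.set_index("index")[["name", "start", "end"]]
    # Left join: genes without a contained linc get NaN in the new columns.
    return genes.join(matches)


f3 = annotate_contained(f1, f2)
print(f3)
```

<p>Note that the merge pairs every <code>f1</code> row with every <code>f2</code> row on the same chromosome, so the intermediate frame grows with the number of such pairs. If one <code>f1</code> interval contains several <code>f2</code> intervals, <code>join</code> repeats that <code>f1</code> row once per match.</p>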
<hr/>
<pre><code>>>> f3
0 1 2 3 4 5 6 name start end
0 chr1 3073253 3074322 gene_id ENSMUSG00000102693.1 gene_type TEC linc1320 3073300.0 3074300.0
1 chr1 3074253 3075322 gene_id ENSMUSG00000102693.1 transcript_id ENSMUST00000193812.1 NaN NaN NaN
2 chr1 3077253 3078322 gene_id ENSMUSG00000102693.1 transcript_id ENSMUST00000193812.1 NaN NaN NaN
3 chr1 3102916 3103025 gene_id ENSMUSG00000064842.1 gene_type snRNA NaN NaN NaN
4 chr1 3105016 3106025 gene_id ENSMUSG00000064842.1 transcript_id ENSMUST00000082908.1 NaN NaN NaN
</code></pre>
<p>PS: you can also save the resulting DataFrame <code>f3</code> to a CSV file with <code>f3.to_csv('file3.csv')</code>.</p>
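<p>One detail worth knowing: by default <code>to_csv</code> also writes the row index as an unnamed first column. The minimal sketch below uses a small stand-in for <code>f3</code> and an in-memory buffer in place of <code>file3.csv</code> to show that <code>index=False</code> drops that column. By default, missing values are written as empty fields.</p>

```python
import io

import pandas as pd

# Small stand-in for f3: one matched row and one unmatched (missing) row.
f3 = pd.DataFrame({
    "chr": ["chr1", "chr1"],
    "name": ["linc1320", None],
    "start": [3073300.0, float("nan")],
})

# index=False omits the row index; a StringIO buffer stands in for 'file3.csv'.
buf = io.StringIO()
f3.to_csv(buf, index=False)
text = buf.getvalue()
print(text)

# Without index=False the header starts with an empty field for the index.
with_index_header = f3.to_csv().splitlines()[0]
print(with_index_header)
```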