<p>If I were you, I would rewrite the whole dataset into a new text file with a simple pass over the source file, then load the result into pandas, no <code>re</code> needed:</p>
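<pre><code>with open('source.txt') as fin, open('target.txt', 'w') as fout:
    lc = 0  # semicolons seen so far in the current record
    for line in fin:
        lc += line.count(';')
        if lc < 3:
            # record is incomplete: drop the newline so the next line is appended
            fout.write(line[:-1])
        else:
            # record is complete: keep the newline and start counting again
            fout.write(line)
            lc = 0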
</code></pre>
<p>Result:</p>
<pre><code># New York City; Iron Man; no superpowers;
# Metropolis; Superman; superpowers;
# New York City;Spider-Man;superpowers;
# Gotham; Batman; no superpowers;
# New York City; Doctor Strange; superpowers;
</code></pre>
<p>Reading it with pandas:</p>
<pre><code>import pandas as pd

pd.read_csv('target.txt', header=None, sep=';', usecols=range(3))
#                0                1                2
# 0  New York City         Iron Man   no superpowers
# 1     Metropolis         Superman      superpowers
# 2  New York City       Spider-Man      superpowers
# 3         Gotham           Batman   no superpowers
# 4  New York City   Doctor Strange      superpowers
</code></pre>
<hr/>
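<p>Note: <code>usecols</code> is only needed because of the trailing semicolon on each record. You can avoid it by stripping that semicolon while writing the file:</p>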
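<pre><code>with open('source.txt') as fin, open('target.txt', 'w') as fout:
    lc = 0  # semicolons seen so far in the current record
    for line in fin:
        lc += line.count(';')
        if lc < 3:
            # record is incomplete: write the fragment without its newline
            fout.write(line.strip())
        else:
            # record is complete: drop the trailing ';' and end the line
            fout.write(line.strip()[:-1] + '\n')
            lc = 0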
</code></pre>
<p>Reading it with pandas:</p>
<pre><code>pd.read_csv('target.txt', header=None, sep=';')
#                0                1                2
# 0  New York City         Iron Man   no superpowers
# 1     Metropolis         Superman      superpowers
# 2  New York City       Spider-Man      superpowers
# 3         Gotham           Batman   no superpowers
# 4  New York City   Doctor Strange      superpowers
</code></pre>
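<p>If you would rather not write an intermediate file, the same counting idea can be wrapped in a generator. This is only a minimal sketch: the name <code>join_records</code>, its parameters, and the sample input are my own assumptions, not part of your data, and it assumes every record ends with a trailing separator as in the examples above.</p>

```python
import io


def join_records(lines, fields=3, sep=';'):
    """Yield one joined record each time `fields` separators have been seen.

    Hypothetical helper: assumes a record may be split across several
    physical lines and that each record ends with a trailing separator.
    """
    buf, count = [], 0
    for line in lines:
        text = line.rstrip('\n')  # keep inner spaces, drop only the newline
        if not text:
            continue  # skip blank lines
        buf.append(text)
        count += text.count(sep)
        if count >= fields:
            record = ''.join(buf)
            # Drop the trailing separator so no empty extra column appears
            yield record[:-1] if record.endswith(sep) else record
            buf, count = [], 0


# Made-up sample: the first record is broken across two lines
sample = io.StringIO(
    "New York City; Iron\n"
    " Man; no superpowers;\n"
    "Metropolis; Superman; superpowers;\n"
)
records = list(join_records(sample))
print(records)
```

<p>The joined records could then be fed to <code>pd.read_csv</code> through an <code>io.StringIO</code> built from <code>'\n'.join(records)</code>, which should make <code>usecols</code> unnecessary for the same reason as in the second version above.</p>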