<p>I want to read a <code>.dat</code> file in Python. After trying several ways of reading it, I ended up with the following code:</p>
<pre><code>datContent = open("..\\data\\train.dat.abs", 'r')
MyList = []
for line in datContent:
    # Print each raw line of the file
    print(line)
</code></pre>
<p>This prints the file contents in the following form:</p>
<pre><code>1 Should O
2 students O
3 be O
4 taught O
5 to O
6 compete O
7 or O
8 to O
9 cooperate O
10 ? O
------------------> A BLANK LINE HERE MARKS THE START OF THE NEXT SENTENCE
1 It O
2 is O
3 always O
4 said O
5 that O
6 competition O
7 can O
8 effectively O
9 promote O
10 the O
11 development O
12 of O
13 economy O
14 . O
</code></pre>
<p>However, I want to extract the second and third columns (the word and its tag) as a list of tuples:</p>
<pre><code>[(Should, O), (students, O), (be, O), (taught, O), (to, O), (compete, O), (or, O), (to, O), (cooperate, O), (?, O)]
</code></pre>
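<p>Each sentence (sentences are separated by a blank line in the original file) should become one row of a DataFrame. I tried splitting the lines,
and this is what I have so far:</p>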
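<pre><code>MyList = []
# Open the file with an explicit encoding and close it automatically
with open("..\\data\\train.dat.abs", 'r', encoding='utf-8') as datContent:
    for line in datContent:
        # Split on whitespace; a blank line yields an empty list
        a = line.split()
        print(a)
</code></pre>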
<p>The result is:</p>
<pre><code>['1', 'Should', 'O']
['2', 'students', 'O']
['3', 'be', 'O']
['4', 'taught', 'O']
['5', 'to', 'O']
['6', 'compete', 'O']
['7', 'or', 'O']
['8', 'to', 'O']
['9', 'cooperate', 'O']
['10', '?', 'O']
[]
['1', 'It', 'O']
['2', 'is', 'O']
['3', 'always', 'O']
['4', 'said', 'O']
['5', 'that', 'O']
['6', 'competition', 'O']
['7', 'can', 'O']
['8', 'effectively', 'O']
['9', 'promote', 'O']
['10', 'the', 'O']
['11', 'development', 'O']
['12', 'of', 'O']
['13', 'economy', 'O']
['14', '.', 'O']
</code></pre>
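<p>The empty list <code>[]</code> in this output can serve as a sentence boundary. What follows is only a minimal sketch of that idea, not a definitive solution: the function name <code>group_sentences</code> and the in-memory <code>sample</code> lines are made up for illustration, and it assumes every non-blank line has at least three whitespace-separated fields (index, word, tag).</p>

```python
from typing import Iterable, List, Tuple


def group_sentences(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Group 'index word tag' lines into sentences of (word, tag) tuples.

    A blank line (which splits to an empty list) closes the current sentence.
    """
    sentences = []
    current = []
    for line in lines:
        parts = line.split()
        if not parts:
            # Blank line: store the finished sentence, if any, and start a new one
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumption: fields are index, word, tag; the index is dropped
        current.append((parts[1], parts[2]))
    # Keep the last sentence when the input does not end with a blank line
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample standing in for the lines of the real file
sample = [
    "1 Should O\n",
    "2 students O\n",
    "\n",
    "1 It O\n",
    "2 is O\n",
]
sentences = group_sentences(sample)
print(sentences)
```

<p>Because an open file object yields its lines when iterated, the same function could presumably be called as <code>group_sentences(datContent)</code> on the real file.</p>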
<p>As I said, I want to save:</p>
<pre><code>[(Should, O), (students, O), (be, O), (taught, O), (to, O), (compete, O), (or, O), (to, O), (cooperate, O), (?, O)]
</code></pre>
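<p>as one row of a DataFrame (basically the 2nd and 3rd items of each list above), with the empty <code>[]</code> you see marking where one sentence ends and the next begins.</p>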
<p>df</p>
<pre><code>row 1 = [(Should, O), (students, O), (be, O), (taught, O), (to, O), (compete, O), (or, O), (to, O), (cooperate, O), (?, O)]
row 2= ...
</code></pre>
<p>and so on.</p>
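<p>For reference, the target layout of one sentence per row might look like the sketch below. It assumes pandas is installed and that each sentence has already been collected into a list of (word, tag) tuples. The column name <code>sentence</code> and the shortened sample data are hypothetical, chosen only for illustration.</p>

```python
import pandas as pd

# Hypothetical, shortened sentences: each one is a list of (word, tag) tuples
sentences = [
    [("Should", "O"), ("students", "O"), ("?", "O")],
    [("It", "O"), (".", "O")],
]

# One row per sentence; each cell holds that sentence's whole list of tuples
df = pd.DataFrame({"sentence": sentences})

print(df.shape)
print(df.loc[0, "sentence"])
```

<p>Storing a whole list in each cell keeps sentences of different lengths in a single column, at the cost of an <code>object</code>-dtype column that pandas cannot vectorize over.</p>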