Python: splitting a file into sentences

Posted 2024-04-26 22:26:33

I have a file, data.txt, that looks like this:

<<a
<<t This is a title 01
/t>>
<<c
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
/c>>
/a>>
<<a
<<t This is a title 02
/t>>
<<c
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
This is a sentence. This is a sentence. This is a sentence. This is a sentence.
/c>>
/a>>

I want to read the file and split it so that each sentence goes into its own list, for example:

[[This is a title 01],[This is a sentence.],[This is a sentence.]...[This is a title 02],[This is a sentence.]...]

Thanks in advance for your help.


1 answer
User
#1 · Posted 2024-04-26 22:26:33

You can try the following approach:

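result = []
with open('data.txt', 'r') as f:
    for line in f:
        if "This is a title" in line:
            # Remove the leading "<<t" marker, then the surrounding whitespace.
            # (lstrip('<<t') strips a set of characters, not the literal prefix.)
            cleaned_line = line.strip().replace('<<t', '', 1).strip()
            result.append(cleaned_line)
        elif line.startswith("This is a sentence"):
            # Split at periods, skip empty pieces, and put the period back.
            for part in line.split('.'):
                sentence = part.strip()
                if sentence:
                    result.append(sentence + '.')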

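What is going on here?
Open the file and iterate over it line by line. For a title line, remove the <<t marker and the surrounding whitespace.
To extract the sentences, split the line string at each period (.). Then append everything to the result list.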
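
The steps above can also be sketched as a small function that relies on the markers in the file rather than on the literal text "This is a title" or "This is a sentence". This is only a sketch under a few assumptions: the function name extract_sentences is made up for illustration, every line beginning with << or / is treated as a marker line, and every sentence is assumed to end with a period.

```python
import io


def extract_sentences(lines):
    """Return titles and sentences, in order, as a flat list of strings."""
    result = []
    for raw in lines:
        line = raw.strip()
        if line.startswith('<<t'):
            # Title line: drop the "<<t" marker and the surrounding whitespace.
            result.append(line[len('<<t'):].strip())
        elif line and not line.startswith('<<') and not line.startswith('/'):
            # Content line: split at periods, skip empty pieces,
            # and put the period back on each sentence.
            for part in line.split('.'):
                part = part.strip()
                if part:
                    result.append(part + '.')
    return result


# A shortened in-memory copy of the sample file, so the sketch runs as-is.
sample = io.StringIO(
    "<<a\n"
    "<<t This is a title 01\n"
    "/t>>\n"
    "<<c\n"
    "This is a sentence. This is a sentence.\n"
    "/c>>\n"
    "/a>>\n"
)
print(extract_sentences(sample))
# ['This is a title 01', 'This is a sentence.', 'This is a sentence.']
```

To run it on the real file, pass the open file object instead of the sample: with open('data.txt') as f: result = extract_sentences(f).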
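Edit:
Note: you will end up with a list of strings. Since you are new to Python, I will leave converting the list of strings into a list of lists as an exercise for you. It should be quite simple.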
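
For readers who want to check their solution to that exercise, one possible approach is a list comprehension that wraps each string in its own one-element list. The sample values in result below are made up for illustration, and the name nested is hypothetical.

```python
# Example output of the parsing step: a flat list of strings.
result = ['This is a title 01', 'This is a sentence.', 'This is a sentence.']

# Wrap each string in its own one-element list.
nested = [[item] for item in result]
print(nested)
# [['This is a title 01'], ['This is a sentence.'], ['This is a sentence.']]
```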
