How do I split a text file into multiple lists in Python based on a whitespace delimiter?

Published 2024-03-29 12:58:22

I want to build a list of lists of words in Python. By that I mean a new inner list should start at every double space that occurs in the text file.

Here is my function:

import re

def tokenize(document):
    file = open("document.txt","r+").read()
    print(re.findall(r'\w+', file))

The input text file contains a string like this:

What's did the little boy tell the game warden?  His dad was in the kitchen poaching eggs!

Note: there are two spaces after "warden?" and before "His".

My function gives me this output:

['what','s','did','the','little','boy','tell','the','game','warden','His','dad','was','in','the','kitchen','poaching','eggs']

Desired output:

[['what','s','did','the','little','boy','tell','the','game','warden'],
['His','dad','was','in','the','kitchen','poaching','eggs']]

3 Answers

First split the whole text on double spaces, then pass each item to the regex, like this:

>>> import re
>>> text = "What's did the little boy tell the game warden?  His dad was in the kitchen poaching eggs!"
>>> file = text.split('  ')
>>> file
["What's did the little boy tell the game warden?", 'His dad was in the kitchen poaching eggs!']
>>> res = []
>>> for sen in file:
...    res.append(re.findall(r'\w+', sen))
... 
>>> res
[['What', 's', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'warden'], ['His', 'dad', 'was', 'in', 'the', 'kitchen', 'poaching', 'eggs']]
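
One caveat with a literal two-space separator: a longer run of spaces produces empty pieces, which then turn into empty word lists. The sketch below uses a made-up sample string (not from the question) to show the effect and one possible way to skip such pieces.

```python
import re

# Hypothetical sample: four spaces between "two" and "three"
text = "one two    three four"

pieces = text.split("  ")
# Adjacent double spaces leave an empty string between them
print(pieces)

# Skip pieces that are empty or whitespace-only so they do not
# become empty word lists
res = [re.findall(r"\w+", p) for p in pieces if p.strip()]
print(res)
```

Whether to drop or keep empty lists depends on what the caller expects; dropping them is only one option.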

Here is a reasonable all-regex approach:

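import re

def tokenize(document):
    with open("document.txt") as f:
        text = f.read()
    # Split into blocks wherever two or more whitespace characters occur
    blocks = re.split(r'\s\s+', text)
    # Extract the words of each block
    return [re.findall(r'\w+', b) for b in blocks]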
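
Because `\s` also matches newlines, text that starts or ends with two or more whitespace characters (for example a file ending in blank lines) yields an empty block, and therefore an empty list. A minimal sketch of a variant that works on a string and guards against that; the name `tokenize_text` is hypothetical and the sample string is the one from the question with two newlines appended:

```python
import re

def tokenize_text(text):
    """Return one list of words per block; blocks are separated by 2+ whitespace characters."""
    # strip() avoids an empty leading or trailing block when the text
    # starts or ends with whitespace
    blocks = re.split(r"\s{2,}", text.strip())
    # Keep only blocks that contain at least one word
    word_lists = (re.findall(r"\w+", b) for b in blocks)
    return [words for words in word_lists if words]

sample = "What's did the little boy tell the game warden?  His dad was in the kitchen poaching eggs!\n\n"
result = tokenize_text(sample)
print(result)
```

Taking the text as an argument keeps the splitting logic separate from file handling, so the caller decides which file to read.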

The built-in split function can split on a multi-space separator.

This:

a = "hello world.  How are you"
b = a.split('  ')
c = [ x.split(' ') for x in b ]

yields:

[['hello', 'world.'], ['How', 'are', 'you']]

If you also want to remove punctuation, apply the regular expression to the elements of "b", or to "x" in the third statement.
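
As a rough sketch of that suggestion, here are two possible ways to drop the punctuation. The names `c_regex` and `c_strip` are illustrative only; the second option, based on `string.punctuation`, is an alternative that is not part of the answer itself.

```python
import re
import string

a = "hello world.  How are you"
b = a.split("  ")

# Option 1: let the regex pick out the words, which drops all punctuation
c_regex = [re.findall(r"\w+", x) for x in b]

# Option 2: split on whitespace, then trim punctuation from the ends of each
# token only, so an inner apostrophe (as in "What's") would be kept
c_strip = [
    [w.strip(string.punctuation) for w in x.split() if w.strip(string.punctuation)]
    for x in b
]

print(c_regex)
print(c_strip)
```

Option 1 splits "What's" into "What" and "s", as in the question's expected output; option 2 would keep it as a single token.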
