Reading text files faster in Python

Published 2024-04-20 04:46:05


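I have multiple text files in a folder, 21,941 in total. My code works fine for a small number of files, but when I run it on 5,000 files it gets stuck while reading them. When I run it on the full data set, it has still not finished reading after 3 hours. How can I improve my code, or use a GPU or multiprocessing for this task?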

This code reads one file and returns a list of words:

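import string
from nltk.corpus import stopwords

def wordList(doc):
    """
    1: Remove punctuation
    2: Remove stop words
    3: Return the remaining words as a list
    """
    # The with-block closes the file even if reading fails
    with open("C:\\Users\\Zed\\PycharmProjects\\ACL txt\\" + doc, 'r', encoding="utf8", errors='ignore') as file:
        text = file.read().strip()
    # Drop punctuation characters, then rejoin into a single string
    nopunc = [char for char in text if char not in string.punctuation]
    nopunc = ''.join(nopunc)
    # Keep the words that are not English stop words (case-insensitive)
    return [word for word in nopunc.split() if word.lower() not in stopwords.words('english')]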
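One likely bottleneck in `wordList` is that `stopwords.words('english')` is evaluated again for every word, and each membership test then scans a list. The following is a minimal sketch of building the lookup structures once, not a measured fix. To stay runnable without the NLTK corpus it uses a small stand-in stop-word set; `STOP_WORDS`, `PUNCT_TABLE` and `clean_words` are hypothetical names, and in the real script the set would presumably be `set(stopwords.words('english'))`.

```python
import string

# Stand-in stop-word set (assumption, keeps the sketch free of NLTK data).
# In the real script this would be built once as set(stopwords.words('english')).
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})

# Translation table that deletes every ASCII punctuation character in one pass.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_words(text):
    """Strip punctuation, split on whitespace, and drop stop words."""
    no_punct = text.translate(PUNCT_TABLE)
    # Set membership is a hash lookup, so the cost per word stays constant.
    return [word for word in no_punct.split() if word.lower() not in STOP_WORDS]


print(clean_words("The parser, in short, is fast."))  # ['parser', 'short', 'fast']
```

The output is the same kind of word list as before; only the place where the stop-word list and the punctuation table are built has changed.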

This code reads the file names from the folder:

from pathlib import Path

file_names = []
for file in Path("ACL txt").rglob("*.txt"):
    file_names.append(file.name)
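Note that `rglob` also descends into subfolders, while `file.name` keeps only the last path component. A file inside a subfolder would therefore not be found later by `wordList`, and two files with the same name would share one dictionary key. A minimal sketch, assuming it is acceptable to keep full `Path` objects instead of bare names (`list_text_files` is a hypothetical helper):

```python
from pathlib import Path


def list_text_files(root):
    """Return every .txt file under root (searched recursively), sorted by path."""
    return sorted(p for p in Path(root).rglob("*.txt") if p.is_file())


# Usage, with the folder name taken from the question:
#   paths = list_text_files("ACL txt")
#   print(len(paths))
```

Each returned path can be opened directly, so no folder prefix has to be hard-coded in the reading function.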

This code builds a dictionary of all documents, with the file name as the key and the file's word list as the value:

documents = {}
for i in file_names[:5000]:
    documents[i] = wordList(i)
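Since the question asks about multiprocessing, here is a minimal sketch of spreading the per-file work over worker processes with the standard library's `concurrent.futures.ProcessPoolExecutor`. It is an illustration under stated assumptions rather than a tuned solution: `tokenize_file` and `build_documents` are hypothetical names, the stop-word set is a small stand-in for `set(stopwords.words('english'))`, and the `chunksize` default is an arbitrary starting value. A GPU is unlikely to help with this kind of file reading and string filtering.

```python
import string
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Stand-in stop-word set (assumption); the real script would build it once
# from set(stopwords.words('english')).
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize_file(path):
    """Worker: read one file and return (file name, filtered word list)."""
    text = Path(path).read_text(encoding="utf8", errors="ignore")
    words = text.translate(PUNCT_TABLE).split()
    return Path(path).name, [w for w in words if w.lower() not in STOP_WORDS]


def build_documents(paths, workers=None, chunksize=64):
    """Build {file name: word list}; workers=1 runs serially without a pool."""
    paths = [str(p) for p in paths]
    if workers == 1:
        return dict(map(tokenize_file, paths))
    # workers=None lets the executor pick a default based on the CPU count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches several files per task to cut inter-process overhead.
        return dict(pool.map(tokenize_file, paths, chunksize=chunksize))


# Usage (on Windows, keep the call under `if __name__ == "__main__":`):
#   paths = list(Path("ACL txt").rglob("*.txt"))
#   documents = build_documents(paths)
```

The worker is a module-level function because the process pool has to pickle it. Running with `workers=1` first is a cheap way to check how much time remains once the stop-word lookup is no longer repeated per word.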

This is the link to the dataset.

My system specs: quad-core i7, 16 GB RAM.


