Removing characters from a text file

2 Answers

User

Answer #1 · edited 2024-04-20 04:20:34

In one go

If your file is not very large, you can do everything in a single pass:

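import re
# Read the whole file and strip both patterns in a single substitution.
with open('englishtweets1.txt') as f:
    # ^@\w+\s removes a leading @username; \bhttp[^\s]+ removes anything starting with http.
    contents = re.sub(r'^@\w+\s|\bhttp[^\s]+', '', f.read(), flags=re.MULTILINE)
print(contents)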

Result:

thanks for the follow :)
hii... if u want to make a new friend just add me on facebook! :) xx
enjoy tmrro. saw them earlier this wk here in tokyo :)

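Note that the http stripping is very naive: it removes any token that starts with http, whether or not it is a URL. To fix this, refine the regex so that it only matches valid HTTP URLs.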
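One way to tighten the URL part is sketched below. It is only an illustration, not a full URL validator: it assumes that a link always begins with http:// or https:// and runs to the next whitespace. The names TWEET_NOISE and clean_text are made up for this example.

```python
import re

# Assumption: a URL is "http://" or "https://" followed by non-whitespace.
# Requiring "://" keeps ordinary words such as "httpd" intact.
TWEET_NOISE = re.compile(r'^@\w+\s|\bhttps?://\S+', flags=re.MULTILINE)

def clean_text(text):
    """Return text with a leading @username and http(s) links removed."""
    return TWEET_NOISE.sub('', text)

sample = "@bob thanks for the follow :)\nthe httpd docs are at http://example.com/docs\n"
print(clean_text(sample))
```

The word "httpd" survives here, whereas the simpler pattern above would delete it.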

Line by line

If your file is very large, you may not want to hold all of it in memory. Instead, you can iterate over the lines of the file:

import re
with open('englishtweets1.txt') as f:
    for line in f:
        # Each line keeps its own newline, so suppress the one print() would add.
        print(re.sub(r'^@\w+\s|\bhttp[^\s]+', '', line), end='')
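If the cleaned text should go to a file rather than the screen, the same loop can write each line out as it is processed, so memory use stays flat. This is a minimal sketch: the function name clean_file and the UTF-8 encoding are assumptions, not part of the answer above.

```python
import re

# Same pattern as in the line-by-line loop above.
PATTERN = re.compile(r'^@\w+\s|\bhttp[^\s]+')

def clean_file(src_path, dst_path):
    """Copy src_path to dst_path one line at a time, stripping mentions and links."""
    # Assumption: both files are UTF-8 text.
    with open(src_path, encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            # The line still ends with its newline, so write() needs nothing extra.
            dst.write(PATTERN.sub('', line))
```

It could be called as clean_file('englishtweets1.txt', 'cleaned.txt'), where the output file name is hypothetical.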

User

Answer #2 · edited 2024-04-20 04:20:34

Use it like this:

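import re
with open('englishtweets1.txt') as f:
    data = f.read()
# re.MULTILINE makes ^ match at the start of every line, not only the start of the file.
new_str = re.sub(r'^@', ' ', data, flags=re.MULTILINE)
new_str = re.sub(r'^https?:\/\/.*[\r\n]*', '', new_str, flags=re.MULTILINE)
# To save the result (if needed):
# open('removed.txt', 'w').write(new_str)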

Update: this works, I just tested it:

new_str = re.sub(r'https.(.*?) ', '', new_str, flags=re.MULTILINE)
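A caveat worth checking on your own data: the updated pattern only matches https links that are followed by a space, so a link at the very end of a line is left in place. The short comparison below illustrates this. The alternative pattern is an assumption for illustration, not something taken from the answer.

```python
import re

update_pattern = re.compile(r'https.(.*?) ')       # pattern from the update above
whitespace_pattern = re.compile(r'https?://\S+')   # assumed alternative

mid = "link https://t.co/abc here"   # link followed by a space
end = "link https://t.co/abc"        # link at the end of the text

print(repr(update_pattern.sub('', mid)))
print(repr(update_pattern.sub('', end)))
print(repr(whitespace_pattern.sub('', end)))
```

The first call removes the link together with the space after it, the second leaves the text unchanged, and the third removes the link without needing a trailing space.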
