Sanitizing text in Python
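I'm just starting out with Python and want to learn by writing scripts that do things I would actually use. I have some text that I got by typing "status" into the Team Fortress 2 console. What I want to do is reduce the text below to just the entries in the STEAM_X:X:XXXXXXXX format, i.e. the Steam IDs.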
# userid name uniqueid connected ping loss state
# 31 "Atonement -Ai-" STEAM_0:1:27464943 00:48 103 0 active
# 10 "?loop?" STEAM_0:0:31072991 40:48 62 0 active
# 11 "爱 -Ai-" STEAM_0:0:41992530 40:46 68 0 active
# 12 "MrKateUpton -Ai-" STEAM_0:1:10894538 40:25 81 0 active
# 13 "Tacet -Ai-" STEAM_0:1:52131782 39:59 83 0 active
# 14 "CottonBonbon-Ai-" STEAM_0:1:47812003 39:39 51 0 active
# 15 "belt -Ai-" STEAM_0:1:4941202 38:43 123 0 active
# 16 "boutros :3" STEAM_0:0:32271324 38:21 65 0 active
# 17 "[tilt] Xikkari" STEAM_0:1:41148798 38:14 92 0 active
# 24 "ElenaWitch" STEAM_0:0:17495028 31:30 73 0 active
# 19 "[tilt] Batcan #boutros" STEAM_0:1:41205650 38:10 63 0 active
# 20 "[?l??]whatupmydiggas" STEAM_0:1:50559125 37:58 112 0 active
# 21 "[tilt] musicman" STEAM_0:1:37758467 37:31 89 0 active
# 22 "Jack Frost" STEAM_0:0:24206189 37:28 90 0 active
# 28 "[tilt-sub]deaf ears #best safet" STEAM_0:1:29612138 19:05 94 0 active
# 25 "? notez ?ai" STEAM_0:1:29663879 31:23 113 0 active
# 27 "-Ai- Lord English" STEAM_0:1:44114633 24:08 116 0 active
# 29 "1.prototypes" STEAM_0:0:42256202 17:41 83 0 active
# 30 "SourceTV // name for SourceTV" BOT active
# 32 "PUT ME IN COACH" STEAM_0:1:48004781 00:36 173 0 spawning
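Is there a built-in function in Python that can do this?
Delete/remove everything that is not (!) STEAM_X:X:XXXXXXXX.
I've searched the web a lot but haven't found a specific answer. If someone could point me to a built-in Python function, I'd really appreciate it, so I can start coding.
P.S. The output should look like this: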
STEAM_0:1:27464943
STEAM_0:0:31072991
STEAM_0:1:10894538
etc
etc
1 Answer
This sounds like a simple job for a regular expression. Assuming those fields are always digits, as in your sample:
>>> import re
>>> with open('/tmp/spam.txt') as f:
...     for steam64id in re.findall(r'STEAM_\d:\d:\d+', f.read()):
...         print(steam64id)
...
STEAM_0:1:27464943
STEAM_0:0:31072991
STEAM_0:0:41992530
STEAM_0:1:10894538
STEAM_0:1:52131782
STEAM_0:1:47812003
STEAM_0:1:4941202
STEAM_0:0:32271324
STEAM_0:1:41148798
STEAM_0:0:17495028
STEAM_0:1:41205650
STEAM_0:1:50559125
STEAM_0:1:37758467
STEAM_0:0:24206189
STEAM_0:1:29612138
STEAM_0:1:29663879
STEAM_0:1:44114633
STEAM_0:0:42256202
STEAM_0:1:48004781
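As a rule, you don't delete lines from the original file in place. Instead, you write the lines you want to keep to a new file (and then, if that succeeded, optionally copy the new file back over the original).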
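Here is a minimal sketch of that write-to-a-new-file approach, assuming Python 3. The function names `extract_steam_ids` and `replace_with_ids` and the `.tmp` suffix are made up for illustration. Only `re.compile`, `Pattern.search`, and `os.replace` are real library calls.

```python
import os
import re

# Same pattern as above: STEAM_<digit>:<digit>:<one or more digits>
STEAM_ID = re.compile(r'STEAM_\d:\d:\d+')


def extract_steam_ids(src_path, dst_path):
    """Write the first Steam ID found on each line of src_path to dst_path.

    Lines without an ID (the header row, bots) are skipped.
    Returns the number of IDs written.
    """
    count = 0
    # errors='replace' is an assumption: player names may contain bytes
    # that are not valid UTF-8, and we only care about the ID anyway.
    with open(src_path, encoding='utf-8', errors='replace') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            match = STEAM_ID.search(line)
            if match:
                dst.write(match.group(0) + '\n')
                count += 1
    return count


def replace_with_ids(path):
    """Overwrite path with only its Steam IDs, going through a temp file."""
    tmp_path = path + '.tmp'  # hypothetical temp-file name
    count = extract_steam_ids(path, tmp_path)
    # Only reached if the extraction finished without raising.
    os.replace(tmp_path, path)
    return count
```

Calling `extract_steam_ids('/tmp/spam.txt', '/tmp/ids.txt')` leaves the original untouched. `replace_with_ids('/tmp/spam.txt')` overwrites it once the new file has been written. Note that `search` takes the first match on each line, so a player whose name itself looks like a Steam ID would throw it off.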