Sanitizing text in Python

2 votes
1 answer
1943 views
Asked 2025-04-17 15:02

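I've just started learning Python, and I want to learn by writing scripts that do things I'd actually use. I have some text that I got by typing "status" into the Team Fortress 2 console. What I want to do is reduce the text below to only the parts in STEAM_X:X:XXXXXXXX format, that is, the Steam64 ID.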

# userid name                uniqueid            connected ping loss state
#     31 "Atonement -Ai-"    STEAM_0:1:27464943  00:48      103    0 active
#     10 "?loop?"        STEAM_0:0:31072991  40:48       62    0 active
#     11 "爱 -Ai-"          STEAM_0:0:41992530  40:46       68    0 active
#     12 "MrKateUpton -Ai-"  STEAM_0:1:10894538  40:25       81    0 active
#     13 "Tacet -Ai-"        STEAM_0:1:52131782  39:59       83    0 active
#     14 "CottonBonbon-Ai-"  STEAM_0:1:47812003  39:39       51    0 active
#     15 "belt -Ai-"         STEAM_0:1:4941202   38:43      123    0 active
#     16 "boutros :3"        STEAM_0:0:32271324  38:21       65    0 active
#     17 "[tilt] Xikkari"    STEAM_0:1:41148798  38:14       92    0 active
#     24 "ElenaWitch"        STEAM_0:0:17495028  31:30       73    0 active
#     19 "[tilt] Batcan #boutros" STEAM_0:1:41205650 38:10   63    0 active
#     20 "[?l??]whatupmydiggas" STEAM_0:1:50559125 37:58  112    0 active
#     21 "[tilt] musicman"   STEAM_0:1:37758467  37:31       89    0 active
#     22 "Jack Frost"        STEAM_0:0:24206189  37:28       90    0 active
#     28 "[tilt-sub]deaf ears #best safet" STEAM_0:1:29612138 19:05   94    0 active
#     25 "? notez ?ai"    STEAM_0:1:29663879  31:23      113    0 active
#     27 "-Ai- Lord English" STEAM_0:1:44114633  24:08      116    0 active
#     29 "1.prototypes"      STEAM_0:0:42256202  17:41       83    0 active
#     30 "SourceTV  // name for SourceTV" BOT                        active
#     32 "PUT ME IN COACH"   STEAM_0:1:48004781  00:36      173    0 spawning

Is there a built-in function in Python that can implement this algorithm?

For all that is not (!) Steam_X:X:XXXXXXXX, delete/remove.

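I've searched online a lot but haven't found a concrete answer. If someone could point me to a Python built-in that does this, I'd be very grateful, so that I can start coding.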

P.S. The output should look like this:

STEAM_0:1:27464943
STEAM_0:0:31072991
STEAM_0:1:10894538
etc
etc

1 Answer

4

This sounds like a simple case for a regular expression. Assuming those fields are always digits, as in your sample:

>>> import re

>>> with open('/tmp/spam.txt') as f:
...   for steam64id in re.findall(r'STEAM_\d:\d:\d+', f.read()):
...     print(steam64id)
... 
STEAM_0:1:27464943
STEAM_0:0:31072991
STEAM_0:0:41992530
STEAM_0:1:10894538
STEAM_0:1:52131782
STEAM_0:1:47812003
STEAM_0:1:4941202
STEAM_0:0:32271324
STEAM_0:1:41148798
STEAM_0:0:17495028
STEAM_0:1:41205650
STEAM_0:1:50559125
STEAM_0:1:37758467
STEAM_0:0:24206189
STEAM_0:1:29612138
STEAM_0:1:29663879
STEAM_0:1:44114633
STEAM_0:0:42256202
STEAM_0:1:48004781
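
Outside the interactive prompt, the same pattern can be wrapped in a small function. This is only a minimal sketch: the name `extract_steam_ids` and the shortened sample text are illustrative, and it keeps the assumption that every field of the ID is purely numeric. Note that a player name which itself contains text shaped like an ID would also be matched.

```python
import re

# Same pattern as in the answer, compiled once.
# Assumption: each field of the ID consists only of digits.
STEAM_ID_RE = re.compile(r'STEAM_\d:\d:\d+')


def extract_steam_ids(text):
    """Return every STEAM_X:Y:Z token found in text, in order of appearance."""
    return STEAM_ID_RE.findall(text)


# Shortened sample taken from the status output in the question.
sample = '''\
#     31 "Atonement -Ai-"    STEAM_0:1:27464943  00:48      103    0 active
#     30 "SourceTV  // name for SourceTV" BOT                        active
#     32 "PUT ME IN COACH"   STEAM_0:1:48004781  00:36      173    0 spawning
'''

for steam_id in extract_steam_ids(sample):
    print(steam_id)
```

The SourceTV line has no ID (only `BOT`), so it contributes nothing to the result.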

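In general, the way to delete lines is not to remove them from the original file in place, but to write the lines you want to keep to a new file (and then, if processing succeeded, optionally copy the new file back over the original).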
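
The write-to-a-new-file approach might be sketched as follows. The function name `keep_only_steam_ids` is hypothetical, the UTF-8 encoding is an assumption about how the console output was saved, and the sketch renames the new file over the original with `os.replace` rather than copying it, which is a substitution for the copy step described above.

```python
import os
import re
import tempfile

STEAM_ID_RE = re.compile(r'STEAM_\d:\d:\d+')


def keep_only_steam_ids(path):
    """Rewrite the file at path so it holds one Steam ID per line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # Assumption: the input is UTF-8; undecodable bytes are replaced
        # because only the ASCII IDs matter here.
        with os.fdopen(fd, 'w', encoding='utf-8') as out, \
                open(path, encoding='utf-8', errors='replace') as src:
            for line in src:
                for steam_id in STEAM_ID_RE.findall(line):
                    out.write(steam_id + '\n')
        # Reached only if everything above succeeded.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and discard the partial file.
        os.remove(tmp_path)
        raise
```

Calling `keep_only_steam_ids('/tmp/spam.txt')` would overwrite the file used in the example above, so work on a copy if you still need the full status output.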
