The right way to detect sequences (patterns) of letters, numbers, and symbols (not just words) across multiple documents

Posted 2024-04-26 07:45:25


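The text is not typed by a person. It is produced by a computer program as traceback or exception details, with extra text that gives context for the program crash or for the exception the code handled. That said, I am not trying to fix the text, even though it may obviously contain typos.

The goal is to detect common sequences of tokens across two or more documents. The tokens are not only words; they can also be symbols and numbers, e.g. return code 123.

The solution could be regex-based, but I would like to avoid that path for code-maintenance reasons: there are hundreds or thousands of possible patterns, and they may change frequently whenever a new code version is deployed.

I tried a TF-IDF approach, but I need more context: I want to find consecutive words/symbols/numbers within a sentence that may be similar across documents. I am also experimenting with spaCy and its similarity features, but I do not know whether I am heading in the right direction.

Here are 5 sample documents: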

list_dict = [{'document':'binary tool failed. Command Failed. Command:binary -d serverlogs --selectlog=. Exception details::rest data error : return code: 6 stdout: binary : binary Tool version2.4   Mounting  partition...  stderr: Error: No  log files found. debug data error : return code: 255 stdout:binary : binary Tool version 2.4   Mounting  partition...  stderr: INFO   : Entering  local download functions... INFO    : Obtaining the absolute path of device. INFO    : device folder path: INFO    : device data files path:/tmp/device/data INFO    : Updating binary_version to format data appropriately INFO    : Getting all config files that are required. INFO : array [] INFO    : Obtaining all relevant file names from device. INFO    : Exception: StopIteration() ERROR:  Traceback (most recent call last):   File "dummy.py", line 298, in run File "dummy.py", line 186, in _run_command   File "extensions/COMMANDS/Server.py", line 120, in run   File "extensions/COMMANDS/Server.py", line 144, in serverlogsworkerfunction   File "extensions/COMMANDS/Server.py", line 724, in downloadlocally   File "extensions/COMMANDS/Server.py", line 767, in downloadlocalworker   File "extensions/COMMANDS/Server.py", line 842, in getfilenames StopIteration Test Component Recipe: A'},

{'document':'binary tool failed. Command Failed. Command:binary -d serverlogs --selectlog=. Exception details::rest data error : return code: 6 stdout: binary : binary Tool version2.4   Mounting  partition...  stderr: Error: No  log files found. debug data error : return code: 255 stdout:binary : binary Tool version 2.4   Mounting  partition...  stderr: INFO   : Entering  local download functions... INFO    : Obtaining the absolute path of device. INFO    : device folder path: INFO    : device data files path:/tmp/device/data INFO    : Updating binary_version to format data appropriately INFO    : Getting all config files that are required. INFO : array [] INFO    : Obtaining all relevant file names from device. INFO    : Exception: StopIteration() ERROR:  Traceback (most recent call last):   File "dummy.py", line 298, in run File "dummy.py", line 186, in _run_command   File "extensions/COMMANDS/Server.py", line 120, in run   File "extensions/COMMANDS/Server.py", line 144, in serverlogsworkerfunction   File "extensions/COMMANDS/Server.py", line 724, in downloadlocally   File "extensions/COMMANDS/Server.py", line 767, in downloadlocalworker   File "extensions/COMMANDS/Server.py", line 842, in getfilenames StopIteration Test Component Recipe: A'},

{'document':'binary tool failed. Command Failed. Command:binary -d rawpatch /scripts/python/set__defaults.json. Exception details::rest data error : return code: 63 stdout:binary : binary Tool version 2.4    stderr: binary response with code [404]: The operation expected an image or resource at the provided URI, but found none. debug data error : return code: 63 stdout: binary: binary Tool version 2.4    stderr: DEBUG  : Blobstore REQUEST: PATCH PATH: {"ContentLength": 55, "Accept": "*/*", "XAuthToken": "bdd8b578e24ba1b66b3fa6e98092", "Connection": "KeepAlive", "ODataVersion": "4.0", "ContentType": "application/json"} HEADERS: /device/settings/ BODY: {"Attributes": {"RestoreDefaults": "Yes"}} INFO    :binary response Time to /device/settings/: 0.0451371669769 secs. DEBUG   : Blobstore RESPONSEfor /device/settings/: Code: 404 Not Found Headers:         contentlength: 229 connection: keepalive         etag: W/"BBBBB123" cachecontrol: nocache         date: Wed, 19 Jun 2019 05:16:32 GMT         odataversion: 4.0 xframeoptions: sameorigin         contenttype: application/json; charset=utf8 xhostauthmethodused: AutoLogin  Body of /device/settings/: {"error":{"code":"device.0.10.ExtendedInfo","message":"See@Message.ExtendedInfo for more '},

{'document':'binary tool failed. Command Failed. Command:binary -d save --select baseconfigs -f /opt/TheCompany/biosconfig/DefaultBiosSettings.txt. Exception details::rest data error : return code: 255 stdout: binary : binary Tool version 2.4   Saving configuration...  stderr: ERROR: Invalid control character at: line 1 column 3713 (char 3712) debugdata error : return code: 255 stdout: binary : binary Tool version 2.4   Saving configuration...  stderr: DEBUG  : _loading /Reddog/v1/Systems/1/ DEBUG   : Blobstore REQUEST: GET         PATH: {"ODataVersion": "4.0", "Connection": "KeepAlive","Accept": "*/*", "XAuthToken": "94144fef6d35114c2dc298b9b35e72e3"} HEADERS: /Reddog/v1/Systems/1/         BODY: NoneINFO    : binary response Time to /Reddog/v1/Systems/1/: 0.0791671276093 secs. DEBUG   : Blobstore RESPONSE for /Reddog/v1/Systems/1/: Code: 200 OK Headers: contentlength: 6283         connection: keepalive etag: W/"1234567"         link: </Reddog/v1/SchemaStore/en/CSystem.json/>;rel=describedby         allow: GET, HEAD, POST, PATCH         cachecontrol: nocache         date: Tue, 25 Jun 2019 16:24:48 GMT odataversion: 4.0         xframeoptions: sameorigin         contenttype: application/json; charset=utf8         xhostauthmethodused: Session Body of /Reddog/v1/Systems/1/:'},

{'document':'binary tool failed. Command Failed. Command:binary -d save --select baseconfigs -f /opt/TheCompany/biosconfig/DefaultBiosSettings.txt. Exception details::rest data error : return code: 255 stdout: binary : binary Tool version 2.4   Saving configuration...  stderr: ERROR: Invalid control character at: line 1 column 3713 (char 3712) debugdata error : return code: 255 stdout: binary : binary Tool version 2.4   Saving configuration...  stderr: DEBUG  : _loading /Reddog/v1/Systems/1/ DEBUG   : Blobstore REQUEST: GET         PATH: {"ODataVersion": "4.0", "Connection": "KeepAlive","Accept": "*/*", "XAuthToken": "86e8625c60196754e81db3adfe119816"} HEADERS: /Reddog/v1/Systems/1/         BODY: NoneINFO    : binary response Time to /Reddog/v1/Systems/1/: 0.0443890094757 secs. DEBUG   : Blobstore RESPONSE for /Reddog/v1/Systems/1/: Code: 200 OK Headers: contentlength: 6334         connection: keepalive etag: W/"123456"         link: </Reddog/v1/SchemaStore/en/CSystem.json/>;rel=describedby         allow: GET, HEAD, POST, PATCH         cachecontrol: nocache         date: Tue, 25 Jun 2019 01:55:23 GMT odataversion: 4.0         xframeoptions: sameorigin         contenttype: application/json; charset=utf8         xhostauthmethodused: Session Body of /Reddog/v1/Systems/1/: {"@odata.context":"/Reddog/v1/$metadata#CSystem.CSystem","@odata.etag":"W/\"123456\"","@odata.id":"/Reddog/v1/Systems/1/","@odata.type":"#CSystem.v1_4_0.CSystem","Id":"1","Actions":{"#CSystem.Reset":{"ResetType@Reddog.AllowableValues":["On","ForceOff","ForceRestart","Nmi","PushPowerButton"],"target":"/Reddog/v1/Systems/1/Actions/CSystem.Reset/"}},"AssetTag":"","SMB":{"@odata.id":"/Reddog/v1/systems/1/bios/"},"thingVersion":"M4a3v2.00 '}]

import pandas as pd

df = pd.DataFrame(list_dict)

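For the samples provided, I am interested in detecting:

  • The command
  • The exception details, including the return code values
  • Whatever else the documents have in common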
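One possible direction for the third item, sketched with the standard library only: split each document on whitespace so that symbols and numbers stay inside their tokens, then let `difflib.SequenceMatcher` align the two token lists and report the contiguous runs they share. The helper names `tokenize` and `shared_runs` and the `min_tokens` threshold are assumptions made for illustration, and the two inputs are shortened excerpts of the first and fourth sample documents.

```python
from difflib import SequenceMatcher


def tokenize(text):
    # Plain whitespace split: symbols and digits stay attached to their
    # token, e.g. "code:", "255", "--selectlog=."
    return text.split()


def shared_runs(doc_a, doc_b, min_tokens=3):
    """Return the contiguous token runs found in both documents, in order."""
    a, b = tokenize(doc_a), tokenize(doc_b)
    # autojunk=False disables the heuristic that ignores very frequent tokens
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    runs = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_tokens:
            runs.append(" ".join(a[block.a:block.a + block.size]))
    return runs


# Shortened excerpts of two of the sample documents
doc_a = ("binary tool failed. Command Failed. Command:binary -d serverlogs "
         "--selectlog=. Exception details::rest data error : return code: 6 "
         "stdout: binary : binary Tool version2.4")
doc_b = ("binary tool failed. Command Failed. Command:binary -d save --select "
         "baseconfigs -f /opt/TheCompany/biosconfig/DefaultBiosSettings.txt. "
         "Exception details::rest data error : return code: 255 "
         "stdout: binary : binary Tool version 2.4")

runs = shared_runs(doc_a, doc_b)
for run in runs:
    print(run)
```

The tokens that fall between two shared runs (here the command arguments and the return code value) are the parts that vary from one document to the next, so the gaps may be as informative as the matches. This compares one pair at a time; for many documents the pairwise cost grows quickly.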

I am looking for help in either Python or R.
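For more than two documents, a regex-free starting point in Python might be to count, for every run of n consecutive tokens, how many documents contain it, and keep the runs that reach a threshold. This is only a sketch: the names `ngrams` and `common_ngrams` and the parameters `n` and `min_docs` are illustrative assumptions, and the three inputs are short excerpts of the sample documents.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def common_ngrams(documents, n=4, min_docs=2):
    """Map each n-token sequence to the number of documents containing it."""
    doc_freq = Counter()
    for text in documents:
        # set() so a sequence repeated inside one document is counted once
        doc_freq.update(set(ngrams(text.split(), n)))
    return {" ".join(gram): count
            for gram, count in doc_freq.items() if count >= min_docs}


# Short excerpts of three of the sample documents
docs = [
    "Exception details::rest data error : return code: 6 stdout: binary",
    "Exception details::rest data error : return code: 255 stdout: binary",
    "Exception details::rest data error : return code: 63 stdout:binary",
]

result = common_ngrams(docs, n=4, min_docs=3)
for sequence, count in result.items():
    print(count, sequence)
```

With the DataFrame from the question, the same function should accept the column directly, e.g. `common_ngrams(df['document'], n=5)`, since iterating a pandas Series yields its values. Larger values of `n` give more context per match at the cost of fewer matches.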

