Splitting a string on newlines, tabs, and some number of spaces

Posted 2024-04-27 17:35:20


I'm trying to split a string containing somewhat irregular-looking data, which looks like this:

\n\tName: John Smith
\n\t  Home: Anytown USA
\n\t    Phone: 555-555-555
\n\t  Other Home: Somewhere Else
\n\t Notes: Other data
\n\tName: Jane Smith
\n\t  Misc: Data with spaces

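I'd like to convert this into a tuple/dict, where I will later split on the colon :, but first I need to get rid of all the extra whitespace. I'm guessing a regex is the best way, but I can't seem to find one that works. Below is my attempt.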

data_string.split('\n\t *')

3 Answers

You can kill two birds with one regex stone:

>>> r = """
... \n\tName: John Smith
... \n\t  Home: Anytown USA
... \n\t    Phone: 555-555-555
... \n\t  Other Home: Somewhere Else
... \n\t Notes: Other data
... \n\tName: Jane Smith
... \n\t  Misc: Data with spaces
... """
>>> import re
>>> print(re.findall(r'(\S[^:]+):\s*(.*\S)', r))
[('Name', 'John Smith'), ('Home', 'Anytown USA'), ('Phone', '555-555-555'), ('Other Home', 'Somewhere Else'), ('Notes', 'Other data'), ('Name', 'Jane Smith'), ('Misc', 'Data with spaces')]
>>> 
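
For comparison, the attempt in the question fails because str.split() treats its argument as a literal string rather than a pattern. A minimal sketch of the same idea with re.split() follows; the data_string literal below is an assumption about how the question's data is laid out (newline, tab, then a varying number of spaces).

```python
import re

# Assumed layout of the question's data: each field is preceded by a newline,
# a tab, and zero or more spaces.
data_string = (
    "\n\tName: John Smith"
    "\n\t  Home: Anytown USA"
    "\n\t    Phone: 555-555-555"
    "\n\t  Other Home: Somewhere Else"
    "\n\t Notes: Other data"
    "\n\tName: Jane Smith"
    "\n\t  Misc: Data with spaces"
)

# re.split() interprets '\n\t *' as a regular expression; str.split() would
# look for those exact characters, including a literal '*'.
# The leading separator produces an empty first chunk, which is filtered out.
chunks = [c for c in re.split(r"\n\t *", data_string) if c]

# Split each chunk on the first colon only and trim both halves.
pairs = [tuple(part.strip() for part in c.split(":", 1)) for c in chunks]
print(pairs)
```

This keeps the asker's pattern unchanged and only swaps the splitting function, so it is easy to compare against the findall() approach above.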

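Just use .strip(); it removes all leading and trailing whitespace for you, including tabs and newlines. The splitting itself can then be done with .splitlines():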

[s.strip() for s in data_string.splitlines()]

Output:

>>> [s.strip() for s in data_string.splitlines()]
['Name: John Smith', 'Home: Anytown USA', 'Phone: 555-555-555', 'Other Home: Somewhere Else', 'Notes: Other data', 'Name: Jane Smith', 'Misc: Data with spaces']

You can now even split on : inline:

>>> [s.strip().split(': ') for s in data_string.splitlines()]
[['Name', 'John Smith'], ['Home', 'Anytown USA'], ['Phone', '555-555-555'], ['Other Home', 'Somewhere Else'], ['Notes', 'Other data'], ['Name', 'Jane Smith'], ['Misc', 'Data with spaces']]
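
If the input contains blank lines, or values that themselves contain a colon, the one-liner above may produce empty entries or more than two parts per line. A hedged sketch that guards against both, using str.partition(); the sample text, including the "call after 5:00" value, is made up for illustration and is not from the question.

```python
# Hypothetical sample: blank lines between fields and a value containing ':'.
data_string = (
    "\n\tName: John Smith\n"
    "\n\t  Home: Anytown USA\n"
    "\n\t Notes: call after 5:00\n"
)

pairs = []
for line in data_string.splitlines():
    line = line.strip()
    if not line:
        continue  # skip lines that were only whitespace
    # partition() splits on the first ':' only, so later colons stay in value.
    key, sep, value = line.partition(":")
    if not sep:
        continue  # no colon on this line, so it is not a "Key: value" pair
    pairs.append((key.strip(), value.strip()))

print(pairs)
```

partition() always returns three items, which avoids the unpacking errors that split(': ') can cause on lines without a separator.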
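
Alternatively, loop over the lines, skip the blank ones, and split each on the colon (here s holds a shorter sample with only the Name, Home and Misc lines). Note that the values keep their leading space:

>>> ary = []
>>> for line in s.splitlines():
...     line = line.strip()
...     if not line: continue
...     ary.append(line.split(":"))
...
>>> ary
[['Name', ' John Smith'], ['Home', ' Anytown USA'], ['Misc', ' Data with spaces']]
>>> dict(ary)
{'Home': ' Anytown USA', 'Misc': ' Data with spaces', 'Name': ' John Smith'}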
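
One caveat with dict(): the full data in the question has two Name entries, and a single dict keeps only the last value for a repeated key. A rough sketch of grouping the pairs into one dict per record instead; it assumes every record starts with a Name field, which the question does not state explicitly, and group_records is a hypothetical helper name.

```python
def group_records(pairs, first_key="Name"):
    """Group (key, value) pairs into dicts, starting a new dict at first_key.

    Assumption: each record begins with `first_key`.
    """
    records = []
    for key, value in pairs:
        if key == first_key or not records:
            records.append({})  # start a new record
        records[-1][key] = value
    return records


# Pairs as produced by any of the approaches above, already stripped.
pairs = [
    ("Name", "John Smith"),
    ("Home", "Anytown USA"),
    ("Phone", "555-555-555"),
    ("Other Home", "Somewhere Else"),
    ("Notes", "Other data"),
    ("Name", "Jane Smith"),
    ("Misc", "Data with spaces"),
]

records = group_records(pairs)
for record in records:
    print(record)
```

With the question's data this yields two records, one for John Smith and one for Jane Smith, rather than a single dict in which the second Name overwrites the first.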
