Python string splitting

2 votes

3 answers

832 views

Asked 2025-04-16 02:00

What is the best way in Python to split this information into address, city, state, and ZIP code?

<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>

In some cases the ZIP code has only five digits, like this:

 <div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634</div>

Tags: string processing, information extraction, data splitting

3 Answers

Combining BeautifulSoup with regular expressions should get you something like this:

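import re
from bs4 import BeautifulSoup  # BeautifulSoup 4; the bare `import BeautifulSoup` was version 3

thestring = r'<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>'
# Street address: everything up to the first tag.
re0 = re.compile(r'(?P<address>[^<]+)')
# City, two-letter state, and a ZIP code whose "-1234" suffix is optional.
re1 = re.compile(r'(?P<city>[^,]+), (?P<state>\w\w) (?P<zip>\d{5}(?:-\d{4})?)')
soup = BeautifulSoup(thestring, 'html.parser')
# soup.div.contents is [address text, <br/> tag, "city, state zip" text].
(address,) = re0.search(soup.div.contents[0]).groups()
city, state, zip_code = re1.search(soup.div.contents[2]).groups()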

Answered 2025-04-16 by Python大师


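A hint: using regular expressions to parse HTML is not the best approach. There are better options, such as Beautiful Soup.

Here is one reason why you shouldn't use regular expressions for this.

Edit: oh, never mind, @teepark already shared that link. :)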
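
For comparison, here is a minimal sketch that avoids regular expressions entirely. To stay runnable without third-party packages it swaps in the standard library's html.parser for Beautiful Soup. The names TextCollector and split_address are hypothetical, and the code assumes the div holds exactly two text nodes separated by the line break, with ", " between city and state.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of a small HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def split_address(html):
    # Assumption: exactly two text nodes, "street" and "city, ST zip".
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    address, locality = collector.chunks
    # Split "Chicago, IL 60634" at the first ", " and then on whitespace.
    city, _, state_zip = locality.partition(', ')
    state, zip_code = state_zip.split()
    return address, city, state, zip_code


print(split_address('<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634</div>'))
```

Because the ZIP code is simply whatever follows the state, both the five-digit and the ZIP+4 forms come through unchanged. Input that does not fit the assumed shape raises a ValueError at one of the unpacking steps.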

Answered 2025-04-16 by Python大师


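Depending on how strict or lenient you want to be about the details that can't be inferred from a single example, something like the following should work...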

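import re

thestring = '<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>'
# Groups: street address, city, two-letter state, ZIP code (the "-1234" suffix is optional).
s = re.compile(r'^<div.*?>([^<]+)<br.*?>([^,]+), (\w\w) (\d{5}(?:-\d{4})?)</div>$')
mo = s.match(thestring)
if mo is None:
    raise ValueError('No match for %r' % thestring)
address, city, state, zip_code = mo.groups()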
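
To check a pattern of this kind against both ZIP formats from the question, a small sketch with named groups may help. ADDRESS_RE and parse_address are hypothetical names, and the pattern assumes an uppercase two-letter state and a single line break between the street and the city.

```python
import re

# The "+4" suffix is optional so both ZIP forms from the question match.
ADDRESS_RE = re.compile(
    r'<div[^>]*>(?P<address>[^<]+)<br\s*/?>'
    r'(?P<city>[^,]+), (?P<state>[A-Z]{2}) (?P<zip>\d{5}(?:-\d{4})?)</div>'
)


def parse_address(html):
    """Return a dict with address, city, state and zip, or raise ValueError."""
    mo = ADDRESS_RE.search(html.strip())
    if mo is None:
        raise ValueError('No match for %r' % html)
    return mo.groupdict()


samples = [
    '<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>',
    ' <div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634</div>',
]
for sample in samples:
    print(parse_address(sample))
```

Returning a dict keyed by group name avoids depending on the order of the groups, which makes it easier to loosen or tighten the pattern later.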

Answered 2025-04-16 by Python大师

