How to scrape mobile phone numbers with BeautifulSoup

Posted 2024-04-23 10:55:29


I only want to scrape mobile phone numbers in the following format:

+1 NXX-NXX-XXXX

N=digits 2–9, X=digits 0–9

+1 is the country code shared by the US and 17 other countries, e.g., Canada and the Caribbean islands.

Suppose we need to find every number whose first NXX is one of a set of prefixes we already have, such as 986 and 965.

This is my code for getting email addresses:

    email = soup(text=re.compile(r'[A-Za-z0-9\.\+_-]+@[A-Za-z0-9\._-]+\.[a-zA-Z]*'))

    _emailtokens = str(email).replace("\\t", "").replace("\\n", "").split(' ')

    if len(_emailtokens):
        print([match.group(0) for token in _emailtokens for match in [re.search(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", str(token.strip()))] if match])

But I need the equivalent for mobile phone numbers.


1 answer
Anonymous user
#1 · Posted 2024-04-23 10:55:29

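Assuming you have already written a scraper that stores your number strings (mobile and non-mobile) in a list (judging by your code, you have most likely already split the numbers into a list), the regular-expression snippet below may help.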
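The answer takes the list of strings as given. As a rough sketch of how such a list might be built, assuming the page text has already been pulled out of the parsed document (for example with BeautifulSoup's `soup.get_text()`), the hypothetical helper `candidate_numbers` collects everything shaped like a ten-digit number and leaves the N/X validation to the pattern that follows. The sample `page_text` is made up for illustration.

```python
import re

# Anything shaped like DDD-DDD-DDDD, optionally preceded by "+1 ".
# The lookarounds keep the pattern from matching inside a longer digit run.
CANDIDATE_RE = re.compile(r"(?<![0-9])(?:\+1 )?([0-9]{3}-[0-9]{3}-[0-9]{4})(?![0-9])")

def candidate_numbers(text):
    """Return the DDD-DDD-DDDD part of every candidate found in `text`."""
    # findall returns only the captured group, so the "+1 " is dropped.
    return CANDIDATE_RE.findall(text)

# Stand-in for the text of a scraped page (assumption, not real data).
page_text = "Call +1 986-233-8901 or 965-345-8745. Fax: 123-456-7890, ref 1234."

sent = candidate_numbers(page_text)
print(sent)
# ['986-233-8901', '965-345-8745', '123-456-7890']
```

The candidates are deliberately loose here: invalid numbers such as 123-456-7890 stay in the list and are rejected later by the stricter pattern.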

Code

    import re

    # Target format: NXX-NXX-XXXX, where N is a digit 2-9 and X is a digit 0-9.
    # The first NXX must be 986 or 965.

    # The lookahead validates the whole string; the named groups record which prefix matched.
    # Give the groups sensible names; "hello" and "world" are only for demonstration.
    pattern = r'(?=[2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}$)((?P<hello>986.+)|(?P<world>965.+))'

    sent = ['986-233-8901', '965-345-8745', '123-456-7890', '986-134-5987', '1234', '$5@67^73']
    # Expected: match, match, None, None, None, None

    regexp = re.compile(pattern)

    # match() anchors at the start of each string; change to regexp.search() if needed.
    result = [regexp.match(item) for item in sent]

    # Retrieve the numbers with prefix 986 (group "hello").
    # A string matched through the 965 branch yields None here, because "hello" did not take part.
    hello_group = [item.group('hello') for item in result if item is not None]

Output

    print(result)
    # [<re.Match object; span=(0, 12), match='986-233-8901'>, <re.Match object; span=(0, 12), match='965-345-8745'>, None, None, None, None]

    print(hello_group)
    # ['986-233-8901', None]
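The named groups above hard-code two prefixes. Since the question mentions a whole set of prefixes, one possible generalization (a sketch, not part of the original answer) builds the alternation from any collection of prefixes and records the matched one in a single group. `build_pattern` and `group_by_prefix` are hypothetical names, and the optional `+1 ` handling is an assumption based on the format given in the question.

```python
import re

def build_pattern(prefixes):
    """Compile a regex for NXX-NXX-XXXX whose first NXX is one of `prefixes`."""
    # Keep only well-formed prefixes: three digits, the first in 2-9.
    valid = sorted(p for p in prefixes if re.fullmatch(r"[2-9][0-9]{2}", p))
    if not valid:
        raise ValueError("no valid NXX prefixes given")
    alternation = "|".join(valid)
    # Optional "+1 " country code, then the prefix, the second NXX and XXXX.
    return re.compile(
        rf"(?:\+1 )?(?P<prefix>{alternation})-[2-9][0-9]{{2}}-[0-9]{{4}}"
    )

def group_by_prefix(numbers, prefixes):
    """Map each prefix to the list of fully valid numbers that start with it."""
    pattern = build_pattern(prefixes)
    grouped = {}
    for number in numbers:
        match = pattern.fullmatch(number)  # the whole string must match
        if match:
            grouped.setdefault(match.group("prefix"), []).append(number)
    return grouped

sent = ['986-233-8901', '965-345-8745', '123-456-7890', '986-134-5987', '1234', '$5@67^73']
print(group_by_prefix(sent, {"986", "965"}))
# {'986': ['986-233-8901'], '965': ['965-345-8745']}
```

Using `fullmatch` replaces the lookahead-plus-`$` trick, and a dictionary keyed by prefix avoids the `None` entries that appear when one named group is read from a match on the other branch.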
