Trying to scrape a Yelp search results page for profile URLs

Posted 2024-05-23 21:11:09

Male | A programmer who enjoys writing Python code.

I am trying to use BeautifulSoup to scrape the profile URLs from a Yelp search results page. This is the code I have so far:

import requests
from bs4 import BeautifulSoup

url = "https://www.yelp.com/search?find_desc=tree+-+removal+-+&find_loc=Baltimore+MD&start=40"
response = requests.get(url)
data = response.text
soup = BeautifulSoup(data, 'lxml')

# Writes every <a> tag that has an href, as a full tag string
for a in soup.find_all('a', href=True):
    with open(r'C:\Users\my.name\Desktop\Yelp-URLs.csv', "a") as f:
        print(a, file=f)

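This gives me every href link on the page, not just the profile URLs. Also, I am getting the full tag string (a class lemon…) when I only need the business profile URLs.

Please help.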


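1 Answer

User

#1 · Posted 2024-05-23 21:11:09

You can use select with a CSS attribute selector to restrict the matches to links whose href starts with /biz/, and write only the href attribute instead of the whole tag: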

# Open the file once, then write only the href of each /biz/ link
with open(r'/Users/my.name/Desktop/Yelp-URLs.csv', "a") as f:
    for a in soup.select('a[href^="/biz/"]'):
        print(a.attrs['href'], file=f)
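
The hrefs matched by that selector are site-relative paths, and the same business may be linked more than once on a page. As a rough sketch (not part of the original answer), the values could be post-processed before saving. The names `normalize_biz_links`, `write_urls`, `BASE_URL`, and the sample hrefs are hypothetical, and the sketch assumes it is acceptable to drop any query string from a link.

```python
import csv
from urllib.parse import urljoin, urlsplit

# Assumption: the relative hrefs are resolved against this origin.
BASE_URL = "https://www.yelp.com"


def normalize_biz_links(hrefs, base_url=BASE_URL):
    """Return absolute, de-duplicated /biz/ URLs in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        # Keep only the path, dropping any query string or fragment.
        path = urlsplit(href).path
        if not path.startswith("/biz/"):
            continue
        full_url = urljoin(base_url, path)
        if full_url not in seen:
            seen.add(full_url)
            result.append(full_url)
    return result


def write_urls(urls, csv_path):
    """Write one URL per row, opening the file only once."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows([url] for url in urls)


# Hypothetical hrefs, standing in for the a.attrs['href'] values
# collected from soup.select('a[href^="/biz/"]').
sample_hrefs = [
    "/biz/example-tree-service-baltimore?osq=tree+removal",
    "/biz/example-tree-service-baltimore",
    "/search?find_desc=tree",
    "/biz/another-tree-co-baltimore",
]
links = normalize_biz_links(sample_hrefs)
```

With real data, `write_urls(links, path)` would replace the `print(..., file=f)` loop. Note that this sketch opens the file in "w" mode, which overwrites it, whereas the code above appends with "a"; switch the mode if results from several pages should accumulate in one file.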
