Empty result set with Beautiful Soup

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Search page URL; the query is appended as a fragment after "#"
url = "http://query.nytimes.com/search/sitesearch/?action=click&contentCollection&region=TopBar&WT.nav=searchWidget&module=SearchSubmit&pgtype=sectionfront{data}"
html = urlopen(url.format(data="#" + '/san+diego/24hours'))
soup = BeautifulSoup(html.read().decode('utf-8'), "lxml")
# Look for the results list and the story entries inside it
section = soup.find("ol", class_='searchResultsList flush')
items = section.find_all('li', class_="story")
print(items)

1 answer

User

Answer #1 · posted 2024-05-16 19:28:00

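The HTML really does not contain the data. If you look at the Network tab in Chrome Developer Tools, you can see that the search results are fetched by an AJAX query to the following URL: http://query.nytimes.com/svc/add/v1/sitesearch.json?q=san%20diego&begin_date=24hoursago&facet=true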

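To find it, open the developer tools (try the View menu), select the Network tab, reload the page, and look around. XHR stands for XmlHttpRequest, which is what is now usually called an AJAX request. It means that some JavaScript on the page requested data from the server.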

The response is JSON, so you are actually in luck: that is much easier to work with than parsing HTML.
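As a rough sketch of what that could look like in Python 3, the snippet below builds the JSON URL quoted above and parses the response with the standard library. The helper names build_search_url and fetch_json are made up for illustration, and the shape of the returned JSON is not described in this thread, so the code only prints the top-level keys instead of assuming any field names. The endpoint may also have changed or been retired since this answer was written.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Endpoint copied from the Network tab observation in the answer above.
BASE_URL = "http://query.nytimes.com/svc/add/v1/sitesearch.json"


def build_search_url(query, begin_date, facet="true"):
    """Build the AJAX search URL (hypothetical helper, not an official API)."""
    params = {"q": query, "begin_date": begin_date, "facet": facet}
    # quote_via=quote encodes spaces as %20, matching the URL seen in dev tools.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def fetch_json(url, timeout=10):
    """Download the URL and decode the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_search_url("san diego", "24hoursago")
    data = fetch_json(url)
    # The response structure is an unknown here, so just inspect it first.
    if isinstance(data, dict):
        print(sorted(data.keys()))
    else:
        print(type(data))
```

Once you have seen the top-level keys, you can drill into the parsed dictionary directly; there is no need for BeautifulSoup at that point.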
