用python抓取NYTimes的搜索结果

2024-04-25 20:51:40 发布

您现在位置:Python中文网/ 问答频道 /正文

我正试图从纽约时报上搜集搜索结果。例如,我用这个开始我的刮削过程

url = "http://query.nytimes.com/search/sitesearch/?action=click&contentCollection&region=TopBar&WT.nav=searchWidget&module=SearchSubmit&pgtype=Homepage#/%22big+data%22/30days/articles/1/allauthors/oldest/"

但是,我可以使用python下载的html没有任何搜索结果。有没有什么方法,我可以访问html,就像我打开网页浏览器上的链接?你知道吗

下面是html的一部分,如果在web浏览器上打开链接,我可以“检查元素”:

<div class="searchResults" id="searchResults" style="display: none;">
    <ol class="searchResultsList flush" style="display: block;">
        <li class="story noThumb">
            <div class="element2">
                <h3>
                    <a href="http://www.nytimes.com/2014/07/16/technology/apple-and-ibm-in-broad-software-deal-for-businesses.html">Apple Joins With IBM on Business Software </a>
                </h3>
                <p class="summary">The applications, Mr. Cook said, will bring “<strong>big data</strong> analytics down to the fingertips” of Apple iPhone and iPad users in corporations. “IBM can&nbsp;...</p>
                <div class="storyMeta">
                    <span class="dateline">July 15, 2014</span> - 
                    <span class="byline">By BRIAN X. CHEN and STEVE LOHR</span> - 
                    <span class="section">Technology - article</span> - 
                    <span class="printHeadline">Print Headline: "Apple Joins With IBM on Business Software"</span>
                </div>
            </div>
        </li>
        <li class="story">

理想输出为:

<a href="http://www.nytimes.com/2014/07/16/technology/apple-and-ibm-in-broad-software-deal-for-businesses.html">Apple Joins With IBM on Business Software </a>

谢谢你!你知道吗


Tags: andindivcomhttpappleonhtml
1条回答
网友
1楼 · 发布于 2024-04-25 20:51:40

返回搜索结果的实际请求是^{}请求。用Python模拟它。你知道吗

使用^{}的示例:

import requests

url = 'http://query.nytimes.com/svc/cse/v2pp/sitesearch.json'
params = {
    'query': "big data",
    'date_range_lower': '30daysago',
    'pt': 'article',
    'sort_order': 'a'
}

response = requests.get(url, params=params)
data = response.json()
for result in data['results']['results']:
    print result.get('og:url')

印刷品:

http://www.nytimes.com/2014/07/15/upshot/politically-18-year-olds-look-a-lot-like-people-in-their-20s.html
http://www.nytimes.com/2014/07/15/business/vw-to-add-suv-production-to-chattanooga-plant.html
http://www.nytimes.com/2014/07/15/business/media/germany-1-world-cup-fever-1000.html
http://www.nytimes.com/2014/07/15/business/international/winding-road-ahead-for-us-europe-trade-talks.html
http://www.nytimes.com/2014/07/15/business/daily-stock-market-activity.html
http://www.nytimes.com/2014/07/14/business/international/airlines-step-up-investment-to-meet-passenger-growth.html
http://www.nytimes.com/2014/07/15/business/international/eurozone-industrial-production-drops.html
http://www.nytimes.com/2014/07/14/business/international/airline-passengers-weigh-in-with-online-reviews.html
http://www.nytimes.com/2014/07/16/technology/a-deluge-of-comment-on-net-rules.html
http://www.nytimes.com/2014/07/16/upshot/as-growth-in-health-care-spending-slows-asking-if-a-trend-will-last.html

相关问题 更多 >