My code seems to be getting an incorrect tweet count

twiturl = "http://search.twitter.com/search.json?q=" + urlinfo + "&rpp=99&page=15" + "&since_id=" + str(tweetdate)

for x in arg1:
    urlinfo = x[2]
    idnum = int(x[1])
    name = x[0]
    twiturl = "http://search.twitter.com/search.json?q=" + urlinfo + "&rpp=99&page=15" + "&since_id=" + str(tweetdate)
    response = urllib2.urlopen(twiturl)
    twitseek = simplejson.load(response)
    twitsearch = twitseek['results']
    tweets = [x['text'] for x in twitsearch]
    tweetlist = [tweets, name]
    namelist.append(tweetlist)

2 answers

User

Answer 1 · edited 2024-06-16 13:25:52

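The maximum number of results returned on a single page is 100. To get all of the results, you need to "paginate" through them using the next_page URL included in the response (see here for the documentation). You can then loop over the responses, following each one's next_page parameter until that parameter is no longer present (which indicates that you have collected all the results).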

import json
import urllib
import urllib2


# General query stub
url_stub = 'http://search.twitter.com/search.json'

# Parameters to pass
params = {
    'q': 'tennis',
    'rpp': 100,
    'result_type': 'mixed'
    }

# Variable to store our results
results = []

# Outside of our loop, we pull the first page of results
# The '?' is included in the 'next_page' parameter we receive
# later, so here we manually add it
resp = urllib2.urlopen('{0}?{1}'.format(url_stub, urllib.urlencode(params)))
contents = json.loads(resp.read())
results.extend(contents['results'])

# Now we loop until there is no longer a 'next_page' key in the response;
# the HTTPError handler below stops us once the result limit is reached
while 'next_page' in contents:

  # Print some progress information
  print 'Page {0}: {1} results'.format(
      contents['page'], len(contents['results']))

  # Capture the HTTPError that will appear once the results have maxed
  try:
    resp = urllib2.urlopen(url_stub + contents['next_page'])
  except urllib2.HTTPError:
    print 'No mas'
    break

  # Load our new contents
  contents = json.loads(resp.read())

  # Extend our results
  results.extend(contents['results'])

# Print out how many results we received
print len(results)
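The paging loop above can be pulled out into a function that takes the fetching step as a parameter, which makes it possible to check the next_page logic without hitting the network. This is a minimal sketch: `collect_all`, `fetch` and the canned `fake_pages` responses are hypothetical stand-ins, and the HTTPError handling from the loop above is left out. Against the real service, `fetch` could be something like `lambda q: json.loads(urllib2.urlopen(url_stub + q).read())`.

```python
def collect_all(fetch, first_query):
    """Gather 'results' from every page by following 'next_page' links.

    `fetch` is a hypothetical callable: it takes a query string such as
    '?q=tennis&rpp=100' and returns the decoded JSON dict for that page.
    """
    contents = fetch(first_query)
    results = list(contents['results'])
    # Keep going for as long as the response advertises another page
    while 'next_page' in contents:
        contents = fetch(contents['next_page'])
        results.extend(contents['results'])
    return results


# Hypothetical canned responses standing in for the real API
fake_pages = {
    '?q=tennis&rpp=100': {'page': 1, 'results': ['a', 'b'],
                          'next_page': '?page=2&q=tennis&rpp=100'},
    '?page=2&q=tennis&rpp=100': {'page': 2, 'results': ['c', 'd'],
                                 'next_page': '?page=3&q=tennis&rpp=100'},
    '?page=3&q=tennis&rpp=100': {'page': 3, 'results': ['e']},
}

all_results = collect_all(fake_pages.get, '?q=tennis&rpp=100')
print(all_results)  # ['a', 'b', 'c', 'd', 'e']
```

The last page has no 'next_page' key, so the loop ends there and every page's results are kept.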


User

Answer 2 · edited 2024-06-16 13:25:52

The Twitter Search API documentation states:

rpp (optional): The number of tweets to return per page, up to a max of 100.

and

page (optional): The page number (starting at 1) to return, up to a max of roughly 1500 results (based on rpp * page).
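Working through the rpp * page rule shows why the URL in the question (rpp=99&page=15) cannot return more than 99 tweets: it asks for page 15 only, which covers result positions 1387 to 1485 of the search and nothing before them. The sketch below does that arithmetic. `pages_needed` and `result_span` are illustrative helper names, and treating "roughly 1500" as exactly 1500 is an assumption.

```python
def pages_needed(rpp, max_results=1500):
    """Pages required to reach max_results at rpp tweets per page.

    max_results=1500 assumes the documented "roughly 1500" cap is exact.
    """
    return -(-max_results // rpp)  # ceiling division on integers


def result_span(rpp, page):
    """1-based (first, last) result positions that a given page covers."""
    first = (page - 1) * rpp + 1
    return first, page * rpp


# The question requests a single page: rpp=99&page=15
print(result_span(99, 15))  # (1387, 1485): one page of at most 99 tweets
print(pages_needed(100))    # 15 pages of 100 reach the cap
```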

So you should make multiple requests, each with a different page number, getting up to 100 tweets per request:

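import urllib, json

twiturl = "http://search.twitter.com/search.json?q=%s&rpp=100&page=%d"

def getmanytweets(topic):
    'Return a list of up to 1500 tweets (15 pages x 100 tweets per page)'
    results = []
    for page in range(1, 16):
        # URL-encode the topic so spaces, '#' etc. survive in the query string
        u = urllib.urlopen(twiturl % (urllib.quote(topic), page))
        data = u.read()
        u.close()
        t = json.loads(data)
        # Stop early once a page comes back empty: there are no more results
        if not t['results']:
            break
        results += t['results']
    return results

if __name__ == '__main__':
    import pprint
    pprint.pprint(getmanytweets('obama'))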
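The page URLs that `getmanytweets` requests can also be built with `urlencode`, which takes care of escaping the topic (spaces, `#`, and so on). This is a small sketch written to run under both Python 2 and Python 3; `page_urls` and its defaults (100 tweets per page, a 1500-result ceiling) are assumptions based on the documentation quoted above, not part of the original answer.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

SEARCH_URL = 'http://search.twitter.com/search.json'


def page_urls(topic, rpp=100, max_results=1500):
    """Return one search URL per page, enough pages to cover max_results tweets."""
    last_page = -(-max_results // rpp)  # ceiling division
    urls = []
    for page in range(1, last_page + 1):
        # A list of pairs keeps the parameter order fixed; urlencode escapes values
        query = urlencode([('q', topic), ('rpp', rpp), ('page', page)])
        urls.append(SEARCH_URL + '?' + query)
    return urls


urls = page_urls('obama')
print(len(urls))  # 15
print(urls[0])    # http://search.twitter.com/search.json?q=obama&rpp=100&page=1
```

Each URL can then be fetched in turn and its `results` list appended, as `getmanytweets` does.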
