While loop problem in a Python Twitter crawler

Posted 2024-05-23 16:25:40


I am still working on my Twitter crawler and have run into more problems. Please see the code below:

from BeautifulSoup import BeautifulSoup
import re
import urllib2

url = 'http://mobile.twitter.com/NYTimesKrugman'

def gettweets(soup):
    tags = soup.findAll('div', {'class' : "list-tweet"})#to obtain tweet of a follower
    for tag in tags: 
        print tag.renderContents()
        print ('\n\n')

def are_more_tweets(soup):#to check whether there is more than one page on mobile 
    links = soup.findAll('a', {'href': True}, {id: 'more_link'})
    for link in links:
        b = link.renderContents()
        test_b = str(b)
        if test_b.find('more'):
            return True
        else:
            return False

def getnewlink(soup): #to get the link to go to the next page of tweets on twitter 
    links = soup.findAll('a', {'href': True}, {id : 'more_link'})
    for link in links:
        b = link.renderContents()
        if str(b) == 'more':
            c = link['href']
            d = 'http://mobile.twitter.com' +c
            return d

def checkforstamp(soup): # the parser scans a webpage to check if any of the tweets are   
    times = soup.findAll('a', {'href': True}, {'class': 'status_link'})
    for time in times:
        stamp = time.renderContents()
        test_stamp = str(stamp)
        if test_stamp.find('month'): 
            return True
        else:
            return False


response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
gettweets(soup)
stamp = checkforstamp(soup)
tweets = are_more_tweets(soup)
print 'stamp' + str(stamp)
print 'tweets' +str (tweets)
while (not stamp) and tweets: 
    b = getnewlink(soup)
    print b
    red = urllib2.urlopen(b)
    html = red.read()
    soup = BeautifulSoup(html)
    gettweets(soup)
    stamp = checkforstamp(soup)
    tweets = are_more_tweets(soup)
print 'done' 

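The code works as follows. For a single user, NYTimesKrugman:
- I get all the tweets on one page (gettweets).
- As long as more tweets exist (are_more_tweets) and I have not yet collected a month's worth of tweets (checkforstamp), I get the link to the next page of tweets (getnewlink).
- I go to the next page of tweets (entering the while loop) and repeat the process until one of the conditions above is violated.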

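However, after a lot of testing I determined that the program never actually enters the while loop. This is strange, because the way my code is written, tweets should be True and stamp should be False. Yet I get the following result, and I am really confused: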

^{pr2}$

It would be great if someone could help. Why can't I get more than one page of tweets? Is the parsing in checkforstamp done incorrectly? Thanks.


2 answers
if test_stamp.find('month'): 

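This evaluates to True when month is not found, because find returns -1 when the substring is missing. Here it evaluates to False only when month is at the very start of the string, where its position is 0.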
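To make this concrete, here is a small sketch of what str.find returns in the three cases and how an if statement would treat each value. The sample strings are made up for illustration; they are not real output from the Twitter page.

```python
# Hypothetical timestamp strings, for illustration only.
samples = ["about 1 month ago", "month ago", "3 hours ago"]

# Pair each string with the index find() reports (-1 means "not found").
results = [(text, text.find('month')) for text in samples]

for text, pos in results:
    # bool(pos) is what `if text.find('month'):` actually tests.
    print("%r -> find=%d, truthy=%s" % (text, pos, bool(pos)))
```

Only the string that starts with month gives a false test, which is the opposite of what the original check intends.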

You need

^{pr2}$

or just

return test_stamp.find('month') != -1
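Another option, assuming only the presence of the substring matters and not its position, is Python's in operator, which yields a boolean directly. The helper name has_month below is hypothetical, and it takes a plain string rather than a soup object.

```python
def has_month(test_stamp):
    """Return True if 'month' occurs anywhere in the string."""
    return 'month' in test_stamp

print(has_month("about 1 month ago"))
print(has_month("3 hours ago"))
```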

Your checkforstamp function returns a defined, non-empty string:

return 'True'

So (not stamp) will always be False.

Change it to return a boolean, as are_more_tweets does:

^{pr2}$

and it should be fine.

For reference, see the documentation on boolean operations:

In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: False, None, numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets). All other values are interpreted as true.

...

The operator not yields True if its argument is false, False otherwise.
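A quick way to check these rules is to test a few values directly. This is only an illustrative sketch; the two lists are examples chosen here, not an exhaustive set.

```python
# Values the quoted rule lists as false.
falsy = [False, None, 0, 0.0, '', (), [], {}, set(), frozenset()]
# Values that may look "false-ish" but are interpreted as true.
truthy = ['True', 'False', -1, [0]]

all_falsy = all(not v for v in falsy)
all_truthy = all(bool(v) for v in truthy)

print(all_falsy)
print(all_truthy)
print(not 'True')   # a non-empty string is true, so `not` gives False
```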

Edit:

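The if tests have the same problem. Because find('substr') returns -1 when the substring is not found, str.find('substr') will be True in a boolean context when there is no match, according to the rule above.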

This is not the only place in your code where the problem occurs. Please review all of your tests.
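As a rough sketch of what the reviewed checks could look like, the version below works on plain lists of strings instead of BeautifulSoup tags, so link_texts, stamps, and pages are stand-ins and not part of the original code. It uses any(), so every item is examined rather than only the first one, and an empty list gives False instead of None.

```python
def are_more_tweets(link_texts):
    """True if any link text contains 'more'."""
    return any('more' in text for text in link_texts)

def checkforstamp(stamps):
    """True if any timestamp mentions 'month'."""
    return any('month' in stamp for stamp in stamps)

# Hypothetical pages as (timestamps, link texts), in place of fetched HTML.
pages = [
    (['2 hours ago', '3 days ago'], ['more']),
    (['2 weeks ago', 'about 1 month ago'], ['more']),
    (['2 months ago'], []),
]

visited = 0
for stamps, link_texts in pages:
    visited += 1
    stamp = checkforstamp(stamps)
    tweets = are_more_tweets(link_texts)
    # Inverse of the question's loop condition `(not stamp) and tweets`.
    if stamp or not tweets:
        break

print(visited)
```

With this sample data the loop stops on the second page, the first one whose timestamps mention a month.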
