在这个来自https://letterboxd.com/shesnicky/list/top-50-favourite-films/的html片段中,我尝试遍历所有不同的li标记并从“data target link”中获取信息,这样我就可以使用它创建一个新的链接,将我带到该电影的页面,但是每次尝试获取数据时,它只会返回no或沿这些行的错误。你知道吗
<li class="poster-container numbered-list-item" data-owner-rating="10"> <div class="poster film-poster really-lazy-load" data-image-width="125" data-image-height="187" data-film-slug="/film/donnie-darko/" data-linked="linked" data-menu="menu" data-target-link="/film/donnie-darko/" > <img src="https://s3.ltrbxd.com/static/img/empty-poster-125.c6227b2a.png" class="image" width="125" height="187" alt="Donnie Darko"/><span class="frame"><span class="frame-title"></span></span> </div> <p class="list-number">1</p> </li>
我将使用这些链接为一个twitter机器人抓取img,所以我尝试在代码中这样做:
class BotStreamer(tweepy.StreamListener):
print "Bot Streamer"
#on_data method of Tweepy’s StreamListener
#passes data from statuses to the on_status method
def on_status(self, status):
print "on status"
link = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
page = requests.get(link)
soup = BS(page.content, 'html.parser')
movies_ul = soup.find('ul', {'class':'poster-list -p125 -grid film-list'})
movies = []
for mov in movies_ul.find('data-film-slug'):
movies.append(mov)
rand = randint(0,51)
newLink = "https://letterboxd.com%s" % (str(movies[rand]))
newPage = requests.get(newLink)
code = BS(newPage.content, 'html.parser')
code_div = code.find\
('div', {'class':'react-component film-poster film-poster-51910 poster'})
image = code_div.find('img')
url = image.get('src')
username = status.user.screen_name
status_id = status.id
tweet_reply(url, username, status_id)
然而,我不断得到关于列表超出范围,或者不能在NoneType上迭代的错误。所以我做了一个prgrm测试,看看能不能得到数据:
import requests
from bs4 import BeautifulSoup as BS
link = 'https://letterboxd.com/shesnicky/list/top-50-favourite-films/'
page = requests.get(link)
soup = BS(page.content, 'html.parser')
movies_ul = soup.find('ul', {'class':'poster-list -p125 -grid film-list'})
more = movies_ul.find('li', {'class':'poster-container numbered-list-item'})
k = more.find('data-target-link')
print k
再说一遍,我只得到一个也没有。非常感谢您的帮助。你知道吗
读取doc:find(),因为第一个参数需要标记名,而不是属性。你知道吗
你可以这样做
或者
完整示例
相关问题 更多 >
编程相关推荐