Your friendly neighborhood web scraper
pyrobot: detailed description of the Python project
Homepage: http://pyrobot.readthedocs.org/
    import re
    from pyrobot import RoboBrowser

    # Browse to Rap Genius
    browser = RoboBrowser(history=True)
    browser.open('http://rapgenius.com/')

    # Search for Queen
    form = browser.get_form(action=re.compile(r'search'))
    form['q'].value = 'queen'
    browser.submit_form(form)

    # Look up the first song
    songs = browser.select('.song_name')
    browser.follow_link(songs[0])
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text  # \n[Intro]\nIs this the real life...

    # Back to results page
    browser.back()

    # Look up my favorite song
    browser.follow_link('death on two legs')
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text  # \n[Verse 1]\nYou suck my blood like a leech...
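Pyrobot combines two excellent Python libraries: Requests and BeautifulSoup. Pyrobot represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing the methods of both libraries: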
    import re
    from pyrobot import RoboBrowser

    browser = RoboBrowser(user_agent='a python robot')
    browser.open('https://github.com/')

    # Inspect the browser session
    browser.session.cookies['_gh_sess']    # BAh7Bzo...
    browser.session.headers['User-Agent']  # a python robot

    # Search the parsed HTML
    browser.select('div.teaser-icon')
    # [<div class="teaser-icon">
    # <span class="mega-octicon octicon-checklist"></span>
    # </div>,
    # ...
    browser.find(class_=re.compile(r'column', re.I))
    # <div class="one-third column">
    # <div class="teaser-icon">
    # <span class="mega-octicon octicon-checklist"></span>
    # ...
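One way a browser object could expose a session and a parsed document side by side is plain attribute delegation. The sketch below is only an illustration of that idea and is not taken from Pyrobot's source: `SketchBrowser`, `TinySession`, `TinyDocument` and `load` are hypothetical names, and the two stand-in classes use only the standard library in place of a Requests session and a BeautifulSoup document so the example runs without a network connection.

```python
import re
from html.parser import HTMLParser


class TinySession:
    """Hypothetical stand-in for a Requests session: it only stores headers."""

    def __init__(self, user_agent):
        self.headers = {'User-Agent': user_agent}


class TinyDocument(HTMLParser):
    """Hypothetical stand-in for a parsed BeautifulSoup document."""

    def __init__(self, html):
        super().__init__()
        self.tags = []  # (tag name, attribute dict) for every start tag seen
        self.feed(html)

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def find(self, class_):
        """Return the first (tag, attrs) whose class attribute matches the regex."""
        for tag, attrs in self.tags:
            if class_.search(attrs.get('class') or ''):
                return tag, attrs
        return None


class SketchBrowser:
    """Keeps a session and a parsed page, forwarding unknown attributes to the page."""

    def __init__(self, user_agent):
        self.session = TinySession(user_agent)
        self.parsed = None

    def load(self, html):
        # A real browser would fetch this over HTTP; here the HTML is passed in.
        self.parsed = TinyDocument(html)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so `session` and `parsed`
        # never reach this method.
        parsed = self.__dict__.get('parsed')
        if parsed is None:
            raise AttributeError(name)
        return getattr(parsed, name)


browser = SketchBrowser(user_agent='a python robot')
browser.load('<div class="one-third column"><span class="icon"></span></div>')
browser.session.headers['User-Agent']             # 'a python robot'
browser.find(class_=re.compile(r'column', re.I))  # ('div', {'class': 'one-third column'})
```

Session state stays reachable through an explicit `session` attribute, while anything the browser does not define itself falls through to the parsed page. Whether Pyrobot uses `__getattr__` or explicit wrapper methods is not stated in this description.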