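If you want to crawl a whole site, see this post. If you only want to process a few pages and analyze their content (that is, you already know the URLs to process), try BeautifulSoup, which lets you do something like this:

import urllib2
from BeautifulSoup import BeautifulSoup  # assumes BeautifulSoup 3 on Python 2; with version 4 use: from bs4 import BeautifulSoup

page = urllib2.urlopen(url)  # url: the address of a page you want to process
soup = BeautifulSoup(page.read())
for f in soup.findAll('form'):
    target_url = f.get('action')  # None if the form has no action attribute
    # do something with each one of the forms

You can use Scrapy:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.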
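The BeautifulSoup snippet above relies on urllib2, which only exists on Python 2. If you are on Python 3 and would rather not install anything, the same idea (collecting the action of every form on a page you already know the URL of) can be roughly sketched with the standard library. This is not BeautifulSoup or Scrapy; it swaps in html.parser, and the names FormActionParser, form_actions and fetch_form_actions are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FormActionParser(HTMLParser):
    """Collect the action target of every <form> tag (illustrative helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <FORM> is matched as well.
        if tag != "form":
            return
        action = dict(attrs).get("action")
        # A form without an action submits to the page's own URL.
        self.actions.append(urljoin(self.base_url, action or ""))


def form_actions(html, base_url=""):
    """Return the form targets found in an HTML string, resolved against base_url."""
    parser = FormActionParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.actions


def fetch_form_actions(url):
    """Download one page and return its form targets."""
    with urlopen(url) as page:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = page.headers.get_content_charset() or "utf-8"
        html = page.read().decode(charset, errors="replace")
    return form_actions(html, base_url=url)
```

Calling fetch_form_actions("http://example.com/") (a placeholder URL) returns a list of absolute form targets for that page. The standard-library parser is less forgiving of broken markup than BeautifulSoup, so treat this as a starting point for simple pages only.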