Python web scraper for Facebook recommend counts
I am trying to write a web scraper in Python that prints the number of Facebook recommends for a page. For example, this Sky News article (http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine) has about 60 Facebook recommends, and I want my Python program to scrape and print that number.
I tried the following, but the program prints nothing:
import requests
from bs4 import BeautifulSoup

def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    # if you want to gather information from that page
    for item_name in soup.findAll('span', {'class': 'pluginCountTextDisconnected'}):
        try:
            print(item_name.string)
        except:
            print("error")

get_single_item_data("http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine")
2 Answers
Facebook recommends are loaded dynamically with JavaScript, so your HTML parser cannot see them in the fetched page. You need to use the Graph API and FQL to get the information you want directly from Facebook.
There is a web console where you can explore these queries once you have generated an access token.
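As a minimal sketch of the Graph API approach: historically the endpoint `https://graph.facebook.com/?id=<url>` returned share data for a URL (current API versions require an access token, and the exact response fields vary by version, so treat both the endpoint and the field names as assumptions). Building the request with `requests` and URL-encoding the article address looks like this:

```python
import requests

def build_share_query(page_url):
    # Build the Graph API request without sending it, so the final
    # URL (with the article address percent-encoded) can be inspected.
    req = requests.Request(
        'GET', 'https://graph.facebook.com/', params={'id': page_url}
    ).prepare()
    return req.url

query = build_share_query(
    'http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine'
)
print(query)
```

Sending `requests.get(query)` would then return a JSON document that can be decoded with `.json()`; which key holds the recommend count depends on the API version in use.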
Facebook loads the recommends inside an iframe. You can follow the iframe's src attribute to the embedded page and read the span with the class pluginCountTextDisconnected from it:
import requests
from bs4 import BeautifulSoup

url = 'http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine'
r = requests.get(url)  # get the page through requests
soup = BeautifulSoup(r.text, 'html.parser')  # create a BeautifulSoup object from the page's HTML
url = soup('iframe')[0]['src']  # search for the iframe element and get its src attribute
r = requests.get('http://' + url[2:])  # get the next page from requests with the iframe URL
soup = BeautifulSoup(r.text, 'html.parser')  # create another BeautifulSoup object
print(soup.find('span', class_='pluginCountTextDisconnected').string)  # get the desired information
The second requests.get is written that way because the src attribute returns a protocol-relative URL: //www.facebook.com/plugins/like.php?href=http%3A%2F%2Fnews.sky.com%2Fstory%2F1330046&send=false&layout=button_count&width=120&show_faces=false&action=recommend&colorscheme=light&font=arial&height=21. I prepended http:// and skipped the leading //.
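Instead of slicing the string by hand, a protocol-relative src can also be resolved with the standard library's urljoin, which takes the scheme from the page the iframe was found on (a small sketch, not part of the original answer):

```python
from urllib.parse import urljoin

page_url = 'http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine'
iframe_src = '//www.facebook.com/plugins/like.php?href=http%3A%2F%2Fnews.sky.com%2Fstory%2F1330046'

# urljoin keeps the page's scheme (http) and resolves the leading //
full_url = urljoin(page_url, iframe_src)
print(full_url)  # an absolute URL starting with http://www.facebook.com/
```

This also keeps working if the site is later served over https, since urljoin would then produce an https:// URL without any code change.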