How do I scrape the "More" section of a Quora profile page?

1 vote

1 answer

674 views

Asked 2025-04-17 03:26

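To find all the topics on Quora, I decided to start by scraping a profile page that follows a lot of topics, for example http://www.quora.com/Charlie-Cheever/topics. I have scraped the topics from that page, but now I need to scrape the topics from the Ajax page that is loaded when you click the "More" button at the bottom of the page. I am trying to find the JavaScript function that runs when the "More" button is clicked, but so far I have not found it. Here are three snippets from the page's HTML that may be related: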

<div class=\"pager_next action_button\" id=\"__w2_mEaYKRZ_more\">More</div>
{\"more_button\": \"mEaYKRZ\"}

\"dPs6zd5\": {\"more_button\": \"more_button\"}

new(PagedListMoreButton)(\"mEaYKRZ\",\"more_button\",{},\"live:ld_c5OMje_9424:cls:a.view.paged_list:PagedListMoreButton:/TW7WZFZNft72w\",{})

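Does anyone know the name of the JavaScript function that is executed when the "More" button is clicked? Any help is much appreciated :)

At the moment the Python script (based on this tutorial) looks like this: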

#!/usr/bin/python
# Python 2 script (BeautifulSoup 3): just prints the topics followed by
# Charlie Cheever from the first page.
import httplib2
from BeautifulSoup import BeautifulSoup

# HTTP client that caches responses in the local ".cache" directory
SCRAPING_CONN = httplib2.Http(".cache")


def fetch(url, method="GET"):
    # Returns a (response, content) tuple
    return SCRAPING_CONN.request(url, method)


def extractTopic(s):
    # Build a dict with the absolute URL and the name of one topic link
    d = {}
    d['url'] = "http://www.quora.com" + s['href']
    d['topicName'] = s.findChildren()[0].string
    return d


def fetch_stories():
    page = fetch(u"http://www.quora.com/Charlie-Cheever/topics")
    soup = BeautifulSoup(page[1])  # page[1] is the response body
    stories = soup.findAll('a', 'topic_name')
    topics = [extractTopic(s) for s in stories]
    for t in topics:
        print u"%s, %s\n" % (t['topicName'], t['url'])


# fetch_stories() only prints and returns nothing, so its result is not stored
fetch_stories()

javascript  ajax  web-parsing  web-scraping  front-end  dynamic-content-loading  quora  topic-scraping

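1 Answer

You can find it in your browser's DOM inspector by looking at the element's event listeners. It is an anonymous function that looks like this: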

function (){return typeof d!=="undefined"&&!d.event.triggered?d.event.handle.apply(l.elem,arguments):b}

This site does not look easy to scrape, so you might consider using Selenium.

Answered 2025-04-17 by Python大师
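For what it's worth, here is a minimal sketch of the Selenium route: let a real browser run the page's JavaScript, keep clicking "More" until the button is gone, then read the topic links. The selectors are guesses taken from the question (`div.pager_next` from the HTML snippet, `a.topic_name` from the script) and may no longer match Quora's markup. The names `load_all_topics` and `main`, the `pause` and `max_clicks` values, and the choice of Firefox are all assumptions made for illustration.

```python
import time

# Assumed selectors, taken from the snippets in the question.
MORE_BUTTON = "div.pager_next"
TOPIC_LINK = "a.topic_name"


def load_all_topics(driver, by, pause=1.0, max_clicks=200):
    """Click "More" until it is gone, then return (name, url) pairs.

    `driver` is a Selenium WebDriver and `by` is the locator strategy
    (By.CSS_SELECTOR). `pause` and `max_clicks` are arbitrary safety values.
    """
    for _ in range(max_clicks):
        # find_elements returns an empty list when nothing matches
        buttons = [b for b in driver.find_elements(by, MORE_BUTTON)
                   if b.is_displayed()]
        if not buttons:
            break
        buttons[0].click()
        time.sleep(pause)  # crude wait for the Ajax response to be rendered
    links = driver.find_elements(by, TOPIC_LINK)
    return [(a.text, a.get_attribute("href")) for a in links]


def main():
    # Imported here so the helper above can be reused without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get("http://www.quora.com/Charlie-Cheever/topics")
        for name, url in load_all_topics(driver, By.CSS_SELECTOR):
            print("%s, %s" % (name, url))
    finally:
        driver.quit()
```

Call `main()` to run it against the live page. The fixed `time.sleep` is the simplest thing that could work; an explicit wait on the number of topic links would probably be more robust, and the page may also require being logged in.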
