How to fetch 10k records per day from YouTube using the Google API

Posted 2024-04-23 14:24:14

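I am using the Google API to fetch data from the YouTube Data API v3. Based on a search keyword, I am trying to scrape fields such as likeCount, viewCount, dislikeCount, and so on.

The problem is that a single request returns at most 50 records. I need more than that, which should be possible with pagination.

As of January 11, 2019, Google reduced the daily limit from 1 million records to 10k. To request 10k records per day I need to paginate, and I don't know how to set up pagination in my code.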
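A rough sketch of the arithmetic behind that limit may help here. As far as the public documentation describes it, the daily allowance is counted in quota units rather than in records. The figures below (100 units per `search().list` call, 1 unit per `videos().list` call, a default budget of 10,000 units per day) are assumptions taken from the published quota table at the time of writing, and `estimate_units` is a hypothetical helper, not part of any library.

```python
import math

# Assumed costs; check the current quota documentation before relying on them.
SEARCH_LIST_COST = 100  # units per search().list call (assumption)
VIDEOS_LIST_COST = 1    # units per videos().list call (assumption)
PAGE_SIZE = 50          # largest maxResults value a single page accepts


def estimate_units(records, batch_video_calls=True):
    """Estimate the quota units needed to search for `records` videos
    and then read their statistics."""
    search_calls = math.ceil(records / PAGE_SIZE)
    # Batched: one videos().list call per page of 50 IDs.
    # Unbatched: one videos().list call per video, as in the question's code.
    video_calls = search_calls if batch_video_calls else records
    return search_calls * SEARCH_LIST_COST + video_calls * VIDEOS_LIST_COST


print(estimate_units(50))                               # 101
print(estimate_units(10_000))                           # 20200
print(estimate_units(10_000, batch_video_calls=False))  # 30000
```

Under these assumptions, collecting 10k records through search alone would cost about twice a 10,000-unit daily budget, so pagination by itself may not be enough without a larger quota.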

from googleapiclient.discovery import build  # "apiclient" is a legacy alias of this package
import argparse
import csv
import unidecode

DEVELOPER_KEY = "xxxxxxx"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

def youtube_search(options):

    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)

    search_response = youtube.search().list(q=options.q,part="id,snippet",maxResults=options.max_results).execute()

    videos = []
    channels = []
    playlists = []

    csvFile = open('checking_for_no_of_records.csv', 'w', newline='')  # newline='' avoids blank rows on Windows (Python 3)
    csvWriter = csv.writer(csvFile)
    csvWriter.writerow(["title","videoId","viewCount","likeCount","dislikeCount", "commentCount","favoriteCount"])

    for search_result in search_response.get("items", []):
        if search_result["id"]["kind"] == "youtube#video":
            title = search_result["snippet"]["title"]
            title = unidecode.unidecode(title)
            videoId = search_result["id"]["videoId"]
            video_response = youtube.videos().list(id=videoId,part="statistics").execute()
            for video_result in video_response.get("items", []):
                # Some videos hide individual counters, so default each one to 0.
                stats = video_result["statistics"]
                viewCount = stats.get("viewCount", 0)
                likeCount = stats.get("likeCount", 0)
                dislikeCount = stats.get("dislikeCount", 0)
                commentCount = stats.get("commentCount", 0)
                favoriteCount = stats.get("favoriteCount", 0)
                # Write inside the loop so a video without statistics is skipped
                # instead of reusing the values of the previous row.
                csvWriter.writerow([title, videoId, viewCount, likeCount, dislikeCount, commentCount, favoriteCount])

    csvFile.close()  


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--q', help='Search term', default='Google')
  parser.add_argument('--max-results', help='Max results', type=int, default=50)
  args = parser.parse_args()

  youtube_search(args)

With the code above I can only get 50 records, but I need to get 10k records per day.
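One possible way to paginate is to follow the `nextPageToken` field that each search response carries and pass it back as `pageToken` on the next request. The sketch below assumes a `youtube` client built with `build(...)` as in the code above. `search_video_ids` and `fetch_statistics` are hypothetical helper names, and `max_total=500` is an arbitrary illustrative default. The second helper requests statistics for up to 50 IDs per `videos().list` call instead of one call per video.

```python
def search_video_ids(youtube, query, max_total=500, page_size=50):
    """Follow nextPageToken until max_total video IDs are collected
    or the API stops returning pages."""
    video_ids = []
    page_token = None
    while len(video_ids) < max_total:
        params = {"q": query, "part": "id", "type": "video", "maxResults": page_size}
        if page_token:
            params["pageToken"] = page_token  # continue after the previous page
        response = youtube.search().list(**params).execute()
        items = response.get("items", [])
        video_ids.extend(item["id"]["videoId"] for item in items)
        page_token = response.get("nextPageToken")
        if not page_token or not items:  # no further pages for this query
            break
    return video_ids[:max_total]


def fetch_statistics(youtube, video_ids, batch_size=50):
    """Return {videoId: statistics dict}, asking for batch_size IDs per call."""
    stats_by_id = {}
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]
        # videos().list accepts a comma-separated list of IDs.
        response = youtube.videos().list(id=",".join(batch), part="statistics").execute()
        for item in response.get("items", []):
            stats_by_id[item["id"]] = item["statistics"]
    return stats_by_id
```

With the client from the question, `ids = search_video_ids(youtube, options.q)` followed by `fetch_statistics(youtube, ids)` would replace the single `search().list` call and the per-video lookups. Each page still counts against the daily quota, so the number of pages that can be fetched per day depends on the quota granted to the API key.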

