Scraping the Analytics Vidhya website to get its courses, their names, and their total review counts

Posted 2024-05-12 23:07:32


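I am scraping the Analytics Vidhya website to get its courses, the course names, and the total number of reviews for each course. Getting the courses is no problem, but I am having trouble getting the course names and their review totals.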

Here is my code:


    import requests
    from bs4 import BeautifulSoup

    for page in range(1,5):
        url = "https://courses.analyticsvidhya.com/collections?category=courses&page="+str(page)
        page_request = requests.get(url)
        data = page_request.content
        soup = BeautifulSoup(data,"html.parser")
        for courses in soup.find_all('div', {'class': 'collections__product-cards collections__product-cards___0b9ab'}):
            for course_name in soup.find_all('ul', {'class': 'products__list'}):
                for names in soup.find_all('li', {'class': 'products__list-item'}):
                    for divs in soup.find_all('div', {'class':'course-card__body'}):
                        for revs in soup.find_all('div', {'class': 'course-card__reviews'}):
                            reviews = soup.find('span', {'class': 'review__stars-count'})
                    title = soup.find('h3')
                    review = reviews.text
                    course_title = title.text
                    print(course_title + " "+str(review) +" "+ "https://courses.analyticsvidhya.com"+ names.find('a')['href'])

The problem when I run this Python script is that it always prints the same course title (course name) and the same review count.


1 Answer
User
#1 · Posted 2024-05-12 23:07:32

    import requests
    from bs4 import BeautifulSoup

    for page in range(1,6):
        url = "https://courses.analyticsvidhya.com/collections?category=courses&page="+str(page)
        page_request = requests.get(url)
        data = page_request.content
        soup = BeautifulSoup(data,"html.parser")
        for courses in soup.find_all('div', {'class': 'collections__product-cards collections__product-cards___0b9ab'}):
            for names in courses.find_all('li', {'class': 'products__list-item'}):
                for divs in names.find_all('div', {'class':'course-card__body'}):
                    title = divs.find_all('h3')
                    for revs in divs.find_all('div', {'class': 'course-card__reviews'}):
                        rev=revs.find_all('span', {'class': 'review__stars-count'})
                        for i,j in zip(title,rev):
                            title =i.text
                            review=j.text
                            print(title + " "+str(review) +" "+ "https://courses.analyticsvidhya.com"+ names.find('a')['href'])

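I made a few edits to the code, and it now scrapes the course name, the review count, and the link. The main change is that each inner find_all is called on the enclosing element (courses, names, divs) instead of on soup, so every search is limited to a single course card rather than the whole page. Calling soup.find always returns the first match in the document, which is why the original script printed the same title and review count every time.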
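To see the difference without hitting the network, here is a minimal sketch that runs the same idea against a small inline HTML string. The markup in SAMPLE_HTML, the helper name extract_courses, and the BASE_URL constant are my own assumptions for illustration; they only mimic the class names used above, and the real page structure may differ. It also returns None for a card that has no review count instead of skipping it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the class names used above;
# the real Analytics Vidhya page may be structured differently.
SAMPLE_HTML = """
<ul class="products__list">
  <li class="products__list-item">
    <a href="/courses/course-a">
      <div class="course-card__body"><h3>Course A</h3>
        <div class="course-card__reviews"><span class="review__stars-count">(12)</span></div>
      </div>
    </a>
  </li>
  <li class="products__list-item">
    <a href="/courses/course-b">
      <div class="course-card__body"><h3>Course B</h3></div>
    </a>
  </li>
</ul>
"""

BASE_URL = "https://courses.analyticsvidhya.com"


def extract_courses(html):
    """Return a (title, review_count, link) tuple for each course card in html."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.find_all("li", class_="products__list-item"):
        # Search inside this one card only, not the whole document.
        title_tag = item.find("h3")
        review_tag = item.find("span", class_="review__stars-count")
        link_tag = item.find("a", href=True)
        results.append((
            title_tag.get_text(strip=True) if title_tag else None,
            review_tag.get_text(strip=True) if review_tag else None,
            BASE_URL + link_tag["href"] if link_tag else None,
        ))
    return results


rows = extract_courses(SAMPLE_HTML)
for row in rows:
    print(row)

# For contrast: searching from the top-level soup returns the first match
# in the whole document, no matter which card a loop is currently on.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_title = soup.find("h3").get_text(strip=True)
```

To use this on the live site, you could pass requests.get(url).content to extract_courses for each page, as in the loop above.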
