Scraping reviews that contain "More" text

Posted 2024-05-16 14:18:49

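As the title says, I need help scraping reviews from TripAdvisor. The specific link I am using is https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html

The problem is that some reviews are truncated and show a "More" link that reveals the rest of the text (for example, the second review on the page above). How can I scrape the full text of the reviews that contain this "More" link?

Is there a way to click these links so that they expand, or is it a matter of finding the right tag that holds the whole review?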


Tags: https, title, website, link, www, review, restaurant review

2 answers

Use Selenium and Beautiful Soup. Check whether a "More" button is present, click it if so, and then grab the page source.

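from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

URL = 'https://www.tripadvisor.co.uk/Restaurant_Review-g60834-d4106745-Reviews-McDonald_s-Page_Arizona.html'
MORE_XPATH = "//span[@class='taLnk ulBlueLinks'][contains(.,'More')]"

driver = webdriver.Chrome()
driver.get(URL)

# Selenium 4 syntax: find_elements_by_xpath was removed in Selenium 4.3.
# find_elements returns an empty list when nothing matches.
more_links = driver.find_elements(By.XPATH, MORE_XPATH)
if more_links:
    # Click the first "More" link so the truncated text is expanded
    more_links[0].click()

time.sleep(3)  # crude fixed wait for the expanded text to load
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Each review body sits in a <p class="partial_entry"> element
items = [item.text for item in soup.select("p.partial_entry")]
print(items)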

Output:

['Stopped by to get some chicken strips to go.  They were out of soft drinks, but I was getting coffee.  Restrooms were clean.', "We live in page Arizona and go to McDonald's on the occasion that we don't want to cook but almost every time that we stop in the service is horrible. There has been times where the drive thru would not say anything to us until we decided to drive back around to really let them know we were ready to order food. The manager whom i have talked to on multiple occasions acts like it's bo big deal that their restaurant shows no respect for the customers. Finally i decided to write a review before calling corporate. I understand not wanting or liking your job at McDonald's but you made the life decisions to be where you are the least you could do is show some respect for your customers especially the locals of this tourist town.", 'The location was newer, clean and kept up very well. The hot fudge sundaes were great . Stopped by for a snack', 'We stopped in to grab a little snack before heading to Horseshoe Bend. My husband got a double cheeseburger, I ordered an apple pie. His burger was fine. The apples in the pie were all shriveled up. It looked old. I looked at the time on the box and it had expired 4 hours before. I walked back in and asked for a new one, explaining the one they just gave me was quite old. Then he handed me one and said try this one. I looked at the date and it expired 2 hours before. I asked if the had any fresh ones. He went into the back for awhile and came out with a new one.', 'I like the coffee, there was few times they messed up coffee 3x in a row. but its okay i had patience for them to get it right. I only like their fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. 
clean tables but rude managers', 'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go ask for a refund or compensation but the manager did not want He said if I refund you ,you will not have your mealI find that not acceptable to wait that long and Big Macs were coldI am a big traveller and never saw a Manager like that Don’t go there Go to Taco Bell ...', "the employees were very fast and efficient at the service they provided whilst giving me my food. McDonald's is always reliable whenever you want a quick snack.", "It is a newer looking location with a huge amount of parking. The dining area was very large and quite clean. The service was very good. The food was just like any other McD's.", 'win i eat at the best restaurant the meals are the best i love the fries it gives me taste of joy . i like to eat their again i like to eat their win im on the road and i like to never stop eating its my great place to eat', "This is a new facility in what looks like a newer area of Page. Typical McDonald's but great service and new building makes this a good stop if you are looking for a quick fill up."]

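As it stands, you cannot get the full text of a truncated review from the restaurant page, because it is not included in that page's HTML.

You can get it this way:

  • Scrape the page
  • Find all reviews
  • If a review has a "More" link:
    • Get its id
    • Scrape the "review URL"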

Code:

import requests
from bs4 import BeautifulSoup as soup
from pprint import pprint

website = "https://www.tripadvisor.co.uk/"
r_review_str = "Restaurant_Review-"
u_review_str = "ShowUserReviews-"
restaurant_id = "g60834-d4106745"
restaurant_name = "McDonald_s-Page_Arizona"

# Restaurant page that lists the (possibly truncated) reviews
base_url = website + r_review_str + restaurant_id + "-Reviews-" + restaurant_name + ".html"
req = requests.get(base_url)
page = soup(req.text, 'html.parser')

reviews_text = []
reviews = page.find_all('div', {'class': 'reviewSelector'})
for r in reviews:
    # The div id looks like "review_<number>"; keep only the number
    r_id = r.get('id').replace('review_', '')
    p_text = r.find('p', {'class': 'partial_entry'})
    text = ""
    if p_text.find('span', {'class': 'ulBlueLinks'}):
        # A "More" link is present: fetch the review's own page for the full text
        url = website + u_review_str + restaurant_id + "-r" + r_id + "-" + restaurant_name + ".html"
        req_u = requests.get(url)
        page_u = soup(req_u.text, "html.parser")
        text = page_u.find('div', {'id': 'review_' + r_id}).find('p', {'class': 'partial_entry'}).text
    else:
        # No "More" link: the snippet already holds the whole review
        text = p_text.text
    reviews_text.append(text)

pprint(reviews_text)

Output:

['Stopped by to get some chicken strips to go.  They were out of soft drinks, '
 'but I was getting coffee.  Restrooms were clean.',
 "We live in page Arizona and go to McDonald's on the occasion that we don't "
 'want to cook but almost every time that we stop in the service is horrible. '
 'There has been times where the drive thru would not say anything to us until '
 'we decided to drive back around to really let them know we were ready to '
 'order food. The manager whom i have talked to on multiple occasions acts '
 "like it's bo big deal that their restaurant shows no respect for the "
 'customers. Finally i decided to write a review before calling corporate. I '
 "understand not wanting or liking your job at McDonald's but you made the "
 'life decisions to be where you are the least you could do is show some '
 'respect for your customers especially the locals of this tourist town.',
 'The location was newer, clean and kept up very well. The hot fudge sundaes '
 'were great . Stopped by for a snack',
 'We stopped in to grab a little snack before heading to Horseshoe Bend. My '
 'husband got a double cheeseburger, I ordered an apple pie. His burger was '
 'fine. The apples in the pie were all shriveled up. It looked old. I looked '
 'at the time on the box and it had expired 4 hours before. I walked back in '
 'and asked for a new one, explaining the one they just gave me was quite old. '
 'Then he handed me one and said try this one. I looked at the date and it '
 'expired 2 hours before. I asked if the had any fresh ones. He went into the '
 'back for awhile and came out with a new one.',
 'I like the coffee, there was few times they messed up coffee 3x in a row. '
 'but its okay i had patience for them to get it right. I only like their '
 'fries, coffee, and a very few sandwiches. plus the nuggets. clean restrooms. '
 'clean tables but rude managers',
 'Ordered mg nuggets and Big Mac for two and waited 25 minutes I decided to go '
 'ask for a refund or compensation but the manager did not want He said if I '
 'refund you ,you will not have your mealI find that not acceptable to wait '
 'that long and Big Macs were coldI am a big traveller and never saw a Manager '
 'like that Don’t go there Go to Taco Bell ...',
 'the employees were very fast and efficient at the service they provided '
 "whilst giving me my food. McDonald's is always reliable whenever you want a "
 'quick snack.',
 'It is a newer looking location with a huge amount of parking. The dining '
 'area was very large and quite clean. The service was very good. The food was '
 "just like any other McD's.",
 'win i eat at the best restaurant the meals are the best i love the fries it '
 'gives me taste of joy . i like to eat their again i like to eat their win im '
 'on the road and i like to never stop eating its my great place to eat',
 'This is a new facility in what looks like a newer area of Page. Typical '
 "McDonald's but great service and new building makes this a good stop if you "
 'are looking for a quick fill up.']
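
The "find the More link, get the id, build the review URL" steps can also be tried offline before sending any requests. The following is only a minimal sketch using the standard library parser: the class names (reviewSelector, partial_entry, ulBlueLinks) and the URL pattern are taken from the code above, while SAMPLE_HTML, the ids 111 and 222, and the names TruncatedReviewFinder and build_review_url are made up for illustration. The real page markup may differ.

```python
from html.parser import HTMLParser

WEBSITE = "https://www.tripadvisor.co.uk/"


class TruncatedReviewFinder(HTMLParser):
    """Collect the ids of review blocks whose snippet carries a 'More' span."""

    def __init__(self):
        super().__init__()
        self.current_review = None  # id of the review div being parsed
        self.in_snippet = False     # True while inside <p class="partial_entry">
        self.truncated = []         # ids of reviews that have a 'More' link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "reviewSelector" in classes:
            self.current_review = attrs.get("id")
        elif tag == "p" and "partial_entry" in classes:
            self.in_snippet = True
        elif tag == "span" and self.in_snippet and "ulBlueLinks" in classes:
            if self.current_review not in self.truncated:
                self.truncated.append(self.current_review)

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_snippet = False


def build_review_url(restaurant_id, restaurant_name, element_id):
    """Build the ShowUserReviews URL for a div id such as 'review_222'."""
    review_id = element_id.replace("review_", "")
    return (WEBSITE + "ShowUserReviews-" + restaurant_id
            + "-r" + review_id + "-" + restaurant_name + ".html")


# Hypothetical markup: one complete review and one truncated review
SAMPLE_HTML = """
<div class="reviewSelector" id="review_111">
  <p class="partial_entry">Short review.</p>
</div>
<div class="reviewSelector" id="review_222">
  <p class="partial_entry">Long review...<span class="taLnk ulBlueLinks">More</span></p>
</div>
"""

finder = TruncatedReviewFinder()
finder.feed(SAMPLE_HTML)
urls = [build_review_url("g60834-d4106745", "McDonald_s-Page_Arizona", rid)
        for rid in finder.truncated]
print(urls)
```

Only the second sample review is flagged, so a single URL is built. In a real run, each such URL would then be fetched and parsed as in the answer's code.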
