I'm making a script that visits a page and scrapes some data. The URL of that page is loaded from a Google spreadsheet. I want to repeat the script for every cell in column A that contains text.

Column A has multiple rows, each holding a different URL:

A1: https://www.bol.com/nl/p/m-line-athletic-pillow/9200000042954350/?suggestionType=typedsearch&bltgh=oOLF6wrL80g-ozfXiYFIZg.1.2.ProductImage
A2: https://www.bol.com/nl/p/apollo-bonell-matras-90x200-cm-medium/9200000046271731/?suggestionType=typedsearch&bltgh=i745aole4Xm4c6Gl23BM3w.1.2.ProductTitle
A3: and so on...

The script only works on A1. How can I adapt it so that it repeats for all rows? Please help!

I tried writing a for loop, but I couldn't get it to work.
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import datetime
import re
scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("/Users/Jeffrey/Downloads/bolscraper.json", scope)
client = gspread.authorize(creds)
sheet = client.open("Scraper")
results = sheet.sheet1
itemList = sheet.worksheet('LoadThisList')
date = str(datetime.date.today().strftime("%d-%m-%Y"))
def inject_scraping():
    browser = webdriver.Chrome('/Users/Jeffrey/Downloads/chromedriver')
    # note: must be itemList (capital L) to match the worksheet defined above
    browser.get(itemList.acell('A1').value)
    time.sleep(1)
    # pick "meer" (more) in the quantity dropdown and type the quantity
    browser.find_element_by_xpath('//*[@id="quantityDropdown"]').send_keys('5')
    time.sleep(1)
    browser.find_element_by_xpath('//*[@id="quantityDropdown"]').send_keys('meer')
    time.sleep(1)
    browser.find_element_by_css_selector('.text-input--two-digits').click()
    time.sleep(0.5)
    browser.find_element_by_css_selector('.text-input--two-digits').send_keys('00')
    time.sleep(0.5)
    browser.find_element_by_link_text('OK').click()
    time.sleep(0.5)
    browser.find_element_by_partial_link_text("In winkelwagen").click()
    time.sleep(2)
    page_source = browser.page_source
    browser.find_element_by_css_selector('.modal__window--close-hitarea').click()
    page_soup = BeautifulSoup(page_source, "html.parser")
    # seller name: keep only letters, digits, whitespace and ':'
    seller = page_soup.select_one('div.buy-block__seller > div > a')
    sellertext = seller.findAll(text=True)
    sellername = str(sellertext)
    actualseller = re.sub(r"[^a-zA-Z0-9\s:]", "", sellername)
    # basket count: keep only the digits
    bucket = page_soup.select_one('#basket')
    bucketnumber = bucket.findAll(text=True)
    bucketDef = str(bucketnumber)
    bucketactual = re.sub(r"\D", "", bucketDef)
    producttitle = page_soup.select_one('body > div.main > div > div.constrain.constrain--main.h-bottom--m > div.pdp-header.slot.slot--pdp-header.js_slot-title > h1 > span')
    producttitleText = producttitle.findAll(text=True)
    producttitleDef = str(producttitleText)
    actualproducttitle = re.sub(r"[\[\]\']", "", producttitleDef)
    productprice = page_soup.select_one('body > div.main > div.product_page_two-column > div.constrain.constrain--main.h-bottom--m > div.\[.fluid-grid.fluid-grid--rwd--l.\].new_productpage > div:nth-child(2) > div.slot.slot--buy-block.slot--seperated > div > wsp-visibility-switch > section > section > div > div > span')
    productpriceText = productprice.findAll(text=True)
    productpriceDef = str(productpriceText)
    actualprice = re.sub(r"\D", "", productpriceDef)
    newRow = [date, actualseller, bucketactual, actualprice, actualproducttitle]
    results.append_row(newRow)

inject_scraping()
As you suggested, a for loop should do the job: put all the cell names into a list and loop over that list.
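A minimal sketch of that loop, assuming inject_scraping is changed to take the cell name as a parameter (the stub body below only stands in for the real scraper so the sketch is runnable):

```python
def inject_scraping(cell):
    # Stub: the real version would call browser.get(itemList.acell(cell).value)
    # and then run the scraping steps from the question.
    return cell

cells = ['A1', 'A2', 'A3']  # hardcoded cell names from column A
for cell in cells:
    inject_scraping(cell)
```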
Note: make sure to replace the hardcoded cell name 'A1' with 'cell' (the variable defined in the loop), otherwise the script will still run only against that single hardcoded cell. That step is easy to overlook.

As an extension, you could write a similar function to fill the 'cells' list, so you don't have to hardcode the names as above.
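gspread worksheets have a col_values(col) method that returns every value in a column, so one way to fill the list is to derive a cell name for each non-empty entry in column A. The helper below (cell_names is a made-up name) works on a plain Python list so it runs stand-alone; in the script you would pass it itemList.col_values(1):

```python
def cell_names(column_values):
    # column_values: what a call like itemList.col_values(1) would return.
    # Builds 'A1', 'A2', ... for each cell in column A that contains text.
    return ['A%d' % (i + 1) for i, v in enumerate(column_values) if v]
```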
Make sure to call

browser.close()

to shut down the webdriver when you are done. Or, better still, define a class with setup() and teardown() methods in which you handle these things.
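The setup()/teardown() idea might look like the sketch below. ScraperSession is a hypothetical name, and the actual webdriver calls are only indicated in comments so the sketch stays self-contained; the key point is the try/finally, which guarantees teardown() (and thus browser.close()) runs even when a scrape raises:

```python
class ScraperSession:
    """Hypothetical wrapper that owns the webdriver for a whole run."""

    def setup(self):
        # Real script: self.browser = webdriver.Chrome('/Users/Jeffrey/Downloads/chromedriver')
        self.open = True

    def teardown(self):
        # Real script: self.browser.close()
        self.open = False

    def run(self, cells):
        self.setup()
        try:
            for cell in cells:
                # Real script: scrape itemList.acell(cell).value here
                pass
        finally:
            self.teardown()  # runs even if a scrape raises
```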