How do I repeat a script for every row in a Google spreadsheet?


I'm writing a script that visits a page and scrapes some data. The page's URL is loaded from a Google spreadsheet. I want to repeat this script for every cell in column A that contains text.

Column A has multiple rows, each containing a different URL:

A1: https://www.bol.com/nl/p/m-line-athletic-pillow/9200000042954350/?suggestionType=typedsearch&bltgh=oOLF6wrL80g-ozfXiYFIZg.1.2.ProductImage
A2: https://www.bol.com/nl/p/apollo-bonell-matras-90x200-cm-medium/9200000046271731/?suggestionType=typedsearch&bltgh=i745aole4Xm4c6Gl23BM3w.1.2.ProductTitle
A3: and so on ...

The script only works on A1. How can I adapt it so that it runs on every row? Please help!

I tried to create a for loop, but I couldn't get it to work.

from selenium import webdriver
import time
from bs4 import BeautifulSoup
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import datetime
import re

scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("/Users/Jeffrey/Downloads/bolscraper.json", scope)
client = gspread.authorize(creds)
sheet = client.open("Scraper")
results = sheet.sheet1
itemList = sheet.worksheet('LoadThisList')
date = str(datetime.date.today().strftime("%d-%m-%Y"))


def inject_scraping():
    browser = webdriver.Chrome('/Users/Jeffrey/Downloads/chromedriver')
    # NOTE: hardcoded to cell A1 -- this is why only the first row is scraped
    browser.get(itemList.acell('A1').value)
    time.sleep(1)
    # Open the quantity dropdown and pick "5" / "meer" (more)
    browser.find_element_by_xpath('//*[@id="quantityDropdown"]').send_keys('5')
    time.sleep(1)
    browser.find_element_by_xpath('//*[@id="quantityDropdown"]').send_keys('meer')
    time.sleep(1)
    # Type the quantity into the free-entry field and confirm
    browser.find_element_by_css_selector('.text-input--two-digits').click()
    time.sleep(0.5)
    browser.find_element_by_css_selector('.text-input--two-digits').send_keys('00')
    time.sleep(0.5)
    browser.find_element_by_link_text('OK').click()
    time.sleep(0.5)
    # Add to basket ("In winkelwagen") and wait for the confirmation modal
    browser.find_element_by_partial_link_text("In winkelwagen").click()
    time.sleep(2)
    page_source = browser.page_source
    browser.find_element_by_css_selector('.modal__window--close-hitarea').click()
    page_soup = BeautifulSoup(page_source, "html.parser")

    # Seller name: keep only letters, digits, whitespace and colons
    seller = page_soup.select_one('div.buy-block__seller > div > a')
    sellertext = seller.findAll(text=True)
    sellername = str(sellertext)
    actualseller = re.sub(r"[^a-zA-Z0-9\s:]", "", sellername)

    # Basket counter: keep digits only
    bucket = page_soup.select_one('#basket')
    bucketnumber = bucket.findAll(text=True)
    bucketDef = str(bucketnumber)
    bucketactual = re.sub(r"\D", "", bucketDef)

    # Product title: strip the list brackets and quotes left by findAll()
    producttitle = page_soup.select_one('body > div.main > div > div.constrain.constrain--main.h-bottom--m > div.pdp-header.slot.slot--pdp-header.js_slot-title > h1 > span')
    producttitleText = producttitle.findAll(text=True)
    producttitleDef = str(producttitleText)
    actualproducttitle = re.sub(r"[\[\]\']", "", producttitleDef)

    # Product price: keep digits only
    productprice = page_soup.select_one('body > div.main > div.product_page_two-column > div.constrain.constrain--main.h-bottom--m > div.\[.fluid-grid.fluid-grid--rwd--l.\].new_productpage > div:nth-child(2) > div.slot.slot--buy-block.slot--seperated > div > wsp-visibility-switch > section > section > div > div > span')
    productpriceText = productprice.findAll(text=True)
    productpriceDef = str(productpriceText)
    actualprice = re.sub(r"\D", "", productpriceDef)

    # Append one row per scraped product to the results sheet
    newRow = [date, actualseller, bucketactual, actualprice, actualproducttitle]
    results.append_row(newRow)

inject_scraping()

Tags: text, https, import, div, browser, com, by, time
1 Answer

Your idea of using a for loop should do the job. Put all the cell names into a list and loop over that list as shown below.

Note: make sure to replace the hardcoded cell name 'A1' with cell (defined in the loop), otherwise the script will still only run against that single hardcoded cell. That is most likely the step you were missing.

from selenium import webdriver
import time
from bs4 import BeautifulSoup
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import datetime
import re

scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("/Users/Jeffrey/Downloads/bolscraper.json", scope)
client = gspread.authorize(creds)
sheet = client.open("Scraper")
results = sheet.sheet1
itemList = sheet.worksheet('LoadThisList')
date = str(datetime.date.today().strftime("%d-%m-%Y"))

cells = ['A1', 'A2', 'A3']

def inject_scraping():
    for cell in cells:
        browser = webdriver.Chrome('/Users/Jeffrey/Downloads/chromedriver')
        browser.get(itemList.acell(cell).value)  # use the loop variable, not 'A1'
        ## ... Rest of your scraper code ...
        browser.close()

Going further, you could write a similar function to fill the cells list, so you don't have to hardcode it as above.
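For example (a minimal sketch, assuming the URLs live in column A of the 'LoadThisList' worksheet), gspread's col_values() returns every value in a column, which lets the loop work from the URLs directly and reuse a single browser instance instead of starting one per cell:

# Sketch only: col_values(1) returns the values of column A, in row order.
urls = [u for u in itemList.col_values(1) if u.strip()]  # skip blank cells

def inject_scraping():
    browser = webdriver.Chrome('/Users/Jeffrey/Downloads/chromedriver')
    for url in urls:
        browser.get(url)
        ## ... Rest of your scraper code ...
    browser.close()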

Make sure to call browser.close() so the webdriver is shut down. Or, better still, define a class with setup() and teardown() methods in which you handle these things.
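A minimal sketch of that setup()/teardown() idea, building on the objects defined above (the class and method names here are illustrative, not a fixed API):

class BolScraper:
    def setup(self):
        self.browser = webdriver.Chrome('/Users/Jeffrey/Downloads/chromedriver')

    def teardown(self):
        self.browser.quit()  # quit() also ends the chromedriver process

    def run(self, cells):
        self.setup()
        try:
            for cell in cells:
                self.browser.get(itemList.acell(cell).value)
                ## ... Rest of your scraper code ...
        finally:
            self.teardown()  # runs even if a scraping step raises

BolScraper().run(cells)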
