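I need to save the values to a .csv file, but the number of values differs from product to product, and I can't work out how to do it so that each value is written under its own parameter, as shown in the file. Please help.
I'm also attaching a file so it's easier to see what I need.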
from bs4 import BeautifulSoup
import requests
import time

HOST = 'https://samara.vseinstrumenti.ru'
URL = 'https://samara.vseinstrumenti.ru/santehnika/vse-dlya-vodosnabzheniya/avtonomnaya-kanalizatsiya/'
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}


def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r


def get_url(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('div', class_='product-tile grid-item')
    for item in urls:
        time.sleep(5)
        data_collection(HOST + item.find(class_='title').find('a').get('href'))


def get_name(html):
    soup = BeautifulSoup(html, 'html.parser')
    name = soup.find('h1', class_='title').text
    return name


def get_description(html):
    soup = BeautifulSoup(html, 'html.parser')
    description = soup.find('div', itemprop="description").text
    return description


def get_specifications_parameter(html):
    soup = BeautifulSoup(html, 'html.parser')
    dotted_list = soup.find('ul', class_='dotted-list')
    parameters = dotted_list.find_all('span', class_='text')
    return parameters


def get_specifications_meaning(html):
    soup = BeautifulSoup(html, 'html.parser')
    dotted_list = soup.find('ul', class_='dotted-list')
    meaning = dotted_list.find_all('span', class_='value')
    return meaning


def get_photo(html):
    soup = BeautifulSoup(html, 'html.parser')
    photo = soup.find('div', class_="item -active").find('img').get('src')
    return photo


def get_price(html):
    soup = BeautifulSoup(html, 'html.parser')
    price = soup.find('span', class_='current-price').text
    return price


def data_collection(URL):
    html = get_html(URL)
    name = get_name(html.text)
    description = get_description(html.text)
    specifications_parameter = get_specifications_parameter(html.text)
    meaning = get_specifications_meaning(html.text)
    # photo = get_photo(html.text)
    price = get_price(html.text)


def start():
    html = get_html(URL)
    if html.status_code == 200:
        get_url(html.text)
    else:
        print('Network error')


start()
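The lists returned by get_specifications_parameter() and get_specifications_meaning() appear to line up position by position (one span.text and one span.value per list item), so one option is to zip them into a dict that maps each parameter to its value. This is a sketch assuming that alignment always holds; the helper name specs_to_dict is hypothetical, and plain strings stand in for the bs4 tags.

```python
def specs_to_dict(parameters, meanings):
    """Pair each parameter name with the value at the same position (hypothetical helper)."""
    # zip() stops at the shorter list, so an unmatched trailing item is dropped
    return {p.strip(): m.strip() for p, m in zip(parameters, meanings)}


params = ['Type ', 'Volume, l']
values = [' septic tank', '1500']
print(specs_to_dict(params, values))  # {'Type': 'septic tank', 'Volume, l': '1500'}
```

Inside data_collection() it could be called as specs_to_dict([t.get_text() for t in specifications_parameter], [t.get_text() for t in meaning]) and the result merged with name, price and description into one dict per product. If some list item on the page lacks a value span, the two lists would shift out of step; reading both spans from each li element in turn would be safer in that case.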
I tried doing it like this, but it's not what I need:
import csv


def save_file_walid(items, path):
    with open(path, 'w', newline='') as file:
        writer = csv.writer(file, delimiter=';')
        for item in items:
            writer.writerow(item)
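Because each product has a different set of parameters, a plain csv.writer has no way of knowing which column a value belongs to. A sketch using the standard library's csv.DictWriter, assuming every product has first been collected into a list of dicts (column name → value); the function save_rows_csv and its fixed_fields default are hypothetical:

```python
import csv


def save_rows_csv(rows, path, fixed_fields=('name', 'price', 'description')):
    """Write a list of dicts; every parameter seen in any row becomes a column."""
    # Collect every key in first-seen order so the column order is stable
    fieldnames = list(fixed_fields)
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, 'w', newline='', encoding='utf-8') as file:
        # restval='' leaves an empty cell when a product lacks a parameter
        writer = csv.DictWriter(file, fieldnames=fieldnames, delimiter=';', restval='')
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


rows = [
    {'name': 'A', 'price': '100', 'Volume, l': '1500'},
    {'name': 'B', 'price': '200', 'Depth, m': '2'},
]
save_rows_csv(rows, 'products.csv')
```

With these two sample rows the file contains the header name;price;description;Volume, l;Depth, m followed by A;100;;1500; and B;200;;;2, so each value lands under its own parameter. This approach needs all products in memory before writing, which means data_collection() would have to return its dict and get_url() would gather them into a list.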
https://drive.google.com/file/d/1uGoW1kpsDGDA-Zh7SiiCDcg9cf2lHQUd/view?usp=sharing
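I'd like to know a bit more about your actual setup. First, are the source paths named correctly? I mean, for example, the correct path should be:
Looking at your code, at the end of the data_collection function you can add: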
to store the data in the csv file. Then, in save_file_walid(), you don't need the for loop if you are only writing one list of data. All you need is:
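A minimal sketch of writing one product per call, assuming each call to data_collection() produces a single list of values; the helper append_row is hypothetical. Opening the file in append mode ('a') keeps the rows written by earlier products instead of overwriting them, and a single writerow() call is enough for one list:

```python
import csv


def append_row(row, path):
    """Append one product's values as a single CSV line (hypothetical helper)."""
    # 'a' mode adds to the end of the file instead of truncating it
    with open(path, 'a', newline='', encoding='utf-8') as file:
        writer = csv.writer(file, delimiter=';')
        writer.writerow(row)  # one call, no loop needed for a single list
```

At the end of data_collection() this could be called as append_row([name, price, description], 'products.csv'). Note that with a fixed column order this only lines up if every product has the same parameters.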
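Finally, somewhere in your code, before the data is stored, add the following just once:
to create the file with the column names as its header row (if the file doesn't exist yet).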
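A sketch of that "create the file with a header only once" step, assuming the column names are known in advance; ensure_header is a hypothetical helper built on os.path.exists from the standard library:

```python
import csv
import os


def ensure_header(path, columns):
    """Create the CSV with a header row only if the file does not exist yet."""
    if os.path.exists(path):
        return False  # header was written on an earlier run; leave the file alone
    with open(path, 'w', newline='', encoding='utf-8') as file:
        csv.writer(file, delimiter=';').writerow(columns)
    return True
```

Calling ensure_header('products.csv', ['name', 'price', 'description']) once in start(), before the scraping loop, means later appends never duplicate the header.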
Hope this helps :)