I already imported requests inside a def and ran it, but I still get a NameError... I import everything at once and wrap all the steps into one function:
#import all the libraries
def import_all_modules():
    from bs4 import BeautifulSoup as soup
    import requests
    import pandas as pd
    from google.colab import drive

#Get the raw HTML from a URL
def get_html_from_url(url: str):
    html = requests.get(url).content
    return html

#Load the HTML content and filter out the shoe elements
def load_page_and_filter(html):
    soup_page = soup(html, "html.parser")
    shoes = soup_page.find_all("div", {"class": "good-box"})
    return shoes

#Create a pandas DataFrame from the parsed elements
def generate_dataframe_from_soup(soup):
    names = []
    prices = []
    for shoe in soup:
        names.append(shoe.a.span.text)
        prices.append(shoe.div.p.text)
    adidas_shoes_dict = {
        "Name": names,
        "Price": prices
    }
    df = pd.DataFrame(data=adidas_shoes_dict)
    df["Price"] = df["Price"].apply(lambda x: float(x.split("\xa0")[-1]))
    return df

#Save the DataFrame as a CSV on Google Drive
def save_csv(file_name, df):
    drive.mount("/content/drive")
    df.to_csv(file_name)

Run all the methods at once by putting them in a single function:

def run_web_scraping(url, file_name):
    import_all_modules()
    html = get_html_from_url(url)
    soup = load_page_and_filter(html)
    df = generate_dataframe_from_soup(soup)
    save_csv(file_name, df)
Then I set the URL and file name and call run_web_scraping:
url="https://www.adidas.com.hk/men/shoes/basketball"
file_name="/content/drive/MyDrive/adidas.csv"
run_web_scraping(url,file_name)
Imports are bound to the scope they are made in, so when you import inside a function, those names are no longer available once the function returns. That is why `requests` raises a NameError in `get_html_from_url` even though `import_all_modules()` ran first.
Just put the imports at the top of the file, rather than inside a function, and it will work.
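The scoping rule above can be shown with a minimal sketch, using the standard-library `math` module as a stand-in for `requests`:

```python
def import_all_modules():
    import math  # 'math' becomes a local name inside this function only

import_all_modules()

# Outside the function the name is gone, so using it raises NameError,
# just like 'requests' did in the scraper above.
try:
    math.sqrt(4)
except NameError as exc:
    print(exc)  # name 'math' is not defined

# The fix: import at module level, where the name stays visible everywhere.
import math
print(math.sqrt(4))  # 2.0
```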