pandas in a for loop fails to append dicts scraped with Selenium

Posted 2024-06-02 07:50:40


I am trying to scrape the URLs that contain a keyword and save them to my CSV, but the script fails to append them.

```python
from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def find_film_link(link):
    if "film" in link:
        return True
    else:
        return False

film_list =  pd.read_csv("film_list.csv", index_col=0)

################################################################################

driver = webdriver.Chrome(r"D:\Documents\ADAM\Project\CSFD Bot\chromedriver.exe")
driver.get("https://www.csfd.cz/zebricky/nejlepsi-filmy/?show=complete")
elems = driver.find_elements_by_xpath("//a[@href]")
for elem in elems:
    scraped_link = elem.get_attribute("href")
    if find_film_link(scraped_link) == True:
        film_list_updated = film_list.append({"link": scraped_link}, ignore_index=True)
        print(film_list_updated)
        film_list_updated.to_csv("film_list.csv")
    else:
        pass
driver.quit()
```

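The .csv already contains some manual entries (the first 8). After running the script, the .csv ends up looking like this (only one link was added, but it was added three times?):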

```
0   https://www.csfd.cz/film/231260-star-wars-klon...
1   https://www.csfd.cz/film/820012-drsny-mesto/pr...
2   https://www.csfd.cz/film/902757-damsky-gambit/...
3   https://www.csfd.cz/film/622365-the-mandaloria...
4   https://www.csfd.cz/film/281929-borat-subseque...
5   https://www.csfd.cz/film/818525-delete-history...
6   https://www.csfd.cz/film/4952-kocar-do-vidne/p...
7   https://www.csfd.cz/film/823303-last-and-first...
8    https://www.csfd.cz/film/43582-posledni-samuraj/
9    https://www.csfd.cz/film/43582-posledni-samuraj/
10   https://www.csfd.cz/film/43582-posledni-samuraj/
```

Any help would be greatly appreciated.
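The behaviour can be reproduced without Selenium. The sketch below uses a hypothetical one-row frame and three hypothetical links in place of the CSV and the scraped page, and uses `pd.concat` instead of `DataFrame.append` (removed in pandas 2.0); both return a new DataFrame and leave the original untouched. Because every iteration rebuilds the result from the unchanged `film_list`, and `to_csv` overwrites the file each time, one run leaves only the original rows plus the last link. The three identical rows in the output above most likely come from running the script three times, each run re-reading the file written by the previous one.

```python
import pandas as pd

# Hypothetical stand-ins for film_list.csv and the scraped hrefs
film_list = pd.DataFrame({"link": ["https://www.csfd.cz/film/manual-entry/"]})
scraped = [
    "https://www.csfd.cz/film/1-first/",
    "https://www.csfd.cz/film/2-second/",
    "https://www.csfd.cz/film/3-last/",
]

# Same pattern as the question: always start from the unchanged film_list,
# so the link added in the previous iteration is discarded
for link in scraped:
    film_list_updated = pd.concat(
        [film_list, pd.DataFrame({"link": [link]})], ignore_index=True
    )

print(film_list_updated["link"].tolist())
# ['https://www.csfd.cz/film/manual-entry/', 'https://www.csfd.cz/film/3-last/']

# Reassigning the result carries each new row into the next iteration
fixed = film_list
for link in scraped:
    fixed = pd.concat([fixed, pd.DataFrame({"link": [link]})], ignore_index=True)

print(len(fixed))  # 4
```

Reassigning inside the loop works, but concatenating once after collecting all links avoids copying the frame on every iteration.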


1 answer
User
#1 · Posted 2024-06-02 07:50:40

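Try this. `DataFrame.append` does not change `film_list`; it returns a new DataFrame. In your loop every iteration therefore starts again from the rows read from the CSV, and each `to_csv` call overwrites the file, so only the last matching link survives a run. Let XPath select the film links, collect them all, and write the CSV once: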

```python
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

film_list = pd.read_csv("film_list.csv", index_col=0)

# Selenium 4: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(r"D:\Documents\ADAM\Project\CSFD Bot\chromedriver.exe"))
driver.get("https://www.csfd.cz/zebricky/nejlepsi-filmy/?show=complete")

# Let XPath do the filtering: only <a> elements whose href contains "film"
elems = driver.find_elements(By.XPATH, "//a[contains(@href, 'film')]")

# Read every href before quitting the driver
links = [elem.get_attribute("href") for elem in elems]
driver.quit()

# Build the new frame once (DataFrame.append was removed in pandas 2.0)
film_list_updated = pd.concat([film_list, pd.DataFrame({"link": links})], ignore_index=True)

# Overwrite the file: the frame already holds the old rows, so mode='a' would duplicate them
film_list_updated.to_csv("film_list.csv")
```
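The filtering and saving can also be split into plain functions that are testable without a browser. This is a sketch under two assumptions: film detail pages live under `/film/`, as every URL in the question's output suggests (a bare `"film"` substring also matches the ranking page URL, `.../nejlepsi-filmy/...`), and links already in the CSV should not be added again. The helper names `film_links` and `add_links` are hypothetical.

```python
import pandas as pd

def film_links(hrefs, marker="/film/"):
    """Return hrefs containing `marker`, de-duplicated, in first-seen order.

    Assumption: CSFD film detail pages live under "/film/".
    """
    seen = set()
    result = []
    for href in hrefs:
        # get_attribute("href") can return None, so skip falsy values
        if href and marker in href and href not in seen:
            seen.add(href)
            result.append(href)
    return result

def add_links(film_list, links):
    """Concatenate `links` onto the 'link' column once, dropping repeats."""
    new_rows = pd.DataFrame({"link": links})
    combined = pd.concat([film_list, new_rows], ignore_index=True)
    # keep="first" (the default) preserves the existing manual entries
    return combined.drop_duplicates(subset="link", ignore_index=True)

# Usage with one real URL from the question and a hypothetical one
existing = pd.DataFrame({"link": ["https://www.csfd.cz/film/43582-posledni-samuraj/"]})
hrefs = [
    "https://www.csfd.cz/zebricky/nejlepsi-filmy/?show=complete",
    "https://www.csfd.cz/film/43582-posledni-samuraj/",
    "https://www.csfd.cz/film/1-example/",  # hypothetical
    "https://www.csfd.cz/film/43582-posledni-samuraj/",
]
updated = add_links(existing, film_links(hrefs))
print(updated["link"].tolist())
# ['https://www.csfd.cz/film/43582-posledni-samuraj/', 'https://www.csfd.cz/film/1-example/']
```

With Selenium, the hrefs would come from `elem.get_attribute("href")` for each element returned by `driver.find_elements(By.XPATH, "//a[@href]")`, and the result of `add_links` would be written with a single `to_csv` call.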
