Converting a Python text table to a DataFrame, then to CSV

Posted 2024-04-19 12:17:49


The code below prints an HTML web table as the output of a Python script. I then tried to convert that output to a DataFrame and export it to CSV, but that step failed.

import time
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

url = 'http://www.altrankarlstad.com/wisp'

driver = webdriver.Chrome('C:\\Users\\rugupta\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Python 3.7\\chromedriver.exe')

driver.get(url)
time.sleep(100) 

text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
#print (text_field[0].text)
data= text_field[0].text
#Works fine until above section

df= pd.DataFrame(data)
df.to_csv("output.csv")
(No success from this point on.)

[![enter image description here][1]][1]


  [1]: https://i.stack.imgur.com/NpGk2.jpg

1 answer
User
Answer #1 · Posted 2024-04-19 12:17:49

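The problem is that Selenium considers the page loaded before the table holding the data you want has been rendered. You therefore need to tell Selenium to wait until it finds an element inside the table. In this particular case, every "job" in the table is marked with a specific class name, "css-58". The solution looks like this: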

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import pandas as pd


url = 'http://www.altrankarlstad.com/wisp'

driver = webdriver.Chrome("C:\\driver path")
driver.get(url)

# delay is the maximum number of seconds to wait for the table rows before giving up
delay = 600

try:
    wait_for_element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'css-58')))
    text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
    data= text_field[0].text

    # Create your DataFrame here.
    # As written, the next line still fails with
    # ValueError: DataFrame constructor not properly called!
    # because data is one plain string; define the rows and columns to suit your needs.
    df= pd.DataFrame(data)
    df.to_csv("output.csv")
except TimeoutException:
    print('It took too long')

At this point, the only thing left to do is decide how you want to define the structure of the DataFrame.
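One possible way to settle that structure, as a minimal sketch rather than the answer's own method: `pd.DataFrame(data)` fails because `data` is a single string, so the sketch works from the table's HTML instead of its text and splits it into rows and cells with the standard-library `HTMLParser`. It assumes the first row holds the column headers and that no body row has more cells than the header. `TableRowParser`, `table_html_to_dataframe`, and the sample table are hypothetical names and data made up for illustration; in the real script the HTML would presumably come from something like `text_field[0].get_attribute("outerHTML")`.

```python
from html.parser import HTMLParser

import pandas as pd


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_html_to_dataframe(table_html):
    """Build a DataFrame from table HTML, treating the first row as the header."""
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    header, *body = parser.rows
    return pd.DataFrame(body, columns=header)


# Hypothetical stand-in for text_field[0].get_attribute("outerHTML")
sample_html = """
<table>
  <tr><th>Title</th><th>Location</th></tr>
  <tr><td>Data engineer</td><td>Karlstad</td></tr>
  <tr><td>Test lead</td><td>Stockholm</td></tr>
</table>
"""

df = table_html_to_dataframe(sample_html)
csv_text = df.to_csv(index=False)  # CSV as a string; pass a path to write a file
```

Calling `df.to_csv("output.csv", index=False)` would write the file instead; `index=False` keeps pandas from adding a row-number column.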
