Writing to CSV with a Pandas DataFrame after scraping data

Posted 2024-03-29 01:56:41


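I scraped this website successfully and the data comes back correctly. The only problem is exporting it to CSV. I export the data with pandas, and the result is a mess. Here is my code: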

while next_page is not None:

    results_element = driver.find_elements_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", '
                                                    '"label-primary", " " ))]')

    results = [x.text for x in results_element]

    print(results)

    driver.implicitly_wait(5)

    ASIN_element = driver.find_elements_by_xpath(
        '//*[contains(concat( " ", @class, " " ), concat( " ", "asin-column", '
        '" " ))]//a')

    ASIN = [x.text for x in ASIN_element]
    print(ASIN)

    driver.implicitly_wait(5)

    Title_element = driver.find_elements_by_css_selector('.asin-column+ td')

    Title = [x.text for x in Title_element]
    print(Title)

    driver.implicitly_wait(5)

    Date_element = driver.find_elements_by_css_selector('.format-date')

    Date = [x.text for x in Date_element]
    print(Date)

    driver.implicitly_wait(5)

    df = pd.DataFrame(list(zip(results, ASIN, Title, Date)), columns=['results', 'ASIN', 'Product_Title', 'Date'])

    beach_balls_data = df.to_csv(f, index=False)

    if next_page is not None:
        driver.find_element_by_css_selector('.next a').click()
        driver.implicitly_wait(5)
    elif next_page is None:
        iterate = False
    driver.implicitly_wait(5)
    time.sleep(5)

I just need the data exported correctly without anything being overwritten. Any help would be greatly appreciated.


1 answer
User
#1 · Posted 2024-03-29 01:56:41

See below (without using pandas or any other library):

# assuming the scraping output is the 4 lists below

results = ['r1', 'r2', 'r3']
asin_lst = ['asin1', 'asin2', 'asin3']
title_lst = ['t1', 't2', 't3']
date_lst = ['d1', 'd2', 'd3']

# 'w' truncates the file every time it is opened, so open it once,
# outside any page loop
with open('out.csv', 'w') as f:
    f.write('result,asin,title,date\n')
    # note: a plain join does not quote values that contain commas
    for entry in zip(results, asin_lst, title_lst, date_lst):
        f.write(','.join(entry) + '\n')

Output (out.csv):

result,asin,title,date
r1,asin1,t1,d1
r2,asin2,t2,d2
r3,asin3,t3,d3
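
To keep earlier pages from being overwritten when this runs inside the scraping loop, the same idea can be written in append mode. The following is only a minimal sketch, not the original poster's code: `append_page` and `FIELDNAMES` are hypothetical names introduced here, and it assumes the four lists scraped from each page have the same length. It uses the standard-library `csv` module so that titles containing commas are quoted instead of splitting into extra columns.

```python
import csv
import os

# Header taken from the answer above; the constant name itself is hypothetical.
FIELDNAMES = ["result", "asin", "title", "date"]


def append_page(path, results, asin_lst, title_lst, date_lst):
    """Append one page of scraped rows to `path`, writing the header only once."""
    # The header is needed only when the file is missing or still empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" appends instead of truncating; newline="" lets the csv
    # module control line endings itself.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(FIELDNAMES)
        # zip() stops at the shortest list, so uneven lists silently drop rows.
        rows = list(zip(results, asin_lst, title_lst, date_lst))
        writer.writerows(rows)
    return len(rows)
```

Inside the `while` loop this would be called once per page, for example `append_page('out.csv', results, ASIN, Title, Date)`. If pandas is preferred, `df.to_csv(path, mode='a', header=False, index=False)` follows the same approach for every page after the first, with the header written only on the first call.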
