Parsing a table from a site

Published 2024-04-27 03:55:03


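There is a site, https://ru.myip.ms/browse/market_bitcoin/%D0%91%D0%B8%D1%82%D0%BA%D0%BE%D0%B8%D0%BD_%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F_%D1%86%D0%B5%D0%BD.html#a, with a BTC price table further down the page, and I need to parse that table. I tried the code below, but for some reason the printed output shows dots instead of the prices.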

from time import sleep
import pandas as pd
import requests

host = 'ru.myip.ms'
index_url = 'https://ru.myip.ms'
home_url = "https://ru.myip.ms/browse/market_bitcoin/%D0%91%D0%B8%D1%82%D0%BA%D0%BE%D0%B8%D0%BD_%D0%B8%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F_%D1%86%D0%B5%D0%BD.html#a"
base_ajax_url = "https://ru.myip.ms/ajax_table/market_bitcoin/{page}"


with requests.Session() as session:
    session.headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
        'Host': host
    }

    # visit home page and parse the initial dataframe
    response = session.get(home_url)

    df = pd.read_html(response.text, attrs={"id": "market_bitcoin_tbl"})[0]
    df = df.rename(columns=lambda x: x.strip())  # remove extra newlines from the column names

    sleep(2)

    # start paginating with page=2
    page = 2
    while True:
        url = base_ajax_url.format(page=page)
        print("Processing {url}...".format(url=url))

        response = session.post(url,
                                data={'getpage': 'yes', 'lang': 'ru'},
                                headers={
                                    'X-Requested-With': 'XMLHttpRequest',
                                    'Origin': index_url,
                                    'Referer': home_url
                                })

        # add data to the existing dataframe
        try:
            new_df = pd.read_html("<table>{0}</table>".format(response.text))[0]
        except ValueError:  # could not extract data from HTML - last page?
            break

        new_df.columns = df.columns
        df = pd.concat([df, new_df])

        page += 1
        sleep(1)


print(df)

1 answer
User
#1 · Posted 2024-04-27 03:55:03

Your code is correct, and you already have the result. Try this to see it:

print(df['Bitcoin Price'])

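You see dots only because df is large: when you print it, pandas cannot fit everything on screen, so it truncates the output and marks the hidden rows and columns with dots. The data itself is still there.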
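
If you want to see more than a single column, a minimal sketch along these lines may help. It assumes the dots come from pandas' display limits, and it uses a made-up stand-in frame (`demo`, with hypothetical `col_N` columns) rather than the real scraped data. The limit of 10 columns is set explicitly only to make the truncation reproducible.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: 31 columns, 100 rows.
demo = pd.DataFrame({f"col_{i}": range(100) for i in range(30)})
demo["Bitcoin Price"] = [f"${1000 + i}" for i in range(100)]

# With a small column limit, the printed form hides the middle columns
# and shows "..." in their place.
with pd.option_context("display.max_columns", 10):
    truncated = repr(demo)

# Lifting the column limit shows every column (wrapped over several lines).
with pd.option_context("display.max_columns", None, "display.width", 200):
    widened = repr(demo.head())

# to_string() renders all rows and columns regardless of display options.
full_text = demo.to_string()

print(truncated)
print(widened)
```

For the real data, the same options can be applied around print(df), or df.to_string() can be written to a file when the table is too large for the terminal.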
