How to extract values from Tableau on this web page

Posted 2024-05-20 16:24:48


I am trying to extract the "Mobility Index" value for each state and county from this web page: https://www.cuebiq.com/visitation-insights-mobility-index/

The preferred output is panel data by location (state/county) and date, covering all available locations and dates.

There is another thread (How can I scrape tooltips value from a Tableau graph embedded in a webpage) with a similar question. I tried to follow the solution there, but it does not seem to work in my case.

Thanks in advance.

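(One approach I tried is downloading the PDF generated from Tableau, which contains the values for all counties on a given date. However, I would still need a way to request every date in the data. In any case, if you have a better idea, please let me know.)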


1 answer
User
#1 · posted 2024-05-20 16:24:48

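This Tableau data URL does not return any data. In fact, it only renders an image of the values (probably a canvas), and I guess it detects clicks based on coordinates. Most likely this is done to cache the values and render quickly.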

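But when you click on a state, it does return data, although it does not always seem to return results for that state (it does work for individual counties).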

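The solution I found is to use the tooltip to get the data for a state. When you click on a state, a request like the following is generated: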

POST https://public.tableau.com/{path}/{session_id}/commands/tabsrv/render-tooltip-server

With the following form parameters:

worksheet: US Map - State - CMI
dashboard: CMI
tupleIds: [18]
vizRegionRect: {"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":null}
allowHoverActions: false
allowPromptText: true
allowWork: false
useInlineImages: true

Here tupleIds: [18] is the 1-based index of the state in the following list of states, sorted in reverse alphabetical order:

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]
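As a quick illustration of how a tupleIds value maps onto that list, here is a minimal sketch. It copies only the first 18 entries of the list, which is enough to resolve ids 1 through 18; the helper name `state_for_tuple_id` is made up for this example and is not part of any Tableau API.

```python
# First 18 entries of the reverse-alphabetical list above (truncated on purpose).
state_names = [
    "Wyoming", "Wisconsin", "West Virginia", "Washington", "Virginia",
    "Vermont", "Utah", "Texas", "Tennessee", "South Dakota",
    "South Carolina", "Rhode Island", "Pennsylvania", "Oregon", "Oklahoma",
    "Ohio", "North Dakota", "North Carolina",
]


def state_for_tuple_id(tuple_id: int) -> str:
    """Return the state for a 1-based tuple id (hypothetical helper)."""
    return state_names[tuple_id - 1]


# Reverse lookup: state name -> 1-based tuple id.
tuple_id_by_state = {name: i for i, name in enumerate(state_names, start=1)}

print(state_for_tuple_id(18))
print(tuple_id_by_state["Texas"])
```

With the list as given, id 18 resolves to North Carolina, which is why the scripts further down send `stateIndex+1` rather than the raw `enumerate` index.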

The request returns JSON containing the tooltip HTML, which holds the CMI and YoY values to extract:

{
    "vqlCmdResponse": {
        "cmdResultList": [{
            "commandName": "tabsrv:render-tooltip-server",
            "commandReturn": {
                "tooltipText": "{\"htmlTooltip\": \"<HTML HERE WITH THE VALUES>\"}]},\"overlayAnchors\":[]}"
            }
        }]
    }
}
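Note that tooltipText is itself a JSON string, so it has to be decoded a second time before the HTML can be read. The sketch below shows that two-step decoding offline. The sample HTML and its values are invented for illustration, and it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies; `parse_tooltip` and `CellCollector` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def parse_tooltip(response: dict) -> dict:
    """Pull the CMI and YoY values out of a render-tooltip-server response."""
    command_return = response["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]
    # Second decoding step: tooltipText is a JSON document stored as a string.
    html = json.loads(command_return["tooltipText"])["htmlTooltip"]
    collector = CellCollector()
    collector.feed(html)
    labels = {"Mobility Index:": "CMI", "YoY (%):": "YoY"}
    entry = {}
    for row in collector.rows:
        if row and row[0].strip() in labels:
            entry[labels[row[0].strip()]] = "".join(c.strip() for c in row[1:])
    return entry


# Invented sample: the real tooltip markup and values will differ.
sample_html = (
    "<table><tr><td>Mobility Index:</td><td>3.5</td></tr></table>"
    "<table><tr><td>YoY (%):</td><td>-20</td><td>%</td></tr></table>"
)
sample_response = {
    "vqlCmdResponse": {
        "cmdResultList": [{
            "commandName": "tabsrv:render-tooltip-server",
            "commandReturn": {"tooltipText": json.dumps({"htmlTooltip": sample_html})},
        }]
    }
}
print(parse_tooltip(sample_response))
```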

The only caveat is that you have to make one request per state:

import requests
from bs4 import BeautifulSoup
import json
import time

data_host = "https://public.tableau.com"

r = requests.get(
    f"{data_host}/views/CMI-2_0/CMI",
    params= {
        ":showVizHome":"no",
    }
)
soup = BeautifulSoup(r.text, "html.parser")

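# The embed page keeps its session configuration as JSON in a textarea.
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# Initialize the session before sending any commands to it.
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})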
data = []

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

for stateIndex, state in enumerate(stateNames):
    time.sleep(0.5)  # throttle: pause between requests
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabsrv/render-tooltip-server',
        data = {
        "worksheet": "US Map - State - CMI",
        "dashboard": "CMI",
        "tupleIds": f"[{stateIndex+1}]",
        "vizRegionRect": json.dumps({"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":None}),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true"
    })
    # tooltipText is a JSON string inside the JSON response, so decode it a second time.
    tooltip = json.loads(r.json()["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]["tooltipText"])["htmlTooltip"]
    soup = BeautifulSoup(tooltip, "html.parser")
    rows = [ 
        t.find("tr").find_all("td")
        for t in soup.find_all("table")
    ]
    entry = { "state": state }
    for row in rows:
        if (row[0].text == "Mobility Index:"):
            entry["CMI"] = "".join([t.text.strip() for t in row[1:]])
        if row[0].text == "YoY (%):":
            entry["YoY"] = "".join([t.text.strip() for t in row[1:]])
    print(entry)
    data.append(entry)

print(data)
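Since the goal in the question is a table rather than printed dictionaries, the entries collected in `data` could be written out as CSV. The following is a minimal sketch using only the standard library; the helper name `entries_to_csv` and the sample rows are invented for illustration, and a date column is left out because the tooltip response shown above does not obviously carry one.

```python
import csv
import io


def entries_to_csv(entries, fieldnames=("state", "CMI", "YoY")):
    """Serialize the per-state dictionaries to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",             # leave a blank cell when a tooltip lacked a value
        extrasaction="ignore",  # skip extra keys such as nested county data
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


# Invented sample rows in the same shape as the entries built above.
sample = [
    {"state": "Wyoming", "CMI": "3.5", "YoY": "-20%"},
    {"state": "Wisconsin", "CMI": "3.1"},
]
print(entries_to_csv(sample))
```

To write to disk instead, the same `DictWriter` can be pointed at a file opened with `open(path, "w", newline="")`.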


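To get the county information, the approach is the same as in this post, which uses the select endpoint; it gives you data in the same format as the post you linked in your question.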

The following extracts the data for all counties and states:

import requests
from bs4 import BeautifulSoup
import json
import time

data_host = "https://public.tableau.com"
worksheet = "US Map - State - CMI"
dashboard = "CMI"

r = requests.get(
    f"{data_host}/views/CMI-2_0/CMI",
    params= {
        ":showVizHome":"no",
    }
)
soup = BeautifulSoup(r.text, "html.parser")

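# The embed page keeps its session configuration as JSON in a textarea.
tableauData = json.loads(soup.find("textarea",{"id": "tsConfigContainer"}).text)

dataUrl = f'{data_host}{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# Initialize the session before sending any commands to it.
r = requests.post(dataUrl, data= {
    "sheet_id": tableauData["sheetId"],
})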
data = []

stateNames = ["Wyoming","Wisconsin","West Virginia","Washington","Virginia","Vermont","Utah","Texas","Tennessee","South Dakota","South Carolina","Rhode Island","Pennsylvania","Oregon","Oklahoma","Ohio","North Dakota","North Carolina","New York","New Mexico","New Jersey","New Hampshire","Nevada","Nebraska","Montana","Missouri","Mississippi","Minnesota","Michigan","Massachusetts","Maryland","Maine","Louisiana","Kentucky","Kansas","Iowa","Indiana","Illinois","Idaho","Georgia","Florida","District of Columbia","Delaware","Connecticut","Colorado","California","Arkansas","Arizona","Alabama"]

for stateIndex, state in enumerate(stateNames):
    time.sleep(0.5)  # throttle: pause between requests
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabsrv/render-tooltip-server',
        data = {
        "worksheet": worksheet,
        "dashboard": dashboard,
        "tupleIds": f"[{stateIndex+1}]",
        "vizRegionRect": json.dumps({"r":"viz","x":496,"y":148,"w":0,"h":0,"fieldVector":None}),
        "allowHoverActions": "false",
        "allowPromptText": "true",
        "allowWork": "false",
        "useInlineImages": "true"
    })
    # tooltipText is a JSON string inside the JSON response, so decode it a second time.
    tooltip = json.loads(r.json()["vqlCmdResponse"]["cmdResultList"][0]["commandReturn"]["tooltipText"])["htmlTooltip"]
    soup = BeautifulSoup(tooltip, "html.parser")
    rows = [ 
        t.find("tr").find_all("td")
        for t in soup.find_all("table")
    ]
    entry = { "state": state }
    for row in rows:
        if (row[0].text == "Mobility Index:"):
            entry["CMI"] = "".join([t.text.strip() for t in row[1:]])
        if row[0].text == "YoY (%):":
            entry["YoY"] = "".join([t.text.strip() for t in row[1:]])

    # Select the same state through the select endpoint to get its county-level data.
    r = requests.post(f'{data_host}{tableauData["vizql_root"]}/sessions/{tableauData["sessionid"]}/commands/tabdoc/select',
        data = {
        "worksheet": worksheet,
        "dashboard": dashboard,
        "selection": json.dumps({
            "objectIds":[stateIndex+1],
            "selectionType":"tuples"
        }),
        "selectOptions": "select-options-simple"
    })
    entry["county_data"] = r.json()["vqlCmdResponse"]["layoutStatus"]["applicationPresModel"]["dataDictionary"]["dataSegments"]
    print(entry)
    data.append(entry)


print(data)
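The county_data stored above is the raw dataSegments object, which still has to be unpacked. The sketch below shows one way to do that, assuming each segment holds a "dataColumns" list whose items have "dataType" and "dataValues" keys. That layout is an assumption based on what Tableau responses commonly look like and is not confirmed by the output above, so check it against a real response first. The helper name `values_by_type` and the sample values are invented.

```python
def values_by_type(data_segments: dict) -> dict:
    """Merge the values of all segments into one list per data type.

    Assumes segments look like {"dataColumns": [{"dataType": ..., "dataValues": [...]}]}.
    """
    merged = {}
    for segment in data_segments.values():
        for column in segment.get("dataColumns", []):
            merged.setdefault(column["dataType"], []).extend(column["dataValues"])
    return merged


# Invented sample in the assumed layout; real county names and values will differ.
sample_segments = {
    "0": {"dataColumns": [
        {"dataType": "cstring", "dataValues": ["Albany", "Big Horn"]},
        {"dataType": "real", "dataValues": [3.4, 3.1]},
    ]},
    "1": {"dataColumns": [
        {"dataType": "real", "dataValues": [2.9]},
    ]},
}
print(values_by_type(sample_segments))
```

Matching the merged values back to specific counties and measures needs the index information from the rest of the select response, as described in the post linked in the question.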
