I want to get the ownership data from the following website:
https://www.usnewsdeserts.com/states/california/#1536357227283-a4a9d6e4-ccf9
The code I am using is shown below:
import random
import requests
from bs4 import BeautifulSoup

# Placeholder list: the original post did not show how user_agent_list was defined.
user_agent_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
]

# bootstrapSession URL copied from the browser; the session ID at the end is specific to one browser session.
url = "https://public.tableau.com/vizql/w/TopOwnersCalifornia/v/Owners/bootstrapSession/sessions/5E565C4C5F7D462BBE8DFEE9246F846E-0:0"

# Pick a random User-Agent for the request headers.
header = random.choice(user_agent_list)
HEADERS = {"User-Agent": header}
params = {"stickySessionKey": {"dataserverPermissions": "44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a"}}

# POST to the bootstrapSession endpoint and print whatever comes back.
r = requests.post(url, params=params, headers=HEADERS)
soup = BeautifulSoup(r.text, "html.parser")
print(soup)
I get:
<br/>
2020-12-12 12:41:46.829
(X9S6ik90vQizHF9Qa-S@CwAAAUk,0:0)
How can I get this data?
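I made a tableau scraper library to extract data from Tableau worksheets. You only need to find the Tableau URL in the Network tab of the browser's developer tools, in this case:
You can use the following code to extract the data: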
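A minimal sketch of what such a call might look like, not necessarily the exact code from the answer. It assumes the `tableauscraper` package (installed with `pip install TableauScraper`) and its `TableauScraper` class. The view URL is an assumption: it is rebuilt from the workbook name (`TopOwnersCalifornia`) and view name (`Owners`) that appear in the bootstrapSession URL from the question. Both `view_url_from_bootstrap` and `fetch_worksheets` are hypothetical helper names used only for illustration.

```python
import re
from urllib.parse import urlsplit


def view_url_from_bootstrap(bootstrap_url):
    """Hypothetical helper: guess the public view URL from a bootstrapSession URL.

    Assumes the path has the form /vizql/w/<workbook>/v/<view>/... as in the question.
    """
    parts = urlsplit(bootstrap_url)
    match = re.match(r"/vizql/w/([^/]+)/v/([^/]+)/", parts.path)
    if match is None:
        raise ValueError("not a Tableau bootstrapSession URL")
    workbook, view = match.groups()
    return f"{parts.scheme}://{parts.netloc}/views/{workbook}/{view}"


def fetch_worksheets(view_url):
    """Return {worksheet name: pandas DataFrame} for each worksheet in the view."""
    # Imported here so the helper above still works without the package installed.
    from tableauscraper import TableauScraper as TS

    ts = TS()
    ts.loads(view_url)           # the library opens its own session for the view
    workbook = ts.getWorkbook()
    return {ws.name: ws.data for ws in workbook.worksheets}


# Example usage (performs network requests, so it is left commented out):
# url = view_url_from_bootstrap(
#     "https://public.tableau.com/vizql/w/TopOwnersCalifornia/v/Owners"
#     "/bootstrapSession/sessions/5E565C4C5F7D462BBE8DFEE9246F846E-0:0"
# )
# for name, data in fetch_worksheets(url).items():
#     print(f"worksheet name : {name}")
#     print(data)
```

Letting the library start from the view URL avoids reusing a session ID copied from the browser, which is tied to that one browser session.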
run in this repl.it