Scraping an externally stored table. Is it possible?

Posted 2024-06-08 18:46:16


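I'm trying to scrape a Zoho Analytics table from this web page for a university project. So far I have no idea how to do it. I can't see the values in the browser inspector, so I can't use BeautifulSoup (my favorite) in Python.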


Does anyone have an idea?

Thanks a lot,

Joseph


2 Answers

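I tried it with BeautifulSoup, and it seems you can't get at the values in these tables because they aren't in the page itself but are stored externally (?)

Edit: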

https://analytics.zoho.com/open-view/938032000481034014

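This is the link where the table and its data are stored.

So I tried scraping the data from that page with bs4, and it worked. The rows have the class "zdbDataRowDiv". Try: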

container = page_soup.findAll("div", {"class": "zdbDataRowDiv"})

Explanation of the code:

container   # the variable where your data is stored, name it how you like
page_soup   # your html page you souped with BeautifulSoup
findAll("tag",{"attribute":"value"})   # this function finds every tag which has the specific value inside its attribute
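To see what that class-based lookup does without fetching anything, here is a minimal standard-library sketch of the same idea: collect the text of every div whose class is "zdbDataRowDiv". The sample HTML and the RowCollector helper are made up for illustration; the real Zoho markup may be nested differently.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text found inside every <div> carrying row_class."""

    def __init__(self, row_class):
        super().__init__()
        self.row_class = row_class
        self.rows = []      # one list of cell texts per matching div
        self._depth = 0     # > 0 while we are inside a matching div
        self._cells = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # nested div inside a row: just track the nesting level
            self._depth += 1
        elif self.row_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._cells = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.rows.append(self._cells)

    def handle_data(self, data):
        if self._depth and data.strip():
            self._cells.append(data.strip())


# Hypothetical markup, only to exercise the collector
sample_html = """
<div class="zdbDataRowDiv"><div>Danone</div><div>A</div></div>
<div class="other">ignored</div>
<div class="zdbDataRowDiv odd"><div>HP Inc</div><div>A</div></div>
"""

collector = RowCollector("zdbDataRowDiv")
collector.feed(sample_html)
rows = collector.rows
print(rows)
```

With BeautifulSoup, the findAll call above does the matching for you; you would then read the text of each returned tag.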

They are stored as JSON inside a <script> tag. Just pull it out and parse it:

from bs4 import BeautifulSoup
import pandas as pd
import requests
import json


url = 'https://flo.uri.sh/visualisation/4540617/embed'

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
scripts = soup.find_all('script')

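for script in scripts:
    if 'var _Flourish_data_column_names = ' in script.text:
        json_str = script.text

        # Column names: the JSON literal that follows the variable name
        col_names = json_str.split('var _Flourish_data_column_names = ')[-1].split(',\n')[0]
        cols = json.loads(col_names)
        # Row data: same approach for the _Flourish_data variable
        data = json_str.split('_Flourish_data = ')[-1].split(',\n')[0]

        # Trim trailing JavaScript after the last ';' until the string parses as JSON
        while True:
            try:
                jsonData = json.loads(data)
                break
            except json.JSONDecodeError:
                if ';' not in data:
                    raise  # nothing left to trim, so give up instead of looping forever
                data = data.rsplit(';', 1)[0]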
    
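# Build the table: header names first, then the cells of each data row
rows = []
headers = cols['rows']['columns']
for row in jsonData['rows']:
    rows.append(row['columns'])

table = pd.DataFrame(rows, columns=headers)
# In every column except the first, replace each non-empty cell with 'A'
for col in headers[1:]:
    table.loc[table[col] != '', col] = 'A'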

Output:

print (table)

                           Company Climate change Forests Water security
0                           Danone              A       A              A
1                     FIRMENICH SA              A       A              A
2           FUJI OIL HOLDINGS INC.              A       A              A
3                           HP Inc              A       A              A
4                  KAO Corporation              A       A              A
..                             ...            ...     ...            ...
308             Woolworths Limited              A                       
309                Workspace Group              A                       
310  Yokogawa Electric Corporation              A                      A
311      Yuanta Financial Holdings              A                       
312                     Zalando SE              A                       

[313 rows x 4 columns]
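If the split-and-trim step feels fragile, one alternative is json.JSONDecoder.raw_decode, which parses a single JSON value starting at a given index and ignores whatever JavaScript follows it. A minimal sketch, assuming the script text looks like `var NAME = <json>, OTHER = <json>;`. The sample string and the extract_js_json helper are hypothetical; only the variable names and the rows/columns layout come from the code above.

```python
import json


def extract_js_json(script_text, marker):
    """Return the JSON value that directly follows marker in script_text."""
    start = script_text.index(marker) + len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand
    while script_text[start].isspace():
        start += 1
    # Parses exactly one JSON value; trailing JavaScript is left untouched
    value, _end = json.JSONDecoder().raw_decode(script_text, start)
    return value


# Hypothetical stand-in for the contents of the <script> tag
sample = (
    'var _Flourish_data_column_names = '
    '{"rows": {"columns": ["Company", "Water security"]}},\n'
    '    _Flourish_data = '
    '{"rows": [{"columns": ["Danone", "A"]}, {"columns": ["Zalando SE", ""]}]};\n'
)

cols = extract_js_json(sample, 'var _Flourish_data_column_names = ')
data = extract_js_json(sample, '_Flourish_data = ')

headers = cols['rows']['columns']
rows = [row['columns'] for row in data['rows']]
print(headers)
print(rows)
```

The resulting headers and rows can be passed to pd.DataFrame(rows, columns=headers) exactly as before.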
