Scraping a table that is stored externally. Is it possible?

2 answers

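User

Answer 1 · edited 2024-06-08 18:46:16

I tried this with BeautifulSoup, but it seems you can't get at the values in these tables because they are not on the website itself; they are stored somewhere external (?)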

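Edit:

https://analytics.zoho.com/open-view/938032000481034014

This is the link where the table and its data are stored.

So I tried scraping the data from it with bs4, and it worked. The rows have the class "zdbDataRowDiv". Try: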

container = page_soup.findAll("div", {"class": "zdbDataRowDiv"})

Code explanation:

container   # the variable that holds the matching rows; name it however you like
page_soup   # the HTML page you parsed with BeautifulSoup
findAll("tag", {"attribute": "value"})   # finds every <tag> whose attribute has the given value
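A small, self-contained illustration of the `findAll("tag", {"attribute": "value"})` call described above. The HTML snippet is made up for the example: it only assumes that the rows are `div` elements with the class `zdbDataRowDiv`, as stated in the answer, and it does not reproduce the real Zoho page. It uses `find_all`, the current name for `findAll` in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not the real page's structure
html = """
<div class="zdbDataRowDiv"><span>Danone</span><span>A</span></div>
<div class="zdbDataRowDiv"><span>HP Inc</span><span>A</span></div>
<div class="header">Company</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# The dict filters on attribute values; the "header" div is not matched
container = page_soup.find_all("div", {"class": "zdbDataRowDiv"})

for row in container:
    # Collect the text of each cell (span) in the row
    cells = [span.get_text(strip=True) for span in row.find_all("span")]
    print(cells)
```

Because `class` is a multi-valued attribute in bs4, the filter matches any `div` whose class list contains `zdbDataRowDiv`, even if it has other classes as well.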

User

Answer 2 · edited 2024-06-08 18:46:16

The data is stored as JSON inside a <script> tag. Just pull it out and parse it:

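from bs4 import BeautifulSoup
import pandas as pd
import requests
import json


url = 'https://flo.uri.sh/visualisation/4540617/embed'

response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
scripts = soup.find_all('script')

cols = None
jsonData = None
for script in scripts:
    # The chart data is embedded in an inline script as JavaScript variables
    if 'var _Flourish_data_column_names = ' in script.text:
        json_str = script.text

        # Take the text after each variable name, up to the first ",\n"
        col_names = json_str.split('var _Flourish_data_column_names = ')[-1].split(',\n')[0]
        cols = json.loads(col_names)
        data = json_str.split('_Flourish_data = ')[-1].split(',\n')[0]

        # Drop trailing JavaScript after the last ';' until the rest parses as JSON
        while True:
            try:
                jsonData = json.loads(data)
                break
            except json.JSONDecodeError:
                if ';' not in data:
                    raise  # nothing left to trim, so give up
                data = data.rsplit(';', 1)[0]
        break

if cols is None or jsonData is None:
    raise ValueError('Flourish data not found in the page')

# Build the table from the column names and each row's values
headers = cols['rows']['columns']
rows = [row['columns'] for row in jsonData['rows']]

table = pd.DataFrame(rows, columns=headers)
# As in the original answer: replace every non-empty score cell with 'A'
for col in headers[1:]:
    table.loc[table[col] != '', col] = 'A'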

Output:

print(table)

                           Company Climate change Forests Water security
0                           Danone              A       A              A
1                     FIRMENICH SA              A       A              A
2           FUJI OIL HOLDINGS INC.              A       A              A
3                           HP Inc              A       A              A
4                  KAO Corporation              A       A              A
..                             ...            ...     ...            ...
308             Woolworths Limited              A                       
309                Workspace Group              A                       
310  Yokogawa Electric Corporation              A                      A
311      Yuanta Financial Holdings              A                       
312                     Zalando SE              A                       

[313 rows x 4 columns]
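The answer locates the JSON by splitting the script text on `',\n'` and then trimming at `;` until `json.loads` succeeds, which depends on how the page happens to be laid out. A possibly more robust sketch, assuming each variable is assigned a strict JSON literal, is to let `json.JSONDecoder.raw_decode` parse exactly one JSON value and stop where it ends. The inline script below is invented for illustration, and `extract_js_json` is a hypothetical helper; only the variable names `_Flourish_data_column_names` and `_Flourish_data` and the `rows`/`columns` layout are taken from the answer's code.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Invented sample page that mimics the variable names used in the answer
html = """
<script>
  var _Flourish_data_column_names = {"rows": {"columns": ["Company", "Climate change"]}},
      _Flourish_data = {"rows": [{"columns": ["Danone", "A"]}, {"columns": ["Zalando SE", "A"]}]};
  var other = 1;
</script>
"""


def extract_js_json(text, var_name):
    """Hypothetical helper: return the JSON value assigned to var_name in JS source."""
    marker = var_name + " = "
    start = text.find(marker)
    if start == -1:
        raise ValueError(f"{var_name} not found")
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so do it here
    while idx < len(text) and text[idx].isspace():
        idx += 1
    # raw_decode parses one JSON value and reports where it ended,
    # so trailing JavaScript (",", ";", further variables) is ignored
    value, _end = json.JSONDecoder().raw_decode(text, idx)
    return value


soup = BeautifulSoup(html, "html.parser")
script_text = next(
    s.get_text() for s in soup.find_all("script")
    if "_Flourish_data_column_names" in s.get_text()
)

cols = extract_js_json(script_text, "_Flourish_data_column_names")
data = extract_js_json(script_text, "_Flourish_data")

headers = cols["rows"]["columns"]
table = pd.DataFrame([row["columns"] for row in data["rows"]], columns=headers)
print(table)
```

Because `raw_decode` stops at the end of the first complete JSON value, this avoids both the `',\n'` split and the `;` trimming loop. It will still fail if the page assigns something that is not strict JSON, such as a JavaScript object with unquoted keys.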
