I want to scrape https://id.wikipedia.org/wiki/Demografi_Indonesia. I need to extract one table from the page.
I am using this script:
# import the required libraries
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from urllib.request import urlopen
# make a request to the website
url = 'https://id.wikipedia.org/wiki/Demografi_Indonesia'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
# get the table with class 'wikitable sortable'
soup = soup.find("table",{"class":"wikitable sortable"})
# find the data cells with tag 'td'
cells = soup.find_all('td')
# create empty lists
bps = []
nama = []
ibu_kota = []
populasi = []
luas = []
pulau = []
# put the data into the lists based on the HTML pattern
if len(cells) > 0:
    bps = cells[0]
    bps.append(int(bps.text))

    nama = cells[2]
    nama.append(nama.text.strip())

    ibu_kota = cells[4]
    ibu_kota.append(ibu_kota.text.strip())

    populasi = cells[5]
    populasi.append(process_num(populasi.text.strip()))

    luas = cells[6]
    luas.append(process_num(luas.text.strip()))

    pulau = cells[8]
    pulau.append(pulau.text.strip())
# create the DataFrame and write it to CSV
df = pd.DataFrame(bps)
But it raises this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-51-6130f70f1b21> in <module>
31 if len(cells) > 0:
32 bps = cells[0]
---> 33 bps.append(int(bps.text))
34
35 nama = cells[2]
~\anaconda3\lib\site-packages\bs4\element.py in append(self, tag)
412 :param tag: A PageElement.
413 """
--> 414 self.insert(len(self.contents), tag)
415
416 def extend(self, tags):
~\anaconda3\lib\site-packages\bs4\element.py in insert(self, position, new_child)
364 new_child.extract()
365
--> 366 new_child.parent = self
367 previous_child = None
368 if position == 0:
AttributeError: 'int' object has no attribute 'parent'
The output I want has these columns: BPS code (Kode BPS), name (Nama), capital (Ibu Kota), population (Populasi), area (Luas), island (Pulau).
How can I solve this?
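The traceback comes from the line `bps = cells[0]`: it rebinds the name `bps` from the empty list to a BeautifulSoup `Tag`, so `bps.append(...)` calls `Tag.append`, which expects a page element and fails on an `int`. Below is a minimal sketch of one way around this, assuming each data row holds its cells at the positions used in the question (0, 2, 4, 5, 6, 8). The inline HTML sample, the function name `parse_rows`, and the `process_num` helper (which the question calls but never defines) are illustrative assumptions, not the real page or the asker's code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the Wikipedia table (not real data).
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>Kode</th><th>x</th><th>Nama</th><th>x</th><th>Ibu Kota</th>
    <th>Populasi</th><th>Luas</th><th>x</th><th>Pulau</th></tr>
<tr><td>11</td><td>-</td><td>Prov A</td><td>-</td><td>Kota A</td>
    <td>1.234.567</td><td>57.956</td><td>-</td><td>Pulau A</td></tr>
<tr><td>12</td><td>-</td><td>Prov B</td><td>-</td><td>Kota B</td>
    <td>2.000.000</td><td>72.981</td><td>-</td><td>Pulau A</td></tr>
</table>
"""


def process_num(text):
    """Hypothetical helper: assumes '.' and ' ' are thousands separators."""
    return int(text.replace(".", "").replace(" ", ""))


def parse_rows(html):
    """Return one dict per table row that has enough <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    records = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 9:
            # Header rows contain only <th>, so they are skipped here.
            continue
        # Read from the Tag objects directly; no list name is overwritten.
        records.append({
            "Kode BPS": int(cells[0].text.strip()),
            "Nama": cells[2].text.strip(),
            "Ibu Kota": cells[4].text.strip(),
            "Populasi": process_num(cells[5].text.strip()),
            "Luas": process_num(cells[6].text.strip()),
            "Pulau": cells[8].text.strip(),
        })
    return records


records = parse_rows(SAMPLE_HTML)
print(records[0])
```

If pandas is available, `pd.DataFrame(parse_rows(html))` turns the records into a table with the six requested columns. The number format on the live page may differ from the sample, so `process_num` would need adjusting to match it.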
You can select the columns by position and set the column names from a list, using [2] to extract the third DataFrame from the list of tables.
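The function names in the answer did not survive in the text, so the following is only a sketch of what such an approach might look like, assuming `pandas.read_html` to parse the page into a list of DataFrames and `DataFrame.iloc` to pick columns by position. The positions come from the cell indices in the question, the index `2` from the answer, and the names `select_columns`, `scrape`, `POSITIONS`, and `NAMES` are hypothetical. The page layout may have changed since, so the table index and positions should be checked against the live page.

```python
import pandas as pd

URL = "https://id.wikipedia.org/wiki/Demografi_Indonesia"
POSITIONS = [0, 2, 4, 5, 6, 8]  # cell positions used in the question
NAMES = ["Kode BPS", "Nama", "Ibu Kota", "Populasi", "Luas", "Pulau"]


def select_columns(table, positions=POSITIONS, names=NAMES):
    """Pick columns by position with iloc, then rename them from a list."""
    out = table.iloc[:, positions].copy()
    out.columns = names
    return out


def scrape(url=URL, table_index=2):
    """Fetch the page and return the selected columns of one table.

    read_html returns a list of DataFrames, one per <table> it can parse,
    and needs an HTML parser such as lxml installed. Not run here, since
    it requires network access.
    """
    tables = pd.read_html(url)
    return select_columns(tables[table_index])


# Offline demonstration on a hand-built frame (made-up values).
demo = pd.DataFrame([[11, "-", "Prov A", "-", "Kota A", 100, 5.0, "-", "Pulau A"]])
print(select_columns(demo))
```

Calling `scrape().to_csv("provinsi.csv", index=False)` would then write the result to a CSV file; the file name is only an example.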