Scraping with BeautifulSoup

Posted 2024-04-28 10:32:20


I'm a beginner in Python, and I'm trying to scrape a website with BeautifulSoup. A small part of the page source is this HTML:

<table class="swift" width="100%">
   <tr>
     <th class="no">ID</th>
     <th>Bank or Institution</th>
     <th>City</th>
     <th class="branch">Branch</th>
     <th>Swift Code</th>
   </tr>   <tr>
     <td align="center">101</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>CONSTANTA</td>
     <td>(CONSTANTA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
   </tr>
   <tr>
     <td align="center">102</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>ORADEA</td>
     <td>(ORADEA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td>
   </tr>

I managed to scrape the rows, but the result is not in the format I want:

[output not preserved in the source]

What I actually want is this:

ID, Bank or Institution, City, Branch, Swift Code

101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA

102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA

Here is my code:

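import requests
from bs4 import BeautifulSoup

base_url = "https://www.theswiftcodes.com/"
nr = 0
page = 'page'
country = 'Romania'
while nr < 4:
    # Build the URL for the current result page
    url_country = base_url + country + '/' + page + "/" + str(nr) + "/"
    pages = requests.get(url_country)
    soup = BeautifulSoup(pages.text, 'html.parser')

    # Drop <script> elements so their contents do not end up in the text
    for script in soup.find_all('script'):
        script.extract()

    # get_text() on the whole table flattens every cell into a single string
    tabel = soup.find_all("table")
    text = "".join([p.get_text() for p in tabel])
    nr += 1
    print(text)

    # Append this page's text to the output file
    with open('swiftcodes.txt', 'a') as file:
        file.write(text)

    # Read the file back and print it line by line
    with open('swiftcodes.txt', 'r') as file:
        for item in file:
            print(item)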

2 Answers

This should work:

from bs4 import BeautifulSoup

# Sample markup from the question
html = """<table class="swift" width="100%">
   <tr>
     <th class="no">ID</th>
     <th>Bank or Institution</th>
     <th>City</th>
     <th class="branch">Branch</th>
     <th>Swift Code</th>
   </tr>   <tr>
     <td align="center">101</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>CONSTANTA</td>
     <td>(CONSTANTA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
   </tr>
   <tr>
     <td align="center">102</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>ORADEA</td>
     <td>(ORADEA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22ora/">DAFBRO22ORA</a></td>
   </tr>"""

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, "html.parser")

for row in soup.find_all("tr"):
    # Collect header (<th>) and data (<td>) cells in document order
    cells = [cell.text for cell in row.find_all(["th", "td"])]
    print(", ".join(cells))

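Output:

ID, Bank or Institution, City, Branch, Swift Code
101, BANK LEUMI ROMANIA S.A., CONSTANTA, (CONSTANTA BRANCH), DAFBRO22CTA
102, BANK LEUMI ROMANIA S.A., ORADEA, (ORADEA BRANCH), DAFBRO22ORA
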
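Since the question also writes the result to a file, here is a minimal sketch of the same row-by-row extraction feeding Python's standard `csv` module instead of joining strings by hand. The helper names `table_rows` and `rows_to_csv` and the shortened `SAMPLE_HTML` are hypothetical, introduced only for this illustration. Note that `csv.writer` separates fields with a bare comma (no trailing space) and would quote any cell that itself contains a comma.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: one data row is enough to illustrate)
SAMPLE_HTML = """<table class="swift" width="100%">
   <tr>
     <th class="no">ID</th>
     <th>Bank or Institution</th>
     <th>City</th>
     <th class="branch">Branch</th>
     <th>Swift Code</th>
   </tr>
   <tr>
     <td align="center">101</td>
     <td>BANK LEUMI ROMANIA S.A.</td>
     <td>CONSTANTA</td>
     <td>(CONSTANTA BRANCH)</td>
     <td align="center"><a href="/romania/dafbro22cta/">DAFBRO22CTA</a></td>
   </tr>
</table>"""


def table_rows(html):
    """Return every <tr> as a list of stripped cell texts (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells at all
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(table_rows(SAMPLE_HTML))
print(csv_text, end="")
```

To write to disk rather than to a string, the same `csv.writer` can be given a file opened with `open('swiftcodes.txt', 'a', newline='')`.
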
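from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.theswiftcodes.com/united-states/')
soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser requires the lxml package

# All rows of the table with class "swift"
rows = soup.find(class_="swift").find_all('tr')

# The first row holds the column headers
headers = [th.text for th in rows[0].find_all('th')]
print(headers)

# Remaining rows: keep only <td> cells that have no colspan attribute
for row in rows[1:]:
    cells = [td.text for td in row.find_all('td', colspan=False)]
    print(cells)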

Output:

[output not preserved in the source]
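The `colspan=False` filter in the code above matches only `<td>` tags that have no `colspan` attribute, so a row made of a single full-width cell produces an empty list. The sketch below shows that effect offline. The markup is hypothetical (the banner row is an assumption about why the filter is there, not something taken from the real page), `data_rows` is an illustrative helper, and `html.parser` is used here only to avoid the `lxml` dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle data row is one full-width banner cell
SAMPLE_HTML = """<table class="swift">
  <tr><th>ID</th><th>City</th><th>Swift Code</th></tr>
  <tr><td>101</td><td>CONSTANTA</td><td>DAFBRO22CTA</td></tr>
  <tr><td colspan="3">Sponsored banner</td></tr>
  <tr><td>102</td><td>ORADEA</td><td>DAFBRO22ORA</td></tr>
</table>"""


def data_rows(html):
    """Return the data rows, skipping rows whose cells all carry a colspan."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find(class_="swift").find_all("tr")
    result = []
    for row in rows[1:]:  # rows[0] is the header row
        # colspan=False keeps only <td> tags without a colspan attribute
        cells = [td.get_text(strip=True) for td in row.find_all("td", colspan=False)]
        if cells:  # the banner row yields an empty list and is dropped
            result.append(cells)
    return result


rows = data_rows(SAMPLE_HTML)
for cells in rows:
    print(", ".join(cells))
```

Without the `if cells:` check, the banner row would still show up as an empty list, which is what the answer's loop would print for such a row.
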
