I have the HTML excerpt below. Note that the two td elements repeat for every row I need to capture.
<table class="ent">
<tbody class=""><tr class="tablestyle">
<td class="hide_on_mobile"> <a href="../" class="">
<img class="ProductImage" src="https://.."></a>
</td>
<td class="hide_on_mobile" align="center">
<strong class="">
<span style="font-size:1.4em;" class="">Scraped okay - col0</span>
<br>
<br>Scrape this text - col1</strong><br>
<br><i><span style="color:indigo;" class="">Scrape this text - col2
<br class="">
<br>Next Event: Scrape this text -col3</span></i>
</td>
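I need to capture 4 different pieces of data: col0, col1, col2, and col3.
I already have col0 working. I am stuck on capturing col1, col2, and col3.
I am trying to use the BR tags, i.e. the ones after the span:
take the text after the 2nd BR as col1
take the text after the 3rd BR as col2
take the text after the 5th BR as col3
I can't get col1 to work with br > br. Is there a way to solve this?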
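A likely reason a selector such as `strong > span > br > br` finds nothing: `<br>` is a void element, so it can never have a child, and in the excerpt the `<br>` tags are siblings of the `<span>` rather than nested inside it. The small sketch below illustrates this on a cut-down snippet (the snippet and the variable names are made up for illustration, and `html.parser` is used only to avoid an extra dependency). The text that follows a `<br>` is a sibling text node, reachable through `.next_sibling`.

```python
from bs4 import BeautifulSoup

# Cut-down, hypothetical version of the <strong> block from the excerpt.
snippet = "<strong><span>col0 text</span><br><br>text after second br</strong>"
soup = BeautifulSoup(snippet, "html.parser")

# The child combinator ">" requires nesting, but <br> cannot contain anything,
# so this selector has nothing to match.
child_matches = soup.select("strong > span > br > br")

# The <br> tags are direct children of <strong>; the text after the
# second one is its next sibling in the tree.
second_br = soup.select("strong > br")[1]
text_after = str(second_br.next_sibling).strip()

print(child_matches)
print(text_after)
```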
import sqlite3
import datetime
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "http:/*"
r = requests.get(url)
source = r.text
t = datetime.datetime.now().date()
soup = BeautifulSoup(source, "lxml")
row_count=200
row_marker = 0
new_table = pd.DataFrame(columns = ["col0", "col1", "col2","col3", "DateAdded"], index = range(0,row_count)) # I don't know the number of rows
# For col0
column_marker = 0
for layout in soup.select("strong > span"):
    new_table.iat[row_marker, column_marker] = layout.text.strip()
    new_table.iat[row_marker, 4] = t
    row_marker += 1
# For col1 (this is the part that does not work)
column_marker = 1
row_marker = 0
for layout in soup.select("strong > span > br > br"):
    new_table.iat[row_marker, column_marker] = layout.text.strip()
    row_marker += 1
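One possible way to get all four values is to work per `<td>` instead of per column, and to read the text nodes rather than the `<br>` tags themselves. The sketch below is only an illustration under some assumptions: it runs against the excerpt from the question (with closing `</tr></tbody></table>` tags added so it parses as a complete table), it uses `html.parser` instead of `lxml`, and `extract_rows` is a hypothetical helper name. Collecting rows in a list also avoids having to guess the row count in advance.

```python
import datetime
from bs4 import BeautifulSoup, NavigableString

# Excerpt from the question; the three closing tags at the end are an assumption.
SAMPLE_HTML = """
<table class="ent">
<tbody class=""><tr class="tablestyle">
<td class="hide_on_mobile"> <a href="../" class="">
<img class="ProductImage" src="https://.."></a>
</td>
<td class="hide_on_mobile" align="center">
<strong class="">
<span style="font-size:1.4em;" class="">Scraped okay - col0</span>
<br>
<br>Scrape this text - col1</strong><br>
<br><i><span style="color:indigo;" class="">Scrape this text - col2
<br class="">
<br>Next Event: Scrape this text -col3</span></i>
</td>
</tr></tbody></table>
"""


def extract_rows(html):
    """Return one dict per <td> that contains a <strong> block."""
    soup = BeautifulSoup(html, "html.parser")
    today = datetime.date.today()
    rows = []
    for strong in soup.select("td > strong"):
        td = strong.parent

        # col0: the <span> nested inside <strong>.
        span = strong.find("span")
        col0 = span.get_text(strip=True) if span else None

        # col1: text that sits directly inside <strong>, outside the <span>.
        direct_text = [
            child.strip()
            for child in strong.children
            if isinstance(child, NavigableString) and child.strip()
        ]
        col1 = direct_text[0] if direct_text else None

        # col2 and col3: the two text chunks inside <i><span>, split by <br>.
        inner = td.select_one("i > span")
        parts = list(inner.stripped_strings) if inner else []
        col2 = parts[0] if len(parts) > 0 else None
        col3 = parts[1] if len(parts) > 1 else None

        rows.append(
            {"col0": col0, "col1": col1, "col2": col2, "col3": col3, "DateAdded": today}
        )
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` if a DataFrame is still wanted. If the real page nests the elements differently from the excerpt, the selectors would need adjusting.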