Scraping text from the second position - Q&A

<table class="ent">
<tbody class="">
<tr class="tablestyle">
<td class="hide_on_mobile"> <a href="../" class=""> <img class="ProductImage" src="https://.."></a> </td>
<td class="hide_on_mobile" align="center">
Scraped okay - col0
Scrape this text - col1
Scrape this text - col2
Next Event: Scrape this text -col3
</td>

import sqlite3
import datetime
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "http:/*"
r = requests.get(url)
source = r.text
t = datetime.datetime.now().date()
soup = BeautifulSoup(source, "lxml")

row_count = 200
row_marker = 0
new_table = pd.DataFrame(columns=["col0", "col1", "col2", "col3", "DateAdded"], index=range(0, row_count))  # I don't know the number of rows

# For col0
column_marker = 0
for layout in soup.select("strong > span"):
    new_table.iat[row_marker, column_marker] = layout.text.strip()
    new_table.iat[row_marker, 4] = t
    row_marker += 1

# For col 1
column_marker = 1
row_marker = 0
for layout in soup.select("strong > span > br > br"):
    new_table.iat[row_marker, column_marker] = layout.text.strip()
    row_marker += 1

1 answer

User

#1 · Posted 2024-04-25 07:54:53

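# `data` is assumed to be the parsed page, e.g. data = BeautifulSoup(source, "lxml")
# since you said there are multiple trs
trs = data.find_all('tr')

for tr in trs:
    l = []
    td = tr.find_all('td')
    # skip rows that have no second td
    if len(td) < 2:
        continue
    # the first td never holds data, according to the question above
    for tags in td[1]:
        try:
            if tags.text:
                print(tags.text)
                l.extend(tags.text.split('\n'))
        except AttributeError:
            # child nodes without a .text attribute are skipped
            pass

    # kept inside the loop so that each tr yields its own list,
    # which can then be stored in a df
    str_data = [' '.join(s.split()) for s in l if s]
    # drop entries left empty after collapsing whitespace
    str_data = [s for s in str_data if s]
    print(str_data)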

Output

['Scraped okay - col0',
 'Scrape this text - col1',
 'Scrape this text - col2',
 'Next Event: Scrape this text -col3']
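
The answer stops at "store the data in a df", so here is a minimal sketch of that last step. It is not the answerer's code. It assumes the four text lines in the second td are separated by `<br>` tags (the question's HTML is flattened, so the real separators are unknown), and the sample markup, the second row, the fixed date, and the helper name `rows_to_records` are all made up for illustration. It uses the `html.parser` backend so that lxml is not required.

```python
import datetime
from bs4 import BeautifulSoup

# Hypothetical markup: the real page is not shown in full, so the <br>
# separators and the second row are assumptions made for illustration.
SAMPLE_HTML = """
<table class="ent"><tbody>
<tr class="tablestyle">
  <td class="hide_on_mobile"><a href="../"><img class="ProductImage" src=""></a></td>
  <td class="hide_on_mobile" align="center">
    Scraped okay - col0<br>Scrape this text - col1<br>
    Scrape this text - col2<br>Next Event: Scrape this text -col3
  </td>
</tr>
<tr class="tablestyle">
  <td class="hide_on_mobile"></td>
  <td class="hide_on_mobile" align="center">A0<br>A1<br>A2<br>A3</td>
</tr>
</tbody></table>
"""

COLUMNS = ["col0", "col1", "col2", "col3"]


def rows_to_records(html, date_added):
    """Return one dict per <tr>, built from the text of its second <td>."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # no second cell, nothing to read
        # stripped_strings yields each text node with outer whitespace removed
        parts = [" ".join(s.split()) for s in cells[1].stripped_strings]
        record = dict(zip(COLUMNS, parts))
        record["DateAdded"] = date_added
        records.append(record)
    return records


# A fixed date keeps the example reproducible; the question uses
# datetime.datetime.now().date() instead.
records = rows_to_records(SAMPLE_HTML, datetime.date(2024, 4, 25))
for record in records:
    print(record)
```

Each tr becomes one dict, so the number of rows does not need to be known in advance. Passing the list to `pd.DataFrame(records)` should give a table with the col0 to col3 and DateAdded columns from the question.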
