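After calling pd.read_html, the values in the CCCCCCC column are converted to 123456, so I can no longer get 1,2,3,4,5,6 from that column. My expected result should keep 1,2,3,4,5,6.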
HTML code
html = """<html>
<body>
<div id="MMMMMMMM" class="MMMMMMMMMMM" style="">
<table class="OOOOOOOO" style="">
<thead>
<tr class="PPPPPPPPPP">
<td colspan="3" style="font-size:14px;font-weight:bold;" class="QQQQQQQQQQ">AAAAAAA</td>
</tr>
<tr class="RRRRRRRRRR">
<td>BBBBBB</td>
<td>CCCCCCC</td>
<td>AAAAAAA</td>
</tr>
</thead>
<tbody>
<tr class="SSSSSSSS">
<td rowspan="1">DDDDDD</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="3">EEEEEEEEE</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="1">FFFFFFFFF</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTTT">
<td rowspan="1">GGGGGGGGG</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="1">HHHHHHHHH</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTTTT">
<td rowspan="1">IIIIIIIIII</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="">
<td rowspan="1">JJJJJJJJ</td>
<td class="L_LLLL67">1,2,3,4,5,6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTT">
<td rowspan="2">KKKKKKKK</td>
<td class="L_LLLL67">1/2/3/4/5/6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
<tr class="TTTTTT">
<td class="L_LLLL67">1/2/3/4/5/6</td>
<td class="L_LLLL67 f_tar">1234.56</td>
</tr>
</tbody>
</table>
</body>
</html>"""
Python code
from bs4 import BeautifulSoup
import pandas as pd
soup = BeautifulSoup(html,'html.parser')
table = soup.find('div', attrs={'id':'MMMMMMMM'})
df_list = pd.read_html(str(table), header=1)
df_list
Actual result
[        BBBBBB      CCCCCCC  AAAAAAA
0       DDDDDD       123456  1234.56
1    EEEEEEEEE       123456  1234.56
2    EEEEEEEEE       123456  1234.56
3    EEEEEEEEE       123456  1234.56
4    FFFFFFFFF       123456  1234.56
5    GGGGGGGGG       123456  1234.56
6    HHHHHHHHH       123456  1234.56
7   IIIIIIIIII       123456  1234.56
8     JJJJJJJJ       123456  1234.56
9     KKKKKKKK  1/2/3/4/5/6  1234.56
10    KKKKKKKK  1/2/3/4/5/6  1234.56]
Expected result
[        BBBBBB      CCCCCCC  AAAAAAA
0       DDDDDD  1,2,3,4,5,6  1234.56
1    EEEEEEEEE  1,2,3,4,5,6  1234.56
2    EEEEEEEEE  1,2,3,4,5,6  1234.56
3    EEEEEEEEE  1,2,3,4,5,6  1234.56
4    FFFFFFFFF  1,2,3,4,5,6  1234.56
5    GGGGGGGGG  1,2,3,4,5,6  1234.56
6    HHHHHHHHH  1,2,3,4,5,6  1234.56
7   IIIIIIIIII  1,2,3,4,5,6  1234.56
8     JJJJJJJJ  1,2,3,4,5,6  1234.56
9     KKKKKKKK  1/2/3/4/5/6  1234.56
10    KKKKKKKK  1/2/3/4/5/6  1234.56]
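You need to pass the thousands parameter and set it to None. Its default value is ',', so pd.read_html treats the commas in 1,2,3,4,5,6 as thousands separators and strips them.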
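A minimal sketch of that fix, assuming a trimmed two-row version of the table from the question (the shortened `html` string and the names `default_df` and `fixed_df` are illustrative, not from the original post). It wraps the markup in `StringIO`, which recent pandas versions prefer over a literal HTML string, and parses it once with the default separator and once with `thousands=None`:

```python
from io import StringIO

import pandas as pd

# Shortened, illustrative copy of the question's table (two body rows only).
html = """<table>
<thead>
<tr><td>BBBBBB</td><td>CCCCCCC</td><td>AAAAAAA</td></tr>
</thead>
<tbody>
<tr><td>DDDDDD</td><td>1,2,3,4,5,6</td><td>1234.56</td></tr>
<tr><td>KKKKKKKK</td><td>1/2/3/4/5/6</td><td>1234.56</td></tr>
</tbody>
</table>"""

# Default behaviour: thousands=',' so the commas are stripped from the cell text.
default_df = pd.read_html(StringIO(html), header=0)[0]

# thousands=None disables separator handling, so the text is kept as written.
fixed_df = pd.read_html(StringIO(html), header=0, thousands=None)[0]

print(default_df)
print(fixed_df)
```

In the question's own code, the equivalent change would be adding `thousands=None` to the existing `pd.read_html(str(table), header=1)` call.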
Output: the CCCCCCC column keeps 1,2,3,4,5,6, matching the expected result above.