Evenly spacing elements when printing
I have this piece of code:
for t in tables:
    print ""
    my_table = t
    rows = my_table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        i = 0
        for td in cols:
            text = str(td.text).strip()
            print "{}{}".format(text if text != "" else "IP", "|"),
            i = i + 1
            if i == 2:
                print ""
                i = 0
        pass
"tables" is a list of HTML tables. I am using BeautifulSoup to parse them.
At the moment, the output I get is:
Interface in| port-channel8.53|
IP| 172.18.153.126/255.255.255.252|
Router| bob|
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103|
IP| 172.18.145.105/255.255.255.252|
What I would like to get is:
Interface in | port-channel8.53                                    |
IP           | 172.18.153.126/255.255.255.252                      |
Router       | bob                                                 |
Route        | route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103                                     |
IP           | 172.18.145.105/255.255.255.252                      |
"Placeholder"| another ip in the same td as the one up             |
"Placeholder"| another ip in the same td as the one up             |
How can I get this output?
Edit:
This is how one of the tables is built:
<table>
<tr>
<td>Interface in</td>
<td>Vlan800 (bob)</td>
</tr>
<tr>
<td></td>
<td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>bob2</td>
</tr>
<tr>
<td>Route</td>
<td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan1145 (bob3)</td>
</tr>
<tr>
<td></td>
<td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
(Yes, the empty cells really are there on the actual page.)
Edit 2: the markup that causes problems:
<td>
195.233.112.4/255.255.255.0<br>
195.233.112.15/255.255.255.0<br>
195.233.112.3/255.255.255.0<br>
<br><br><br></td>
Edit 3:
Sample markup 2 (the proposed solution has a problem with it):
<table class="nitrestable">
<tr>
<td>Interface in</td>
<td>GigabitEthernet1/1.103 (*global)</td>
</tr>
<tr>
<td></td>
<td>172.18.145.106/255.255.255.252<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>*grt</td>
</tr>
<tr>
<td>Route</td>
<td>route: 172.18.145.106/255.255.255.128 gw 172.18.145.106</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan71 (*global)</td>
</tr>
<tr>
<td></td>
<td>172.18.145.106/255.255.255.0<br>
172.18.146.106/255.255.255.0<br>
172.18.147.106/255.255.255.0<br></br></br></br></td></tr>
</table>
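3 Answers
0
This parses the rows and columns into lists that are evaluated afterwards. That makes it easy to compute the maximum width of each column (w1 and w2 in the code). As others have said, once the widths are known, str.format() does the rest.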
for t in tables:
    col = [[], []]  # col[0] holds the labels, col[1] the values
    for tr in t.findAll('tr'):
        cols = tr.findAll('td')
        if len(cols) < 2:
            continue  # skip rows that are not label/value pairs
        label = cols[0].text.strip() or "IP"
        # td.text drops the <br> tags and glues the values together, so collect
        # the text nodes one by one instead
        values = [s.strip() for s in cols[1].findAll(text=True) if s.strip()]
        for value in values or [""]:
            col[0].append(label)  # repeat the label for every extra value
            col[1].append(value)
    if not col[0]:
        continue  # nothing to print for an empty table
    w1 = max(len(x) for x in col[0])
    w2 = max(len(x) for x in col[1])
    for i in range(len(col[1])):
        s = '{: <{}}|{: <{}}|'.format(col[0][i], w1, col[1][i], w2)
        print(s)
A note on str.format(): '{: <{}}'.format(x, y) builds a string of width y from the text x. The text is left-aligned and padded on the right with spaces.
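To make the nested width concrete, here is a minimal sketch. The labels come from the question; the variable names `labels` and `width` are only illustrative. The inner `{}` is filled by the second argument and becomes the field width of the outer replacement field.

```python
# Left-align each label in a field whose width is supplied at call time.
labels = ["Interface in", "IP", "Router"]
width = max(len(label) for label in labels)  # length of the longest label

for label in labels:
    # '{: <{}}' means: fill with ' ', align left ('<'), width from the 2nd argument
    print('{: <{}}|'.format(label, width))
```

Because the width is an ordinary argument, the same format string works for any column once its maximum length is known.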
Addendum: I added parsing for multiple IP addresses, or for any field whose second column holds several values separated by <br>.
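For readers who want to try the <br> handling without installing anything, the following is a rough sketch that uses only the standard library's `html.parser` in place of BeautifulSoup (Python 3). The class `CellTextParser` and the function `expand_rows` are hypothetical names made up for this illustration, and the sketch assumes every row contains exactly two <td> cells, as in the tables from the question.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collects, for every <td>, the list of non-empty text pieces inside it."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one list of strings per <td>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append([])

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # <br> tags sit between the data chunks, so each value arrives separately.
        piece = data.strip()
        if self._in_td and piece:
            self.cells[-1].append(piece)


def expand_rows(html, placeholder="IP"):
    """Return (label, value) pairs, repeating the label for every extra value."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    pairs = []
    # Assumption: cells alternate label, value because each row has two <td>s.
    for label_cell, value_cell in zip(parser.cells[0::2], parser.cells[1::2]):
        label = label_cell[0] if label_cell else placeholder
        for value in value_cell or [""]:
            pairs.append((label, value))
    return pairs


sample = """<table><tr><td></td><td>
195.233.112.4/255.255.255.0<br>
195.233.112.15/255.255.255.0<br>
195.233.112.3/255.255.255.0<br>
<br><br><br></td></tr></table>"""

pairs = expand_rows(sample)
w1 = max(len(label) for label, _ in pairs)
w2 = max(len(value) for _, value in pairs)
for label, value in pairs:
    print('{: <{}}|{: <{}}|'.format(label, w1, value, w2))
```

The point of the sketch is only that splitting happens on text nodes rather than on the literal string "<br>", which never appears in the extracted text of a cell.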
0
Here is a "somewhat simpler" script. You may want to look up Python's built-in enumerate function.
import BeautifulSoup
raw_str = \
'''
<table>
<tr>
<td>Interface in</td>
<td>Vlan800 (bob)</td>
</tr>
<tr>
<td></td>
<td>172.26.128.3/255.255.255.224<br></br></td>
</tr>
<tr>
<td>Router</td>
<td>bob2</td>
</tr>
<tr>
<td>Route</td>
<td>route: 0.0.0.0/0.0.0.0 gw 172.26.144.241</td>
</tr>
<tr>
<td>Interface out</td>
<td>Vlan1145 (bob3)</td>
</tr>
<tr>
<td></td>
<td>172.26.144.245/255.255.255.240<br></br></td>
</tr>
</table>
'''
org_str = \
'''
Interface in| port-channel8.53|
IP| 172.18.153.126/255.255.255.252|
Router| bob|
Route| route: 192.168.178.0/255.255.128.0 gw 172.18.145.106|
Interface out| Ethernet2/5.103|
IP| 172.18.145.105/255.255.255.252|
'''
print org_str
soup = BeautifulSoup.BeautifulSoup(raw_str)
tables = soup.findAll('table')
for cur_table in tables:
    print ""
    col_sizes = {}
    # Figure out the column sizes (an empty cell is printed as "IP", so measure that)
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        cur_col_sizes = {col: max(len(td.text or "IP"), col_sizes.get(col, 0)) for (col, td) in enumerate(tds)}
        col_sizes.update(cur_col_sizes)
    # Print the data, padded using the detected column sizes
    for tr in cur_table.findAll('tr'):
        tds = tr.findAll('td')
        line_strs = [("%%-%ds" % col_sizes[col]) % (td.text or "IP") for (col, td) in enumerate(tds)]
        line_str = "| %s |" % " | ".join(line_strs)
        print line_str
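The two ideas in the script above, enumerate for tracking a width per column index and a two-step % format for the padding, can be tried in isolation. The sketch below uses plain lists in place of parsed <td> elements; the `rows` data is copied from the sample table, and it is written for Python 3, so print is a function here.

```python
rows = [
    ["Interface in", "Vlan800 (bob)"],
    ["", "172.26.128.3/255.255.255.224"],
    ["Router", "bob2"],
]

# enumerate yields (column index, cell) pairs, so the widest cell seen so far
# can be tracked per column index.
col_sizes = {}
for row in rows:
    for col, cell in enumerate(row):
        col_sizes[col] = max(len(cell or "IP"), col_sizes.get(col, 0))

for row in rows:
    # "%%-%ds" % 12 produces the format "%-12s": left-justify in 12 characters.
    cells = [("%%-%ds" % col_sizes[col]) % (cell or "IP")
             for col, cell in enumerate(row)]
    print("| %s |" % " | ".join(cells))
```

The doubled %% is what survives the first substitution as a single %, which is why the format string has to be built in two passes.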