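<p>To write to a CSV file, you need to know which values go in the header row and which go in the body. In this example, the header values are the HTML elements that contain <code><label</code>.</p>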
<pre><code>from urllib2 import urlopen
from bs4 import BeautifulSoup
html = 'http://rerait.telangana.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTQmRGl2aXNpb249MSZVc2VySUQ9MjAyODcmUm9sZUlEPTEmQXBwSUQ9NSZBY3Rpb249U0VBUkNIJkNoYXJhY3RlckQ9MjImRXh0QXBwSUQ9'
page = urlopen(html)
data = BeautifulSoup(page, 'html.parser')
name_box = data.findAll('div', attrs={'class': 'col-md-3 col-sm-3'})
heads = []
values = []
for i in range(len(name_box)):
    data = name_box[i].text.strip()
    dataHTML = str(name_box[i])
    if 'PInfoType' in dataHTML:
        # <div class="col-md-3 col-sm-3" id="PInfoType">
        # empty value, maybe additional data for "Information Type"
        continue
    if 'for="2"' in dataHTML:
        # <label for="2">No</label>
        # contains a label like a header, but is actually a value
        values.append(data)
    elif '<label' in dataHTML:
        # <label for="PersonalInfoModel_InfoTypeValue">Information Type</label>
        # header, goes in the top row
        heads.append(data)
    else:
        # <div class="col-md-3 col-sm-3">Individual</div>
        # value, goes in the second row
        values.append(data)
csvData = ', '.join(heads) + '\n' + ', '.join(values)
with open("results.csv", 'w') as f:
    f.write(csvData)
print "finish."
</code></pre>
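<p>One caveat with <code>', '.join(...)</code>: if a scraped value itself contains a comma, the columns will shift. A minimal sketch of an alternative, assuming Python 3 and the standard <code>csv</code> module, is below. The helper name <code>write_csv</code> is hypothetical and not part of the code above; it takes the same <code>heads</code> and <code>values</code> lists.</p>
<pre><code>import csv


def write_csv(path, heads, values):
    # Hypothetical helper: writes one header row and one value row.
    # csv.writer quotes any field that contains a comma or a quote
    # character, which a plain ', '.join(...) does not do.
    # newline="" is what the csv module expects for files in Python 3.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(heads)
        writer.writerow(values)
</code></pre>
<p>It could be called as <code>write_csv("results.csv", heads, values)</code> in place of the final <code>with open(...)</code> block.</p>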