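Error with a condition in lxml etree
I am trying to remove every <question> whose <answer> contains -66, using the code below.
I get the following error: TypeError: argument of type 'NoneType' is not iterable ... if element.tag == 'answer' and '-66' in element.text:
What is wrong here? Can anyone help?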
#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*-
from lxml import etree
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>
"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
2 Answers
1
Instead of checking whether element.text is None, you can refine your XPath so that it selects only the matching questions:
questions = html.xpath('/questionaire/question[answer/text()="-66"]')
for question in questions:
    question.getparent().remove(question)
The square brackets [...] mean "such that". So the expression reads:
question # find all question elements
[ # such that
answer # it has an answer subelement
/text() # whose text
= # equals
"-66" # "-66"
]
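Note that text()="-66" matches only an answer whose text is exactly "-66", whereas the loop in the question tests for a substring. If substring matching is what you want, a sketch along these lines may help. It assumes lxml is installed and uses the contains() function from XPath 1.0; the sample answer text "about -66 degrees" is made up for illustration and is not from the question:

```python
from lxml import etree

# Hypothetical sample data: the second answer only *contains* "-66".
xml = u"""
<questionaire>
  <question>
    <questiontext>What's up?</questiontext>
    <answer></answer>
  </question>
  <question>
    <questiontext>Cool?</questiontext>
    <answer>about -66 degrees</answer>
  </question>
</questionaire>
"""

root = etree.fromstring(xml)

# Exact match: selects nothing here, because no answer text equals "-66".
exact = root.xpath('/questionaire/question[answer/text()="-66"]')

# Substring match: select questions that have an answer containing "-66".
partial = root.xpath('/questionaire/question[answer[contains(., "-66")]]')

for question in partial:
    question.getparent().remove(question)

# Collect the question texts that survived the removal.
remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(len(exact), len(partial), remaining)
```

An empty <answer></answer> is simply not selected by either expression, so no None check is needed on the Python side.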
1
In some cases element.text can be None. The error says that it cannot search for "-66" inside None, so first check that element.text is not None, like this:
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
The line in the XML that triggers the error is <answer></answer>: there is no text between the tags, so element.text is None.
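Here is a minimal sketch of that behaviour. It assumes lxml is installed, and falls back to the standard library's xml.etree.ElementTree, which treats empty elements the same way. The helper name answer_contains is hypothetical and not part of lxml:

```python
try:
    from lxml import etree
except ImportError:  # assumption: the standard library parser as a fallback
    import xml.etree.ElementTree as etree

empty = etree.fromstring("<answer></answer>")
filled = etree.fromstring("<answer>-66</answer>")


def answer_contains(element, needle):
    """Return True if the element has text and that text contains needle."""
    # (element.text or "") turns None into an empty string, so `in` is safe.
    return needle in (element.text or "")


print(empty.text is None)              # True: an empty element has no text
print(answer_contains(empty, "-66"))   # False, and no TypeError is raised
print(answer_contains(filled, "-66"))  # True
```

Writing (element.text or "") is equivalent in effect to the `element.text and ...` guard above; which one reads better is a matter of taste.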
Edit (regarding the second part of your question, about merging tags):
You can use BeautifulSoup like this:
from lxml import etree
import BeautifulSoup
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()
Output:
<questionaire>
<question>
<questiontext>
What's up?
</questiontext>
<answer>
</answer>
</question>
</questionaire>
Here is a link where you can download the BeautifulSoup module.
Alternatively, you can do the same thing more concisely:
from lxml import etree
import BeautifulSoup
# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"
html = etree.fromstring(planhtmlclear_utf)
for question in html.xpath('/questionaire/question[answer/text()="-66"]'):
    question.getparent().remove(question)
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()