Error in a conditional with lxml etree

Asked 2025-04-17 03:56

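I'm trying to remove every question whose answer contains -66:

I get the following error: TypeError: argument of type 'NoneType' is not iterable, raised at the line: if element.tag == 'answer' and '-66' in element.text:

What is wrong here? Can anyone help?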

#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*- 

from lxml import etree

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>

"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)

2 Answers

An alternative to checking whether element.text is None is to refine your XPath so that it selects only the questions you want to remove:

questions = html.xpath('/questionaire/question[answer/text()="-66"]')
for question in questions:
    question.getparent().remove(question)

The square brackets [...] mean "such that". So

question                          # find all question elements
[                                 # such that 
  answer                          # it has an answer subelement
    /text()                       # whose text 
  =                               # equals
  "-66"                           # "-66"
]
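
One difference worth noting: the question's code tests for a substring ('-66' in element.text), while the predicate above tests for equality. A minimal sketch of both variants follows, assuming a substring match is what is wanted; the third question in the sample data is hypothetical and only there to show the difference. XPath 1.0's contains() function does the substring test.

```python
from lxml import etree

# Sample data from the question plus one hypothetical extra entry
# whose answer only *contains* "-66".
xml = u"""
<questionaire>
<question><questiontext>What's up?</questiontext><answer></answer></question>
<question><questiontext>Cool?</questiontext><answer>-66</answer></question>
<question><questiontext>Sure?</questiontext><answer>code -66 here</answer></question>
</questionaire>
"""

root = etree.fromstring(xml)

# Equality: only questions whose answer text is exactly "-66".
exact = root.xpath('/questionaire/question[answer/text()="-66"]')

# Substring: questions whose answer text contains "-66" anywhere.
partial = root.xpath('/questionaire/question[contains(answer, "-66")]')

# Remove the substring matches, mirroring the intent of the original loop.
for question in partial:
    question.getparent().remove(question)

remaining = [q.findtext('questiontext') for q in root.findall('question')]
```

Neither predicate touches element.text in Python, so the empty answer element never triggers the TypeError.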

Answered 2025-04-17 by Python大师

In some cases element.text can be None. The error says that Python cannot search for "-66" inside None, so first check that element.text is not None, like this:

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():   
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)

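The line in the XML that triggers the error is <answer></answer>: there is no text between the tags, so lxml sets element.text to None for that element.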
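
The behaviour can be checked in isolation. This is a small sketch; answer_contains is a hypothetical helper name, not part of lxml, and it is just one way to write the guard.

```python
from lxml import etree

empty = etree.fromstring('<answer></answer>')
filled = etree.fromstring('<answer>-66</answer>')

# An element with nothing between its tags has text set to None, not ''.
empty_text = empty.text
filled_text = filled.text


def answer_contains(element, needle):
    """Hypothetical helper: substring test that treats missing text as ''."""
    return needle in (element.text or '')
```

With this helper, the condition in the loop could read: element.tag == 'answer' and answer_contains(element, '-66').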

Edit (regarding the second part of your question, about merging tags):

You can use BeautifulSoup like this:

from lxml import etree
import BeautifulSoup

planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""

html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():   
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)

soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()

Output:

<questionaire>
 <question>
  <questiontext>
   What's up?
  </questiontext>
  <answer>
  </answer>
 </question>
</questionaire>

Here is a link where you can download the BeautifulSoup module.

Alternatively, you can do this more concisely:

from lxml import etree
import BeautifulSoup    

# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"

html = etree.fromstring(planhtmlclear_utf)
for question in html.xpath('/questionaire/question[answer/text()="-66"]'):
    question.getparent().remove(question)
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()
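
If the only reason for BeautifulSoup is the indented output, lxml's own serializer may be enough. The sketch below swaps in a different technique from the answer above: it assumes whitespace-only text nodes can be discarded, which is what remove_blank_text=True does and what lets pretty_print re-indent the tree.

```python
from lxml import etree

xml = u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>
"""

# Drop whitespace-only text nodes at parse time so that
# pretty_print is free to insert its own indentation.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xml, parser)

# Remove every question whose answer text is exactly "-66".
for question in root.xpath('/questionaire/question[answer/text()="-66"]'):
    question.getparent().remove(question)

# encoding='unicode' returns a text string instead of bytes.
pretty = etree.tostring(root, pretty_print=True, encoding='unicode')
print(pretty)
```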

Answered 2025-04-17 by Python大师
