Unicode object error when parsing XML with BeautifulSoup

Asked 2025-04-18 04:03

When I use BeautifulSoup to read the content of the 'name' tag in the following XML output, I get this error:

AttributeError: 'unicode' object has no attribute 'get_text'

XML output:

<show>
  <stud>
    <__readonly__>
      <TABLE_stud>
        <ROW_stud>
          <name>rice</name>
          <dept>chem</dept>
          .
          .
          .
        </ROW_stud>
      </TABLE_stud>
    </__readonly__>
  </stud>
</show>

However, accessing other tags, such as 'dept', works without any problem.

stud_info = output_xml.find_all('row_stud')
for eachStud in range(len(stud_info)):

    print stud_info[eachStud].dept.get_text()   #Gives 'chem'
    print stud_info[eachStud].name.get_text()   #---Unicode Error---

Can any Python or BeautifulSoup experts help me fix this? (I know BeautifulSoup isn't the best choice for parsing XML, but I have to use it for now.)


1 Answer

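Tag.name is an attribute that holds the name of the tag itself; here its value is row_stud. It is a plain string, not a Tag, which is why calling get_text() on it raises the AttributeError.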
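A minimal Python 3 reproduction, assuming bs4 is installed. It uses the standard-library html.parser backend (an assumption; the question does not say which parser was used), which lowercases tag names just as the asker's setup apparently does, and a trimmed copy of the XML without the __readonly__ wrapper. Under Python 3, Tag.name is a str, so the same mistake would report 'str' instead of 'unicode'.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's XML (the __readonly__ wrapper is omitted).
source = (
    "<show><stud><TABLE_stud><ROW_stud>"
    "<name>rice</name><dept>chem</dept>"
    "</ROW_stud></TABLE_stud></stud></show>"
)

# html.parser lowercases tag names, so ROW_stud becomes row_stud.
soup = BeautifulSoup(source, "html.parser")
row = soup.find("row_stud")

# Tag has no attribute called "dept", so row.dept falls back to row.find("dept").
dept_tag = row.dept
# Tag does have an attribute called "name": the tag's own name, a plain string.
tag_name = row.name

print(type(dept_tag).__name__, dept_tag.get_text())  # Tag chem
print(type(tag_name).__name__, tag_name)             # str row_stud
```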

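Accessing a child tag through an attribute is a shortcut for .find(attributename), but it only works when the Tag API has no attribute of the same name. When there is one, as with name, use .find() instead: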

print(stud_info[eachStud].find('name').get_text())
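A small sketch of the fix under the same assumptions (Python 3, bs4 with html.parser, a one-row fragment made up for illustration). It also checks that, for a tag name that does not collide with a Tag attribute, the attribute shortcut and the explicit .find() call return the same child tag.

```python
from bs4 import BeautifulSoup

source = "<ROW_stud><name>rice</name><dept>chem</dept></ROW_stud>"
soup = BeautifulSoup(source, "html.parser")
row = soup.find("row_stud")

# Explicit search: always looks for a child tag, even when the tag
# is called "name" and collides with the Tag.name attribute.
name_tag = row.find("name")
print(name_tag.get_text())  # rice

# For non-colliding names, .dept is shorthand for .find("dept").
same_child = row.dept is row.find("dept")
print(same_child)  # True
```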

You can also iterate over the stud_info result list directly; there's no need for range() here:

stud_info = output_xml.find_all('row_stud')
for eachStud in stud_info:
    print(eachStud.dept.get_text())
    print(eachStud.find('name').get_text())
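The same loop as a self-contained Python 3 sketch, assuming bs4 with html.parser. The second row ("lee", "phys") is hypothetical sample data, since the question's XML elides the remaining rows; collecting tuples is just one way to consume the results.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample; only the first row comes from the question.
source = (
    "<TABLE_stud>"
    "<ROW_stud><name>rice</name><dept>chem</dept></ROW_stud>"
    "<ROW_stud><name>lee</name><dept>phys</dept></ROW_stud>"
    "</TABLE_stud>"
)

soup = BeautifulSoup(source, "html.parser")

students = []
for row in soup.find_all("row_stud"):  # iterate the result list directly
    # find("name") for the colliding tag, the .dept shortcut for the other one
    students.append((row.find("name").get_text(), row.dept.get_text()))

for name, dept in students:
    print(name, dept)
```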

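I notice you're searching for row_stud in lowercase. If you're parsing XML with BeautifulSoup, make sure lxml is installed and tell BeautifulSoup that it is handling XML, so it won't lowercase your tag names the way it does for HTML. Tag matching is then case-sensitive, so your searches must use the original case, e.g. find_all('ROW_stud'):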

soup = BeautifulSoup(source, 'xml')
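A sketch of the XML route, assuming both bs4 and lxml are installed (the 'xml' feature needs lxml). With the XML parser, tag names keep their original case, so the lowercase search from the question finds nothing and the search has to use ROW_stud.

```python
from bs4 import BeautifulSoup  # the "xml" feature also requires lxml

# The question's XML with the elided rows removed.
source = (
    "<show><stud><__readonly__><TABLE_stud><ROW_stud>"
    "<name>rice</name><dept>chem</dept>"
    "</ROW_stud></TABLE_stud></__readonly__></stud></show>"
)

soup = BeautifulSoup(source, "xml")

# The XML parser keeps the original case, so the lowercase search finds nothing...
lower_hits = soup.find_all("row_stud")
# ...and the search must use the exact name from the document.
rows = soup.find_all("ROW_stud")

for row in rows:
    print(row.find("name").get_text(), row.dept.get_text())  # rice chem
```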

Answered 2025-04-18 by Python大师
