Processing RSS/RDF with xml.dom.minidom

2 votes
1 answer
659 views
Asked 2025-04-15 21:04

I'm trying to process an RSS feed with Python. Here is a sample:

...
  <item rdf:about="http://weblist.me/">
    <title>WebList - The Place To Find The Best List On The Web</title>
    <dc:date>2009-12-24T17:46:14Z</dc:date>
    <link>http://weblist.me/</link>
    ...
  </item>
  <item rdf:about="http://thumboo.com/">
    <title>Thumboo! Free Website Thumbnails and PHP Script to Generate Web Screenshots</title>
    <dc:date>2006-10-24T18:11:32Z</dc:date>
    <link>http://thumboo.com/</link>
...

The relevant code is:

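import xml.dom.minidom

def getText(nodelist):
    # Concatenate the data of every text node found directly in nodelist.
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc

dom = xml.dom.minidom.parse(file)  # file: path or file object for the feed
items = dom.getElementsByTagName("item")
for i in items:
    title = i.getElementsByTagName("title")
    print(getText(title))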

I expected this code to print every title, but the output is almost entirely blank. I must be doing something wrong, but I can't tell what.

1 Answer

4

You are passing the title element nodes to getText, but those nodes are not of type node.TEXT_NODE, so the type check never matches and nothing is collected. The text lives in the children of each element, so getText needs to iterate over every child node of each node it receives.

def getTextSingle(node):
    # Collect the text from the direct text-node children of one element.
    parts = [child.data for child in node.childNodes if child.nodeType == node.TEXT_NODE]
    return u"".join(parts)

def getText(nodelist):
    # Join the text of every element in the node list.
    return u"".join(getTextSingle(node) for node in nodelist)

Better still, call node.normalize() before calling getTextSingle. This merges adjacent child nodes of type node.TEXT_NODE into a single node.TEXT_NODE.

Answered 2025-04-15 by Python大师
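As a minimal end-to-end sketch of the approach described in the answer, the example below parses a small feed from a string and collects each item's title. The SAMPLE feed is a hypothetical stand-in built from the excerpt in the question (the namespace declarations are assumed, since the original feed's header is elided), and the names get_text_single and get_titles are illustrative, not part of minidom.

```python
import xml.dom.minidom

# Hypothetical sample feed modeled on the excerpt in the question.
# The xmlns declarations are assumptions; the real feed header was not shown.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://weblist.me/">
    <title>WebList - The Place To Find The Best List On The Web</title>
    <dc:date>2009-12-24T17:46:14Z</dc:date>
    <link>http://weblist.me/</link>
  </item>
  <item rdf:about="http://thumboo.com/">
    <title>Thumboo! Free Website Thumbnails and PHP Script to Generate Web Screenshots</title>
    <dc:date>2006-10-24T18:11:32Z</dc:date>
    <link>http://thumboo.com/</link>
  </item>
</rdf:RDF>
"""


def get_text_single(node):
    """Return the text held in the direct text-node children of one element."""
    # Merge adjacent text nodes first, as the answer suggests.
    node.normalize()
    parts = [
        child.data
        for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ]
    return "".join(parts)


def get_titles(xml_text):
    """Parse the feed text and return the title of every <item>, in order."""
    dom = xml.dom.minidom.parseString(xml_text)
    titles = []
    for item in dom.getElementsByTagName("item"):
        for title in item.getElementsByTagName("title"):
            titles.append(get_text_single(title))
    return titles


if __name__ == "__main__":
    for title in get_titles(SAMPLE):
        print(title)
```

To read from a file instead of a string, xml.dom.minidom.parse can be swapped in for parseString, as in the question's code.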