Getting a page's <title> in Python

5 votes
4 answers
5778 views
Asked 2025-04-15 15:35

I want to get the title of a web page that I open with urllib2. What is a good way to do this? I need to parse the page's HTML and find what I am looking for (for now only the title tag, but possibly other things later).</p>
<p>Is there a good parsing library I can use for this?</p>
</div>
<div class="tags-section"> <a target="_blank" href="/tags/urllib2" class="tag">urllib2</a> <a target="_blank" href="/tags/html%E8%A7%A3%E6%9E%90" class="tag">html-parsing</a> <a target="_blank" href="/tags/%E7%BD%91%E9%A1%B5%E8%A7%A3%E6%9E%90" class="tag">web-page-parsing</a> <a target="_blank" href="/tags/%E6%A0%87%E9%A2%98%E6%8F%90%E5%8F%96" class="tag">title-extraction</a> </div>
</div> </div>
<!-- Answers section -->
<div class="card">
<div class="card-header"> <div class="answers-header"> <h2 class="answers-title">4 Answers</h2> </div> </div>
<div class="answer-item"> <div class="answer-wrapper"> <div class="answer-voting"> <div class="vote-count">0</div> </div> <div class="answer-content"> <p>Use <a href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow noreferrer">Beautiful Soup</a>.</p>
<pre><code>import urllib2
from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import path (Python 2)

html = urllib2.urlopen("...").read()  # download the raw HTML
soup = BeautifulSoup(html)            # parse it into a tree
print soup.title.string               # text inside the title tag</code></pre>
</div> </div> <div class="answer-footer"> <div class="answer-author"> Answered 2025-04-15 by <a href="#" class="author-name">Python大师</a> </div> </div> </div>
<div class="answer-item"> <div class="answer-wrapper"> <div class="answer-voting"> <div class="vote-count">5</div> </div> <div class="answer-content"> <p>Try <a href="http://www.crummy.com/software/BeautifulSoup/" rel="noreferrer">Beautiful Soup</a>:</p>
<pre><code>import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
title = soup.html.head.title  # walk down the tree to the title tag
print title.contents          # list of the tag's children (here, its text)</code></pre>
</div> </div> <div class="answer-footer"> <div class="answer-author"> Answered 2025-04-15 by <a href="#" class="author-name">Python大师</a> </div> </div> </div>
<div class="answer-item"> <div class="answer-wrapper"> <div class="answer-voting"> <div class="vote-count">9</div> </div> <div class="answer-content"> <p>Yes, I recommend <a href="http://www.crummy.com/software/BeautifulSoup/" rel="noreferrer">BeautifulSoup</a>.</p>
<p>To get the title of the page, you can simply do this:</p>
<pre><code>soup = BeautifulSoup(html)      # html is the page source fetched earlier
myTitle = soup.html.head.title  # the title Tag object</code></pre>
<p>or</p>
<pre><code>myTitle = soup('title')  # shorthand for soup.findAll('title'); returns a list of tags</code></pre>
<p>This comes from the <a href="http://www.crummy.com/software/BeautifulSoup/documentation.html" rel="noreferrer">official documentation</a>.</p>
<p>It is very robust and can handle all kinds of messy HTML.</p>
</div> </div> <div class="answer-footer"> <div class="answer-author"> Answered 2025-04-15 by <a href="#" class="author-name">Python大师</a> </div> </div> </div>
</div>
</main>
</div>
<footer class="footer"> <div class="footer-bottom"> <p>© 2013~2025 Python问答社区 | 京ICP备07000037号</p> </div> </footer>
</body> </html>
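The answers above use Python 2 (urllib2) with Beautiful Soup 3. For readers on Python 3, here is a rough sketch of the same idea using only the standard library's html.parser module. This is a swapped-in technique, not the Beautiful Soup approach from the answers, and the names TitleParser and extract_title are hypothetical, introduced here purely for illustration. It assumes the HTML has already been fetched and decoded to a str.

```python
# Python 3 sketch using only the standard library (an assumption of this
# example; the answers in the thread use Beautiful Soup instead).
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first title element it encounters."""

    def __init__(self):
        super().__init__()
        self._in_title = False  # True while between the title start and end tags
        self._done = False      # True once the first title has been closed
        self._parts = []        # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Hypothetical helper: return the page title, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    sample = "<html><head><title>Example Domain</title></head><body></body></html>"
    print(extract_title(sample))
```

In Python 3 the page source would typically come from urllib.request.urlopen, the successor to urllib2.urlopen, followed by decoding the bytes to text; that step is left out here so the sketch runs without network access. For messy real-world HTML, a dedicated parser such as Beautiful Soup, as the answers suggest, is likely the more forgiving choice.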