Extracting text between two P elements


I'm trying to extract the data between two elements, "Executives" and "Analysts", but I don't know how to proceed. My HTML is:

<pre><code><div class="content_part hid" id="article_participants">
Wabash National Corporation (NYSE:<a title="" href="http://seekingalpha.com/symbol/wnc">WNC</a>)Q4 2014 Earnings Conference CallFebruary 04, 2015 10:00 AM ET Executives Mike Pettit - Vice President of Finance and Investor Relations Richard Giromini - President and Chief Executive Officer Jeffery Taylor - Senior Vice President and Chief Financial Officer Analysts
</code></pre>

I want to do this for a large batch of files. My code so far is:

<pre><code>from bs4 import BeautifulSoup
import requests
import textwrap
import os
from lxml import html
import csv

directory ='C:/Research syntheses - Meta analysis/SeekingAlpha'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory,filename)
        with open(fname, 'r') as f:
            page=f.read()
            soup = BeautifulSoup(f.read(),'html.parser')
            match = soup.find('div',class_='content_part hid', id='article_participants')
            print(match)
</code></pre>

I'm new to Python, so please bear with me. The output I'd like is:

<a href="https://i.stack.imgur.com/Tou7b.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/Tou7b.png" alt="Example output"/></a>

The title can be found in the following HTML:

<pre><code><div class="page_header_email_alerts" id="page_header">
<h1> Wabash National's (WNC) CEO Richard Giromini on Q4 2014 Results - Earnings Call Transcript </h1>
<div id="article_info">
<div class="article_info_pos"> Feb. 4, 2015 4:48 PM ET &nbsp;|&nbsp; About: <a title="Wabash National Corporation" href="/symbol/WNC" sasource="article_primary_about_trc">Wabash National Corporation (WNC)</a> by: SA Transcripts </div>
</code></pre>
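A minimal sketch of one possible approach, not a definitive solution. It assumes the marker words "Executives" and "Analysts" each appear once as whole words in the text of the `article_participants` div, and that the files are UTF-8 encoded. The helper names `text_between`, `participants_text`, `page_title` and `extract_from_directory` are made up for illustration. One thing worth noting about the code in the question: it calls `f.read()` twice, and the second call returns an empty string because the file has already been read to the end, so `soup.find(...)` has nothing to match and returns `None`.

```python
import os
import re


def text_between(text, start="Executives", end="Analysts"):
    """Return the text between two whole-word markers, or None if not found."""
    pattern = rf"\b{re.escape(start)}\b(.*?)\b{re.escape(end)}\b"
    match = re.search(pattern, text, re.S)
    return match.group(1).strip() if match else None


def participants_text(html):
    """Pull the participants div out of a page and return the text between the markers."""
    # Third-party dependency (pip install beautifulsoup4); imported here so that
    # text_between stays usable on plain strings without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", id="article_participants")
    if block is None:
        return None
    # Flatten the div to plain text, joining text nodes with single spaces.
    return text_between(block.get_text(" ", strip=True))


def page_title(html):
    """Return the text of the first <h1> on the page, or None."""
    from bs4 import BeautifulSoup

    h1 = BeautifulSoup(html, "html.parser").find("h1")
    return h1.get_text(strip=True) if h1 else None


def extract_from_directory(directory):
    """Map each .html filename in a directory to (title, participants text)."""
    results = {}
    for filename in os.listdir(directory):
        if filename.endswith(".html"):
            path = os.path.join(directory, filename)
            # Assumption: files are UTF-8; adjust the encoding if they are not.
            with open(path, encoding="utf-8") as f:
                html = f.read()  # read the file exactly once
            results[filename] = (page_title(html), participants_text(html))
    return results
```

Calling `extract_from_directory` with the folder from the question would give one `(title, participants)` pair per file, which could then be written out with `csv.writer`. If the real pages wrap each executive in its own tag, passing `"\n"` instead of `" "` to `get_text` and splitting on newlines may give one name per line, but that depends on markup not shown in the question.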


1 answer

Anonymous, 1 day ago
