我有下面的html代码(或者至少我认为是html),我正在Python上与BeautifulSoup一起工作
我已经正确地使用Beautiful soup解析了html。我接下来要做的是检索与包含特定数据标签的'div'相关联的内容(例如,在代码的底部,data label=“Relation”)。特别地,我想获得一个字典,它以数据标签的文本作为键,即在我的示例“Relation”中,以同一个“div”的内容作为值,即在我的示例中href“http://documenti.camera.it/apps/commonServices/getDocumento.ashx?sezione=bollettini=comunicato=17=2016=06=14=03=data.20160614.com03.bollettino.sede00020.tit00010.int00010=data.20160614.com03.bollettino.sede00020.tit00010.int00010#data.20160614.com03.bollettino.sede00020.tit00010.int00010”
我尝试过几种方法,但据我所知,数据标签似乎不是一个有效的属性,所以我不知道如何处理这个问题
(请注意,这只是一个例子,但我将不得不对成千上万,如果不是数百万,这些网页,与此类似的结构做同样的事情)
感谢您的帮助。谢谢你
<div id="directs">
<label class="c1"><a data-comment="A human-readable name for the subject." data-label="label" href="http://www.w3.org/2000/01/rdf-schema#label">
rdfs:<span>label</span>
</a></label>
<div class="c2 value ">
<div class="toMultiLine ">
<div class="fixed">
<span class="dType">xsd:string</span>
intervento di Fabrizio CICCHITTO
</div>
</div>
</div>
<label class="c1"><a data-comment="A name given to the resource." data-label="Title" href="http://purl.org/dc/elements/1.1/title">
dc:<span>title</span>
</a></label>
<div class="c2 value ">
<div class="toMultiLine ">
<div class="fixed">
intervento di Fabrizio CICCHITTO
</div>
</div>
</div>
<label class="c1"><a data-comment="" data-label="" href="http://lod.xdams.org/ontologies/ods/modified">
ods:<span>modified</span>
</a></label>
<div class="c2 value ">
<div class="toMultiLine ">
<div class="fixed">
<span class="dType">xsd:dateTime</span>
2016-07-05T12:26:02Z
</div>
</div>
</div>
<label class="c1"><a data-comment="The subject is an instance of a class." data-label="type" href="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
rdf:<span>type</span>
</a></label>
<div class="c2 value">
<div class="toOneLine">
<a class=" isLocal" href="http://dati.camera.it/ocd/intervento" title="<http://dati.camera.it/ocd/intervento>">
ocd:intervento
</a>
</div>
</div>
<label class="c1"><a data-comment="propriet generica utilizzata per puntare alla risorsa deputato in vari punti dell'ontologia" data-label="rierimento a deputato" href="http://dati.camera.it/ocd/rif_deputato">
ocd:<span>rif_deputato</span>
</a></label>
<div class="c2 value">
<div class="toOneLine">
<a class=" isLocal" href="http://dati.camera.it/ocd/deputato.rdf/d15080_17" title="<http://dati.camera.it/ocd/deputato.rdf/d15080_17>">
http://dati.camera.it/ocd/deputato.rdf/d15080_17
</a>
</div>
</div>
<label class="c1"><a data-comment="A related resource." data-label="Relation" href="http://purl.org/dc/elements/1.1/relation">
dc:<span>relation</span>
</a></label>
<div class="c2 value">
<div class="toOneLine">
<a class=" " href="http://documenti.camera.it/apps/commonServices/getDocumento.ashx?sezione=bollettini=comunicato=17=2016=06=14=03=data.20160614.com03.bollettino.sede00020.tit00010.int00010=data.20160614.com03.bollettino.sede00020.tit00010.int00010#data.20160614.com03.bollettino.sede00020.tit00010.int00010"
target="_blank" title="<http://documenti.camera.it/apps/commonServices/getDocumento.ashx?sezione=bollettini=comunicato=17=2016=06=14=03=data.20160614.com03.bollettino.sede00020.tit00010.int00010=data.20160614.com03.bollettino.sede00020.tit00010.int00010#data.20160614.com03.bollettino.sede00020.tit00010.int00010>">
http://documenti.camera.it/apps/commonServices/getDocumento.ashx?sezione=bollettini=comunicato=17=2016=06=14=03=data.20160614.com03.bollettino.sede00020.tit00010.int00010=data.20160614.com03.bollettino.sede00020.tit00010.int00010#data.20160614.com03.bollettino.sede00020.tit00010.int00010
</a>
</div>
</div>
</div>
您可以在一个过程中找到
data-labels
,在另一个过程中找到div
内容。然后,可以将结果压缩到一起以创建字典:输出:
相关问题 更多 >
编程相关推荐