Using Python to parse HTML code stored in a txt file

Published 2024-04-25 06:10:31

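I want to use a text file that contains my HTML markup as the source, instead of fetching the actual site (sauce = urllib.request.urlopen('https://sitex.com').read()). The markup I would otherwise get from the site is in that file.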

import urllib.request
import bs4
import requests

with open('words.txt','r') as f:
    soup =BeautifulSoup (r.text, "html.parser")
    for line in f:
        print(soup.find_all("a"))

2 Answers

Like this?

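from bs4 import BeautifulSoup

with open('words.txt', 'r') as f:
    # Parse the file contents directly instead of a downloaded response
    soup = BeautifulSoup(f.read(), "html.parser")
    for a in soup.find_all("a"):
        pass  # do something with each <a> tag here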

Contents of words.txt:

<html>
<!-- Text between angle brackets is an HTML tag and is not displayed.
Most tags, such as the HTML and /HTML tags that surround the contents of
a page, come in pairs; some tags, like HR, for a horizontal rule, stand
alone. Comments, such as the text you're reading, are not displayed when
the Web page is shown. The information between the HEAD and /HEAD tags is
not displayed. The information between the BODY and /BODY tags is displayed. -->
<head>
<title>Enter a title, displayed at the top of the window.</title>
</head>
<!-- The information between the BODY and /BODY tags is displayed. -->
<body>
<a href="">Visit XYZ.com!</a>
<h1>Enter the main heading, usually the same as the title.</h1>
<p>Be <b>bold</b> in stating your key points. Put them in a list: </p>
<a href="">Visit W3Schools.com!</a>
<ul>
<li>The first item in your list</li>
<li>The second item; <i>italicize</i> key words</li>
</ul>
<p>Improve your image by including an image. </p>
<a href="">Visit ABC.com!</a>
<p><img src="http://www.mygifs.com/CoverImage.gif" alt="A Great HTML Resource"></p>
<p>Add a link to your favorite <a href="https://www.dummies.com/">Web site</a>.
Break up your page with a horizontal rule or two. </p>
<hr>
<p>Finally, link to <a href="">another page</a> in your own Web site.</p>
<!-- And add a copyright notice. -->
<p>&#169; Wiley Publishing, 2011</p>
</body>
</html>

Then:

from bs4 import BeautifulSoup

with open('words.txt', 'r') as f:
    soup = BeautifulSoup(f.read(), "html.parser")
    # Print the visible text of every <a> tag in the file
    for link in soup.find_all('a'):
        print(link.text)

Output:

Visit XYZ.com!
Visit W3Schools.com!
Visit ABC.com!
Web site
another page
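
For comparison, here is a minimal sketch of the same link extraction using only the standard library's html.parser module, in case bs4 is not installed. This swaps in a different technique from the answer above. The names LinkTextParser, link_texts and link_texts_from_file are hypothetical helpers made up for illustration, and the file is assumed to be UTF-8 encoded.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the visible text of every <a> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished link texts, in document order
        self._in_link = False  # True while the parser is inside an <a> element
        self._parts = []       # text fragments of the link being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._parts = []

    def handle_data(self, data):
        if self._in_link:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append("".join(self._parts))
            self._in_link = False


def link_texts(markup):
    """Return the text of each <a> element found in an HTML string."""
    parser = LinkTextParser()
    parser.feed(markup)
    parser.close()
    return parser.links


def link_texts_from_file(path):
    """Read HTML markup from a text file and return its link texts."""
    # Assumption: the file is UTF-8; adjust the encoding if yours differs.
    with open(path, "r", encoding="utf-8") as f:
        return link_texts(f.read())
```

Calling link_texts_from_file('words.txt') on the sample file should give the same five link texts as the output above, returned as a list rather than printed.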
