Looking to identify Shopify/e-commerce websites

Posted 2024-05-21 08:20:35


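Hi, I'm building a web-scraping tool that parses websites and determines whether they are e-commerce sites (that is, whether they contain markers for Shopify, Magento, and so on). The URLs come from a CSV file. I want to look at the head of each page and find the keyword, which would be a piece of HTML or something similar.

Example from https://www.thecriticalslidesociety.com/

Right now I can view the head on its own, but when I try to search for the HTML string, I get nothing.

Can anyone tell me what to do next?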

import csv
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Browser-like User-Agent, sent with every request.
WEB_HEADER = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/63.0.3239.132 Safari/537.36"
}

# Keep both files open for the whole loop so results can be written per URL.
with open("URL.csv", "r", newline="") as csv_file, \
        open("Approval.csv", "w", newline="") as approval_file:
    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(approval_file, delimiter="_")

    next(csv_reader, None)  # skip the header row
    for line in csv_reader:
        my_url = line[0]

        # Fetch the page; the context manager closes the connection.
        with urlopen(Request(my_url, headers=WEB_HEADER)) as client:
            page_html = client.read()

        page_soup = BeautifulSoup(page_html, "html.parser")

        # find() looks up a tag by name, so this returns the <head> element.
        search = page_soup.find("head")

1 answer
User
#1 · Posted 2024-05-21 08:20:35
from bs4 import BeautifulSoup
import requests

url = "http://example.com/"
html = requests.get(url).text

soup = BeautifulSoup(html, "html.parser")
search = soup.find("head")

print(search)

This prints the <head> tag:

<head>
<title>Example Domain</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>
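
To go from printing the <head> tag to spotting platform markers, one option is a plain substring search over the markup text, since find() looks up tags by name rather than arbitrary strings. The following is a minimal sketch, not a definitive detector: the function name detect_platforms and the PLATFORM_MARKERS table are hypothetical, and the marker strings in it are illustrative assumptions that should be checked against real pages.

```python
# Hypothetical marker table: platform name -> substrings to look for.
# These strings are assumptions for illustration, not a verified fingerprint list.
PLATFORM_MARKERS = {
    "shopify": ["cdn.shopify.com", "shopify"],
    "magento": ["magento"],
}


def detect_platforms(text, markers=PLATFORM_MARKERS):
    """Return the sorted names of platforms whose markers occur in text."""
    haystack = text.lower()  # case-insensitive comparison
    return sorted(
        name
        for name, needles in markers.items()
        if any(needle.lower() in haystack for needle in needles)
    )


# Made-up sample markup, so the example runs without network access.
sample = """
<head>
<title>Demo store</title>
<link rel="stylesheet" href="https://cdn.shopify.com/s/files/theme.css"/>
</head>
"""

print(detect_platforms(sample))  # ['shopify']
print(detect_platforms("<head><title>Blog</title></head>"))  # []
```

Passing str(search) from the answer's snippet restricts the search to the <head> element, while passing html searches the whole page. A substring match is only a heuristic: a page that merely mentions a platform by name would also match.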
