Python get-links script needs a wildcard search

from bs4 import BeautifulSoup
import requests

url = ""

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find_all('a')

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

3 Answers

User

#1 · Edited 2024-04-20 05:26:51

Regarding your second question, whether there is a way to export to Excel: I have been using the Python module XlsxWriter.

import xlsxwriter

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('Expenses01.xlsx')
worksheet = workbook.add_worksheet()

# Some data we want to write to the worksheet.
expenses = (
    ['Rent', 1000],
    ['Gas',   100],
    ['Food',  300],
    ['Gym',    50],
)

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for item, cost in (expenses):
    worksheet.write(row, col,     item)
    worksheet.write(row, col + 1, cost)
    row += 1

# Write a total using a formula.
worksheet.write(row, 0, 'Total')
worksheet.write(row, 1, '=SUM(B1:B4)')

workbook.close()

XlsxWriter lets your code follow basic Excel conventions. I am new to Python, and it was easy to set up and worked on the first try.
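To connect this back to the link scraper, here is a minimal sketch of writing a list of hrefs to a worksheet instead of the expenses data. The function names `links_to_rows` and `write_links`, the column headers, and the output file name are assumptions made for illustration, not part of the original question. The sample list stands in for whatever `tag.get('href')` returns, including `None` for anchors without an href.

```python
def links_to_rows(hrefs):
    """Drop empty hrefs and pair each remaining one with a 1-based index."""
    cleaned = [h for h in hrefs if h]
    return list(enumerate(cleaned, start=1))


def write_links(path, rows):
    """Write (index, href) rows to an .xlsx file with a header row."""
    import xlsxwriter  # third-party package: pip install XlsxWriter

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()

    # Header row (hypothetical column names).
    worksheet.write(0, 0, 'No.')
    worksheet.write(0, 1, 'Link')

    # Data starts on the second row; rows and columns are zero indexed.
    for row, (index, href) in enumerate(rows, start=1):
        worksheet.write(row, 0, index)
        # write_string() stores the href as plain text; by default write()
        # would turn URL-like strings into clickable hyperlinks.
        worksheet.write_string(row, 1, href)

    workbook.close()


# Sample hrefs standing in for the scraper's output.
sample = ["https://example.com/a", None, "/relative/path", ""]
rows = links_to_rows(sample)
print(rows)
# To create the file, call: write_links('links.xlsx', rows)
```

If clickable links are preferred in the spreadsheet, swapping `write_string` for `write` should be enough for absolute URLs.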

User

#2 · Edited 2024-04-20 05:26:51

Here is an updated version of your code that gets all the https hrefs from the page:

from bs4 import BeautifulSoup
import requests

url = "https://www.google.com"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find_all('a')

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    href = tag.get('href')
    # Skip <a> tags without an href; tag.get() returns None for them.
    if href and href.startswith('https'):
        print(href)

If you want hrefs that start with something other than https, change the second-to-last line :)

Reference: https://www.tutorialspoint.com/python/string_startswith.htm
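If several prefixes should be accepted at once, `str.startswith()` also takes a tuple of prefixes. The sketch below assumes the hrefs have already been collected into a list; the helper name `filter_by_prefix` and the sample URLs are made up for illustration.

```python
def filter_by_prefix(hrefs, prefixes):
    """Return the hrefs that start with any of the given prefixes.

    `prefixes` is an iterable of strings, e.g. ["http://", "https://"].
    """
    prefixes = tuple(prefixes)  # str.startswith() accepts a tuple of prefixes
    # `h and ...` skips None and empty strings from tags without an href.
    return [h for h in hrefs if h and h.startswith(prefixes)]


# Sample hrefs standing in for the scraper's output.
sample = [
    "https://example.com/a",
    "http://example.com/b",
    None,
    "mailto:me@example.com",
    "/relative",
]
print(filter_by_prefix(sample, ["http://", "https://"]))
# → ['https://example.com/a', 'http://example.com/b']
```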

User

#3 · Edited 2024-04-20 05:26:51

You can use startswith():

for tag in tags:
    href = tag.get('href')
    # tag.get() returns None when the <a> tag has no href, so check it first.
    if href and href.startswith('pre'):
        print(href)
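Since the question asks for a wildcard search, one possible alternative to `startswith()` is shell-style pattern matching with the standard library's `fnmatch` module, where `*` matches any run of characters. This is only a sketch under the assumption that the hrefs are already in a list; `filter_by_wildcard` and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_by_wildcard(hrefs, pattern):
    """Return the hrefs that match a shell-style wildcard pattern.

    fnmatchcase() compares case-sensitively on every platform.
    """
    # `h and ...` skips None and empty strings from tags without an href.
    return [h for h in hrefs if h and fnmatchcase(h, pattern)]


# Sample hrefs standing in for the scraper's output.
sample = [
    "https://example.com/report.pdf",
    "https://example.com/index.html",
    "/files/notes.pdf",
    None,
]
print(filter_by_wildcard(sample, "*.pdf"))
# → ['https://example.com/report.pdf', '/files/notes.pdf']
print(filter_by_wildcard(sample, "https://*"))
# → ['https://example.com/report.pdf', 'https://example.com/index.html']
```

For patterns that wildcards cannot express, the `re` module would be the next step up.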
