Python BeautifulSoup HTML scraping question

Posted 2024-05-20 02:04:24

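I've been playing around with Python lately, trying to learn new things by combining pieces of code I've found into something I might eventually use. Today I almost finished this project, but when I print the links, each one comes out as: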

https://v3rmillion.net/showthread.php

Instead of that, I'd like to get the full URL, including the query string:

https://v3rmillion.net/showthread.php?tid=393794

import requests,os,urllib,sys, webbrowser, bs4

from bs4 import BeautifulSoup

def startup():
    os.system('cls')
    print('Discord To Profile')
    user = raw_input('Discord Tag: ')
    r = requests.get('https://www.google.ca/search?source=hp&q=' + user + ' site:v3rmillion.net')
    soup = BeautifulSoup(r.text, "html.parser")
    print soup.find('div',{'id':'resultStats'}).text
    content=r.content.decode('UTF-8','replace')

    #Attempting to scrape links, although I'd like the full length instead of just .php
    links=[]
    while '<h3 class="r">' in content:
        content=content.split('<h3 class="r">', 1)[1]
        split_content=content.split('</h3>', 1)
        link='http'+split_content[1].split(':http',1)[1].split('%',1)[0]
        links.append(link)
        content=split_content[1]
    for link in links[:5]:  # max number of links 5
        print(link)

startup()
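
A likely cause of the truncated link: in the search result markup the target URL is percent-encoded, so the "?" before tid arrives as "%3F", and .split('%',1)[0] discards everything from that point on. Below is a minimal sketch of decoding the link instead of cutting it, assuming Python 3 and assuming result links appear as hrefs of the form /url?q=<target>&... The sample markup and the names ResultLinkParser and extract_links are hypothetical, and the real page markup changes often and may not match.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class ResultLinkParser(HTMLParser):
    """Collect target URLs from redirect-style hrefs such as /url?q=<target>&sa=U."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("/url?"):
            return
        # parse_qs percent-decodes values, so %3F becomes "?" and %3D becomes "="
        targets = parse_qs(urlsplit(href).query).get("q")
        if targets:
            self.links.append(targets[0])


def extract_links(html, limit=5):
    """Return at most `limit` decoded target URLs found in `html`."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links[:limit]


# Hypothetical markup imitating the <h3 class="r"> blocks the original code splits on.
sample = (
    '<h3 class="r"><a href="/url?q=https://v3rmillion.net/showthread.php'
    '%3Ftid%3D393794&amp;sa=U">Thread</a></h3>'
)
print(extract_links(sample))
```

Reading the q parameter with parse_qs keeps the whole query string of the target, whereas splitting on "%" stops at the first encoded character. The same decoding would apply to hrefs collected with BeautifulSoup rather than the standard-library parser used here.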

0 answers

No answers yet
