Web scraping and Python data types

Posted 2024-05-23 21:01:18

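I have a web scraping script that uses BeautifulSoup4 and Python 3. I want to remove the $ sign from the price value in the result, convert it to a float, and perform some numeric operations on it. But the value comes back as text.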

import requests
from bs4 import BeautifulSoup
import time        # needed for time.ctime()
import threading   # needed for threading.Timer

def bitcoin_scheduler():
    url = "https://coinmarketcap.com/currencies/bitcoin/"
    r = requests.get(url)
    offline_data = r.content
    soup = BeautifulSoup(offline_data, 'html.parser')

    name_box = soup.find('small', attrs={'class': 'bold hidden-xs'})
    name = name_box.text.strip()

    price_box = soup.find('span', attrs={'class': 'text-large'})
    price = price_box.text.strip()   # still a string, including the $ sign

    print(time.ctime(), name, price)
    # Schedule the next run in 5 seconds
    threading.Timer(5.0, bitcoin_scheduler).start()

bitcoin_scheduler()


3 answers

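You can validate the value before converting it. The built-in str.isdigit() method returns True only for strings made up entirely of digits, so it works for integers but not for decimals such as "6962.29". You can define your own isdigit() that works for both: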

import requests
from bs4 import BeautifulSoup
import time
import threading

def isdigit(d):
    # True if d can be parsed as a float (covers integers and decimals)
    try:
        float(d)
        return True
    except ValueError:
        return False

def bitcoin_scheduler():
    url = "https://coinmarketcap.com/currencies/bitcoin/"
    r = requests.get(url)
    offline_data = r.content
    soup = BeautifulSoup(offline_data, 'html.parser')

    name_box = soup.find('small', attrs={'class': 'bold hidden-xs'})
    name = name_box.text.strip()

    price_box = soup.find('span', attrs={'class': 'text-large'})
    # Strip surrounding whitespace first, then the leading $
    price = price_box.text.strip().lstrip('$')
    if isdigit(price):
        price = float(price)
        # do your stuff with price
        print(time.ctime(), name, price)
        print(type(price))

    # Schedule the next run in 5 seconds
    threading.Timer(5.0, bitcoin_scheduler).start()

bitcoin_scheduler()
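
To see why the built-in check is not enough, here is a minimal sketch comparing str.isdigit() with a float-based check. The helper name is_number is hypothetical (it is not from the answer above), and the sample strings are made up for illustration:

```python
def is_number(text):
    """Return True if text can be converted to float (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

samples = ["6962", "6962.29", "$6962.29"]
for s in samples:
    # str.isdigit() rejects anything containing '.' or '$'
    print(s, s.isdigit(), is_number(s))
```

Only the first sample passes str.isdigit(); the float-based check also accepts the decimal, and both reject the string that still carries the $ sign.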


Use the replace() method, or the strip() method:

import requests
from bs4 import BeautifulSoup
import time        # needed for time.ctime()
import threading   # needed for threading.Timer

def bitcoin_scheduler():
    url = "https://coinmarketcap.com/currencies/bitcoin/"
    r = requests.get(url)
    offline_data = r.content
    soup = BeautifulSoup(offline_data, 'html.parser')

    name_box = soup.find('small', attrs={'class': 'bold hidden-xs'})
    name = name_box.text.strip()

    price_box = soup.find('span', attrs={'class': 'text-large'})
    price = price_box.text.strip()

    # replace() removes every '$' in the string; the result is still a str
    print(time.ctime(), name, price.replace('$', ''))
    threading.Timer(5.0, bitcoin_scheduler).start()

bitcoin_scheduler()
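
Note that the script above only removes the symbol when printing, so the value is still a string. A sketch of a conversion step follows; the helper name parse_price is hypothetical, and it assumes '$' is the currency symbol and ',' may appear as a thousands separator (the page's actual output is not shown here, so this is a guess about its format):

```python
def parse_price(text):
    """Convert a price string such as '$6,962.29' to a float.

    Hypothetical helper: assumes '$' as the currency symbol and ','
    as the thousands separator.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)   # raises ValueError if the text is not numeric

price = parse_price(" $6,962.29 ")
print(type(price), price + 2)
```

Letting float() raise ValueError on unexpected input is a deliberate choice in this sketch: a changed page layout then fails loudly instead of silently producing a wrong number.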

Here is a simple example:

temp = "$6962.29"
temp = temp.strip("$")  # Strips $ from both ends of the string
temp = float(temp)      # Converts to float
temp += 2               # Adding 2
print(temp)

The output should be 6964.29, because we added 2 to the number.
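
One caveat: strip("$") removes the $ only from the two ends of the string and stops at the first character that is not in its argument, so whitespace around the scraped text will block it. A short sketch, using a made-up sample string:

```python
raw = "  $6962.29\n"   # made-up sample with surrounding whitespace

# Unchanged: the whitespace sits outside the $, so nothing is stripped
print(repr(raw.strip("$")))

# Strip whitespace first, then the symbol
print(repr(raw.strip().strip("$")))

# Or strip all three characters in a single call
print(repr(raw.strip("$ \n")))
```

The first call returns the string unchanged, while the other two both yield '6962.29', which float() accepts.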
