pythonwhois库对活动域名不返回任何内容

2024-05-16 09:01:55 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图使用pythonwhois库收集一些网站的whois记录。在

问题是我对一些网站一无所获,比如国家卫生部这是一个活动域名!在

w = whois.whois("nih.gov")
print w
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None}

我不明白问题出在哪里,我应该用哪个库来覆盖所有情况?在


Tags: namenonedate网站status记录whoisactive
1条回答
网友
1楼 · 发布于 2024-05-16 09:01:55

这里有一些代码可以完成这个任务。在

import sys
import socket
from datetime import datetime as dt
import time

def whois(ip):

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("whois.arin.net", 43))
    s.send(('n ' + ip + '\r\n').encode())

    response = b""

    # setting time limit in secondsmd
    startTime = time.mktime(dt.now().timetuple())
    timeLimit = 3
    while True:
        elapsedTime = time.mktime(dt.now().timetuple()) - startTime
        data = s.recv(4096)
        response += data
        if (not data) or (elapsedTime >= timeLimit):
            break
    s.close()

    print(response.decode())

def main():
    domain = sys.argv[1];
    ip = socket.gethostbyname(domain);
    whois(ip)

main()

例如:

^{pr2}$

特别是为了www.nih.gov网站我们得到:

c:\Temp>py test.py www.nih.gov

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#


#
# The following results may also be obtained via:
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2
#

NetRange:       23.20.0.0 - 23.23.255.255
CIDR:           23.20.0.0/14
NetName:        AMAZON-EC2-USEAST-10
NetHandle:      NET-23-20-0-0-1
Parent:         NET23 (NET-23-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       AS16509
Organization:   Amazon.com, Inc. (AMAZO-4)
RegDate:        2011-09-19
Updated:        2014-09-03
Comment:        The activity you have detected originates from a dynamic hosting environment.
Comment:        For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse
Comment:        For more information regarding EC2 see:
Comment:        http://ec2.amazonaws.com/
Comment:        All reports MUST include:
Comment:        * src IP
Comment:        * dest IP (your IP)
Comment:        * dest port
Comment:        * Accurate date/timestamp and timezone of activity
Comment:        * Intensity/frequency (short log extracts)
Comment:        * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time.
Ref:            https://whois.arin.net/rest/net/NET-23-20-0-0-1


OrgName:        Amazon.com, Inc.
OrgId:          AMAZO-4
Address:        Amazon Web Services, Inc.
Address:        P.O. Box 81226
City:           Seattle
StateProv:      WA
PostalCode:     98108-1226
Country:        US
RegDate:        2005-09-29
Updated:        2017-01-28
Comment:        For details of this service please see
Comment:        http://ec2.amazonaws.com/
Ref:            https://whois.arin.net/rest/org/AMAZO-4


OrgAbuseHandle: AEA8-ARIN
OrgAbuseName:   Amazon EC2 Abuse
OrgAbusePhone:  +1-206-266-4064
OrgAbuseEmail:  abuse@amazonaws.com
OrgAbuseRef:    https://whois.arin.net/rest/poc/AEA8-ARIN

OrgTechHandle: ANO24-ARIN
OrgTechName:   Amazon EC2 Network Operations
OrgTechPhone:  +1-206-266-4064
OrgTechEmail:  amzn-noc-contact@amazon.com
OrgTechRef:    https://whois.arin.net/rest/poc/ANO24-ARIN

OrgNOCHandle: AANO1-ARIN
OrgNOCName:   Amazon AWS Network Operations
OrgNOCPhone:  +1-206-266-4064
OrgNOCEmail:  amzn-noc-contact@amazon.com
OrgNOCRef:    https://whois.arin.net/rest/poc/AANO1-ARIN


#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#
# If you see inaccuracies in the results, please report at
# https://www.arin.net/public/whoisinaccuracy/index.xhtml
#

不同的选项

还有一个选择。在

这段代码在脚本的文件夹中创建一个文件,其中包含来自不同服务的whois请求的HTML。你可以修改它以适应你的需要,我刚刚写了基础。在

import urllib.request
import tempfile
import io
from bs4 import BeautifulSoup
import sys

def writeFile(text):
    with io.open('whoisData.txt', "w", encoding="utf-8") as f:
        f.write(text)
    f.close()

def readHTML(domain):
    url = 'https://www.whois.com/whois/' + domain
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html)

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()    # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    writeFile(text)

def main():
    domain = sys.argv[1]
    readHTML(domain)

main()

引用了here(关于解析HTMLs)。在

相关问题 更多 >