Running a for loop in parallel with Python

Posted 2024-05-17 00:08:17


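I have a process that loops over a list of IP addresses and returns some information about each one. A simple for loop works fine; my problem is running this loop at scale, given Python's Global Interpreter Lock (GIL).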
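The lookups in this loop spend most of their time waiting on the network, and CPython releases the GIL while a thread is blocked on I/O, so threads can overlap those waits even though only one thread runs Python bytecode at a time. A minimal sketch, assuming `time.sleep` is a fair stand-in for one WHOIS round trip (`fake_lookup`, `run_sequential` and `run_threaded` are hypothetical names, not part of ipwhois):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lookup(ip: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one WHOIS round trip.

    time.sleep blocks like a network call and releases the GIL while waiting.
    """
    time.sleep(delay)
    return ip


def run_sequential(ips):
    """Plain for loop: total time is roughly len(ips) * delay."""
    start = time.perf_counter()
    results = [fake_lookup(ip) for ip in ips]
    return results, time.perf_counter() - start


def run_threaded(ips, workers=8):
    """Thread pool: waits overlap, so time is roughly ceil(len(ips) / workers) * delay."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order
        results = list(pool.map(fake_lookup, ips))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    ips = [f"192.0.2.{n}" for n in range(8)]
    _, t_seq = run_sequential(ips)
    _, t_thr = run_threaded(ips)
    print(f"sequential: {t_seq:.2f}s  threaded: {t_thr:.2f}s")
```

For CPU-bound Python code the picture is different: threads would not overlap, and separate processes (joblib's default backend) are the usual route.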

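My goal is to run this function in parallel and make full use of my 4 cores, so that running 100K of these doesn't take 24 hours, as it would with a normal for loop.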
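A back-of-the-envelope check, assuming the 24-hour figure is for 100K sequential lookups and that the work splits perfectly across workers (real speedups are lower): 24 h is 86,400 s, or about 0.86 s per lookup. The helpers below (`seconds_per_item` and `estimated_hours` are hypothetical names) turn that into ideal wall-clock time for a given number of concurrent workers.

```python
def seconds_per_item(total_hours: float, n_items: int) -> float:
    """Average seconds per item implied by a sequential run of total_hours (hypothetical helper)."""
    return total_hours * 3600 / n_items


def estimated_hours(n_items: int, sec_per_item: float, workers: int) -> float:
    """Ideal wall-clock hours if items are spread evenly over `workers` (hypothetical helper)."""
    return n_items * sec_per_item / workers / 3600


if __name__ == "__main__":
    per_item = seconds_per_item(24, 100_000)  # about 0.864 s per lookup
    for workers in (1, 4, 32):
        print(workers, "workers:", round(estimated_hours(100_000, per_item, workers), 2), "hours")
```

Even perfect scaling over 4 workers leaves about six hours. Because each lookup is mostly network wait rather than CPU work, the useful number of concurrent lookups for this job is often limited by the remote servers' rate limits rather than by the core count.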

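After reading other answers, in particular How do I parallelize a simple Python loop?, I decided to use joblib. When I ran 10 records through it (the example below), it took 10 minutes. That doesn't seem right; I know I'm doing something wrong or not understanding something. Any help is much appreciated!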
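In current joblib versions, `delayed(f)(*args)` does not call `f`; it only packages the call as a `(function, args, kwargs)` tuple, and `Parallel` then executes those tuples on its workers. Whatever arguments appear inside `delayed(...)(...)` are exactly what each task receives. The sketch below mimics that with hypothetical helpers (`my_delayed`, `run_tasks`) and runs the tasks one after another, so it only illustrates the call pattern from the linked question, not joblib's scheduling.

```python
from typing import Any, Callable, Iterable, Tuple

Task = Tuple[Callable[..., Any], tuple, dict]


def my_delayed(func: Callable[..., Any]) -> Callable[..., Task]:
    """Hypothetical mimic of joblib.delayed: capture the call, don't run it."""
    def capture(*args: Any, **kwargs: Any) -> Task:
        return func, args, kwargs
    return capture


def run_tasks(tasks: Iterable[Task]) -> list:
    """Hypothetical stand-in for Parallel(...)(tasks), executed sequentially."""
    return [func(*args, **kwargs) for func, args, kwargs in tasks]


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    items = [1, 2, 3, 4]
    # One deferred call per element, the shape used in the linked answer
    tasks = [my_delayed(square)(x) for x in items]
    print(tasks[0][1])       # (1,)  arguments captured for the first task
    print(run_tasks(tasks))  # [1, 4, 9, 16]
```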

import pandas as pd
from ipwhois import IPWhois
from joblib import Parallel, delayed
import multiprocessing

num_core = multiprocessing.cpu_count()

iplookup = ['174.192.22.197',
            '70.197.71.201',
            '174.195.146.248',
            '70.197.15.130',
            '174.208.14.133',
            '174.238.132.139',
            '174.204.16.10',
            '104.132.11.82',
            '24.1.202.86',
            '216.4.58.18']

A normal for loop works fine!


The function passed to joblib to run on all cores:

def run_ip_process(iplookuparray):
    # One list per output column; each gets one entry per successfully looked-up IP
    asn = []
    asnid = []
    asncountry = []
    asndesc = []
    asnemail = []
    asnaddress = []
    asncity = []
    asnstate = []
    asnzip = []
    asndesc2 = []
    ipaddr = []
    totstolookup = len(iplookuparray)

    for b, i in enumerate(iplookuparray, start=1):
        i = str(i)
        print("Running #{} out of {}".format(b, totstolookup))
        try:
            obj = IPWhois(i, timeout=15)
            result = obj.lookup_whois()
            # Read the ASN fields before appending anything
            asn_fields = [result['asn'], result['asn_cidr'],
                          result['asn_country_code'], result['asn_description']]
        except Exception:
            # Skip IPs whose lookup fails (e.g. timeouts or ipwhois errors)
            continue
        try:
            net = result['nets'][0]
            # Read every network field first so a missing key can't misalign the lists
            details = [net['emails'], net['address'], net['city'],
                       net['state'], net['postal_code'], net['description']]
        except (KeyError, IndexError, TypeError):
            # No usable network details: fill with 0 as before
            details = [0] * 6
        columns = [asn, asnid, asncountry, asndesc, asnemail,
                   asnaddress, asncity, asnstate, asnzip, asndesc2]
        for column, value in zip(columns, asn_fields + details):
            column.append(value)
        ipaddr.append(i)

    ipdataframe = pd.DataFrame({'ipaddress': ipaddr,
                                'asn': asn,
                                'asnid': asnid,
                                'asncountry': asncountry,
                                'asndesc': asndesc,
                                'emailcontact': asnemail,
                                'address': asnaddress,
                                'city': asncity,
                                'state': asnstate,
                                'zip': asnzip,
                                'ipdescrip': asndesc2})

    return ipdataframe
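Because this function takes the whole list, it can only be parallelized as one big unit. A common restructuring is a worker that handles a single IP and returns one row, so a pool (or joblib) can schedule each IP independently and the rows are combined afterwards, for example with `pd.DataFrame(rows)`. A minimal sketch of the per-IP part, assuming the lookup result is a dict with the keys the code above reads; `ip_to_row` is a hypothetical name, its `lookup` parameter is a hypothetical injection point where `IPWhois(ip, timeout=15).lookup_whois()` would go, and `fake_whois` returns made-up data, not real ipwhois output.

```python
from typing import Any, Callable, Dict, Optional

# Keys read from result['nets'][0] in the question's code, mapped to its column names
NET_FIELDS = {
    "emailcontact": "emails",
    "address": "address",
    "city": "city",
    "state": "state",
    "zip": "postal_code",
    "ipdescrip": "description",
}


def ip_to_row(ip: str, lookup: Callable[[str], Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Hypothetical worker: look up one IP and flatten it into a row; None on failure."""
    try:
        result = lookup(ip)
    except Exception:
        # Mirrors the question's outer handler: skip IPs that fail
        return None
    row = {
        "ipaddress": ip,
        "asn": result.get("asn"),
        "asnid": result.get("asn_cidr"),
        "asncountry": result.get("asn_country_code"),
        "asndesc": result.get("asn_description"),
    }
    nets = result.get("nets") or []
    first = nets[0] if nets else {}
    for column, key in NET_FIELDS.items():
        # Missing network details default to 0, as in the question (per field here)
        row[column] = first.get(key, 0)
    return row


if __name__ == "__main__":
    def fake_whois(ip):
        # Made-up stand-in for IPWhois(ip, timeout=15).lookup_whois()
        return {"asn": "64500", "asn_cidr": "192.0.2.0/24", "asn_country_code": "US",
                "asn_description": "EXAMPLE-NET", "nets": [{"city": "Springfield"}]}

    rows = [ip_to_row(ip, fake_whois) for ip in ["192.0.2.1", "192.0.2.2"]]
    rows = [r for r in rows if r is not None]
    print(rows[0]["city"], rows[0]["zip"])  # Springfield 0
```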

Run the process on all cores via joblib:

results = Parallel(n_jobs=num_core)(delayed(run_ip_process)(iplookup) for i in iplookup)
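Note what the generator in that call produces: for each `i` in `iplookup` it schedules `run_ip_process` on the entire IP list, and `i` itself is never used. With 10 IPs that is 10 tasks of 10 lookups each, 100 lookups instead of 10, and every returned DataFrame covers the same 10 IPs. The sketch below counts lookups for both shapes using hypothetical helpers (`whole_list_per_item`, `one_ip_per_item`, `count_lookups`); in joblib terms the per-item shape is `delayed(f)(ip) for ip in iplookup`, and for network-bound work `Parallel` also accepts `prefer="threads"` to use threads instead of worker processes.

```python
from typing import Callable, List


def whole_list_per_item(ips: List[str], process: Callable[[List[str]], None]) -> None:
    """Shape of the question's call: every iteration passes the full list."""
    for _ in ips:
        process(ips)


def one_ip_per_item(ips: List[str], process: Callable[[List[str]], None]) -> None:
    """Per-item shape: each iteration passes a single IP."""
    for ip in ips:
        process([ip])


def count_lookups(shape, ips: List[str]) -> int:
    """Hypothetical helper: count the individual lookups a task shape triggers.

    The loops above run sequentially; they stand in for the tasks Parallel would schedule.
    """
    calls = []
    shape(ips, lambda batch: calls.extend(batch))
    return len(calls)


if __name__ == "__main__":
    ips = [f"192.0.2.{n}" for n in range(10)]
    print(count_lookups(whole_list_per_item, ips))  # 100
    print(count_lookups(one_ip_per_item, ips))      # 10
```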
