Python: for loop vs. list comprehension

Published 2024-06-01 00:20:57


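Can someone tell me why these two pieces of code (a for loop and a list comprehension) return two different answers? I thought they were equivalent, just written differently.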

Data used:

Top152['% Renewable'] 
Country
China                 19.754910
United States         11.570980
Japan                 10.232820
United Kingdom        10.600470
Russian Federation    17.288680
Canada                61.945430
Germany               17.901530
India                 14.969080
France                17.020280
South Korea            2.279353
Italy                 33.667230
Spain                 37.968590
Iran                   5.707721
Australia             11.810810
Brazil                69.648030

For loop:

def answer_ten():
    Top15 = answer_one()
    Top152 = Top15.copy()

    for x in Top152['% Renewable']:
        if x >= Top152['% Renewable'].median():
            Top152['HighRenew'] = 1
        else:
            Top152['HighRenew'] = 0
    return Top152['HighRenew']

answer_ten()

Output:

    Country
    China                 1
    United States         1
    Japan                 1
    United Kingdom        1
    Russian Federation    1
    Canada                1
    Germany               1
    India                 1
    France                1
    South Korea           1
    Italy                 1
    Spain                 1
    Iran                  1
    Australia             1
    Brazil                1     

List comprehension:

def answer_ten():
    Top15 = answer_one()
    Top152 = Top15.copy()

    Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]

    return Top152['HighRenew']

answer_ten()

Output:

Country
China                 1
United States         0
Japan                 0
United Kingdom        0
Russian Federation    1
Canada                1
Germany               1
India                 0
France                1
South Korea           0
Italy                 1
Spain                 1
Iran                  0
Australia             0
Brazil                1

3 answers

The second approach edits the vector, whereas the for loop saves it (behind the scenes) to avoid unnecessary edits.

A better approach is to convert the boolean mask to int, because pandas' vectorized functions are very fast:

print (Top152['% Renewable']> Top152['% Renewable'].median())
China                  True
United States         False
Japan                 False
United Kingdom        False
Russian Federation     True
Canada                 True
Germany                True
India                 False
France                False
South Korea           False
Italy                  True
Spain                  True
Iran                  False
Australia             False
Brazil                 True
Name: % Renewable, dtype: bool

def answer_ten():
    # Parentheses around the whole expression allow the method chain
    # to continue on the next line.
    return ((Top152['% Renewable'] > Top152['% Renewable'].median())
            .astype(int).rename('HighRenew'))


print (answer_ten())
China                 1
United States         0
Japan                 0
United Kingdom        0
Russian Federation    1
Canada                1
Germany               1
India                 0
France                0
South Korea           0
Italy                 1
Spain                 1
Iran                  0
Australia             0
Brazil                1
Name: HighRenew, dtype: int32
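Note that the mask above uses `>`, while the question uses `>=`. With an odd number of rows the median is itself one of the values, so the two operators differ for exactly that row (France in the data above). A minimal sketch, using three values taken from the question's data; the names `renewable`, `strict`, and `inclusive` are illustrative only:

```python
import pandas as pd

# Three rows taken from the question's data; France is the median here.
renewable = pd.Series(
    [14.969080, 17.020280, 17.288680],
    index=["India", "France", "Russian Federation"],
    name="% Renewable",
)
median = renewable.median()

strict = (renewable > median).astype(int)      # France is excluded
inclusive = (renewable >= median).astype(int)  # France is included

print(strict.tolist())     # [0, 0, 1]
print(inclusive.tolist())  # [0, 1, 1]
```

To reproduce the list comprehension's result from the question, `>=` is the operator to use.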

If you want a loop, a very slow solution using iterrows is possible, but the first solution is faster:

def answer_ten():
    for idx, x in Top152.iterrows():
        if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
            Top152.loc[idx, 'HighRenew'] = 1
        else:
            Top152.loc[idx, 'HighRenew'] = 0
    return Top152['HighRenew'].astype(int)

print (answer_ten())
China                 1
United States         0
Japan                 0
United Kingdom        0
Russian Federation    1
Canada                1
Germany               1
India                 0
France                1
South Korea           0
Italy                 1
Spain                 1
Iran                  0
Australia             0
Brazil                1
Name: HighRenew, dtype: int32
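If an explicit loop is preferred, another option (a sketch that is not part of the answer above) is to collect one flag per row in a plain list and build the result once at the end, instead of writing into the DataFrame on every iteration. The function name `high_renew_loop` and the three-row sample are assumptions for illustration:

```python
import pandas as pd

def high_renew_loop(renewable):
    """Return 1/0 flags for values at or above the median, using a plain loop."""
    median = renewable.median()  # computed once, outside the loop
    flags = []
    for x in renewable:
        # One flag per element, rather than one assignment per column.
        flags.append(1 if x >= median else 0)
    return pd.Series(flags, index=renewable.index, name="HighRenew")

# Small sample for illustration, not the full data set from the question.
sample = pd.Series(
    [19.75, 11.57, 61.95],
    index=["China", "United States", "Canada"],
    name="% Renewable",
)
result = high_renew_loop(sample)
print(result.tolist())  # [1, 0, 1]
```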

Timings:

#[15000 rows x 1 columns]
Top152 = pd.concat([Top152]*1000).reset_index(drop=True)  

def answer_ten1():
    return (Top152['% Renewable']> Top152['% Renewable'].median()).astype(int).rename('HighRenew')

def answer_ten2():
    for idx, x in Top152.iterrows():
        if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
            Top152.loc[idx, 'HighRenew'] = 1
        else:
            Top152.loc[idx, 'HighRenew'] = 0
    return Top152['HighRenew'].astype(int)


def answer_ten3():
    Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]
    return Top152['HighRenew']

print (answer_ten1())   
print (answer_ten2())
print (answer_ten3())  

In [169]: %timeit (answer_ten1())
1000 loops, best of 3: 528 µs per loop

In [170]: %timeit answer_ten2()
1 loop, best of 3: 16 s per loop

In [171]: %timeit (answer_ten3())
1 loop, best of 3: 2.67 s per loop
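One likely contributor to the gap between answer_ten3 and answer_ten1 is that the comprehension calls `.median()` again for every element. The sketch below only shows that hoisting the median out of the comprehension gives the same flags; it makes no timing claim, and the function names and sample values are hypothetical:

```python
import pandas as pd

def flags_median_per_element(renewable):
    # Same shape as answer_ten3: the median is recomputed for each x.
    return [1 if x >= renewable.median() else 0 for x in renewable]

def flags_median_hoisted(renewable):
    # The median is computed once and reused inside the comprehension.
    median = renewable.median()
    return [1 if x >= median else 0 for x in renewable]

# Hypothetical sample values, chosen only to keep the example small.
sample = pd.Series([5.0, 1.0, 3.0, 4.0, 2.0])
hoisted = flags_median_hoisted(sample)
per_element = flags_median_per_element(sample)
print(hoisted)  # [1, 0, 1, 1, 0]
```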

On every iteration you are setting the whole column (vector), not a single row:

Top152['HighRenew'] = 1
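This can be checked on a tiny frame. In the sketch below (a hypothetical three-row DataFrame, not the data from the question), assigning a scalar to a column broadcasts it to every row, so each pass of the loop replaces the whole column and only the last element decides the final result:

```python
import pandas as pd

# Hypothetical three-row frame used only to show the broadcasting behavior.
df = pd.DataFrame({"value": [30.0, 20.0, 10.0]}, index=["a", "b", "c"])
median = df["value"].median()  # 20.0

for x in df["value"]:
    # A scalar on the right-hand side is broadcast to every row,
    # so each pass overwrites the entire column.
    df["flag"] = 1 if x >= median else 0

# The last value (10.0) is below the median, so every row ends up 0.
print(df["flag"].tolist())  # [0, 0, 0]
```

In the question's data the last row (Brazil) is above the median, which is why every row came out as 1.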

Try this vectorized approach instead:

Top152['HighRenew'] = (Top152['% Renewable'] >= Top152['% Renewable'].median()).astype(int)

So your function could be implemented as follows:

def answer_ten():
    return (Top15['% Renewable'] >= Top15['% Renewable'].median()).astype(int)
