Creating tuples from DataFrame columns

Posted 2024-05-13 00:45:30

I have a dataset like this, and I want to create a list of tuples of the form

(Name_of_State , Literacy_rate)
(JAMMU&KASHMIR, 89.78) #example

I had to do some cleanup first, dropping some regions and keeping only the states:

data = data[data['Name'] != 'India']    # remove the row for India as a whole
data = data[data['TRU'] == 'Total']     # keep only the totals, excluding the rural and urban rows
states_group = data[data['Level'] == 'State']
states_group
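
The three filters above can also be combined into a single boolean mask. The following is only a minimal sketch: the rows and figures are invented for illustration, and only the column names (`Level`, `Name`, `TRU`, `TOT_P`) are taken from the question.

```python
import pandas as pd

# Toy rows; the numbers are invented, only the column names follow the question.
data = pd.DataFrame({
    "Level": ["India", "State", "State", "State", "DISTRICT"],
    "Name": ["India", "PUNJAB", "PUNJAB", "KERALA", "Kupwara"],
    "TRU": ["Total", "Total", "Rural", "Total", "Total"],
    "TOT_P": [1000, 200, 120, 300, 50],
})

# Combine the three conditions with & (each comparison needs its own parentheses).
mask = (data["Name"] != "India") & (data["TRU"] == "Total") & (data["Level"] == "State")
states_group = data[mask]
print(states_group["Name"].tolist())
```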

After that, here is the main code I want to focus on:

literacy_rate=[]
total_state_pop=0
total_literate_pop=0
for key,group in states_group.iterrows():
    total_state_pop+=states_group['TOT_P']
    
    total_literate_pop+=states_group['P_LIT']
    total_literate_pop+=states_group['F_LIT']
    rate=(total_literate_pop/total_state_pop)*100
    literacy_rate.append((states_group['Name'],rate))
    
print(literacy_rate) 

But the result I get is:

(3            JAMMU & KASHMIR
72          HIMACHAL PRADESH
111                   PUNJAB
174               CHANDIGARH
180              UTTARAKHAND
222                  HARYANA
288             NCT OF DELHI
318                RAJASTHAN
420            UTTAR PRADESH
636                    BIHAR
753                   SIKKIM
768                  MANIPUR
798                  MIZORAM
825                  TRIPURA
840                MEGHALAYA
864                    ASSAM
948              WEST BENGAL
1008               JHARKHAND
1083                  ODISHA
1176            CHHATTISGARH
1233          MADHYA PRADESH
1386                 GUJARAT
1467             DAMAN & DIU
1476    DADRA & NAGAR HAVELI
1482             MAHARASHTRA
1590          ANDHRA PRADESH
1662               KARNATAKA
1755                     GOA
1764                  KERALA
1809              TAMIL NADU
1908              PUDUCHERRY
Name: Name, dtype: object, 3        85.484832
72       99.946393
111      80.810862
174      93.793637
180      89.689123
222      79.608418
288      97.531743
318      67.745833
420      69.971651
636      52.937273
753      98.691424
768      96.236438
798     109.113300
825     116.065370
840      84.108326
864      96.451609
948      87.437511
1008     63.211190
1083     85.260257
1176     85.104889
1233     78.055310
1386     99.236215
1467    121.848465
1476    112.301972
1482    100.968386
1590     79.671587
1662     81.400129
1755    110.110417
1764    120.140132
1809     94.529868
1908    101.165414
dtype: float64), ...

and it goes on like this for much longer. Here is the link to the entire dataset. Where did I go wrong? Thanks in advance.


2 Answers

How about changing every states_group inside the for loop to group? Otherwise, looping with .iterrows() makes no sense:

literacy_rate=[]
total_state_pop=0
total_literate_pop=0
for key,group in states_group.iterrows():
    total_state_pop+=group['TOT_P']
    
    total_literate_pop+=group['P_LIT']
    total_literate_pop+=group['F_LIT']
    rate=(total_literate_pop/total_state_pop)*100
    literacy_rate.append((group['Name'],rate))
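
A note on why this helps: `iterrows()` yields `(index, row)` pairs in which `row` is a Series holding one state's values, whereas `states_group['Name']` is the whole column, which is why the original loop appended entire Series. The loop above still keeps running totals, so each rate is cumulative over the states seen so far. If a per-state rate is what is wanted, something like the sketch below may be closer. The figures are invented, and the formula `P_LIT + F_LIT` is simply carried over from the thread; whether it double counts depends on the dataset's column definitions, which are not shown here.

```python
import pandas as pd

# Toy data; the figures are invented, the column names follow the question.
states_group = pd.DataFrame({
    "Name": ["PUNJAB", "KERALA"],
    "TOT_P": [200, 500],
    "P_LIT": [100, 250],
    "F_LIT": [50, 50],
})

literacy_rate = []
for key, row in states_group.iterrows():
    # row is a Series for a single state, so row["Name"] is one string
    # and the rate is computed from that state's values only.
    rate = (row["P_LIT"] + row["F_LIT"]) / row["TOT_P"] * 100  # same formula as the thread
    literacy_rate.append((row["Name"], rate))

print(literacy_rate)
```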

Avoid iterating over rows wherever possible, since it is an anti-pattern in pandas (good read).

import pandas as pd
data = pd.read_excel('state_dist_sc.xls')
data=data[data['Name']!='India']
data=data[data['TRU']=='Total']
states_group=data[data['Level']=='State']

# Work on a copy so that adding a column does not modify a slice of data.
states_group = states_group.copy()

# Calculate the literacy rate with a vectorized expression, which is faster than looping over rows.
states_group['literacy_rate'] = 100*(states_group['P_LIT'] + states_group['F_LIT'])/states_group['TOT_P']

# to_records returns a NumPy record array of (Name, literacy_rate) records
ans = states_group[['Name','literacy_rate']].to_records(index=False)
ans

Output:

rec.array([('JAMMU & KASHMIR',  85.48483174),
           ('HIMACHAL PRADESH',  99.94639301), ('PUNJAB',  80.81086172),
           ('CHANDIGARH',  93.79363692), ('UTTARAKHAND',  89.68912284),
           ('HARYANA',  79.60841792), ('NCT OF DELHI',  97.53174349),
           ('RAJASTHAN',  67.74583313), ('UTTAR PRADESH',  69.97165068),
           ('BIHAR',  52.93727261), ('SIKKIM',  98.69142352),
           ('MANIPUR',  96.23643761), ('MIZORAM', 109.11330049),
           ('TRIPURA', 116.06537002), ('MEGHALAYA',  84.10832613),
           ('ASSAM',  96.45160871), ('WEST BENGAL',  87.43751069),
           ('JHARKHAND',  63.21118996), ('ODISHA',  85.26025661),
           ('CHHATTISGARH',  85.10488906),
           ('MADHYA PRADESH',  78.05530967), ('GUJARAT',  99.23621537),
           ('DAMAN & DIU', 121.84846506),
           ('DADRA & NAGAR HAVELI', 112.3019722 ),
           ('MAHARASHTRA', 100.96838647),
           ('ANDHRA PRADESH',  79.67158709), ('KARNATAKA',  81.40012899),
           ('GOA', 110.11041691), ('KERALA', 120.14013153),
           ('TAMIL NADU',  94.529868  ), ('PUDUCHERRY', 101.16541449)],
          dtype=[('Name', 'O'), ('literacy_rate', '<f8')])
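
Note that `to_records` returns a NumPy record array rather than a Python list. If a plain list of tuples is needed, one option is `itertuples(index=False, name=None)`, which yields ordinary tuples. A minimal sketch, again with invented numbers and only the column names taken from the thread:

```python
import pandas as pd

# Toy data; the figures are invented, the column names follow the question.
states_group = pd.DataFrame({
    "Name": ["PUNJAB", "KERALA"],
    "TOT_P": [200, 500],
    "P_LIT": [100, 250],
    "F_LIT": [50, 50],
})

# Vectorized rate, using the same formula as the answer above.
states_group["literacy_rate"] = (
    100 * (states_group["P_LIT"] + states_group["F_LIT"]) / states_group["TOT_P"]
)

# name=None makes itertuples yield plain tuples instead of namedtuples.
pairs = list(states_group[["Name", "literacy_rate"]].itertuples(index=False, name=None))
print(pairs)
```

`list(zip(states_group["Name"], states_group["literacy_rate"]))` is an equivalent alternative when only two columns are involved.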
