Iterating over columns to split data

Posted 2024-05-13 20:27:35


I have the following dataset: the columns named 2, 3, 4, ..., 9 are filled with topic names that overlap across columns. Pageviews is the outcome variable.

        2                           3                       Pageviews
0       Financial Services          Consumer Products       4106.0
1       Consumer Products           ...                     3368.0
2       Consumer Products           ...                     1025.0
3       Collaboration               ...                     7840.0
4       Future of Supply Chains     ...                     2076.0

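I want to slice each topic column (2, 3, 4, ...) together with Pageviews and append the slices, so that I end up with a single DataFrame containing one topic column and Pageviews.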

I am used to looping in Stata, where you can loop over column names with x, but I know this works quite differently in Python.

I started with:

for x in range(2, 9):
    df_x = df[['Pageviews',  df.x]]

But Python does not recognize df.x.

How can I loop over the column names? Is it possible to use an iterator to create the new DataFrames?

Thanks!

Edit

My desired output is:

                                       Col        Pageviews
0                           Financial Services      4106.0
1                            Consumer Products      3368.0
2                            Consumer Products      1025.0
3                                 Collaboration     7840.0
4                      Future of Supply Chains      2076.0
5                          Future of Reporting      2123.0
6                    Sustainability Management     15576.0
7                                 Human Rights        52.0
8                                      BSR News      903.0
9                       Energy and Extractives      1232.0
10                                  HERproject       616.0
11                   Sustainability Management     10697.0

where Col is the result of appending columns 2, 3, 4, ... and Pageviews is the result of appending the corresponding Pageviews values.


2 answers

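I think you are looking for some kind of stack method rather than iteration (in general, iterating over a DataFrame is a last resort, because there are usually vectorized methods for most data-reshaping tasks).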

Take this DataFrame as an example:

>>> df
                    0                        1                        2  \
0   Consumer Products        Consumer Products       Financial Services   
1       Collaboration  Future of Supply Chains       Financial Services   
2  Financial Services        Consumer Products            Collaboration   
3       Collaboration       Financial Services  Future of Supply Chains   
4   Consumer Products  Future of Supply Chains       Financial Services   

   Pageviews  
0       1210  
1       1528  
2       1716  
3       1403  
4       1090  

You can do the following:

new_df = (df.set_index('Pageviews')
          .stack()
          .reset_index(0))

>>> new_df
    Pageviews                        0
0        1210        Consumer Products
1        1210        Consumer Products
2        1210       Financial Services
3        1528            Collaboration
4        1528  Future of Supply Chains
5        1528       Financial Services
6        1716       Financial Services
7        1716        Consumer Products
8        1716            Collaboration
9        1403            Collaboration
10       1403       Financial Services
11       1403  Future of Supply Chains
12       1090        Consumer Products
13       1090  Future of Supply Chains
14       1090       Financial Services
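A self-contained sketch of the same stack approach, rebuilt on the example frame above so it can be run as-is. The name Col, the column reordering, and the final reset_index(drop=True) are assumptions added here to match the layout requested in the question. Without that last call, the index left over after reset_index(0) would most likely be the original column labels rather than 0..14.

```python
import pandas as pd

# Example frame copied from the answer above (column labels 0, 1, 2 are integers).
df = pd.DataFrame({
    0: ["Consumer Products", "Collaboration", "Financial Services",
        "Collaboration", "Consumer Products"],
    1: ["Consumer Products", "Future of Supply Chains", "Consumer Products",
        "Financial Services", "Future of Supply Chains"],
    2: ["Financial Services", "Financial Services", "Collaboration",
        "Future of Supply Chains", "Financial Services"],
    "Pageviews": [1210, 1528, 1716, 1403, 1090],
})

new_df = (
    df.set_index("Pageviews")   # keep Pageviews attached to every topic value
      .stack()                  # one row per (Pageviews, original column) pair
      .rename("Col")            # assumed name for the combined topic column
      .reset_index(level=0)     # move Pageviews back into a regular column
      .reset_index(drop=True)   # discard the old column labels, number rows 0..n-1
)

# Put the topic column first, as in the desired output of the question.
new_df = new_df[["Col", "Pageviews"]]
print(new_df)
```

Note that stack walks the frame row by row, so the three topics of the first row come first. If the column-by-column order from the question matters, melt or an explicit concat keeps that order.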

Use melt:

df.melt('Pageviews').drop(columns='variable')
Out[644]: 
    Pageviews                 value
0        1210      ConsumerProducts
1        1528         Collaboration
2        1716     FinancialServices
3        1403         Collaboration
4        1090      ConsumerProducts
5        1210      ConsumerProducts
6        1528  FutureofSupplyChains
7        1716      ConsumerProducts
8        1403     FinancialServices
9        1090  FutureofSupplyChains
10       1210     FinancialServices
11       1528     FinancialServices
12       1716         Collaboration
13       1403  FutureofSupplyChains
14       1090     FinancialServices
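For the literal question of how to loop over column names: df.x looks up a column actually called "x", so the loop variable has to be passed with bracket indexing instead. Below is a minimal sketch on the example frame from the first answer. The names topic_cols, pieces, looped and Col are illustrative, not from the original posts. It is also checked against a melt call that should give the same rows in the same order.

```python
import pandas as pd

# Example frame from the first answer (integer column labels 0, 1, 2).
df = pd.DataFrame({
    0: ["Consumer Products", "Collaboration", "Financial Services",
        "Collaboration", "Consumer Products"],
    1: ["Consumer Products", "Future of Supply Chains", "Consumer Products",
        "Financial Services", "Future of Supply Chains"],
    2: ["Financial Services", "Financial Services", "Collaboration",
        "Future of Supply Chains", "Financial Services"],
    "Pageviews": [1210, 1528, 1716, 1403, 1090],
})

# Every column except the outcome variable is treated as a topic column.
topic_cols = [c for c in df.columns if c != "Pageviews"]

pieces = []
for col in topic_cols:
    # df[[col, "Pageviews"]] uses the *value* of col; df.col would look for a column named "col".
    piece = df[[col, "Pageviews"]].rename(columns={col: "Col"})
    pieces.append(piece)

# Append the slices one below the other and renumber the rows.
looped = pd.concat(pieces, ignore_index=True)

# The vectorized equivalent, with the value column named directly.
melted = (
    df.melt(id_vars="Pageviews", value_name="Col")
      .drop(columns="variable")[["Col", "Pageviews"]]
)
print(looped)
```

Both versions append the topic columns one after another, which is the ordering shown in the desired output of the question.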
