How do I write a program that counts identical elements and concatenates them into one DataFrame?

Posted 2024-04-20 10:40:04

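I have a DataFrame made up of a large number of street names. I am trying to count the occurrences of each element in the DataFrame and put each element's rows into a new DataFrame. I then want to concatenate all of the new DataFrames into a single DataFrame. Because there are so many elements, doing this by hand is not practical, so I need a program that does the counting automatically.

The DataFrame looks like this: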

                                               Location
0                           BISHOPSGATE J/W HOUNDSDITCH
1                           BISHOPSGATE J/W HOUNDSDITCH
2                          LONDON WALL J/W CIRCUS PLACE
3                 VICTORIA EMBANKMENT J/W TEMPLE AVENUE
4                        HIGH HOLBORN J/W HATTON GARDEN
5            UPPER THAMES STREET J/W QUEEN STREET PLACE
6                        CANNON STREET J/W QUEEN STREET
7               QUEEN VICTORIA STREET J/W FRIDAY STREET
8                 KING WILLIAM STREET J/W ARTHUR STREET
9           LOWER THAMES STREET J/W LOWER THAMES STREET
10                     CORNHILL J/W KING WILLIAM STREET
11                     LONDON WALL J/W OLD BROAD STREET
12              QUEEN VICTORIA STREET J/W FRIDAY STREET
.
.
.
36735              MERIDIAN WAY J/W PICKETT'S LOCK LANE
36736                    WATERMEAD WAY J/W LEESIDE ROAD
36737                    WATERMEAD WAY J/W LEESIDE ROAD
36738                    WATERMEAD WAY J/W LEESIDE ROAD

I have the following code:

condition = (df[['Location']] == 'BISHOPSGATE J/W HOUNDSDITCH').any(axis=1)
df2 = df[condition]

condition = (df[['Location']] == 'LONDON WALL J/W CIRCUS PLACE').any(axis=1)
df3 = df[condition]

And so on for the remaining elements in the DataFrame. I then want to concatenate all of the resulting DataFrames into a single DataFrame:

df_final = pd.concat([df2, df3, df4...dfn], axis=1)
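The repeated filter-and-concatenate steps above could be automated along these lines. This is a minimal sketch rather than a definitive answer: it assumes pandas is imported as `pd`, the five-row `df` is a made-up stand-in built from values in the sample, and the name `frames` is introduced here purely for illustration.

```python
import pandas as pd

# Illustrative stand-in for the real data (values copied from the sample).
df = pd.DataFrame({
    "Location": [
        "BISHOPSGATE J/W HOUNDSDITCH",
        "BISHOPSGATE J/W HOUNDSDITCH",
        "LONDON WALL J/W CIRCUS PLACE",
        "VICTORIA EMBANKMENT J/W TEMPLE AVENUE",
        "LONDON WALL J/W CIRCUS PLACE",
    ]
})

# groupby yields one sub-frame per distinct Location value;
# sort=False keeps the groups in order of first appearance.
frames = [group for _, group in df.groupby("Location", sort=False)]

# Side-by-side concatenation: one "Location" column per distinct value,
# with NaN wherever a row belongs to a different group.
df_final = pd.concat(frames, axis=1)
print(df_final)
```

With tens of thousands of distinct locations, a wide result of this shape has one column per location and is almost entirely NaN, so it may need a great deal of memory.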

For example:

df2=

                          Location
0      BISHOPSGATE J/W HOUNDSDITCH
1      BISHOPSGATE J/W HOUNDSDITCH
46     BISHOPSGATE J/W HOUNDSDITCH
68     BISHOPSGATE J/W HOUNDSDITCH
23108  BISHOPSGATE J/W HOUNDSDITCH

df3=

                           Location
2      LONDON WALL J/W CIRCUS PLACE
4481   LONDON WALL J/W CIRCUS PLACE
4515   LONDON WALL J/W CIRCUS PLACE
13705  LONDON WALL J/W CIRCUS PLACE
13744  LONDON WALL J/W CIRCUS PLACE
23172  LONDON WALL J/W CIRCUS PLACE
32341  LONDON WALL J/W CIRCUS PLACE

df_final=

df_final = pd.concat([df2, df3], axis=1)

                         Location                        Location
0     BISHOPSGATE J/W HOUNDSDITCH    NaN 
1     BISHOPSGATE J/W HOUNDSDITCH    NaN
2     NaN                            LONDON WALL J/W CIRCUS PLACE
46    BISHOPSGATE J/W HOUNDSDITCH    NaN
68    BISHOPSGATE J/W HOUNDSDITCH    NaN
4481  NaN                            LONDON WALL J/W CIRCUS PLACE
4515  NaN                            LONDON WALL J/W CIRCUS PLACE
13705 NaN                            LONDON WALL J/W CIRCUS PLACE
13744 NaN                            LONDON WALL J/W CIRCUS PLACE
23108 BISHOPSGATE J/W HOUNDSDITCH    NaN
23172 NaN                            LONDON WALL J/W CIRCUS PLACE
32341 NaN                            LONDON WALL J/W CIRCUS PLACE

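Basically, this concatenated DataFrame keeps growing until every element has been counted.

As you can see, because there are so many elements (roughly 34,000+), I need a program that carries out the process above from the first index to the last. Is there a way to solve this?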
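If the goal is simply the number of occurrences of each location, one possible approach is `Series.value_counts`, which avoids building a separate DataFrame per value. This is a minimal sketch, assuming pandas is imported as `pd`; the small `df` and the output column name `count` are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Illustrative stand-in for the real data (values copied from the sample).
df = pd.DataFrame({
    "Location": [
        "BISHOPSGATE J/W HOUNDSDITCH",
        "BISHOPSGATE J/W HOUNDSDITCH",
        "LONDON WALL J/W CIRCUS PLACE",
        "WATERMEAD WAY J/W LEESIDE ROAD",
        "WATERMEAD WAY J/W LEESIDE ROAD",
        "WATERMEAD WAY J/W LEESIDE ROAD",
    ]
})

# Count how often each distinct Location occurs, then turn the result
# into a two-column DataFrame: Location and count.
counts = (
    df["Location"]
    .value_counts()
    .rename_axis("Location")
    .reset_index(name="count")
)
print(counts)
```

The result has one row per distinct location, so it stays small even when the input has tens of thousands of rows.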
