Pandas分割列条目(彼此分隔)

2024-06-11 06:25:24 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个熊猫数据框,看起来像这样

| id | name | latlon | 0 sat -28,14 | -23, 12 | -21, 13...

latlon列条目包含多个纬度/经度条目,用|符号分隔,我需要将它们拆分为如下列表:lat = [-28,-23,-21] lon = [14,12,13]

运行以下命令将创建所有值的列表

sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)

示例:indexnumber [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]

如何继续分割数据,使每个条目分别关联到整个数据帧的lat/lon列表。Alternativley,有没有办法创建两个列(lat/lon),这两个列都包含一个包含所有值的列表对象? 编辑:

import pandas as pd

sat_df = pd.DataFrame({'卫星编号': {0: 38858, 1: 5, 2: 16}, 'path': {0: '-2023240,1636954,-1409847|-2120945,1594435,-1311586|-2213791,1547970,-1209918|', 1: '8847,-974294,-168045|69303,-972089,-207786|129332,-963859,-246237|189050,-949637,-283483|', 2: '283880,751564,538726|214030,782804,550729|142133,808810,558964|69271,829348,563411|'}, 'latlon': {0: '-28.566504816706743,-58.42623323318429|-26.424915546197877,-58.03051668423269|-24.24957760771616,-57.709052434729294|-22.049419348341488,-57.45429550739338|-19.82765114196696,-57.258197633964414|-17.58719794818057,-57.113255687570714|-15.33074070109176,-57.01245109909582|-13.060755383916138,-56.949188922655416|-10.779548173615462,-56.91723753411087|-8.48928513939462,-56.910669632641685|-6.192021225701933,-56.92380598464241|-3.8897270110140494,-56.951159278680606|-1.5843114029280712,-56.987381318629815|0.7223533959819478,-57.02721062232328|3.028411197431552,-57.06542107180802|5.331999106238248,-57.09677071391785|7.631224662503422,-57.115951252231326|9.924144733525859,-57.11753523668981|12.20873984934678,-57.09592379302077|14.482890506579363,-57.045292032888945|16.744349099342163,-56.95953284633186|18.99070929829218,-56.83219872719919|', 1: '-9.826016080133869,71.12640824438319|-12.077961267269185,74.17040194928683|-14.251942328865088,77.22102880126546|-16.362232784638383,80.31943171515469|-18.372371674164317,83.43158582640798|-20.311489634835258,86.62273098947678|-22.14461262803909,89.85609377674561|-23.896490600856566,93.19765633031801|-25.53339979617313,96.60696767976263|-27.063070616439813,100.12254137641649|-28.488648081761962,103.78528610926675|-29.778331008010497,107.54645547637602|-30.942622037767002,111.47495996053523|-31.95152016226762,115.51397654947516|-32.80866797590735,119.73211812295206|-33.486858278098815,124.06227007574186|-33.98257678066123,128.57116785317814|-34.27304876808886,133.17990028392123|-34.34804732039687,137.91355482600457|-34.19053759979979,142.79776551711302|-33.788689805715364,147.73758823197466|-33.12248489727676,152.7937677542324|', 2: '34.00069374375586,-130.03583418452314|34.3070000099521,-125.16691893340256|34.37547230320849,-120.37930544344802|34.219644836708575,-115.72548686095767|33.8599777210809,-111.25048787484094|33.307236654159695,-106.89130089454063|32.579218893589676,-102.68672977394559|31.69071108398145,-98.63657044455137|30.663892680279847,-94.76720076317056|29.49498481622457,-91.01231662520239|28.20247456939903,-87.39472628213446|26.796048279088225,-83.90476041381801|25.29620394685256,-80.5572008057606|23.686627724590036,-77.28791855670698|21.984668849769005,-74.1108962902788|20.209508481020038,-71.0367205896831|18.337433788359615,-68.00383542959851|16.385207987194672,-65.02251732177939|14.355346635752394,-62.078279068092414|12.266387624465171,-59.17870114389838|10.087160866120724,-56.262880710180255|7.8348695447113235,-53.336971029542006|'}})

#splits latlon data into a list
sat_df.dropna(inplace=True)
sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)

sat_df

#need to write each entries latlon list as two lists (alternating lat and lon)
lat = []
lon = []

#for sat_df["latlon"]:




Tags: 数据falsedf列表as条目satlatlon
2条回答

让我们从您的str.strip后退一步,并利用在pandas0.25中添加的explode

然后根据索引将其合并回来

df = sat_df['latlon'].str.split('|').explode().str.split(',',expand=True)

new_df = pd.merge(sat_df.drop('latlon',axis=1),
df,left_index=True,
right_index=True).rename(columns={0 : 'Lat', 1 : 'Lon'})

print(new_df.drop('path',axis=1))
     卫星编号                  Lat                  Lon
0   38858  -28.566504816706743   -58.42623323318429
0   38858  -26.424915546197877   -58.03051668423269
0   38858   -24.24957760771616  -57.709052434729294
0   38858  -22.049419348341488   -57.45429550739338
0   38858   -19.82765114196696  -57.258197633964414
..    ...                  ...                  ...
2      16   14.355346635752394  -62.078279068092414
2      16   12.266387624465171   -59.17870114389838
2      16   10.087160866120724  -56.262880710180255
2      16   7.8348695447113235  -53.336971029542006
2      16                                      None

为此,我们正在使用熊猫库

最初,我创建了一个数据帧,正如您所提到的

Code:

import pandas as pd

latlon = [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]
# print(latlon)
data = pd.DataFrame({'id':[0],'name':['sat'],'latlon':[latlon]})

print(data)

Output:

   id name                                             latlon
0   0  sat  [-58.562242560404705, 52.82662430990185, -61.3...

现在我已经将latlon转换为string以进行迭代,因为如果尝试迭代float值,可能会出现错误。然后将latitude和longitude值传递给数据帧的相应列

即使您的数据框中有更多任意数量的记录或行,此代码也可以工作

Code:

#splittint latlon and making adding the values to lat and lon columns
lats = []
lons = []
for i in range(len(data)):
    lat_lon = [str(x) for x in (data['latlon'].tolist()[i])]

    lat = []
    lon = []
    for i in range(len(lat_lon)):
        if i%2==0:
            lat.append(float(lat_lon[i]))
        else:
            lon.append(float(lat_lon[i]))
    lats.append(lat)
    lons.append(lon)

data = data.drop('latlon',axis=1)  #dropping latlon column
data.insert(2,'lat',lats)  #adding lat column
data.insert(3,'lon',lons)  #adding lon column
# print(data)
data  #displaying dataframe

Output:

    id  name    lat     lon
0   0   sat     [-58.562242560404705, -61.300361184039964, -62...   [52.82662430990185, 64.0645716165538, 76.96557...

我希望这会有帮助

相关问题 更多 >