Creating a pandas DataFrame with unique labels taken from file names

Posted 2024-03-29 00:55:17


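I'm new to pandas and am trying to use it to build a dataset for a convolutional neural network. What I want is a DataFrame in which each column represents the label of the data items.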

First, I find all the data items and collect them, together with their respective directories, into two lists:

import os

video_path = '/home/richard/Documents/datasets/ucf_sports/mod'

all_videos_path = []
all_videos = []

for root, dirs, files in os.walk(video_path):
    for file in files:
        if file.endswith(".avi"):
            all_videos.append(os.path.join(root, file))
            all_videos_path.append(root)

So all_videos_path holds the directory of every video file; the sample list in the answer below shows the form of these paths.

Then I find the label of each data item with:

all_labels = [x.split('/')[8] for x in all_videos_path]
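
Hard-coding the position in x.split('/') ties the code to the absolute depth of the dataset directory. A minimal sketch of one alternative, assuming the label is always the first directory below video_path; label_from_path is a hypothetical helper, not part of the original code:

```python
from pathlib import PurePosixPath


def label_from_path(path, root):
    """Return the first directory component of `path` below `root`.

    Hypothetical helper: assumes `path` lies strictly inside `root`.
    """
    # relative_to raises ValueError if `path` is not under `root`.
    relative = PurePosixPath(path).relative_to(root)
    return relative.parts[0]


video_path = '/home/richard/Documents/datasets/ucf_sports/mod'
example_dirs = [
    video_path + '/GolfSwing/Golf-Swing-Side/004',
    video_path + '/Lifting/001',
]
all_labels = [label_from_path(p, video_path) for p in example_dirs]
print(all_labels)  # ['GolfSwing', 'Lifting']
```

Because the label is computed relative to video_path, moving the dataset to another directory should not require changing an index.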

Then I find the unique labels with:

unique_labels = np.unique(all_labels)

Output:

array(['GolfSwing','Lifting'], 
  dtype='|S13')
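
np.unique can also report how many items each label has, which makes the imbalance between categories explicit. A small sketch, with all_labels filled in by hand as an assumed stand-in for the real list:

```python
import numpy as np

# Assumed sample labels: three GolfSwing items and two Lifting items.
all_labels = ['GolfSwing', 'GolfSwing', 'GolfSwing', 'Lifting', 'Lifting']

# return_counts=True also returns the number of occurrences of each value.
unique_labels, counts = np.unique(all_labels, return_counts=True)
label_counts = dict(zip(unique_labels.tolist(), counts.tolist()))
print(label_counts)  # {'GolfSwing': 3, 'Lifting': 2}
```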

Then I create a Series that maps each unique label to an integer:

label_dict = pd.Series(range(len(unique_labels)), index=unique_labels)

Output:

GolfSwing        0
Lifting          1
dtype: int64
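
A Series like label_dict can be used as a lookup table. The following sketch, which assumes a hand-written all_labels list and introduces the name label_ids for illustration, turns each item's label into its integer id with Series.map:

```python
import numpy as np
import pandas as pd

# Assumed sample labels standing in for the real list.
all_labels = ['GolfSwing', 'GolfSwing', 'GolfSwing', 'Lifting', 'Lifting']

unique_labels = np.unique(all_labels)
label_dict = pd.Series(range(len(unique_labels)), index=unique_labels)

# Series.map looks each label up in the index of label_dict.
label_ids = pd.Series(all_labels).map(label_dict)
print(label_ids.tolist())  # [0, 0, 0, 1, 1]
```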

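So now I want to create a DataFrame that has the unique labels as column headers and sorts every data item into its respective column. As you can see, the categories contain different numbers of items, so the columns need different numbers of rows. I have been trying to build such a DataFrame without success. Is this actually possible in pandas? If so, how would I do it?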

Thanks in advance.


1 answer

Answer 1 · posted 2024-03-29 00:55:17

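IIUC, you want to reshape the DataFrame with pivot. Columns with different numbers of rows are a problem, though: the gaps are filled with NaN values: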

import pandas as pd

all_videos_path = ['/home/richard/Documents/datasets/ucf_sports/mod/GolfSwing/Golf-Swing-Side/004',
'/home/richard/Documents/datasets/ucf_sports/mod/GolfSwing/Golf-Swing-Side/001',
'/home/richard/Documents/datasets/ucf_sports/mod/GolfSwing/Golf-Swing-Side/003',
'/home/richard/Documents/datasets/ucf_sports/mod/Lifting/004',
'/home/richard/Documents/datasets/ucf_sports/mod/Lifting/001']

# create a DataFrame from the list all_videos_path
df = pd.DataFrame({'links': all_videos_path})
# create a new column holding the label of each path
df['labels'] = df['links'].str.split('/').str[7]
print(df)
                                               links     labels
0  /home/richard/Documents/datasets/ucf_sports/mo...  GolfSwing
1  /home/richard/Documents/datasets/ucf_sports/mo...  GolfSwing
2  /home/richard/Documents/datasets/ucf_sports/mo...  GolfSwing
3  /home/richard/Documents/datasets/ucf_sports/mo...    Lifting
4  /home/richard/Documents/datasets/ucf_sports/mo...    Lifting

# pivot so that each unique label becomes its own column
df = df.pivot(index='links', columns='labels', values='labels').reset_index()
print(df)
labels                                              links  GolfSwing  Lifting
0       /home/richard/Documents/datasets/ucf_sports/mo...  GolfSwing      NaN
1       /home/richard/Documents/datasets/ucf_sports/mo...  GolfSwing      NaN
2       /home/richard/Documents/datasets/ucf_sports/mo...  GolfSwing      NaN
3       /home/richard/Documents/datasets/ucf_sports/mo...        NaN  Lifting
4       /home/richard/Documents/datasets/ucf_sports/mo...        NaN  Lifting

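# replace the label text in each column with the corresponding link
df.loc[df['GolfSwing'].notnull(), 'GolfSwing'] = df['links']
df.loc[df['Lifting'].notnull(), 'Lifting'] = df['links']
# the links column is no longer needed
del df['links']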
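
If the goal is one compact column per label, with NaN only as padding at the bottom of the shorter columns, one option is to number the rows within each label before pivoting. A sketch under that assumption, using the same sample paths; the row column and the name wide are introduced here for illustration:

```python
import pandas as pd

base = '/home/richard/Documents/datasets/ucf_sports/mod'
all_videos_path = [
    base + '/GolfSwing/Golf-Swing-Side/004',
    base + '/GolfSwing/Golf-Swing-Side/001',
    base + '/GolfSwing/Golf-Swing-Side/003',
    base + '/Lifting/004',
    base + '/Lifting/001',
]

df = pd.DataFrame({'links': all_videos_path})
df['labels'] = df['links'].str.split('/').str[7]

# Number the rows within each label: 0, 1, 2, ... per group.
df['row'] = df.groupby('labels').cumcount()

# One column per label; the shorter column is padded with NaN at the bottom.
wide = df.pivot(index='row', columns='labels', values='links')
print(wide.shape)  # (3, 2)
```

The padding itself cannot be avoided in a rectangular DataFrame. For feeding a network it may be simpler to keep the long form, with one row per video and a labels column.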
