How do I process a dataset with Python?

Published 2024-04-19 22:50:30


I have an input dataset named data.csv with the following contents:

id ,   name
1  ,  Jone/Elvis/Tom
2  ,  Elvis/Tonny

The name column uses a slash as its delimiter. I need to process data.csv, and my expected output is:

id, Jone, Elvis, Tom, Tonny
1,   1  ,  1   ,  1 ,  0
2,   0  ,  1   ,  0 ,  1

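A 1 means the column's name appears in that row's name field, and a 0 means it does not. How can I transform the input this way using Python and pandas?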


3 answers
import pandas as pd
df = pd.read_csv("test.csv")

def getDfIds(df):
    ids = []
    for i in df.index:
        ids.append(df.iloc[i,0])
    return ids

# create  headers
def createHeaders(df,ids):
    headers = []
    for i in df.index: 
        names = (df.iloc[i,1]).split('/')
        for index in range(len(names)):
            headers.append(names[index].strip())
    headers = sorted(set(headers))  # de-duplicate; sort so the column order is deterministic
    headers.insert(0,"id")
    return headers

# create body
def createBody(df,headers,ids):
    # set default values 0
    data = [[0 for i in range(len(headers))] for j in range(len(df.index))]

    for i in df.index: 
        data[i][0] = ids[i]
        names = (df.iloc[i,1]).split('/')
        for ind in range(len(names)):
            name = names[ind].strip()
            inde = headers.index(name)
            data[i][inde] = 1
    return data

ids = getDfIds(df)
headers = createHeaders(df,ids)
body = createBody(df,headers,ids)

# create new data set
df = pd.DataFrame(body, columns = headers)
print(df)

Let's use pandas with .str.get_dummies and its sep parameter:

Read the data into a DataFrame from the clipboard:

df = pd.read_clipboard(sep=r'\s+,\s+')
df

Input DataFrame:

   id            name
0   1  Jone/Elvis/Tom
1   2     Elvis/Tonny

Set the index and use the string accessor with get_dummies:

df1 = df.set_index('id')    
df1['name'].str.get_dummies(sep='/').reset_index()

Output:

   id  Elvis  Jone  Tom  Tonny
0   1      1     1    1      0
1   2      1     0    0      1
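
The clipboard step above depends on what was last copied, so here is a minimal sketch of the same get_dummies approach reading CSV text instead. The in-memory string `raw` and the `\s*,\s*` separator are assumptions made for illustration; in practice you would likely pass the path to data.csv in place of the `io.StringIO` object.

```python
import io
import pandas as pd

# Sample text copied from the question (assumption: the real file looks like this).
raw = """id ,   name
1  ,  Jone/Elvis/Tom
2  ,  Elvis/Tonny
"""

# A regex separator swallows the spaces around each comma; regex separators
# require the python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")

# Split each name string on "/" and build one 0/1 column per distinct name.
result = df.set_index("id")["name"].str.get_dummies(sep="/").reset_index()
print(result)
```

The indicator columns come out in alphabetical order (Elvis, Jone, Tom, Tonny), not in the order shown in the question; reorder them afterwards if that matters.
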
import pandas as pd

data = pd.read_csv("./data.csv", skipinitialspace=True)  # skip the spaces after each comma so the column is named "name"
data["name"]= data["name"].str.split("/")

# Build one 0/1 column per name. The names are hardcoded for this dataset,
# and "in" is a substring test, so "Tom" would also match e.g. "Tommy".
for person in ["Jone", "Elvis", "Tom", "Tonny"]:
    data[person] = [int(any(person in s for s in parts)) for parts in data["name"]]
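
The approach above needs the list of names to be known in advance. As a rough sketch of one way to avoid that, the names can be discovered from the data with split, explode, and pd.crosstab. This is a different technique from the loop above, not a rewrite of it, and the in-memory DataFrame is a hypothetical stand-in for the contents of data.csv.

```python
import pandas as pd

# Hypothetical in-memory copy of the question's data.
data = pd.DataFrame({"id": [1, 2], "name": ["Jone/Elvis/Tom", "Elvis/Tonny"]})

# One row per (id, name) pair: split on "/" and explode the resulting lists.
pairs = data.assign(name=data["name"].str.split("/")).explode("name")
pairs = pairs.reset_index(drop=True)
pairs["name"] = pairs["name"].str.strip()  # tolerate stray spaces around names

# Count pairs per id, then clip to 0/1 in case a name repeats within a row.
flags = pd.crosstab(pairs["id"], pairs["name"]).clip(upper=1)
flags.columns.name = None  # drop the "name" label on the column axis
flags = flags.reset_index()
print(flags)
```

Because the names are compared exactly after splitting, a name such as "Tom" does not match a longer name that merely contains it.
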

