How do I iterate over the strings in the cells of a DataFrame?

Posted 2024-06-16 09:28:56


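I have a DataFrame with text in every cell. I want to iterate over the DataFrame and over the individual characters of each cell, filling a list with 0 for whitespace and 1 for any other character. I tried itertuples, iterrows and iteritems, but I cannot get at the individual characters of each string.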

crispr = pd.DataFrame({'Name': ['Bob', 'Jane', 'Alice'], 
                       'Issue': ['Handling data', 'Could not read sample', 'No clue'],
                       'Comment': ['Need to revise data', 'sample preparation', 'could not find out where problem occurs']})

What I tried:

dflist = []
countchar= 0
for i,j in crispr.iteritems():
    for x in range(len(j)):
        test = j[countchar].isspace()
        countchar+=1
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)

I also tried to find out whether it works with itertuples or iterrows():

for i in crispr.itertuples():
    for j in i:
        for b in j:
            print(b)

This raises the following error:

 TypeError: 'int' object is not iterable  

The expected output is a list in which 1 stands for a character and 0 for whitespace:

dflist = [[[1,1,1], [1,1,1,1], [1,1,1,1,1]], [[1,1,1,1,1,1,1,1,0,1,1,1,1], ...]]

1 answer
User
#1 · Posted 2024-06-16 09:28:56

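The code you posted (before your last edit) had errors: it referenced several undefined names, which would raise errors other than the one you posted. I changed your code to: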

dflist = []                    # added this
for i,j in crispr.items():     # iteritems() was removed in pandas 2.0; items() is equivalent
    for x in range(len(j)):
        test = j[x].isspace()  # changed countchar to x
        # countchar+=1         # removed this
        if test == True:
            dflist.append(0)
        else:
            dflist.append(1)

for i in crispr.itertuples():
    for j in i:
        for b in j:  # this produces your error
            print(b)

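If you inspect the first value that j takes, you will see that it is 0: itertuples() puts the row index at the front of each tuple. An integer cannot be iterated, hence the error.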
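To illustrate the point about the index, here is a minimal sketch (not part of the original answer) that first shows the leading index value and then passes index=False to itertuples() so that only the cell strings are iterated. The names first_row and masks_by_row are made up for this example, and the result is grouped by row rather than by column, so its shape differs from the expected output in the question.

```python
import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})

# By default the first field of each tuple is the row index (an int here),
# which is what triggers "TypeError: 'int' object is not iterable".
first_row = next(crispr.itertuples())
print(type(first_row[0]), first_row[0])

# With index=False each tuple holds only the cell values (all strings here),
# so every cell can be iterated character by character.
masks_by_row = []
for row in crispr.itertuples(index=False):
    row_masks = []
    for cell in row:
        row_masks.append([0 if ch.isspace() else 1 for ch in cell])
    masks_by_row.append(row_masks)

print(masks_by_row[0])
```

This only works as written because every cell is a string; a DataFrame with numeric columns would need a type check or a conversion with str() first.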

Solution:

import pandas as pd

crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation', 
                'could not find out where problem occurs']})

print(crispr)
outer_list = []
for i,j in crispr.items():  # iteritems() was removed in pandas 2.0; items() is equivalent
    dflist = []
    for word in j:
        wordlist = [] 
        for char in word:
            if char.isspace():
                wordlist.append(0)
            else:
                wordlist.append(1)
        dflist.append(wordlist)
    outer_list.append(dflist)

print(outer_list)

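Output (comments added for clarity). The columns appear in alphabetical order here; recent pandas versions keep the order in which the dict keys were given (Name, Issue, Comment), so the three groups may come out in the reverse order: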

                                   Comment                  Issue   Name
0                      Need to revise data          Handling data    Bob
1                       sample preparation  Could not read sample   Jane
2  could not find out where problem occurs                No clue  Alice

# Comment
[[[1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 
   1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]], 
 # Issue
 [[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1], 
  [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1], 
  [1, 1, 0, 1, 1, 1, 1]],
 # Name 
 [[1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1, 1]]]

This should do what you want.
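As a more compact variant of the same idea, the nested loops can be written as comprehensions. This is only a sketch: char_mask and by_column are hypothetical names introduced here, not part of the answer above, and DataFrame.items() is used in place of iteritems().

```python
import pandas as pd


def char_mask(text):
    """Return 0 for each whitespace character in text and 1 for any other."""
    return [0 if ch.isspace() else 1 for ch in text]


crispr = pd.DataFrame({
    'Name': ['Bob', 'Jane', 'Alice'],
    'Issue': ['Handling data', 'Could not read sample', 'No clue'],
    'Comment': ['Need to revise data', 'sample preparation',
                'could not find out where problem occurs']})

# Same nested-list shape as outer_list: one list per column,
# one mask per cell, one 0/1 entry per character.
outer_list = [[char_mask(cell) for cell in column]
              for _, column in crispr.items()]

# Keyed by column name, so the result does not depend on column order.
by_column = {name: [char_mask(cell) for cell in column]
             for name, column in crispr.items()}

print(by_column['Name'])
```

Keying the result by column name avoids having to remember which position in the outer list belongs to which column.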
