Iterating over a list of lists and creating a DataFrame

1 vote
3 answers
66 views
Asked 2025-04-14 16:41

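I have a list that contains sublists. I want to iterate over all of the sublists, extract the data in each one, store it in a separate list per field, and finally build a DataFrame from those lists. But when I try this, the data gets mixed up...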


fulllist = [
[
{'Variable': 'First_Name',
 'Answer': 'Anne'},
{'Variable': 'Middle_Name',
 'Answer': 'Wanjohi'},
{'Variable': 'Age',
 'Answer': '50'},
{'Variable': 'Country',
 'Answer': 'Uganda'}],

[
{'Variable': 'First_Name',
 'Answer': 'John'},
{'Variable': 'Middle_Name',
 'Answer': 'Wagwara'},
{'Variable': 'Country',
 'Answer': 'Kenya'}
],

[
{'Variable': 'First_Name',
 'Answer': 'Jeff'},
{'Variable': 'Middle_Name',
 'Answer': 'Simboyi'},
{'Variable': 'Age',
 'Answer': '20'},
{'Variable': 'Country',
 'Answer': 'UK'}],

 [
{'Variable': 'First_Name',
 'Answer': 'Ken'},
{'Variable': 'Middle_Name',
 'Answer': 'Kumbua'},
{'Variable': 'Country',
 'Answer': 'Tanzania'}
]

]

First_Name = [] 
Middle_Name = []
Age = []
Country = []


for i in range(len(fulllist)):
    try:
        First_Name.append(fulllist[i][0]['Answer'])
        Middle_Name.append(fulllist[i][1]['Answer'])
        Age.append(fulllist[i][2]['Answer'])
        Country.append(fulllist[i][3]['Answer'])
    except IndexError:
        print(i)

print(Age)
print(Country)

3 Answers

1
import pandas as pd


fulllist = [
[
{'Variable': 'First_Name',
 'Answer': 'Anne'},
{'Variable': 'Middle_Name',
 'Answer': 'Wanjohi'},
{'Variable': 'Age',
 'Answer': '50'},
{'Variable': 'Country',
 'Answer': 'Uganda'}],

[
{'Variable': 'First_Name',
 'Answer': 'John'},
{'Variable': 'Middle_Name',
 'Answer': 'Wagwara'},
{'Variable': 'Country',
 'Answer': 'Kenya'}
],

[
{'Variable': 'First_Name',
 'Answer': 'Jeff'},
{'Variable': 'Middle_Name',
 'Answer': 'Simboyi'},
{'Variable': 'Age',
 'Answer': '20'},
{'Variable': 'Country',
 'Answer': 'UK'}],

 [
{'Variable': 'First_Name',
 'Answer': 'Ken'},
{'Variable': 'Middle_Name',
 'Answer': 'Kumbua'},
{'Variable': 'Country',
 'Answer': 'Tanzania'}
]

]

lst = []

# Create a list of dictionaries, one per person, mapping each variable to its answer
for x in fulllist:
    sub = {}
    for y in x:
        key = y['Variable']
        value = y['Answer']
        sub[key] = value
    lst.append(sub)

# 'Age' is missing for some people; fill it in as 'Not available'
for x in lst:
    if 'Age' not in x:
        x['Age'] = 'Not available'

# Build modified_dct: keys are First_Name, values are lists of the remaining fields
modified_dct = {}
for x in lst:
    modified_dct[x['First_Name']] = [x['Middle_Name'], x['Age'], x['Country']]

# Convert to dataframe
df = pd.DataFrame(modified_dct, index = ['Middle_Name', 'Age', 'Country'])

print(df)


     
'''Output:
                Anne           John     Jeff            Ken
Middle_Name  Wanjohi        Wagwara  Simboyi         Kumbua
Age               50  Not available       20  Not available
Country       Uganda          Kenya       UK       Tanzania
'''
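
The output above has one column per person. If one row per person is preferred, a minimal sketch is to build the frame with `pd.DataFrame.from_dict` and `orient='index'`. The dictionary below is typed in by hand to mirror `modified_dct` so that the snippet runs on its own.

```python
import pandas as pd

# Same shape as modified_dct above: First_Name -> [Middle_Name, Age, Country]
modified_dct = {
    'Anne': ['Wanjohi', '50', 'Uganda'],
    'John': ['Wagwara', 'Not available', 'Kenya'],
    'Jeff': ['Simboyi', '20', 'UK'],
    'Ken': ['Kumbua', 'Not available', 'Tanzania'],
}

# orient='index' turns each dictionary key into a row label
df = pd.DataFrame.from_dict(
    modified_dct, orient='index', columns=['Middle_Name', 'Age', 'Country']
)

# Move the row labels into an ordinary 'First_Name' column
df = df.rename_axis('First_Name').reset_index()
print(df)
```

Note that keying the dictionary by `First_Name` means two people with the same first name would overwrite each other, so this layout only suits data where first names are unique.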


1

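The problem is that your code assumes every sublist in fulllist contains the same entries at the same positions. Looking at your data, not every sublist has an 'Age' entry, so in the shorter sublists index 2 holds 'Country' and index 3 does not exist. To handle this, you need to account for keys that may be missing.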
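
To see how the values end up in the wrong lists, here is a small reproduction using only the first two records from the question. The variable names are illustrative, and the `except` branch is reduced to `pass` to keep the focus on the appends.

```python
# Two records from the question: the second one has no 'Age' entry.
records = [
    [{'Variable': 'First_Name', 'Answer': 'Anne'},
     {'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
     {'Variable': 'Age', 'Answer': '50'},
     {'Variable': 'Country', 'Answer': 'Uganda'}],
    [{'Variable': 'First_Name', 'Answer': 'John'},
     {'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
     {'Variable': 'Country', 'Answer': 'Kenya'}],
]

first_name, middle_name, age, country = [], [], [], []

for record in records:
    try:
        first_name.append(record[0]['Answer'])
        middle_name.append(record[1]['Answer'])
        age.append(record[2]['Answer'])      # for John this is the Country entry
        country.append(record[3]['Answer'])  # raises IndexError for John
    except IndexError:
        pass  # the three appends above have already happened

print(age)      # ['50', 'Kenya']
print(country)  # ['Uganda']
```

John's country lands in the age list, and the country list ends up one item shorter than the others, which is why the columns no longer line up.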

Here is a modified version of the code that handles missing keys:

import pandas as pd

fulllist = [
    [
        {'Variable': 'First_Name', 'Answer': 'Anne'},
        {'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
        {'Variable': 'Age', 'Answer': '50'},
        {'Variable': 'Country', 'Answer': 'Uganda'}
    ],
    [
        {'Variable': 'First_Name', 'Answer': 'John'},
        {'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
        {'Variable': 'Country', 'Answer': 'Kenya'}
    ],
    [
        {'Variable': 'First_Name', 'Answer': 'Jeff'},
        {'Variable': 'Middle_Name', 'Answer': 'Simboyi'},
        {'Variable': 'Age', 'Answer': '20'},
        {'Variable': 'Country', 'Answer': 'UK'}
    ],
    [
        {'Variable': 'First_Name', 'Answer': 'Ken'},
        {'Variable': 'Middle_Name', 'Answer': 'Kumbua'},
        {'Variable': 'Country', 'Answer': 'Tanzania'}
    ]
]

First_Name = [] 
Middle_Name = []
Age = []
Country = []

for sublist in fulllist:
    temp_dict = {d['Variable']: d['Answer'] for d in sublist}
    First_Name.append(temp_dict.get('First_Name', ''))
    Middle_Name.append(temp_dict.get('Middle_Name', ''))
    Age.append(temp_dict.get('Age', ''))
    Country.append(temp_dict.get('Country', ''))

df = pd.DataFrame({'First_Name': First_Name, 'Middle_Name': Middle_Name, 'Age': Age, 'Country': Country})
print(df)
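
As a possible shortcut, the four intermediate lists can be skipped: `pd.DataFrame` accepts a list of dictionaries directly and fills keys that are absent from a row with NaN. The following is a sketch on the same data, with the `rows` and `columns` names chosen here for illustration.

```python
import pandas as pd

fulllist = [
    [{'Variable': 'First_Name', 'Answer': 'Anne'},
     {'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
     {'Variable': 'Age', 'Answer': '50'},
     {'Variable': 'Country', 'Answer': 'Uganda'}],
    [{'Variable': 'First_Name', 'Answer': 'John'},
     {'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
     {'Variable': 'Country', 'Answer': 'Kenya'}],
    [{'Variable': 'First_Name', 'Answer': 'Jeff'},
     {'Variable': 'Middle_Name', 'Answer': 'Simboyi'},
     {'Variable': 'Age', 'Answer': '20'},
     {'Variable': 'Country', 'Answer': 'UK'}],
    [{'Variable': 'First_Name', 'Answer': 'Ken'},
     {'Variable': 'Middle_Name', 'Answer': 'Kumbua'},
     {'Variable': 'Country', 'Answer': 'Tanzania'}],
]

# One flat dictionary per person, keyed by the 'Variable' field
rows = [{d['Variable']: d['Answer'] for d in sublist} for sublist in fulllist]

# Passing columns fixes the column order; missing keys become NaN
columns = ['First_Name', 'Middle_Name', 'Age', 'Country']
df = pd.DataFrame(rows, columns=columns)
print(df)
```

The difference from the version above is that missing ages show up as NaN instead of an empty string, which pandas methods such as `isna()` and `fillna()` recognize as missing.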
2

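Within each sublist of fulllist the elements appear in a fixed order. However, some sublists, such as the second one, have no 'Age' variable, which shifts the later positions and raises an IndexError. A more robust approach is to iterate over each sublist and pick out the values by their 'Variable' key rather than by position.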

import pandas as pd

# Assumes fulllist is defined as in the question.
# Initialize a dictionary holding one empty list per variable
data = {'First_Name': [], 'Middle_Name': [], 'Age': [], 'Country': []}

# Iterate over each sublist
for sublist in fulllist:
    # Initialize variables to None
    first_name = middle_name = age = country = None
    # Iterate over each dictionary in the sublist
    for item in sublist:
        if item['Variable'] == 'First_Name':
            first_name = item['Answer']
        elif item['Variable'] == 'Middle_Name':
            middle_name = item['Answer']
        elif item['Variable'] == 'Age':
            age = item['Answer']
        elif item['Variable'] == 'Country':
            country = item['Answer']
    
    # Append values to respective lists
    data['First_Name'].append(first_name)
    data['Middle_Name'].append(middle_name)
    data['Age'].append(age)
    data['Country'].append(country)

# Create DataFrame
df = pd.DataFrame(data)
print(df)

Output:

  First_Name Middle_Name   Age   Country
0       Anne     Wanjohi    50    Uganda
1       John     Wagwara  None     Kenya
2       Jeff     Simboyi    20        UK
3        Ken      Kumbua  None  Tanzania
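
In the output above, Age still holds strings and None. If numeric ages are needed later, one option is `pd.to_numeric` followed by a cast to the nullable `Int64` dtype. This is a sketch only; the small frame below is rebuilt by hand from the output so that the snippet is self-contained.

```python
import pandas as pd

# Hand-built copy of the result above
df = pd.DataFrame({
    'First_Name': ['Anne', 'John', 'Jeff', 'Ken'],
    'Middle_Name': ['Wanjohi', 'Wagwara', 'Simboyi', 'Kumbua'],
    'Age': ['50', None, '20', None],
    'Country': ['Uganda', 'Kenya', 'UK', 'Tanzania'],
})

# errors='coerce' turns anything unparseable into NaN instead of raising;
# the Int64 cast keeps whole numbers while still allowing missing values
df['Age'] = pd.to_numeric(df['Age'], errors='coerce').astype('Int64')
print(df.dtypes)
```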
