Iterate over lists within a list and then create a DataFrame
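I have a list that contains sublists. I want to loop over all the sublists, pull out the data inside them, store each field in its own list, and finally build a DataFrame from those lists. But when I try this, the data gets mixed up…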
fulllist = [
[
{'Variable': 'First_Name',
'Answer': 'Anne'},
{'Variable': 'Middle_Name',
'Answer': 'Wanjohi'},
{'Variable': 'Age',
'Answer': '50'},
{'Variable': 'Country',
'Answer': 'Uganda'}],
[
{'Variable': 'First_Name',
'Answer': 'John'},
{'Variable': 'Middle_Name',
'Answer': 'Wagwara'},
{'Variable': 'Country',
'Answer': 'Kenya'}
],
[
{'Variable': 'First_Name',
'Answer': 'Jeff'},
{'Variable': 'Middle_Name',
'Answer': 'Simboyi'},
{'Variable': 'Age',
'Answer': '20'},
{'Variable': 'Country',
'Answer': 'UK'}],
[
{'Variable': 'First_Name',
'Answer': 'Ken'},
{'Variable': 'Middle_Name',
'Answer': 'Kumbua'},
{'Variable': 'Country',
'Answer': 'Tanzania'}
]
]
First_Name = []
Middle_Name = []
Age = []
Country = []
for i in range(len(fulllist)):
    try:
        First_Name.append(fulllist[i][0]['Answer'])
        Middle_Name.append(fulllist[i][1]['Answer'])
        Age.append(fulllist[i][2]['Answer'])
        Country.append(fulllist[i][3]['Answer'])
    except IndexError:
        print(i)
print(Age)
print(Country)
3 Answers
Answer 1 (score: 1)
import pandas as pd
fulllist = [
[
{'Variable': 'First_Name',
'Answer': 'Anne'},
{'Variable': 'Middle_Name',
'Answer': 'Wanjohi'},
{'Variable': 'Age',
'Answer': '50'},
{'Variable': 'Country',
'Answer': 'Uganda'}],
[
{'Variable': 'First_Name',
'Answer': 'John'},
{'Variable': 'Middle_Name',
'Answer': 'Wagwara'},
{'Variable': 'Country',
'Answer': 'Kenya'}
],
[
{'Variable': 'First_Name',
'Answer': 'Jeff'},
{'Variable': 'Middle_Name',
'Answer': 'Simboyi'},
{'Variable': 'Age',
'Answer': '20'},
{'Variable': 'Country',
'Answer': 'UK'}],
[
{'Variable': 'First_Name',
'Answer': 'Ken'},
{'Variable': 'Middle_Name',
'Answer': 'Kumbua'},
{'Variable': 'Country',
'Answer': 'Tanzania'}
]
]
lst = []
# Build a list of dictionaries, one per person, mapping each 'Variable' to its 'Answer'
for x in fulllist:
    sub = {}
    for y in x:
        key = y['Variable']
        value = y['Answer']
        sub[key] = value
    lst.append(sub)
# 'Age' is missing for some people, so fill in 'Not available' for them
for x in lst:
    if 'Age' not in x:
        x['Age'] = 'Not available'
# Build modified_dct: keys are First_Name, values are a list of the person's other details
modified_dct = {}
for x in lst:
    sub = []
    sub.append(x['Middle_Name'])
    sub.append(x['Age'])
    sub.append(x['Country'])
    modified_dct[x['First_Name']] = sub
# Convert to a DataFrame: each person becomes a column, each detail a row
df = pd.DataFrame(modified_dct, index=['Middle_Name', 'Age', 'Country'])
print(df)
'''Output:
                Anne           John     Jeff            Ken
Middle_Name  Wanjohi        Wagwara  Simboyi         Kumbua
Age               50  Not available       20  Not available
Country       Uganda          Kenya       UK       Tanzania
'''
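If you are happy to let pandas deal with the missing keys, a shorter route may be to pass the per-person dictionaries straight to pd.DataFrame: it turns the keys into columns and leaves NaN wherever a key is absent. A minimal sketch, assuming a reasonably recent pandas; the names rows and wide are my own, for illustration only:

```python
import pandas as pd

# Same data as in the question, written compactly
fulllist = [
    [{'Variable': 'First_Name', 'Answer': 'Anne'},
     {'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
     {'Variable': 'Age', 'Answer': '50'},
     {'Variable': 'Country', 'Answer': 'Uganda'}],
    [{'Variable': 'First_Name', 'Answer': 'John'},
     {'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
     {'Variable': 'Country', 'Answer': 'Kenya'}],
    [{'Variable': 'First_Name', 'Answer': 'Jeff'},
     {'Variable': 'Middle_Name', 'Answer': 'Simboyi'},
     {'Variable': 'Age', 'Answer': '20'},
     {'Variable': 'Country', 'Answer': 'UK'}],
    [{'Variable': 'First_Name', 'Answer': 'Ken'},
     {'Variable': 'Middle_Name', 'Answer': 'Kumbua'},
     {'Variable': 'Country', 'Answer': 'Tanzania'}],
]

# One dictionary per person, e.g. {'First_Name': 'Anne', 'Middle_Name': 'Wanjohi', ...}
rows = [{d['Variable']: d['Answer'] for d in person} for person in fulllist]

# pandas turns the keys into columns; a key missing from a row becomes NaN,
# which fillna() then replaces with the same placeholder used above
df = pd.DataFrame(rows).fillna('Not available')

# Optional: transpose so people become columns, matching the layout above
wide = df.set_index('First_Name').T
print(wide)
```

The untransposed df (one row per person) is usually the more convenient shape for further analysis; the transpose only reproduces the person-per-column view.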
Answer 2 (score: 1)
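The problem is that your code reads each sublist by position, which assumes every sublist in fulllist contains the same entries in the same order. Your data does not satisfy that: not every sublist has an 'Age' entry. For John, position 2 holds 'Kenya', which gets appended to Age, and position 3 does not exist, so the IndexError is raised after First_Name, Middle_Name and Age have been appended but before Country. That is why the lists end up misaligned. To handle this, look each value up by its 'Variable' name and allow for keys that may be missing.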
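A minimal reproduction using only the first two records from the question shows the shift, and how a key-based lookup avoids it. The names records, ages, countries and ages_by_key are illustrative only:

```python
# The first two records from the question; John's record has no 'Age' entry
records = [
    [{'Variable': 'First_Name', 'Answer': 'Anne'},
     {'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
     {'Variable': 'Age', 'Answer': '50'},
     {'Variable': 'Country', 'Answer': 'Uganda'}],
    [{'Variable': 'First_Name', 'Answer': 'John'},
     {'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
     {'Variable': 'Country', 'Answer': 'Kenya'}],
]

ages, countries = [], []
for record in records:
    try:
        # Position-based access: assumes Age is always at index 2
        ages.append(record[2]['Answer'])
        countries.append(record[3]['Answer'])
    except IndexError:
        pass  # John's record has only 3 entries, so index 3 fails

print(ages)       # ['50', 'Kenya']  <- John's country landed in the Age list
print(countries)  # ['Uganda']       <- and his Country was never appended

# Key-based access: look the value up by its 'Variable' name instead
ages_by_key = [
    {d['Variable']: d['Answer'] for d in record}.get('Age', '')
    for record in records
]
print(ages_by_key)  # ['50', '']
```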
Here is a modified version of the code that handles missing keys:
import pandas as pd
fulllist = [
[
{'Variable': 'First_Name', 'Answer': 'Anne'},
{'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
{'Variable': 'Age', 'Answer': '50'},
{'Variable': 'Country', 'Answer': 'Uganda'}
],
[
{'Variable': 'First_Name', 'Answer': 'John'},
{'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
{'Variable': 'Country', 'Answer': 'Kenya'}
],
[
{'Variable': 'First_Name', 'Answer': 'Jeff'},
{'Variable': 'Middle_Name', 'Answer': 'Simboyi'},
{'Variable': 'Age', 'Answer': '20'},
{'Variable': 'Country', 'Answer': 'UK'}
],
[
{'Variable': 'First_Name', 'Answer': 'Ken'},
{'Variable': 'Middle_Name', 'Answer': 'Kumbua'},
{'Variable': 'Country', 'Answer': 'Tanzania'}
]
]
First_Name = []
Middle_Name = []
Age = []
Country = []
for sublist in fulllist:
    # Map each 'Variable' name to its 'Answer' for this person
    temp_dict = {d['Variable']: d['Answer'] for d in sublist}
    # .get() returns '' when a variable such as 'Age' is missing
    First_Name.append(temp_dict.get('First_Name', ''))
    Middle_Name.append(temp_dict.get('Middle_Name', ''))
    Age.append(temp_dict.get('Age', ''))
    Country.append(temp_dict.get('Country', ''))
df = pd.DataFrame({'First_Name': First_Name, 'Middle_Name': Middle_Name, 'Age': Age, 'Country': Country})
print(df)
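If the set of variables may change, one possible generalisation (my own sketch, not part of the answer above) is to collect the column names from the data instead of hard-coding four lists, while keeping the same .get(..., '') default for missing values. The names records, columns and data are hypothetical:

```python
import pandas as pd

# Same data as in the question, written compactly
fulllist = [
    [{'Variable': 'First_Name', 'Answer': 'Anne'},
     {'Variable': 'Middle_Name', 'Answer': 'Wanjohi'},
     {'Variable': 'Age', 'Answer': '50'},
     {'Variable': 'Country', 'Answer': 'Uganda'}],
    [{'Variable': 'First_Name', 'Answer': 'John'},
     {'Variable': 'Middle_Name', 'Answer': 'Wagwara'},
     {'Variable': 'Country', 'Answer': 'Kenya'}],
    [{'Variable': 'First_Name', 'Answer': 'Jeff'},
     {'Variable': 'Middle_Name', 'Answer': 'Simboyi'},
     {'Variable': 'Age', 'Answer': '20'},
     {'Variable': 'Country', 'Answer': 'UK'}],
    [{'Variable': 'First_Name', 'Answer': 'Ken'},
     {'Variable': 'Middle_Name', 'Answer': 'Kumbua'},
     {'Variable': 'Country', 'Answer': 'Tanzania'}],
]

# One dictionary per person, keyed by 'Variable'
records = [{d['Variable']: d['Answer'] for d in sublist} for sublist in fulllist]

# Collect every variable name in first-seen order, so a new field needs no code change
columns = []
for rec in records:
    for name in rec:
        if name not in columns:
            columns.append(name)

# One list per column; .get() fills a missing variable with ''
data = {col: [rec.get(col, '') for rec in records] for col in columns}
df = pd.DataFrame(data)
print(df)
```

The trade-off is that column order now follows the data (first appearance) rather than a fixed list you choose; if the order matters, keep an explicit columns list instead.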
Answer 3 (score: 2)
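In every sublist of fulllist the entries appear in the same order, but some sublists, such as the second one, have no 'Age' variable. Indexing by position therefore shifts the remaining entries and eventually raises an IndexError. A more robust approach is to iterate over each sublist and extract each value dynamically according to its 'Variable' key.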
import pandas as pd

# fulllist is the list defined in the question
# Initialize a dictionary holding an empty list for each variable
data = {'First_Name': [], 'Middle_Name': [], 'Age': [], 'Country': []}
# Iterate over each sublist
for sublist in fulllist:
    # Reset the variables to None so a missing entry stays None
    first_name = middle_name = age = country = None
    # Iterate over each dictionary in the sublist
    for item in sublist:
        if item['Variable'] == 'First_Name':
            first_name = item['Answer']
        elif item['Variable'] == 'Middle_Name':
            middle_name = item['Answer']
        elif item['Variable'] == 'Age':
            age = item['Answer']
        elif item['Variable'] == 'Country':
            country = item['Answer']
    # Append the values to the respective lists
    data['First_Name'].append(first_name)
    data['Middle_Name'].append(middle_name)
    data['Age'].append(age)
    data['Country'].append(country)
# Create the DataFrame
df = pd.DataFrame(data)
print(df)
Output:
  First_Name Middle_Name   Age   Country
0       Anne     Wanjohi    50    Uganda
1       John     Wagwara  None     Kenya
2       Jeff     Simboyi    20        UK
3        Ken      Kumbua  None  Tanzania
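Because this version leaves absent fields as None, pandas should treat them as missing values, so they can be detected or filled afterwards with the usual missing-data methods. A small sketch, rebuilding the DataFrame shown above directly so it runs on its own; filling with 'Not available' simply mirrors the placeholder from the first answer:

```python
import pandas as pd

# The DataFrame printed above, rebuilt directly so this example is self-contained
df = pd.DataFrame({
    'First_Name': ['Anne', 'John', 'Jeff', 'Ken'],
    'Middle_Name': ['Wanjohi', 'Wagwara', 'Simboyi', 'Kumbua'],
    'Age': ['50', None, '20', None],
    'Country': ['Uganda', 'Kenya', 'UK', 'Tanzania'],
})

# None counts as missing, so isna() flags the people without an Age entry
print(df['Age'].isna().tolist())  # [False, True, False, True]

# Fill the gaps afterwards, here with the placeholder used in the first answer
filled = df.fillna({'Age': 'Not available'})
print(filled['Age'].tolist())  # ['50', 'Not available', '20', 'Not available']
```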