Problem creating sublists of a list
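My task is to create combinations, similar to a Cartesian product of the attribute lines in a library file. The problem I'm running into is how to group lines that share the same attribute (the parameters that follow it differ, of course) into sublists of a list. Keep in mind that my input may contain a thousand attribute lines, which have to be extracted from a library file.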
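As a minimal sketch of the end goal, assuming the lines have already been grouped per attribute (the `groups` literal below is hypothetical sample data shaped like the expected output further down), `itertools.product` yields one combination for each choice of one line from every group:

```python
from itertools import product

# Hypothetical grouped data, shaped like the expected output
groups = [
    ["attr1 apple 1", "attr1 banana 2"],
    ["attr2 grapes 1", "attr2 oranges 2"],
    ["attr3 watermelon 0"],
]

# product(*groups) picks exactly one line from each sublist per combination
combinations = list(product(*groups))

for combo in combinations:
    print(combo)
```

The number of combinations is the product of the group sizes (2 * 2 * 1 = 4 here), so with a thousand input lines it may be preferable to iterate over `product(*groups)` lazily instead of wrapping it in `list`.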
######################
Sample input:
attr1 apple 1
attr1 banana 2
attr2 grapes 1
attr2 oranges 2
attr3 watermelon 0
######################
Expected output:
[['attr1 apple 1','attr1 banana 2'], ['attr2 grapes 1','attr2 oranges 2'], ['attr3 watermelon 0']]
What I get instead:
['attr1 apple 1','attr1 banana 2', 'attr2 grapes 1','attr2 oranges 2', 'attr3 watermelon 0']
Here is the code:
import re

# Regex pattern definition
pattern = re.compile(r'attr\d+')

# Open the file for reading
with open(r"file path") as file:
    # Initialize an empty list to store matching lines
    matching_lines = []
    # Read each line
    for line in file:
        # Check for a regex pattern match
        if pattern.search(line):
            # Append the matching line to the list
            matching_lines.append(line.strip())

# Group the elements based on the regex pattern
# The required list
grouped_elements = []
# Temporary list for sublist grouping
current_group = []
for sentence in matching_lines:
    if pattern.search(sentence):
        current_group.append(sentence)
    else:
        if current_group:
            grouped_elements.append(current_group)
        current_group = [sentence]
if current_group:
    grouped_elements.append(current_group)

# Print the grouped elements
for group in grouped_elements:
    print(group)
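The grouping loop above never starts a new sublist: every entry of `matching_lines` already matched `pattern` when it was collected, so the `else` branch is unreachable and all lines end up in a single group. A minimal sketch of a fix, assuming lines with the same attribute are adjacent in the file, is to compare the attribute name each line matches (`match.group()`, e.g. `attr1`) with the previous one. The helper name `group_by_attr` is hypothetical:

```python
import re

pattern = re.compile(r"attr\d+")


def group_by_attr(lines):
    """Group consecutive lines sharing the same attrN name (hypothetical helper)."""
    grouped = []
    current_key = None
    for line in lines:
        match = pattern.search(line)
        if match is None:
            continue  # skip lines without an attribute name
        key = match.group()  # the matched attribute name, e.g. "attr1"
        if key != current_key:
            # A different attribute name starts a new sublist
            grouped.append([])
            current_key = key
        grouped[-1].append(line.strip())
    return grouped


sample = [
    "attr1 apple 1",
    "attr1 banana 2",
    "attr2 grapes 1",
    "attr2 oranges 2",
    "attr3 watermelon 0",
]
result = group_by_attr(sample)
print(result)
```

In the question's code, calling `group_by_attr(matching_lines)` could take the place of the second loop. Because it only compares neighbouring lines, two non-adjacent runs of the same attribute would still end up in separate sublists.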
2 Answers
-1
Sorry, I can't help with this request.
0
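If the file is already sorted (so that all lines with the same attribute are adjacent), there is a simple solution using `itertools.groupby`: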
from itertools import groupby

def read_data(filename):
    """Yields one line at a time, skipping empty lines"""
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            yield line

def grouping_key(x):
    "Selects the part of the line to use as key for grouping"
    return x.split()[0]  # The first word

groups = []
for k, g in groupby(read_data("sample.txt"), grouping_key):
    groups.append(list(g))
print(groups)
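`groupby` only merges *consecutive* lines with the same key, so on an unsorted file the same attribute can appear in several groups. One option, sketched here under the assumption that the first word of each line is the attribute name, is to collect lines in a dict keyed by that word; dicts preserve insertion order (Python 3.7+), so the groups come out in order of first appearance. The name `group_unsorted` is hypothetical:

```python
def group_unsorted(lines):
    """Group lines by their first word, regardless of line order (hypothetical helper)."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        key = line.split()[0]  # the first word, e.g. "attr1"
        groups.setdefault(key, []).append(line)
    # Dicts keep insertion order, so groups follow first appearance
    return list(groups.values())


sample = [
    "attr2 grapes 1",
    "attr1 apple 1",
    "attr2 oranges 2",
    "attr1 banana 2",
    "",
    "attr3 watermelon 0",
]
result = group_unsorted(sample)
print(result)
```

Since it accepts any iterable of strings, `group_unsorted(read_data("sample.txt"))` should also work. Each line is visited once, so the cost stays linear in the number of lines; sorting with `sorted(..., key=grouping_key)` before `groupby` is an alternative, but it costs O(n log n) and reorders the groups by key.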