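How do I get specific elements from a list: the lines that start with a 4-digit number (with a leading 0), and the lines that follow each of them?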

Posted 2024-04-28 09:58:35


I have a text from which I need to extract some information. It is stored as a list; part of it is shown below:

 lines=[   '0021   Literacy and numeracy \n',
 '\n',
 'Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
 'to  teach  fundamental  skills  in  reading,  writing  and  arithmetic.  The  typical  age  range  of \n',
 'participants  can  be  used  to  distinguish  between  detailed  field  0011  ‘Basic  programmes  and \n',
 'qualifications’ and this detailed field.  \n',
 '\n',
 'Programmes and qualifications with the following main content are classified here: \n',
 '\n',
'0031   Personal skills \n',
 '\n',
 'Personal  skills  are  defined  by  reference  to  the  effects  on  the  individual’s  capacity  (mental, \n',
 'social  etc.).  This  detailed  field  covers  personal  skills  programmes  not  included  in  0011  ‘Basic \n',
 'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
 'transferable skills.  \n',
 '\n',
 'Programmes and qualifications with the following main content are classified here: \n',
 '\n']

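The output should be two lists:

1. I want to collect every line that starts with a 4-digit number (the first digit is always 0), and 2. the paragraph that follows each of those lines. Note that a paragraph can span several items of the list, because each element of the list is one line of my text. A paragraph ends when I reach an element that is just \n, so the second result is a nested list. This is the output I want: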

G = ['0021   Literacy and numeracy \n', '0031   Personal skills \n']

G1=[['Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
     'to  teach  fundamental  skills  in  reading,  writing  and  arithmetic.  The  typical  age  range  of \n',
     'participants  can  be  used  to  distinguish  between  detailed  field  0011  ‘Basic  programmes  and \n',
     'qualifications’ and this detailed field.  \n'], ['Personal  skills  are  defined  by  reference  to  the  effects  on  the  individual’s  capacity  (mental, \n',
 'social  etc.).  This  detailed  field  covers  personal  skills  programmes  not  included  in  0011  ‘Basic \n',
 'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
 'transferable skills.  \n']]

This is what I tried, but I really don't understand why it doesn't work:


    definition=[]
    ocupation=[]
    for l,i in enumerate(lines):
       if re.findall(r'd\d\d\d',i)!='':
            ocupation.append(i)
            for j in range(10):
                def1=[]
                while lines[l+2+j]!='\n':
                    def1.append(lines[l+j])
            definition.append(def1)

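The line if re.findall(r'd\d\d\d',i)!='': does not behave as I expect. I want to match 4 digits at the start of the line, beginning with 0, but this condition does not do that.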


3 answers
lines = [
    '0021   Literacy and numeracy \n',
    '\n',
    'Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
    'to  teach  fundamental  skills  in  reading,  writing  and  arithmetic.  The  typical  age  range  of \n',
    'participants  can  be  used  to  distinguish  between  detailed  field  0011  ‘Basic  programmes  and \n',
    'qualifications’ and this detailed field.  \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n',
    '0031   Personal skills \n',
    '\n',
    'Personal  skills  are  defined  by  reference  to  the  effects  on  the  individual’s  capacity  (mental, \n',
    'social  etc.).  This  detailed  field  covers  personal  skills  programmes  not  included  in  0011  ‘Basic \n',
    'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
    'transferable skills.  \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n']
definition = []
ocupation = []
for l, i in enumerate(lines):
    if i[:4].isnumeric():
        ocupation.append(i)
        def1 = []
        for j in lines[l+2:]:
            if j == '\n':
                break
            def1.append(j)
        definition.append(def1)

print(ocupation, definition, sep='\n')

Output:

['0021   Literacy and numeracy \n', '0031   Personal skills \n']
[['Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n', 'to  teach  fundamental  skills  in  reading,  writing  and  arithmetic.  The  typical  age  range  of \n', 'participants  can  be  used  to  distinguish  between  detailed  field  0011  ‘Basic  programmes  and \n', 'qualifications’ and this detailed field.  \n'], ['Personal  skills  are  defined  by  reference  to  the  effects  on  the  individual’s  capacity  (mental, \n', 'social  etc.).  This  detailed  field  covers  personal  skills  programmes  not  included  in  0011  ‘Basic \n', 'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n', 'transferable skills.  \n']]
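The loop above relies on the paragraph starting exactly two lines after the heading (`l+2`). As a hedged variation, here is a minimal sketch that skips however many blank lines follow a heading and also requires the leading 0. The function name `extract_sections` and the shortened `sample` data are illustrative assumptions, not part of the original answer, and the sketch assumes two headings never follow each other without a blank line in between.

```python
import re

def extract_sections(lines):
    """Return (headings, paragraphs) for lines starting with 0 plus three digits."""
    headings, paragraphs = [], []
    i = 0
    while i < len(lines):
        line = lines[i]
        i += 1
        # re.match only matches at the start of the string.
        if not re.match(r'0\d{3}', line):
            continue
        headings.append(line)
        # Skip however many blank lines separate the heading from its paragraph.
        while i < len(lines) and lines[i] == '\n':
            i += 1
        # Collect lines until the next blank line or the end of the list.
        paragraph = []
        while i < len(lines) and lines[i] != '\n':
            paragraph.append(lines[i])
            i += 1
        paragraphs.append(paragraph)
    return headings, paragraphs

# Shortened, made-up sample in the same shape as the question's data.
sample = [
    '0021   Literacy and numeracy \n',
    '\n',
    'Literacy and numeracy are programmes \n',
    'for adults. \n',
    '\n',
    'Programmes and qualifications: \n',
    '\n',
    '0031   Personal skills \n',
    'Personal skills are defined here. \n',
]
headings, paragraphs = extract_sections(sample)
print(headings)
print(paragraphs)
```

Because the index only moves forward, each line is visited once, and a heading at the very end of the list simply gets an empty paragraph instead of raising an error.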

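I ran your code and tried to change as little as possible. I am not offering a new solution, because I would like you to understand what went wrong in your own code; hopefully that helps you learn.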

lines=[   '0021   Literacy and numeracy \n',
 '\n',
 'Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
 'to  teach  fundamental  skills  in  reading,  writing  and  arithmetic.  The  typical  age  range  of \n',
 'participants  can  be  used  to  distinguish  between  detailed  field  0011  ‘Basic  programmes  and \n',
 'qualifications’ and this detailed field.  \n',
 '\n',
 'Programmes and qualifications with the following main content are classified here: \n',
 '\n',
'0031   Personal skills \n',
 '\n',
 'Personal  skills  are  defined  by  reference  to  the  effects  on  the  individual’s  capacity  (mental, \n',
 'social  etc.).  This  detailed  field  covers  personal  skills  programmes  not  included  in  0011  ‘Basic \n',
 'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
 'transferable skills.  \n',
 '\n',
 'Programmes and qualifications with the following main content are classified here: \n',
 '\n']

import re

definition = []
ocupation = []
for l, i in enumerate(lines):
    # re.findall always returns a list (empty when nothing matches), so
    # comparing it with '' is always True. re.match is simpler here, and the
    # pattern could also be written as r'\d{4}'.
    if re.match(r'\d\d\d\d', i) is not None:
        ocupation.append(i)

        # Initialise the paragraph list once, before the inner loop.
        def1 = []
        for j in range(2, 10):
            # Stop at a blank line that is followed, two lines later, by
            # another blank line; in this data that marks the paragraph end.
            if lines[l+j] == '\n' and lines[l+j+2] == '\n':
                break
            # Skip any other blank line.
            elif lines[l+j] == '\n':
                continue
            else:
                def1.append(lines[l+j])
        definition.append(def1)
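To make the point in the comments concrete, here is a small sketch of why the original condition never filtered anything, and how `re.match` with a leading `0` behaves instead. The variable names `header`, `body`, `found`, and `always_true` are made up for this illustration.

```python
import re

header = '0021   Literacy and numeracy \n'
body = 'programmes and qualifications’ or 0021 ‘Literacy and numeracy’ \n'

# The original pattern starts with a literal 'd' (the backslash is missing),
# so it looks for the letter d followed by three digits.
found = re.findall(r'd\d\d\d', header)
print(found)

# re.findall returns a list, and a list is never equal to the string '',
# so this comparison is True even when nothing matched.
always_true = found != ''
print(always_true)

# re.match only matches at the start of the string, so a code that appears
# in the middle of a paragraph line is ignored.
print(bool(re.match(r'0\d{3}', header)))
print(bool(re.match(r'0\d{3}', body)))
```

With `findall`, a test such as `if re.findall(pattern, i):` would at least react to an empty list, but it would still accept a 4-digit code anywhere in the line, which is why `re.match` is the closer fit for "starts with".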

I simplified and debugged your code as much as I could and decided to post the result as a new answer. Try the following:

lines = [
    '0021   Literacy and numeracy \n',
    '\n',
    'Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
    'to  teach  fundamental  skills  in  reading,  writing  and  arithmetic.  The  typical  age  range  of \n',
    'participants  can  be  used  to  distinguish  between  detailed  field  0011  ‘Basic  programmes  and \n',
    'qualifications’ and this detailed field.  \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n',
    '0031   Personal skills \n',
    '\n',
    'Personal  skills  are  defined  by  reference  to  the  effects  on  the  individual’s  capacity  (mental, \n',
    'social  etc.).  This  detailed  field  covers  personal  skills  programmes  not  included  in  0011  ‘Basic \n',
    'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
    'transferable skills.  \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n']
definition = []
ocupation = []
for l, i in enumerate(lines):
    if i[:4].isnumeric():
        ocupation += [i]
        definition += [lines[l+2:lines.index('\n', l+2)] if '\n' in lines[l+2:] else lines[l+2:]]
print(ocupation, definition, sep='\n')
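The one-liner above packs two steps into a single expression: find the next `'\n'` entry at or after position `l+2` with `list.index(value, start)`, then slice up to it. A minimal sketch of the same idea as a separate helper, assuming a hypothetical name `paragraph_after` and made-up sample data:

```python
def paragraph_after(lines, start):
    """Return the lines from `start` up to, but not including, the next blank-line entry."""
    try:
        # list.index accepts a start position and raises ValueError if not found.
        end = lines.index('\n', start)
    except ValueError:
        # No blank line left: the paragraph runs to the end of the list.
        end = len(lines)
    return lines[start:end]

sample = ['0021   Title \n', '\n', 'first \n', 'second \n', '\n', 'other \n']
print(paragraph_after(sample, 2))
print(paragraph_after(sample, 5))
```

Catching `ValueError` avoids the extra `'\n' in lines[l+2:]` membership test, which builds a copy of the remaining list and scans it a second time; for short inputs like the one in the question the difference is negligible.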
