How do I get certain specific elements of a list? The lines that start with a 4-digit number with a leading 0, and some of the lines after them?



I have a text that I need to extract some information from. It is a list; part of it is shown below:
<pre><code>lines = [
    '0021 Literacy and numeracy \n',
    '\n',
    'Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
    'to teach fundamental skills in reading, writing and arithmetic. The typical age range of \n',
    'participants can be used to distinguish between detailed field 0011 ‘Basic programmes and \n',
    'qualifications’ and this detailed field. \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n',
    '0031 Personal skills \n',
    '\n',
    'Personal skills are defined by reference to the effects on the individual’s capacity (mental, \n',
    'social etc.). This detailed field covers personal skills programmes not included in 0011 ‘Basic \n',
    'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
    'transferable skills. \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n']
</code></pre>
The output should be two lists: (1) all the lines that start with a 4-digit number (the first digit is always 0), and (2) the paragraph that follows each of them. Note that a paragraph can be spread over several items of the list. Each element of the list is one line of my text, so a paragraph ends when I reach a <code>\n</code> element (which makes the second list a nested list). This is the output I want:
<pre><code>G = ['0021 Literacy and numeracy \n', '0031 Personal skills \n']
G1 = [['Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
       'to teach fundamental skills in reading, writing and arithmetic. The typical age range of \n',
       'participants can be used to distinguish between detailed field 0011 ‘Basic programmes and \n',
       'qualifications’ and this detailed field. \n'],
      ['Personal skills are defined by reference to the effects on the individual’s capacity (mental, \n',
       'social etc.). This detailed field covers personal skills programmes not included in 0011 ‘Basic \n',
       'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
       'transferable skills. \n', '\n',]]
</code></pre>
This is what I tried, but I really don't know why it doesn't work:
<pre><code>definition=[]
ocupation=[]
for l,i in enumerate(lines):
    if re.findall(r'd\d\d\d',i)!='':
        ocupation.append(i)
        for j in range(10):
            def1=[]
            while lines[l+2+j]!='\n':
                def1.append(lines[l+j])
            definition.append(def1)
</code></pre>
The line <code>if re.findall(r'd\d\d\d',i)!='':</code> does not work well. I want the 4 digits to be at the start of the line and to begin with 0, but this does not do that.
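To see what that condition actually evaluates to, here is a minimal sketch using one header line and one shortened body line in the style of the list above. The variable names (`header`, `body`, `always_true` and so on) are made up for illustration, and the anchored pattern `0\d{3}` is only one possible way to say "four digits at the start, the first being 0".

```python
import re

header = '0021 Literacy and numeracy \n'
body = 'to teach fundamental skills in reading, writing and arithmetic. \n'

# The pattern from the question: a literal letter "d" followed by three digits.
# re.findall returns a list (empty when nothing matches), never a string.
found_header = re.findall(r'd\d\d\d', header)
found_body = re.findall(r'd\d\d\d', body)

# A list never compares equal to '', so this test is True for every line.
always_true = (found_header != '') and (found_body != '')

# Assumed alternative: re.match only matches at the start of the string,
# and the pattern requires a leading 0 followed by three more digits.
header_starts_with_code = re.match(r'0\d{3}', header) is not None
body_starts_with_code = re.match(r'0\d{3}', body) is not None
```

With this check the header line is accepted and the body line is rejected, whereas the `findall(...) != ''` test accepts both.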


1 Answer

Anonymous, 1 day ago

I ran your code and tried my best to keep the changes to a minimum. I am not providing a new solution, because I would like you to understand the problems in your own code better; hopefully that also helps you learn.
<pre class="lang-py prettyprint-override"><code>import re

lines = [
    '0021 Literacy and numeracy \n',
    '\n',
    'Literacy and numeracy are programmes or qualifications arranged mainly for adults, designed \n',
    'to teach fundamental skills in reading, writing and arithmetic. The typical age range of \n',
    'participants can be used to distinguish between detailed field 0011 ‘Basic programmes and \n',
    'qualifications’ and this detailed field. \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n',
    '0031 Personal skills \n',
    '\n',
    'Personal skills are defined by reference to the effects on the individual’s capacity (mental, \n',
    'social etc.). This detailed field covers personal skills programmes not included in 0011 ‘Basic \n',
    'programmes and qualifications’ or 0021 ‘Literacy and numeracy’, giving key competencies and \n',
    'transferable skills. \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n']

definition=[]
ocupation=[]
for l,i in enumerate(lines):
    # re.findall always returns a list (empty when nothing matches),
    # so comparing its result with '' is always True.
    # re.match is simpler here because it only matches at the start of the line;
    # the pattern can also be written as r'\d{4}' (or r'0\d{3}' to require the leading 0).
    if re.match(r'\d\d\d\d',i) is not None:
        ocupation.append(i)
        # initialise def1 once per header, before the inner for loop
        def1=[]
        for j in range(2,10):
            # in your data the element at offset 6 is a '\n' too, so the stop
            # test checks for two '\n' elements (at l+j and at l+j+2)
            if lines[l+j] == '\n' and lines[l+j+2] == '\n':
                break
            # skip any other '\n'
            elif lines[l+j]=='\n':
                continue
            else:
                def1.append(lines[l+j])
        definition.append(def1)
</code></pre>
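For comparison, a single-pass version might look like the sketch below. It is not the code from this answer but an alternative under stated assumptions: the function name `split_codes_and_paragraphs` and the `HEADER` pattern are hypothetical, a header is taken to be any line starting with `0` plus three more digits, and the paragraph is the first run of non-blank lines after the header. The `sample` list is an abbreviated stand-in for the real data.

```python
import re

# Assumption: a header starts with 0 followed by exactly three more digits.
HEADER = re.compile(r'0\d{3}\b')


def split_codes_and_paragraphs(lines):
    """Return (headers, paragraphs), one paragraph (list of lines) per header."""
    headers = []
    paragraphs = []
    i = 0
    while i < len(lines):
        if HEADER.match(lines[i]) is None:
            i += 1
            continue
        headers.append(lines[i])
        i += 1
        # Skip the blank line(s) between the header and its paragraph.
        while i < len(lines) and lines[i].strip() == '':
            i += 1
        # Collect lines until the next blank line, the next header, or the end.
        paragraph = []
        while (i < len(lines) and lines[i].strip() != ''
               and HEADER.match(lines[i]) is None):
            paragraph.append(lines[i])
            i += 1
        paragraphs.append(paragraph)
    return headers, paragraphs


# Abbreviated sample in the same shape as the list in the question.
sample = [
    '0021 Literacy and numeracy \n',
    '\n',
    'Literacy and numeracy are programmes or qualifications \n',
    'designed to teach fundamental skills. \n',
    '\n',
    'Programmes and qualifications with the following main content are classified here: \n',
    '\n',
    '0031 Personal skills \n',
    '\n',
    'Personal skills are defined by reference to the effects \n',
    'on the individual. \n',
    '\n',
]

headers, paragraphs = split_codes_and_paragraphs(sample)
```

Because every index is bounds-checked, this version does not rely on a fixed `range(2,10)` window or on a second `'\n'` two elements further on. A header with nothing after it simply gets an empty paragraph.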
