Using regex to separate items in a list

Posted 2024-06-02 07:03:00


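I need to separate the items in a list. I have a list of items, but some of those items are themselves lists (comma-separated strings). I need a way to split those inner lists while keeping every item from the original list.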

The input looks like this:

['lexapro, losartan, lunesta, hormonal supplements', 'prozac 10mg', 'mesalamine, entiviyo, various vitamin supplements', 'none', 'none', '', 'spironolactone 100mg twice a day', 'zoloft', 'blood pressure medications', 'sleep, cholestral, migrain, phentermine']

I want the output to be:

['lexapro', 'losartan', 'lunesta', 'hormonal supplements', 'prozac 10mg', 'mesalamine', 'entiviyo', 'various vitamin supplements', 'none', 'none', 'spironolactone 100mg twice a day', 'zoloft', 'blood pressure medications', 'sleep', 'cholestral', 'migrain', 'phentermine']

I tried this:

separate = re.findall(r'(\d+)(,\s*\d+)*', medicine_list)

but had no luck. medicine_list is the original list. Any ideas?


2 Answers

I think what you need is some basic text parsing and a little bit of regex. Here is an example:

import re

# Raw data.
orig_meds = [
    'lexapro, losartan, lunesta, hormonal supplements',
    'prozac 10mg',
    'mesalamine, entiviyo, various vitamin supplements',
    'none',
    'none',
    '',
    'spironolactone 100mg twice a day',
    'zoloft',
    'blood pressure medications',
    'sleep, cholestral, migrain, phentermine',
]

# Simple regex to split on commas, optionally followed by spaces.
rgx = re.compile(r', *')

# A set of stuff we might not want to keep.
exclude = {'', 'none'}

# Parse.
meds = [
    m
    for om in orig_meds
    for m in rgx.split(om)
    if m not in exclude
]

# Check.
for m in meds:
    print(m)

Output:

lexapro
losartan
lunesta
hormonal supplements
prozac 10mg
mesalamine
entiviyo
various vitamin supplements
spironolactone 100mg twice a day
zoloft
blood pressure medications
sleep
cholestral
migrain
phentermine
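
Note that the expected output in the question keeps the 'none' entries and drops only the empty string, whereas the example above filters out both. A minimal sketch of that variant follows, assuming whitespace around each comma should be trimmed; the function name split_items is hypothetical and not from the question:

```python
import re

def split_items(entries):
    """Split comma-separated entries into one flat list, dropping empty items."""
    # Split on commas, absorbing any whitespace on either side of the comma.
    splitter = re.compile(r'\s*,\s*')
    return [
        item
        for entry in entries
        for item in splitter.split(entry.strip())
        if item  # drops '' but keeps 'none'
    ]

medicine_list = [
    'lexapro, losartan, lunesta, hormonal supplements',
    'prozac 10mg',
    'mesalamine, entiviyo, various vitamin supplements',
    'none',
    'none',
    '',
    'spironolactone 100mg twice a day',
    'zoloft',
    'blood pressure medications',
    'sleep, cholestral, migrain, phentermine',
]

print(split_items(medicine_list))
```

Filtering on truthiness rather than an exclude set keeps the decision about 'none' separate from the splitting, so it can be added back later if needed.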

This can be done easily with a loop and the split function:

medicine_list = ['lexapro, losartan, lunesta, hormonal supplements', 'prozac 10mg', 'mesalamine, entiviyo, various vitamin supplements', 'none', 'none', '', 'spironolactone 100mg twice a day', 'zoloft', 'blood pressure medications', 'sleep, cholestral, migrain, phentermine']

# Split each entry on commas, giving a list of lists.
list_of_lists = [w.split(",") for w in medicine_list]
# Flatten the sublists into one final list with a list comprehension,
# stripping the space left after each comma and dropping empty strings.
print([item.strip() for sublist in list_of_lists for item in sublist if item.strip()])
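
As for why the attempt in the question had no luck, there appear to be two separate problems: re.findall expects a string rather than a list, and the pattern \d+ only matches runs of digits, not medication names. A small sketch illustrating both, using a shortened sample list made up for this illustration:

```python
import re

# Shortened sample data, for illustration only.
medicine_list = ['lexapro, losartan', 'prozac 10mg', '']

pattern = r'(\d+)(,\s*\d+)*'

# Problem 1: re.findall needs a string (or bytes), so a list raises TypeError.
try:
    re.findall(pattern, medicine_list)
    raised = False
except TypeError:
    raised = True

# Problem 2: even on a single string, the pattern only captures digits.
digit_matches = re.findall(pattern, 'prozac 10mg')

print(raised, digit_matches)
```

Since the separator is the comma, splitting on it (as in the answers above) is a better fit than trying to match the items themselves.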
