Brace expansion in Python glob

Asked 2025-04-18 02:15

I'm using Python 2.7 and want to run the following:

glob('{faint,bright*}/{science,calib}/chip?/')

But I get no matches. On the command line, however, echo {faint,bright*}/{science,calib}/chip? gives:

faint/science/chip1 faint/science/chip2 faint/calib/chip1 faint/calib/chip2 bright1/science/chip1 bright1/science/chip2 bright1w/science/chip1 bright1w/science/chip2 bright2/science/chip1 bright2/science/chip2 bright2w/science/chip1 bright2w/science/chip2 bright1/calib/chip1 bright1/calib/chip2 bright1w/calib/chip1 bright1w/calib/chip2 bright2/calib/chip1 bright2/calib/chip2 bright2w/calib/chip1 bright2w/calib/chip2

What is wrong with my expression?

regex command-line glob brace-expansion

6 Answers

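As the other answers say, brace expansion is a preprocessing step that happens before globbing: you first expand all the braces, then glob each of the results. (Brace expansion turns one string into a list of strings.)

Orwellophile recommends the braceexpand library. But I think this problem is too small to justify adding a dependency (although it is a common problem, and ideally the solution would live in the standard library, preferably in the glob module).

So here is a way to solve it in a few lines of code.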

import itertools
import re

def expand_braces(text, seen=None):
    if seen is None:
        seen = set()

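    # Innermost brace pairs (no "{" or "}" inside), highest index first.
    # A raw string avoids the invalid "\{" escape warning in Python 3.
    spans = [m.span() for m in re.finditer(r"\{[^{}]*\}", text)][::-1]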
    alts = [text[start + 1 : stop - 1].split(",") for start, stop in spans]

    if len(spans) == 0:
        if text not in seen:
            yield text
        seen.add(text)

    else:
        for combo in itertools.product(*alts):
            replaced = list(text)
            for (start, stop), replacement in zip(spans, combo):
                replaced[start:stop] = replacement

            yield from expand_braces("".join(replaced), seen)

### testing

text_to_expand = "{{pine,}apples,oranges} are {tasty,disgusting} to m{}e }{"

for result in expand_braces(text_to_expand):
    print(result)

The output is:

pineapples are tasty to me }{
oranges are tasty to me }{
apples are tasty to me }{
pineapples are disgusting to me }{
oranges are disgusting to me }{
apples are disgusting to me }{

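Here is what's happening:

Nested braces can produce duplicate results, so we use seen to make sure we only yield results that haven't appeared before.
spans holds the start and end indices of all innermost balanced brace pairs in the text. The [::-1] slice reverses their order so the indices run from high to low (this matters later).
Each element of alts is the corresponding list of comma-separated alternatives.
If there are no matches (the text contains no balanced braces), yield text itself, using seen to make sure it is unique.
Otherwise, use itertools.product to iterate over the Cartesian product of the comma-separated alternatives.
Replace the braced text with the current alternatives. Because we modify the data in place, we must use a mutable sequence (a list, not a str), and we must replace the highest indices first. A replacement is usually a different length from the braced text it replaces, so if we replaced the lowest indices first, every later span would point at the wrong characters. That is why we reverse spans when we build it.
The text may contain braces nested inside braces. The regular expression only finds balanced braces that contain no other braces, but nested braces are legitimate input, so we recurse until no balanced braces are left (the len(spans) == 0 case). In a Python generator, recursion uses yield from to pass along every result of the recursive call. (yield from requires Python 3.3 or later; it is a syntax error on Python 2.7.)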
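A tiny experiment illustrating why the spans are replaced from highest index to lowest. The input string "{a,b}-{c,d}", its two hard-coded spans, and the helper name replace_spans are made up for this sketch; the loop mirrors the list slice assignment used in expand_braces.

```python
def replace_spans(text, spans, choices):
    """Replace each (start, stop) span of text with the matching choice,
    in the order the spans are given, using list slice assignment."""
    chars = list(text)
    for (start, stop), choice in zip(spans, choices):
        chars[start:stop] = choice
    return "".join(chars)


text = "{a,b}-{c,d}"
spans = [(0, 5), (6, 11)]  # "{a,b}" and "{c,d}", low index to high

# High to low: each replacement only shifts characters after it,
# so the remaining (lower) spans still point at the right characters.
print(replace_spans(text, spans[::-1], ["c", "a"]))  # a-c

# Low to high: replacing "{a,b}" with "a" shortens the list by 4,
# so the span (6, 11) now covers only the final "}".
print(replace_spans(text, spans, ["a", "c"]))  # a-{c,dc
```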

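In the output, {{pine,}apples,oranges} is first expanded into {pineapples,oranges} and {apples,oranges}, and each of those is then expanded in turn. Without seen to enforce unique results, the oranges lines would appear twice.

Empty braces such as m{}e expand to nothing, so the result is me.

Unbalanced braces such as }{ are left unchanged.

This algorithm is not a good fit if you need high performance on large data sets, but for reasonably sized inputs it is a general-purpose solution.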
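Since the question is about Python 2.7, here is a sketch of the same algorithm with the recursive yield from written as an explicit loop, so it should run on both 2.7 and 3, plus a small glue function that globs each expansion. The name glob_with_braces and its de-duplication step are illustrative additions, not part of the answer above.

```python
import glob
import itertools
import re

# Innermost pair of braces: no "{" or "}" between them.
INNERMOST_BRACES = re.compile(r"\{[^{}]*\}")


def expand_braces(text, seen=None):
    """Same algorithm as above, but without `yield from`."""
    if seen is None:
        seen = set()
    # Reversed so spans are replaced from the highest index to the lowest.
    spans = [m.span() for m in INNERMOST_BRACES.finditer(text)][::-1]
    if not spans:
        if text not in seen:
            seen.add(text)
            yield text
        return
    alts = [text[start + 1:stop - 1].split(",") for start, stop in spans]
    for combo in itertools.product(*alts):
        replaced = list(text)
        for (start, stop), replacement in zip(spans, combo):
            replaced[start:stop] = replacement
        # Explicit loop instead of `yield from` (Python 2.7 compatible).
        for result in expand_braces("".join(replaced), seen):
            yield result


def glob_with_braces(pattern):
    """Hypothetical helper: expand braces first, then glob each expansion.
    Overlapping alternatives can match the same path more than once,
    so paths are de-duplicated, keeping the first-seen order."""
    results = []
    found = set()
    for expanded in expand_braces(pattern):
        for path in glob.glob(expanded):
            if path not in found:
                found.add(path)
                results.append(path)
    return results
```

Used on the pattern from the question, glob_with_braces('{faint,bright*}/{science,calib}/chip?/') returns the matching directories with a trailing slash, because glob keeps the trailing "/" and only matches directories for such patterns.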

Answered 2025-04-18 by Python大师

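As mentioned above, Python doesn't support brace expansion directly. But since brace expansion happens before the wildcards are processed, you can do the expansion yourself. For example,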

result = glob('{faint,bright*}/{science,calib}/chip?/')

becomes

result = [
    f 
    for b in ['faint', 'bright*'] 
    for s in ['science', 'calib'] 
    for f in glob('{b}/{s}/chip?/'.format(b=b, s=s))
]
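With more than two brace groups, one for clause per group gets repetitive. A sketch that generalizes the same idea with itertools.product, which iterates in the same order as the nested loops above. glob_product is a hypothetical name, and because the template is filled with str.format's positional {} fields, it cannot contain literal braces.

```python
import glob
import itertools


def glob_product(template, *choices):
    """Fill each "{}" in template with every combination of choices
    (outer list varies slowest, like nested for-loops) and glob each."""
    results = []
    for combo in itertools.product(*choices):
        results.extend(glob.glob(template.format(*combo)))
    return results


# Equivalent to the nested comprehension above:
# glob_product('{}/{}/chip?/', ['faint', 'bright*'], ['science', 'calib'])
```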

Answered 2025-04-18 by Python大师

Since Python's glob() doesn't support the {} syntax, what you probably want is something like this:

import os
import re

...

match_dir = re.compile('(faint|bright.*)/(science|calib)(/chip)?')
for dirpath, dirnames, filenames in os.walk("/your/top/dir"):
    if match_dir.search(dirpath):
        # do_whatever_with_* are placeholders for your own code
        do_whatever_with_files(dirpath, filenames)
        # OR
        do_whatever_with_subdirs(dirpath, dirnames)
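Note that this regex is looser than the glob: . and .* can match "/", and search is unanchored, so a path such as x/bright/a/science/chip1 would also match. A sketch of a tighter version, assuming POSIX-style "/" separators in the relative paths; DIR_RE and matching_dirs are hypothetical names.

```python
import os
import re

# Mirror {faint,bright*}/{science,calib}/chip? as a regex. Glob's * and ?
# never cross a "/", so use [^/]* and [^/] instead of .* and ., and anchor
# both ends (match() anchors the start, \Z anchors the end).
DIR_RE = re.compile(r"(?:faint|bright[^/]*)/(?:science|calib)/chip[^/]\Z")


def matching_dirs(top):
    """Yield directories under top whose path relative to top matches DIR_RE."""
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top).replace(os.sep, "/")
        if DIR_RE.match(rel):
            yield dirpath
```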

Answered 2025-04-18 by Python大师

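{..} is called brace expansion, a separate step the shell performs before wildcard matching (globbing).

It is not part of globbing and is not supported by Python's glob functions.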
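A quick way to see this: glob matches each path component with the fnmatch module, where "{", "," and "}" are ordinary characters (only *, ? and [...] are special), so the braces are matched literally.

```python
from fnmatch import fnmatchcase

pattern = "{faint,bright*}"
print(fnmatchcase("faint", pattern))            # False: no brace expansion
print(fnmatchcase("{faint,bright1}", pattern))  # True: braces matched literally
```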

Answered 2025-04-18 by Python大师

Combine globbing with brace expansion using the braceexpand package:

pip install braceexpand

Example:

from glob import glob
from braceexpand import braceexpand

def braced_glob(path):
    results = []
    # braceexpand() yields each brace-expanded pattern; glob each one.
    for x in braceexpand(path):
        results.extend(glob(x))
    return results

>>> braced_glob('/usr/bin/{x,z}*k')  
['/usr/bin/xclock', '/usr/bin/zipcloak']

Answered 2025-04-18 by Python大师