Finding uppercase words with a regular expression

Published 2024-06-16 09:42:03

I have a string like the following:

df = '''
ACCP ACLL ADER ADERW AEAC AEACW AHI AIRTP AKO/A AKO/B ALIT AMHCU ANDAU APOPW AUGZ AUUD AUUDW 
AVDG AVDR AYTUP BBRX BCAC BCACU BCACW BCTX BCTXW BF/A BF/B BIO/B BRK/A BRK/B BRLIU BRPAU BWL/A 
CCZ CFCV CMCTP CMPX CNNB CNTX COMSW CPTAG CPTI CRD/A CRD/B CRTDW DDI DECZ DEFN DFH DRMT DSOC EAC 
EACPW: No data found, symbol may be delisted- ECC : No data found, symbol may be delisted- ECOM :
No data found, symbol may be delisted
'''

I need to extract all the symbols from this string with a regular expression, so that the result looks like this:

result = 'ACCP ACLL ADER ADERW AEAC AEACW AHI AIRTP AKO/A AKO/B ALIT AMHCU ANDAU APOPW AUGZ AUUD AUUDW 
AVDG AVDR AYTUP BBRX BCAC BCACU BCACW BCTX BCTXW BF/A BF/B BIO/B BRK/A BRK/B BRLIU BRPAU BWL/A 
CCZ CFCV CMCTP CMPX CNNB CNTX COMSW CPTAG CPTI CRD/A CRD/B CRTDW DDI DECZ DEFN DFH DRMT DSOC EAC 
EACPW ECC ECOM'

I have already tried to get all the words that start with two uppercase letters:

"\b[A-Z]{2}\b"

And also this:

"\b[A-Z]+[A-Z\/]+\b"

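The last one works well, but only for the first words of the string, so the problem may be that it does not account for the spaces between words. In any case, neither of them works here.

What regex pattern do I need in this case?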


3 Answers

Admittedly, you may want to refine it, for example by using a set, but this seems to get all the ticker symbols:

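import string

# Strip a trailing ':' first so that 'EACPW:' is kept as 'EACPW'.
words = (word.rstrip(':') for word in df.split())

# Keep non-empty words made up only of uppercase ASCII letters and '/'.
ticker = [
    word for word in words
    if word and all(char in string.ascii_uppercase + '/' for char in word)
]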
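
If the same symbol can appear more than once, the set idea mentioned above could be sketched as follows. The list `tickers` here is a hypothetical stand-in for the list built above, not data from the question. A plain `set` drops duplicates but loses the original order, so `dict.fromkeys` is shown as an order-preserving alternative.

```python
# Hypothetical list of extracted symbols that contains repeats.
tickers = ["ACCP", "BRK/A", "ACCP", "ECC", "BRK/A"]

# A set removes duplicates but does not keep the original order.
as_set = set(tickers)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this deduplicates while preserving the first-seen order.
unique = list(dict.fromkeys(tickers))
print(unique)
# ['ACCP', 'BRK/A', 'ECC']
```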

Another option is to use your second pattern, \b[A-Z]+[A-Z\/]+\b, with re.findall and then join the matches together:

import re

df = '''
ACCP ACLL ADER ADERW AEAC AEACW AHI AIRTP AKO/A AKO/B ALIT AMHCU ANDAU APOPW AUGZ AUUD AUUDW 
AVDG AVDR AYTUP BBRX BCAC BCACU BCACW BCTX BCTXW BF/A BF/B BIO/B BRK/A BRK/B BRLIU BRPAU BWL/A 
CCZ CFCV CMCTP CMPX CNNB CNTX COMSW CPTAG CPTI CRD/A CRD/B CRTDW DDI DECZ DEFN DFH DRMT DSOC EAC 
EACPW: No data found, symbol may be delisted- ECC : No data found, symbol may be delisted- ECOM :
No data found, symbol may be delisted
'''

result = ' '.join(re.findall(r"\b[A-Z]+[A-Z\/]+\b", df))
print(result)

Output

ACCP ACLL ADER ADERW AEAC AEACW AHI AIRTP AKO/A AKO/B ALIT AMHCU ANDAU APOPW AUGZ AUUD AUUDW AVDG AVDR AYTUP BBRX BCAC BCACU BCACW BCTX BCTXW BF/A BF/B BIO/B BRK/A BRK/B BRLIU BRPAU BWL/A CCZ CFCV CMCTP CMPX CNNB CNTX COMSW CPTAG CPTI CRD/A CRD/B CRTDW DDI DECZ DEFN DFH DRMT DSOC EAC EACPW ECC ECOM
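
To see why the two patterns from the question behave so differently, it may help to run them on a shortened sample. The string `sample` below is a hypothetical excerpt in the same shape as the question's data. The use of `re.search` is only a guess at why the second pattern appeared to match "only the first words"; the question does not say which function was called.

```python
import re

# Hypothetical, shortened sample in the same shape as the question's data.
sample = "ACCP BF/A BRK/B EACPW: No data found, symbol may be delisted- ECC : No data found"

# \b[A-Z]{2}\b matches runs of exactly two uppercase letters between
# word boundaries, so only the "BF" part of "BF/A" qualifies.
exact_two = re.findall(r"\b[A-Z]{2}\b", sample)
print(exact_two)
# ['BF']

# re.search returns only the first match ...
first = re.search(r"\b[A-Z]+[A-Z/]+\b", sample).group()
print(first)
# ACCP

# ... while re.findall scans the whole string and returns every match.
all_matches = re.findall(r"\b[A-Z]+[A-Z/]+\b", sample)
print(all_matches)
# ['ACCP', 'BF/A', 'BRK/B', 'EACPW', 'ECC']
```

Note that the pattern needs at least two characters per match, which is why the "N" of "No" is never picked up: the next character would have to be an uppercase letter or a slash.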

All you need is a simple list comprehension.

For example:

df = '''
ACCP ACLL ADER ADERW AEAC AEACW AHI AIRTP AKO/A AKO/B ALIT AMHCU ANDAU APOPW AUGZ AUUD AUUDW 
AVDG AVDR AYTUP BBRX BCAC BCACU BCACW BCTX BCTXW BF/A BF/B BIO/B BRK/A BRK/B BRLIU BRPAU BWL/A 
CCZ CFCV CMCTP CMPX CNNB CNTX COMSW CPTAG CPTI CRD/A CRD/B CRTDW DDI DECZ DEFN DFH DRMT DSOC EAC 
EACPW: No data found, symbol may be delisted- ECC : No data found, symbol may be delisted- ECOM :
No data found, symbol may be delisted
'''

print([w for w in df.split() if w.isupper() and len(w) > 2])

Output:

['ACCP', 'ACLL', 'ADER', 'ADERW', 'AEAC', 'AEACW', 'AHI', 'AIRTP', 'AKO/A', 'AKO/B', 'ALIT', 'AMHCU', 'ANDAU', 'APOPW', 'AUGZ', 'AUUD', 'AUUDW', 'AVDG', 'AVDR', 'AYTUP', 'BBRX', 'BCAC', 'BCACU', 'BCACW', 'BCTX', 'BCTXW', 'BF/A', 'BF/B', 'BIO/B', 'BRK/A', 'BRK/B', 'BRLIU', 'BRPAU', 'BWL/A', 'CCZ', 'CFCV', 'CMCTP', 'CMPX', 'CNNB', 'CNTX', 'COMSW', 'CPTAG', 'CPTI', 'CRD/A', 'CRD/B', 'CRTDW', 'DDI', 'DECZ', 'DEFN', 'DFH', 'DRMT', 'DSOC', 'EAC', 'EACPW:', 'ECC', 'ECOM']
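
The output above still contains 'EACPW:' with its trailing colon, because str.isupper() ignores non-letter characters. One possible refinement, sketched on a hypothetical shortened sample, is to strip surrounding punctuation before filtering. The slash is left out of the stripped characters because it is part of symbols such as BRK/B.

```python
import string

# Hypothetical, shortened sample in the same shape as the question's data.
sample = "ACCP BRK/B EACPW: No data found, symbol may be delisted- ECC : No data found"

# All ASCII punctuation except '/', which belongs to symbols like BRK/B.
strip_chars = string.punctuation.replace("/", "")

# Strip leading/trailing punctuation from each word, then apply the same filter.
cleaned = (w.strip(strip_chars) for w in sample.split())
symbols = [w for w in cleaned if w.isupper() and len(w) > 2]
print(symbols)
# ['ACCP', 'BRK/B', 'EACPW', 'ECC']
```

The `len(w) > 2` condition is kept from the answer; it would also discard any two-letter symbols, which do not occur in the question's data.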
