apply returns random substrings

Posted 2024-06-09 15:04:05


apply returns random substrings instead of the full string

I have tried the following:

def extract_ticker(title):
    for word in title:
        word_str = word.encode('utf-8')
        if word_str in constituents['Symbol'].values:
            return word_str
sp500news3['tickers'] = sp500news3['title'].apply(extract_ticker)

It returns:

sp500news3['tickers'] 

79944        M
181781       M
213175       C
93554        C
257327       T

instead of the expected output:

79944        MSFT
181781       WMB
213175       CSX
93554        C
257327       TWX

Sample data to reproduce the problem:

import pandas as pd

constituents = pd.DataFrame({"Symbol": ["TWX", "C", "MSFT", "WMB"]})

sp500news3 = pd.DataFrame({"title": ["MSFT Vista corporate sales go very well", "WMB No Anglican consensus on Episcopal Church", "CSX quarterly profit rises", "C says 30 bln capital helps exceed target", "TWX plans cable spinoff"]})
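The wrong output can be reproduced with the sample data. Iterating over a Python str yields one character at a time, so the loop compares single letters against the symbol list and returns the first character that happens to be a one-letter symbol (in the four-symbol sample only C qualifies). This sketch assumes Python 3, so it drops the encode call, and swaps in a set for the membership test; the function name extract_ticker_chars is hypothetical.

```python
import pandas as pd

constituents = pd.DataFrame({"Symbol": ["TWX", "C", "MSFT", "WMB"]})
titles = pd.Series([
    "MSFT Vista corporate sales go very well",
    "WMB No Anglican consensus on Episcopal Church",
    "CSX quarterly profit rises",
    "C says 30 bln capital helps exceed target",
    "TWX plans cable spinoff",
])

# Assumption: a set stands in for constituents['Symbol'].values from the question.
symbols = set(constituents["Symbol"])

# A str iterates character by character.
print(list("MSFT"))  # ['M', 'S', 'F', 'T']

def extract_ticker_chars(title):
    # Same shape as the question's loop: each "word" is really one character.
    for word in title:
        if word in symbols:
            return word
    return None  # no single character was a symbol

result = titles.apply(extract_ticker_chars)
print(result.tolist())
```

Rows 1 to 3 come back as C (from "Church", "CSX" and "C"), while rows 0 and 4 find no one-letter symbol at all. Splitting on whitespace first makes the loop compare whole words instead.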

2 Answers

Why not extract the ticker symbols with a regular expression instead?

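tickers = ('TWX', 'C', 'MSFT', 'WMB')
# one capture group containing the alternation TWX|C|MSFT|WMB
regex = '({})'.format('|'.join(tickers))

# expand=False returns a Series instead of a one-column DataFrame;
# without word boundaries, 'C' also matches the start of 'CSX'
sp500news3['tickers'] = sp500news3['title'].str.extract(regex, expand=False)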
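A quick check with the standard re module makes the behaviour of the bare alternation concrete: the engine returns the leftmost match, and an alternative counts as a match even when it is only the first part of a longer word. The sketch below is an illustration, not part of the original answer; it compares the pattern above with the same alternation wrapped in \b word boundaries.

```python
import re

tickers = ('TWX', 'C', 'MSFT', 'WMB')

# Unanchored alternation, as in the answer above.
plain = '({})'.format('|'.join(tickers))
# Same alternation wrapped in \b so only whole words can match.
bounded = r'\b({})\b'.format('|'.join(tickers))

title = 'CSX quarterly profit rises'
plain_match = re.search(plain, title)
bounded_match = re.search(bounded, title)

print(plain_match.group(1))  # C, taken from the start of "CSX"
print(bounded_match)         # None, because "C" is not a whole word here
```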

Use str.extract with word boundaries (\b) and the symbols joined with |:

pat = '|'.join(r"\b{}\b".format(x) for x in constituents['Symbol'])

sp500news3['tickers'] = sp500news3['title'].str.extract('('+ pat + ')', expand=False)
print (sp500news3)
                                           title tickers
0        MSFT Vista corporate sales go very well    MSFT
1  WMB No Anglican consensus on Episcopal Church     WMB
2                     CSX quarterly profit rises     NaN
3      C says 30 bln capital helps exceed target       C
4                        TWX plans cable spinoff     TWX
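The join above inserts each symbol into the pattern verbatim, so a symbol containing a regex metacharacter would change the pattern's meaning; in a dotted share-class symbol, for instance, "." matches any character. A minimal sketch, assuming such a symbol could appear in the list (the extra symbol BRK.B and both titles here are hypothetical), escapes each symbol with re.escape before joining:

```python
import re
import pandas as pd

# Hypothetical symbol list: 'BRK.B' is added only to show why escaping matters.
symbols = pd.Series(["TWX", "C", "MSFT", "WMB", "BRK.B"])
titles = pd.Series(["BRK.B raises stake", "BRKXB is not a ticker"])

# re.escape turns "BRK.B" into "BRK\.B", so the dot matches only a literal dot.
escaped = '|'.join(r'\b{}\b'.format(re.escape(s)) for s in symbols)
unescaped = '|'.join(r'\b{}\b'.format(s) for s in symbols)

safe = titles.str.extract('(' + escaped + ')', expand=False)
loose = titles.str.extract('(' + unescaped + ')', expand=False)
print(safe.tolist())
print(loose.tolist())
```

With escaping, the second title yields no match; without it, "BRKXB" is wrongly reported as a ticker.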

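Your own solution works if you split the title on whitespace, because iterating over a string directly yields single characters rather than words. The encode call may be needed under Python 2; under Python 3 it turns each word into bytes, which never compare equal to the str values in Symbol, so it is left out here: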

def extract_ticker(title):
    # split() breaks the title into words on whitespace
    for word in title.split():
        if word in constituents['Symbol'].values:
            return word
    # falls through and returns None when no word is a symbol

sp500news3['tickers'] = sp500news3['title'].apply(extract_ticker)
print (sp500news3)
                                           title tickers
0        MSFT Vista corporate sales go very well    MSFT
1  WMB No Anglican consensus on Episcopal Church     WMB
2                     CSX quarterly profit rises    None
3      C says 30 bln capital helps exceed target       C
4                        TWX plans cable spinoff     TWX
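For large frames, the per-row Python function can be replaced with vectorized pandas operations. A sketch under the same split-on-whitespace assumption: split each title into words, explode to one word per row (the original index is repeated for each word), keep the words found in Symbol with isin, then take the first hit per title. Titles without a hit come back as NaN rather than None. Series.explode requires pandas 0.25 or later.

```python
import pandas as pd

constituents = pd.DataFrame({"Symbol": ["TWX", "C", "MSFT", "WMB"]})
sp500news3 = pd.DataFrame({"title": [
    "MSFT Vista corporate sales go very well",
    "WMB No Anglican consensus on Episcopal Church",
    "CSX quarterly profit rises",
    "C says 30 bln capital helps exceed target",
    "TWX plans cable spinoff",
]})

# One row per word; explode repeats the original row index for each word.
words = sp500news3["title"].str.split().explode()

# Keep only the words that are known symbols.
hits = words[words.isin(constituents["Symbol"])]

# First matching word per title, in word order; titles without a hit get NaN.
sp500news3["tickers"] = hits.groupby(level=0).first().reindex(sp500news3.index)
print(sp500news3)
```

This returns the same tickers as the apply version above, with the CSX row as NaN.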
