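Look at the next word

I have a DataFrame with a Testo column. For each row, I want to extract every pair of consecutive words that both start with a capital letter into a new column, Estratte.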

Input:

| ID | Testo |
|---|---|
| 141 | Vivo in una piccola città |
| 22 | Gli Stati Uniti sono una grande nazione |
| 153 | Il Regno Unito ha votato per uscire dall'Europa |
| 64 | Hugh Laurie ha interpretato Dr. House |
| 12 | Mi piace bere birra. |

Desired output:

| ID | Testo | Estratte |
|---|---|---|
| 141 | Vivo in una piccola città | [] |
| 22 | Gli Stati Uniti sono una grande nazione | [Gli Stati, Stati Uniti] |
| 153 | Il Regno Unito ha votato per uscire dall'Europa | [Il Regno, Regno Unito] |
| 64 | Hugh Laurie ha interpretato Dr. House | [Hugh Laurie, Dr House] |
| 12 | Mi piace bere birra. | [] |

3 answers

User

Answer 1 · edited 2024-05-26 16:28:15

import re
import pandas as pd

x = {141: 'Vivo in una piccola città',
     22: 'Gli Stati Uniti sono una grande nazione',
     153: "Il Regno Unito ha votato per uscire dall'Europa",
     64: 'Hugh Laurie ha interpretato Dr. House',
     12: 'Mi piace bere birra.'}

df = pd.DataFrame(list(x.items()), columns=['ID', 'Testo'])

# A capitalized word (optionally ending in '.', as in "Dr."), a space, then
# another capitalized word. Wrapping it in a lookahead makes the match
# zero-width, so overlapping pairs such as "Gli Stati" and "Stati Uniti"
# are both returned. Note: [A-Z] only covers unaccented capital letters.
pair = re.compile(r'(?=\b([A-Z]\w*\.? [A-Z]\w*))')

caps = []
for string in df['Testo']:
    caps.append(pair.findall(string))

df['Estratte'] = caps
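A regex that returns pairs has to let matches overlap, because in "Gli Stati Uniti" the word "Stati" belongs to two pairs. A minimal comparison on one sentence from the question, using only the standard `re.findall`:

```python
import re

text = 'Gli Stati Uniti sono una grande nazione'

# Ordinary match: "Gli Stati" consumes "Stati", so the scan resumes after it
# and "Stati Uniti" can never be found
plain = re.findall(r'\b[A-Z]\w* [A-Z]\w*', text)

# Zero-width lookahead with a capture group: nothing is consumed, so the
# scan tries every position and overlapping pairs are all returned
overlapping = re.findall(r'(?=\b([A-Z]\w* [A-Z]\w*))', text)

print(plain)        # ['Gli Stati']
print(overlapping)  # ['Gli Stati', 'Stati Uniti']
```

When the pattern has one capture group, `findall` returns the group's text rather than the (empty) overall match, which is what makes the lookahead trick usable here.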

User

Answer 2 · edited 2024-05-26 16:28:15

Maybe you can use my code below:

def getCapitalize(myStr):
    # Yield every pair of adjacent words that both start with a capital letter
    words = myStr.split()
    for i in range(len(words) - 1):
        if words[i][0].isupper() and words[i + 1][0].isupper():
            yield f"{words[i]} {words[i + 1]}"

This function creates a generator, so you have to convert its result to a list (or whatever container you need).
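Since `getCapitalize` yields pairs lazily, each row's generator has to be drained with `list(...)` before it is stored in the DataFrame. A minimal sketch of applying it to the question's data with `Series.apply`; the DataFrame built here is only a stand-in for the question's table:

```python
import pandas as pd

def getCapitalize(myStr):
    # Generator from the answer: yield each pair of adjacent capitalized words
    words = myStr.split()
    for i in range(len(words) - 1):
        if words[i][0].isupper() and words[i + 1][0].isupper():
            yield f"{words[i]} {words[i + 1]}"

# Stand-in for the question's DataFrame
df = pd.DataFrame({
    'ID': [141, 22, 153, 64, 12],
    'Testo': ['Vivo in una piccola città',
              'Gli Stati Uniti sono una grande nazione',
              "Il Regno Unito ha votato per uscire dall'Europa",
              'Hugh Laurie ha interpretato Dr. House',
              'Mi piace bere birra.'],
})

# list(...) drains each generator so every cell holds a plain list
df['Estratte'] = df['Testo'].apply(lambda s: list(getCapitalize(s)))
print(df[['ID', 'Estratte']])
```

Rows with no pair get an empty list here, matching the `[]` in the desired output.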

User

Answer 3 · edited 2024-05-26 16:28:15

Regex isn't always the best tool; let's try split and explode instead.

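# df is the question's DataFrame, with the text in column 'Testo'
# One row per word; the index still points at the original row
s = df.Testo.str.split(' ').explode()
# The next word in the same row (NaN after each row's last word)
s2 = s.groupby(level=0).shift(-1)
# Keep "word next_word" where both words are title-case, then collect per row
pairs = (s + ' ' + s2)[s.str.istitle() & s2.str.istitle()].groupby(level=0).agg(list)
print(pairs)
# 1    [Gli Stati, Stati Uniti]
# 2     [Il Regno, Regno Unito]
# 3    [Hugh Laurie, Dr. House]
# Name: Testo, dtype: object
df['New'] = pairs
# Note: rows with no pair are absent from `pairs`, so they are assigned NaN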
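Because rows without a pair end up as NaN, one way to get an empty list instead (as in the question's desired output) is to look each row up with `Series.get` and a default. A minimal sketch that rebuilds the question's data so it runs on its own; the column name `New` follows the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [141, 22, 153, 64, 12],
    'Testo': ['Vivo in una piccola città',
              'Gli Stati Uniti sono una grande nazione',
              "Il Regno Unito ha votato per uscire dall'Europa",
              'Hugh Laurie ha interpretato Dr. House',
              'Mi piace bere birra.'],
})

# Same split/explode/shift pipeline as in the answer
s = df.Testo.str.split(' ').explode()
s2 = s.groupby(level=0).shift(-1)
pairs = (s + ' ' + s2)[s.str.istitle() & s2.str.istitle()].groupby(level=0).agg(list)

# Look up each original row; rows with no pair fall back to a fresh empty list
df['New'] = [pairs.get(i, []) for i in df.index]
print(df[['ID', 'New']])
```

The `[]` default is evaluated on every iteration, so each row gets its own list object rather than a shared one.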
