Matching singular and plural forms with Pandas

    ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond", "strawberry"])
    df = pd.DataFrame(["1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts", "4 cups rolled oats",
                       "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
                       "6 ounces smoke-flavored almonds, finely chopped", "sdfgsfgsf", "fsfgsgsfgfg",
                       "2 small strawberries"])

    print(df)
                                                     val        existence
    0                        1 teaspoons vanilla extract  vanilla extract
    1                                             2 eggs              egg
    2                             3 cups chopped walnuts           walnut
    3                                 4 cups rolled oats              oat
    4  1 (10.75 ounce) can Campbell's Condensed Cream...              NaN
    5    6 ounces smoke-flavored almonds, finely chopped           almond
    6                                          sdfgsfgsf              NaN
    7                                        fsfgsgsfgfg              NaN
    8                               2 small strawberries              NaN

    #ingredients
    #inputwords        #outputword
    vanilla extract    vanilla extract
    walnut             walnut
    walnuts            walnut
    oat                oat
    oats               oat
    egg                egg
    eggs               egg
    almond             almond
    almonds            almond
    strawberry         strawberry
    strawberries       strawberry
    cherry             cherry
    cherries           cherry

Desired output:

                                                     val        existence
    0                        1 teaspoons vanilla extract  vanilla extract
    1                                             2 eggs              egg
    2                             3 cups chopped walnuts           walnut
    3                                 4 cups rolled oats              oat
    4  1 (10.75 ounce) can Campbell's Condensed Cream...              NaN
    5    6 ounces smoke-flavored almonds, finely chopped           almond
    6                                          sdfgsfgsf              NaN
    7                                        fsfgsgsfgfg              NaN
    8                               2 small strawberries       strawberry

2 answers

User

Reply #1 · edited 2024-04-26 04:46:23

    import numpy as np
    import pandas as pd

    # your data frame
    df = pd.DataFrame(data=["1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts", "4 cups rolled oats", "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup", "6 ounces smoke-flavored almonds, finely chopped", "sdfgsfgsf", "fsfgsgsfgfg", "2 small strawberries"])

    # here you create the mapping: every accepted word (singular or plural) -> output word
    mapping = pd.Series(index=['vanilla extract', 'walnut', 'walnuts', 'oat', 'oats', 'egg', 'eggs', 'almond', 'almonds', 'strawberry', 'strawberries', 'cherry', 'cherries'],
                        data=['vanilla extract', 'walnut', 'walnut', 'oat', 'oat', 'egg', 'egg', 'almond', 'almond', 'strawberry', 'strawberry', 'cherry', 'cherry'])

    # create a function that checks whether any mapped word occurs in the row's phrase;
    # it returns the output word of the last key that matches, or NaN if none does
    def get_match(row):
        match = np.nan
        for key, value in mapping.items():  # Series.iterkv() no longer exists; items() replaces it
            if key in row[0]:
                match = value
        return match

    # apply this function on each row
    df.apply(get_match, axis=1)
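
One thing to watch with the `key in ...` check above: it is a plain substring test, so `oat` would also be found inside a word like `coat`. Below is a minimal sketch of a vectorized variant that matches whole words only. It assumes the phrases live in a column named `val` and the result goes into `existence` (names borrowed from the question's printed output); the `\b` word boundaries and the dict-based mapping are my own choices, not part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame({"val": [
    "1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts",
    "4 cups rolled oats",
    "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
    "6 ounces smoke-flavored almonds, finely chopped",
    "sdfgsfgsf", "fsfgsgsfgfg", "2 small strawberries",
]})

# accepted word (singular or plural) -> output word
mapping = {
    "vanilla extract": "vanilla extract",
    "walnut": "walnut", "walnuts": "walnut",
    "oat": "oat", "oats": "oat",
    "egg": "egg", "eggs": "egg",
    "almond": "almond", "almonds": "almond",
    "strawberry": "strawberry", "strawberries": "strawberry",
    "cherry": "cherry", "cherries": "cherry",
}

# one alternation over all accepted words; \b keeps matches on whole words
pattern = r"\b(" + "|".join(re.escape(word) for word in mapping) + r")\b"

# first matching word per row (NaN when nothing matches), then map to the output word
found = df["val"].str.extract(pattern, flags=re.IGNORECASE, expand=False)
df["existence"] = found.str.lower().map(mapping)

print(df)
```

Note that this keeps the first match in each phrase, whereas the loop above keeps the last key that matches; for the sample data both give the same result.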

User

Reply #2 · edited 2024-04-26 04:46:23

Consider using a stemmer :) http://www.nltk.org/howto/stem.html

Taken straight from their page:

    >>> from nltk.stem.snowball import SnowballStemmer
    >>> stemmer = SnowballStemmer("english")
    >>> stemmer2 = SnowballStemmer("english", ignore_stopwords=True)
    >>> print(stemmer.stem("having"))
    have
    >>> print(stemmer2.stem("having"))
    having

Refactor your code so that every word in the sentence is stemmed before you match it against the ingredient list.
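
A rough sketch of what that refactoring could look like, in case it helps. It assumes the phrases sit in a column named `val` (as in the question's printed output); `stem_phrase` and `find_ingredient` are hypothetical helper names I made up for illustration, and the simple `[a-z]+` tokenizer is an assumption too. Both the ingredients and the sentences are stemmed with the same stemmer, so plural and singular forms end up comparable.

```python
import re

import numpy as np
import pandas as pd
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")


def stem_phrase(text):
    """Lowercase the text, keep alphabetic tokens only, and stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(token) for token in tokens)


ingredients = pd.Series(["vanilla extract", "walnut", "oat", "egg", "almond", "strawberry"])
df = pd.DataFrame({"val": [
    "1 teaspoons vanilla extract", "2 eggs", "3 cups chopped walnuts",
    "4 cups rolled oats",
    "1 (10.75 ounce) can Campbell's Condensed Cream of Chicken with Herbs Soup",
    "6 ounces smoke-flavored almonds, finely chopped",
    "sdfgsfgsf", "fsfgsgsfgfg", "2 small strawberries",
]})

# stemmed form of each ingredient -> original ingredient name
stemmed_ingredients = {stem_phrase(name): name for name in ingredients}


def find_ingredient(sentence):
    """Return the first ingredient whose stemmed form occurs as whole words, else NaN."""
    padded = f" {stem_phrase(sentence)} "  # padding makes the test whole-word only
    for stemmed, original in stemmed_ingredients.items():
        if f" {stemmed} " in padded:
            return original
    return np.nan


df["existence"] = df["val"].apply(find_ingredient)
print(df)
```

The upside over a hand-written singular/plural table is that new ingredients only need to be added to `ingredients`; the downside is that a stemmer can occasionally merge words you would rather keep apart, so it is worth checking the output on real data.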

nltk is a great tool for exactly what you are asking!

Cheers
