I have a question about counting words with Python.
The DataFrame has three columns (id, text, word).
First, here is a sample table:
[DataFrame]
```python
import pandas as pd

df = pd.DataFrame({
    "id": [
        "100",
        "200",
        "300"
    ],
    "text": [
        "The best part of Zillow is you can search/view thousands of home within a click of a button without even stepping out of your door.At the comfort of your home you can get all the details such as the floor plan, tax history, neighborhood, mortgage calculator, school ratings etc. and also getting in touch with the contact realtor is just a click away and you are scheduled for the home tour!As a first time home buyer, this website greatly helped me to study the market before making the right choice.",
        "I love all of the features of the Zillow app, especially the filtering options and the feature that allows you to save customized searches.",
        "Data is not updated spontaneously. Listings are still shown as active while the Mls shows pending or closed."
    ],
    # Note: each "word" entry is a plain string that looks like a list, not a Python list
    "word": [
        "[best, word, door, subway, rain]",
        "[item, best, school, store, hospital]",
        "[gym, mall, pool, playground]",
    ]
})
```
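I have already split the text into a dictionary.
So, for each row, I want to check the words in the word list against that row's text.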
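The question does not show how the text was split into a dictionary. One common way is `collections.Counter` over regex tokens. This is a minimal sketch, assuming case-insensitive matching and that any run of letters, digits or apostrophes counts as a word; `text_to_counts` is a hypothetical helper name.

```python
import re
from collections import Counter


def text_to_counts(text: str) -> Counter:
    """Hypothetical helper: split text into lowercase tokens and count them.

    Runs of letters, digits and apostrophes are treated as words, so
    punctuation such as the "." in "door.At" acts as a separator.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


# Third sample row from the question
text = "Data is not updated spontaneously. Listings are still shown as active while the Mls shows pending or closed."
counts = text_to_counts(text)

# A Counter returns 0 for keys it has never seen, which matches the
# "word: 0" entries in the desired output
print(counts["shows"], counts["gym"])  # prints: 1 0
```

Because a `Counter` answers 0 for absent words, looking up each entry of a row's word list in it directly gives the per-row dictionary the question asks for.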
This is the result I want:
| id  | word dict |
| --- | --------- |
| 100 | {best: 1, word: 0, door: 1, subway: 0, rain: 0} |
| 200 | {item: 0, best: 0, school: 0, store: 0, hospital: 0} |
| 300 | {gym: 0, mall: 0, pool: 0, playground: 0} |
Please take a look at this problem.
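We can use `re` to extract all the words in the list. Note that this only matches words in the list, not numbers. Then apply a function that returns a `dict` with the count of each word in the list. Finally, we can apply this function to `df` to create a new column.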
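The answer's code did not survive, so here is a minimal sketch of the approach it describes: a regex pulls the words out of the `word` string, and a row-wise function returns a dict of counts. It assumes case-insensitive, whole-word matching; `count_words` and the `word_count` column are hypothetical names.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": ["100", "200", "300"],
    "text": [
        "The best part of Zillow is you can search/view thousands of home within a click of a button without even stepping out of your door.At the comfort of your home you can get all the details such as the floor plan, tax history, neighborhood, mortgage calculator, school ratings etc. and also getting in touch with the contact realtor is just a click away and you are scheduled for the home tour!As a first time home buyer, this website greatly helped me to study the market before making the right choice.",
        "I love all of the features of the Zillow app, especially the filtering options and the feature that allows you to save customized searches.",
        "Data is not updated spontaneously. Listings are still shown as active while the Mls shows pending or closed.",
    ],
    "word": [
        "[best, word, door, subway, rain]",
        "[item, best, school, store, hospital]",
        "[gym, mall, pool, playground]",
    ],
})


def count_words(row) -> dict:
    """Hypothetical helper: count each listed word in the row's text."""
    # The word column is a string like "[best, word, door]";
    # [A-Za-z]+ keeps letters only, so digits are never matched
    words = re.findall(r"[A-Za-z]+", row["word"])
    text = row["text"].lower()
    # \b...\b counts whole words only, so "door" in "door.At" counts
    # but "oor" inside "floor" does not
    return {w: len(re.findall(rf"\b{re.escape(w.lower())}\b", text)) for w in words}


# axis=1 passes each row, so the function can see both "word" and "text";
# a dict return value is stored as one object per row
df["word_count"] = df.apply(count_words, axis=1)
print(df[["id", "word_count"]].to_string(index=False))
```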
Since the `word` column is of type string, convert it to a list first.
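The conversion code from this answer is missing. A minimal sketch with pandas string methods follows. `ast.literal_eval` would not work here because the items inside the brackets are unquoted. Only the `id` and `word` columns are rebuilt, to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["100", "200", "300"],
    "word": [
        "[best, word, door, subway, rain]",
        "[item, best, school, store, hospital]",
        "[gym, mall, pool, playground]",
    ],
})

# Strip the surrounding brackets, split on commas,
# then trim the space left in front of each item
df["word"] = (
    df["word"]
    .str.strip("[]")
    .str.split(",")
    .apply(lambda ws: [w.strip() for w in ws])
)
print(df)
```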
Now you can use `apply` with `axis=1`, plus some counting logic, to count each word.
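The answer's counting logic is not shown, so the following is one possible version, not the answerer's code. It repeats the list conversion so it runs on its own, tokenizes each text once (lowercase, with punctuation as a separator), and counts each listed word with `list.count`. `count_in_text` and the `word_dict` column are hypothetical names.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": ["100", "200", "300"],
    "text": [
        "The best part of Zillow is you can search/view thousands of home within a click of a button without even stepping out of your door.At the comfort of your home you can get all the details such as the floor plan, tax history, neighborhood, mortgage calculator, school ratings etc. and also getting in touch with the contact realtor is just a click away and you are scheduled for the home tour!As a first time home buyer, this website greatly helped me to study the market before making the right choice.",
        "I love all of the features of the Zillow app, especially the filtering options and the feature that allows you to save customized searches.",
        "Data is not updated spontaneously. Listings are still shown as active while the Mls shows pending or closed.",
    ],
    "word": [
        "[best, word, door, subway, rain]",
        "[item, best, school, store, hospital]",
        "[gym, mall, pool, playground]",
    ],
})

# Convert the "[a, b, c]" strings into Python lists
df["word"] = df["word"].str.strip("[]").str.split(",").apply(lambda ws: [w.strip() for w in ws])


def count_in_text(row) -> dict:
    """Hypothetical helper: map each word in the row's list to its count in the text."""
    # Lowercase tokens; "door.At" yields "door" and "at"
    tokens = re.findall(r"[a-z0-9']+", row["text"].lower())
    return {w: tokens.count(w.lower()) for w in row["word"]}


# One dict per row, stored in a new column
df["word_dict"] = df.apply(count_in_text, axis=1)
print(df[["id", "word_dict"]].to_string(index=False))
```

Tokenizing once per row and then counting keeps the matching rule in one place. For long word lists, looking the words up in a `collections.Counter` built from `tokens` would avoid rescanning the token list for every word.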