I'm trying to extract all the words from articles stored in a CSV file and write the sentence id number together with its words to a new CSV file.

Here is what I've tried:
import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize

df = pd.read_csv(r"D:\data.csv", nrows=10)

row = 0
sentNo = 0
while row < 1:
    # use sent_tokenize directly; `tokenizer` was never defined
    sentences = sent_tokenize(df['articles'][row])
    for index, sents in enumerate(sentences):
        sentNo += 1
        words = word_tokenize(sents)
        print(f'{sentNo}: {words}')
    row += 1
df['articles'][0] contains:
The ultimate productivity hack is saying no. Not doing something will always be faster than doing it. This statement reminds me of the old computer programming saying, “Remember that there is no code faster than no code.”
Taking only df['articles'][0], it gives the following output:
1:['The', 'ultimate', 'productivity', 'hack', 'is', 'saying', 'no', '.']
2:['Not', 'doing', 'something', 'will', 'always', 'be', 'faster', 'than', 'doing', 'it', '.']
3:['This', 'statement', 'reminds', 'me', 'of', 'the', 'old', 'computer', 'programming', 'saying', ',', '“', 'Remember', 'that', 'there', 'is', 'no', 'code', 'faster', 'than', 'no', 'code', '.', '”']
How can I write a new output.csv file in the format below, containing every sentence from all the articles in data.csv:
Sentence No | Word
1           | The
            | ultimate
            | productivity
            | hack
            | is
            | saying
            | no
            | .
2           | Not
            | doing
            | something
            | will
            | always
            | be
            | faster
            | than
            | doing
            | it
            | .
3           | This
            | statement
            | reminds
            | me
            | of
            | the
            | old
            | computer
            | programming
            | saying
            | ,
            | “
            | Remember
            | that
            | there
            | is
            | no
            | code
            | faster
            | than
            | no
            | code
            | .
            | ”
I'm new to Python and am using it in a Jupyter notebook. This is my first post on Stack Overflow; please correct me if I've done anything wrong. Thank you very much.
You just need to iterate over the words and write a new row for each one. This will be a bit unpredictable, though, because you have commas as "words" as well; you may want to consider another delimiter, or remove the commas from your word list.