Extracting text and comments from a Google Doc with Python

Asked 2025-04-14 18:34

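I need help extracting the comments from one of my Google Docs. In short, I want to get both the text that was commented on and the content of the comment box. For example, if I commented "This is not quite right" on the phrase "Hello World", I want to get both pieces of text. If I can't get both, the content of the comment box is the one I need most. This is the code I have so far: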

def read_comments(comments):
    comment_text = ''
    for comment in comments:
        comment_text += comment['content']
    return comment_text

def main():
    credentials = get_credentials()
    http = credentials.authorize(Http())
    docs_service = discovery.build(
        'docs', 'v1', http=http, discoveryServiceUrl=DISCOVERY_DOC)
    
    doc = docs_service.documents().get(documentId=DOCUMENT_ID_2).execute()
    doc_content = doc.get('body').get('content')

    comments = docs_service.documents().get(documentId=DOCUMENT_ID_2).execute().get('comments', [])
    comments_text = read_comments(comments)

    print(comments_text)

    sentences = sent_tokenize(comments_text)
    for sentence in sentences:
        sentence = "{This is a PB}" + sentence + "{This is a PB}"
        print(sentence)

if __name__ == '__main__':
    main()

Running this code raises no error, but it returns nothing either: the list is empty.

Tags: data processing, automation scripts, API calls, text extraction, comment extraction, google docs

1 Answer

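You need to use the Google Drive API, not the Google Docs API, to get the comments on a Google Docs file. Comments are not part of the document content; they are extra metadata attached to the file, so the document returned by the Docs API has no 'comments' key and your `.get('comments', [])` always falls back to the empty list. Below is a modified script that uses the Drive API to fetch each comment's content together with the file content it quotes: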

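def main():
    credentials = get_credentials()
    http = credentials.authorize(Http())
    # Build a Drive v3 client. DISCOVERY_DOC is not passed here, on the
    # assumption that it points at the Docs discovery document (as in the
    # question); the client's default discovery URL is used for Drive.
    gdrive_service = discovery.build("drive", "v3", http=http)

    # A Google Doc's document ID is also its Drive file ID.
    # comments().list requires the `fields` parameter in Drive v3.
    results = gdrive_service.comments().list(
        fileId=DOCUMENT_ID_2, fields='*'
    ).execute()
    comments = results.get("comments", [])

    # Now, each item in `comments` is a dictionary, with the following fields:
    # 'content', 'quotedFileContent', 'replies', 'author', 'deleted', 'htmlContent', ...
    # The 'content' field contains the comment text
    # The 'quotedFileContent' field is a dictionary whose 'value' entry
    # contains the text that was commented on

    comments_text = read_comments(comments)

    # Rest of the code
    ...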
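
To get the commented text and the comment together, as the question asks, the two fields described above can be paired up. The following is a minimal sketch, not part of the original script: `extract_comment_pairs` and `sample_comments` are hypothetical names, and the sample data is hand-written to resemble items from a Drive v3 `comments.list` response rather than taken from a real document.

```python
def extract_comment_pairs(comments):
    """Return (quoted_text, comment_text) tuples for each non-deleted comment."""
    pairs = []
    for comment in comments:
        # Skip comments flagged as deleted; they carry no useful content.
        if comment.get("deleted"):
            continue
        # 'quotedFileContent' may be absent when a comment is not anchored
        # to a text selection, so fall back to an empty string.
        quoted = comment.get("quotedFileContent", {}).get("value", "")
        pairs.append((quoted, comment.get("content", "")))
    return pairs


# Hypothetical sample shaped like items from a comments.list response.
sample_comments = [
    {
        "content": "This is not quite right",
        "quotedFileContent": {"mimeType": "text/html", "value": "Hello World"},
    },
    {"content": "General remark with no anchored text"},
]

for quoted, text in extract_comment_pairs(sample_comments):
    print(f"{quoted!r} -> {text!r}")
```

In `main()` this would be called as `extract_comment_pairs(comments)` in place of `read_comments(comments)`, which only concatenates the comment text and drops the quoted passage.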

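Note that the Google Drive API must be enabled for your project, and, if you authenticate as a service account, the document must be shared with the service account's email address.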
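
One more practical detail: the Drive API returns comments in pages (20 per page by default, at most 100), so a single `list` call may miss comments on a heavily annotated document. The sketch below follows `nextPageToken` until it runs out. `list_all_comments` is a hypothetical helper name, and `service` is assumed to be a Drive v3 client like the `gdrive_service` built above.

```python
def list_all_comments(service, file_id):
    """Collect comments from every page of a Drive v3 comments.list call."""
    comments = []
    page_token = None
    while True:
        params = {
            "fileId": file_id,
            # Request only what is needed, plus the token for the next page.
            "fields": "nextPageToken, comments(content, quotedFileContent, deleted)",
            "pageSize": 100,
        }
        if page_token:
            params["pageToken"] = page_token
        response = service.comments().list(**params).execute()
        comments.extend(response.get("comments", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return comments
```

A call such as `list_all_comments(gdrive_service, DOCUMENT_ID_2)` would then replace the single `comments().list(...)` request.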

Answered 2025-04-14 by Python大师
