How to get the worksheet ID of a Google Spreadsheet with Python?

Posted 2024-05-14 12:38:40


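I want to find a way to get the worksheet ID that appears in the URL of each worksheet in a Google Spreadsheet workbook. For example, the worksheet ID of "sheet2" in this workbook is "1244369280", because its URL is https://docs.google.com/spreadsheets/d/1yd8qTYjRns4_OT8PbsZzH0zajvzguKS79dq6j--hnTs/edit#gid=1244369280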
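If you already have a worksheet's URL, the numeric ID can be read straight from it: it is the `gid` value after `#gid=`. A minimal standard-library sketch; the helper name `gid_from_url` is hypothetical, and checking the query string as well as the fragment is a defensive assumption, since `gid` can also show up as `?gid=...`:

```python
from urllib.parse import urlparse, parse_qs


def gid_from_url(url: str) -> str:
    """Return the gid from a Google Sheets URL such as .../edit#gid=123."""
    parsed = urlparse(url)
    # The gid normally sits in the fragment ("#gid=..."); also look in the
    # query string ("?gid=...") in case the link uses that form.
    for part in (parsed.fragment, parsed.query):
        values = parse_qs(part).get("gid")
        if values:
            return values[0]
    raise ValueError(f"no gid found in {url!r}")


url = ("https://docs.google.com/spreadsheets/d/"
       "1yd8qTYjRns4_OT8PbsZzH0zajvzguKS79dq6j--hnTs/edit#gid=1244369280")
print(gid_from_url(url))  # 1244369280
```

This only helps when a URL is at hand; enumerating the IDs of every worksheet still needs an API call.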

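One approach I found is to extract the spreadsheet's XML. According to this question, the only way to get the worksheet ID is to stream down the worksheet's XML, but that example is written in VB.NET and I need to do it in Python.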

Here is the VB.NET code I would like to reproduce in Python:

  Dim worksheetFeed As WorksheetFeed
  Dim query As WorksheetQuery
  Dim worksheet As WorksheetEntry
  Dim output As New MemoryStream
  Dim xml As String
  Dim gid As String = String.Empty

  Try
    _service = New Spreadsheets.SpreadsheetsService("ServiceName")
    _service.setUserCredentials(UserId, Password)
    query = New WorksheetQuery(feedUrl)
    worksheetFeed = _service.Query(query)
    worksheet = worksheetFeed.Entries(0)

    ' Save worksheet feed to memory stream so we can 
    ' get the xml returned from the feed url and look for
    ' the gid.  Gid allows us to download the specific worksheet tab
    Using output
      worksheet.SaveToXml(output)
    End Using

    xml = Encoding.ASCII.GetString(output.ToArray())
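Once the worksheet feed's XML is in hand, the Python side of this lookup is plain Atom parsing. A minimal sketch, assuming (as the gspread-based answer further down does) that each worksheet `<entry>` has a `<link>` whose `href` carries `gid=` in its query string. `SAMPLE_FEED` is a hand-written stand-in, not a real API response, and the legacy feed API that produced this XML has since been retired by Google:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def gids_from_worksheet_feed(xml_text: str) -> dict:
    """Map each worksheet title in an Atom worksheet feed to its gid."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        # Take the first link whose query string contains a gid parameter.
        for link in entry.findall("atom:link", ATOM):
            gid = parse_qs(urlparse(link.get("href", "")).query).get("gid")
            if gid:
                result[title] = gid[0]
                break
    return result


# Hypothetical feed shape, trimmed to the elements the parser reads.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>sheet2</title>
    <link rel="alternate"
          href="https://docs.google.com/spreadsheets/d/KEY/edit?gid=1244369280"/>
  </entry>
</feed>"""

print(gids_from_worksheet_feed(SAMPLE_FEED))  # {'sheet2': '1244369280'}
```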

The best way to get the XML from a Google Spreadsheet seemed to be GData, so I downloaded GData and tried the Google Spreadsheet example with my credentials.

However, this returns the following error:

Traceback (most recent call last):
  File "/Users/Chris/Desktop/gdata_test.py", line 200, in <module>
    main()
  File "/Users/Chris/Desktop/gdata_test.py", line 196, in main
    sample.Run()
  File "/Users/Chris/Desktop/gdata_test.py", line 162, in Run
    self._PromptForSpreadsheet()
  File "/Users/Chris/Desktop/gdata_test.py", line 49, in _PromptForSpreadsheet
    feed = self.gd_client.GetSpreadsheetsFeed()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gdata/spreadsheet/service.py", line 99, in GetSpreadsheetsFeed
    converter=gdata.spreadsheet.SpreadsheetsSpreadsheetsFeedFromString)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gdata/service.py", line 1074, in Get
    return converter(result_body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gdata/spreadsheet/__init__.py", line 395, in SpreadsheetsSpreadsheetsFeedFromString
    xml_string)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/atom/__init__.py", line 93, in optional_warn_function
    return f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/atom/__init__.py", line 127, in CreateClassFromXMLString
    tree = ElementTree.fromstring(xml_string)
  File "<string>", line 125, in XML
cElementTree.ParseError: no element found: line 1, column 0
[Finished in 0.3s with exit code 1]
[shell_cmd: python -u "/Users/Chris/Desktop/gdata_test.py"]
[dir: /Users/Chris/Desktop]
[path: /usr/bin:/bin:/usr/sbin:/sbin]

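I should also mention that I have been using gspread to interact with Google Spreadsheets. When I run the code below I do get an ID back, but it is not the numeric worksheet ID (the gid) that appears in the URL: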

import gspread

# credentials: an OAuth credentials object created beforehand
gc = gspread.authorize(credentials)
sh = gc.open_by_url('google_spreadsheet_url')
sh.get_id_fields()
# returns {'spreadsheet_id': '1BgCEn-3Nor7UxOEPwD-qv8qXe7CaveJBrn9_Lcpo4W4', 'worksheet_id': 'oqitk0d'}

2 answers

January 2017

You can use the new Google Sheets API v4. Have a look at the pygsheets library, which uses API v4.

import pygsheets

# Authorize pygsheets
gc = pygsheets.authorize()

# Open the spreadsheet by its title
sh = gc.open('my new ssheet')

# Get the worksheet by its title and print its ID
print(sh.worksheet_by_title("my test sheet").id)
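Under the hood, API v4 exposes this number directly: a `spreadsheets.get` response lists every sheet under `sheets[].properties`, and `sheetId` there is the same number as the `gid` in the URL. A minimal sketch that works on that JSON once it has been fetched by any client; the helper name and the trimmed `response` dict are hypothetical stand-ins, not real API output:

```python
def sheet_ids_by_title(spreadsheet: dict) -> dict:
    """Map sheet titles to sheetId (the URL gid) in a spreadsheets.get response."""
    return {
        sheet["properties"]["title"]: sheet["properties"]["sheetId"]
        for sheet in spreadsheet.get("sheets", [])
    }


# Hand-written stand-in for a spreadsheets.get response, fields trimmed.
response = {
    "spreadsheetId": "1yd8qTYjRns4_OT8PbsZzH0zajvzguKS79dq6j--hnTs",
    "sheets": [
        {"properties": {"sheetId": 0, "title": "Sheet1"}},
        {"properties": {"sheetId": 1244369280, "title": "sheet2"}},
    ],
}

print(sheet_ids_by_title(response))  # {'Sheet1': 0, 'sheet2': 1244369280}
```

Note that `sheetId` is an integer in the JSON, while a gid parsed out of a URL is a string.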

Look at the self.gd_client.ProgrammaticLogin() call: it is the source of the main problem, because it uses the "ClientLogin" authorization method, which was first deprecated and then removed on April 20, 2015.
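The last line of the traceback ("no element found: line 1, column 0") is what ElementTree raises when it is handed an empty string. This suggests the server sent back an empty body, which fits a login method that no longer works. A small sketch that reproduces the message and adds a clearer guard; `parse_feed` is a hypothetical helper, not part of GData:

```python
import xml.etree.ElementTree as ET


def parse_feed(body):
    """Parse an XML response body (str or bytes), failing clearly if it is empty."""
    if not body.strip():
        raise ValueError("empty response body; check the authentication step")
    return ET.fromstring(body)


# Parsing an empty string reproduces the error from the traceback.
try:
    ET.fromstring("")
except ET.ParseError as exc:
    print(exc)  # no element found: line 1, column 0
```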

Instead, I would look into the newer and more actively developed gspread module.


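Here is a somewhat crazy example showing how to extract the actual "gid" value for a given spreadsheet and worksheet name. Note that you first need to generate the JSON file with the OAuth credentials (I assume you have already done that).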

The code (with comments added to aid understanding):

from urllib.parse import urlparse, parse_qs
import xml.etree.ElementTree as ET

import gspread
from oauth2client.service_account import ServiceAccountCredentials

SPREADSHEET_NAME = 'My Test Spreadsheet'
WORKSHEET_NAME = "Sheet2"

PATH_TO_JSON_KEYFILE = '/path/to/json/key/file.json'
NAMESPACES = {'ns0': 'http://www.w3.org/2005/Atom'}
SCOPES = ['https://spreadsheets.google.com/feeds']

# log in
credentials = ServiceAccountCredentials.from_json_keyfile_name(PATH_TO_JSON_KEYFILE, SCOPES)
gss_client = gspread.authorize(credentials)

# open spreadsheet
gss = gss_client.open(SPREADSHEET_NAME)

# extract the full feed url
# (_feed_entry is a private attribute of older gspread versions built on the legacy XML feed API)
root = gss._feed_entry
full_feed_url = next(elm.attrib["href"] for elm in root.findall("ns0:link", namespaces=NAMESPACES)
                     if "full" in elm.attrib["href"])

# get the feed and extract the gid value for a given sheet name
response = gss_client.session.get(full_feed_url)
root = ET.fromstring(response.content)
sheet_entry = next(elm for elm in root.findall("ns0:entry", namespaces=NAMESPACES)
                   if elm.find("ns0:title", namespaces=NAMESPACES).text == WORKSHEET_NAME)
link = next(elm.attrib["href"] for elm in sheet_entry.findall("ns0:link", namespaces=NAMESPACES)
            if "gid=" in elm.attrib["href"])

# extract "gid" from the link's query string
gid = parse_qs(urlparse(link).query)["gid"][0]
print(gid)

There also seems to be a way to convert the worksheet ID into a gid value; see:
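One commonly cited conversion works as follows. It is undocumented and was reverse-engineered by users rather than published by Google, so treat it as an assumption. The worksheet ID is base-36 decoded and XORed with a constant: 31578 for three-character IDs such as `od6`, and, for longer IDs like `oqitk0d`, 474 applied to everything after the first character. A sketch with a hypothetical helper name:

```python
def worksheet_id_to_gid(worksheet_id: str) -> int:
    """Convert a legacy worksheet ID (e.g. 'od6') to a numeric gid.

    Based on an undocumented, community-observed mapping; not an official API.
    """
    if len(worksheet_id) > 3:
        # Longer IDs: drop the leading character, base-36 decode, XOR with 474.
        return int(worksheet_id[1:], 36) ^ 474
    # Three-character IDs: base-36 decode, XOR with 31578.
    return int(worksheet_id, 36) ^ 31578


print(worksheet_id_to_gid("od6"))  # 0
```

`od6` decodes to exactly 31578, so it maps to gid 0. Verify any result against the sheet's actual URL before relying on it.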
