Need help creating a GAE datastore loader class?

Asked 2025-04-16 00:37

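I need help creating a GAE datastore loader class for uploading data with appcfg.py. Is there a simpler way to go about this? Is there a more detailed example than the one here?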

When I try to use bulkloader.yaml:

Uploading data records.
[INFO    ] Logging to bulkloader-log-20100701.041515
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20100701.041515.sql3
[INFO    ] Connecting to livelihoodproducer.appspot.com/remote_api
[INFO    ] Starting import; maximum 10 entities per post
[ERROR   ] [Thread-1] WorkerThread:
Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/adaptive_thread_pool.py", line 150, in WorkOnItems
    status, instruction = item.PerformWork(self.__thread_pool)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 693, in PerformWork
    transfer_time = self._TransferItem(thread_pool)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 848, in _TransferItem
    self.content = self.request_manager.EncodeContent(self.rows)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/bulkloader.py", line 1269, in EncodeContent
    entity = loader.create_entity(values, key_name=key, parent=parent)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 385, in create_entity
    return self.dict_to_entity(input_dict, self.bulkload_state)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 133, in dict_to_entity
    self.__run_import_transforms(input_dict, instance, bulkload_state_copy)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 233, in __run_import_transforms
    value = self.__dict_to_prop(transform, input_dict, bulkload_state)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/bulkloader_config.py", line 188, in __dict_to_prop
    value = transform.import_transform(value)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/bulkloader_parser.py", line 93, in __call__
    return self.method(*args, **kwargs)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/bulkload/transform.py", line 143, in generate_foreign_key_lambda
    return datastore.Key.from_path(kind, value)
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/datastore_types.py", line 387, in from_path
    'received %r (a %s).' % (i + 2, id_or_name, typename(id_or_name)))
BadArgumentError: Expected an integer id or string name as argument 2; received None (a NoneType).
[INFO    ] [Thread-3] Backing off due to errors: 1.0 seconds
[INFO    ] Unexpected thread death: Thread-1
[INFO    ] An error occurred. Shutting down...
[ERROR   ] Error in Thread-1: Expected an integer id or string name as argument 2; received None (a NoneType).

[INFO    ] 30 entites total, 0 previously transferred
[INFO    ] 0 entities (733 bytes) transferred in 2.8 seconds
[INFO    ] Some entities not successfully transferred

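During this process, I manually downloaded the CSV data and inserted it into appspot.com. When I try to upload my own CSV data, do the columns have to be in exactly the same order as in the CSV downloaded from appspot.com? And how should empty values be handled?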

2 Answers


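It looks like some of your reference properties have a value of None (i.e. they are empty), and the bulkloader's helper functions don't handle these None values correctly. The last frames of the traceback show generate_foreign_key_lambda in transform.py passing the value straight to datastore.Key.from_path(kind, value), which raises BadArgumentError when that value is None.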
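To find which records carry an empty reference before uploading, a minimal sketch like the one below can scan the CSV. It assumes (not verified against the bulkloader source) that the None in the traceback comes from a blank or missing cell in the reference column; the function name rows_with_empty_reference and the column name parent-comment are hypothetical.

```python
import csv
import io


def rows_with_empty_reference(csv_text, column):
    """Return the 1-based data-row numbers whose `column` cell is blank or missing.

    Assumption for this sketch: a blank or missing cell is what ends up as an
    empty (None) reference during the bulkload.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    empty_rows = []
    for row_number, row in enumerate(reader, start=1):
        # DictReader fills cells missing from short rows with None (its restval);
        # blank cells come back as ''. If the column is absent from the header,
        # row.get() also returns None, so every row is reported.
        value = row.get(column)
        if value is None or value.strip() == "":
            empty_rows.append(row_number)
    return empty_rows


sample = "key,title,parent-comment\nc1,First,\nc2,Reply,c1\nc3,Short\n"
print(rows_with_empty_reference(sample, "parent-comment"))  # [1, 3]
```

Rows 1 and 3 are reported: the first has a blank parent-comment cell, the third is a short row with no cell at all.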


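I created a file named config.yaml to configure the bulkloader and wrote a simple helper function that handles None references. I don't understand why the original helper doesn't already do this.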

The helper (file helpers.py) is very simple; just put it in the same folder as config.yaml:

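from google.appengine.api import datastore

def create_foreign_key(kind, key_is_id=False):
  """Like transform.create_foreign_key, but lets empty references through."""
  def generate_foreign_key_lambda(value):
    # Empty reference property: store None instead of calling
    # Key.from_path(kind, None), which raises BadArgumentError.
    if value is None:
      return None

    # Numeric ids arrive as text, so convert them before building the key.
    if key_is_id:
      value = int(value)
    return datastore.Key.from_path(kind, value)

  return generate_foreign_key_lambda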
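To check the None handling without the App Engine SDK installed, here is a minimal sketch of the same helper with datastore.Key.from_path swapped for a hypothetical stand-in, fake_from_path, that (like the real call in the traceback) rejects anything that is not an integer id or string name. The key_builder parameter is also hypothetical and exists only so the stand-in can be injected; it is not part of the helper above or of the bulkloader.

```python
# Hypothetical stand-in for datastore.Key.from_path so this sketch runs
# without the SDK; it returns a plain (kind, id_or_name) tuple instead of a Key.
def fake_from_path(kind, id_or_name):
    if isinstance(id_or_name, bool) or not isinstance(id_or_name, (int, str)):
        raise ValueError(
            "Expected an integer id or string name; received %r" % (id_or_name,))
    return (kind, id_or_name)


def create_foreign_key(kind, key_is_id=False, key_builder=fake_from_path):
    """Same logic as helpers.create_foreign_key, with an injectable key builder."""
    def generate_foreign_key_lambda(value):
        # Empty reference: keep it empty instead of building a key.
        if value is None:
            return None
        # Numeric ids are read as text, so convert before building the key.
        if key_is_id:
            value = int(value)
        return key_builder(kind, value)

    return generate_foreign_key_lambda


to_parent_key = create_foreign_key("ArticleComment")
print(to_parent_key(None))   # None
print(to_parent_key("c1"))   # ('ArticleComment', 'c1')
print(create_foreign_key("ArticleComment", key_is_id=True)("42"))  # ('ArticleComment', 42)
```

Calling fake_from_path("ArticleComment", None) directly raises, which mirrors the failure in the question; going through the wrapper returns None instead.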

Here is part of my config.yaml:

python_preamble:
- import: helpers # this will import our helper
[other imports]
...
- kind: ArticleComment
  connector: simplexml
  connector_options:
    xpath_to_nodes: "/blog/Comments/Comment"
    style: element_centric

  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: parent_comment
      external_name: parent-comment
      export_transform: transform.key_id_or_name_as_string
      import_transform: helpers.create_foreign_key('ArticleComment')
      #                 ^^^^^^^ here it is
      #                 use this instead of transform.create_foreign_key
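The config above reads comments from XML with style: element_centric, so each child element of a Comment node becomes a field. As a rough illustration of how a record can reach the import transform with parent-comment set to None, here is a sketch that uses the standard library's xml.etree.ElementTree; it approximates element-centric parsing and is not the bulkloader's simplexml connector. The function name comment_records and the sample XML are made up.

```python
import xml.etree.ElementTree as ET


def comment_records(xml_text):
    """Yield one dict per /blog/Comments/Comment node, mapping child tag -> text.

    A rough approximation of element-centric parsing, not the bulkloader's
    own connector.
    """
    root = ET.fromstring(xml_text)  # root is the <blog> element
    for node in root.findall("./Comments/Comment"):
        yield {child.tag: child.text for child in node}


SAMPLE = """<blog><Comments>
  <Comment><key>c1</key><parent-comment/></Comment>
  <Comment><key>c2</key><parent-comment>c1</parent-comment></Comment>
  <Comment><key>c3</key></Comment>
</Comments></blog>"""

for record in comment_records(SAMPLE):
    # An empty <parent-comment/> has text None; a missing one has no key at all.
    print(record["key"], record.get("parent-comment"))
```

Top-level comments (c1 and c3 here) have no parent, so their parent-comment value is None either way, which is exactly the case helpers.create_foreign_key passes through.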
