App Engine memcache / ndb.get_multi performance problem

Question

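I'm seeing very poor performance when fetching multiple keys from Memcache with ndb.get_multi(), and it is especially noticeable on App Engine.

I am fetching about 500 small objects, all of which are in memcache. If I do this with ndb.get_multi(keys), it typically takes 1500 ms or more. Here is typical output from App Stats: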

[Screenshots: App Stats and RPC Stats]

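As the screenshots show, all of the data is served from memcache. Most of the time is reported as being outside of RPC calls. However, my code is about as minimal as it can be, so if the time is being spent on CPU, it must be somewhere inside ndb: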

from google.appengine.api import memcache
from google.appengine.ext import ndb

# Get set of keys for items. This runs very quickly.
item_keys = memcache.get(items_memcache_key)
# Get ~500 small items from memcache. This is very slow (~1500ms).
items = ndb.get_multi(item_keys)

The first memcache.get you see in App Stats is the single fetch of the set of keys. The second memcache.get is the ndb.get_multi call.

The objects I am fetching are very simple:

class Item(ndb.Model):
    name = ndb.StringProperty(indexed=False)
    image_url = ndb.StringProperty(indexed=False)
    image_width = ndb.IntegerProperty(indexed=False)
    image_height = ndb.IntegerProperty(indexed=False)

Is this some kind of known ndb performance issue? Is it related to deserialization cost? Or is it a memcache issue?
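One rough way to check whether per-object overhead, rather than the network, dominates is to time decoding many small payloads against decoding a single aggregated blob. The sketch below is only a local, standard-library stand-in: it uses `json` rather than ndb's own serialization, and the helper names (`make_items`, `time_decode_individually`, `time_decode_blob`) and sample URLs are hypothetical. Its numbers say nothing about ndb itself; it only illustrates the shape of the measurement.

```python
import json
import time


def make_items(count):
    """Build `count` small records shaped like the Item model (made-up values)."""
    return [
        {
            "name": "item-%d" % i,
            "image_url": "http://example.com/%d.png" % i,
            "image_width": 100,
            "image_height": 80,
        }
        for i in range(count)
    ]


def time_decode_individually(items):
    """Decode one payload per item; return (decoded_items, elapsed_seconds)."""
    payloads = [json.dumps(item) for item in items]  # encoding is not timed
    start = time.time()
    decoded = [json.loads(payload) for payload in payloads]
    return decoded, time.time() - start


def time_decode_blob(items):
    """Decode all items from a single payload; return (decoded_items, elapsed_seconds)."""
    payload = json.dumps(items)  # encoding is not timed
    start = time.time()
    decoded = json.loads(payload)
    return decoded, time.time() - start


if __name__ == "__main__":
    items = make_items(521)
    _, per_item_seconds = time_decode_individually(items)
    _, blob_seconds = time_decode_blob(items)
    print("individual payloads: %.3f ms" % (per_item_seconds * 1000))
    print("single blob:         %.3f ms" % (blob_seconds * 1000))
```

If both figures come out small compared with the observed latency, the time is more likely going to something other than plain decoding of the payloads.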

I found that if, instead of fetching 500 objects, I aggregate all the data into a single blob, my function runs in 20 ms instead of more than 1500 ms:

import json

# Get set of keys for items. This runs very quickly.
item_keys = memcache.get(items_memcache_key)
# Get individual item data.
# If we get all the data from memcache as a single blob it is very fast (~20ms).
item_data = memcache.get(items_data_key)
if not item_data:
    # Cache miss: fetch the items once and store them as one JSON blob.
    items = ndb.get_multi(item_keys)
    flat_data = json.dumps([{'name': item.name} for item in items])
    memcache.add(items_data_key, flat_data)

This is interesting, but it is not really a solution for me, because the set of items I need to fetch is not static.
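For what it is worth, one possible way to keep the single-blob approach when the set of items varies is to derive the blob's cache key from the item keys themselves, so that each distinct set gets its own blob. The sketch below is an assumption, not something from the measurements above: `blob_cache_key` and the `items_blob:` prefix are hypothetical names, and it assumes each item key is available as a string (for an ndb key, for example via `key.urlsafe()`). Whether it helps depends on how often the same set is requested again.

```python
import hashlib


def blob_cache_key(item_key_strings, prefix="items_blob:"):
    """Return a cache key that depends only on the set of item keys.

    The keys are de-duplicated and sorted first, so the same set always
    maps to the same cache key regardless of order.
    """
    digest = hashlib.sha1()
    for key_string in sorted(set(item_key_strings)):
        digest.update(key_string.encode("utf-8"))
        digest.update(b"\0")  # separator, so ("ab", "c") differs from ("a", "bc")
    # A SHA-1 hex digest is 40 characters, so the result stays short.
    return prefix + digest.hexdigest()


if __name__ == "__main__":
    print(blob_cache_key(["key-b", "key-a", "key-c"]))
```

The returned string could then stand in for `items_data_key` in the snippet above.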

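Is the performance I am seeing normal? All of these measurements were taken on the default App Engine production configuration (F1 instance, shared memcache). Is this deserialization cost? Or is it due to fetching multiple keys from memcache? I don't think instance startup time is the issue. I profiled the code line by line using time.clock() calls and saw roughly similar numbers (3x faster than what AppStats shows, but still very slow). Here is a typical profile: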

# Fetch keys: 20 ms
# ndb.get_multi: 500 ms
# Number of keys is 521, fetch time per key is 0.96 ms
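
The per-key figure in this profile is simply the measured time divided by the number of keys (500 ms / 521 keys ≈ 0.96 ms). A minimal sketch of such a timing helper follows. It assumes Python 3 (`time.perf_counter` and `time.process_time`); on the Python 2.7 runtime the rough counterparts would be `time.time()` and `time.clock()`. It records wall-clock and CPU time separately because, on Unix-like systems, Python 2's `time.clock()` reports processor time, which excludes time spent waiting, so the two clocks can disagree. The names `Timing`, `timed`, and `per_key_ms` are hypothetical.

```python
import time
from contextlib import contextmanager


class Timing(object):
    """Holds the results of one timed block, in milliseconds."""
    wall_ms = 0.0
    cpu_ms = 0.0


@contextmanager
def timed():
    """Measure wall-clock and CPU time spent inside a `with` block."""
    timing = Timing()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield timing
    finally:
        # Filled in when the block exits, even if it raised.
        timing.wall_ms = (time.perf_counter() - wall_start) * 1000.0
        timing.cpu_ms = (time.process_time() - cpu_start) * 1000.0


def per_key_ms(total_ms, key_count):
    """Average time per key, in milliseconds."""
    if key_count <= 0:
        raise ValueError("key_count must be positive")
    return total_ms / float(key_count)


if __name__ == "__main__":
    with timed() as t:
        sum(i * i for i in range(100000))  # stand-in for the real fetch
    print("wall: %.2f ms, cpu: %.2f ms" % (t.wall_ms, t.cpu_ms))
    print("per key: %.2f ms" % per_key_ms(500.0, 521))  # per key: 0.96 ms
```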

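Update: Out of curiosity, I also profiled this with all of the App Engine performance settings raised to the maximum (F4 instance, 2400 MHz, dedicated memcache). Performance was not much better. On the faster instance, the App Stats timings now match my time.clock() profile (so fetching 500 small objects takes 500 ms rather than 1500 ms). However, this still seems extremely slow.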

