Azure表存储多行查询性能

2024-03-28 22:49:15 发布

您现在位置:Python中文网/ 问答频道 /正文

我们在使用Azure表存储的服务中遇到了问题,有时查询需要几秒钟(3到30秒)。这种情况每天都会发生,但只针对某些查询。我们对服务和表存储没有太大的负载(每小时大约有数百个调用)。但表存储仍然没有执行。在

慢查询都在执行过滤器查询,最多应该返回10行。我对过滤器进行了结构化,以便在or运算符之后始终有一对分区键和行键,然后是下一对分区键和行键:

(partitionKey1 and RowKey1) or (partitionKey2 and rowKey2) or (partitionKey3 and rowKey3)

所以目前我的前提是我需要将查询拆分为单独的查询。我用python脚本验证了这一点。在这里,当我重复同一个查询作为单个查询(带有或的组合查询,并期望得到多行结果)或拆分为多个在不同的踏板上执行的查询时,我看到组合查询时不时地变慢。在

^{pr2}$

下面是python代码的输出示例:

Time elapsed with multi threaded implementation: 0.10776209831237793
Time elapsed with single query: 0.2323908805847168
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.0897986888885498
Time elapsed with single query: 0.21547174453735352
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.08280491828918457
Time elapsed with single query: 3.2932426929473877
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.07794523239135742
Time elapsed with single query: 1.4898555278778076
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!
Time elapsed with multi threaded implementation: 0.07962584495544434
Time elapsed with single query: 0.20011520385742188
table item count since x: 5
Both queries returned same amount of results
Press enter to retry test!

我们有问题的服务是用C实现的,我还没有在C端复制python脚本得到的结果。与使用单个过滤器查询(返回所有必需的行)相比,将查询拆分为多个单独的查询时,我的性能似乎更差。在

因此,多次执行以下操作并等待所有操作完成似乎比较慢:

TableOperation getOperation =
                TableOperation.Retrieve<HqrScreenshotItemTableEntity>(partitionKey, id.ToString());
            TableResult result = await table.ExecuteAsync(getOperation);

而不是在单个查询中执行所有操作:

        private IEnumerable<MyTableEntity> GetBatchedItemsTableResult(Guid[] ids, string applicationLink)
        {
            var table = InitializeTableStorage();

            TableQuery<MyTableEntity> itemsQuery= 
                new TableQuery<MyTableEntity>().Where(TableQueryConstructor(ids, applicationLink));

            IEnumerable<MyTableEntity> result = table.ExecuteQuery(itemsQuery);

            return result;
        }

        public string TableQueryConstructor(Guid[] ids, string applicationLink)
        {
            var fullQuery = new StringBuilder();

            foreach (var id in ids)
            {
                    // Encode link before setting to partition key as REST GET requests 
                    // do not accept non encoded URL params by default)
                    partitionKey = HttpUtility.UrlEncode(applicationLink);


                // Create query for single row in a requested partition
                string queryForRow = TableQuery.CombineFilters(
                    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
                    TableOperators.And,
                    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, id.ToString()));

                if (fullQuery.Length == 0)
                {
                    // append query for first row

                    fullQuery.Append(queryForRow);
                }
                else
                {
                    // Append query for subsequent rows with or operator to make queries independent of each other.

                    fullQuery.Append($" {TableOperators.Or} ");
                    fullQuery.Append(queryForRow);
                }
            }

            return fullQuery.ToString();
        }

与C代码一起使用的测试用例与python测试非常不同。在C中,我从大约100000行的数据中查询2000行。如果以50行为一批查询数据,后一个过滤器查询会比在50个任务中运行的单行查询好。在

也许我应该重复一下在C中用python作为控制台应用程序所做的测试,看看.Net客户端api的行为是否与python性能相同。在


Tags: oftotimecountwithtableitemquery
3条回答

张贴作为答复,因为它越来越大的评论。在

您是否可以尝试将查询更改为以下内容:

(PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_ed6d31b0' and RowKey eq 'ed6d31b0-d2a3-4f18-9d16-7f72cbc88cb3') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9be86f34' and RowKey eq '9be86f34-865b-4c0f-8ab0-decf928dc4fc') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_97af3bdc' and RowKey eq '97af3bdc-b827-4451-9cc4-a8e7c1190d17') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_9d557b56' and RowKey eq '9d557b56-279e-47fa-a104-c3ccbcc9b023') or (PartitionKey eq 'http%3a%2f%2fsome_website.azurewebsites.net%2fApiName_e251a31a' and RowKey eq 'e251a31a-1aaa-40a8-8cde-45134550235c')

我认为您应该使用多线程实现,因为它包含多个点查询。在单个查询中执行all-in-single查询可能会导致表扫描。正如official doc提到的:

Using an "or" to specify a filter based on RowKey values results in a partition scan and is not treated as a range query. Therefore, you should avoid queries that use filters such as: $filter=PartitionKey eq 'Sales' and (RowKey eq '121' or RowKey eq '322')

您可能认为上面的示例是两个点查询,但实际上它会导致一个分区扫描。在

对我来说,这里的答案似乎是,在表存储上执行查询并没有像您预期的那样优化为使用OR运算符。当查询将点查询与OR运算符组合时,它不作为点查询处理。在

这可以在python、C和azurestorageexplorer中重现,在这些浏览器中,如果您将点查询与OR结合起来,则会比只返回一行的单独点查询慢10倍(甚至更多)。在

因此,获得具有分区键和行键的行数的最有效方法是使用TableOperation.Retrieve(在C中)进行单独的异步查询。使用TableQuery是非常低效的,并且不会产生任何接近Azure表存储的性能可伸缩性目标的结果。可伸缩性目标例如:“单个表分区(1kib实体)的目标吞吐量高达每秒2000个实体”。在这里,我甚至不能以每秒5行的速度提供服务,尽管所有的行都在不同的分区中。在

在任何文档或性能优化指南中,查询性能的这一限制都没有明确说明,但可以从Azure storage performance checklist中的这些行中了解到:

Querying

This section describes proven practices for querying the table service.

Query scope

There are several ways to specify the range of entities to query. The following is a discussion of the uses of each.

In general, avoid scans (queries larger than a single entity), but if you must scan, try to organize your data so that your scans retrieve the data you need without scanning or returning significant amounts of entities you don't need.

Point queries

A point query retrieves exactly one entity. It does this by specifying both the partition key and row key of the entity to retrieve. These queries are efficient, and you should use them wherever possible.

Partition queries

A partition query is a query that retrieves a set of data that shares a common partition key. Typically, the query specifies a range of row key values or a range of values for some entity property in addition to a partition key. These are less efficient than point queries, and should be used sparingly.

Table queries

A table query is a query that retrieves a set of entities that does not share a common partition key. These queries are not efficient and you should avoid them if possible.

所以“一个点查询只检索一个实体”和“尽可能使用点查询”。因为我已经将数据拆分为分区,所以它可能被当作表查询来处理:“表查询是检索不共享公共分区键的一组实体的查询”。尽管查询组合了一组点查询,因为它列出了所有预期实体的分区键和行键。但是,由于组合查询不是只检索一个查询,所以不能期望它作为点查询(或点查询集)执行。在

相关问题 更多 >