How do I loop over a nested structure with Jinja in dbt?

2 answers

User

Answer 1 · edited 2024-05-29 04:16:19

I'm not sure I 100% follow your data structure, but assuming it looks something like this:

{
  "properties": {
    "property1": {
      "column1": "...",
      "column2": "...",
      "column3": "...",
      "value": "my value 1.0"
    },
    "property2": {
      "column1": "...",
      "column2": "...",
      "column3": "...",
      "value": "my value 2.0"
    },
    "propertyX": {
      "column1": "...",
      "column2": "...",
      "column3": "...",
      "value": "my value 3.0"
    }
  }
}

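As you mentioned, you need to use set to create variables and be able to manipulate the data. Personally, I like to create separate variables for the query statement, the query result, and the query values. Following that approach, you would end up with something like this: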

{% set data_structure_query %}
    select properties from src
{% endset %}

{% set results = run_query(data_structure_query) %}

{% set properties = results.columns[0].values() %}

Note that results.columns[0].values() returns the data from the first column of your query, which in this case is properties.

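.values() returns the column's values as a tuple, whose items are usually strings. So, to access the attributes of the data, you have to deserialize the JSON string into a Python object such as a dict. To do that, use the fromjson function: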

...

{% set properties = results.columns[0].values() %}

{% set properties_dict = fromjson(properties[0]) %}

...

Assuming your query returns only one row containing the JSON, I used properties[0] to access the first row of the query result.
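To make the deserialization step concrete outside dbt, here is a rough plain-Python analogue. It assumes that fromjson behaves much like json.loads on a JSON string and that .values() yields a tuple with one string per row. The column_values tuple is made-up sample data, not real query output.

```python
import json

# Hypothetical stand-in for results.columns[0].values(): one JSON string per row.
column_values = (
    '{"properties": {"property1": {"value": "my value 1.0"}, '
    '"property2": {"value": "my value 2.0"}}}',
)

# Rough equivalent of fromjson(properties[0]): parse the first row into a dict.
properties_dict = json.loads(column_values[0])

# Iterating a dict yields its keys, which is what the Jinja for loop relies on.
property_names = list(properties_dict["properties"])
first_value = properties_dict["properties"]["property1"]["value"]
print(property_names, first_value)
```

If the column holds more than one row, each element of the tuple would need to be parsed separately.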

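Before moving on to the next step, it is important to know that dbt has a Jinja variable, execute, that tells us when dbt is in "execute mode". We need to care about it because it can cause problems when building the model. In short, any Jinja that relies on a result being returned from the database will throw an error during the parse phase.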

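In your case, the results variable depends on a query that has to be executed in the database, which means that if you just try to run the model you will most likely hit a Compilation Error. To avoid this, add an if condition that checks whether dbt is in "execute mode":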

...

{% set results = run_query(data_structure_query) %}

{% if execute %}
    {% set properties = results.columns[0].values() %}
    {% set properties_dict = fromjson(properties[0]) %}
{% else %}
    {% set properties = [] %}
    {# fall back to an empty dict so the type matches the fromjson result #}
    {% set properties_dict = {} %}
{% endif %}

...
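The guard can be pictured in plain Python as follows. This is only a sketch of the control flow, assuming that no usable query result exists while execute is false. The load_properties helper and its arguments are hypothetical and not part of dbt.

```python
import json

def load_properties(results, execute):
    """Return the parsed first row when executing, else an empty dict."""
    if not execute or results is None:
        # Parse phase: no query has run, so fall back to an empty structure.
        return {}
    # Execute phase: results is assumed to be a tuple of JSON strings.
    return json.loads(results[0])

# Hypothetical inputs for the two phases.
parse_phase = load_properties(None, execute=False)
run_phase = load_properties(
    ('{"properties": {"property1": {"value": "x"}}}',), execute=True
)
print(parse_phase, run_phase)
```

The point is that the code touching the result only runs in the branch where a result is known to exist.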

Finally, you can use a loop to build the columns:

select
{%- for property in properties_dict.properties %}
    {{ property }}.value
    {%- if not loop.last %},{% endif -%}
{%- endfor %}
from 
...

This compiles to:

select
    property1.value,
    property2.value,
    propertyX.value
from
...

If you want to access the value of each property instead:

select
{%- for property in properties_dict.properties %}
    '{{ properties_dict.properties[property].value }}'
    {%- if not loop.last %},{% endif -%}
{%- endfor %}
from
...

Which compiles to:

select
    'my value 1.0',
    'my value 2.0',
    'my value 3.0'
from
...
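The two loops above can be mimicked in plain Python to check what the rendered SQL should look like. This is a sketch only: render_select is a hypothetical helper, the sample dict copies the assumed structure from the top of this answer, and the exact whitespace dbt emits may differ.

```python
def render_select(properties: dict, literal_values: bool = False) -> str:
    """Mimic the Jinja loop: one select item per key, comma-separated."""
    items = []
    for name, attrs in properties.items():
        if literal_values:
            # Mirrors '{{ properties_dict.properties[property].value }}'
            items.append(f"'{attrs['value']}'")
        else:
            # Mirrors {{ property }}.value
            items.append(f"{name}.value")
    # join() puts commas between items only, like the loop.last check.
    return "select\n    " + ",\n    ".join(items) + "\nfrom"

# Sample data shaped like the assumed structure in this answer.
sample = {
    "property1": {"value": "my value 1.0"},
    "property2": {"value": "my value 2.0"},
    "propertyX": {"value": "my value 3.0"},
}

print(render_select(sample))
print(render_select(sample, literal_values=True))
```

Iterating a dict yields its keys, which is why the second variant has to index back into the dict to reach each value.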

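It may be worth looking at your database/warehouse to check whether it has built-in functions for handling semi-structured data. That can also help you understand the logic. For example, Snowflake has lateral flatten, which does something similar by splitting the properties into multiple rows.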

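For debugging, I suggest you compile your model and use logging ({{ log('my message', info=True) }}) to see how dbt/Jinja handles the data. Depending on the output of your query, some of the code I provided may need to change.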

Some useful links:

https://docs.getdbt.com/reference/dbt-jinja-functions/run_query

https://docs.getdbt.com/reference/dbt-jinja-functions/execute

https://docs.getdbt.com/reference/dbt-jinja-functions/fromjson/

https://docs.getdbt.com/tutorial/using-jinja

User

Answer 2 · edited 2024-05-29 04:16:19

Assuming your data structure looks like this:

{
  "properties": [
    {
      "value": "Value 1"
    },
    {
      "value": "Value 2"
    },
    ...
  ]
}

You just need to move the .value lookup inside the variable delimiters: {{ property.value }}
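A small plain-Python sketch shows why this works for the list-shaped structure. The raw string is made-up sample data matching the shape above, and json.loads stands in for whatever deserialization step is used in dbt.

```python
import json

# Made-up sample matching the list-shaped structure in this answer.
raw = '{"properties": [{"value": "Value 1"}, {"value": "Value 2"}]}'
data = json.loads(raw)

# Each loop item is already the inner object, so the value is read from it
# directly, which is what {{ property.value }} does in Jinja.
values = [prop["value"] for prop in data["properties"]]
print(values)
```

With a list, the loop variable is the element itself, not a key, so no second lookup into the parent structure is needed.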
