Extracting distinct values from YAML

Published 2024-04-26 00:07:43


I need to use a Python script to extract the values of --my_dataset, --my_table and --s3_temp_path from the file below.

My file:

▶ cat my_datasets/my_file.yml
__global__:
  role: myrole
  contact: sam@user.com

__default__:
  cc_policy: VERY_NEW
  act_num: 16384
  react_num: 16384
  with_start: 1
  where_to: my_file.log
  class: myClass
  my_arguements: >-
    -Dmy.num.1=4096
    -Dmy.num.2=true
    -Dmy.num.3=fgcd
  is_it: true
  if_not: false
  compure: dc1
  env: test
  my_compute: res-dc
  config: /my/file/config

first_adhoc:
  my_space: my_transfer
  doodle: my_transfer.tar.gz
  jar: my_transfer.jar
  my_dir: "dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11:my-deploy"
  my_arguments: >-
    m.big.class
    --sdrs
    --tz UTC
    --env test
    --my_dataset my_analytics
    --my_table onboarding_client_events
    --current_date 2020-09-22
    --my_project my_aws_project
    --s3_temp_path s3://test-wierd/
    --my_key_json dir1/dir2/dir3/dir4/keys.json
    --my_auth_file dir1/dir2/dir3/dir4/gcp/my_new.yml
    --my_proxy example.com:9999
    --write_mode write
    --update_option option1 option2

first_cron:
  my_space: my_transfer
  doodle: my_transfer.tar.gz
  jar: my_transfer.jar
  my_dir: "dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11:my-deploy"
  my_arguments: >-
    m.big.class
    --sdrs
    --tz UTC
    --env test
    --my_dataset my_analytics
    --my_table i_wish
    --current_date 2020-09-22
    --my_project my_aws_project
    --s3_temp_path s3://test-wierd/
    --my_key_json dir1/dir2/dir3/dir4/keys.json
    --my_auth_file dir1/dir2/dir3/dir4/gcp/my_new.yml
    --my_proxy example.com:9999
    --write_mode write
    --update_option option1 option2
  cron_schedule: "* * 4 * *"

I have many files like the one above under base_path, and from all of them I need to get the values of --my_dataset, --my_table and --s3_temp_path.

Below is what I have so far. I can recursively find every file named my_file.yml, but I cannot extract the distinct values mentioned above.

My script:

import fnmatch
import os
import re
import yaml


user_path = os.path.expanduser('~')
source_path = user_path + "/where/are/"
base_path = source_path + "/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10"


def find(pattern, base_path):
    results = []
    for root, dirs, files in os.walk(base_path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                results.append(os.path.join(root, name))

    for result in results:
        stream = open(result, 'r')
        dictionary = yaml.load(stream)
        for key, value in dictionary.items():
            res = dict((k, dictionary[k]) for k in ['my_dataset', 'my_table', 's3_temp_path' ] 
                                        if k in dictionary) 
            print (key + " : " + str(value))

print find('my_file.yml', base_path)

Current result:

▶ python myWork.py
{}
{}
{}
[... 148 more identical {} lines omitted ...]
None

Expected result:

{"my_dataset": "my_analytics", "my_table": "i_wish", "s3_temp_path": "s3://test-wierd/"}

2 answers

Thanks for the replies. I was able to solve the problem with the code below:

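import fnmatch
import os
import yaml

# Flags to pull out of each my_arguments string.
# Assumption: col_names was not defined in the original post.
col_names = ['--my_dataset', '--my_table', '--s3_temp_path']


def getNames(pattern, base_path):
    # Collect every file under base_path whose name matches the pattern.
    rsls = []
    for root, dirs, files in os.walk(base_path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                rsls.append(os.path.join(root, name))
    tbls = []
    for rsl in rsls:
        with open(rsl, 'r') as stream:
            dictionary = yaml.safe_load(stream)
        for key, value in dictionary.items():
            if 'my_arguments' in value:
                arg_val = value['my_arguments']
                tname_fractions = []
                for col in col_names:
                    # The value is the first token after the flag name.
                    col_val = arg_val.split(col)[1].strip().split()[0]
                    tname_fractions.append(col_val)
                tbls.append(','.join(tname_fractions))
    # Deduplicate so that each combination is returned once.
    return list(set(tbls))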

Your code gives the impression that you just wrote some random lines without thinking about the structure you are trying to navigate.

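First, let's look at where the values are: they are contained in a long folded scalar. This gets loaded into a Python string, and you cannot query the values directly by name, because YAML does not understand that they are command-line arguments. So let's write a function that extracts one value from this string: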

def get_cmdline_arg(name, cmdline):
  found = False
  for item in cmdline.split():
    if found: return item
    # if the current item is the searched name, the next item will be
    # its value
    found = (item == name)
  return "<not found>"
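
As a variation on the function above, the whole string can be parsed once into a dictionary, which also keeps flags that take several values (such as --update_option) or none (such as --sdrs). This is only a sketch: parse_cmdline is a hypothetical helper that is not part of the original answer, and it assumes every flag starts with two dashes.

```python
import shlex


def parse_cmdline(cmdline):
    """Map each --flag in cmdline to the list of values that follow it."""
    parsed = {}
    current = None
    for token in shlex.split(cmdline):
        if token.startswith('--'):
            # A new flag starts; drop the leading dashes for the key.
            current = token[2:]
            parsed[current] = []
        elif current is not None:
            parsed[current].append(token)
        # Tokens before the first flag (the class name) are ignored.
    return parsed


# Shortened version of a my_arguments string after YAML has folded it.
sample = ("m.big.class --sdrs --tz UTC --my_dataset my_analytics "
          "--my_table i_wish --s3_temp_path s3://test-wierd/ "
          "--update_option option1 option2")
args = parse_cmdline(sample)
print(args['my_table'])
print(args['update_option'])
```

Every value is a list, so a single-valued flag is read as args['my_table'][0].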

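Now let's look at how to get the my_arguments strings out of the structure. Your YAML has four keys at its root level: __global__, __default__, first_adhoc and first_cron. The data you are searching for is in first_adhoc and first_cron, so let's first iterate over those two (starting from the dictionary value loaded in your code):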

for k in ['first_adhoc', 'first_cron']:
  arguments = dictionary[k]['my_arguments']

Now that we have the my_arguments value, we only need to get the argument values:

  res = {}
  for name in ['my_dataset', 'my_table', 's3_temp_path']:
    res[name] = get_cmdline_arg('--' + name, arguments)
  print(k + ": " + str(res))
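
The steps above can be combined into one sketch that does not hard-code the section names and returns each distinct combination only once, which is what the question asks for across many files. Both helpers are hypothetical, and the sketch assumes the dictionaries were already loaded (for example with yaml.safe_load on each matching file), so it runs on inline sample data.

```python
def get_cmdline_arg(name, cmdline):
    """Return the token that follows name in cmdline, or None if absent."""
    items = cmdline.split()
    for i, item in enumerate(items[:-1]):
        if item == name:
            return items[i + 1]
    return None


def distinct_args(documents, names):
    """Collect distinct value combinations for names across documents.

    documents is an iterable of dictionaries as produced by a YAML loader.
    """
    seen = []
    for dictionary in documents:
        for section in dictionary.values():
            # Skip sections such as __global__ that have no my_arguments.
            if not isinstance(section, dict) or 'my_arguments' not in section:
                continue
            arguments = section['my_arguments']
            values = tuple(get_cmdline_arg('--' + n, arguments) for n in names)
            if values not in seen:
                seen.append(values)
    return [dict(zip(names, values)) for values in seen]


# Inline stand-in for one loaded my_file.yml (shortened argument strings).
doc = {
    '__global__': {'role': 'myrole'},
    'first_adhoc': {'my_arguments': 'm.big.class --my_dataset my_analytics '
                                    '--my_table onboarding_client_events '
                                    '--s3_temp_path s3://test-wierd/'},
    'first_cron': {'my_arguments': 'm.big.class --my_dataset my_analytics '
                                   '--my_table i_wish '
                                   '--s3_temp_path s3://test-wierd/'},
}
names = ['my_dataset', 'my_table', 's3_temp_path']
# The same document twice simulates two files with identical content.
result = distinct_args([doc, doc], names)
for row in result:
    print(row)
```

A flag missing from a section shows up as None in that row rather than raising an error.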
