Extracting different values from YAML

▶ cat my_datasets/my_file.yml
__global__:
  role: myrole
  contact: sam@user.com
__default__:
  cc_policy: VERY_NEW
  act_num: 16384
  react_num: 16384
  with_start: 1
  where_to: my_file.log
  class: myClass
  my_arguements: >-
    -Dmy.num.1=4096
    -Dmy.num.2=true
    -Dmy.num.3=fgcd
  is_it: true
  if_not: false
  compure: dc1
  env: test
  my_compute: res-dc
  config: /my/file/config
first_adhoc:
  my_space: my_transfer
  doodle: my_transfer.tar.gz
  jar: my_transfer.jar
  my_dir: "dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11:my-deploy"
  my_arguments: >-
    m.big.class
    --sdrs
    --tz UTC
    --env test
    --my_dataset my_analytics
    --my_table onboarding_client_events
    --current_date 2020-09-22
    --my_project my_aws_project
    --s3_temp_path s3://test-wierd/
    --my_key_json dir1/dir2/dir3/dir4/keys.json
    --my_auth_file dir1/dir2/dir3/dir4/gcp/my_new.yml
    --my_proxy example.com:9999
    --write_mode write
    --update_option option1 option2
first_cron:
  my_space: my_transfer
  doodle: my_transfer.tar.gz
  jar: my_transfer.jar
  my_dir: "dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11:my-deploy"
  my_arguments: >-
    m.big.class
    --sdrs
    --tz UTC
    --env test
    --my_dataset my_analytics
    --my_table i_wish
    --current_date 2020-09-22
    --my_project my_aws_project
    --s3_temp_path s3://test-wierd/
    --my_key_json dir1/dir2/dir3/dir4/keys.json
    --my_auth_file dir1/dir2/dir3/dir4/gcp/my_new.yml
    --my_proxy example.com:9999
    --write_mode write
    --update_option option1 option2
  cron_schedule: "* * 4 * *"

import fnmatch
import os
import re
import yaml

user_path = os.path.expanduser('~')
source_path = user_path + "/where/are/"
base_path = source_path + "/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10"

def find(pattern, base_path):
    results = []
    for root, dirs, files in os.walk(base_path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                results.append(os.path.join(root, name))
    for result in results:
        stream = open(result, 'r')
        dictionary = yaml.load(stream)
        for key, value in dictionary.items():
            res = dict((k, dictionary[k]) for k in ['my_dataset', 'my_table', 's3_temp_path' ] if k in dictionary)
            print (key + " : " + str(value))

print find('my_file.yml', base_path)

▶ python myWork.py {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} {} None

2 Answers

User

#1 · Edited 2024-04-26 00:07:43

Thanks for the replies. I was able to solve the problem with the code below.

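import fnmatch
import os
import yaml

# Assumed flag names; the original post never showed how col_names was defined.
col_names = ['--my_dataset', '--my_table', '--s3_temp_path']

def getNames(pattern, base_path):
    # Collect every file under base_path whose name matches the pattern.
    rsls = []
    for root, dirs, files in os.walk(base_path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                rsls.append(os.path.join(root, name))
    tbls = []
    for rsl in rsls:
        with open(rsl, 'r') as stream:
            dictionary = yaml.safe_load(stream)
        for key, value in dictionary.items():
            # Only the job sections carry a my_arguments string.
            if 'my_arguments' in value:
                arg_val = value['my_arguments']
                tname_fractions = []
                for col in col_names:
                    # The token right after the flag is its value.
                    col_val = arg_val.split(col)[1].strip().split()[0]
                    tname_fractions.append(col_val)
                tbls.append(','.join(tname_fractions))
    # De-duplicate before returning.
    return list(set(tbls))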

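User

#2 · Edited 2024-04-26 00:07:43

Your code gives the impression that you wrote some random lines without thinking about the structure you are trying to navigate.

First, let's look at where the values are: they are contained in one long folded scalar. This loads into Python as a single string, and you cannot query the values directly by name, because YAML does not know that they are command-line arguments. So let's write a function that extracts one value from this string: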

def get_cmdline_arg(name, cmdline):
  found = False
  for item in cmdline.split():
    if found: return item
    # if the current item is the searched name, the next item will be
    # its value
    found = (item == name)
  return "<not found>"
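
As a complement to the function above, here is a minimal sketch that parses the whole string once into a dictionary, so several values can be looked up without rescanning. The name `parse_cmdline` and the rule that every token starting with `--` is a flag are assumptions made for illustration; they are not part of the original answer.

```python
import shlex

def parse_cmdline(cmdline):
    """Map each --flag to the list of tokens that follow it."""
    parsed = {}
    current = None
    for token in shlex.split(cmdline):
        if token.startswith('--'):
            # A new flag starts; drop the leading dashes for the key.
            current = token[2:]
            parsed[current] = []
        elif current is not None:
            parsed[current].append(token)
        # Tokens before the first flag (e.g. the class name) are skipped.
    return parsed

# Shortened sample taken from the my_arguments value in the question.
sample = ("m.big.class --sdrs --tz UTC --my_dataset my_analytics "
          "--my_table i_wish --update_option option1 option2")
args = parse_cmdline(sample)
print(args['my_table'])
print(args['update_option'])
print(args['sdrs'])
```

Keeping a list per flag means that flags without a value (`--sdrs`) and flags with several values (`--update_option`) both fit the same shape.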

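Now let's see how to get the my_arguments string out of the structure. The YAML has four keys at its root level: __global__, __default__, first_adhoc and first_cron. The data you are looking for lives in first_adhoc and first_cron, so let's iterate over those two first (starting from the dictionary value loaded in your code):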

for k in ['first_adhoc', 'first_cron']:
  arguments = dictionary[k]['my_arguments']

Now that we have the my_arguments value, we just need to get the argument values:

  res = {}
  for name in ['my_dataset', 'my_table', 's3_temp_path']:
    # the flags appear as --name in the string, so search with that prefix
    res[name] = get_cmdline_arg('--' + name, arguments)
  print(k + ": " + str(res))
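
For reference, the pieces of this answer can be put together into one runnable sketch. It assumes PyYAML is installed and uses a trimmed-down inline stand-in for my_file.yml (the `DOC` string and the `results` dictionary are illustrative, not from the question). It also assumes the flags are looked up with their `--` prefix, as they appear in the file.

```python
import yaml

def get_cmdline_arg(name, cmdline):
    # Return the token that directly follows `name`, if any.
    found = False
    for item in cmdline.split():
        if found:
            return item
        found = (item == name)
    return "<not found>"

# Shortened stand-in for my_file.yml, keeping the same layout.
DOC = """
__default__:
  env: test
first_adhoc:
  my_arguments: >-
    m.big.class --sdrs --tz UTC
    --my_dataset my_analytics
    --my_table onboarding_client_events
    --s3_temp_path s3://test-wierd/
first_cron:
  my_arguments: >-
    m.big.class --sdrs --tz UTC
    --my_dataset my_analytics
    --my_table i_wish
    --s3_temp_path s3://test-wierd/
"""

dictionary = yaml.safe_load(DOC)
results = {}
for k in ['first_adhoc', 'first_cron']:
    # The folded scalar arrives as one space-joined string.
    arguments = dictionary[k]['my_arguments']
    results[k] = {name: get_cmdline_arg('--' + name, arguments)
                  for name in ['my_dataset', 'my_table', 's3_temp_path']}
    print(k + ": " + str(results[k]))
```

`yaml.safe_load` is used here instead of `yaml.load` because the file contains only plain mappings and strings, and recent PyYAML versions require an explicit loader for `yaml.load`.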
