Python: Iterate through YAML file and output all relevant settings from hierarchy

I'm new to Python and am looking for the best way to load a YAML file into Python and iterate through it to collect the relevant appSettings for a specific instance of an app.

For example this YAML structure:

UAT:
    Configuration:
    #config relevant to all servers on UAT
    - appSettings:
        AWSAccessKey: ExampleKey
        AWSSecretKey: ExampleSecret
        AWSRegion: ExampleRegion
    Servers:
    - Server1:
        #config relevant to all apps on UAT>Server1
        Configuration:
        - appSettings:
            Key1: true
            Key2: '123'
        Apps:
        - Engine1:
            #config relevant to all Apps of type UAT>Server1>Engine1
            version: 1.2
            appSettings:
                Key3: 'abc'
                Key4: 'def'
                Key5: 'abc-123'
            Instance:
            - Instance1:
                path: 'examplepath'
                appSettings:
                    Key6: 'A1B1C1'
                    Key7: true
            - Instance2:
                appSettings:
                    Key6: 'A2B2C2'
                    Key7: false
            - Instance3:
                appSettings:
                    Key6: 'A3B3C3'
                    Key7: true
        - Engine2:
            version: 'example'
            appSettings: 'example'          
        - Engine3:
            path: 'example'
            version: 'example'
            appSettings:  
    - Server2:
        Configuration:
        - AppSettings:
            Apps:
            - App1:
                Instance:
                - Instance1: 'example'
                - Instance2: 'example'
    - Server3: 'example'

I would like to be able to digest this and, for example, get all the relevant appSettings for Instance3 of Engine1 on Server1 on UAT. The expected output for UAT>Server1>Engine1>Instance3 would be:

        AWSAccessKey: ExampleKey
        AWSSecretKey: ExampleSecret
        AWSRegion: ExampleRegion
            Key1: true
            Key2: '123'
                Key3: 'abc'
                Key4: 'def'
                Key5: 'abc-123'
                    Key6: 'A3B3C3'
                    Key7: true

I'm not concerned with the formatting of the output; I would just like to be able to print all the relevant key-value pairs.

As a start, I have loaded the YAML file as a dictionary and am able to output all of the appSettings, but I cannot work out how to specify the level at which I would like to stop.

import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('Example.yml')

yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)

def lookup(sk, d, path=[]):
   # lookup the values for key(s) sk return as list the tuple (path to the value, value)
   if isinstance(d, dict):
       for k, v in d.items():
           if k == sk:
               yield (path + [k], v)
           for res in lookup(sk, v, path + [k]):
               yield res
   elif isinstance(d, list):
       for item in d:
           for res in lookup(sk, item, path + [item]):
               yield res

for path, value in lookup("appSettings", data):
   print(value)

Looking further into this, I am not sure whether a matrix/array would be better than a dictionary, since it maintains order. Any help with this would be much appreciated.



Solution 1:[1]

I don't think there is a particular YAML format that would work better; this usually depends on which settings you want to keep together. Given a structure that has some regularity (like yours), you should be able to get what you want out of it.

However, you don't load this YAML into a single dictionary. You load the document, and the top-level mapping is created as a Python dictionary; the nested mappings and sequences become further dicts and lists inside it.

In order to reach your target Instance3, you would use:

data['UAT']['Servers'][0]['Server1']['Apps'][0]['Engine1']['Instance'][2]['Instance3']

You can simply leave out all the list/sequence indexes by changing your lookup, but you cannot easily ignore intermediate keys when matching the path UAT>Server1>Engine1>Instance3 (it would be easier to look for UAT>Servers>Server1>Apps>Engine1>Instance>Instance3 instead). What you seem to want, though, is to skip adding a key to your path when the value for that key is a sequence/list.
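For comparison, here is a minimal sketch of the simpler alternative mentioned above: following the full key path and, whenever a list is encountered, picking the item that contains the next key. The helper name `follow` and the trimmed-down `data` literal are illustrative assumptions, not part of the original code; the nested dicts and lists stand in for what loading the YAML would give you.

```python
def follow(node, keys):
    """Follow `keys` through nested dicts, searching lists for the item that has the key."""
    for key in keys:
        if isinstance(node, list):
            # a sequence of single-key mappings: pick the item that contains `key`
            node = next(item for item in node if isinstance(item, dict) and key in item)
        node = node[key]
    return node

# trimmed-down stand-in for the loaded YAML (illustrative data only)
data = {
    'UAT': {
        'Servers': [
            {'Server1': {
                'Apps': [
                    {'Engine1': {
                        'Instance': [
                            {'Instance1': {'appSettings': {'Key6': 'A1B1C1'}}},
                            {'Instance3': {'appSettings': {'Key6': 'A3B3C3'}}},
                        ],
                    }},
                ],
            }},
        ],
    },
}

full_path = 'UAT>Servers>Server1>Apps>Engine1>Instance>Instance3'.split('>')
instance = follow(data, full_path)
print(instance['appSettings']['Key6'])
```

This only reaches the instance itself; it does not collect the appSettings of the enclosing levels, and it raises an exception if a key is missing along the way.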

Once you do that, you can compare your current path against the target path, but only up to the length gathered so far:

import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('input.yaml')
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(in_file)

def lookup(d, sk, target, path=None):
    if path is None:
        path = []
    if target[:len(path)] != path:
        return
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                if isinstance(v, dict):
                    for k1, v1 in v.items():
                        yield path + [k1], v1
                else:
                    yield path, v  # there is an appSetting that doesn't have a dict as value
            newpath = path[:] if isinstance(v, list) else path + [k]
            for res in lookup(v, sk, target, newpath):
                yield res
    elif isinstance(d, list):
        for item in d:
            for res in lookup(item, sk, target, path[:]):  # don't add item to path here, but still need a copy
                yield res

target = 'UAT>Server1>Engine1>Instance3'.split('>')
dout = {key[-1]: value for key, value in lookup(data, sk='appSettings', target=target)}
yaml.dump(dout, sys.stdout)

which gives:

AWSAccessKey: ExampleKey
AWSSecretKey: ExampleSecret
AWSRegion: ExampleRegion
Key1: true
Key2: '123'
Key3: 'abc'
Key4: 'def'
Key5: 'abc-123'
Key6: 'A3B3C3'
Key7: true
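The pruning test `target[:len(path)] != path` in `lookup` is a plain list-prefix comparison. The small sketch below isolates it to show which branches are kept and which are cut off; the function name `is_on_target` is a hypothetical helper introduced only for illustration.

```python
def is_on_target(path, target):
    """True while `path` is still a prefix of `target` (the condition `lookup` uses to prune)."""
    return target[:len(path)] == path

target = 'UAT>Server1>Engine1>Instance3'.split('>')

print(is_on_target([], target))                        # True: nothing gathered yet
print(is_on_target(['UAT', 'Server1'], target))        # True: still on the way to the target
print(is_on_target(['UAT', 'Server2'], target))        # False: wrong server, branch is pruned
print(is_on_target(target + ['appSettings'], target))  # False: deeper than the target
```

The last case is why the recursion stops below Instance3: once the path is longer than the target, the slice can never equal it.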

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Anthon