YAML indentation for array in hash

I think indentation is important in YAML.

I tested the following in irb:

> puts({1=>[1,2,3]}.to_yaml)
--- 
1: 
- 1
- 2
- 3
 => nil 

I expected something like this:

> puts({1=>[1,2,3]}.to_yaml)
--- 
1: 
  - 1
  - 2
  - 3
 => nil 

Why isn't there indentation for the array?

I found this at http://www.yaml.org/YAML_for_ruby.html#collections.

The dash in a sequence counts as indentation, so you can add a sequence inside of a mapping without needing spaces as indentation.



Solution 1:[1]

Both ways are valid, as far as I can tell:

require 'yaml'

YAML.load(%q{--- 
1:
- 1
- 2
- 3
})
# => {1=>[1, 2, 3]}

YAML.load(%q{--- 
1:
  - 1
  - 2
  - 3
})
# => {1=>[1, 2, 3]}

It's not clear why you think there should be spaces before the hyphens. If you think this is a violation of the spec, please explain how.

Why isn't there indentation for the array?

There's no need for indentation before the hyphens, and it's simpler not to add any.
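
The same thing can be checked from the emitter's side with a round trip. This is only a minimal sketch, assuming Ruby's bundled YAML library (Psych); the variable names are made up for illustration:

```ruby
require 'yaml'

original = { 1 => [1, 2, 3] }

# Dump the hash, then parse the emitted text back.
emitted = original.to_yaml
restored = YAML.load(emitted)

# Collect the sequence entries that start at column 0 (no leading spaces).
flush_entries = emitted.lines.grep(/\A- /)

puts emitted
puts "round trip ok: #{restored == original}"
puts "entries at column 0: #{flush_entries.size}"
```

If the round trip holds, the unindented form loses nothing: the parser recovers exactly the hash that was dumped.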

Solution 2:[2]

It's so you can do:

1: 
- 2: 3
  4: 5
- 6: 7
  8: 9
- 10
=> {1 => [{2 => 3, 4 => 5}, {6 => 7, 8 => 9}, 10]}

Basically, dashes delimit objects, and indentation denotes the "value" of the key-value pair.

That's the best I can do; I haven't managed to find any of the reasons behind this or that aspect of the syntax.
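
To check that reading, here is a minimal sketch that parses the document above and compares it with the expected structure. It assumes Ruby's bundled YAML library (Psych); the variable names are only for illustration:

```ruby
require 'yaml'

# The dashes sit at the same column as the key "1";
# the extra indentation continues the mapping inside each list item.
doc = <<~YAML
  1:
  - 2: 3
    4: 5
  - 6: 7
    8: 9
  - 10
YAML

data = YAML.load(doc)
expected = { 1 => [{ 2 => 3, 4 => 5 }, { 6 => 7, 8 => 9 }, 10] }

puts data == expected # prints true
```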

Solution 3:[3]

The short answer is that both are valid because they are unambiguous for the YAML parser. This fact was already pointed out by the other answers, but allow me to add some gasoline to this discussion.

YAML uses indentation not only for aesthetics or readability; it also has a crucial meaning when composing and nesting data structures:

# YAML:         # JSON equivalent:
                 
---             # {
one:            #   "one": {
  two:          #     "two": null,
  three:        #     "three": null
                #   }
                # }
                
---             # {
one:            #   "one": {
  two:          #     "two": {
    three:      #       "three": null
                #     }
                #   }
                # }

As we can see, simply adding one level of indentation before three changes its nesting level and replaces the null value previously assigned to two.

This behavior is, however, not consistent when it comes to lists. They tolerate dropping the level of indentation that we would naturally expect (as the OP did) to reflect the nesting level of the items. Both forms parse the same way:
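
The difference between the two mappings can be confirmed with a small sketch, assuming Ruby's bundled YAML library (Psych). The variable names are hypothetical:

```ruby
require 'yaml'

# "two" and "three" are siblings under "one".
siblings = <<~YAML
  one:
    two:
    three:
YAML

# One more level of indentation makes "three" a child of "two".
nested = <<~YAML
  one:
    two:
      three:
YAML

siblings_data = YAML.load(siblings)
nested_data = YAML.load(nested)

puts siblings_data == { 'one' => { 'two' => nil, 'three' => nil } } # prints true
puts nested_data == { 'one' => { 'two' => { 'three' => nil } } }    # prints true
```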

# YAML:         # JSON equivalent:
                 
---             #
one:            #
  two:          #
    - foo       # {            
    - bar       #   "one": {   
                #     "two": [ 
                #       "foo", 
                #       "bar"  
                #     ]        
---             #   }          
one:            # }            
  two:          #
  - foo         #
  - bar         #

The second form above is somewhat unexpected and breaks with the idea that indentation level is connected to nesting level: the key two and the items of its list are written at the same indentation, yet they sit at different nesting levels.

What is even worse, it won't work all the time, but only when the list is placed immediately under an object key. Lists nested inside other lists can't drop a level of indentation, because that would move the nested elements into the parent list:
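
A minimal sketch of that equivalence, assuming Ruby's bundled YAML library (Psych), with variable names chosen only for illustration:

```ruby
require 'yaml'

# List items indented one level below the key "two".
indented = <<~YAML
  one:
    two:
      - foo
      - bar
YAML

# List items at the same column as the key "two".
flush = <<~YAML
  one:
    two:
    - foo
    - bar
YAML

indented_data = YAML.load(indented)
flush_data = YAML.load(flush)

puts indented_data == flush_data # prints true
puts flush_data == { 'one' => { 'two' => %w[foo bar] } } # prints true
```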

# YAML:         # JSON equivalent:
                 
---             # {
one:            #   "one": {
  two:          #     "two": [
    -           #       null,
    -           #       [
      -         #         null,
      -         #         null
                #       ]
                #     ]
                #   }
                # }
                 
---             # {           
one:            #   "one": {  
  two:          #     "two": [
    -           #       null, 
    -           #       null, 
    -           #       null, 
    -           #       null  
                #     ]       
                #   }         
                # }         
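
The same two documents can be compared in code. This is a sketch under the assumption that Ruby's bundled YAML library (Psych) is used; the variable names are hypothetical:

```ruby
require 'yaml'

# The last two dashes are indented, so they form a list inside the second item.
nested = <<~YAML
  one:
    two:
      -
      -
        -
        -
YAML

# With that indentation dropped, all four dashes belong to the same list.
flat = <<~YAML
  one:
    two:
      -
      -
      -
      -
YAML

nested_list = YAML.load(nested).dig('one', 'two')
flat_list = YAML.load(flat).dig('one', 'two')

puts nested_list == [nil, [nil, nil]]    # prints true
puts flat_list == [nil, nil, nil, nil]   # prints true
```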

I know, I know... Don't even start and say that the example above is a bit extreme and could be considered an edge case. Both documents are perfectly valid data structures and prove my point. More complicated situations also come up when mixing objects and nested lists of objects, especially if the objects have a single key. Not only may this lead to errors in the data structure declaration, it also becomes extremely hard to read.

The following YAML documents are equivalent (they parse to the same data):

---
one:
  two:
  - three: foo
  - bar
  - four:
    - baz
    five:
    - fizz
    - buzz
    six:
  seven:

---
one:
  two:
    - three: foo
    - bar
    - four:
        - baz
      five:
        - fizz
        - buzz
      six:
  seven:

I don't know about you, but I find the second one much easier to read and follow, especially in a very large document. It's very easy to get lost in the first one, especially when the beginning of a given object declaration has scrolled out of view. There is simply no clear connection between the indentation level and the nesting level.

Keeping the indentation level consistently connected to the nesting level is very important for readability. Dropping an indentation level for lists is optional and only works in some positions, so it is something you have to be very careful about.
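
For what it's worth, the claim that the two documents carry the same data can be checked with a short sketch. It assumes Ruby's bundled YAML library (Psych), and the variable names are made up for illustration:

```ruby
require 'yaml'

# First form: list items flush with their parent key.
flush = <<~YAML
  one:
    two:
    - three: foo
    - bar
    - four:
      - baz
      five:
      - fizz
      - buzz
      six:
    seven:
YAML

# Second form: every list indented one level below its parent key.
indented = <<~YAML
  one:
    two:
      - three: foo
      - bar
      - four:
          - baz
        five:
          - fizz
          - buzz
        six:
    seven:
YAML

flush_data = YAML.load(flush)
indented_data = YAML.load(indented)

puts flush_data == indented_data # prints true

# Keys of the third list item, which is a mapping in both forms.
p indented_data.dig('one', 'two', 2).keys
```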

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Darshan Rivka Whittle
Solution 2    Narfanator
Solution 3    Victor Schröder