How are Docker image names parsed?

When doing a docker push or when pulling an image, how does Docker determine if there is a registry server in the image name or if it is a path/username on the default registry (e.g. Docker Hub)?

I'm seeing the following from the 1.1 image specification:

Tag

A tag serves to map a descriptive, user-given name to any single image ID. Tag values are limited to the set of characters [a-zA-Z_0-9].

Repository

A collection of tags grouped under a common prefix (the name component before :). For example, in an image tagged with the name my-app:3.1.4, my-app is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by a DNS hostname. The hostname must comply with standard DNS rules, but may not contain _ characters. If a hostname is present, it may optionally be followed by a port number in the format :8080. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.

For the DNS host name, does it need to be fully qualified with dots, or is "my-local-server" a valid registry hostname? For the name components, I'm seeing periods as valid, which implies "team.user/appserver" is a valid image name. If the registry server is running on port 80, and therefore no port number is needed on the hostname in the image name, it seems like there would be ambiguity between the hostname and the path on the registry server. I'm curious how Docker resolves that ambiguity.



Solution 1:[1]

TL;DR: For the text before the first / to be treated as a registry hostname, it must contain a . (DNS separator) or a : (port separator), or be exactly "localhost". Otherwise the code assumes you want the default registry, Docker Hub.


After some digging through the code, I came across distribution/distribution/reference/reference.go with the following:

// Grammar
//
//  reference                       := name [ ":" tag ] [ "@" digest ]
//  name                            := [hostname '/'] component ['/' component]*
//  hostname                        := hostcomponent ['.' hostcomponent]* [':' port-number]
//  hostcomponent                   := /([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])/
//  port-number                     := /[0-9]+/
//  component                       := alpha-numeric [separator alpha-numeric]*
//  alpha-numeric                   := /[a-z0-9]+/
//  separator                       := /[_.]|__|[-]*/
//
//  tag                             := /[\w][\w.-]{0,127}/
//
//  digest                          := digest-algorithm ":" digest-hex
//  digest-algorithm                := digest-algorithm-component [ digest-algorithm-separator digest-algorithm-component ]
//  digest-algorithm-separator      := /[+.-_]/
//  digest-algorithm-component      := /[A-Za-z][A-Za-z0-9]*/
//  digest-hex                      := /[0-9a-fA-F]{32,}/ ; At least 128 bit digest value

The actual implementation of that is via a regex in distribution/distribution/reference/regexp.go.
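As a rough illustration of how one of these grammar rules turns into a regex, here is a minimal sketch for the component rule only. The pattern is assembled by hand from the grammar comment and is not the expression used in regexp.go; the names componentRE and validComponent are hypothetical. The dash separator is written as -+ here, which accepts the same strings as [-]* once an alpha-numeric run is required on both sides.

```go
package main

import (
	"fmt"
	"regexp"
)

// componentRE is hand-built from the "component" rule in the grammar comment:
//   component := alpha-numeric [separator alpha-numeric]*
// It is an illustration only, not the regex from regexp.go.
var componentRE = regexp.MustCompile(`^[a-z0-9]+(?:(?:[_.]|__|-+)[a-z0-9]+)*$`)

// validComponent reports whether s matches the hand-built component pattern.
func validComponent(s string) bool {
	return componentRE.MatchString(s)
}

func main() {
	samples := []string{"appserver", "team.user", "my__app", "my---app", "-app", "app.", "MyApp", "a___b"}
	for _, s := range samples {
		fmt.Printf("%-12q valid=%v\n", s, validComponent(s))
	}
}
```

Under this pattern team.user, my__app and my---app are accepted, while names that start or end with a separator, contain uppercase letters, or use three underscores in a row are rejected.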

But with some digging and poking, I found that there's another check beyond that regex (e.g. you'll get errors with an uppercase hostname if you don't include a . or :). I tracked down the actual split of the name to the following in distribution/distribution/reference/normalize.go:

// splitDockerDomain splits a repository name to domain and remotename string.
// If no valid domain is found, the default domain is used. Repository name
// needs to be already validated before.
func splitDockerDomain(name string) (domain, remainder string) {
    i := strings.IndexRune(name, '/')
    if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
        domain, remainder = defaultDomain, name
    } else {
        domain, remainder = name[:i], name[i+1:]
    }
    if domain == legacyDefaultDomain {
        domain = defaultDomain
    }
    if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
        remainder = officialRepoName + "/" + remainder
    }
    return
}

The important part for me is the first if statement: it checks whether the text before the first / contains a . or a :, or is exactly localhost. If so, that text is split off as the registry hostname; if not, the entire name is treated as a path on the default registry.
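To see what this means for the names in the question, here is a minimal sketch that copies the function into a standalone program. The three constants are not shown in the excerpt above; the values used here (docker.io, index.docker.io, library) are my assumption of what normalize.go defines.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed values: these constants are referenced by splitDockerDomain
// but are not part of the excerpt quoted above.
const (
	legacyDefaultDomain = "index.docker.io"
	defaultDomain       = "docker.io"
	officialRepoName    = "library"
)

// splitDockerDomain is copied unchanged from the excerpt above.
func splitDockerDomain(name string) (domain, remainder string) {
	i := strings.IndexRune(name, '/')
	if i == -1 || (!strings.ContainsAny(name[:i], ".:") && name[:i] != "localhost") {
		domain, remainder = defaultDomain, name
	} else {
		domain, remainder = name[:i], name[i+1:]
	}
	if domain == legacyDefaultDomain {
		domain = defaultDomain
	}
	if domain == defaultDomain && !strings.ContainsRune(remainder, '/') {
		remainder = officialRepoName + "/" + remainder
	}
	return
}

func main() {
	names := []string{
		"busybox",
		"my-local-server/appserver",
		"team.user/appserver",
		"localhost/appserver",
		"registry:5000/appserver",
	}
	for _, n := range names {
		d, r := splitDockerDomain(n)
		fmt.Printf("%-28s domain=%-16s remainder=%s\n", n, d, r)
	}
}
```

With these assumed constants, my-local-server/appserver comes back with domain docker.io and the whole name as the remainder, so an unqualified hostname without a port is not treated as a registry. team.user/appserver, on the other hand, is split into domain team.user and remainder appserver, so a dotted first component is always read as a hostname, never as a Docker Hub user.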

Solution 2:[2]

Solution 3:[3]

Note: Many URL parsing libraries can't parse Docker image references / tags unless they conform to standard URL format.

Example Ansible Snippet:

- debug: #(FAILS)
    msg: "{{ 'docker.io/alpine' | urlsplit() }}"
# ^-- This will fail, because the image reference isn't in standard URL format

# If you can convert the docker image reference to standard URL format
# Then most URL parsing libraries will work correctly

- debug: #(WORKS)
    msg: "{{ ('https://' + 'docker.io/alpine') | urlsplit() }}"
# ^-- Example: This becomes standard URL syntax, so it parses correctly

- debug: #(FAILS)
    msg: "{{ ('http://' + 'busybox:1.34.1-glibc') | urlsplit('path') }}"
# ^-- Unfortunately, this trick won't work to turn 100% of images into 
#     Standard URL format for parsing. (This example fails as well)

Based on BMitch's answer, I realized that a simple if/else check is enough to convert arbitrary Docker image references / tags into standard URL format, which allows them to be parsed by most libraries.

Algorithm in human speak:

1. Look for / in $TAG
2. If / not found
   Then return ("https://docker.io/" + $TAG)
3. If / found, split $TAG into 2 parts at the first /
   and test the text left of the /: does it contain "." or ":", or is it exactly "localhost"?
4. If yes (the text left of the 1st / is a registry host)
   Then return ("https://" + $TAG)
5. If no
   Then return ("https://docker.io/" + $TAG)

(This logic converts docker tags into standardized URL format 
so they can be processed by URL parsing libraries.)

Algorithm in Bash:
vi docker_tag_to_standardized_url_format.sh
(Copy paste the following)

#!/bin/bash
# This standardizes the naming of Docker images, e.g.
#   busybox -------------------> https://docker.io/busybox
#   myregistry.tld/myimage:tag -> https://myregistry.tld/myimage:tag
INPUT=$(cat -)
OUTPUT=""

if echo "$INPUT" | grep -q "/"; then
  # The text before the first / is a registry host only if it contains "." or ":",
  # or is exactly "localhost" (a substring match would also accept e.g. "mylocalhost").
  # Note: in a regex . is a wildcard; \. matches a literal dot.
  if echo "$INPUT" | cut -d "/" -f1 | grep -Eq '\.|:|^localhost$'; then
    OUTPUT="https://$INPUT"
  else
    OUTPUT="https://docker.io/$INPUT"
  fi
else
  OUTPUT="https://docker.io/$INPUT"
fi

echo "$OUTPUT"

Make it executable:
chmod +x ./docker_tag_to_standardized_url_format.sh

Usage Example:

# Test data, to verify against edge cases
A=docker.io/alpine
B=docker.io/rancher/system-upgrade-controller:v0.8.0
C=busybox:1.34.1-glibc
D=busybox
E=rancher/system-upgrade-controller:v0.8.0
F=localhost:5000/helloworld:latest
G=quay.io/go/go/gadget:arms
####################################
echo $A | ./docker_tag_to_standardized_url_format.sh 
echo $B | ./docker_tag_to_standardized_url_format.sh
echo $C | ./docker_tag_to_standardized_url_format.sh
echo $D | ./docker_tag_to_standardized_url_format.sh
echo $E | ./docker_tag_to_standardized_url_format.sh
echo $F | ./docker_tag_to_standardized_url_format.sh
echo $G | ./docker_tag_to_standardized_url_format.sh
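
For comparison, the same conversion can be sketched in Go and handed to the standard net/url parser. This is only an illustration of the steps listed above; toURL and hostAndPath are hypothetical helper names, not part of any Docker library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toURL (hypothetical) prefixes https:// and, when the text before the
// first / is not a registry host, inserts docker.io/ as the default.
func toURL(ref string) string {
	i := strings.Index(ref, "/")
	if i != -1 && (strings.ContainsAny(ref[:i], ".:") || ref[:i] == "localhost") {
		return "https://" + ref
	}
	return "https://docker.io/" + ref
}

// hostAndPath (hypothetical) parses the converted reference with net/url
// and returns the registry host and the remaining path.
func hostAndPath(ref string) (string, string, error) {
	u, err := url.Parse(toURL(ref))
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	refs := []string{
		"busybox:1.34.1-glibc",
		"rancher/system-upgrade-controller:v0.8.0",
		"localhost:5000/helloworld:latest",
		"quay.io/go/go/gadget:arms",
	}
	for _, ref := range refs {
		host, path, err := hostAndPath(ref)
		if err != nil {
			fmt.Println(ref, "->", err)
			continue
		}
		fmt.Printf("%s -> host=%s path=%s\n", ref, host, path)
	}
}
```

Note that the URL parser only separates the host from the path; the tag stays attached to the path (for example /busybox:1.34.1-glibc) and still has to be split off separately if you need it.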

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 neokyle