Replace tags in text file using key-value pairs from JSON file

I am trying to write a shell script that reads a JSON string, decodes it into an array, loops over that array, and uses each key/value pair to replace strings in another file.

If this were PHP, then I would write something like this.

$array = json_decode($jsonString, true);
foreach ($array as $key => $value)
{
  // str_replace returns the new string, so the result must be assigned
  $rawString = str_replace($key, $value, $rawString);
}

I need this converted to a Bash script. Here is the example JSON string.

{
  "login": "lambda",
  "id": 37398,
  "avatar_url": "https://avatars.githubusercontent.com/u/37398?v=3",
  "gravatar_id": "",
  "url": "https://api.github.com/users/lambda",
  "html_url": "https://github.com/lambda",
  "followers_url": "https://api.github.com/users/lambda/followers",
  "following_url": "https://api.github.com/users/lambda/following{/other_user}",
  "gists_url": "https://api.github.com/users/lambda/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/lambda/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/lambda/subscriptions",
  "organizations_url": "https://api.github.com/users/lambda/orgs",
  "repos_url": "https://api.github.com/users/lambda/repos",
  "events_url": "https://api.github.com/users/lambda/events{/privacy}",
  "received_events_url": "https://api.github.com/users/lambda/received_events",
  "type": "User",
  "site_admin": false,
  "name": "Brian Campbell",
  "company": null,
  "blog": null,
  "location": null,
  "email": null,
  "hireable": null,
  "bio": null,
  "public_repos": 27,
  "public_gists": 23,
  "followers": 8,
  "following": 2,
  "created_at": "2008-11-30T21:03:27Z",
  "updated_at": "2016-12-21T23:53:11Z"
}

I have this file:

Lamba login name is %login%, and avatar url is %avatar_url%

I am using jq:

jq -c '.[]' /tmp/json | while read -r i; do
   echo "$i"
done

This outputs only the values. How do I loop through the keys and get the values as well?

Also, I've found that the keys of the JSON object can be returned using

jq 'keys' /tmp/params

However, I am still trying to figure out how to loop through the key and return the data.



Solution 1:[1]

The whole thing can be done quite simply (and very efficiently) in jq.

For the sake of illustration, suppose we have defined dictionary to be the dictionary object given in the question, and template to be the template string:

def dictionary: { ...... };

def template: 
  "Lamba login name is %login%, and avatar url is %avatar_url%";

Then the required interpolation can be performed as follows:

dictionary
| reduce to_entries[] as $pair (template; gsub("%\($pair.key)%"; $pair.value))

The above produces:

"Lamba login name is lambda, and avatar url is https://avatars.githubusercontent.com/u/37398?v=3"

There are of course many other ways in which the dictionary and template string can be presented.
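
To run this against actual files rather than inline definitions, one option is to pass the template in with --arg. The following is only a sketch: the file names infile.json and infile.txt are assumed for illustration, the dictionary is shortened, and the tostring call is an addition meant to keep numeric or null values from causing a type error should their tag appear in the template.

```shell
# Sketch only: infile.json and infile.txt are assumed file names.
workdir=$(mktemp -d)

# A shortened version of the dictionary from the question.
cat > "$workdir/infile.json" <<'EOF'
{
  "login": "lambda",
  "id": 37398,
  "avatar_url": "https://avatars.githubusercontent.com/u/37398?v=3",
  "company": null
}
EOF

printf '%s\n' 'Lamba login name is %login%, and avatar url is %avatar_url%' \
    > "$workdir/infile.txt"

# The template is passed in as $template; each key is wrapped in % signs
# and replaced by its value, converted to a string (an addition in this sketch).
result=$(jq -r --arg template "$(cat "$workdir/infile.txt")" '
    reduce to_entries[] as $pair ($template;
        gsub("%\($pair.key)%"; ($pair.value | tostring)))
' "$workdir/infile.json")

printf '%s\n' "$result"
rm -rf "$workdir"
```

With -r, jq prints the result without the surrounding JSON quotes. Note that each key is interpolated into a regular expression, so this assumes the keys contain no regex metacharacters.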

Solution 2:[2]

I'm assuming your JSON is in infile.json and the text with the tags to be replaced in infile.txt.

Here is an entirely unreadable one-liner that does it:

$ sed -f <(jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json | sed 's~^~s|%~;s~\t~%|~;s~$~|g~') infile.txt
Lamba login name is lambda, and avatar url is https://avatars.githubusercontent.com/u/37398?v=3

Now, to decipher what this does. First, a few linebreaks for readability:

sed -f <(
    jq -r '
        to_entries[] |
        [.key, .value] |
        @tsv
    ' < infile.json |
    sed '
        s~^~s|%~
        s~\t~%|~
        s~$~|g~
    '
) infile.txt

We're basically using a sed command that takes its instructions from a file; instead of an actual file, we use process substitution to generate the sed commands:

jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json |
    sed 's~^~s|%~;s~\t~%|~;s~$~|g~'

Some processing with jq, followed by some sed substitutions.

This is what the jq command does:

  • Generate raw output (no quotes, actual tabs instead of \t) with the -r option
  • Turn the input JSON object into an array of key-value pairs with the to_entries function, resulting in

    [
      {
        "key": "login",
        "value": "lambda"
      },
      {
        "key": "id",
        "value": 37398
      },
      ...
    ]

  • Get all elements of the array with []:

    {
      "key": "login",
      "value": "lambda"
    }
    {
      "key": "id",
      "value": 37398
    }
    ...
    
  • Get a list of arrays with key/value in each using [.key, .value], resulting in

    [
      "login",
      "lambda"
    ]
    [
      "id",
      37398
    ]
    ...
    
  • Finally, use the @tsv filter to get the key-value pairs as a tab separated list:

    login   lambda
    id      37398
    ...
    

Now, we pipe this to sed, which performs three substitutions:

  • s~^~s|%~ – add s|% to the beginning of each line
  • s~\t~%|~ – replace the tab with %|
  • s~$~|g~ – add |g to the end of each line

This gives us a sed file that looks as follows:

s|%login%|lambda|g
s|%id%|37398|g
s|%avatar_url%|https://avatars.githubusercontent.com/u/37398?v=3|g
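
The generation step can be tried in isolation. The sketch below feeds two hand-written tab-separated lines (standing in for the jq output) through the same three substitutions; it assumes a sed that understands \t in a pattern, such as GNU sed.

```shell
# Two tab-separated sample lines standing in for the jq output.
generated=$(printf 'login\tlambda\nid\t37398\n' |
    sed 's~^~s|%~;s~\t~%|~;s~$~|g~')

# Each input line has become one sed substitution command.
printf '%s\n' "$generated"
```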

Notice that for these substitutions, we used ~ as the delimiter, and for the substitution commands we generated, we used | – mostly to avoid running into problems with strings containing /.

If this sed file were stored as commands.sed, the overall command would correspond to

sed -f commands.sed infile.txt
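
As a quick check, the generated commands can be written to a file by hand and applied to the template. The temporary directory is just scaffolding for this sketch; commands.sed and infile.txt are the names used above.

```shell
# Scratch directory so the sketch does not touch existing files.
workdir=$(mktemp -d)

# The generated substitution commands, written out by hand.
cat > "$workdir/commands.sed" <<'EOF'
s|%login%|lambda|g
s|%id%|37398|g
s|%avatar_url%|https://avatars.githubusercontent.com/u/37398?v=3|g
EOF

# The template from the question.
printf '%s\n' 'Lamba login name is %login%, and avatar url is %avatar_url%' \
    > "$workdir/infile.txt"

replaced=$(sed -f "$workdir/commands.sed" "$workdir/infile.txt")
printf '%s\n' "$replaced"
rm -rf "$workdir"
```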

Remarks

  • If your shell doesn't support process substitution, you could make sed read from standard input instead, using sed -f -:

    jq -r 'to_entries[] | [.key, .value] | @tsv' < infile.json |
        sed 's~^~s|%~;s~\t~%|~;s~$~|g~' |
        sed -f - infile.txt
    
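  • If the values in infile.json contained |, you would have to choose a different delimiter for the generated substitution commands (see for example this answer about using a non-printable character as a delimiter) or even perform additional substitutions to get rid of the delimiting characters first and put them back in at the end (see this and this Q&A). A ~ in the data does no harm, because ~ is only the delimiter of the sed commands that build the script. An & in a value, however, is special in a sed replacement (it stands for the matched text) and would need to be escaped as well.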

  • Some seds (such as the BSD sed found in macOS) have trouble with \t used in the pattern to substitute. If that is the case, the command s~\t~%|~ has to be replaced by s~'$'\t''~%|~ to "splice in" the tab character, or (if the shell doesn't support ANSI-C quoting) even by s~'"$(printf '\t')"'~%|~.
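
The delimiter and tab remarks can be combined in a more defensive generator. This is a sketch under assumptions, not a drop-in replacement: generate_commands is a hypothetical helper name, the input is hand-written TSV standing in for the jq output, and keys are assumed to contain no regex metacharacters. The shell splits each line at a literal tab, so sed never has to interpret \t, and | and & in the value are escaped before the command is printed.

```shell
tab=$(printf '\t')

# Hypothetical TSV input; the second value contains | and &.
tsv=$(printf 'login\tlambda\nmotto\ta|b & c\n')

generate_commands() {
    # Read key<TAB>value lines; IFS holds a literal tab character.
    while IFS="$tab" read -r key value; do
        # Escape & and the | delimiter in the replacement text.
        # Backslashes are left alone: jq's @tsv already emits them doubled.
        escaped=$(printf '%s\n' "$value" | sed 's/[&|]/\\&/g')
        printf 's|%%%s%%|%s|g\n' "$key" "$escaped"
    done
}

commands=$(printf '%s\n' "$tsv" | generate_commands)
printf '%s\n' "$commands"

# Apply the generated commands to a sample template line.
result=$(printf '%s\n' 'User %login% says %motto%' | sed -e "$commands")
printf '%s\n' "$result"
```

Running sed once per value is slow for large inputs; for a handful of keys it keeps the escaping easy to follow.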

Solution 3:[3]

Here's a simple sed solution. Assume that the JSON object is in x.json and that the file in which the replacements should be made is f.txt. The following sed program, x.sed, called as

sed -n -f x.sed x.json <(echo FILE_DELIM) f.txt

does the job.

x.sed:

1,$H
$ {
    x
    :b
    s/\("\([^"]\+\)" *: *\(\("\([^"]*\)"\)\|\(\(\w\|\.\)\+\)\).*FILE_DELIM.*\)%\2%\(.*\)/\1\3\8/
    tb
    s/.*FILE_DELIM\n//
    p
}

The trick is to collect both files (separated by the string FILE_DELIM) in sed's hold space and then repeatedly replace the keys (e.g. %login%) that appear after FILE_DELIM with their values. The crucial point is to define the pattern that matches a key-value pair in the JSON object. Here I used:

" followed by non-" characters followed by " followed by blanks followed by a colon (*1) followed by blanks followed by either a quoted string or a string consisting of word characters or . (*2)

The backreference \2 in the search pattern matches the key, and the tag is replaced with \3, which matches the value. Note that for string values \3 includes the surrounding double quotes.

*1): Up to here this matches a key like "login"

*2): The values are allowed to be "xyz", "", abc, 0.1, ...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3