sed-like tool to insert an HTML snippet (a div block) after a <div id="jsn-page">...</div>

I am looking for a way to insert a block <div id="jsn-content-bottom">...code...</div> after a block <div id="jsn-body">...</div>.

I want to use a shell script because I need to apply this insertion to many HTML files spread across nested directories.

As a first attempt, I tried sed. The problem is that I don't know how to find the closing </div> tag that matches the opening <div id="jsn-body"> tag. There are several other <div> tags nested inside the <div id="jsn-body"> block, and I need to locate this specific closing tag (its line number may be enough) because I want to insert the block <div id="jsn-content-bottom">...code...</div> right after it.

Does anyone see an easy way to find the line of this closing tag? I was planning to use sed in my shell script, but I am open to other tools or Linux commands that would make processing these HTML files easier.

One last thing: I would like the block to insert to be stored in a file, and to read it from that file when doing the insertion (with cat or a similar command).

Update

For the moment, the solution suggested by ctac_ almost works. You can test it on the HTML source index.html.txt, with the snippet to insert in insert.txt, using the suggested command line:

awk '
NR==FNR{b=b$0RS;next}
/<div id="jsn-body">/{a=1;s[d]++}
a && /<div/{s[d]++}
a && /<\/div/{s[d]--}
a && s[d]==1{a=0;print $0RS b;next}1' insert.txt index.html.txt > outfile.html.txt

Unfortunately, when I run grep 'jsn-content-bottom' on the output of the awk command above (i.e. with the redirection "> outfile.html.txt" removed), no match is displayed.

I don't know where the error could come from.
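
One way to narrow this down, offered as a sketch rather than a diagnosis: the awk command only starts counting on a line that contains the literal text <div id="jsn-body">. If the tag in index.html.txt has other attributes before id, or different quoting or spacing, the flag a is never set and nothing is inserted. The helper check_anchor below is hypothetical (it is not part of the suggested command); it compares the number of lines matching the exact pattern with the number matching a looser one.

```shell
# check_anchor FILE ID  (hypothetical helper, for illustration only)
# Prints how many lines of FILE contain the exact opening tag that the
# awk command looks for, and how many contain the id attribute at all.
check_anchor() {
    file=$1
    id=$2
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    exact=$(grep -c -- "<div id=\"$id\">" "$file" || true)
    loose=$(grep -c -- "id *= *[\"']$id[\"']" "$file" || true)
    printf 'exact=%s loose=%s\n' "$exact" "$loose"
}
```

For example, check_anchor index.html.txt jsn-body. If exact is 0 while loose is not, the pattern in the awk command would need to be loosened to match the tag as it is actually written. If both are non-zero, the cause lies elsewhere, for instance several div tags sharing one line.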

You can test the solution given by ctac_ on the files index.html.txt and insert.txt, with the awk command above.



Solution 1:[1]

You can try this awk command:

awk '
NR==FNR{b=b$0RS;next}
/<div id="jsn-page">/{a=1;d++}
a && /<div/{d++}
a && /<\/div/{d--}
a && d==1{a=0;print $0RS b;next}1' insert.txt infile.html >outfile.html

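insert.txt contains the block <div id="jsn-content-bottom">...code...</div>.

awk first reads this file and keeps its content in b.

It then reads infile.html and looks for the start of the jsn-page block.

a is a flag that tells us we are inside the block.

Each time '<div' is seen, d is incremented (start of a block).

Each time '</div' is seen, d is decremented (end of a block).

When d returns to 1, the jsn-page block has ended. It returns to 1 rather than 0 because the opening line is counted twice: once by the jsn-page rule and once by the generic '<div' rule.

a is then set to 0 to mark that we are outside the block.

Finally, awk prints the current line followed by b (the content of the insert file).

Note that each rule fires at most once per line, so the count is only correct when no line holds more than one opening or more than one closing div tag.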
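
The same depth-counting idea can be sketched in a form that counts every div tag on a line instead of at most one, and that can be applied to a whole directory tree as the question asks. This is a variant written for illustration, not the answer's original command. The function names insert_after_div and insert_in_tree are hypothetical, and the sketch assumes that id is the first attribute of the opening tag, that tags are lowercase, that the target files match *.html, and that file names contain no newlines. The snippet is still inserted after the whole line that holds the closing tag, not immediately after the tag itself.

```shell
#!/bin/sh
# insert_after_div ID SNIPPET HTML  (hypothetical name)
# Prints HTML with the contents of SNIPPET inserted after the line holding
# the </div> that closes the first <div id="ID" ...> block.
insert_after_div() {
    awk -v id="$1" '
    # First file: buffer the snippet (assumed non-empty), as in the answer.
    NR == FNR { snippet = snippet $0 ORS; next }
    {
        rest = $0
        if (!inside && !found) {
            pos = index(rest, "<div id=\"" id "\"")
            if (pos) { inside = 1; depth = 0; rest = substr(rest, pos) }
        }
        closed = 0
        # Walk through every opening or closing div tag on this line, in order.
        while (inside && match(rest, /<\/?div([ \t>]|$)/)) {
            if (substr(rest, RSTART + 1, 1) == "/") depth--; else depth++
            rest = substr(rest, RSTART + RLENGTH)
            # Depth back at zero: the target block has just been closed.
            if (depth == 0) { closed = 1; inside = 0; found = 1 }
        }
        print
        if (closed) printf "%s", snippet
    }' "$2" "$3"
}

# insert_in_tree ID SNIPPET DIR  (hypothetical name)
# Applies insert_after_div in place to every *.html file below DIR.
insert_in_tree() {
    find "$3" -type f -name '*.html' | while IFS= read -r f; do
        tmp=$(mktemp) || continue
        # Write back with cat so the file keeps its permissions.
        if insert_after_div "$1" "$2" "$f" > "$tmp"; then
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}
```

A call such as insert_after_div jsn-page insert.txt infile.html > outfile.html mirrors the command above, while insert_in_tree jsn-page insert.txt . rewrites the files under the current directory. Running insert_in_tree twice inserts the snippet twice, so it is worth testing on a copy of the tree first.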

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1