Parse Text with "awk" and Modify One Of The Columns With "sed"
I have data separated by pipes ("|"), and I would like to parse it with awk and write it into a DB.
EndpointRequest|ID-ip-172-31-70-119-eu-west-1-compute-internal-209879772|2022-05-12 08:20:03:467|0|ip-172-31-70-119|616e50193233020648|vfgh|GenericAmount|61d458303574b21f|Display|v1|Display-v1|PrepaidEndpoint|6227300ec1786d26|Corporate|62273041c8cf901071786d81|Health Line||||69.28.67.153|Java/1.8.0_321|application/xml|468|475|POST||http://127.0.0.1/endpoint/||200||2022-05-12 08:20:03:458|0|468|7|0|0|0|true|Http|null|null|HTTPConnector:CallPrepaid|Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\nAuthorization: Bearer e3edbb1d8f5d8c828dc584ed293602bf\nContent-Type: application/xml\nX-Amzn-Trace-Id: Root=1-627cc333-7167\nX-Forwarded-For: XX.XX.XX.XX\nX-Forwarded-Port: 443\nX-Forwarded-Proto: https\n\n<?xml version="1.0"?>\n<!DOCTYPE cp_request SYSTEM "cp_req_websvr.dtd">\n<cp_request>\n <cp_id>YY1880</cp_id>\n <cp_transaction_id>SDP</cp_transaction_id>\n <op_transaction_id>arr684754251</op_transaction_id>\n <application>1</application>\n <action>2</action>\n <user_id type="MSISDN">9999999999</user_id>\n <cp_timer>5</cp_timer>\n <transaction_price>1900</transaction_price>\n <transaction_currency>0</transaction_currency>\n</cp_request>
The data has many lines like the one above and I use the command below to get certain fields.
more file.log | egrep "EndpointRequest|EndpointSuccess|EndpointFailure" | egrep "PrepaidEndpoint" | awk -F"|" '{print $1"|"$2"|"$3"|"$4"|"$5"|"$12"|"$13"|"$15"|"$17"|"$21"|"$25"|"$30"|"$31"|"$32"|"$33"|"$44}'
The thing here is that the last field (#44) holds an HTTP response that contains some headers and an XML payload. I need to get the "op_transaction_id" value ("arr684754251") and append it to the output of the awk command, but I am unable to do so. In a separate command, I can get that value via "sed":
sed -n "s/.*<op_transaction_id>\(.*\)<\/op_transaction_id>.*/\1/p" file.log
How do I migrate the "sed" command into the "awk" command, so I can have the "op_transaction_id" value as one of the fields in "awk"?
Expected output:
EndpointRequest|ID-ip-172-31-70-119-eu-west-1-compute-internal-209879772|2022-05-12 08:20:03:467|0|ip-172-31-70-119|Display-v1|PrepaidEndpoint|Corporate|Health Line|69.28.67.153|475|200||2022-05-12 08:20:03:458|0|arr684754251
Thank you bash gurus. Any help is appreciated.
Solution 1:[1]
How do I migrate the "sed" command into the "awk" command
You might harness the gensub function. Consider the following simple example: let file.txt be |-separated with 3 columns:
<tag>text1</tag>|A|1
<tag>text2</tag>|B|2
<tag>text3</tag>|C|3
and say you want to extract what is inside the tag in the 1st field and use , as the output separator. Then you might do
awk 'BEGIN{FS="|";OFS=","}{$1=gensub(/<tag>(.+)<\/tag>/,"\\1",1,$1);print}' file.txt
which gives output
text1,A,1
text2,B,2
text3,C,3
The arguments to gensub are the regular expression, the replacement, how (either a number indicating which occurrence to replace, or "g" for all) and the target. gensub returns the altered string, which we then assign as the new value of the 1st field. FS tells awk that the field separator is | and OFS that the output field separator is ,. Note that you must not mindlessly copy your regular expression from sed to become the 1st argument of gensub. For example, in GNU sed ( and ) denote literal brackets and need to be escaped to get a capturing group, whereas in GNU AWK ( and ) denote a capturing group and must be escaped to get literal brackets.
(tested in gawk 4.2.1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Daweo |