How to select two columns from awk and print if they do not match

I need to select two MSISDN values from the OMO account migration logs and print the new one when the two do not match.

[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606907**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923481057772**"}
[2019-03-11 04:24:02 INFO-SUBAPP ESBRestClient:117] ## IP-119.153.134.128##TOKEN-1552260212780839(923214748517)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – 953214748517"}

923481057772 is the old MSISDN.

923419606907 is the new MSISDN, and I need to save it in a new file. I was using the following command to select only the new MSISDN:

cat migration.txt | egrep "OMO account migration" | egrep "responseCode\":\"1700" | awk -F"(" '{gsub(/\).*/,"",$2);print $2}' >>newmsisdn.txt

I'm using the saved MSISDN values to fetch the token numbers, and then using those tokens to fetch multiple parameters. The final output looks something like this:

Date       | Time     | Old MSISDN   | New MSISDN   | Old Profile | New Profile | CNIC          | Acc Status (Before) | Acc Status (After) | Migration Channel
2019-03-11 | 00:00:14 | 923135260528 | 923029403541 | OMO BVS MA  | 0           | 1620221953175 | ACTIVE              |                    | subapp
2019-03-11 | 00:00:14 | 923135260528 | 923003026654 | OMO BVS MA  | 0           | 1620221953175 | ACTIVE              |                    | subapp
2019-03-11 | 00:00:14 | 923135260528 | 923003026654 | OMO BVS MA  | 0           | 1620221953175 | ACTIVE              |                    | subapp
2019-03-11 | 00:00:14 | 923135260528 | 923038048244 | OMO BVS MA  | 0           | 1620221953175 | ACTIVE              |                    | subapp

In the second log line, the two values are the same. I need to filter out lines like that, i.e. keep only the lines where the two values do not match. How do I compare the two values and print the new MSISDN only when they differ?

Solution 1:^[1]

Answer for the first version of the question

Try:

awk -F'[*][*]' '/OMO account migration/ && /responseCode":"18"/ && $2 != $4 { print $2}' migration.txt

This avoids the need to spawn multiple processes and connect them with pipelines, which makes the approach comparatively efficient.

How it works

-F'[*][*]'

This sets the field separator to two literal asterisks ([*] is a bracket expression that matches a single *). With that separator, the new MSISDN is field 2 and the old one is field 4.
/OMO account migration/ && /responseCode":"18"/ && $2 != $4 { print $2}

This selects lines that (1) match the regex OMO account migration, (2) match the regex responseCode":"18", and (3) have a second field that differs from the fourth. For each such line, the second field (the new MSISDN) is printed.
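To make the field numbering concrete, here is a small sketch that splits the first sample log line on two literal asterisks and prints each field with its number. The helper name show_fields is hypothetical, and the sketch assumes the ** markers are literally present in the log line, as in the first version of the question.

```shell
#!/bin/sh
# Print every field of each input line, using two literal asterisks as the
# field separator (the same -F'[*][*]' setting used in Solution 1).
show_fields() {
  awk -F'[*][*]' '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

# First sample log line from the question (assumes the ** markers are present).
line='[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606907**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923481057772**"}'

printf '%s\n' "$line" | show_fields
```

This prints five fields: field 2 is 923419606907 (the new MSISDN) and field 4 is 923481057772 (the old one), while fields 1, 3 and 5 hold the surrounding log text. That is why the awk program compares $2 with $4.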

Example

Let's consider this three-line test file:

$ cat migration.txt 
[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606907**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923481057772**"}
[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606888**)RESPONSE-BODY: {"callStatus":"false","responseCode":"19","description":"OMO account migration – **923481057999**"}
[2019-03-11 04:15:08 INFO-SUBAPP ESBRestClient:117] ## IP-103.228.158.85##TOKEN-201903110416276787774(**923419606123**)RESPONSE-BODY: {"callStatus":"false","responseCode":"18","description":"OMO account migration – **923419606123**"}

Let's run our command:

$ awk -F'[*][*]' '/OMO account migration/ && /responseCode":"18"/ && $2 != $4 {print $2}' migration.txt >>newmsisdn.txt

The output file now contains the one new MSISDN that we want:

$ cat newmsisdn.txt 
923419606907

Solution 2:^[2]

Assuming your actual Input_file looks like the samples shown and you need the new value for each line, try the following.

awk '
/OMO account migration/ && /responseCode":"18"/{
  val_new=val_old=""
  match($0,/\*\*[0-9]+\*\*/)
  val_new=substr($0,RSTART,RLENGTH)
  $0=substr($0,RSTART+RLENGTH)
  match($0,/\*\*[0-9]+\*\*/)
  val_old=substr($0,RSTART,RLENGTH)
  if (val_new!=val_old){
    gsub(/\*/,"",val_new)
    print val_new
  }
}
'   Input_file

Explanation: here is the same code with a comment on each line.

awk '                                                     ##Starting the awk program here.
/OMO account migration/ && /responseCode":"18"/{          ##If a line contains both OMO account migration AND responseCode":"18", do the following.
  val_new=val_old=""                                      ##Resetting variables val_new and val_old.
  match($0,/\*\*[0-9]+\*\*/)                              ##Using the built-in match function to find the first **digits** token. On success it sets RSTART and RLENGTH.
  val_new=substr($0,RSTART,RLENGTH)                       ##val_new is the matched text: the number in parentheses, i.e. the new MSISDN.
  $0=substr($0,RSTART+RLENGTH)                            ##Dropping everything up to the end of the first match so the next match finds the second number.
  match($0,/\*\*[0-9]+\*\*/)                              ##Running match again on the rest of the line (this sets RSTART and RLENGTH a second time).
  val_old=substr($0,RSTART,RLENGTH)                       ##val_old is the number in the description, i.e. the old MSISDN.
  if (val_new!=val_old){                                  ##Only when the two values differ:
    gsub(/\*/,"",val_new)                                 ##Globally removing the * characters so only the digits remain.
    print val_new                                         ##Printing the new MSISDN.
  }                                                       ##Closing the if block.
}                                                         ##Closing the block for the string-matching condition.
'  Input_file                                             ##Mentioning the Input_file name here.
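Both awk programs above rely on the ** markers, and the second sample log line in the question has none. The following is a sketch that works with or without the markers, under two assumptions: the new MSISDN is always the number inside the parentheses that follow the TOKEN value, and the old MSISDN is the first run of digits after the word migration in the description. The function name new_msisdn is hypothetical.

```shell
#!/bin/sh
# Print the new MSISDN for each migration log line (responseCode 18) whose
# new and old MSISDN differ. Works with or without ** around the numbers.
new_msisdn() {
  awk '
  /OMO account migration/ && /responseCode":"18"/ {
    # New MSISDN: digits inside the parentheses, e.g. (923419606907) or (**923419606907**)
    if (!match($0, /\(\**[0-9]+\**\)/)) next
    new = substr($0, RSTART, RLENGTH)
    gsub(/[^0-9]/, "", new)          # keep digits only
    # Old MSISDN: first run of digits after the word "migration" in the description
    if (!match($0, /migration[^0-9]*[0-9]+/)) next
    old = substr($0, RSTART, RLENGTH)
    gsub(/[^0-9]/, "", old)          # keep digits only
    if (new != old) print new
  }' "$@"
}
```

For example, new_msisdn migration.txt >> newmsisdn.txt. With the three-line migration.txt from Solution 1, it prints only 923419606907: the responseCode 19 line is skipped, and the third line has identical numbers.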

Solution 3:^[3]

I'd go for the following approach: every MSISDN consists of twelve digits ([0-9]) and is enclosed in double asterisks (**), so you can find them with the following regular expression:

grep -o "\*\*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\*\*"

With extended regular expressions (grep -E), which support a repetition count, you can shorten this to:

grep -oE "\*\*[0-9]{12}\*\*"

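Each match is printed on its own line, so the two numbers from a log line have to be placed side by side before awk can compare them. One way is paste - -, which joins every two consecutive lines; this assumes each log line yields exactly two matches. The first number of each pair is the new MSISDN:

grep -oE "\*\*[0-9]{12}\*\*" migration.txt | tr -d '*' | paste - - | awk '$1 != $2 { print $1 }'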
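A runnable sketch of this grep-then-awk idea that also keeps only the OMO account migration lines with responseCode 18 before extracting the numbers, as the other solutions do. It still assumes exactly two **-wrapped 12-digit numbers per selected line (otherwise paste pairs numbers from different lines), and it relies on grep -o, which GNU and BSD grep support but POSIX does not require. The function name pair_and_compare is hypothetical.

```shell
#!/bin/sh
# Grep-based variant: select the relevant lines, extract the **-wrapped
# 12-digit numbers, pair them up per log line and print the first (new)
# number of each pair when it differs from the second (old) one.
pair_and_compare() {
  grep 'OMO account migration' "$1" |
    grep 'responseCode":"18"' |
    grep -oE '\*\*[0-9]{12}\*\*' |   # one match per output line
    tr -d '*' |                       # strip the asterisks
    paste - - |                       # join every two lines: new<TAB>old
    awk '$1 != $2 { print $1 }'
}
```

For example, pair_and_compare migration.txt >> newmsisdn.txt. On the three-line migration.txt from Solution 1, this should write only 923419606907.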

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
Solution 3	Dominique
