Getting JSON content from a packet using Scapy with Python

I have a pcapng file that contains a small amount of traffic. One of the packets I am trying to print contains JSON data. If I open the packet in Wireshark, I can see the values in the JSON. But when I use Scapy to read the file and print the packet, the JSON does not appear.

from scapy.all import IP, sniff
from scapy.layers import http


def process_tcp_packet(packet):
    if packet.haslayer(http.HTTPRequest):
        http_layer = packet.getlayer(http.HTTPRequest)
        ip_layer = packet.getlayer(IP)
        #print('\n{0[src]} just requested a {1[Method]} {1[Host]}{1[Path]}'.format(ip_layer.fields, http_layer.fields))
        #print(ip_layer.fields)
        #print(http_layer.fields)
        #packet.show()
        print('Packet: ' + str(packet))
        print('\n\n')


# Start sniffing the network.
sniff(offline='test.pcapng', prn=process_tcp_packet, count=2)

Here is the JSON content Wireshark is showing me:

[Screenshot: Wireshark showing the JSON content of the packet]

And this is the output I get for that packet using the code above:

Packet: b'\x18\x0fv\xef0\x8a\xc4\x98\\\xe7=\x18\x08\x00E\x00\x01&&S@\x00@\x06}\n\xc0\xa8\x89\x94#\xa7(\x91\x9b\xd0\x00P\x16-/\x9e\xb1\xa1\xe8V\x80\x18\x01K\x97\xaf\x00\x00\x01\x01\x08\n\x00\x00\t\xd5\xfb\xc3b\x89POST /v1/identify HTTP/1.1\r\nHost: api.segment.io\r\nUser-Agent: Roku/DVP-9.10 (489.10E04121A)\r\nAccept: application/json\r\nAuthorization: Basic: NHJmY3AzUEJmTUhPVlJsWVZZNTZKRDZ0N1JuMUNoaVY=\r\nContent-Type: application/json\r\nContent-Length: 704\r\n\r\n'

I was reading about how to print the entire content of a packet, and that's where I came across both packet.show() and print(packet). However, both of them are still missing the JSON data.

I want to get the JSON data because I want to be able to parse it manually. I don't like how Wireshark nests all the JSON under arrows that I have to expand to see the values.

This is the output of show:

[Screenshot: output of packet.show()]

And I am using the latest version of scapy.



Solution 1:[1]

It's an old question, but for future people who search for an answer, here is how I did it:

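import json

# `packet` is a Scapy packet, e.g. the argument passed to the sniff() callback.
packet_dict = {}
layer = None
for line in packet.show2(dump=True).split('\n'):
    if '###' in line:
        # Layer header such as "###[ IP ]###": start a new sub-dictionary.
        layer = line.strip('#[] ')
        packet_dict[layer] = {}
    elif '=' in line and layer is not None:
        # Field line such as "src = 10.0.0.1": store it under the current layer.
        key, val = line.split('=', 1)
        packet_dict[layer][key.strip()] = val.strip()
print(json.dumps(packet_dict))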
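
The dictionary built above only holds the fields Scapy dissected, so the JSON body itself still has to be read from the raw TCP payload. Below is a minimal sketch of that step; it is not part of the original answer. It assumes the headers and the complete body are available in one bytes object (in Scapy, bytes(packet[TCP].payload) gives the payload of a single segment). The helper name extract_json_body and the sample request are made up for illustration.

```python
import json


def extract_json_body(raw_http: bytes):
    """Return the decoded JSON body of a raw HTTP message, or None."""
    # Headers and body are separated by one empty line (CRLF CRLF).
    _head, sep, body = raw_http.partition(b'\r\n\r\n')
    if not sep or not body:
        # No body in these bytes, e.g. it travels in a later TCP segment.
        return None
    try:
        return json.loads(body.decode('utf-8'))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


# Hypothetical request, shaped like the one in the question.
sample = (
    b'POST /v1/identify HTTP/1.1\r\n'
    b'Host: api.segment.io\r\n'
    b'Content-Type: application/json\r\n'
    b'Content-Length: 17\r\n'
    b'\r\n'
    b'{"userId": "abc"}'
)

print(extract_json_body(sample))
print(extract_json_body(b'POST / HTTP/1.1\r\nContent-Length: 704\r\n\r\n'))
```

The second call returns None because those bytes end right after the blank line. The output in the question ends the same way even though Content-Length is 704, which suggests the body was sent in a following TCP segment that Wireshark reassembles for display.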

Solution 2:[2]

In case it is useful to someone: starting from Yechiel's code, I made some improvements:

  • Values are converted to their proper types instead of all being returned as strings
  • Sublayers are parsed
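
import ast
import logging

log = logging.getLogger(__name__)


def pkt2dict(pkt):
    packet_dict = {}
    layer = sublayer = None
    for line in pkt.show2(dump=True).split('\n'):
        if '###' in line:
            if '|###' in line:
                # Nested header such as " |###[ IP Option ]###".
                sublayer = line.strip('|#[] ')
                packet_dict[layer][sublayer] = {}
            else:
                layer = line.strip('#[] ')
                packet_dict[layer] = {}
                sublayer = None  # a new layer ends any open sublayer
        elif '=' in line and layer is not None:
            if line.lstrip().startswith('|') and sublayer is not None:
                key, val = line.strip('| ').split('=', 1)
                packet_dict[layer][sublayer][key.strip()] = val.strip('\' ')
            else:
                key, val = line.split('=', 1)
                val = val.strip('\' ')
                if val:
                    try:
                        # literal_eval converts numbers and other literals without running code.
                        packet_dict[layer][key.strip()] = ast.literal_eval(val)
                    except (ValueError, SyntaxError, TypeError):
                        # Not a Python literal (e.g. a host name): keep the string.
                        packet_dict[layer][key.strip()] = val
        else:
            log.debug("pkt2dict packet not decoded: " + line)
    return packet_dict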

It still needs to be checked whether this works on all types of layers returned by Scapy.
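
One way to start such a check without a capture file is to run the same line-by-line parsing over a text dump. The sketch below is only an illustration under assumptions: dump2dict is a hypothetical helper, it handles top-level layers only, and SAMPLE_DUMP is hand-written to resemble show2(dump=True) output rather than taken from a real packet.

```python
def dump2dict(dump: str) -> dict:
    """Parse text shaped like Scapy's show2(dump=True) into nested dicts."""
    result = {}
    layer = None
    for line in dump.split('\n'):
        if '###' in line:
            # Layer header: start a new sub-dictionary.
            layer = line.strip('#[] ')
            result[layer] = {}
        elif '=' in line and layer is not None:
            # Field line: split on the first '=' only, values may contain '='.
            key, val = line.split('=', 1)
            result[layer][key.strip()] = val.strip()
    return result


# Hand-written sample, not real Scapy output.
SAMPLE_DUMP = """###[ IP ]###
  src       = 192.168.137.148
  dst       = 35.167.40.145
###[ TCP ]###
  sport     = 39888
  dport     = http
"""

parsed = dump2dict(SAMPLE_DUMP)
print(parsed['IP']['src'])
print(parsed['TCP'])
```

Keeping the text parsing separate from the Scapy call makes it possible to collect dumps of different layer types as plain strings and compare the resulting dictionaries against what is expected.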

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 STeXE