How to parse this custom log file in Python 3

The log file is generated by a program written in C++.

Here is the demo log:

|Frame:0|NUMBER:0
|Frame:1|NUMBER:1|{INDEX:0|RECT:[11,24][31,43]}
|Frame:2|NUMBER:2|{INDEX:0|RECT:[11,24][31,43]}|{INDEX:1|RECT:[11,24][31,43]}
|Frame:3|NUMBER:0

I am trying to read these log files into a list, a dict, or a similar structure.

Here is the information that I hope to capture from the demo log above:

#frame, number, index, rect
[0,     0] 
[1,     1,      0,      11,24,31,43]
[2,     2,      0,      11,24,31,43,  1,   11,24,31,43]
[3,     0]


Solution 1:[1]

Thanks to @Juan Facundo Peña.

This answer is based on his answer, with an improvement for handling duplicate keys.

import re

code_list = []
with open("2.log", "r") as f:
    for line in f:
        # Only lines that start with a Frame field are log records.
        if not line.startswith("|Frame:"):
            continue
        code_dict = {}
        # Set after an INDEX field; names the key for the RECT that follows.
        next_rect_idx_key = ""
        for parse in line.strip().split("|"):
            split_line = parse.strip("{}").split(":")
            key = split_line[0]
            if not key:
                continue
            data_as_strings = re.findall(r"\d+", split_line[-1])
            data_as_integers = [int(s) for s in data_as_strings]
            if next_rect_idx_key:
                # This field is the RECT that belongs to the preceding INDEX.
                code_dict[next_rect_idx_key] = data_as_integers
                next_rect_idx_key = ""
            elif key == "INDEX":
                # Build a unique key such as "INDEX[0]" so that repeated
                # INDEX/RECT blocks do not overwrite each other.
                next_rect_idx_key = key + str(data_as_integers)
            else:
                code_dict[key] = data_as_integers

        print(code_dict)
        code_list.append(code_dict)
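
The dictionaries built above fold the index number into the key (for example "INDEX[0]"). As a rough sketch, they could be flattened back into the row format shown in the question like this. The helper name flatten_record is hypothetical, and the sketch assumes the "INDEX[n]" key format produced by the code above and Python 3.7+ dictionary ordering.

```python
import re


def flatten_record(code_dict):
    """Flatten one parsed record into [frame, number, index, rect..., ...]."""
    row = []
    for key, values in code_dict.items():
        match = re.fullmatch(r"INDEX\[(\d+)\]", key)
        if match:
            # Re-emit the index number that was folded into the key.
            row.append(int(match.group(1)))
        row.extend(values)
    return row


# Hand-written example of what the parser above yields for frame 2.
example = {
    "Frame": [2],
    "NUMBER": [2],
    "INDEX[0]": [11, 24, 31, 43],
    "INDEX[1]": [11, 24, 31, 43],
}
print(flatten_record(example))
```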

Solution 2:[2]

This can be solved using the re library.

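import re

code_list = []
with open("log_file.log", "r") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        code_dict = {}
        for parse in line.split("|"):
            # Drop the braces around INDEX/RECT blocks, then split key from value.
            split_line = parse.strip("{}").split(":")
            key = split_line[0]
            if not key:
                continue
            # Values are kept as lists of digit strings, e.g. ['11', '24', '31', '43'].
            value = re.findall(r"\d+", split_line[-1])
            code_dict[key] = value

        code_list.append(code_dict)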

You will end up with a list of dictionaries (code_list), one per log line, each mapping the keys on that line to their values.

Line 3 of the demo log contains two INDEX/RECT pairs. Because dictionary keys are unique, the second pair overwrites the first in code_dict. If needed, you can split the whole log by "Frame" to work out which codes belong to which line.
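
One way to keep every INDEX/RECT pair on a line is to collect the pairs in a list instead of storing them as top-level keys. The following is only a sketch: it assumes each block has exactly the form {INDEX:n|RECT:[a,b][c,d]} seen in the demo log, and the names FRAME_RE, BLOCK_RE, and parse_line are illustrative, not part of the original answer.

```python
import re

# Patterns written against the demo log format only.
FRAME_RE = re.compile(r"\|Frame:(\d+)\|NUMBER:(\d+)")
BLOCK_RE = re.compile(r"\{INDEX:(\d+)\|RECT:\[(\d+),(\d+)\]\[(\d+),(\d+)\]\}")


def parse_line(line):
    """Return a dict for one log line, or None if it is not a frame line."""
    head = FRAME_RE.match(line)
    if head is None:
        return None
    rects = [
        {"index": int(idx), "rect": [int(a), int(b), int(c), int(d)]}
        for idx, a, b, c, d in BLOCK_RE.findall(line)
    ]
    return {
        "frame": int(head.group(1)),
        "number": int(head.group(2)),
        "rects": rects,
    }


demo = "|Frame:2|NUMBER:2|{INDEX:0|RECT:[11,24][31,43]}|{INDEX:1|RECT:[11,24][31,43]}"
print(parse_line(demo))
```

With this shape, a frame without any blocks simply gets an empty "rects" list.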

If you only want the numbers, you can also try:

import re

code_list = []
with open("log_file.log", "r") as f:
    logs = f.readlines()
    for line in logs:
        codes = re.findall(r"\d+", line)
        code_list.append(codes)

This approach will give you a list of lists, each of which contains the numbers from a single line.
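
Note that re.findall returns strings, so the inner lists hold values such as '11' rather than 11. A small sketch of the same idea with integer conversion, run against the demo log from the question (the variable names demo_log and rows are illustrative):

```python
import re

demo_log = """|Frame:0|NUMBER:0
|Frame:1|NUMBER:1|{INDEX:0|RECT:[11,24][31,43]}
|Frame:2|NUMBER:2|{INDEX:0|RECT:[11,24][31,43]}|{INDEX:1|RECT:[11,24][31,43]}
|Frame:3|NUMBER:0"""

rows = [
    [int(n) for n in re.findall(r"\d+", line)]
    for line in demo_log.splitlines()
    if line.strip()  # skip blank lines
]

for row in rows:
    print(row)
```

Each row follows the frame, number, index, rect order requested in the question, because the digits appear in that order on every line.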

Edit: if you want to loop through a single string rather than a file, try:

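import re

code_list = []
logs = log_string.split("\n")  # log_string holds the whole log as one string

for line in logs:
    # Same per-line processing as in the file-based examples above.
    codes = re.findall(r"\d+", line)
    code_list.append(codes)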

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 John
Solution 2