Recovering human-readable Python 3.10 source from cached .pyc bytecode

After manually clearing a corrupt recycle bin on my removable USB drive, I found that a couple of my most recently executed Python files were corrupt as well: opening them in an editor shows that their entire contents are null bytes (all 00s). I have no idea how this happened, but my last backup unfortunately dates from several weeks ago, so I'd like to recover the lost source files if at all possible.

I found the relevant .pyc file (dated the day before the corruption) in .\__pycache__\ and am trying to reconstruct a human-readable, ready-to-execute .py file from it, but I haven't had much success so far.
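CPython names cached bytecode after the source file plus the interpreter's cache tag, so the standard library can tell you which .pyc corresponds to a given source file (and vice versa). A minimal sketch, assuming the file name `thermo.py` from this question; the `cpython-310` part of the result depends on the interpreter that runs it, and the path separator depends on the OS.

```python
import importlib.util
import sys

# Map a source path to the .pyc path the running interpreter would use.
# On CPython 3.10 this is something like "__pycache__/thermo.cpython-310.pyc".
pyc_path = importlib.util.cache_from_source("thermo.py")
print(pyc_path)

# The reverse mapping recovers the original source file name from the cache path.
print(importlib.util.source_from_cache(pyc_path))  # thermo.py

# The tag embedded in the cached file name, e.g. "cpython-310".
print(sys.implementation.cache_tag)
```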

Many searches in this vein turn up tools such as uncompyle6 or decompyle3, but neither supports Python 3.10, and their developer has stated that they are not planning to maintain either of them.

Seemingly the only tool or package that does anything related to bytecode decompilation and also supports Python 3.10 is this fork of unpyc3; however, it seems to operate on live code (or code objects; I'm not entirely sure).

Hoping that this tool held the key to recovering my code, I got this far on my own:

from unpyc3 import decompile
import dis, marshal

with open("thermo.cpython-310.pyc", "rb") as f:
    f.seek(16)  # Many guides say 8 bytes, but the .pyc header is 16 bytes since Python 3.7 (PEP 552)
    
    raw = f.read()
    code = marshal.loads(raw)

with open("disassembly.txt", "w", encoding="utf-8") as out:
    dis.dis(code, file=out)
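The 16-byte offset is expected for Python 3.7 and later: PEP 552 defines the header as a 4-byte magic number, a 4-byte flags field, and 8 more bytes holding either the source mtime and size or a hash of the source. The 8-byte figure in older guides describes pre-3.3 .pyc files (magic number plus mtime only). A minimal sketch of reading the header explicitly instead of skipping it blindly; the helper name `load_pyc` is hypothetical, and because marshal's format is version-specific it should be run with the same Python version (3.10 here) that wrote the file.

```python
import importlib.util
import marshal
import struct


def load_pyc(path):
    """Split a Python 3.7+ .pyc file into its header fields and its code object."""
    with open(path, "rb") as f:
        data = f.read()
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    if magic != importlib.util.MAGIC_NUMBER:
        # marshal's format changes between versions, so use the interpreter that wrote the file.
        raise ValueError(f"magic number {magic!r} does not match this interpreter")
    if flags & 0x01:
        # Hash-based .pyc: bytes 8-16 hold a hash of the source file.
        header = {"hash_based": True, "source_hash": data[8:16].hex()}
    else:
        # Timestamp-based .pyc: source mtime and source size as little-endian uint32s.
        mtime, size = struct.unpack("<II", data[8:16])
        header = {"hash_based": False, "mtime": mtime, "source_size": size}
    # Everything after the 16-byte header is the marshalled module code object.
    return header, marshal.loads(data[16:])
```

For a timestamp-based file (the default), `source_size` is the byte size of the original source when it was compiled, which at least tells you how large the lost thermo.py was.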

encoding="utf-8" is needed when writing the output file because some of my variable names are Unicode characters (e.g. α, λ, φ).
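Non-ASCII identifiers are stored verbatim in the code object, while open() without an explicit encoding uses the locale's preferred encoding (for example cp1252 on many Western Windows systems), which has no Greek letters. A small illustration under those assumptions; the demo source and output file name are made up for the example.

```python
import dis
import io
import os
import tempfile

# Identifiers such as α and λ are stored as-is in the code object's name table.
code = compile("α = 1.0\nλ = 2 * α\n", "<demo>", "exec")
print(code.co_names)  # ('α', 'λ')

# cp1252, a common Windows locale encoding, cannot represent Greek letters.
try:
    "α".encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode it:", exc.reason)

# Capture the disassembly as text, then write it with an explicit UTF-8 encoding.
buf = io.StringIO()
dis.dis(code, file=buf)
out_path = os.path.join(tempfile.gettempdir(), "disassembly_demo.txt")
with open(out_path, "w", encoding="utf-8") as out:
    out.write(buf.getvalue())
```

The same applies to any recovered .py file: write it as UTF-8, which is also the default source encoding Python assumes when reading it back.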

This writes what I believe is a textual listing of the module's bytecode instructions (the human-readable form of CPython's dis.Instruction objects) to disassembly.txt; a snippet is reproduced below:

   2           0 LOAD_CONST               0 (0)
               2 LOAD_CONST               1 (None)
               4 IMPORT_NAME              0 (Constants)
               6 STORE_NAME               0 (Constants)

   3           8 LOAD_CONST               0 (0)
              10 LOAD_CONST               1 (None)
              12 IMPORT_NAME              1 (EOS)
              14 STORE_NAME               1 (EOS)

   4          16 LOAD_CONST               0 (0)
              18 LOAD_CONST               1 (None)
              20 IMPORT_NAME              2 (ACM)
              22 STORE_NAME               2 (ACM)

   7          24 LOAD_CONST               0 (0)
              26 LOAD_CONST               2 (('sqrt', 'exp', 'log'))
              28 IMPORT_NAME              3 (math)
              30 IMPORT_FROM              4 (sqrt)
              32 STORE_NAME               4 (sqrt)
              34 IMPORT_FROM              5 (exp)
              36 STORE_NAME               5 (exp)
              38 IMPORT_FROM              6 (log)
              40 STORE_NAME               6 (log)
              42 POP_TOP
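Each group in this listing belongs to one source line (the number in the left column). Lines 2 to 4 look like plain `import Constants`-style statements, and the group for line 7 looks like `from math import sqrt, exp, log`. One way to check such a guess is to compile the candidate statement and compare instructions via `dis.get_instructions`, which yields `dis.Instruction` objects instead of text. A minimal sketch; the helper name `import_signature` is hypothetical, and since opcodes differ between Python versions the comparison should be done under 3.10.

```python
import dis


def import_signature(source):
    """Return (opname, argval) pairs for the import-related instructions in `source`."""
    code = compile(source, "<candidate>", "exec")
    wanted = {"IMPORT_NAME", "IMPORT_FROM", "STORE_NAME"}
    return [(ins.opname, ins.argval)
            for ins in dis.get_instructions(code)
            if ins.opname in wanted]


# Candidate reconstructions for source lines 2 and 7 of the listing above.
print(import_signature("import Constants"))
for pair in import_signature("from math import sqrt, exp, log"):
    print(pair)
```

If the pairs match the listing (IMPORT_NAME math, then an IMPORT_FROM/STORE_NAME pair per name), the guessed line is very likely what was in the source, apart from formatting and comments, which bytecode does not preserve.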

The actual source file, thermo.py, which I'm trying to recover, is nearly 3000 lines long, so I'm not going to reproduce the whole output here (nor can I post it elsewhere, since it exceeds Pastebin's free limit of 512 kB).

This seems like the correct information, but my programming experience completely dries up once we reach this assembly-adjacent code, and I'm honestly at a loss as to the next step. It appears that unpyc3.decompile() accepts a Python module, a Python function or a CPython PyCodeObject as input, but unpyc3's documentation is not very detailed.

So my issue now is this:

  • If my above marshal/disassembly approach is correct, I don't know how to process the disassembled Instructions to feed to unpyc3.decompile().

  • If the above approach is incorrect, I have no idea where to go with this.
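On the first point, note that marshal.loads() already returns the module's code object, which (per the description above) is one of the inputs unpyc3.decompile() accepts; the disassembly text is for human reading, not decompiler input. Whatever tool ends up working, every function, class and method body is also stored as a nested code object in `co_consts`, so the module's outline (names, first line numbers, parameter names) can be listed directly. A minimal sketch; the helper `outline` and the demo source are hypothetical.

```python
import inspect
import types


def outline(code, depth=0):
    """Yield (depth, name, first_line, params) for `code` and every nested code object."""
    n_params = code.co_argcount + code.co_kwonlyargcount
    n_params += bool(code.co_flags & inspect.CO_VARARGS)
    n_params += bool(code.co_flags & inspect.CO_VARKEYWORDS)
    # co_varnames lists parameters first: positional, keyword-only, *args, **kwargs.
    yield depth, code.co_name, code.co_firstlineno, code.co_varnames[:n_params]
    for const in code.co_consts:
        # Function, class, lambda and comprehension bodies appear as nested code objects.
        if isinstance(const, types.CodeType):
            yield from outline(const, depth + 1)


demo_source = (
    "def f(a, b=1, *args, c, **kw):\n"
    "    return a\n"
    "\n"
    "class C:\n"
    "    def m(self, x):\n"
    "        pass\n"
)
for depth, name, line, params in outline(compile(demo_source, "<demo>", "exec")):
    print("    " * depth + f"{name} (line {line}): {params}")
```

Passing the code object from marshal.loads() instead of the demo would print a skeleton of thermo.py, which is useful even if no decompiler handles the whole file. Default values and function bodies are not recovered this way.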

If anyone knows how to progress with this problem (or whether or not my goal is actually achievable), I'd appreciate any advice.



Solution 1:[1]

Try pycdc (Decompyle++), a C++ command-line tool that decompiles .pyc bytecode, including Python 3.10+ files, back into Python source. You can find more information about it on the pycdc project page.
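Once pycdc is built, its basic usage is to pass a .pyc path and read the decompiled source from standard output. A minimal sketch of running it over every file in a `__pycache__` folder, assuming a `pycdc` executable on PATH; the helper name `decompile_dir` and the `.recovered.py` naming are hypothetical. Check each result carefully, since a decompiler may produce incomplete code or warnings for bytecode it cannot handle.

```python
import shutil
import subprocess
from pathlib import Path


def decompile_dir(pycache_dir, exe="pycdc"):
    """Run `exe <file.pyc>` on each .pyc in pycache_dir and save stdout beside it."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe!r} not found on PATH")
    written = []
    for pyc in sorted(Path(pycache_dir).glob("*.pyc")):
        result = subprocess.run([exe, str(pyc)], capture_output=True,
                                encoding="utf-8", errors="replace")
        # e.g. thermo.cpython-310.pyc -> thermo.cpython-310.recovered.py
        out = pyc.with_suffix(".recovered.py")
        out.write_text(result.stdout, encoding="utf-8")
        if result.returncode != 0 or result.stderr:
            print(f"{pyc.name}: check output ({result.stderr.strip()[:200]})")
        written.append(out)
    return written
```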

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Md Josif Khan