[BUG] Encryption may generate functions that cause Python to crash after 1024 or 2048 calls. #1491

Open
irreg opened this issue Sep 21, 2023 · 22 comments

@irreg

irreg commented Sep 21, 2023

Encryption may generate functions that cause Python to crash with a memory access violation (0xc0000005) after 1024 or 2048 calls.

The exe is generated from source code containing about 5000 functions, and this has occurred 4 times in about 200 encryptions. (In other words, a problematic function is generated with low probability, and if we generate it again with the same py file, it will not occur.)
The function that causes the problem appears to change randomly each time it is generated.

The command used for generation is as follows:

  • pyarmor pack -s main.spec -x "--restrict=101 --exclude .venv" main.py

The above command generates a single exe file, but the problem also occurs when the exe file is disassembled back into a .py file and executed

I tried to find out as much as possible about one of the files where the problem occurred.
It seemed that the error occurred at the moment one particular encrypted function called another particular encrypted function in a different file. When I replaced one of the two files with an unencrypted one, the problem no longer occurred.
However, I have not been able to get any detailed results because of the encryption.

Is there any possible known or resolved cause?
I will add more if I find out anything else.

C:\Users\Admin>pyarmor info
INFO     PyArmor Version 6.8.0
INFO     Python 3.8.9
ERROR    [Errno 2] No such file or directory: '.pyarmor_config'

OS: Windows 10 x64
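
A minimal sketch for telling this crash apart from an ordinary failure when the script is launched from a test driver. It assumes the script is started with subprocess; on Windows the exit code for 0xc0000005 is typically reported as the unsigned value 3221225477, while some tools print the signed form. The helper names is_access_violation and run_and_classify are hypothetical, not part of Pyarmor.

```python
import subprocess
import sys

# NTSTATUS value for an access violation, the 0xc0000005 code quoted in this report.
ACCESS_VIOLATION = 0xC0000005


def is_access_violation(returncode):
    """Return True if a process exit code denotes 0xc0000005.

    Masking to 32 bits also accepts the signed form (-1073741819).
    """
    return (returncode & 0xFFFFFFFF) == ACCESS_VIOLATION


def run_and_classify(argv):
    """Run a command and describe how it ended."""
    returncode = subprocess.run(argv).returncode
    if returncode == 0:
        return "ok"
    if is_access_violation(returncode):
        return "access violation"
    return f"exit code {returncode}"
```

For example, run_and_classify([sys.executable, "run.py"]) could be called in a loop to count how often a given build crashes.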

@irreg irreg added the bug label Sep 21, 2023
@jondy
Contributor

jondy commented Sep 21, 2023

Try upgrading to the latest Pyarmor 7.x version, 7.7.4.

It would be better to provide a sample script that I can use to reproduce this on my side.

@jondy
Contributor

jondy commented Sep 22, 2023

Also try removing the option --restrict=101.

@jondy jondy closed this as completed Sep 22, 2023
@irreg
Author

irreg commented Oct 31, 2023

Thanks for the reply. I have tried what you pointed out.

  • Pyarmor 7.7.4: No change, error still occurs.
  • Remove --restrict=101: No change, error still occurs.

I've created a reproduction code below.

Abstract

This is a procedure that creates 10000 files that may cause problems and then runs and tests them all.

File structure

root
|- run.py # Entry point
|- main.py* # Script to launch sub-processes from the main process
|- sub.py* # Script that calls the files that cause the problem
|- mod.py # File that may cause problems after encryption
|- cls.py* # Definition loaded from mod.py
|- duplicate.py* # Script to copy mod.py to suspects folder 10000 times before encryption

Files marked with * do not require encryption; the problem reproduces without encryption.

File Contents

run.py

import main

if __name__ == '__main__':
    print("start")
    main.run()
    print("end")

main.py

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
import sys
import re


def submit(n):
    import sub
    sub.run(n)


def execute(n):
    with ProcessPoolExecutor(max_workers=1) as e:
        future = e.submit(submit, n)
        future.result()


def run():
    max_workers = 12
    n_list = list(range(10000))
    continue_on_error = False
    for arg in sys.argv[1:]:
        if matched := re.search(re.compile(r"-*worker=(?P<num>\d+)"), arg):
            max_workers = int(matched.group("num"))
        elif matched := re.search(re.compile(r"-*range=(?P<begin>\d+)[,:-](?P<end>\d+)"), arg):
            n_list = n_list[int(matched.group("begin")):int(matched.group("end"))]
        elif matched := re.search(re.compile(r"-*continue-on-error"), arg):
            continue_on_error = True
        else:
            print(arg)
    e = ThreadPoolExecutor(max_workers=max_workers)
    futures = [e.submit(execute, n) for n in n_list]
    try:
        for future in as_completed(futures):
            n = n_list[futures.index(future)]
            try:
                future.result()
            except Exception as ex:
                print(f"{n},failed: {ex}")
                if not continue_on_error:
                    raise
            else:
                print(f"{n},success")
    finally:
        for future in futures:
            future.cancel()
        e.shutdown(wait=False)

sub.py

The reproduction probability decreases unless some complex processing runs before the function in mod.py is called.
sub.py contains a relatively short script that still reproduces the problem.
Updated on November 1, as a faster method was found: if this complex processing is executed at least 8 times at the end, the probability of occurrence seems to be almost the same.

import pandas as pd
import importlib


class A:
    def a(self, a, b):
        pass

    def b(self, t):
        pass


def run(n):
    mod = importlib.import_module(f"suspects.mod{n:04}")
    for i in range(1025):
        if i > 1015:
            aaa = [pd.DataFrame({j: [0.0] for j in range(97)}).astype("category") for _ in range(4)]
            for a in aaa:
                for s in "abcdefg":
                    a[s] = ""
            bbb = [a.to_numpy() for a in aaa]
            c = pd.DataFrame({"": [0.0]}).astype("category")
            d = c.to_numpy()
        mod.f(A(), A())
        if i % 100 == 0:
            print(f"{n},{i}")

mod.py

I couldn't find a way to reproduce it in a shorter description.

from cls import C


def f(a, c):
    l = list()
    n = ""
    l.append(n)
    if c is None:
        t = C(C(a=l).x({n: None}))
    else:
        t = c
    a.a(a=True, b=True)
    r = C(a.b(t))
    r.x = l
    return r

cls.py

class C:
    def __init__(self, a):
        pass

duplicate.py

import shutil
import os

os.makedirs("suspects", exist_ok=True)
for i in range(10000):
    shutil.copy(
        r"mod.py",
        rf"suspects\mod{i:04}.py",
    )

Reproduction environment

  • Windows 10/11 x64
  • Python 3.8.9
  • Pyarmor 6.8.0/7.7.4
  • (pandas 2.0.3)

Reproduction procedures

  1. Arrange files as written in the File structure
  2. Execute python duplicate.py
  3. Execute pyarmor obfuscate run.py --recursive
  4. Move to dist folder
  5. Execute python run.py

Update (November 1): the following is no longer necessary because sub.py was made faster; a full run now takes about an hour.
Previously, testing everything took more than 10 hours, so it was recommended to divide the work among several PCs by specifying the range=x:y option.

python run.py range=0:5000

The default setting is to stop when one problematic file is found, but if you want to get all results without stopping, specify the options as follows.

python -u run.py continue-on-error >result.csv

In my environment, I found errors in about 1 or 2 files per 1000. The error does not always occur even in problematic files, and some files have only a 1% or lower chance of error (with the current sub.py). The probability of occurrence varies from file to file.
Since some files fail only with low probability, it may be necessary to repeat the test hundreds of times to find all problematic files.
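
A minimal sketch for summarizing result.csv after a continue-on-error run. It assumes the line shapes printed by main.py and sub.py above ("<n>,success", "<n>,failed: <message>", "<n>,<i>" progress lines, and the bare start/end markers); the function name summarize is hypothetical.

```python
def summarize(lines):
    """Split run.py output lines into per-file outcomes.

    Returns (succeeded, failed): a set of file numbers that passed and a
    dict mapping each failed file number to its error message.
    """
    succeeded = set()
    failed = {}
    for line in lines:
        head, sep, rest = line.strip().partition(",")
        if not sep or not head.isdigit():
            continue  # "start", "end", traceback text, blank lines
        n = int(head)
        if rest == "success":
            succeeded.add(n)
        elif rest.startswith("failed"):
            failed[n] = rest.partition(":")[2].strip()
        # anything else is a "<n>,<i>" progress line from sub.py
    return succeeded, failed
```

Usage could look like: open result.csv, pass the file object to summarize, and print sorted(failed) to get the list of suspect mod files to re-test.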

@irreg
Author

irreg commented Oct 31, 2023

@jondy Could you please reopen it?

@jondy
Contributor

jondy commented Nov 1, 2023

Sorry, the test scripts use the third-party library pandas. Generally I only debug the official Python standard library, not other libraries.

Also, scripts obfuscated by Pyarmor have different frames; for example, sys._getframe(1) returns a different frame than in the original scripts.

Someone else has reported issues with pandas using sys._getframe to query some local variables.

So there are 2 solutions. The first is to patch pandas; there are some examples for Pyarmor 8, but they also work for Pyarmor 7:
https://pyarmor.readthedocs.io/en/latest/how-to/third-party.html

The other solution is to use --obf-code 0 to obfuscate the related scripts and the default options for the other scripts.

@irreg
Author

irreg commented Nov 1, 2023

It seems that sys._getframe is never called in this code. I will continue to investigate.

@irreg
Author

irreg commented Nov 2, 2023

We have investigated the cause.
Apparently, it occurs in many cases if the function is calling native code (written in C language etc.).
I was able to reproduce it even with ThreadPoolExecutor and sqlite objects in the standard library as shown below.

sub.py ver.2

import importlib
from concurrent.futures import ThreadPoolExecutor

class A:
    def a(self, a, b):
        pass

    def b(self, t):
        pass

def a(c):
    b = ThreadPoolExecutor()
    return ThreadPoolExecutor()


def run(n):
    mod = importlib.import_module(f"suspects.mod{n:04}")
    for i in range(1025):
        if i > 1015:
            ccc = [a(ThreadPoolExecutor()) for _ in range(512)]
        mod.f(A(), A())

sub.py ver.3

import importlib
import sqlite3

class A:
    def a(self, a, b):
        pass

    def b(self, t):
        pass

def a(c):
    b = sqlite3.connect(":memory:")
    return sqlite3.connect(":memory:")


def run(n):
    mod = importlib.import_module(f"suspects.mod{n:04}")
    for i in range(1025):
        if i > 1015:
            ccc = [a(sqlite3.connect(":memory:")) for _ in range(512)]
        mod.f(A(), A())

No error seems to occur when using "--obf-code 0". However, considering the results so far, this is a little difficult because all code would have to be obfuscated with "--obf-code 0". (If the conditions that cause the problem are met, a distant encrypted file that is completely unrelated to those conditions can cause the error.)

We also checked the behavior with version 8 (trial).
It was reproduced when using the old command, but may not be reproduced when using the new command.

old: pyarmor-7 obfuscate run.py --recursive
new: pyarmor gen run.py . -r --private

If there is no problem with the new command, that's enough, but I would appreciate it if you could add it and find out what the problem was.
Thank you for your response.

@irreg
Author

irreg commented Nov 7, 2023

Although the CPU is different, there appear to be many similarities in the conditions of occurrence and problems with issue #885. It may be the same case. (Confirmed that advanced 2 and restrict 2 options have nothing to do with reproducing the problem)

@jondy
Contributor

jondy commented Nov 8, 2023

@irreg If Python > 3.6, try it with Pyarmor 8.

@irreg
Author

irreg commented Nov 8, 2023

I have tried it on Pyarmor 8.4.3.

I can reproduce the problem by executing the encrypted file with the following command.

  • pyarmor-7 obfuscate run.py --recursive

I could not reproduce the problem when using the new command as shown below.

  • pyarmor gen run.py . -r --private

However, the new command does not work with the Restrict Mode 100+ we were using. Because of this, we are seriously looking for a replacement for this feature.

@irreg
Author

irreg commented Nov 22, 2023

We further investigated the conditions under which the problem occurs in version 7 and found that it can be reproduced with any content in the mod.f function, as long as the bytecode before encryption is 16, 48, 64, 192, 448, ... instructions long.

Therefore, it will also occur in the following case:

mod.py (48 Instructions)

def f(a, c):
    if None is not None:
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
        x = None
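
A minimal sketch for checking the instruction count of a function like the one above before encryption. It assumes a CPython version in which every instruction occupies two bytes of co_code (3.6 through 3.10); on newer interpreters co_code also contains inline cache entries, so the number would not be comparable. SUSPECT_COUNTS only holds the values reported in this thread and is not exhaustive; the helper names are hypothetical.

```python
SUSPECT_COUNTS = {16, 48, 64, 192, 448}  # values reported in this thread; not exhaustive


def instruction_count(func):
    """Number of 2-byte code units in a function's bytecode."""
    return len(func.__code__.co_code) // 2


def is_suspect(func):
    """True if the function's length matches one of the reported counts."""
    return instruction_count(func) in SUSPECT_COUNTS
```

Calling instruction_count(mod.f) on the unencrypted module shows whether a given body lands on one of the reported lengths for the interpreter in use.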

@jondy jondy reopened this Nov 23, 2023
@jondy
Contributor

jondy commented Nov 23, 2023

How about obfuscating only mod.py with --obf-code 0?

First, obfuscate all the scripts with the normal options and save them to dist/.
Then obfuscate mod.py with --obf-code 0 and save it to dist2/.
Next, copy dist2/mod.py to dist/.

Thanks for your efforts, but Pyarmor 8 still has open tasks (pyarmor.man, refining RFT mode, enhancements for BCC mode); Pyarmor 7 bugs come after those.

The preferred solution is to upgrade to Pyarmor 8+.

This issue may be treated as a known issue for Pyarmor 7.

@irreg
Author

irreg commented Nov 23, 2023

Thanks for the replies and suggestions.
We agree with your priorities.
I think it will work perfectly if we encrypt all mod.py with --obf-code 0, as you said.
However, compiling a complete list of the files that match the problematic conditions is quite complicated and would be a bottleneck for us.

So moving to a version 8 license seems the most likely path, although there are some challenges in the transition.
We are considering adopting version 8.

@Thoufak

Thoufak commented Feb 26, 2024

@irreg, Hi. Have you managed to figure out a way to detect builds that will (or are likely to) crash? Also, if you have any new details to share about this bug, I'd be happy to know, as I'm having the same issue.

@irreg
Author

irreg commented Feb 26, 2024

@Thoufak
It is difficult to check if a problem occurs after encryption.
Following the example above, it is necessary to actually call all functions 1024 times to find out. Currently, no other method has been found.
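
A minimal sketch of such a brute-force check, assuming the target function takes no arguments and the module is importable from the current directory. It runs the calls in a child interpreter so that a crash does not take down the caller. The names CHILD and stress_call are hypothetical. As noted earlier in this thread, the crash may also depend on native-code activity before the call, so a clean run does not prove a function is safe.

```python
import subprocess
import sys

# Script executed by the child interpreter: import a module and call one function repeatedly.
CHILD = (
    "import importlib, sys\n"
    "mod = importlib.import_module(sys.argv[1])\n"
    "func = getattr(mod, sys.argv[2])\n"
    "for _ in range(int(sys.argv[3])):\n"
    "    func()\n"
)


def stress_call(module_name, func_name, calls=2049):
    """Call module.func() `calls` times in a child process; return its exit code."""
    argv = [sys.executable, "-c", CHILD, module_name, func_name, str(calls)]
    return subprocess.run(argv).returncode
```

The default of 2049 calls is chosen to pass both the 1024 and 2048 thresholds mentioned in the issue title; a non-zero return value means the child failed or crashed.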

Compared to the above, it is still more practical to check if a function meets the conditions for a problem to occur.
For a module before encryption, the number of instructions of all callable objects can be checked to some extent as shown in the example below.
If you insert an appropriate meaningless statement (e.g., _ = None) in a function that satisfies the condition, you can avoid the problem, although it will take a lot of time and effort.
However, the code below does not support decorators, async functions, inner functions, etc., so detection is incomplete.
Our team decided to move to pyarmor 8 before resolving these issues, so we have not investigated further.
Currently, version 8 (unless you use compatibility mode) has not caused any problems even after hundreds of thousands of encryptions, so statistically, we believe that the above problems have been solved.

import target # Modules you want to check


def check_callable(name, obj):
    if callable(obj) and hasattr(obj, "__code__"):
        inst_len = len(obj.__code__.co_code) // 2
        if inst_len in (16, 48, 64, 192, 448):
            print(f"found: {name}")
        print(inst_len)


def check_class(obj):
    if isinstance(obj, type):
        for inner_name, inner_obj in obj.__dict__.items():
            check_class(inner_obj)
            check_callable(inner_name, inner_obj)


for name, obj in target.__dict__.items():
    check_class(obj)
    check_callable(name, obj)
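
A minimal sketch of an alternative that works on source text instead of imported objects. It compiles the file and walks the nested code objects in co_consts, which also reaches inner functions, async functions and decorated functions. It assumes the same two-bytes-per-instruction interpreters (CPython 3.6 through 3.10) and the incomplete list of counts above; whether module-level or class-body code objects are affected in the same way has not been established in this thread. The names iter_code_objects and find_suspects are hypothetical.

```python
import types

SUSPECT_COUNTS = {16, 48, 64, 192, 448}  # from this thread; known to be incomplete


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def find_suspects(source, filename="<source>"):
    """Return (name, first_line, count) for each code object with a suspect length."""
    top = compile(source, filename, "exec")
    hits = []
    for code in iter_code_objects(top):
        count = len(code.co_code) // 2
        if count in SUSPECT_COUNTS:
            hits.append((code.co_name, code.co_firstlineno, count))
    return hits
```

Reading each .py file of a project and passing its text to find_suspects gives the name and line number of every candidate, which is where a padding statement such as _ = None could be inserted.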

@Thoufak

Thoufak commented Feb 26, 2024

@irreg, thank you very much for the response. I don't think I will switch to pyarmor 8 any time soon. So, it seems that I could write a tool that would check the instruction count across my whole codebase and, if it finds something with an unwanted count, append one meaningless instruction. Quite tedious, but I'm glad there's finally at least some hope for this issue to be gone (I've been living with it for over a year now).

By the way, is this a complete list, or may there be more values to be discovered?

if inst_len in (16, 48, 64, 192, 448):

@irreg
Author

irreg commented Feb 27, 2024

@Thoufak
We have not checked every instruction count that might trigger the problem.
If necessary, the following can be used to find out.

search.bat

python -m venv .venv
call .venv\Scripts\activate.bat
pip install pyarmor==*.*.*
rem echo Change the search range if necessary
for /L %%i in (3,1,512) do (
  python gen_mod.py %%i
  python duplicate.py
  cd test
  rd dist /s /q
  pyarmor obfuscate run.py --recursive
  cd dist
  python run.py > ..\..\%%i_result.txt
  cd ../../
)
pause

gen_mod.py

import sys

i = int(sys.argv[1])

with open("mod.py", "w") as f:
    f.write("def f(a, c):\n")
    if i == 130:
        extend = True
        i -= 2
    else:
        if i > 130:
            i -= 1
        extend = False
    if i <= 3:
        f.write("    pass\n")
        sys.exit()
    i -= 2  # Correct the number of instructions due to return statement
    if i < 6:
        if i > 3:
            f.write("    x = None\n")
            i -= 2
        if i == 3:
            f.write("    list()\n")
        else:
            f.write("    x = None\n")
        sys.exit()
    
    f.write("    if None is not None:\n")
    i -= 4

    while i > 0:
        if i == 3:
            f.write("        list()\n")
            break
        f.write("        x = None\n")
        i -= 2
    if extend:
        f.write("    x = None\n")

run.py, main.py, sub.py ver.2, and duplicate.py from the examples above are required for execution.
In our results, about 3000 replicas of mod.py were sufficient for detection, so reducing the number should speed up the process somewhat.

@jondy
Contributor

jondy commented Mar 7, 2024

@irreg

I just ran a test with

  • Windows 7/x86_64
  • Python 3.8.3
  • Pyarmor 7

It broke with this exception:

$ python dist/run.py

...
617,success
620,failed: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
  File "<dist/run.py>", line 3, in <module>
  File "<frozen run>", line 83, in <module>
  File "<frozen main>", line 39, in run
  File "C:\Python38\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<frozen main>", line 14, in execute
  File "C:\Python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Is it the same as your case?

@irreg
Author

irreg commented Mar 8, 2024

@jondy

The error contents are the same.
However, only when the number of instructions is 16, it may stop with a different exception.

other than 16 instructions

2945,success
2696,failed: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
  File "<run.py>", line 3, in <module>
  File "<frozen run>", line 74, in <module>
  File "<frozen main>", line 38, in run
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\Python\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<frozen main>", line 14, in execute
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 444, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

16 instructions

401,success
400,success
265,failed: '' object has only read-only attributes (assign to ._idle_semaphore)
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python\Python38\lib\concurrent\futures\process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "<frozen main>", line 8, in submit
  File "<frozen sub>", line 21, in run
  File "<frozen sub>", line 20, in <listcomp>
  File "C:\Python\Python38\lib\concurrent\futures\thread.py", line 148, in __init__
    self._idle_semaphore = threading.Semaphore(0)
TypeError: '' object has only read-only attributes (assign to ._idle_semaphore)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<run.py>", line 3, in <module>
  File "<frozen run>", line 74, in <module>
  File "<frozen main>", line 38, in run
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\Python\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "<frozen main>", line 14, in execute
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 444, in result
    return self.__get_result()
  File "C:\Python\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
TypeError: '' object has only read-only attributes (assign to ._idle_semaphore)

@jondy
Contributor

jondy commented Mar 9, 2024

Got it, thanks.

@jondy
Contributor

jondy commented May 8, 2024

I spent almost 2 days on this issue and still have not found the cause.

However, with the Python option -X dev, the obfuscated script seems to work. I ran this test (using sub.py ver.2, but with 10000 changed to 1000):

pyarmor obfuscate -r run.py

python3.8 -X dev dist/run.py

Without -X dev, dist/run.py fails quickly (No. 87 failed).

With -X dev, it ran at least 1000 files successfully.

And when I tested the plain script with Python 3.9:

python3.9 run.py

9,failed: cannot import name 'Lock' from partially initialized module 'multiprocessing.synchronize' (most likely due to a circular import) (/Users/jondy/workspace/pytransform/python/lib/python3.9/multiprocessing/synchronize.py)

It seems to be related to a circular import.

And there is no problem with Python 3.10:

python3.10 pyarmor/pyarmor.py obfuscate -O dist310 --exclude pyarmor -r run.py
python3.10 dist310/run.py

I am putting this aside until I have a new idea; maybe it only fails in Python 3.8.

@irreg
Author

irreg commented May 10, 2024

In python 3.10, the bytecode seems to have changed slightly from previous versions.
Therefore, it may be necessary to use a different code to get the desired length of bytecode.

mod.py (16 Instructions for python 3.10 or later)

def f(a, c):
    if None is not None:
        x = None
        x = None
        x = None
        x = None
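
A minimal sketch for seeing how the same source compiles on the interpreter in use, since the instruction count depends on the Python version. It lists the instruction names with the standard dis module; the helper name opnames is hypothetical. Note that on Python 3.11 and later dis hides inline cache entries by default, so the length of this list is not the same as len(co_code) // 2 there.

```python
import dis
import sys

SOURCE = """\
def f(a, c):
    if None is not None:
        x = None
        x = None
        x = None
        x = None
"""


def opnames(source, func_name="f"):
    """Compile source, pick one function and list its instruction names."""
    namespace = {}
    exec(compile(source, "<mod>", "exec"), namespace)
    return [ins.opname for ins in dis.get_instructions(namespace[func_name])]


names = opnames(SOURCE)
print(sys.version_info[:2], len(names))
print(names)
```

Running this under each interpreter of interest makes the differences visible before deciding how many padding statements a generated mod.py needs.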

gen_mod.py (for python3.10 or later)

import sys

i = int(sys.argv[1])

with open("mod.py", "w") as f:
    f.write("def f(a, c):\n")
    if i == 258:
        extend = True
        i -= 2
    else:
        if i > 258:
            i -= 1
        extend = False
    if i <= 3:
        f.write("    pass\n")
        sys.exit()
    i -= 2  # Correct the number of instructions due to return statement
    if i < 8:
        if i > 5:
            f.write("    x = None\n")
            i -= 2
        if i > 3:
            f.write("    x = None\n")
            i -= 2
        if i == 3:
            f.write("    list()\n")
        else:
            f.write("    x = None\n")
        sys.exit()
    
    f.write("    if None is not None:\n")
    i -= 6

    while i > 0:
        if i == 3:
            f.write("        list()\n")
            break
        f.write("        x = None\n")
        i -= 2
    if extend:
        f.write("    x = None\n")
        f.write("    x = None\n")
