Why do multiple threads and different functions/scopes in Python share a single import process?

Posted 2024-04-20 12:35:04


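This pitfall is the hardest-to-find bug I have run into since I started using Python years ago.

Let me give an oversimplified example. I have these files and this directory: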

[xiaobai@xiaobai import_pitfall]$ tree -F -C -a
.
├── import_all_pitall/
│   ├── hello.py
│   └── __init__.py
└── thread_test.py

1 directory, 3 files
[xiaobai@xiaobai import_pitfall]$

Contents of thread_test.py:

^{pr2}$

Contents of hello.py:

[xiaobai@xiaobai import_pitfall]$ cat import_all_pitall/hello.py
print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )
[xiaobai@xiaobai import_pitfall]$

And import_all_pitall/__init__.py is an empty file.

Let's run it:

[xiaobai@xiaobai import_pitfall]$ python thread_test.py 
main 1
main 2
do_import 1A
 main 3
haha0
haha1
main 4
do_import 2A
main 5
{'do_import1': <function do_import1 at 0x7f9d884760c8>, 'do_import3': <function do_import3 at 0x7f9d884a6758>, 'do_import2': <function do_import2 at 0x7f9d884a66e0>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'thread_test.py', 't2': <Thread(Thread-2, started 140314429765376)>, '__package__': None, 'threading': <module 'threading' from '/usr/lib64/python2.7/threading.pyc'>, 't': <Thread(Thread-1, started 140314438158080)>, 'time': <module 'time' from '/usr/lib64/python2.7/lib-dynload/timemodule.so'>, '__name__': '__main__', '__doc__': None}
do_import 3A
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
^C^C('exception occur', NameError("global name 'undefined_func' is not defined",))
('haha time is ', 1439451183.753475)
... #Forever

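Look carefully: where are "do_import 2B" and "do_import 3B"? The other two calls simply hang on the import statement and never even reach the first line of the module, which you can tell because time.time() runs only once (every "haha time is" line prints the same value). They hang simply because the first import of the same module, started in another thread/function, is still stuck in its unfinished loop. My whole system is large and multithreaded, which made this very hard to debug before I understood what was going on.

After I comment out the '#undefined_func( "Done haha" )' in hello.py: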

print( "haha0" )
import time
t = time.time()
print( "haha1" )
def do_task():
    success = 0
    while not success:
        try:
            time.sleep(1)
            #undefined_func( "Done haha" )
            success = 1
        except Exception as e:
            print("exception occur", e)
            print( "haha time is ", t )
do_task()
print( "haha -1" )

Then run it:

[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py 
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'do_import3': <function do_import3 at 0x7f31a462c048>, '__package__': None, 't2': <Thread(Thread-2, started 139851179529984)>, '__name__': '__main__', '__cached__': None, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>, '__doc__': None, 'do_import2': <function do_import2 at 0x7f31ac1d56a8>, 'do_import1': <function do_import1 at 0x7f31ac2c0bf8>, '__spec__': None, 't': <Thread(Thread-1, started 139851187922688)>, '__file__': 'thread_test.py', 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7f31ac297048>, '__builtins__': <module 'builtins' (built-in)>}
do_import 3A
haha0
haha1
haha -1
do_import 1B 139851188124312 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 139851188124312 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 3B 139851188124312 {'h2': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
main -1
[xiaobai@xiaobai import_pitfall]$ 

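I printed the id and found that they all share the same id, 139851188124312. So the three functions share the same import object/process. But this doesn't make sense to me: I thought the object was local to the function, because if I try to print the imported "hello" object at global scope, it throws an error.

Edit thread_test.py to print the hello object at global scope: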

...
print( "main 5" )
print(globals()) #no such hello
time.sleep(2) #slightly wait for do_import 1A import finished to test print hello below.
print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
do_import3()
print( "main -1" )

Let's run it:

[xiaobai@xiaobai import_pitfall]$ python3 thread_test.py 
main 1
main 2
do_import 1A
main 3
main 4
do_import 2A
main 5
{'t': <Thread(Thread-1, started 140404878976768)>, '__spec__': None, 'time': <module 'time' from '/usr/lib64/python3.4/lib-dynload/time.cpython-34m.so'>, '__cached__': None, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7fb296b87048>, 'do_import2': <function do_import2 at 0x7fb296ac56a8>, 'do_import1': <function do_import1 at 0x7fb296bb0bf8>, '__doc__': None, '__file__': 'thread_test.py', 'do_import3': <function do_import3 at 0x7fb28ef19f28>, 't2': <Thread(Thread-2, started 140404870584064)>, '__name__': '__main__', '__package__': None, '__builtins__': <module 'builtins' (built-in)>, 'threading': <module 'threading' from '/usr/lib64/python3.4/threading.py'>}
haha0
haha1
haha -1
do_import 1B 140404879178392 {'hello': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
do_import 2B 140404879178392 {'h': <module 'import_all_pitall.hello' from '/home/xiaobai/note/python/import_pitfall/import_all_pitall/hello.py'>}
Traceback (most recent call last):
  File "thread_test.py", line 31, in <module>
    print( "main 6", id(hello), locals() ) #"name 'hello' not defined" error even do_import1 was success
NameError: name 'hello' is not defined
[xiaobai@xiaobai import_pitfall]$ 

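hello is not global, so why can it be shared by different threads in different functions? Why doesn't Python allow a unique local import? Why does Python share the import process, so that all other threads "wait" for no apparent reason just because one thread hangs during its import?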


2 answers

I suggest printing threading.current_thread().name and giving your threads names in every print call. That makes it much easier to see who did what.
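A minimal sketch of that suggestion, assuming Python 3. The helper `tagged`, the thread names "importer-1"/"importer-2", and the use of `json` as a stand-in for the module under test are all made up for illustration; none of them come from the question.

```python
import threading

def tagged(message):
    """Prefix a message with the name of the thread that emits it."""
    return "[%s] %s" % (threading.current_thread().name, message)

def do_import(log):
    log.append(tagged("do_import A"))
    import json  # stand-in for the module under test (assumption)
    log.append(tagged("do_import B id=%d" % id(json)))

def run():
    log = []  # list.append is atomic in CPython, so threads can share it
    threads = [
        threading.Thread(target=do_import, args=(log,), name="importer-%d" % i)
        for i in (1, 2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    log.append(tagged("main done"))
    return log

if __name__ == "__main__":
    for line in run():
        print(line)
```

With every line tagged, a thread that prints its "A" line but never its "B" line stands out immediately.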

Look carefully, where does "do_import 2B" and "do_import 3B" ?

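Python is still loading the module at that point, and Python's import machinery is thread-safe: two threads cannot load the same module at the same time. It is not about running time.time(); it is about the import lock that is held while the module's top-level code executes (a single global lock in Python 2, per-module locks since Python 3.3). The other threads wait on that lock until the first import finishes.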
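The waiting can be reproduced in a few lines. This is only a sketch, assuming Python 3: `slow_mod` is a hypothetical module written to a temporary directory, and its half-second sleep stands in for the never-ending loop in hello.py.

```python
import importlib
import os
import sys
import tempfile
import threading
import time

def demo():
    # Write a hypothetical module whose top-level code takes 0.5 s to finish.
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "slow_mod.py"), "w") as f:
        f.write(
            "import threading, time\n"
            "loaded_by = threading.current_thread().name\n"
            "time.sleep(0.5)\n"
            "finished_at = time.monotonic()\n"
        )
    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    results = {}

    def worker():
        import slow_mod  # the second thread blocks here until the first is done
        results[threading.current_thread().name] = (time.monotonic(), slow_mod)

    threads = [threading.Thread(target=worker, name="T%d" % i) for i in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sys.path.remove(workdir)
    return results

if __name__ == "__main__":
    for name, (returned_at, mod) in sorted(demo().items()):
        print(name, "loaded by", mod.loaded_by,
              "waited for import:", returned_at >= mod.finished_at)
```

Both threads get the same module object, and neither returns from its import statement before the module body has finished. If the body never finishes, as in the question, the waiting thread never returns either.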

I print the id and figure they all share the same id 140589697897480

Yes, because Python loads a module only once. Think of a Python module as a singleton.

Hello is not global, but why it can be share by different thread's in different functions ?

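That is because hello is a local variable that points to a shared module. If, as said above, you think of the module as a singleton, and then remember that all memory is shared between the threads of one process, it follows that the singleton is shared by all threads.

As many people say, this is not a bug, it's a feature :)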


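Here is another example. Consider two files: main.py, which is the file that gets executed, and other.py, which is the file that gets imported.

Here is main.py: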

import threading
import logging
logging.basicConfig(level=logging.INFO)

def do_import_1():
    import other
    logging.info("I am %s and who did the import job ? %s", threading.current_thread().name, other.who_did_the_job.name)

def do_import_2():
    import other
    logging.info(other.who_did_the_job.name)
    logging.info("I am %s and who did the import job ? %s", threading.current_thread().name, other.who_did_the_job.name)

thread_import_1 = threading.Thread(target=do_import_1, name="Thread import 1")
thread_import_2 = threading.Thread(target=do_import_2, name="Thread import 2")

thread_import_1.start()
thread_import_2.start()

Here is other.py:

^{pr2}$

I use logging to avoid problems when the two threads try to write to stdout at the same time. Here is the result I get (with Python 2.7):

Thread loading the module :  Thread import 1
INFO:root:I am Thread import 1 and who did the import job ? Thread import 1
INFO:root:Thread import 1
INFO:root:I am Thread import 2 and who did the import job ? Thread import 1

As you can see, the module is imported only once.

To answer one of your questions -

I print the id and figure they all share the same id 140589697897480. So 3 functions share the same import object/process.

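Yes. When you import a module, Python creates the module object and caches it in sys.modules. For any later import of that module, Python fetches the module object from sys.modules and returns it; it does not import the module again.

For the second part of the same question -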

But this doesn't make sense to me, i though object is local to the function, because if i try to print imported "hello" object on global scope, it will throw error

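Well, sys.modules is not local, but the name hello is local to the function. As said above, if you try to import the module again, Python first looks in sys.modules to see whether it has already been imported and returns it if so; otherwise it imports the module and adds it to sys.modules.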
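A small sketch of that distinction, using the standard-library module `colorsys` as a stand-in for hello (an assumption made only to keep the example self-contained; the function and alias names are made up as well):

```python
import sys

def load_with_alias():
    import colorsys as local_alias  # the name exists only inside this function
    return local_alias

def load_again():
    import colorsys as another_alias  # different name, different function
    return another_alias

first = load_with_alias()
second = load_again()

print(first is second)                   # both names refer to one module object
print(sys.modules["colorsys"] is first)  # which is the object cached in sys.modules
print("local_alias" in globals())        # the alias itself never became global
```

The binding created by import is scoped like any other local variable, while the module object it points to lives in the process-wide sys.modules cache.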


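As for the first program: when a Python module is imported, its top-level code is run. Your hello.py calls do_task() at the top level, and its loop never ends, because undefined_func raises NameError on every pass and success is never set to 1. The import therefore never finishes.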

If you do not want the loop to run on import, put the code that should not run at import time inside -

if __name__ == '__main__':

The code inside the if statement above runs only when the script is run directly; it does not run when the module is imported.
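Here is a rough sketch of the guard in action, assuming Python 3. `guarded_hello` is a hypothetical module written to a temporary directory; it is imported once and then executed as a script through `runpy.run_path`.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical module: do_task() is only called when run as a script.
SOURCE = '''
ran_task = False

def do_task():
    global ran_task
    ran_task = True

if __name__ == "__main__":
    do_task()
'''

def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "guarded_hello.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    try:
        import guarded_hello  # __name__ is "guarded_hello": guard is skipped
        imported_ran = guarded_hello.ran_task
    finally:
        sys.path.remove(workdir)
    # Execute the same file as if it were started with "python guarded_hello.py".
    script_globals = runpy.run_path(path, run_name="__main__")
    return imported_ran, script_globals["ran_task"]

if __name__ == "__main__":
    print(demo())
```

Importing leaves ran_task at False, while running the file as a script sets it to True, so a blocking do_task() behind the guard cannot stall other threads that merely import the module.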


I guess when you said -

After i comment out the '#undefined_func( "Done haha" )' in hello.py

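what you actually did was take away the thing that kept the loop going: with that call commented out, success is set to 1 on the first pass, the loop ends, and so the import succeeds.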
