Multithreading not improving results in Python?
I am applying multithreading to a Python script to improve its performance, but I don't understand why the execution time does not improve.
This is the code snippet of my implementation:
from queue import Queue
from threading import Thread
from datetime import datetime
import time


class WP_TITLE_DOWNLOADER(Thread):
    def __init__(self, queue, name):
        Thread.__init__(self)
        self.queue = queue
        self.name = name

    def download_link(self, linkss):
        # Test function; later, some processing will be done on this list.
        # This work runs on the CPU.
        for idx, link in enumerate(linkss):
            # time.sleep(0.01)
            test.append(idx)
        for idx, i in enumerate(testv):
            # list.append returns None, so do not assign its result
            i.append(2)

    def run(self):
        while True:
            # Get the work from the queue
            linkss = self.queue.get()
            try:
                self.download_link(linkss)
            finally:
                self.queue.task_done()


# With threading
testv = [[i for i in range(5000)] for j in range(20)]
links_list = [[i for i in range(100000)] for j in range(20)]
test = []
start_time = time.time()
queue = Queue()
thread_count = 8
for x in range(thread_count):
    worker = WP_TITLE_DOWNLOADER(queue, str(x))
    # Setting daemon to True will let the main thread exit even though the workers are blocking
    worker.daemon = True
    worker.start()
# Fill up the queue for the threads
for links in links_list:
    queue.put(links)
# print("queuing time={}".format(time.time() - start_time))
# print(test)
# Wait for all queued work to finish
j_time = time.time()
queue.join()
t_time = time.time() - start_time
print("With threading time={}".format(t_time))

# Without threading
# The following function is the same as the one used in the threaded version.
test = []


def download_link(links1):
    for idx, link in enumerate(links1):
        # time.sleep(0.01)
        test.append(idx)
    for idx, i in enumerate(testv):
        i.append(2)


start_time = time.time()
for links in links_list:
    download_link(links)
t_time = time.time() - start_time
print("without threading time={}".format(t_time))
Output:
With threading time=0.564049482345581
without threading time=0.13332700729370117
NOTE: When I uncomment time.sleep, the threaded version is faster than the non-threaded one. My test case is a list of lists, each with more than 10,000 elements. The idea behind using multithreading is that, instead of processing one list at a time, multiple lists can be processed simultaneously, which should reduce the overall time. But the results are not as expected.
Solution 1:[1]
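Python (CPython, specifically) has a concept called the GIL (Global Interpreter Lock). This lock ensures that only one thread executes Python bytecode at any given time. Therefore, even if you spawn multiple threads to process multiple lists, only one thread is actually processing at a time. The extra time in your threaded run most likely comes from the threads contending for the lock and switching between each other. You can try multiprocessing for parallel execution.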
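As a minimal sketch of the multiprocessing suggestion, the lists could be handed to a pool of worker processes, each with its own interpreter and its own GIL. The names `process_list` and `run_parallel`, the squaring workload, and the worker count are assumptions for illustration, not part of the original code.

```python
from multiprocessing import Pool


def process_list(items):
    # Hypothetical CPU-bound stand-in for the real per-list processing.
    return sum(i * i for i in items)


def run_parallel(lists, workers=4):
    # Each list is pickled, sent to a worker process, and processed there.
    # Results come back in the same order as the input lists.
    with Pool(processes=workers) as pool:
        return pool.map(process_list, lists)


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing
    # this module (Windows, and macOS by default).
    lists = [list(range(100_000)) for _ in range(20)]
    results = run_parallel(lists)
    print(len(results))
```

Note that workers do not share the parent's `test` and `testv` lists, so results have to be returned from the function rather than appended to a global. For very small workloads the cost of starting processes and pickling data may outweigh the gain.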
Solution 2:[2]
Threading is awkward in Python because of the GIL (Global Interpreter Lock). Threads have to compete for the lock before they can run Python code. Threading in Python is only beneficial when the code inside the thread does not need to hold the GIL, i.e. when offloading computations to a hardware accelerator, when doing I/O-bound work, or when calling a non-Python library that releases the GIL. For true parallelism in Python, use multiprocessing instead. It's a bit more cumbersome, as you have to explicitly share or duplicate your variables, and communication between processes usually has to be serialized.
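To illustrate the I/O-bound case, here is a small sketch in which `time.sleep` stands in for a blocking network call. Sleeping releases the GIL, so the threads can wait concurrently, which matches the observation in the question that uncommenting `time.sleep` makes the threaded version faster. The function names and the delay value are made up for this example.

```python
import time
from threading import Thread


def fake_download(delay):
    # time.sleep releases the GIL, much like waiting on a socket would.
    time.sleep(delay)


def run_sequential(n, delay):
    # Run n simulated downloads one after another and return elapsed seconds.
    start = time.perf_counter()
    for _ in range(n):
        fake_download(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    # Run n simulated downloads in n threads and return elapsed seconds.
    start = time.perf_counter()
    threads = [Thread(target=fake_download, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: {:.3f}s".format(run_sequential(8, 0.05)))
    print("threaded:   {:.3f}s".format(run_threaded(8, 0.05)))
```

The sequential run takes roughly the sum of all delays, while the threaded run takes roughly one delay, because the waits overlap.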
Solution 3:[3]
As a general rule (there will always be exceptions) multithreading is best suited to IO-bound processing (this includes networking). Multiprocessing is well suited to CPU-intensive activities.
Your testing is therefore flawed.
Your intention is clearly to do some kind of web crawling, but that's not happening in your test code, which means that your test is CPU-intensive and therefore not suitable for multithreading. Once you've added your networking code, you may find that matters improve, provided you've used suitable techniques.
Take a look at ThreadPoolExecutor in concurrent.futures. You may find it particularly useful because you can switch to multiprocessing simply by replacing ThreadPoolExecutor with ProcessPoolExecutor, which will make your experiments easier to quantify.
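A rough sketch of that swap, assuming a hypothetical `process_list` worker and a helper `run` that takes the executor class as a parameter; only the class passed in changes between the two experiments:

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_list(items):
    # Hypothetical CPU-bound work: count the even numbers in one list.
    return len([i for i in items if i % 2 == 0])


def run(executor_cls, lists, workers=4):
    # Both executor classes share the same interface, so the calling code
    # is identical for threads and processes.
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(process_list, lists))


if __name__ == "__main__":
    lists = [list(range(100_000)) for _ in range(20)]
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run(executor_cls, lists)
        elapsed = time.perf_counter() - start
        print("{}: {:.3f}s".format(executor_cls.__name__, elapsed))
```

Timings will vary by machine and workload, so it is worth measuring both variants on the real task before settling on one.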
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Kungyu Lee |
Solution 2 | gchapuis |
Solution 3 |