(Python) Hash function returns different values depending on how the script is called (cmd vs IDE)
I have a script that scrapes cell values from an Excel workbook into nested tuples and then returns the hash of that tuple. I can run it in my IDE (Spyder 4.0.1, Python 3.7), or I can call the function from the command line.
The problem is that the hash is a different number depending on how I call it. This should not be the case, since it pulls exactly the same data from the same Excel workbook and uses the same hash function. I already tried some debugging, but I'm running out of ideas. Thoughts?
The relevant code:
import extract  # my own code, which contains open_excel()
import sys

# This function takes nested lists and turns them into nested tuples.
def list2tuple(l):
    lcopy = []
    for item in l:
        if isinstance(item, list):
            lcopy.append(list2tuple(item))
        else:
            lcopy.append(item)
    return tuple(lcopy)

def hashxl(filename):
    filetype = filename[filename.index('.')+1:]
    if filetype in ['xlsx', 'xlsb']:
        # This should be a list of lists of lists of data
        # (sheets, rows, columns of Excel data).
        f = extract.open_excel(filename)
        h = hash(list2tuple(f))
        return h

if sys.argv[1] == 'hash':
    print(hashxl(sys.argv[2]))
When I run
python thiscodefile.py hash testfile.xlsb
on the command line, I get -3482465542484766986. When I run
hashxl("testfile.xlsb")
in the Spyder IDE I get 6187680721660987353.
Solution 1:[1]
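Yes, that's normal. Since Python 3.3, the hashes of str and bytes objects are salted by default with a random value chosen each time the interpreter starts, as a protection against hash-collision denial-of-service attacks. Any tuple that contains strings therefore hashes differently in each process, and the command line and Spyder each start their own process. To get repeatable values, set a fixed seed with the PYTHONHASHSEED environment variable. One way to do that from inside the script is to re-execute the interpreter with the variable set: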
import os
import sys

hashseed = os.getenv('PYTHONHASHSEED')
if not hashseed:
    # Restart the interpreter with a fixed hash seed.
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)

# [your code here]
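To see the effect of the seed for yourself, here is a minimal sketch that computes the same hash in fresh interpreter processes, with and without PYTHONHASHSEED set. The helper name hash_in_subprocess and the sample string are made up for illustration, and the sketch assumes Python 3.7 or later (for capture_output).

```python
import os
import subprocess
import sys


def hash_in_subprocess(value, seed=None):
    """Return hash((value,)) as computed by a freshly started interpreter.

    hash_in_subprocess is a hypothetical helper for this demonstration only.
    """
    env = dict(os.environ)
    # Drop any inherited seed so that seed=None really means "randomized".
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    code = f"print(hash(({value!r},)))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With a fixed seed, two separate processes agree on the hash.
fixed_a = hash_in_subprocess("cell value", seed=0)
fixed_b = hash_in_subprocess("cell value", seed=0)
print("fixed seed, same hash:", fixed_a == fixed_b)

# Without a seed, each process picks its own random salt,
# so the two values will normally differ.
random_a = hash_in_subprocess("cell value")
random_b = hash_in_subprocess("cell value")
print("random seed, same hash:", random_a == random_b)
```

The same idea applies to the question: the command line run and the Spyder console are different processes, so they only agree if both are started with the same PYTHONHASHSEED.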
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Maykel |