'Custom Hashing Algorithm
Out of sheer boredom, I decided to write a hashing algorithm in python. It all works, however I've got a couple problems:
Results are similar (e.g.
hash(1234)
has one character difference tohash(4321)
)There is a repeating sequence forming (more noticeable on smaller inputs)
Most of the values I've used are random, I have no experience writing a hash algorithm but I wanted to give it a shot. Is there anything I can do to resolve the 2 problems I'm having?
def mash(n):
l = [int(x) for x in list(str(n))]
m = []
for i in range(len(l)):
if pow(len(l) >> 1, (i + 1) << 3) % (i + 1) == 0 and i != 0:
m[-1] = int(str(m[-1]) + str(l[i]))
else:
m.append(l[i])
return m
def hash(content, key=None):
"""Hashes the content (either a string or number"""
o = [0] * 128
if type(content) is float:
while content % 1 != 0:
content = content * 10
content = int(content)
if type(content) is str:
content = int(''.join([str(ord(x)) for x in list(content)]))
if type(content) is int:
content = mash(content)
_ = []
for i in range(len(content)):
_.append((chr(((content[i] * (i << 3)) % 94) + 33)))
# 33 - 126
content = ''.join(_)
elif type(content) is not str:
raise Exception
z = [ord(x) for x in list(content)]
for i in range(len(z)):
if z[i] % 2 == 0:
n = next((i for i, x in enumerate(o) if not x and i % 2 == 0), 0)
else:
n = next((i for i, x in enumerate(o) if not x and i % 2 == 1), 0)
o[n] = content[i]
o = [str(x) for x in o]
o.reverse()
init_n = o.count('0')
while o.count('0') > 5:
n = next((i for i, x in enumerate(o) if x == '0'), None)
o[n] = chr(((pow(init_n >> 1, o.count('0'))) % 94) + 33)
o.reverse()
return ''.join(o)
Note that the key
parameter has not yet been implemented. Wanted to resolve existing issues before attempting another feature.
Solution 1:[1]
I know that this question is around four years old at this point, but I still wanted to give an answer to this Stack Overflow post.
Recall with Hash Tables, our goal is "near" constant time searching for data. We have the option of using a binary search tree for a faster O(log n), though it must be stored in sorted order and maintain balance.
We can do better, by using memory for a decent tradeoff in speed! This is a common pattern we see in Computer Science.
Hashing functions are created to convert data of various sizes to a fixed "address" that can be identified with a key. Key features include this algorithm must be done quickly, minimal collisions with existing keys, and repeatable. Collisions will happen so we need a strategy to handle them.
class BasicHashTable:
""" Simple HashTable implementation """
class HashableItem:
def __init__(self, key=None, value=None):
self.key = key
self.value = value
def __init__(self):
self.capacity = 4
self.size = 2**self.capacity - 1
self.slot = [None for _ in range(self.size)]
self.count = 0
def _hash(self, s):
i = 1
hash_value = 0
for char in string:
h += ord(char) * i
i += 1
return h % self.size
Here I have a hashing algorithm that is based on the ordinal value of a character and the position in the string. I also return the value as it is confined within the BasicHashTable size constraint.
Recall that this algorithm fulfills constraint outlined earlier:
- Quick
- Minimal Collisions
- Repeatable
I omitted two key interfaces put and get. Since they can vary and out of the scope of this post. Though, they will handle collisions and resizing the BasicHashTable as it exceeds the capacity.
Cheers!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | kev-odin |