2018-09-03

Hash Tables

When we talk about hash tables, we're actually talking about dictionary. While an array can be used to construct hash tables, array indexes its elements using integers. However, if we want to store data and use keys other than integer, such as 'string', we may want to use dictionary.

Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.

We declare an empty dictionary like this:

We declare an empty dictionary like this:

>>> D = {}

Then, we can add its elements:

>>> D['a'] = 1

>>> D['b'] = 2

>>> D['c'] = 3

>>> D

{'a': 1, 'c': 3, 'b': 2}

It's a structure with (key, value) pair:

D[key] = value

The string used to "index" the hash table D is called the "key". To access the data stored in the table, we need to know the key:

>>> D['b']

2

How we loop through the hash table?

>>> for k in D.keys():

...    print D[k]

...

1

3

2

If we want to print the (key, value) pair:

>>> for k,v in D.items():

...    print k,':',v

...

a : 1

c : 3

b : 2

Hashing from two arrays

Using two Arrays of equal length, create a Hash object where the elements from one array (the keys) are associated with the elements of the other (the values):

>>> keys = ['a', 'b', 'c']

>>> values = [1, 2, 3]

>>> hash = {k:v for k, v in zip(keys, values)}

>>> hash

{'a': 1, 'c': 3, 'b': 2}

Hashing

Here are some hashing samples using built-in hash function:

>>> map(hash, [0, 1, 2, 3])

[0, 1, 2, 3]

>>> map(hash, ['0','1','2','3'])

[6144018481, 6272018864, 6400019251, 6528019634]

>>> hash('0')

6144018481

As we can see from the example, Python is using different hash() function depending on the type of data.

Python provides hashlib for secure hashes and message digests:

md5(), sha*():

>>> import hashlib>>> hashlib.md5('a')>>> hashlib.md5('a').digest()'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a;'>>> hashlib.md5('a').hexdigest()'0cc175b9c0f1b6a831c399e269772661'>>> hashlib.sha512('a')>>> hashlib.sha512('a').digest()

'\x1f@\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i\xfb\xbb\xce\x8c\x12\xcf\xc7\x95{&R;\xfe\x9au'

>>> hashlib.sha512('a').hexdigest()

'1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75'

>>>

Hashing example code

Hashing example code

The following code is a revision from Sets (union/intersection) and itertools - Jaccard coefficient & shingling to check plagiarism. In this section, we used 64 bit integer (hash value from hash()) for the comparison of shingles instead of directly working on the string.

Output is exactly the same as the one we got using string comparison:

combinations=[(0, 1), (0, 2), (1, 2)]

(0, 1) : jaccard=0.0196078431373

(0, 2) : jaccard=0.0

(1, 2) : jaccard=0.0

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容