[Re-View]Hash Table (Basics)

Hash table Basics



Intro

The concept, usage and implementations of Hash table are always used in Software Engineer interviews. From the interview guidance of Google, there is an requirement of hash table. It is said "Hashtables: Arguably the single most important data structure known to mankind." There is indeed a bunch of knowledge and techniques for hashtables (hash function, collision, etc.), but from the interview perspective, it is not possible to test the thorough and complete skills of hashtables in a short interview. Take this advantage, in this post, I'd like to learn the basics of hash tables, and try to implement sample code.

What is Hash Table?

It is a very common but often occurred question in IT interviews. I generalize the concept in my own words: " Hash table, is a data structure, which stores key-value pairs, the access of value by key can be O(1) time, a hash function is used to map the key to the index of the value."

You can find many many definitions of hash table, generally speaking, you can imagine hash table is an array, originally we access an element in array by using index, e.g. A[1], A[2]. However in hash table, we access element by the key,  e.g. A["Monday"], D["Marry"].  The great advantage of it is the speed to look up an element (O(1) time). 

How does Hash table works

Firstly, hash tables can be implemented based on many data structures, e.g. Linked list, array and linked list, binary search tree, etc. The idea is to store the <key, value> pair and build a way to access it. For better understanding, just consider an array, we put the <key, value> in a specific order. The way to locate the <key, value> using the key is called hashing. We can consider a hash function takes the key as the input, and output the location of the <key, value> in the array. A simple hash function is to used "mod" operation.  Use the "key mod array size" to get the hash, the index of the desired value. 


An example

Let's see a simple example.
We have a storage of  size 5:
idx      key       value
0         -1           0
1         -1           0
2         -1           0
3         -1           0
4         -1           0
key=-1 means the slot is empty.
The hash function is   hash(key) = key % 5;
First we insert <12, 12>  (first is key, second is value)
Compute the hash(12) = 2;
Store the <key, value> into the storage of idx 2.
idx      key       value
0         -1           0
1         -1           0
2         12          12
3         -1           0
4         -1           0
Next we insert <29,29>, hash(29)=4;
idx      key       value
0         -1           0
1         -1           0
2         12          12
3         -1           0
4         29          29
Then we insert <27,27>, where the hash code is 2. When we check the location 2, it is already in use. 
It is called a collision, where different key are mapped into same hash code. To deal with the collision, there are many methods, such as, chaining (use a linked list for each location), and rehashing (second function is used to map to another location). Usually we need to know at least these two kinds of methods.
Here we use the rehashing.  

The rehashing function is:  rehash(key) = (key+1)%5;
So, continue the above step, rehash(2) = 3; location 3 is empty, then store the <27,27> to location 3.
idx      key       value
0         -1           0
1         -1           0
2         12          12
3         27          27
4         29          29

If we further insert <32,32>, hash(32) = 2; location 2 is in use, rehash(2) = 3, location 3 is also in use,
Then rehash again, rehash(3) = 4, no available, rehash(4) = 0, OK! Store <32, 32 > in 0th slot.

idx      key       value
0         32          32
1         -1           0
2         12          12
3         27          27
4         29          29

That is the basic way of insert operation for a hash table.

To retrieve the value, e.g. we want to find the value of key <27, ?>, hash(27) = 2, check the key stored in location 2 , which is 12 !=27, then rehashing is need, rehash(2) = 3,  the key is 27, then return the value 27.

A simple implementation (in C++)

#include <iostream>


using namespace std;

const int sz = 5;

struct data{
 int id;
 int val;
};

class Hashtable{
 data dt[sz];
 int numel;
public:
 Hashtable();
 int hash(int &id);
 int rehash(int &id);
 int insert(data &d);
 int remove(data &d);
 int retrieve(int &id); 
 void output();
};


Hashtable::Hashtable(){
 for (int i=0;i<sz;i++){
   dt[i].id = -1;
dt[i].val = 0;
 }
 numel = 0;
}

int Hashtable::hash(int &id){
 return id%sz;
}

int Hashtable::rehash(int &id){
 return (id+1)%sz;
}

int Hashtable::insert(data &d){
 if (numel<sz){
   int hashid = hash(d.id);
if (hashid>=0 && hashid < sz){
 if (dt[hashid].id==-1 || dt[hashid].id==-2){
   dt[hashid].id = d.id;
   dt[hashid].val = d.val;
           numel++;
   return 0;
 }else{
   cout << "collision! rehashing..." <<endl;
   int i=0;
   while (i<sz){
     hashid = rehash(hashid);
 if (dt[hashid].id==-1 || dt[hashid].id==-2){
   dt[hashid].id = d.id;
   dt[hashid].val = d.val;
   numel++;
   return 0;
     }
 if (i==sz){return -1;}
 i++;
}
 }
}
 }else{return -1;}
}

int Hashtable::remove(data &d){
 int hashid = hash(d.id);
if (hashid>=0 && hashid < sz){
 if (dt[hashid].id==d.id){
   dt[hashid].id = -2;
   dt[hashid].val = 0;
   numel--;
   return 0;
 }else{
   int i=0;
   while (i<sz){
     hashid = rehash(hashid);
 if (dt[hashid].id==d.id){
   dt[hashid].id = -2;
   dt[hashid].val = 0;
   numel--;
   return 0;
     }
 if (i==sz){return -1;}
 i++;
}
 }
}
}

int Hashtable::retrieve(int &id){
 int hashid = hash(id);
 if (hashid>=0 && hashid < sz){
   if (dt[hashid].id==id){
 return dt[hashid].val;
}else{
  int i=0;
   while (i<sz){
     hashid = rehash(hashid);
 if (dt[hashid].id==id){
   return dt[hashid].val;
 }
 if (i==sz){return 0;}
 i++;
}
}
 }
}

void Hashtable::output(){
 cout << "idx  id  val" << endl;
 for (int i=0;i<sz;i++){
   cout << i << "    " << dt[i].id << "    " << dt[i].val << endl; 
 }
}


int main(){
 Hashtable hashtable;
 data d;
 d.id = 27;
 d.val = 27;
 hashtable.insert(d);
 hashtable.output();
 
 
 d.id = 99;
 d.val = 99;
 hashtable.insert(d);
 hashtable.output();
 
 d.id = 32;
 d.val = 32;
 hashtable.insert(d);
 hashtable.output();
 
 d.id = 77;
 d.val = 77;
 hashtable.insert(d);
 hashtable.output();
 
 //retrieve data
 int id = 77;
 int val = hashtable.retrieve(id);
 cout << endl;
 cout << "Retrieving ... " << endl;
 cout << "hashtable[" << id<< "]=" << val << endl;
 cout << endl;
 
 
 //delete element
 d.id = 32;
 d.val = 32;
 hashtable.remove(d);
 hashtable.output();
 
 d.id = 77;
 d.val = 77;
 hashtable.remove(d);
 hashtable.output();
 
     
 return 0;
}
 


No comments:

Post a Comment