What Is A Hash Table?
Hash Tables are considered a type of container in the field of software engineering that searches efficiently. This means that when you display data, if you need to search all the time, it makes sense to use a Hash table as your container of choice since it makes search very fast and will have your users happy about not having to wait seconds for a response.
How Does A Hash Table Work?
We won’t go into great detail here as there are plenty of literature that can explain in depth and you’ll potentially want to have coffee ready since understanding the underlying mechanisms of hash tables can be very involved. So as a quick overview, there is some container (most likely an array for fast random access) with a limited number of slots. The data you want to search for is hashed to some slot number, and then through a logical sequence of steps, that slot is searched for the data which may or may not be found. This sequence of steps adds much complication to the hash table data structure, which forces it to deal with collisions. A collision in a hash table is when more than one data hashes to the same slot. There are a variety of known ways to deal with these collisions including chaining with linked lists, linear probing, quadratic probing, and double hashing. That’s how hash tables work in a nutshell so please comment below if you have additional questions.
Hash Table Use Cases
Now let’s move on to the use cases, which produce some quite fascinating results. So let’s start with Google Search! Imagine that, search is in the name so it makes sense that they would use a hash table as their main structure to provide lightning fast results back to the user. Who knows how many websites there are today, but Google has stated in the past that it knows about 130 trillion web pages. Wow, that’s a huge number, yet your results come back to you in mere seconds. Hey Google! Send me some web pages about a photographer that can take photos at my birthday party. Google response: Sure, give me 0.5 seconds to return your results. Nothing short of amazing!
So how does Google do it? Well the hash table of course. This is where Search Engine Optimization (SEO) comes into play and you may have been referred to this article for a more in-depth look from the service we offer. As you know by now, search engines crawl the web and figure out what information is contained on the webpages it crawls. They then add this information to the index which could take weeks. Without your information in Google’s index, it will never show up in search engines. So what does the index sound like and why is it that even if they crawl your page, if it hasn’t been added to the index your data won’t show up? Good guess, it’s the hash table. The container chosen to ensure fast lookup times. Until your page gets added to the index, it is not in the hash table to be searched. That way when a user types in “event photographer”, Google hashes the text, goes to the slot and sequentially looks for pages that may match the search term. Now if you know how hash tables work, you know that you never search the entire table, just the hashed slot index and its possible matches through a series of well defined steps. Also note with a hash table, getting data in sorted order is not efficient, although you can certainly sort after you have all the results (ranking). So does Google go through all 170 trillion plus pages in order to return your results back to you? Of course not, it goes to the index and searches pages that match the hash of what you typed in. Now it should make sense why you want text on a page. Users don’t come to your website and search for DSC4302.jpg image. They might, however, search for bodybuilder where your alt text will come in very handy!
Let’s look at another use case in the field of software engineering. We will look at the version control system (VCS), GIT. GIT allows multiple developers to work on the same project flawlessly. To do this, GIT had to look at the the problem a tad bit differently. Knowing that search is done a lot when checking file versions and differences in files. Note in GIT you can change a file multiple times but if the file ends up being the same as when it was opened, git does not see any difference. The reason should be obvious at this point. The text hashes to the same bucket regardless of all the intermediate changes so it is the same according to GIT. This small change in perspective of looking at the problem gives GIT all types of power over other version control systems, not to mention intense speed and remote friendly. Also note that with GIT you must add your files to the index before they can be committed. Once again, seems like index is becoming a standard for hash table as it sounds nicer 🙂 .
As you can see from the two examples above, hash tables have immense power when it comes to translating theory to real world application. Have you seen how a hash table might improve your project? Would implementing the container as a hash table cause a good shift in perspective of how to solve the problem? How have you experienced hash tables? Let me know below.
If you are in need of a overhaul to your website or making your website run more efficiently through a better architected solution, please get a website quote from us today.
Hi, thanks for presenting us with use cases. Seems pretty interesting.
Commenter avatars come from Gravatar.