Datastructures || Strings and Hash tables

Strings

A string is a datatype that behaves like a data structure in the sense that it can be manipulated and has time and space complexity implications.

Strings are stored in memory as arrays of integers where each character in a given string is mapped to an integer via some character encoding standard like ASCII.

Strings are generally immutable which means they can not be changed after they are created. When you are doing any operation that changes the string, it is making a copy of it and manipulate it which at times can be costly.

Traverse 0(n) T 0(1) S

Copying 0(n) ST

Get a particular character 0(1) ST

Hash tables

Probably one of the most widely used data structures. In Javascript, it is Javascript objects and in python, it is python dictionary. Under the hood, the hash table uses a dynamic array of linked lists to efficiently store key/value pairs. It provides fast insertion, deletion, and lookup of key/value pairs. Since hash tables store data as key-value pairs, one can access the value given the key.

The good thing about hash tables is that insertion, deletion and searching time are constant on average and keys can be an integer, string or another datatype. Hash tables are built on top of arrays and while arrays use an index which is a number to access elements, the hash table’s key can be a number, string or another datatype.

Photo by Jorge Stolfi

When inserting a key-value pair inside the hash table; you use a hash function to transform the key into an index that fits in an underlying array.

A hash function maps a value to a particular index of the underlying array. One of the ways to map the values is by mapping the key to the corresponding ASCII code then returns the modulo operator to the length of the array.

Let us take an example of the key-value pair “foo”:12. The ASCII character mapping for “foo” is 324. Now imagine we have an array of length 4. 324%4 = 0 since the remainder of 324 divided by 4 is 0. Now we have to add “foo”:12 key-value pairs to the index 0 of the array.

There might be situations where different keys map to the same index. For example, If we have another key-value pair “ofo”:47; it would map to the same index and the situation where two keys are mapping to the same index is referred to as a collision. It turns out that a hash table is not just an array but an array where each index maps to a linked list of potential values(each index contains a linked list). This is precisely to deal with situations where two keys are hashed to the same index.

Now we understand that the two key-value pairs are stored on the same array’s index but in a form of a linked list. So, what value comes up when you search using the key? To ensure that we know which value relates to which key, we also store the key in a linked list.

Imagine we have an array of length three but all and we have 10 key-value pairs and all are being mapped to the index 0, this is where the effectiveness of hashing functions come in. The best hashing function minimizes the number of collisions to ensure that we do have constant time for insertion, deletion and lookup of key-value pairs.

There could be other scenarios where you have a performant hash function but the size of the array is small. For example, the scenario where you have an array of length 3 and 300 key-value pairs. If you were to divide them equally, one index would have a linked list of 100 values. In this case, It is important to resize the array and using a dynamic array makes resizing easier.

Time complexities of the common hash table’s operations

Inserting a key/value pair: 0(1) on average; 0(n) in the worst case

Removing a key/value pair: 0(1) on average; 0(n) in the worst case

Looking up a key: 0(1) on average; 0(n) in the worst case