less_retarded_wiki/hash.md
Miloslav Ciz 61d3471ce6 Add hash
2022-03-16 17:34:02 +01:00


Hash

A hash is a number computed by a hash function: a function that takes some data and turns it into a number (the hash) that is usually much smaller than the data itself, has a fixed size (number of bits) and has additional properties, such as being completely different from the hashes computed from very similar data. Thanks to these properties hashes are used very widely in computer science: they are often used to quickly compare whether two larger pieces of data, such as documents, are the same; they are used in indexing structures such as hash tables, which allow data to be searched quickly; and they find great use in cryptocurrencies and security, e.g. for digital signatures. Hashing is extremely important and as a programmer you won't be able to avoid encountering hashes somewhere in the wild.
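
To make the idea concrete, here is a minimal sketch of one simple non-cryptographic hash function, 32-bit FNV-1a, written in Python. FNV-1a is just one possible choice picked for illustration, and the names `fnv1a_32`, `bucket_index`, `doc_a` and `doc_b` are made up for this example. The sketch shows the two non-security uses mentioned above: comparing larger pieces of data by comparing their hashes, and turning a key into a position in a hash table.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data as an integer in [0, 2**32)."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                          # mix the next byte into the state
        h = (h * FNV_PRIME) & 0xFFFFFFFF   # multiply, keep only the low 32 bits
    return h


def bucket_index(key: bytes, bucket_count: int) -> int:
    """Choose a hash table bucket for key (hypothetical helper)."""
    return fnv1a_32(key) % bucket_count


# Two documents of equal length that differ only in their last byte.
doc_a = b"some long document text\n" * 1000
doc_b = doc_a[:-1] + b"!"

print(hex(fnv1a_32(b"a")))                  # 0xe40c292c
print(fnv1a_32(doc_a) == fnv1a_32(doc_b))   # False: the documents differ
print(bucket_index(b"some key", 8))         # a bucket number from 0 to 7
```

Note that different hashes prove the data differ, while equal hashes only strongly suggest the data are the same, because collisions are possible; a careful comparison would follow an equal hash with a full byte-by-byte check.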

It is generally expected that a hash (or hash function) satisfies the following criteria:

  • Have fixed size (given in bits), even for data that's potentially of variable size (e.g. text strings).
  • Be fast to compute. This is mostly important for non-security uses; cryptographic hashes may prioritize other properties to guarantee the security of the hash.
  • Have uniform mapping. That is, if we hash a lot of different data, the hashes we get should be spread uniformly over the space of possible hashes, i.e. NOT be clustered around some number. This is in order for hash tables to be balanced, and it's also required in security (non-uniform hashes can be easier to reverse).
  • Behave in a chaotic manner, i.e. hashes of similar data should be completely different. This is similar to the point above; a hash should kind of appear as a "random" number associated with the data (but of course, the hash of the same data always has to be the same when computed repeatedly, i.e. be deterministic). So if you change just one bit in the hashed data, you should get a completely different hash from it.
  • Minimize collisions, i.e. the probability of two different values giving the same hash. Mathematically, collisions are always possible if we're mapping a big space onto a smaller one, but we should try to reduce the collisions that happen in practice. This property should follow from the principles of uniformity and chaotic behavior mentioned above.
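  • Be difficult to reverse (mainly for security related hashes). Because a hash maps a big space onto a smaller one (i.e. it is a non-injective function), the original data generally cannot be uniquely recovered from the hash; a secure hash should additionally make it hard to find any data that gives a particular hash. Good cryptographic hashes can typically be reversed only by brute force.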
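
The criteria above can be observed experimentally. The following is a rough sketch that uses SHA-256 from Python's standard `hashlib` module; SHA-256 is an assumption chosen here only as a convenient, well-known hash, and all helper names (`sha256_int`, `differing_bits`, `bucket_counts`, `tiny_hash`, `find_collision`, `brute_force_preimage`) are hypothetical. The deliberately weak 8-bit `tiny_hash` exists only so that collisions and brute-force reversal finish instantly.

```python
import hashlib


def sha256_int(data: bytes) -> int:
    """SHA-256 digest of data interpreted as a 256-bit integer (fixed size)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def differing_bits(a: bytes, b: bytes) -> int:
    """Count bit positions in which the hashes of a and b differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


def bucket_counts(n_inputs: int, n_buckets: int) -> list:
    """Hash the strings "0", "1", ... and count how many fall in each bucket."""
    counts = [0] * n_buckets
    for i in range(n_inputs):
        counts[sha256_int(str(i).encode()) % n_buckets] += 1
    return counts


def tiny_hash(data: bytes) -> int:
    """Deliberately weak 8-bit hash: the first byte of SHA-256. Demo only."""
    return hashlib.sha256(data).digest()[0]


def find_collision() -> tuple:
    """Only 256 hashes exist, so 257 distinct inputs must contain a collision."""
    seen = {}
    for i in range(257):
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data
    raise AssertionError("unreachable by the pigeonhole principle")


def brute_force_preimage(target: int) -> bytes:
    """Reverse tiny_hash by trying inputs until one gives the target hash."""
    i = 0
    while True:
        data = str(i).encode()
        if tiny_hash(data) == target:
            return data
        i += 1


# "hello" and "helln" differ in exactly one bit of the last character.
print(differing_bits(b"hello", b"helln"))        # typically around 128 of 256
print(bucket_counts(16000, 16))                  # each count close to 1000
print(find_collision())                          # two inputs, same tiny_hash
print(brute_force_preimage(tiny_hash(b"secret")))  # some input with that hash
```

Brute force succeeds here after a few hundred tries only because `tiny_hash` has 8 bits; the same search against the full 256-bit hash is hopeless in practice, which is what makes the hash difficult to reverse. Also note that the input found by brute force need not be the original data, only some data with the same hash.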

TODO: example, hash tables, uses