Ruby `#hash` implementations

I tested out a handful of different #hash implementations to see how they perform.

Setup

Given a Point class:

class Point
  attr_reader :x, :y, :z

  def initialize(x, y, z)
    @x = x
    @y = y
    @z = z
  end

… and four different #hash implementations.

XOR:

  def hash
    self.class.hash ^ @x.hash ^ @y.hash ^ @z.hash
  end

Prime XOR:

  def hash
    self.class.hash ^ 3 * @x.hash ^ 5 * @y.hash ^ 7 * @z.hash
  end

High prime XOR:

  def hash
    self.class.hash ^ 53 * @x.hash ^ 59 * @y.hash ^ 61 * @z.hash
  end

Array:

  def hash
    [self.class, @x, @y, @z].hash
  end

For each point type, I created 100 000 points, with coordinates 0 ≤ c < 100:

MAX = 100
TIMES = 100_000

points_with_xor_hash = TIMES.times.map do
  PointWithXORHash.new(rand(MAX), rand(MAX), rand(MAX))
end
points_with_prime_xor_hash = TIMES.times.map do
  PointWithPrimeXORHash.new(rand(MAX), rand(MAX), rand(MAX))
end
points_with_high_prime_xor_hash = TIMES.times.map do
  PointWithHighPrimeXORHash.new(rand(MAX), rand(MAX), rand(MAX))
end
points_with_array_hash = TIMES.times.map do
  PointWithArrayHash.new(rand(MAX), rand(MAX), rand(MAX))
end

Test: speed

I then benchmarked how fast #hash is:

           xor 34.200 (± 2.9%) i/s - 342.000  in  10.003618s
     prime_xor 17.855 (± 0.0%) i/s - 179.000  in  10.027120s
high_prime_xor 16.992 (± 0.0%) i/s - 170.000  in  10.010219s
         array 36.327 (± 0.0%) i/s - 366.000  in  10.076506s

Comparison:
               array:       36.3 i/s
                 xor:       34.2 i/s - 1.06x  (± 0.00) slower
           prime_xor:       17.9 i/s - 2.03x  (± 0.00) slower
      high_prime_xor:       17.0 i/s - 2.14x  (± 0.00) slower

Winners: array and xor.

Test: collisions

For each collection of points, I counted how often each hash code occurs:

          name   range    mean

           xor      43    1.37
     prime_xor       3    1.05
high_prime_xor       3    1.05
         array       3    1.05

Winners: all but xor.

Test: DX

That sweet developer experience. Arguably, hash is easiest to implement with Array: the implementation [self.class, @x, @y, @z].hash is so nicely compact.

Conclusion

It seems that using Array#hash for constructing hashes is both very fast (even slightly faster than XOR), and creates well-distributed hash codes. it’s also the most compact.

Note last edited April 2022.
***