At Ruby Hack Night in February, we completed a series of Koans. A workshop participant had the following comments about the Koans dealing with Hash default values.

“From what I understand, when a default value (such as [] or Array.new) is passed as argument to the Hash#new method, the hash is created with the empty array as default value. But when we reference the hash with a key (:a, :b, etc) and we try to push an element, all we ended up modifying is the default value, the array itself…. Am I right about this?”
– Vincent L.

Consider this simple use of a Hash default value, and how it behaves in an expected way:
% irb
irb(main):001:0> h = Hash.new(2)
=> {}
irb(main):002:0> h[:a]
=> 2
irb(main):003:0> h[:a] += 1
=> 3
irb(main):004:0> h[:b]
=> 2

Now here is some similar code that demonstrates unexpected behaviour:

% irb
irb(main):001:0> h = Hash.new([])
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] << "x"
=> ["x"]
irb(main):004:0> h[:b]
=> ["x"]

From that example, it appears that there is some problem with the Hash class. Perhaps it’s getting confused about the values? It’s easy to blame the Hash class for the unexpected behaviour, but once you consider the following working example, you might see what’s going on:
% irb
irb(main):001:0> h = Hash.new([])
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] += ["x"]
=> ["x"]
irb(main):004:0> h[:b]
=> []

It works! Why was the default value unchanged in this last example, but changed in the earlier one? It all seems random and bizarre!

Do you see what’s going on?

This behaviour is quite predictable, but to understand it you need to know that it’s caused by a combination of two things:
1. The hash default, whether set by the #new or #default= methods, is shared between all of the keys that use it. When I say “shared”, I mean that there is only one default object, and every lookup of a missing key returns a reference to that same object rather than a copy of it.
2. Operations generally fall into two types: those that replace a value with a new instance, and those that modify a value “in-place”. We see this bizarre Hash behaviour only when operations of the latter type are used. Put another way, line “003” of each of these irb examples exercises a slightly different piece of code. In the first and the third examples, h[:a] += ... is shorthand for h[:a] = h[:a] + ..., so the #+ operation builds a new object and the assignment stores it under the key :a. In the second example, #<< modifies in-place the object that h[:a] returns, and that object is the default itself.
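To make the second point concrete without involving a Hash at all, here is a minimal sketch using a plain Array. The variable names (original, same_object, copy) are my own, chosen only for illustration.

```ruby
# Two names for one Array object.
original = []
same_object = original

# Array#<< modifies the receiver in place, so both names see the change.
same_object << "x"
p original                      # => ["x"]
p same_object.equal?(original)  # => true

# `copy += [...]` is shorthand for `copy = copy + [...]`. Array#+ returns a
# new Array, and the assignment points `copy` at that new object.
copy = original
copy += ["y"]
p copy                          # => ["x", "y"]
p original                      # => ["x"]
p copy.equal?(original)         # => false
```

Object#equal? compares identity rather than contents, which is why it is a handy check for whether two expressions refer to the same object.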

The combination of these two details causes the seemingly unpredictable outcome. The strange behaviour, where the default value is changed, only happens if both conditions hold: the hash uses a shared default value, and the operation modifies in-place (rather than replaces) the value of a key that is referring to the default.
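If you want to see both conditions at work, the following sketch inspects the hash after each kind of operation. The variable name shared is just an illustrative choice; the methods used (Hash#default, Hash#key?, Hash#keys, Object#equal?) are standard Ruby.

```ruby
shared = Hash.new([])

# Reading a missing key returns the default object itself; nothing is stored.
# Array#<< then changes that one default object in place.
shared[:a] << "x"
p shared                          # => {}
p shared.key?(:a)                 # => false
p shared.default                  # => ["x"]
p shared[:a].equal?(shared[:b])   # => true

# `shared[:c] += ["y"]` expands to `shared[:c] = shared[:c] + ["y"]`.
# Array#+ builds a new Array from the default, and the assignment stores it
# under :c, leaving the default object untouched.
shared[:c] += ["y"]
p shared.keys                     # => [:c]
p shared[:c]                      # => ["x", "y"]
p shared.default                  # => ["x"]
```

Notice that after the in-place append the hash is still empty. The "x" lives in the default object, not under the key :a.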

Note that it is possible to force the creation of a distinct instance of the default value for each new key. Instead of a default object, the Hash constructor also accepts a block. This block is executed each time a missing key is looked up, and it receives the hash and the key as arguments. When the block creates a new object and stores it under that key, each key gets its own separate instance, which avoids the surprising effects of in-place modifications.

An example:
% irb
irb(main):001:0> h = Hash.new { |hash, key| hash[key] = [] }
=> {}
irb(main):002:0> h[:a]
=> []
irb(main):003:0> h[:a] << "x"
=> ["x"]
irb(main):004:0> h[:b]
=> []
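One detail of the block form is easy to miss: the assignment inside the block is what makes the new array stick. The sketch below contrasts a block that only returns a fresh array with one that also stores it. The names unstored and stored are mine, purely for illustration.

```ruby
# This block returns a fresh Array on every missing-key lookup but never
# stores it, so the appended element is lost with the temporary Array.
unstored = Hash.new { [] }
unstored[:a] << "x"
p unstored[:a]        # => []
p unstored.key?(:a)   # => false

# This block stores the fresh Array under the missing key before returning it,
# so later lookups of the same key find the same Array.
stored = Hash.new { |hash, key| hash[key] = [] }
stored[:a] << "x"
p stored[:a]          # => ["x"]
p stored[:b]          # => []
p stored.keys         # => [:a, :b]
```

The last line shows a side effect worth knowing about: with the storing form, merely reading a missing key (here :b) adds that key to the hash.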

How should we deal with this? Some developers will err on the side of caution and always use the block constructor. I believe this is a reasonable compromise, but only if it is probable that an unaware developer will get tripped up by this in later edits to the same file. On the flip side, with a shared default value there is a small run-time benefit in size and speed, because no new object is created for each key. This benefit will be largest if the Hash has a large number of keys that use the default value. Plus, since there are a limited number of operations that modify in-place, you may decide to optimize your code by taking advantage of the single instance of the default value!
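If you do keep a shared default, one possible safeguard (my own suggestion, not something the Koans prescribe) is to freeze the default object, so that an accidental in-place modification raises an error instead of silently changing the default. This sketch assumes Ruby 2.5 or later, where the error class is FrozenError; the names EMPTY and guarded are hypothetical.

```ruby
# A single frozen Array shared as the default (hypothetical constant name).
EMPTY = [].freeze
guarded = Hash.new(EMPTY)

# Replacing still works: Array#+ builds a new, unfrozen Array.
guarded[:a] += ["x"]
p guarded[:a]        # => ["x"]
p guarded[:b]        # => []

# Modifying in place is rejected because the default object is frozen.
begin
  guarded[:b] << "y"
rescue FrozenError => e   # FrozenError exists in Ruby 2.5+ (assumption)
  puts "blocked: #{e.class}"
end
p guarded.default    # => []
```

This keeps the single-instance benefit while turning the surprising case into a loud failure.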

In summary, Vincent is right that an in-place append (Array#<< in the second example, and Array#push behaves the same way) modifies the default value itself and, since the default value is a reference, also changes what is returned for any key that uses the default value. Many devs at all levels have been burned by this. It is one of the few times in Ruby when we are forced to be aware of the use of references to objects, and it’s not something we are particularly prepared to deal with because Ruby works intuitively in most contexts.

I would argue that this is not a bug, but a feature to be exploited when the time is right. It is perhaps one of the rougher edges of the default Ruby classes, but for those of us who consider ourselves “Ruby gurus”, our clarity of understanding of this is another little trophy on our mental bookshelf of geekdom. Assess the risk of this within your team and choose appropriately. Better yet, get your team to read this and thrive on the way the Hash default values are intended to work.

David Andrews is a Canadian web developer and President of Ryatta Group. David founded Ryatta Group to build hospitality web applications, releasing SpaDirect (renamed ROBE for Spa), a successful and innovative real-time spa booking platform in 2012. Since that time, he and his team have built market share through a series of innovations, including the groundbreaking Itinerary Booking™ system, which improves online spa revenue by as much as 50% over the competition.