Hashes and encapsulation
Earlier today I made an off-the-cuff remark about encapsulation on twitter.
Then Josh, not realising that I was simply casting judgement down from my ivory tower, asked for more than 140 characters worth of explanation:
So, I agreed. Here is that explanation.
Suppose you have an object representing a DOM element. DOM elements have attributes, which you store as a hash. Something like this:
class DOMElement
attr_reader :attributes
def initialize
# ...
@attributes = Hash[parse_attributes.map { |name, value|
[name, DOMAttribute.new(name, value)]
}]
end
end
In your code that uses DOMElement
, you access an attribute like this:
element.attributes['margin-left']
If you try to access an attribute that does not exist, nil
will be
returned.
Later on, you realise that you are checking for nil
in lots of places
where you use attributes. Listening to the code smell, you decide that
missing attributes should instead return a null
object.
What do you do? Well, your only option is to add a default
proc
to the Hash
:
@attributes = Hash[parse_attributes.map { |name, value|
[name, DOMAttribute.new(name, value)]
}]
@attributes.default_proc = proc { NullDOMAttributes.new }
You have now made your object impossible to marshal without using insane hackery, since procs cannot be marshalled.
The more important point, though, is that by allowing users to dip into
the @attributes
hash like this, you are failing to encapsulate the
internal implementation of your DOMElement
object.
This presents another problem: an unrelated dodgy bit of code could
remove an attribute from the element, without your DOMElement
object
realising!
el.attributes.delete('margin-left')
If you pass around the same hash between lots of different objects, that hash is quite likely to get mutated at some point in my experience, and when that happens it will affect all the other objects relying on it.
Yes, you could call @attributes.dup
every time the attributes
method
is called, but that’s hardly very efficient when we only want to get at
a single attribute.
If the code had been written in an encapsulated manner to begin with:
class DOMElement
def initialize
# ...
@attributes = Hash[parse_attributes.map { |name, value|
[name, DOMAttribute.new(name, value)]
}]
end
def attributes
@attributes.dup
end
def attribute(name)
@attributes[name]
end
end
It would be straightforward to add the null object without further consequences:
def attribute(name)
@attributes.fetch(name) { NullDOMElement.new }
end
So, exposing internal hashes can be convenient, but you should always think twice before doing so and realise that you might regret the decision at a later date.
Comments
I'd love to hear from you here instead of on corporate social media platforms! You can also contact me privately.
Celestial M Weasel
Yesbutnobut, you could always define [] and []= for whatever class you made attributes an instance of.
Jon Leighton
Yesbutnobut, that would still be a demeter violation :)
Jon Leighton
The other point is that you may want #attributes to return a 'plain old hash' when called - and only want the null object when the user has specifically request a specific attribute.
Celestial M Weasel
Yes, indeed, but not encapsulation.
Fun fact - one of the 'advantages' (for some value of advantage) of FORTRAN over C was that array reference and function call both have syntax BADGER(I,J), for getting values, so under certain (rare) circumstances you can change from having an array to having a function without having to change code.
Celestial M Weasel
You could do that when overloading [] - after all you could always fabricate the 'plain old hash' from whatever you were storing.
Celestial M Weasel
I am agnostic about Demeter anyway. We have a grids (as in spreadsheet type grids) library which is badger.duck().hamster().wombat() up to the hilt. I find things like this aesthetically unpleasing but am not sure that the disadvantages outweigh the advantages. I will have to look at the code and see what I think.
Lucas Hungaro
It's not a "one size fits all" solution, but I like to use OpenStruct for value objects (while most people use hashes). A nice and underused Ruby class. :)
iande
Favorite line: "Listening to the code smell ..."
Alex Kahn
How so? [] would just be a different name for the method you called attribute.
Jon Leighton
You're now doing two method calls. You're doing element.attributes.[]('margin-left'), rather than element.attribute('margin-left').
Alex Kahn
I misunderstood. I thought the commenter was proposing DOMElement#[], so you would access attribute values like: el['href'].
Bill Dueber
I don't understand this. When attributes is a Hash, you're calling Hash#[]. When attributes is another class that defines the method [], you're just calling MyOtherClass#[]. One method call in each case.
Jon Leighton
Yes, but I am comparing the case of when you want to get a *single* attribute from *element*.
element.attributes[foo] results in two chained method calls. There's DOMElement#attributes, and then Hash#[] foo.
element.attribute(foo) results in one method call, to DOMElement#attribute.
(Note that I am talking about external calls, of course DOMElement#attribute contains a method call to Hash#[], but that's an implementation detail and not the concern of the user of DOMElement.)
Avdi Grimm wrote some good articles on Demeter:
* http://avdi.org/devblog/201...
* http://avdi.org/devblog/201...
Anthony Navarre
The practical disadvantage of that kind of chaining (besides being difficult to write automated tests for it) is that when something breaks in one of those systems, it can become excruciating to track it down.
Avdi Grimm
Indeed. There's a larger point here, which is that having objects expose their aggregations to the world directly--any type of aggregation--breaks encapsulation. If you're going to treat objects as objects rather than as glorified C structs, they need to be empowered to manage their own collections and control access to them.
Timon Vonk
Nice article! In cases like these I usually prefer dynamic methods on the attributes (e.g. attributes.keys.each &define_method), given the implementation is usually common. I guess in most cases attributes are part of the public interface anyway, so that's ok*.
* As long as it's readable -.-
Luke Redpath
What's wrong with defining #[] on DOMElement, so attributes can be fetched thus: el["some-attribute"]. Seems reasonable to me. I'm pretty sure this is what Alex meant.
Jon Leighton
Ok, in which case it's fine; it doesn't matter if you name the method #attribute or #[].
Samuel Cochran
In this (increasingly contrived) example you could presume that a DOMElement exists mainly to manipulate attributes, and add #[], #[]= and #delete directly to DOMElement to enable encapsulated access to the attributes.
The better approach here would be to create a DOMAttributes object (lazily constructed by #attributes) which encapsulates this for you and mutates DOMElement when necessary. Mix in as much hash behaviour as makes sense, or add a #to_hash to get a copy of the attributes as a hash if neccessary.
All this being the programmer answer of jumping to solutions, you're right that this is a problem that occurs far too often mainly out of ignorance.
Alexey Muranov
I am disappointed that `nil` does not qualify as a null object. But i do not like either checking the presence of a key by the "truth" of the return value, i prefer `has_key?` or at least `nil?` on the value.
skrat
Very nice! One note though. I wasn't familiar with null-object approach before, and after reading the linked article, it doesn't like such a good idea. In your case, hiding the fact that the attribute wasn't found, and pushing this fuzziness down the stack, can, and will cause issues in unexpected places. Being a polyglot, the common approach, the language supported approach would be to raise KeyError.
Emili Parreño
As mentioned above, why don't you use an OpenStruct object?
Add your comment