Don’t Repeat Yourself
December 1st, 2006 [ Software Development ]
Jaimie sent me this link the other day, Don’t Repeat Yourself, saying something about hearing my voice as he read it. I have to say I like it. I live and die by it when it comes to software architecture and database design. I don’t make decisions to duplicate data lightly and will exhaust every available alternative first. To a fault some may argue.
Why? Simple, it’s painful and expensive to duplicate data.
Duplicating data is typically pretty innocuous stuff in the moment, and I find it often isn’t even viewed as duplication. Say you’re YouTube. Sorry, say you’re Google, and you keep a table of data containing a row for every video viewing. It contains some simple data, let’s say the video id, the date of viewing, the id of the user who viewed it, etc. If you need to calculate how many times a particular video’s been viewed, then you can perform some type of simple select:
select count(*) from VideoViewHistory where ixVideo = '1'
A very common form of data duplication would be to have this table as well as having a column in the Video table called NumberOfViewings. Each time the video’s viewed, we insert a row into VideoViewHistory as well as incrementing the number in Video.NumberOfViewings.
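To make the double write concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, ixVideo and NumberOfViewings come from the description above; the ixUser and dtViewed columns, the function names and the in-memory demo database are assumptions made up for illustration, not anyone's real schema.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Video ("
        "ixVideo INTEGER PRIMARY KEY, "
        "NumberOfViewings INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE VideoViewHistory ("
        "ixVideo INTEGER NOT NULL, "
        "ixUser INTEGER NOT NULL, "   # assumed column name
        "dtViewed TEXT NOT NULL)"     # assumed column name
    )
    conn.execute("INSERT INTO Video (ixVideo) VALUES (1)")
    conn.commit()
    return conn


def record_viewing(conn, video_id, user_id):
    """Insert the history row and bump the duplicated counter together."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "INSERT INTO VideoViewHistory (ixVideo, ixUser, dtViewed) "
            "VALUES (?, ?, datetime('now'))",
            (video_id, user_id),
        )
        conn.execute(
            "UPDATE Video SET NumberOfViewings = NumberOfViewings + 1 "
            "WHERE ixVideo = ?",
            (video_id,),
        )


if __name__ == "__main__":
    conn = make_demo_db()
    record_viewing(conn, 1, 42)
    print(conn.execute("SELECT NumberOfViewings FROM Video").fetchone()[0])
```

Even in this tiny version, every code path that records a viewing has to remember both statements and the transaction around them. That is the extra code, and the extra place for bugs, that the duplication buys you.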
This approach is very common, bordering on a standard. It’s considered an ‘optimization’ as you’re saving a trip to the database and what could be a relatively large select statement being run very often.
I agree that it may well be the best design, and it makes good sense in a lot of cases. However, what is often ignored during the design process is that this is a more expensive and less maintainable approach. Trust me, any form of duplication increases your chances of introducing bugs and makes the code base more expensive to maintain.
All of this must be factored into any decision to duplicate data, which is what this approach is doing. You have to write more code to keep the multiple data sources in sync. I guarantee that at some point you will have to deal with edge cases where the two sources of data get out of sync.
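The clean-up work tends to look something like the sketch below: a check that finds videos whose stored counter no longer matches the history table, plus a repair that recomputes the counter from the history. It again uses Python's sqlite3 module with the table and column names from this post; the function names and the deliberately broken demo data are hypothetical, and treating the history table as the source of truth is an assumption on my part.

```python
import sqlite3


def demo_db():
    """In-memory database where video 2's counter has drifted (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Video ("
        "ixVideo INTEGER PRIMARY KEY, "
        "NumberOfViewings INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE TABLE VideoViewHistory (ixVideo INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO Video (ixVideo, NumberOfViewings) VALUES (?, ?)",
        [(1, 2), (2, 5)],  # video 2 claims 5 viewings
    )
    conn.executemany(
        "INSERT INTO VideoViewHistory (ixVideo) VALUES (?)",
        [(1,), (1,), (2,)],  # but the history only has 1 row for video 2
    )
    conn.commit()
    return conn


def find_out_of_sync(conn):
    """Return (ixVideo, stored count, actual count) for every drifted video."""
    return conn.execute(
        """
        SELECT v.ixVideo, v.NumberOfViewings, COUNT(h.ixVideo)
        FROM Video v
        LEFT JOIN VideoViewHistory h ON h.ixVideo = v.ixVideo
        GROUP BY v.ixVideo, v.NumberOfViewings
        HAVING v.NumberOfViewings <> COUNT(h.ixVideo)
        ORDER BY v.ixVideo
        """
    ).fetchall()


def repair_counters(conn):
    """Recompute every counter from the history table (assumed source of truth)."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE Video SET NumberOfViewings = ("
            "SELECT COUNT(*) FROM VideoViewHistory h "
            "WHERE h.ixVideo = Video.ixVideo)"
        )


if __name__ == "__main__":
    conn = demo_db()
    print(find_out_of_sync(conn))
    repair_counters(conn)
    print(find_out_of_sync(conn))
```

None of this code would need to exist if the count were only ever derived from VideoViewHistory. Somebody has to write it, schedule it, and decide which copy wins when the two disagree.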
Developers often whine that the biz guys don’t understand technology. Well geek boys, it works the other way too. If you’re the one paying the bill to maintain this code base five years from now, do you care more about some extra database trips or the brittleness of your code base? Brittleness translates directly into dollars spent. There’s far more to designing good software than database trips and performance... I think...