Anti Code Generation
March 19, 2013
I've been anti code generation for as long as I can remember, and I have some good reasoning behind it. Recently, though, the company I work for inherited code from another firm (always bad in my experience), and a lot of it was generated by an internal tool. It made me think about why I have been against it, and it made me firmer in my stance.
We actually use code generation for some projects, but we're smart / not wasteful about it.
Here's my selling point. If you become dependent on code generation, your data architecture can suffer grave consequences: it ends up inefficient and generally not thought out. People will create any old architecture because it costs them nothing in terms of time. If you've been generating code against data structures, you have no reason to design an efficient one, reuse concepts across the entire structure, or optimize for efficiency. You might also get stuck with certain data types, and with assumptions about your data that were baked into the generator long ago.
Some explanation. CRUD operations are generally easy to generate. You generate an insert stored procedure, an update (or combine the two), a select, and a delete. That covers one object in your database. Now say you have a customer object with an address foreign key: you have to get the Customer->AddressID and then make another call to get the Address data. But sometimes you don't want customer and address at the same time, so now you have two stored procedures. When I wrote an ORM, this was one thing I nipped in the bud. It would do the join and get all the data for customer and address in one call if you passed "load references = true" to the load method.
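To make the "load references" idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the ORM I wrote; the table layout, the column names, and the load_customer function are all made up for illustration. The only point is that one flag decides between a plain select and a single joined call.

```python
import sqlite3


def make_db():
    # Hypothetical schema: one customer row pointing at one address row.
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT,
                               address_id INTEGER REFERENCES address(id));
        INSERT INTO address VALUES (10, '1 Main St', 'Springfield');
        INSERT INTO customer VALUES (1, 'Ada', 10);
    """)
    return conn


def load_customer(conn, customer_id, load_references=False):
    """Load a customer; optionally pull the address in the same query."""
    if not load_references:
        row = conn.execute(
            "SELECT id, name, address_id FROM customer WHERE id = ?",
            (customer_id,),
        ).fetchone()
        return dict(row) if row else None

    # One round trip: the join brings back customer and address together.
    row = conn.execute(
        """
        SELECT c.id, c.name, c.address_id,
               a.id AS a_id, a.street, a.city
        FROM customer c
        LEFT JOIN address a ON a.id = c.address_id
        WHERE c.id = ?
        """,
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    customer = {"id": row["id"], "name": row["name"],
                "address_id": row["address_id"]}
    # a_id is NULL when the customer has no matching address row.
    customer["address"] = (
        {"id": row["a_id"], "street": row["street"], "city": row["city"]}
        if row["a_id"] is not None else None
    )
    return customer
```

Calling load_customer(conn, 1) returns just the customer fields, while load_customer(conn, 1, load_references=True) adds a nested "address" entry from the same query, so there is no need for a second stored procedure or a second call.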
A bit about the not-thought-out part... I take great care in designing a data architecture, and I'm just a lowly software developer! I kid, of course. The data structure is at the crux of what I do. If I have a shitty data structure, I can't work with it. One recent example was an application of sorts, with a wizard with steps 1-4. It was inherited code and an inherited data structure, which we couldn't change. The main application table with the customer data didn't have a date field for the created or updated date and time. Those were stored in a log table. So to get the time that the application was created, you had to do this:
SELECT app.*
FROM app
JOIN app_log ON app.id = app_log.app_id
WHERE app_log.date = (SELECT MIN(date) FROM app_log WHERE app_id = app.id)
ORDER BY app_log.date DESC
That inner select is a killer, since it has to be evaluated against the log table for each application. We'll hopefully get both createDate and updateDate fields on the app table in the near future. This actually wasn't a product of code generation, but more of an example of the bad code that we inherit.
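Here is a rough sketch of the difference, using Python and an in-memory SQLite database. The app and app_log tables follow the query from the inherited system, but the sample rows, the name column, and the backfill step are assumptions for illustration; the real schema surely has more to it. It runs the log-table version, then adds a createDate column and shows how plain the query becomes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_log (app_id INTEGER, date TEXT);
    INSERT INTO app VALUES (1, 'first'), (2, 'second');
    INSERT INTO app_log VALUES
        (1, '2013-01-05'), (1, '2013-02-01'),
        (2, '2013-01-20'), (2, '2013-03-01');
""")

# Today: the creation time only exists as the earliest log entry,
# so every app needs a correlated subquery against app_log.
via_log = conn.execute("""
    SELECT app.id, app_log.date
    FROM app
    JOIN app_log ON app.id = app_log.app_id
    WHERE app_log.date = (SELECT MIN(date) FROM app_log
                          WHERE app_id = app.id)
    ORDER BY app_log.date DESC
""").fetchall()

# Hoped-for fix (assumed migration): store createDate on app itself,
# backfilled once from the log.
conn.execute("ALTER TABLE app ADD COLUMN createDate TEXT")
conn.execute("""
    UPDATE app
    SET createDate = (SELECT MIN(date) FROM app_log
                      WHERE app_id = app.id)
""")

# After that, no join and no subquery are needed.
via_column = conn.execute(
    "SELECT id, createDate FROM app ORDER BY createDate DESC"
).fetchall()
```

Both queries return the same rows in the same order for this sample data; the second one just never touches the log table.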
More of the not-thought-out stuff, since that last one had nothing to do with code gen... Sometimes there will be repeated fields or concepts. Some people have an Address table and, if they need a slightly different address, add another address table with all of the fields from Address plus the other fields they need. Hey, it's simple to just generate the new code!! But the structure is repeated. I will try my damnedest to not repeat code or a data structure. If something has an int and a string, and another thing has an int and a string, they will both inherit from a base class that has an int and a string. Goddamnit!
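In code, the "don't repeat the structure" rule looks something like the sketch below. The class and field names (Address, ShippingAddress, delivery_notes) are hypothetical and picked only for the example. The slightly different address extends the base one instead of copying every field, and anything written against the base keeps working.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    # The shared concept lives in exactly one place.
    street: str
    city: str
    postal_code: str


@dataclass
class ShippingAddress(Address):
    # Only the extra field is declared here; the rest is inherited.
    delivery_notes: str = ""


def format_label(address: Address) -> str:
    """Works for Address and for anything that extends it."""
    return f"{address.street}, {address.city} {address.postal_code}"


home = Address("1 Main St", "Springfield", "12345")
ship = ShippingAddress("9 Dock Rd", "Shelbyville", "54321", "leave at door")

labels = [format_label(home), format_label(ship)]
shipping_field_names = [f.name for f in fields(ShippingAddress)]
```

The same thinking applies on the database side: one address structure that both cases point to, with only the genuinely new fields stored separately.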
Copying and pasting code is worse, but we're not talking about that.
Moving on to the "you might get stuck with certain data types and assumptions" part of my thesis. By assumptions, I mean assumptions that were made when the code gen tool was written. For instance: the ID field must be an int. -1 means a null int. A lookup-type table (basically an enumeration) must have a display value and an internal value (usually matching up with a code enumeration) and must have int IDs. Our old code generator produced stored procedures where, if a parameter was a bit field, 0 meant "don't care" and 1 meant "where this field == 1". You couldn't filter on that value being 0. If you wanted 0 to be the more significant value, you had to name the column in a negative way. Take a "Deleted" field, where 0 means not deleted: you couldn't get just the not-deleted records. So you would have to name the column "NotDeleted", which is crap.
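For contrast, here is a small sketch of a filter without that assumption baked in. It is not what our generator produced; the record table and the find_records function are invented for the example. The trick is to use a third value (None here) for "don't care", so that both 0 and 1 stay usable as real filter values.

```python
import sqlite3
from typing import Optional


def make_db():
    # Hypothetical table with a plain, positively named "deleted" flag.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE record (id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
        INSERT INTO record VALUES (1, 0), (2, 1), (3, 0);
    """)
    return conn


def find_records(conn, deleted: Optional[bool] = None):
    """None = don't care, True = only deleted, False = only not deleted."""
    sql = "SELECT id FROM record"
    params = []
    if deleted is not None:
        sql += " WHERE deleted = ?"
        params.append(1 if deleted else 0)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]
```

With this shape, find_records(conn, deleted=False) gives you just the not-deleted records, and the column can keep the name "Deleted".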
I was recently looking at my old code from college. It's great :) I remember my professors and how they molded me into the programmer I am today, and then all the many, many hours I spent honing my skills. I wanted REUSABLE code. I never wrote a code generator for personal use. I have modified the one we use at work to be better and more acceptable, according to my standards. I'm tired and I'm going to bed...