
They let anyone commit to production these days

Today, more than ever, it seems, people outside the software engineering field write code that eventually makes it to production environments. A growing number of companies have data analysts or research personnel write code as part of their job.

I find this phenomenon rather interesting, as I do not believe this was the case as recently as a decade ago. Back in the day, doctors practiced medicine, chefs practiced cooking and researchers practiced, well, research. It was the software engineers who were concerned with how to get computers to do what needed to be done using code.
Today, on the other hand, while it’s still quite difficult to find a brain surgeon who writes code, it is quite easy to find a computer vision researcher coding for a software company, or a statistician slamming Hadoop jobs (the reason for this shift is interesting in its own right, but as design documents often state, this is out of scope).

At first sight, more people writing code could imply a great productivity boost. The thing is, software engineering is a profession (I'd even risk saying an art). There are things one needs to know in order to make good software: technologies, frameworks, design patterns, architectural considerations, coding conventions, tools, and whatnot. In fact, it is only natural for people whose main occupation is not software to NOT be aware of most of the above. What’s unnatural, at least to me, is for an organization to ignore this issue and let such code be committed to a production codebase as if it met the standards of genuine production code.

The cost WILL be paid, it's just a matter of time. Someday someone will take a look at that module and realize it has no documentation or comments, its methods span hundreds of lines, the logs are written to proprietary files waiting to blow the disk up, there's zero modularity, zero unit tests, but plenty of code duplication (because that part that does matrix multiplication is used in several places, obviously). What’s to be done now, one would ask oneself, I can’t just rewrite the whole thing. Right, I guess that’s how a legacy (system) is born.
As fate would have it, the bug that triggers this chain of events will probably be one of those bugs that just must be resolved yesterday, or ASAP, whichever comes first.

As products mature, maintenance becomes a considerable part of their agenda, and that's when the quality of the codebase kicks in and becomes a key factor. What was once a boost has now become a major drag. 

A question to be asked here is how we can bridge the gap between the code quality required in production and the code quality produced by the research-oriented specialists (who are not software engineers) in our organization.
  • Should everyone who is to engage in coding get training? What training would that be? Are people expected to be proficient in both their main occupation (say statistics) and software engineering? Isn't that like saying they need to have two professions (does this mean they’ll get two paychecks?)
  • Should that code be rewritten from scratch by software engineers? That could easily double the effort...
  • Should such code be written using pair programming where one is a software engineer and the other is a domain professional? 
Makes one wonder… 


  1. What about software engineers themselves? I imagine those you are talking about are not writing worse code - at least not much worse - than most inexperienced CS B.Sc. graduates.

    Perhaps software engineers have more opportunities to learn from more experienced (or better educated) ones. But in the end, those who are willing and open to recognizing their bad habits and changing them will get better. Those who are not will still make us pull our hair out when we need to deal with their code.

    1. I totally agree, the drive to improve is definitely key.

      The thing is, people whose main occupation is not software engineering may not have as strong a drive to improve their software engineering skills, since they do not see it as their "thing"; it's just a by-product of what they do.

      The ones that do have this drive will probably end up being a true asset, as they will eventually become experts in both their native field (say, statistics) and software engineering, ka-ching.



Popular posts from this blog

Sending out Storm metrics

There are a few posts talking about Storm's metrics mechanism, among which you can find Michael Noll's post, Jason Trost's post, the storm-metrics-statsd GitHub project, and last but not least (or is it?) Storm's documentation.

While all of the above provide a decent amount of information, and one is definitely encouraged to read them all before proceeding, it feels like in order to get the full picture one needs to combine them all, and even then a few bits and pieces are left missing. It is these missing bits I'll be rambling about in this post.

Dependency Injection - The good, the bad and the ugly

The Good
Dependency injection (DI, a.k.a. IoC, inversion of control) is a well-known technique for increasing software modularity by reducing coupling between modules. To provide the benefits of DI, numerous DI frameworks have arisen (Spring, Guice, Castle Windsor, etc.), all of which essentially give you "DI capabilities" right out of the box (these frameworks tend to provide a whole lot more than just "DI capabilities", but that's not really relevant to the point I'm about to make). Now, to remove the quotes around "DI capabilities", let's define it as a DI container: a sack of objects you can manipulate using a provided API in order to wire these objects together into an object graph that makes up your application.

I've worked on quite a few projects employing Spring, so it will be my framework of reference throughout the rest of the post, but the principles and morals apply just the same.
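
To make the "sack of objects" definition a bit more concrete, here is a minimal sketch of such a container. It is plain Python written for brevity, not Spring, and every name in it (`Container`, `register`, `get`, `InMemoryStore`, `ReportService`) is made up for illustration; real frameworks expose a different and far richer API.

```python
from typing import Any, Callable, Dict


class Container:
    """A toy DI container: a 'sack of objects' keyed by name (hypothetical API)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[["Container"], Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[["Container"], Any]) -> None:
        # A factory receives the container so it can ask for its own dependencies.
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Build each object once, on first request, and reuse it afterwards.
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]


class InMemoryStore:
    def __init__(self) -> None:
        self.rows = []

    def save(self, row: str) -> None:
        self.rows.append(row)


class ReportService:
    # The dependency is passed in; the class never constructs its own store.
    def __init__(self, store: InMemoryStore) -> None:
        self.store = store

    def publish(self, text: str) -> None:
        self.store.save(text.upper())


# Wiring: the only place that knows which concrete objects go together.
container = Container()
container.register("store", lambda c: InMemoryStore())
container.register("reports", lambda c: ReportService(c.get("store")))

service = container.get("reports")
service.publish("weekly numbers")
```

The point of the sketch is the wiring section at the bottom: `ReportService` has no idea where its store comes from, so swapping in a different store means changing one `register` call rather than the service itself.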