Readability Counts

General Guidelines

Follow the Rules

As with any spoken language, programming languages have a syntax. Those rules are necessary for two or more people to understand each other. In our case, they allow the computer to understand what needs to happen, but they are also vital for other team members and programmers to understand your logic and to help improve the codebase.

Some aspects of syntax are mandatory, such as punctuators (e.g. the semi-colon in C++), or not starting a variable name with a digit in Python. When your code doesn’t comply with those rules, it does not compile or run. On the other hand, some rules are about best practice and conventions: in Python, PEP 8 recommends naming classes in CapWords (also called CamelCase or PascalCase), whilst functions and variables should use snake_case. Not following these guidelines will not break your code, but it will probably damage any trust your colleagues might have in your abilities.
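As a small illustration of the naming conventions above, here is a minimal sketch. The names `MAX_ROWS`, `DataLoader` and `load_rows` are made up for the example and do not come from any real codebase.

```python
# Hypothetical names, used only to illustrate PEP 8 naming conventions.

MAX_ROWS = 3  # constants: UPPER_CASE_WITH_UNDERSCORES


class DataLoader:  # classes: CapWords
    """Serve rows from an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    def load_rows(self, row_count=MAX_ROWS):  # functions and methods: snake_case
        """Return at most `row_count` rows."""
        return self.rows[:row_count]


data_loader = DataLoader(rows=[10, 20, 30, 40])  # variables: snake_case
first_rows = data_loader.load_rows()
print(first_rows)  # prints [10, 20, 30]
```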

Altogether, these rules and conventions standardise the appearance of code. They bring consistency and improve readability. The codebase will evolve a lot faster if other programmers can understand it, particularly when the time they spend on your portion of code goes into optimising it rather than figuring out where that variable was initialised.

Complying with syntax rules is fairly easy these days, as most IDEs have linting capabilities (with or without plug-ins).

Structure the Code

Be it constants, variables, or even imports, structuring the code substantially improves readability. It goes a long way to group them together, away from the functional part of the code. You might only use GridSearchCV() in the validation steps, but instead of importing the module at the beginning of the validation chapter, put from sklearn.model_selection import GridSearchCV early in the file or notebook. It will already give the reader or reviewer an idea of what is coming.
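A possible layout is sketched below: imports first, constants next, then the functional part. It uses only standard-library modules so that it runs as is; a line such as from sklearn.model_selection import GridSearchCV would sit in the same import section. The constant `OUTLIER_THRESHOLD` and the function `summarise` are hypothetical.

```python
# Imports first: standard library here; third-party and local imports
# would follow in their own groups.
import statistics
from collections import Counter

# Constants next, away from the functional part of the code.
OUTLIER_THRESHOLD = 100  # hypothetical cut-off, chosen for the example


def summarise(values):
    """Return the mean and the most common value of the non-outliers."""
    kept = [value for value in values if value <= OUTLIER_THRESHOLD]
    most_common_value, _ = Counter(kept).most_common(1)[0]
    return statistics.mean(kept), most_common_value


mean_value, mode_value = summarise([1, 2, 2, 3, 500])
print(mean_value, mode_value)
```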

Also, remember that whitespace is an excellent way to give your code some air (good old PEP 8 has guidelines about this as well).

Do Not Repeat Yourself

Not only is it annoying to read the same thing over and over again, but the fewer characters your code has, the less prone to errors and bugs it will be. Applying this DRY principle might make the code less readable than a step-by-step routine would be. In that case, do not hesitate to comment.
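To make the idea concrete, here is a minimal sketch in which a repeated cleaning step is moved into one function. The function `clean_label` and the sample strings are invented for the example.

```python
# Repetitive version: the same cleaning steps are spelled out for each value.
first_name = "  ada ".strip().title()
last_name = " LOVELACE  ".strip().title()
city = " london".strip().title()


# DRY version: the steps live in one place, so a fix or change happens once.
def clean_label(raw_text):
    """Trim surrounding whitespace and capitalise each word."""
    return raw_text.strip().title()


raw_values = ["  ada ", " LOVELACE  ", " london"]
cleaned_values = [clean_label(raw_value) for raw_value in raw_values]
print(cleaned_values)  # prints ['Ada', 'Lovelace', 'London']
```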

Comment the Code

You are not expected to understand this (comment at line 2238 of the Unix V6 source code, as printed in John Lions' commentary)

Docstrings for functions/classes

A docstring should not only explain what to expect from a function or class, but also detail how to use it. Most IDEs display the docstring of a function or class as soon as it is written, so you too will benefit from adding these details early on.
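The sketch below shows one possible docstring layout (a Google-style one; other conventions exist). The function `scale_values` is hypothetical.

```python
def scale_values(values, factor=1.0):
    """Multiply every value by a constant factor.

    Args:
        values: An iterable of numbers.
        factor: The number each value is multiplied by. Defaults to 1.0.

    Returns:
        A new list with the scaled values; the input is left untouched.

    Example:
        >>> scale_values([1, 2, 3], factor=2)
        [2, 4, 6]
    """
    return [value * factor for value in values]


# The docstring is what help() and most IDE tooltips display.
print(scale_values.__doc__.splitlines()[0])
```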

In-line comments

These are great for giving context or structure to the code. But do not be too talkative: there is no need to add visual pollution by stating the obvious.
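As a rough illustration, compare a comment that restates the code with one that explains the reason behind it. The data and the scenario are made up.

```python
prices = [12, 7, 30, 7]

# Too talkative: the comment only repeats what the code already says.
total = sum(prices)  # sum the prices

# Useful: the comment gives the reason, which the code cannot express.
# Duplicates are dropped because the same item may be scanned twice
# (an assumption made up for this example).
unique_prices = sorted(set(prices))
print(total, unique_prices)  # prints 56 [7, 12, 30]
```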

Name the Variables Explicitly

“There are only two hard things in Computer Science: cache invalidation and naming things” (Phil Karlton)

You can name your variable x, my_var, or username; it is all the same to the interpreter. But do everyone a favour and be meaningful and explicit when you name a variable. Between local and global variables, the various states of a variable, and so on, it is easy to lose track when reviewing the codebase. Good variable names will add a lot of value to your code.
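A short sketch of the difference, with hypothetical names and values:

```python
# Hard to review: the reader has to guess what x, y and z stand for.
x = 40
y = 12
z = x * y

# Same computation with explicit names.
hours_worked = 40
hourly_rate = 12
weekly_pay = hours_worked * hourly_rate
print(weekly_pay)  # prints 480
```

Both versions are identical for the interpreter; only the second one tells the reviewer what is being computed.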

Pythonic Ways

Packing/Unpacking

Packing and unpacking let you assign several variables in a single statement. This keeps related variables in a synchronised state and removes the need for temporary variables in some calculations.
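The sketch below illustrates these points with made-up values: multiple assignment, a swap without a temporary variable, a synchronised update, and extended unpacking.

```python
# Assign several variables in one statement.
width, height = 1920, 1080

# Swap two values without a temporary variable.
width, height = height, width

# Keep related variables in sync: both right-hand values are evaluated
# before either name is rebound, so the Fibonacci update needs no temp.
previous, current = 0, 1
for _ in range(5):
    previous, current = current, previous + current

# Extended unpacking collects the remaining items into a list.
first, *rest = [1, 2, 3, 4]
print(width, height, current, first, rest)  # prints 1080 1920 8 1 [2, 3, 4]
```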

Comprehensions

As Raymond Hettinger explains, a for loop spells out step by step what needs to be done, whilst a list comprehension states what you want to achieve.
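Here is a minimal side-by-side sketch of the two styles, using an invented list of numbers:

```python
numbers = [1, 2, 3, 4, 5, 6]

# For loop: spells out how to build the result, step by step.
even_squares_loop = []
for number in numbers:
    if number % 2 == 0:
        even_squares_loop.append(number ** 2)

# List comprehension: states what the result is.
even_squares = [number ** 2 for number in numbers if number % 2 == 0]
print(even_squares)  # prints [4, 16, 36]
```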

Keyword Arguments

Explicit is better than implicit. (The Zen of Python, Tim Peters)

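Many functions and classes take both positional arguments and keyword arguments (kwargs). You might be familiar with a function's signature, but your colleagues might not, so naming the arguments at the call site spares them a trip to the documentation.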
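The sketch below compares a positional call with a keyword call. The helper `split_counts` and its parameters are hypothetical and exist only for this example.

```python
def split_counts(row_count, test_percent, validation_percent):
    """Return (train, validation, test) row counts. Hypothetical helper."""
    test_count = row_count * test_percent // 100
    validation_count = row_count * validation_percent // 100
    train_count = row_count - test_count - validation_count
    return train_count, validation_count, test_count


# Positional call: valid, but which number is which?
counts_positional = split_counts(1000, 20, 10)

# Keyword call: the same result, readable without opening the definition.
counts_keyword = split_counts(
    row_count=1000, test_percent=20, validation_percent=10
)
print(counts_keyword)  # prints (700, 100, 200)
```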

Named Tuples

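Named tuples give each field of a tuple a name. They are particularly handy for passing groups of arguments around and for making results and error messages self-explanatory.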
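A minimal sketch using collections.namedtuple from the standard library; the record type `RunSummary` and its fields are invented for the example.

```python
from collections import namedtuple

# Hypothetical record type for the outcome of a test run.
RunSummary = namedtuple("RunSummary", ["failed", "attempted"])

summary = RunSummary(failed=0, attempted=4)

# Fields are readable by name, yet the object still behaves like a tuple.
failed, attempted = summary
print(summary)            # prints RunSummary(failed=0, attempted=4)
print(summary.attempted)  # prints 4
```

Compared with a bare (0, 4), the printed value explains itself, which helps when it ends up in a log or an error message.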

Last Thoughts

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
