Express names in code: Bad vs Clean

Beginner programmers often spend most of their time learning a programming language, its syntax, technologies and tools. They think that once they master the technology, they will become good programmers. However, object-oriented programming is not about mastering tools; it is about solving a problem in a particular domain, in cooperation with other programmers. It is therefore very important to express your thoughts precisely in code so that other people can understand them.

Let’s start with a great quote by the Clean Code guru, Robert C. Martin:

“The proper use of comments is to compensate for our failure to express ourself in code.”

[“Clean Code: A Handbook of Agile Software Craftsmanship”, Robert C. Martin]

This phrase simply means that if there is a need to comment your code, it is most probable that your code is bad. A comment can also signal that you could not express all your thoughts about a problem or an algorithm in the code itself, so part of the concept ended up in the comment instead of in the code it describes. Good code should be understandable to everyone without reading any comments. Good coding style means keeping all the information needed to understand the problem in the code.

In programming there is a concept of “self-describing source code” (also known as self-documenting code). The term describes source code that follows certain loosely defined conventions for naming and structure. The main objective of these conventions is to make source code easier to read and understand, and therefore easier to maintain and extend.

Within the scope of this article I would like to present some examples of “bad code” compared with “clean code”.

Names have to reveal your intentions

Choosing names is always a problem when writing code. Some programmers simplify, shorten or encode names in a way that only they understand. Let’s have a look at a few examples:

BAD CODE:


int d; // elapsed time in days
int ds;
int dsm;
int faid;

The name “d” could mean anything. The author used a comment to reveal the intention instead of putting it in the code. The name “faid” could be mistaken for some kind of identifier (ID).

CLEAN CODE:

int elapsedTimeInDays;
int daysSinceCreation;
int daysSinceModification;
int fileAgeInDays;
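
To see how such names carry the intent at the call site, here is a minimal sketch in Java. The class `FileAgeCalculator`, its method and the dates are hypothetical; they are not part of the original example and only illustrate the idea.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, introduced only to illustrate intention-revealing names.
final class FileAgeCalculator {

    // The method and parameter names state what is measured and in which unit,
    // so no explanatory comment is needed where the method is called.
    static long daysSinceCreation(LocalDate creationDate, LocalDate today) {
        return ChronoUnit.DAYS.between(creationDate, today);
    }
}

final class FileAgeDemo {
    public static void main(String[] args) {
        LocalDate creationDate = LocalDate.of(2024, 1, 1);
        LocalDate today = LocalDate.of(2024, 1, 31);

        long fileAgeInDays = FileAgeCalculator.daysSinceCreation(creationDate, today);
        System.out.println("fileAgeInDays = " + fileAgeInDays); // prints "fileAgeInDays = 30"
    }
}
```

Compare this with a call such as `long d = calc(a, b);`, which would need a comment to say the same thing.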

Avoid Disinformation

No information at all is better than misleading information. Sometimes programmers try to “hide” important information behind a name, and in doing so they end up creating confusing code.

BAD CODE:

Customer[] customerList;
Table theTable;

The variable “customerList” is not actually a list. It is a plain array (or simply a collection of customers). In the second case, “theTable” is an object of type “Table” (which you can easily check in an IDE), and the word “the” is just unnecessary noise.

CLEAN CODE:

Customer[] customers;
Table customers;

Good name length

In modern programming languages, long variable names are not a problem. You can create names with almost no limitations. The problem is that this freedom can introduce naming chaos in code.

BAD CODE:

var theCustomersListWithAllCustomersIncludedWithoutFilter;
var list;

A good name contains as many words as are needed to express a concept, and no more. Any unnecessary word makes the name longer and harder to understand. Short names are good only when they describe the whole concept in the current context (when making an order, “customersInOrder” is better than “list”).

CLEAN CODE:

var allCustomers;
var customersInOrder;
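
As a rough illustration of how context decides the right length, the Java sketch below uses both names side by side. The `Customer` and `Order` types and the customer names are assumptions made up for this example.

```java
import java.util.List;

// Hypothetical types, used only to illustrate how context shapes name length.
final class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

final class Order {
    // "customersInOrder" says which customers these are; "list" would not.
    private final List<Customer> customersInOrder;

    Order(List<Customer> customersInOrder) {
        this.customersInOrder = List.copyOf(customersInOrder);
    }

    int customerCount() {
        return customersInOrder.size();
    }
}

final class OrderDemo {
    public static void main(String[] args) {
        List<Customer> allCustomers =
                List.of(new Customer("Ann"), new Customer("Bob"), new Customer("Eve"));

        // Only the first two customers take part in this order.
        Order order = new Order(allCustomers.subList(0, 2));
        System.out.println(order.customerCount()); // prints 2
    }
}
```

Both names are short, yet each one tells the reader which set of customers it holds.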

Always code in one notation, let notation help you understand the code

Every programming technology (language) has its own “style”, called a notation. A programmer should write code that matches this notation, because other programmers probably know it and use it. Let’s have a look at a bad example of code without proper notation. The code below does not fit any well-known “standard” notation (such as PascalCase, camelCase or Hungarian notation). Moreover, the name of the bool (“change”) is misleading: it is a verb (it describes an action), but the bool value here describes a state, so an adjective-like form is better.

BAD CODE:

const int maxcount = 1
bool change = true
public interface Repository
private string NAME
public class personaddress
void getallorders()

When you look at a piece of code, the notation alone should tell you straight away what kind of element in object-oriented programming it is.

For example: you see “_name” and you already know that it is a private variable in the current class. Apply the notation consistently, without exceptions.

CLEAN CODE:

const int MAXCOUNT = 1
bool isChanged = true
public interface IRepository
private string _name
public class PersonAddress
void GetAllOrders()

Use one word per concept. Don’t use one word for multiple concepts

Defining concepts is always a problem. In the software development process, a lot of time is spent on analysing a domain and naming all of its elements properly. Programmers run into the same difficulty when naming concepts in code.

BAD CODE:

//1.
void LoadSingleData()
void FetchDataFiltered()
void GetAllData()
//2.
void SetDataToView()
void SetObjectValue(int value)

First case:

The author of the code expressed one concept, “get the data”, with three different words: “load”, “fetch” and “get”. Only one word per concept should be used in code (within a particular domain).

Second case:

The word “set” is used for two concepts: the first is “loading data into a view”, and the second is “setting the value of an object”. These concepts are not the same, so you should use a different word for each.

CLEAN CODE:

//1.
void GetSingleData()
void GetDataFiltered()
void GetAllData()
//2.
void LoadDataToView()
void SetObjectValue(int value)
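
The first case can be sketched as a small Java interface in which every read operation uses the single verb “get”. The `InvoiceRepository` name, its methods and the sample data are hypothetical, and the method names follow Java’s camelCase convention rather than the PascalCase of the listing above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical repository: every read operation uses the single verb "get".
interface InvoiceRepository {
    Optional<String> getSingle(int index);
    List<String> getFiltered(String prefix);
    List<String> getAll();
}

final class InMemoryInvoiceRepository implements InvoiceRepository {
    private final List<String> invoices;

    InMemoryInvoiceRepository(List<String> invoices) {
        this.invoices = new ArrayList<>(invoices);
    }

    @Override
    public Optional<String> getSingle(int index) {
        // An index outside the stored range yields an empty result.
        return index >= 0 && index < invoices.size()
                ? Optional.of(invoices.get(index))
                : Optional.empty();
    }

    @Override
    public List<String> getFiltered(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String invoice : invoices) {
            if (invoice.startsWith(prefix)) {
                matches.add(invoice);
            }
        }
        return matches;
    }

    @Override
    public List<String> getAll() {
        return new ArrayList<>(invoices);
    }
}

final class InvoiceRepositoryDemo {
    public static void main(String[] args) {
        InvoiceRepository repository =
                new InMemoryInvoiceRepository(List.of("INV-1", "INV-2", "DRAFT-3"));
        System.out.println(repository.getFiltered("INV-")); // prints [INV-1, INV-2]
    }
}
```

A reader who has seen one of these methods can guess the names of the other two, which is the practical benefit of keeping one word per concept.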

Use meaningful names in domain context

All code that programmers write is connected to some domain logic. To make code more understandable to anyone involved in solving a problem, it is better to use meaningful names in a domain context.

BAD CODE:

public class EntitiesRelation
{
    Entity o1;
    Entity o2;
}

When you are coding a domain-specific solution, you should always use domain-specific names. In the future somebody else (not necessarily a programmer, maybe a tester) will work with your code and, knowing the business logic, will be able to understand it easily in its domain context. Think first about the domain problem, and only then about how the solution is going to be implemented.

CLEAN CODE:

public class ProductWithCategory
{
    Entity product;
    Entity category;
}

Use meaningful names in their own context

Apart from the name of an element itself, there is always some context in which the name is used. The context is very important for understanding a name, because it carries additional information. Let’s have a look at a typical “address” context:

BAD CODE:

string addressCity;
string addressHomeNumber;
string addressPostCode;

In almost all domains, a “post code” is part of an address, and it is obvious that a post code cannot exist alone (unless you are developing a post-code-only application). So it is unnecessary to add “address” to the name. Moreover, the information connected with a variable is carried by the variable name, the class that contains the variable and the namespace that contains the class.

In object-oriented programming, the best way is to design a class that represents the entity “Address”.

CLEAN CODE:

class Address
{
    string city;
    string homeNumber;
    string postCode;
}
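
The effect is easiest to see where the class is used. Below is a minimal Java sketch of the same entity; the constructor, the accessor methods and the sample values are assumptions added for illustration.

```java
// Hypothetical Java version of the Address entity from the example above.
final class Address {
    private final String city;
    private final String homeNumber;
    private final String postCode;

    Address(String city, String homeNumber, String postCode) {
        this.city = city;
        this.homeNumber = homeNumber;
        this.postCode = postCode;
    }

    String city() {
        return city;
    }

    String homeNumber() {
        return homeNumber;
    }

    String postCode() {
        return postCode;
    }
}

final class AddressDemo {
    public static void main(String[] args) {
        Address address = new Address("Warsaw", "12A", "00-001");

        // The variable and class name supply the context, so "address.postCode()"
        // reads naturally; "address.addressPostCode()" would only repeat it.
        System.out.println(address.postCode()); // prints "00-001"
    }
}
```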

Summary

To sum up, as a programmer you should:

  • Always try to express a concept when naming any element;
  • Think about the length of names: they should contain only the information necessary to understand your intentions;
  • Notation helps to understand the code, so use it;
  • Do not mix names for concepts;
  • Let the names be meaningful in the domain context and in their own context.

Feel free to share your thoughts about this issue. If you know of any other problems developers have with naming or with expressing thoughts in code, please let me know in the comments below 🙂

37 comments

  1. Great article. One thing I find very helpful is using variable names that correctly describe data as it is changing state. It’s sometimes too easy to re-use a variable as you make changes to it, but it can be much clearer to just create a new variable. For example: instead of always calling your data ‘data’, you can call it receivedData, verifiedData, spellCheckedString, preparsedInput, etc. There’s usually something more descriptive than str, or temp. (I am very guilty of violating this 🙂

    It actually probably falls under your category “Names have to reveal your intentions.”

    1. Really appreciate your comment with examples 🙂 It is an old habit for programmers to add such words to names in code. I think that is because a few years ago development tools (IDEs) weren’t as powerful as they are nowadays. In the past developers very often shared their code by email or on discussion groups (without syntax colouring, etc.). Now nobody reads code in a simple “notepad”; everyone opens it in specialized developer tools. So it is quick and easy to check the variable type without the need to put it in the name.

  2. This is a good post with good recommendations, but there is one point I take issue with.

    The claim that comments are unnecessary is way overstating the case. I might retort that if your examples were clear enough, there would be no need for you to write the text between the examples.

    Sure, good names and clear code are best, but unless a program is small, comments are useful. Unless you think the reader of your code is intimately familiar with all the other code in your application and understands all the subtle details of every library you might invoke, there will be a need for comments, i.e., always.

    1. Nice comment, thanks 🙂 It would be an interesting experiment to post an article on “bad vs clean” containing only code examples (without the text between), and then read people’s opinions in comments. I fully agree with you that the best examples should be as simple as possible, so that any other text is unnecessary.

      Anyway, I still think that comments in code may lead to errors in many cases. For example, unclear comments may overshadow concepts. Comments could “change their place” if a careless programmer inserts some new lines of code. Finally, somebody can place important information in comment, which can easily be omitted or deleted accidentally. It is a lot easier to delete a comment than to delete an important line of code (because of compiler). IMHO the place for all comments is in technical documentation.

      1. This line of argumentation is unfair. Sure, people can write wrong/misleading comments, careless programmers could move comments to the wrong place or delete them accidentally.

        The unfair part of the argument is that you are comparing sloppy/incompetent programmers and showing it gets bad results, while for your case presupposing programmers who are so good at what they do their code doesn’t need comments.

        In my experience, I’ve found useful comments far more often than code which is so transparent in its constraints and purpose that it doesn’t need comments.

        1. Of course I don’t deny using comments at all 🙂 There are many very good examples of good comments:

          1. When you have complex regex in your code, it is a good practice to provide some explanation in comment, and some sample matching string too.

          2. When you are using a service (such as an API) that is outside your system, you may consider providing some additional information about that part of code.

          3. Commenting some parts of code is good when debugging.

  3. Clean code is very subjective, almost everyone will have a different opinion on what clean code is as soon as you are dealing with real-world code. Even in the examples you gave of undeniably clean code, I take issue with some of the cleanup you have performed, and I’m sure many others will take issue with my refinements – such is the nature of “clean code” and the varying opinions on it.

    First of all, this isn’t explicitly said, but from the first examples it looks like “short names = bad code”, which isn’t necessarily true; short names can be very succinct and meaningful in the correct context – if using `i` and `j` as the indexes in a nested for-loop, few will have difficulty grasping their meaning immediately because programmers have learned to relate the two from countless tutorials and examples. This is just a suggestion to perhaps make this distinction between good and bad names which are short, possibly alluding to a guideline of Uncle Bob’s that name lengths should reflect their scope, i.e. global names should be as long as needed to avoid ambiguity, and one-letter names are fine in a tiny scope – take anonymous Linq functions for example.

    My second two points are regarding the naming of variables in two of the above “clean code” examples. With regards to `elapsedTimeInDays`, `TimeIn` is redundant since you are already saying that the variable is tracking time because of `Days` – a more succinct name, in my opinion, would be `elapsedDays`. The next name I would change is `isChanged` – `is` is the present tense, but `Changed` is the past tense – a better name would be consistent in the tense that it is using, resulting in either `hasChanged` or `isChanging`.

    On the whole, this article contains a lot of statements without any substantial backup – in particular, you add what appear to me to be only baseless rules to what are already perfectly clear guidelines put forward by Uncle Bob. I don’t see what this article adds to what one can already read in “Clean Code” except a few toy examples and a wealth of assumptions.

    I’ll leave with a comment on the two most baseless assumptions in the text – that of

    > if there is a need to comment your code, it is most probable that your code is bad.

    and

    > Good code should be understandable to everyone without reading any comments.

    These statements are wholly unfounded and can be seen to have counter-examples in almost any moderately large code repository, consisting of what is by-and-large clean code.

    1. Thanks for an exhaustive comment! Sure, I agree with the statement that “name lengths should reflect their scope”; it’s the simplest and an extremely useful rule for naming. Moreover, some short names are connected with “convention”, like “i”, “j”, “k” in for-loops. This convention is so common that no programmer should try to change it.

      However, in foreach-loops I think you should avoid using short names for an element in a collection (for example foreach (var p in products)) but rather use the full name of the entity (“product in products”).

      About the name “isChanged”: in my opinion it depends on the context. “Changed” may be considered the past tense, but also a passive form of “change” (in this context, it may mean that something else is changing this place). In the continuous form “isChanging”, the meaning may be that “this part is changing itself now”.

      1. > However, in foreach-loops I think you should avoid using short names
        > for an element in collection (for example foreach (var p in products) but
        > rather use full name for an entity (“product in products”).

        Again, I think this changes with how long the variable will be used for; I see little benefit in coming up with an elaborate, self-explaining name for a variable that’s going to be used in a single-line scope, such as the following

            foreach ( Product p in products ) {
                p.increasePriceBy(0.06);
            }

        I don’t think anything is gained by giving p a “fuller” name, but like all guidelines, this is not a hard-and-fast rule; YMMV. One thing I will say though, I would definitely declare that each product is of type Product instead of having it inferred by `var` – I don’t see the need to make programmers inspecting the code verify what the type of the collection is; I feel that `var` should be reserved for when you don’t really need to know the type of intermediate results, like reflection, or when the intermediate types are obvious but constantly typing them all out is tedious, like when using Linq.

        With regards to `isChanged`, you seem to know more about the English language than me, but as a native speaker I can say with a lot of certainty that no other native English speaker is going to look at `isChanged` and think, “oh, he’s using the word passively”; you’d have to make it explicit by naming it `isChangedByX`, with `X` being a way of identifying the “changer”.

  4. Quick comment on your first examples: while the names you end up with are clearer, what I don’t really like is the fact that the name mixes two concepts, what the variable represents, and what unit it uses to express it (int elapsedTimeInDays, int fileAgeInDays). Unfortunately, unless you create Types for these units (like a Day type), there isn’t much you can do about it in C#. This is a place where a feature like Units of Measure in F# shines; you can simply declare a measure like
    [<Measure>] type day
    And then simply use it like
    let fileAge = 42<day>
    Not only is the units clutter gone from your variable name, but the compiler will also enforce that you can’t add, say, a mile and a day…

    1. Thanks for a great comment, especially the mention of F#. My examples and points in this article are very general and “classic”. Of course nowadays we have wonderful programming languages that provide us with great tools for creating “clean code” solutions. In C# or Java, one option is an object-oriented approach: a developer could create a special class that limits and validates the concept of “age measurement”.
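
A minimal sketch of what such a class might look like in Java is shown below. The `Days` type, its factory method and the values are hypothetical and serve only to illustrate the idea; this is not code from the article or from either commenter.

```java
// Hypothetical value type: the unit lives in the type, so a name such as
// "fileAge" no longer needs an "InDays" suffix.
final class Days {
    private final long value;

    private Days(long value) {
        this.value = value;
    }

    // Validates the concept: an age measured in days cannot be negative.
    static Days of(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("days must not be negative: " + value);
        }
        return new Days(value);
    }

    long value() {
        return value;
    }

    // Only another Days value can be added; passing any other type does not compile.
    Days plus(Days other) {
        return new Days(value + other.value);
    }
}

final class DaysDemo {
    public static void main(String[] args) {
        Days fileAge = Days.of(42);
        Days gracePeriod = Days.of(8);
        System.out.println(fileAge.plus(gracePeriod).value()); // prints 50
    }
}
```

This is more verbose than the F# units-of-measure declaration, but it gives a similar guarantee: a value of a different unit type cannot be added to a `Days` value by accident.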

  5. “This phrase simply means that if there is a need to comment your code, it is most probable that your code is bad.” That seems like a significant leap in logic… the quote was about the quality of comments, not code.

    1. I disagree with your statement. “The proper use of comments is to compensate for our failure to express ourself in code”: the quote is about programmers’ problems with expressing intentions, which lead them to put important information (not merely additional information), necessary for understanding, in a comment rather than in code.

      1. I’d refute your comment, but there are many others who have articulated my points much better than me in the thread at http://www.reddit.com/r/programming/comments/1eqx33/express_names_in_code_bad_vs_clean/ca2wnw7, in particular the fact that you shouldn’t be commenting “what” you are doing (since it should be obvious from the code) but “why” you are doing something that isn’t obvious, i.e. can’t be inferred from the code. The problem with comments as you describe them is because they are written in the form

        ….// do X

        as part of a more general form

        ….// do X because Y

        where the `because Y` part is omitted, possibly because the programmer thinks it is obvious, even though it is the most important part of the comment – in fact, the `do X` can be omitted if the code is not too complicated, but the `because Y` should remain if the intention is not obvious, since you can’t describe the *intention* of a piece of code through code, despite how elaborate and/or intricate you write it. Again, this topic is handled much better in the thread at http://www.reddit.com/r/programming/comments/1eqx33/express_names_in_code_bad_vs_clean/ca2xfmf.

        1. I like your point of view 🙂 It is a common habit of programmers to focus in comments on “what I’m doing” and forget about “why I’m doing it”. However, in my opinion programmers should always try to write code as accurately as possible, and my “perfect clean code” contains intentions expressed in code. Nevertheless, I’m aware that in the real world there are many examples of comments that are useful for understanding how software works. Thanks for the comment, I really enjoy the discussion going on here.

          1. How do you express intentions in code? Unless you explicitly suffix names with what they’re going to be used for, such as `xBecauseY`, omitting comments means that anyone looking at your code can guess any number of reasons for your writing of that code, which is one of the main reasons I find reading other people’s code to be difficult. It all boils down to the differing opinions of programmers; regardless of what you write, another programmer will likely think (at some point of reading your code) “why did they write it like that?” Code on its own cannot tell people why it was written the way it was written, so we use comments, so that people reading our code can go “okay, that’s a valid argument, I won’t refactor this”, or “that argument doesn’t hold up well, I can now refactor this code without fear of misinterpreting the author’s intentions”.

  6. Nice post. I especially liked the examples you presented. Good job!

  7. I don’t necessarily agree that usage of “var” variables is a clean code practice. It is convenient for the developer writing the code, but a nightmare for anyone trying to understand it.

    I prefer to give variables an explicit type (whenever I know what value they will hold) and to use var or dynamic only when I don’t know what the value will be.

    Also, I don’t use the notations that you used in the section “Always code in one notation, let notation help you understand the code”. Those are all old C++ practices which are not recommended by Microsoft.

    The correct way would be:

    const int maxCount = 1;
    bool isChanged = true;
    public interface IRepository{};
    private string name;
    public class PersonAddress{};
    void GetAllOrders(){};

  8. Yes, the advice about writing meaningful names is correct and should be followed. But as for comments, they are also an important part of code: without comments, a beginner studying the code may stay confused until someone explains it.
