Maximiliano edited this page Jun 11, 2021 · 5 revisions

The DIME Analytics Coding Guide

Taken from The DIME Analytics Data Handbook

Style and check rules

Style

  • Use soft tabs (i.e., whitespaces), not hard tabs: Use whitespaces (usually 2 or 4) instead of hard tabs. You can change this option in the do-file editor preferences.

  • Avoid abstract index names: In for loops, index names should describe what the code is looping over. For example, avoid code like this:

    foreach i of var cassava maize wheat { }
    

    Instead, give the index a descriptive name:

    foreach crop of var cassava maize wheat { }
    
  • Use proper indentation: After declaring a for loop or an if-else statement, indent the enclosed code with whitespaces (usually 2 or 4).

  • Use indentation after line-continuation symbols (///): After breaking a line with ///, indent the continued line (usually 2 or 4 whitespaces).

  • Use the !missing() function for conditions on missing values: For clarity, use !missing(var) instead of var < . or var != .

  • Do not use #delimit; instead, use /// for line breaks: More information about the use of line breaks here.

  • Do not use the cd command to change the current folder: Use absolute and dynamic file paths. More about this here.

  • Use line breaks for long lines: For lines that are too long, use /// to break them into multiple lines. It is recommended to keep each line under 80 characters. This is sometimes difficult (for example, Stata does not allow line breaks within double quotes), but try to follow this rule when possible.

  • Add whitespaces around math symbols such as +, =, <, >, etc.: For better readability, add whitespaces around math symbols. For example, write gen a = b + c if d == e instead of gen a=b+c if d==e.

  • Specify the condition in the if statement: Always explicitly specify the condition in the if statement. For example, declare if var == 1 instead of if var.

  • Use curly brackets for global macros: Always use ${ } for global macros. For instance, use ${global} instead of $global.
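The style rules above can be seen together in a short do-file. This is only a sketch: the auto dataset shipped with Stata, the global project_folder, its path, and all variable names are assumptions chosen for illustration, not part of the guide.

```stata
* Hypothetical project root (an assumption); replace with your own absolute path
global project_folder "C:/Users/username/project"

* Stata's built-in example data, used here only for illustration
sysuse auto, clear

* Descriptive index name, indented loop body, whitespace around operators
foreach measure of varlist price weight length {
    generate log_`measure' = log(`measure') if !missing(`measure')
}

* Explicit conditions, with /// line breaks and indented continuation lines
generate expensive_import = 0
replace  expensive_import = 1 ///
    if foreign == 1           ///
    & price > 6000            ///
    & !missing(price)

* Dynamic file path built from the global, using ${ } and forward slashes
local output_file "${project_folder}/data/auto_clean.dta"
display "`output_file'"
```

Because the path is built from ${project_folder}, only the first line needs to change when the project moves to another computer, which is why cd is not needed.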

Check

  • Check if missing values are properly taken into account: Note that a != 0 includes cases where a is missing.

  • Check that backslashes are not used in file paths: If you are using backslashes (\) in file paths, replace them with forward slashes (/).

  • Check that tildes (~) are not used for negations: If you are using tildes (~) for negations, replace them with the bang symbol (!).
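The missing-value check above is the easiest one to overlook, because Stata treats numeric missing values as larger than any number. The following sketch, which assumes the auto dataset shipped with Stata (where rep78 contains some missing values) and uses hypothetical local names, compares a naive condition with one that handles missing values explicitly.

```stata
* Stata's built-in example data, used here only for illustration
sysuse auto, clear

* Number of observations where rep78 is missing
count if missing(rep78)
local n_missing = r(N)

* Naive condition: missing values also satisfy rep78 > 3
count if rep78 > 3
local n_naive = r(N)

* Checked condition: missing values are excluded, negation written with !
count if rep78 > 3 & !missing(rep78)
local n_checked = r(N)

display "Naive: `n_naive', checked: `n_checked', missing: `n_missing'"

* Forward slash before a macro; the folder and file name are hypothetical
local file_name "survey"
display "data/`file_name'.dta"
```

The gap between the naive and the checked count equals the number of missing values. The last two lines also show why forward slashes are safer: a backslash placed directly before a macro stops Stata from expanding it.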

Writing good code

“Good” code has two elements: (1) it is correct, in that it doesn’t produce any errors and its outputs are the objects intended, and (2) it is useful and comprehensible to someone who hasn’t seen it before (or even someone who has, weeks, months, or years later). Many researchers have only been trained to code correctly. But we believe that when your code runs on your computer and you get the desired results, you are only half-done writing good code.

Therefore, good code:

  • is easy to read and replicate, making it easier to spot mistakes;
  • reduces sampling, randomization, and cleaning errors;
  • can easily be reviewed by others before it’s published and can be re-used afterwards.

We always tell people to “code as if a stranger is reading it”.

You should think of good code in terms of three major elements:

  1. structure,
  2. syntax,
  3. and style.

The structure is the environment and file organization your code lives in: good structure means that it is easy to find individual pieces of code, within and across files, that correspond to specific tasks and outputs. It also means that functional code blocks are sufficiently independent from each other such that they can be shuffled around, repurposed, and even deleted without affecting the execution of other portions.

The syntax is the literal language of your code. Good syntax means that your code is readable in terms of how its mechanics implement ideas – it should not require arcane reverse-engineering to figure out what a code chunk is trying to do. It should use common commands in a generally accepted way so others can easily follow and reconstruct your intentions.

Finally, style is the way that the non-functional elements of your code convey its purpose. Elements like spacing, indentation, and naming conventions (or lack thereof) can make your code much more (or much less) accessible to someone who is reading it for the first time and needs to understand it quickly and accurately.