Sunday, October 23, 2016

Tabs or Spaces? What can we find by looking at white space?

By: Dale Fletter, California State University, Sacramento. USA (dale.fletter@csus.edu)
Associate Editor: Bogdan Vasilescu, Carnegie Mellon University. USA (@b_vasilescu)

The TV show Silicon Valley had a scene that satirized a common cultural rift among coders: spaces or tabs? [1] What difference does it make? [2] I argue that it matters a great deal, and that the debate shows how the software engineering community has not yet fully understood the processes and artifacts at the core of its work.

Compilers for most languages, even those that use block-structured statements, ignore most white space. To them, a tab and a run of spaces are equivalent. Decades ago, software engineers found value in "pretty printing", that is, inserting white space with the intent of visually communicating the underlying block structure. [3] On the author's own screen, tabs and spaces render identically. Those who prefer the tab cite the efficiency with which a coder can transfer their logic into source code. Those who prefer spaces cite the control they have over the code's presentation.
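To make the idea concrete, here is a toy pretty printer in Python. `pretty_print` is a hypothetical helper, not any particular tool; it assumes one statement or brace per line and ignores braces inside strings and comments. The indent unit is a parameter, so the same call can emit either tabs or four spaces.

```python
def pretty_print(source: str, indent: str = "    ") -> str:
    """Re-indent brace-delimited code by nesting depth (toy sketch).

    Assumes one statement or brace per line; braces inside strings
    or comments are not handled.
    """
    lines = []
    depth = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            lines.append("")
            continue
        # A line that starts by closing a block belongs to the outer level.
        if line.startswith("}"):
            depth = max(depth - 1, 0)
        lines.append(indent * depth + line)
        # A line ending in an opening brace deepens the lines that follow.
        if line.endswith("{"):
            depth += 1
    return "\n".join(lines)


example = "if (x) {\ny = 1;\n} else {\ny = 2;\n}"
print(pretty_print(example))        # indented with four spaces
print(pretty_print(example, "\t"))  # indented with tabs
```

The two outputs differ only in leading white space, which a whitespace-insensitive compiler discards; with matching editor settings, the author sees the same layout either way.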

The difference becomes clear as soon as one cuts and pastes code from an editor or IDE into a word processing document. Word processors were designed to carry forward the metaphor of the typewriter, where the typist set the tab stops manually. In such a document, a tab character is simply a jump to the next tab stop, wherever the stops happen to be set. To get the same presentation in the word processing document as in the editor used for composition, the software engineer must set the tab stops to match the IDE and choose a monospaced font that mimics the editor. Failing to do either typically produces a layout that differs significantly from what the coder intended, including the visual indication of blocks.
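The same effect can be reproduced without a word processor. Python's built-in `str.expandtabs` replaces each tab with spaces up to the next multiple of a given tab width, much as a monospaced editor displays it. The sketch below assumes two declarations whose names were lined up by an author whose editor placed tab stops every four columns; the snippet and the `render` helper are illustrative.

```python
def render(text: str, tab_stop: int) -> str:
    """Expand each tab to the next multiple of `tab_stop` columns,
    as a monospaced editor or terminal would display it."""
    return text.expandtabs(tab_stop)


# Two declarations aligned in an editor with tab stops every 4 columns:
# "int" needs two tabs to reach column 8, "double" needs only one.
snippet = "int\t\tcount;\ndouble\tratio;"

for width in (4, 8):
    print(f"--- tab stops every {width} columns ---")
    print(render(snippet, width))
```

With stops every 4 columns, both names start in column 8. With stops every 8 columns, `count` moves out to column 16 while `ratio` stays in column 8, so the author's alignment is lost even though not a single character of the file changed.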

If the compiler were the only consumer of source code, its visual presentation would be irrelevant for most languages. But the software engineer is also a reader of code, and the speed, accuracy and completeness with which a program is communicated are key contributors to code comprehension. To change that presentation is to ignore the effort the coder made to achieve a particular layout on their device. This more expansive view of code, not just as a means of instructing the machine but as a human utterance worthy of consideration on its own merits, is an exciting new direction of research. [4]

Nor is this necessarily a transitory matter of personal preference. It parallels an equally divisive style issue in legal writing. Because of the limitations of the typewriter and the need for comprehension in legal writing, the convention of a double space after a period was adopted. This artifact has survived to the present day despite objections from typographers, who point out that it is unnecessary with proportionally spaced fonts. Nor is it merely the local choice of individual law offices: different legal publishers also use the double space following a period.

One remedy, offered by many IDEs, is to automatically reformat code so that the presentation of blocks conforms to style rules. This approach has problems. First, running the formatter is left to the coder: some coders will use the tool and convert to that IDE's standard, and others will not. Because code is often edited by different people using different editors on different platforms, the result is source code with a mixture of tabs and spaces. Second, coders who invest a great deal of time inserting spaces to achieve a visual effect can have their work undone by well-intentioned rules. And for the coder who believes a tab is more efficient than four spaces, an automatic formatter is at best an imperfect compromise.
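A file that has drifted into a mixture of tabs and spaces is easy to detect mechanically. The following is a minimal sketch, not the logic of any particular IDE; `indentation_report` and `is_consistent` are hypothetical names. It classifies the leading white space of each non-blank line as tabs only, spaces only, or both.

```python
import re

# Leading run of spaces and/or tabs (may be empty).
LEADING_WS = re.compile(r"^[ \t]*")


def indentation_report(source: str) -> dict:
    """Return 1-based line numbers indented with tabs only,
    spaces only, or a mixture of both on the same line."""
    report = {"tabs": [], "spaces": [], "mixed": []}
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation choice
        indent = LEADING_WS.match(line).group()
        if not indent:
            continue  # unindented lines are neutral
        has_tab, has_space = "\t" in indent, " " in indent
        if has_tab and has_space:
            report["mixed"].append(number)
        elif has_tab:
            report["tabs"].append(number)
        else:
            report["spaces"].append(number)
    return report


def is_consistent(source: str) -> bool:
    """True when no line mixes tabs and spaces and the file
    does not use both styles on different lines."""
    r = indentation_report(source)
    return not r["mixed"] and not (r["tabs"] and r["spaces"])


# Three editors, three habits: a tab, four spaces, and two spaces plus a tab.
edited = "if (x) {\n\ty = 1;\n    z = 2;\n  \tw = 3;\n}"
print(indentation_report(edited))
print(is_consistent(edited))
```

Python's standard library ships a comparable checker, `tabnanny`, although it targets ambiguous indentation in Python source specifically.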

Is there any difference in coder productivity when the tab key is used? Or is this a way of expressing personal choice in a task that is otherwise limited in its humanity and expressiveness? Is there a real time savings for those who insert a tab character compared with those who have developed the nervous tic of hitting the space bar four times instead? This may be worth analyzing, since a coder may do this many times in the course of writing and editing code.

If the informative use of white space were limited to marking off blocks of code, this might be too insignificant an issue to justify the attention. But we all know coders who find very creative ways to draw attention to parallel structures in their code using only spaces and line feeds. This visual information is as important to retain as the code itself. It suggests that creating a code file is not merely recording a string of text but also a form of visual communication.
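One common form of such parallel structure is lining up the `=` signs of a group of related assignments. The sketch below uses a hypothetical helper, `align_assignments`, which assumes plain `name = value` lines (not `==` or compound operators such as `+=`). It pads with spaces only, so the alignment holds regardless of the reader's tab stops, whereas alignment built from tabs can break when the reader's tab stops differ from the author's.

```python
def align_assignments(lines: list[str]) -> list[str]:
    """Pad with spaces so the first '=' on each line falls in one column.

    Assumes simple `name = value` lines; lines without '=' are
    returned unchanged. Only spaces are inserted, never tabs.
    """
    split = [line.split("=", 1) if "=" in line else None for line in lines]
    # Widest left-hand side (including any leading indentation).
    width = max((len(parts[0].rstrip()) for parts in split if parts), default=0)
    aligned = []
    for line, parts in zip(lines, split):
        if parts is None:
            aligned.append(line)
        else:
            left, right = parts
            aligned.append(f"{left.rstrip().ljust(width)} = {right.strip()}")
    return aligned


for row in align_assignments(["x = 1;", "count = 10;", "ratio = 0.5;"]):
    print(row)
```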

I believe this bit of comic relief reveals a lack of understanding that is still pervasive in the community. Writing code that is understandable to a human is more important than merely marking blocks of code. A more complete look at how coders present their code could reveal what they have found to be important clues to readability. Those who still prefer tabs seem blasé about losing control over presentation, which then depends on whatever editor is used to view the text.

[3] Programming with small blocks. Mark K. Joseph. SIGSOFT Softw. Eng. Notes 9, 5 (October 1984), 28-42.
[4] New initiative: the naturalness of software. In Proceedings of the 37th International Conference on Software Engineering - Volume 2 (ICSE '15), Vol. 2. IEEE Press, Piscataway, NJ, USA, 543-546.
[5] A metric for software readability. Raymond P.L. Buse and Westley R. Weimer. In Proceedings of the 2008 International Symposium on Software Testing and Analysis (ISSTA '08). ACM, New York, NY, USA, 121-130.
