Bidding farewell to globals
The original version of this article appeared in JOOP (Journal of Object-Oriented Programming), vol. 1, no. 3, August-September 1988, pages 73-77. ©101 Communications, included with permission.
Syntax note: the examples have been adapted to the current conventions for creation instructions.
One of the major concerns in the design of Eiffel has been to help developers write reliable software. This explains many of the more original features of the language such as assertions and full static typing, and also the general economy of its design -- the smaller the language, the less likely programmers are to make mistakes due to confusion or improper understanding.
Another aspect of Eiffel which helps considerably towards reliability is the total absence of global variables. This first column on Eiffel concepts and techniques will show how this was possible.
Global = bad?
Global variables represent data which you want to make directly accessible to all the modules in a program, or at least to several modules. Their negative effect on software quality is well-known. In 1973, at the height of the "X considered harmful" wave of papers, William Wulf and Mary Shaw convincingly criticized their use (Global Variable Considered Harmful, ACM SIGPLAN Notices, 8, 2, pp. 23-34). The flaws of globals are easily seen:
There is also another problem, less fundamental but still annoying: since a global variable does not belong to any one module in particular, it is not clear where it should be initialized. More on this below.
In spite of these deficiencies, all the programming languages I know (other than Eiffel) have some form of support for global variables. Algol 60 introduced sharing through syntactical nesting: program units may be nested within each other; all variables declared in an outer unit are accessible in inner units (unless overridden by local variables with the same names). This extreme form of sharing has found its way into Pascal, Ada and many other Algol derivatives. The same effect is obtained in C, although this language does not support nesting, by declaring a variable as "external": then it will be accessible to all functions of the program, as if it had been declared in the outermost unit of an Algol-based language.
There is also a more restricted form of sharing, in which you define named clusters of shared variables. Each such cluster is accessible to any module that explicitly requests it. This is essentially what exists in Fortran, where the clusters are known as COMMON blocks, and is also available in C, where they are known as include files.
This is more disciplined than the nesting method, since global variables are only shared between units that request the corresponding cluster. Yet they are still global variables; and programmers have a natural tendency to put too many variables in a cluster, "just in case", increasing the distribution of data, the risks of data pollution, and the difficulty of understanding and maintaining the software. The "garbage common blocks" favored by many Fortran programmers (who tend to put everything into a big COMMON) are an extreme case, but the trend exists everywhere.
A primitive definition of object-oriented programming is "modularity taken seriously". The idea is to decentralize software architectures so that components can be separately reused and individually maintained. So it would seem that global variables, the most obvious obstacle to decentralization, would have all but vanished from object-oriented languages.
This is usually not the case, however. To take just two examples, Simula is an Algol derivative where classes may be syntactically nested; in fact Simula relies on a rather traditional notion of main program. Similarly, Smalltalk supports global variables, which are shared among all instances of all classes.
In Eiffel there was never any temptation to introduce global variables. Even more than a design and programming language, Eiffel is a tool for supporting a certain idea of software construction, based on the bottom-up combination and extension of autonomous, reusable, extendible components. Introducing globals would be the negation of this whole approach.
In fact there is not even a notion of main program. "Where things start" is the least and last worry of the Eiffel designer: least because what's important in software design is to produce useful and good-quality components; last because in the process of software construction you should assemble the components at the final stage, rather than focus early on a single main function. The particular assembly chosen in the end can then easily be changed to fulfil new requirements through a different arrangement of the same components.
An assembly of components (classes) is called a system. Assembling a system simply means choosing one of the classes as "root" of the system. Executing the system means creating one instance of the root class and executing one of its creation procedures, which will in turn create more objects and execute more operations. Rather than a main program, a root's creation procedure is like a test driver which exercises existing primitives in a certain order.
A note on terminology to make the sequel clear. In Eiffel there is no confusion between classes and objects: the text of a system is made of classes; the state of the system at any stage of its execution involves objects. Every object is an instance of a class. The text of a class describes the features of all the class's instances, including attributes (data fields) and routines (operations applicable to the instances). Some JOOP readers may be more familiar with the terms "instance variable" and "method" respectively.
With this highly decentralized structure and the total banishment of global variables, there remains the problem of how to deal with data items that do need to be shared between several objects. Examples of legitimate sharing include:
Of course, we could do away with sharing altogether, by using argument passing as the only inter-object communication mechanism. But in this case the cure would be worse than the disease. Arguments representing shared objects such as the mouse would have to be transmitted throughout the system, polluting interfaces and endangering the consistency of the architecture. Besides, argument passing assumes that for each piece of data there is one class (and at run-time one object) which "owns" it. In the case of truly shared data this is inappropriate.
The Eiffel solution uses multiple inheritance and the notion of once routine.
First, whenever you detect a group of related facilities that are needed by more than one class, you can simply gather them in a class, say C, and use inheritance: any class whose instances need these facilities will simply inherit from C.
This addresses the problem of shared constants, at least for constants of simple types (integer, boolean, real, character). Assume for example a set of constants describing the codes associated with terminal keys; then they may be grouped in a class:
    class TERM_CODES feature
        F1_key: INTEGER is 156
        F2_key: INTEGER is 157
        ...
    end -- class TERM_CODES
Any class that needs these codes will inherit from TERM_CODES. We see here the first key to avoiding globals in Eiffel: multiple inheritance. If classes were restricted to inheriting from at most one class, then potential parents would soon run into conflict with each other, and the technique would not be applicable. In Eiffel, there is no limit to the number of classes you can inherit from, and any name conflict problem is removed through renaming -- but this will have to be for another column.
Classes such as TERM_CODES are simpler than classes that implement full-fledged data abstractions. But they are quite legitimate. They correspond to what was called a cluster above (a Fortran COMMON block, a C include file, or an Ada package), restricted to hold only constants. The same type of class is often used to hold routines as well; they come in handy to provide access to an existing routine library. Note that such classes are usually not meant to be instantiated, but just inherited.
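As a minimal sketch of how a client might pick up such constants, the fragment below inherits from TERM_CODES and from a second constant class. The names TERM_SIZES, KEY_HANDLER, is_help_key and fits_on_screen are hypothetical, introduced only for illustration, as are the values 24 and 80. Each class would normally go in its own file, and the syntax follows the conventions used in this article.

```eiffel
class TERM_SIZES feature
        -- Hypothetical second constant class, added for illustration
    Lines: INTEGER is 24
    Columns: INTEGER is 80
end -- class TERM_SIZES

class KEY_HANDLER inherit
    TERM_CODES
    TERM_SIZES
feature
    is_help_key (code: INTEGER): BOOLEAN is
            -- Is `code' the F1 key (treated here as the help key)?
        do
            Result := (code = F1_key)
        end

    fits_on_screen (row, column: INTEGER): BOOLEAN is
            -- Is position (`row', `column') within the terminal?
        do
            Result := 1 <= row and row <= Lines and
                1 <= column and column <= Columns
        end
end -- class KEY_HANDLER
```

Inside KEY_HANDLER the constants are used by their plain names, exactly as if they had been declared locally; no object of type TERM_CODES or TERM_SIZES is ever created.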
The above technique is not sufficient for shared objects such as the mouse in the above example (which, incidentally, I owe to Philippe Lahire). Of course, you could write a class containing a declaration of the form
    my_mouse: MOUSE
where MOUSE is the class describing mice, with such attributes as mouse position, button state (up and down) etc. But this does not work because the mouse object associated with my_mouse needs to be actually created at some point. In Eiffel, an object is created through a creation instruction, as follows:
create my_mouse.make (...)
where the creation procedure make may have arguments if needed. (In the absence of a creation procedure the form is just create my_mouse.) Here, however, the question is: Who will take care of this creation?
The naive answer is: "Let's have an initialization module take care of the creation". But this still leaves open the question of how the mouse object will be made available to the objects that need access to it. Argument passing seems to be the only solution. In any case, the very idea of an initialization module that initializes everybody else's data, and thus interfaces with all other classes of a system, is enough to make an Eiffel programmer scream in horror and change career plans to take up something like gardening.
A better idea is to make my_mouse a function, which returns a reference to a newly allocated mouse object. (A function is a routine that returns a result; a routine that does not return a result is a procedure.) The syntax is the following:
    my_mouse: MOUSE is
            -- Reference to mouse
        do
            create Result.make (...)
        end
(The predefined name Result denotes the result of the enclosing function. The double dash -- introduces comments.)
Unfortunately this does not work: each reference to function my_mouse creates a new mouse object and returns a reference to it. This is not what we want. Ideally, the first call should create the object and all others should return a reference to that same object.
Eiffel's once mechanism fulfills this requirement. A once routine is a routine that executes its body at most once -- the first time it is called. Subsequent calls, executed by any object, will not execute the body; if the routine is a function, they will return the result computed by the first call.
This is exactly what is needed here. To make my_mouse a once routine, it suffices to replace the do keyword by once:
    my_mouse: MOUSE is
            -- Reference to mouse
        once
            create Result.make (...)
        end
Assume this is in class WINDOW. The first WINDOW object that executes this call will create the mouse object and initialize it as specified. All subsequent calls will return a reference to that object.
This achieves the sharing of an object between all instances of a class and descendant classes. The method is used in Eiffel programming whenever a shared object is needed.
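The following sketch puts the pieces together under some assumptions: the MOUSE class shown here, its argument-less creation procedure make, the features x, y and move_to, and the root class DEMO are all hypothetical stand-ins, since the article does not give the text of MOUSE. Each class would normally go in its own file.

```eiffel
class MOUSE create
    make
feature
    make is
            -- Start at the origin (hypothetical creation procedure).
        do
            x := 0
            y := 0
        end

    x, y: INTEGER
            -- Current position

    move_to (new_x, new_y: INTEGER) is
            -- Record a new position.
        do
            x := new_x
            y := new_y
        end
end -- class MOUSE

class WINDOW feature
    my_mouse: MOUSE is
            -- Reference to the one shared mouse
        once
            create Result.make
        end
end -- class WINDOW

class DEMO create
    make
feature
    make is
            -- Show that two windows see the same mouse object.
        local
            w1, w2: WINDOW
        do
            create w1
            create w2
            w1.my_mouse.move_to (10, 20)
            check same_object: w1.my_mouse = w2.my_mouse end
            check shared_state: w2.my_mouse.x = 10 end
        end
end -- class DEMO
```

With DEMO as root, both check instructions are expected to hold when assertion monitoring is on: the body of my_mouse runs only for the first call, so w2 sees the position that was set through w1. With do in place of once, each call would create a fresh mouse and both checks would fail.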
The once mechanism has other interesting applications. In particular it solves a general problem associated with shared variables: non-constant initialization. This problem comes up in all languages, but I haven't seen any clean solution in existing languages.
How do you initialize a global variable when the initial value is not a static constant but must be computed at run-time, based for example on user input? If the variable is made global through syntactical nesting, as in Algol-based languages, the answer is simple: the outermost unit takes care of the initialization before calling any inner unit. But, as we saw, this approach is incompatible with modularity.
In C, the declaration of an external variable may specify an initial value, but this value may only be a constant expression. Fortran is here as elsewhere almost identical to C, with shared variables initialized in BLOCK DATA units to constant values only. In both cases the restriction means that if you need a non-constant initial value you must again resort to writing an initialization module, which defeats any attempt at making the other modules autonomous.
The once mechanism solves this problem. Rather than a shared variable, you will use a once function of the form
    shared: SOME_TYPE is
            -- ...
        once
            ... Compute initial value and assign it to Result ...
        end
This applies to values of basic types as well as to objects.
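As an illustration for a value of a basic type, here is a sketch of a once function whose initial value depends on user input. The class name SETTINGS and the feature page_width are hypothetical; the input and output calls assume the standard io facilities of the EiffelBase library (put_string, read_integer, last_integer).

```eiffel
class SETTINGS feature
    page_width: INTEGER is
            -- Page width in characters, asked from the user
            -- the first time any object needs it
        once
            io.put_string ("Page width? ")
            io.read_integer
            Result := io.last_integer
        end
end -- class SETTINGS
```

Any class that inherits from SETTINGS can use page_width as if it were a shared variable. The question is put to the user at most once per execution, at whatever point the value is first needed, and no separate initialization module is involved.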
Initialization in packages
The notion of once routine, applicable to procedures as well as functions, also solves a rather annoying problem of software design: initialization in packages.
Many or perhaps most routine packages require the client to call a certain initialization procedure before calling any "useful" routine of the package. (A client of a module is any other module that uses its facilities.) In the X Windows package, for example, you must call procedure C_open_display before doing anything else.
Such a requirement is a nuisance. If you forget the initialization call, you will usually get a run-time failure with a less than meaningful message. On the X Windows implementations I have seen, the message is pretty terse: Core dump.
There is a good reason why such messages are usually meaningless: if you could get an informative message, it would mean that the package designer was able to write code that detects the cause for the error, namely that you haven't called the initialization routine. But then it requires a rather incompetent (or perhaps sadistic) designer to write code which in such a case prints out a message
This call will fail because you haven't called the initialization routine!
rather than quietly calling the initialization routine.
In other words: either the package designer requires an initialization routine and error recovery will be terrible; or the designer does a good job and doesn't require a user-callable initialization routine.
But how can the package designer make sure that the initialization routine is automatically called as needed? Once routines provide a simple solution. Make the initialization procedure, say initialize, a once procedure. Then begin the body of every user-callable routine in the package (in Eiffel terms, every exported routine in the corresponding classes) with a call to initialize. You know that only the first of these calls to occur during a given system execution will have an effect. Then you can stop bothering client programmers with initialization calls; in fact, initialize should not be exported.
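A minimal sketch of this scheme follows. The class DISPLAY_PACKAGE and its routines draw_point and clear are hypothetical, and the routine bodies merely print messages so that the effect can be observed; a real package would do its actual setup in initialize. The output calls assume the standard io facilities of EiffelBase.

```eiffel
class DISPLAY_PACKAGE feature
        -- Exported routines: each begins by calling `initialize'

    draw_point (x, y: INTEGER) is
            -- Report a point at (`x', `y').
        do
            initialize
            io.put_string ("point ")
            io.put_integer (x)
            io.put_string (", ")
            io.put_integer (y)
            io.put_new_line
        end

    clear is
            -- Report a cleared display.
        do
            initialize
            io.put_string ("cleared")
            io.put_new_line
        end

feature {NONE}
        -- Hidden from clients

    initialize is
            -- Set up the package; the body runs only on the first call.
        once
            io.put_string ("display opened")
            io.put_new_line
        end
end -- class DISPLAY_PACKAGE
```

Whichever exported routine a client calls first triggers the setup; later calls to initialize do nothing. One caveat for this sketch: because the body runs once per system execution rather than once per object, any state that initialize sets up should itself be shared (for example through once functions), not stored in ordinary attributes of whichever object happened to make the first call.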
Old habits die hard
A while ago, I specified a class (meant to be inherited) that gives programmers access to the command-line options supplied by a user when calling the system. The specification looked like this:
    class ARGUMENTS feature
        argument_count: INTEGER is
                -- Number of arguments on the command line
                -- (excluding command name)
            do
                ...
            end

        argument (i: INTEGER): STRING is
                -- Command line argument number i
                -- (the command name if i = 0)
            require
                0 <= i
                i <= argument_count
            do
                ...
            end
    end -- class ARGUMENTS
Here argument_count gives the number of arguments on the command line and argument (i) gives the i-th argument. The precondition of argument (the require... clause) gives the constraint on i.
In my first version, no doubt influenced by years of exposure to evil influences, I had included a procedure read_command_line that decoded the command-line arguments and initialized the proper data structure. This procedure had to be called before any call to either argument_count or argument. Then my colleague Jean-Marc Nerson took a look and casually mentioned that this was a rather poor interface: it is much better to hide read_command_line from clients. Just make it a once procedure; rewrite argument_count and argument so that both start out by calling read_command_line; and let the client programmers worry about the really important stuff.
I felt rather stupid. To make sure that I would remember this the next time around, I decided to put the whole thing in print -- and such is the story behind this article.