The significance of components

Bertrand Meyer

Information hiding only makes sense if it's enforced pitilessly.

What is a component? The answer to this question depends on whether you choose the narrow definition or the wide definition. Clemens Szyperski, in his book Component Software: Beyond Object-Oriented Programming (Addison-Wesley, 1998), restricts components to binary units of software as offered by COM, Enterprise Java Beans, and CORBA. The wider definition pays less attention to the exact nature of components (binary units, classes, packages). Instead, what it sees as critical is that components are "client-oriented software" enjoying two key properties (in the definition I proposed in "On to Components", in the January 1999 issue of IEEE Computer). These are: first, the component may be used by other program elements (its "clients"), and second, the clients and their authors do not need to be known to the component's authors.

The narrow and wide views lead to different appreciations of the relationship between component technology and object-oriented development. Szyperski subtitled his book "Beyond Object-Oriented Programming." For me, component-based development is the natural evolution of object technology; it requires object technology (if only to build the components themselves), and it shares with object technology not only its basic goal, reuse, but also its fundamental techniques.

Even with the wide definition and the complementary rather than contradictory view of the components-vs.-objects relationship, what's so special about binary components? Why are they attracting so much attention now?

Why Components?

The obvious answers (let's reuse, let's rely on other people's work and stop reinventing the wheel, let's save on software costs, let's try to address the shortage of good developers) are not completely satisfactory because they apply just as well to standard object-oriented development. Besides, in spite of the progress of components, the software development industry as a whole is still far from having mutated into an assembly-line, mix-and-match activity. I think the basic answer is simpler: binary components finally make information hiding inevitable.

First described many years ago in two seminal articles by David Parnas (Communications of the ACM, May and December 1972), information hiding is the principle that the designer of every module should specify which of the module's properties are secret (accessible within the module only) and which are accessible to clients. The principle only makes sense if the language and development environment enforce it pitilessly: it must be impossible to write client modules that depend directly on secret properties.

Contrary to common perception, the main goal of information hiding is not to prevent the client authors from accessing the secret properties, but instead to help them by avoiding the need to learn irrelevant details of the many supplier modules they may use. Another principal benefit is to protect the clients from unforeseen changes in their suppliers: as long as these changes affect secret properties only, the clients are safe, again provided the environment doesn't let them rely (whether sneakily or unwittingly) on anything that the suppliers have specified as secret.
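
As a rough illustration of that second benefit, here is a minimal sketch in Python. The class names StackV1 and StackV2 and the client routine reversed_copy are hypothetical, invented for this example. Note that Python marks secrets only by the leading-underscore convention and does not enforce it, so the sketch shows the intent of the principle rather than its enforcement.

```python
class StackV1:
    """Supplier, first version: the secret representation is a Python list."""

    def __init__(self):
        self._items = []          # secret: clients must not rely on this

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class StackV2:
    """Supplier, revised version: same public interface, linked cells inside."""

    def __init__(self):
        self._top = None          # secret: chain of (value, rest) pairs

    def push(self, value):
        self._top = (value, self._top)

    def pop(self):
        value, self._top = self._top
        return value

    def is_empty(self):
        return self._top is None


def reversed_copy(stack_class, values):
    """Client: relies only on push, pop and is_empty, never on the secrets."""
    stack = stack_class()
    for value in values:
        stack.push(value)
    result = []
    while not stack.is_empty():
        result.append(stack.pop())
    return result
```

Because reversed_copy touches only the public operations, swapping StackV1 for StackV2 leaves it unchanged. A client that reached into _items directly would break with the second version.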

Obstacles to Information Hiding

Almost everyone pays lip service to the principle of information hiding. But the sad truth is that the common languages and environments make only a half-hearted effort at enforcing it. Here are three examples.

The first is the notion of global variable, still present in most object-oriented languages. As soon as you have this mechanism, information hiding is wishful thinking at best: global variables introduce furtive coupling between modules, endangering any hope of preserving their secrets and their resistance to each other's changes.
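
The kind of furtive coupling meant here can be sketched in a few lines of Python. The names page_size, format_report and print_labels are hypothetical, chosen only for illustration; the point is that two routines with no declared relationship end up depending on each other through a shared global.

```python
# Hypothetical module-level state shared by two otherwise unrelated routines.
page_size = 60  # lines per page


def format_report(lines):
    """Split the report into pages, reading the global page_size."""
    return [lines[i:i + page_size] for i in range(0, len(lines), page_size)]


def print_labels(labels):
    """Lay out labels, overwriting the global for its own convenience."""
    global page_size
    page_size = 10  # silently changes what format_report will do next
    return [labels[i:i + page_size] for i in range(0, len(labels), page_size)]


lines = [f"line {n}" for n in range(120)]
pages_before = len(format_report(lines))   # 120 lines at 60 per page
print_labels(["a", "b"])
pages_after = len(format_report(lines))    # same call, different result
```

Nothing in the interface of format_report warns its clients that a call to print_labels will change its behavior, which is exactly why neither routine can keep its secrets.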

Next comes a facility that never ceases to amaze me: the presence, in languages such as C++ and Java, of direct attribute assignments of the form a.x = b. How can one claim to have an object-oriented language and allow such blatant violations of information hiding principles? An object is a little machine accessible through an abstract interface, the class specification; letting clients play around with the internals directly in such a way is akin to letting the user of a cell phone remove the cover and play around with the wiring, while pretending that the phone will still operate as the user's manual says.
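
For contrast, here is a minimal sketch of a class that lets clients read a field but change it only through an operation of the interface. It is written in Python with a hypothetical Account class; the read-only property is a Python mechanism, used here as a stand-in for the discipline described above, not a rendering of how any other language does it.

```python
class Account:
    """Hypothetical supplier: balance is readable but not directly assignable."""

    def __init__(self, opening_balance):
        self._balance = opening_balance

    @property
    def balance(self):
        # Read access only: no setter is defined for this property.
        return self._balance

    def deposit(self, amount):
        # The only sanctioned way to change the balance; it can check its input.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount


account = Account(100)
account.deposit(50)

try:
    account.balance = 1_000_000   # the a.x = b form
    assignment_rejected = False
except AttributeError:
    assignment_rejected = True
```

The client can still ask for account.balance, but the direct assignment is refused, so every change to the state goes through a routine that the supplier controls.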

The third issue is uniform access (discussed in my Eiffel column in the October 1999 issue of JOOP). Clients should be able to use a "query" on an object (a question on the state of the object) without having to know whether the query is implemented as a function, computed each time it's needed, or an attribute, stored in the object and looked up whenever there is a request for it.

For example, the query "This month's tax deductions," on an EMPLOYEE object, might be computed from a formula or stored with the object. But this is absolutely irrelevant to a client, who should simply be able to write something like Elizabeth.tax_deduction in either case. This principle (already present in Simula as early as 1967!) is applied in Eiffel but violated in most other common languages, object-oriented or not.
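
The idea can be sketched in Python, which approximates uniform access through properties. The two employee classes, the payslip_line client and the 25% formula are all assumptions made up for this illustration; only the query name tax_deduction comes from the example above.

```python
class StoredEmployee:
    """Variant 1: the deduction is an attribute, stored with the object."""

    def __init__(self, name, tax_deduction):
        self.name = name
        self.tax_deduction = tax_deduction


class ComputedEmployee:
    """Variant 2: the deduction is a function, computed on each request."""

    def __init__(self, name, monthly_salary):
        self.name = name
        self.monthly_salary = monthly_salary

    @property
    def tax_deduction(self):
        # Hypothetical formula, for illustration only.
        return self.monthly_salary * 0.25


def payslip_line(employee):
    """Client: writes employee.tax_deduction without knowing which variant it has."""
    return f"{employee.name}: {employee.tax_deduction:.2f}"
```

The client text is identical in both cases, so the supplier remains free to switch between storage and computation without any client noticing.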

The "Dilbert's Boss" Approach

It is possible to enforce information hiding, whatever the language or environmental limitations, through style rules and coding guidelines. The rules can prohibit global variables (although in the absence of more advanced mechanisms for sharing information between objects, they will have to leave room for exceptions, always a perilous situation); they can prohibit direct a.x = b assignments to object fields, requiring instead the object-oriented idiom a.set_x (b); and they can require that all queries use functions, meaning that attributes are technically private, being exported only, if required, through an associated "get" function.

But any approach based on style rules is an admission of failure: what's the point of going to a modern object-oriented language if we still have to apply dozens of extra coding requirements, as in the good old days of "structured Fortran"? It's the Dilbert's Boss approach to software quality, and it's understandably risky since we can never be sure of the extent to which the rules are applied. In addition, it's often overkill. Take the last example (uniform access): there is absolutely no conceptual reason why we should write a "get" function for every exported attribute. When I go to the bank I ask for my balance, not my "GetBalance." It's perfectly OK to export an attribute, as long as the client can't tell that it's an attribute rather than a function. The danger of overkill is clear: people get bored with having to perform tedious tasks, such as writing and calling needless "get" functions, and when there is a deadline looming, they just give up; in addition, these extra mechanisms obscure the program text, decrease maintainability, and detract from truly useful and creative work.

So information hiding, in the dominant programming languages, remains a murky proposition. But then we have major trouble. For (as Parnas understood so early in the game) it is impossible to write big, serious software without information hiding; in fact, without being dogmatic about information hiding.

If you don't apply information hiding and its consequences, you simply won't be able to design, develop, and (most importantly) maintain the kind of advanced, sophisticated, evolutionary systems that your customers are requiring today.

And this is where binary components come in. Being binary isn't that important in itself. A well-written class, in an environment supporting "dogmatic" information hiding (either through the environment itself, as in Eiffel, or through strict coding practices), will be just as suitable for reuse, extendibility, cost savings, and the other goodies of components. But if you don't have such an environment, binary components are the only way to guarantee information hiding with no cheating.

No Way to Sneak and Poke

With binary components, there is simply no way to sneak and poke around a module's internal properties. With most commercial components, you couldn't do that even if you wanted to, since your friendly supplier won't show you his source code. But (let's quickly make this clear before the open-source enthusiasts start penning their letters to the editor) availability of source code is not the issue: even if the client author somehow has access to the source code, using the module in binary form means that the client module can't take advantage of it.

So binary components change the status of information hiding: no longer a principle of programming methodology, but a fact of life, information hiding becomes no more evitable than the need to show your ticket to board an airplane.

Indeed, this is how people are using components now. The en-masse replacement of programming by plug-and-play component assembly has not occurred yet, but components are playing an ever-increasing role in almost everyone's developments anyway. In large Eiffel projects, for example, we see a strikingly increasing use of EiffelCOM and Eiffel-CORBA solutions, enabling applications to communicate with other applications, whether these applications themselves use Eiffel, Java, C++, Visual Basic or anything else. Binary components provide the guarantee of encapsulation that is missing in most of the rest of the programming world. It's not the form of the modules; it's information hiding, stupid.

The Operating System Precedent

We can invoke historical precedent here. Much of the early progress in software methodology (leading up, for example, to structured programming, and in fact to the first forms of the information hiding principle) came out of work on operating systems, such as Dijkstra's THE system and Hoare's experience. It's not necessarily that OS people are more perceptive than, say, people who write payroll systems; it's rather that you can't even hope to get a halfway decent OS unless you are an extremist about issues of methodology. Global variables, for example, are not just ill-advised but deadly: if the line printer spooler shares variables with the swapping module, you won't go very far in building your OS.

Binary components are the same: because they are defined by their official interface, and by absolutely nothing else (no cheating, no peeking at the internals, no covert use of implementation information), they force you to apply information hiding as you always knew you had to, but didn't always do.

On to Contracts

Once you've mastered this discipline and started to enjoy its benefits, there is no reason, by the way, for you to stop at binary components. This is where the "wide" view of components comes back into play, and the distinction between object and component technology doesn't appear so relevant any more.

This is not the end of the story. In fact, we've hardly begun to examine the deeper question seriously. If we protect our clients from all the irrelevant details of our modules, how do we tell them about the relevant parts? The answer involves a key word: contract; and it is for another column.