Approaches to portability
The original version of this article appeared in JOOP (Journal of Object-Oriented Programming), vol. 11, no. 6, July-August 1998, pages 93-95. ©101 Communications, included with permission.
Software is portable if it can be made to run on various platforms with little effort or, ideally, no effort at all. A "platform" is a type of operating environment, as defined by the combination of hardware configuration, operating system, and other tools used by the software, such as compilers and database systems.
From a glance at today's computer industry, one might wonder at first whether portability is still such a big deal when the rapidly spreading dominant platform -- "Wintel" -- has a way of leaving room for alternatives that sometimes seems to be inspired by Attila the Hun. A substantial part of the Windows community indeed does not care at all about other platforms, or even concede their existence. (Can you remember a time when people actually asked you, before sending Word attachments, whether you were equipped to read them?)
Such a view may be partly justified for some of the lower tier of mass-marketed software products, where the Windows story is, increasingly, the only story. But as soon as we consider enterprise computing the picture changes completely. First there are the internal incompatibilities within the Windows camp: for the developer, Windows 95 and Windows NT are almost, but not quite, the same platform. But beyond Windows, any enterprise solution must still take into account several very different platforms, typically including some mix of: the Unix flavors of the day (Solaris in its various, not quite compatible forms, HP-UX, SGI Irix etc.); Linux, the current growth path for Unix; OS/2; mainframe operating systems such as MVS and OS/400; Digital's VMS, which may be "deprecated" (to use the latest buzzword) but is still a powerful presence in many parts of the computing world.
This variety accounts for a large part of Java's original appeal. The prospect of having a single execution platform seemed too good to miss.
Java's approach to portability, however, has shown its limitations. Talk to Java programmers these days, and they will tell you, if they have been trying to do multi-platform development, about all the trouble they get into when they try to move bytecode from machine to machine. The divergences between Sun, Microsoft and other players can only make things worse.
This article presents another approach to portability, which has shown its strength through many successful projects over the years. The technique, used by all current Eiffel implementations but one, uses C as an intermediate language in the translation process.
The belief is still widespread, in the computing community, that C and its derivatives are programming languages -- languages intended for people to write programs in. This is a regrettable misunderstanding, as anyone who has looked at the syntax of C will testify. C use by humans is problematic at any speed. I wish Dennis Ritchie would come out and dispel the confusion once and for all. It is all the more regrettable that C can be quite good when given a chance to fulfill its true vocation: that of a language for machines to talk to machines.
The very features of C that make it an unsuitable candidate for human consumption -- lack of structure, weak typing, closeness to the machine and the operating system -- strengthen its appeal as a "universal assembly language", a natural target for the output of compilers. This remains true even in the face of recent attempts to make C more typed (attempts, it must be said, that are rather out of character, and futile anyway since they will not fool any real C programmer, who can always find the appropriate workaround, straight from central casting).
All current Eiffel compilers except for one (Visual Eiffel from Object Tools) use C as their intermediate language. When hearing this, some people react "Oh, so it's only a preprocessor". The "only" part does not mean anything: every compiler is a preprocessor of a kind, and almost every compiler uses some intermediate code before generating its final output. The Eiffel compilers cited use C as their intermediate code, but they are full-function compilers, doing everything that modern compilers do -- lexical analysis, parsing, construction and manipulation of abstract syntax trees, semantic analysis, extensive optimizations -- often complemented by support for such extra functions as automatic documentation and configuration management. They simply rely on a C compiler for the final part of the process -- generating the actual machine code.
This technique has been shown to offer key advantages:
It's interesting to compare this technique with the Java bytecode approach. Even though the use of C might at first appear passé, it turns out, in my opinion, to be superior.
C vs. bytecode
"Bytecode" denotes a low-level code (not necessarily made of byte-sized units) that can be emitted by compilers and run directly through an interpreter. The idea of a portable bytecode is of course not new, going back at least to the P-code of UCSD Pascal. Java uses a bytecode for portability. ISE Eiffel also uses a bytecode, but in a different way, as will be seen below.
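To make the notion of "run directly through an interpreter" concrete, here is a minimal sketch of a stack-based bytecode interpreter. The instruction set (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function name run_bytecode are invented for this illustration; they are not Java's, UCSD Pascal's or ISE Eiffel's bytecode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set, made up for this sketch. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_SIZE 64

/* Interpret `code` (an array of `length` ints) and return the value
   found on top of the stack when OP_HALT is reached. */
int run_bytecode(const int *code, size_t length)
{
    int stack[STACK_SIZE];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction to decode */

    while (pc < length) {
        switch (code[pc++]) {
        case OP_PUSH:              /* operand follows the opcode */
            assert(pc < length && sp < STACK_SIZE);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:               /* replace top two values by their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:               /* replace top two values by their product */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
    assert(0 && "program ended without OP_HALT");
    return 0;
}
```

Every instruction goes through the decoding loop and the switch each time it is executed. That is the per-instruction cost that machine code produced by a compiler does not pay.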
Java's approach is to use the bytecode as a portability vehicle: the compiler emits bytecode, which you can then run on any machine that has a bytecode interpreter. There are several problems with this technique.
First, the bytecode is only good if it is truly portable. As noted above, 100% Java portability is simply not there, and adaptation attempts can be difficult (remember that we are dealing with a runtime mechanism: it may be too late to fix problems!).
Then there is the efficiency issue. Java applications are notoriously slow because of the burden of interpreting the bytecode at run-time. Various attempts have been made to address the problem:
Eiffel's "C as the portability vehicle and intermediate code" approach is closer to the 100% goal. C is pretty well standardized to start with; and in 13 years of concrete dealings with dozens of C compilers, we have learned to spot and bypass all the areas of potential incompatibilities. The code we generate is the subset of C which is accepted by all these compilers. And if a small problem appears on a new platform, it's always possible to find a temporary fix (even at a user site, before we correct the problem at the source) through the traditional low-level C mechanisms: "ifdef", make files and the like.
As a result, ISE's own compiler is a single piece of source code. The bulk of it is 2,500 Eiffel classes, i.e. 300,000 lines or so; this has no machine dependency whatsoever. The C code that it generates is exactly the same on every platform that we support, be it Linux, VMS, Windows 95, Windows NT, Solaris etc. All the machine dependencies are embedded in the C part of the "run-time system", the interface with the operating system. These C functions -- about 30,000 lines of C -- are also a single source text, with some embedded platform-specific code in conditional clauses ("ifdef"). As a result, we don't have an SGI version or a Windows version, but a single system that compiles identically across all platforms.
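The "single source text with conditional clauses" style can be sketched as follows. This is only an illustration, not an excerpt from ISE's run-time: the routine names rt_directory_separator and rt_join_path are assumptions, and a real run-time would distinguish more platforms (VMS path syntax, for example, is quite different). The only platform macro used, _WIN32, is the one predefined by Windows compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The one platform-dependent routine of this sketch (hypothetical name).
   All knowledge about the platform is confined to the conditional clause. */
char rt_directory_separator(void)
{
#ifdef _WIN32
    return '\\';
#else
    return '/';     /* default branch: Unix-like systems */
#endif
}

/* Platform-independent code built on top of it: write "dir<sep>file"
   into `buffer`. Returns 1 on success, 0 if the result did not fit. */
int rt_join_path(char *buffer, size_t size, const char *dir, const char *file)
{
    int written;

    assert(buffer != NULL && size > 0);
    written = snprintf(buffer, size, "%s%c%s",
                       dir, rt_directory_separator(), file);
    return written >= 0 && (size_t)written < size;
}
```

Callers of rt_join_path never see an ifdef. Supporting a new platform in this sketch would mean adding one branch to rt_directory_separator and recompiling the same source.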
Many large Eiffel developments are similarly running on different platforms with identical source code. The only difference with ISE's software is that they have no need for any platform-specific code or ifdefs, since the compiler has taken care of the platform dependencies once and for all; and of course they usually need far less C code, often none at all.
The basic portability mechanism is complemented by portable libraries, in particular for graphics. The EiffelVision library provides portable graphics, which will compile across all supported platforms with no source code change but adapted to the native GUI. Its two-tier architecture (figure 1) is used by other products as well. The top tier is the portable one; the bottom one includes platform-specific libraries. You can work at either level; for example, if portability is not a concern, you can use the Windows Eiffel Library directly.
The use of bytecode
Interestingly, ISE Eiffel also uses, ever since ISE Eiffel 3, a bytecode for its internal purposes, but in a different and, I think, more appropriate fashion (figure 2 depicts the translation scheme):
As part of the development environment, the "Melting Ice Technology" mechanism (figure 3) combines interpretation and compilation. Once your system has been compiled through C ("frozen"), everything that you change is "melted", i.e. compiled to bytecode only, not incurring the penalty of C compilation and linking. This is what makes fast incremental compilation possible: only the changes, automatically detected by the compiler, are melted; the rest remains frozen. Then once in a while, e.g. every few days, you can refreeze everything.
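One way to picture the coexistence of frozen and melted code is a routine table in which each entry holds either compiled code or bytecode. The sketch below is a guess at the general idea only, not a description of how ISE's run-time is actually organized; the structure, the three-instruction bytecode and all names (routine, call_routine, melt, refreeze) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode operating on a single accumulator. */
enum opcode { OP_ADD_CONST, OP_MUL_CONST, OP_RETURN };

typedef int (*frozen_fn)(int);

struct routine {
    frozen_fn  frozen;   /* machine code obtained through C, or NULL */
    const int *melted;   /* bytecode of a version changed since the last
                            freeze, or NULL if the frozen code is current */
};

/* Run melted bytecode, with `arg` as the initial accumulator value. */
static int interpret(const int *code, int arg)
{
    int acc = arg;
    for (;;) {
        switch (*code++) {
        case OP_ADD_CONST: acc += *code++; break;
        case OP_MUL_CONST: acc *= *code++; break;
        default:           return acc;   /* OP_RETURN ends the routine */
        }
    }
}

/* Dispatch: a melted version, when present, overrides the frozen one. */
int call_routine(const struct routine *r, int arg)
{
    assert(r->frozen != NULL || r->melted != NULL);
    if (r->melted != NULL)
        return interpret(r->melted, arg);
    return r->frozen(arg);
}

/* A change to the routine: only bytecode is produced, no C compilation. */
void melt(struct routine *r, const int *new_code)
{
    r->melted = new_code;
}

/* Refreezing: install freshly compiled code and drop the bytecode. */
void refreeze(struct routine *r, frozen_fn compiled)
{
    r->frozen = compiled;
    r->melted = NULL;
}

/* Two stand-ins for C-compiled ("frozen") versions of a routine. */
int twice(int x)               { return 2 * x; }
int triple_plus_one(int x)     { return 3 * x + 1; }
```

A routine starting out as twice can be melted to the bytecode { OP_MUL_CONST, 3, OP_ADD_CONST, 1, OP_RETURN } and later refrozen to triple_plus_one; callers go through call_routine throughout and are unaffected by which form is currently in use.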
A third compilation mode, "finalizing", is similar to freezing (it goes through C to machine code) but, in addition, performs the kind of extensive, system-wide optimizations that do not make sense as long as you are still modifying your software. A typical example is dead-code removal: why bother exploring the full system to find out that a certain routine is never called, when the addition of one instruction a minute later can invalidate this result? The same holds for such optimizations as static binding and inlining. Since compilation goes through C, it also benefits, of course, from the C compiler's own extensive optimizations. This is a productive combination of optimizations:
Finalization is what makes it possible to generate executables whose performance is similar to that of C or C++ programs.
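Dead-code removal, cited above as a typical finalization-time optimization, amounts to a reachability computation over the whole system's call graph. The following is a minimal sketch under simplifying assumptions: routines are numbered from 0, the call graph is a small boolean matrix, and dynamic binding is ignored. The names mark_live and MAX_ROUTINES are invented for the example and say nothing about how any Eiffel compiler implements the analysis.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ROUTINES 16

/* calls[i][j] is true when routine i contains a call to routine j.
   On return, live[k] is true exactly for the routines reachable from
   `root`; all others are dead code and need not be generated. */
void mark_live(bool calls[MAX_ROUTINES][MAX_ROUTINES], int count,
               int root, bool live[MAX_ROUTINES])
{
    int worklist[MAX_ROUTINES];   /* routines marked but not yet explored */
    int top = 0;

    assert(count <= MAX_ROUTINES && root >= 0 && root < count);

    for (int i = 0; i < count; i++)
        live[i] = false;

    live[root] = true;
    worklist[top++] = root;

    /* Each routine is pushed at most once, so the worklist cannot overflow. */
    while (top > 0) {
        int caller = worklist[--top];
        for (int callee = 0; callee < count; callee++) {
            if (calls[caller][callee] && !live[callee]) {
                live[callee] = true;
                worklist[top++] = callee;
            }
        }
    }
}
```

With four routines where 0 calls 1, 1 calls 2 and 3 calls 2, routine 3 is dead when starting from root 0. Adding a single call from 2 to 3 makes it live again, which is why the analysis is only worth running once the software has stopped changing.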
This set of techniques avoids the principal disadvantage of a bytecode approach to portability. Only in development mode does bytecode serve as an execution mechanism. Where optimal speed is required, the bytecode ceases to be executed; it simply becomes the medium from which to generate optimized C and machine code.
Sooner or later, a purely bytecode-based approach hits the performance problem. By using bytecode as a development mechanism only, we can get all the advantages of bytecode, without incurring this unacceptable performance hit. And we build on the exceptional amount of intelligence and effort that has been devoted to C compilation. A big part of the object story is, after all, reuse: taking advantage of good existing solutions to go ever further and higher.