Beware of polymorphic catcalls
Bertrand Meyer, ISE
This technical note refers to the OOPSLA and TOOLS PACIFIC presentation on static typing, whose text is now also available on-line. If you have not attended one of these talks you should most likely read their text first; otherwise the present document probably will not make much sense!
Disclaimer, copyright, acknolwledgments
This note is personal research work by the author and does not in any way commit ISE.
It is copyrighted and may not be reproduced or distributed without the author's permission.
Successive drafts benefited from comments and corrections from Michael Schweitzer, Eric Bezault, James McKim, Roger Browne, Rock Howard, Peter Wegner, Kim Bruce and Fabrice Franceschi.
What follows is only the results, with some explanations. A more extensive discussion may be found in the paper mentioned above; the full discussion is in the book Object-Oriented Software Construction, second edition. Some readers might find the rules objectionable until they have seen that discussion.
Note that in spite of what one may think at first the mechanism seems implementable as part of an incremental compilation process (as opposed to system validity as defined in chapter 22 of the book Eiffel: The Language, hereafter ETL, which only works on an entire system).
The idea is, roughly speaking, to avoid the transitive closure algorithm of ETL-22 by being more pessimistic, while still not excluding any reasonable construction.
The idea is very simple; it states that for an entity p you can have at most one of the following two (each common and useful) situations:
The rules below are but a precise formalization of this principle, complemented by a similar rule avoiding potentially dangerous creations.
A system is valid if and only if it includes no polymorphic catcall, broken assignment or broken creation instruction.
An entity or function x of type T appearing in a class C is polymorphic if and only if T is not expanded and one of the following conditions holds:
An instruction is a polymorphic assignment if it is an assignment or assignment attempt, with target of type T and source y of type S, satisfying the following conditions:
A call of target t is polymorphic if and only if its target is a polymorphic entity or a call to a polymorphic function.
A call to a feature f of a class B, appearing in a class C, is a catcall if and only if one of the following properties holds for a proper descendant D of B other than NONE:
An assignment appearing in a routine r of a class C, with target t and source type S, is broken if and only if it satisfies the following conditions:
A creation instruction of target t and creation type T appearing in a routine r of a class D is polymorphic if and only if it satisfies one of the following conditions:
A creation instruction i appearing in a routine r of a class D, with creation procedure p and target t of type T in D, is broken if and only if it satisfies the following conditions, where p is the creation procedure and C is the base class of the creation type [[or, if the creation type is a formal generic parameter, any effective proper descendant of its base class other than NONE]]:
This rule eliminates the definition of system-level validity in ETL.
0.A - The general convention in ETL (General Validity Rule, VBGV, page 28) is that for any construct C a validity constraint of the form ``A specimen of C is valid if and only if XXX'' includes an implicit addition of the form ``... and all its constituents are valid''. So because ``system'' is the highest level structuring concept in Eiffel the above validity rule includes all the validity rules of ETL.
0.B - The definitions in the rest of this document introduce the concepts used in the rule - catcall, creation catcall, polymorphic call or creation instruction, broken assignment, broken creation instruction - and some needed auxiliary concepts such as polymorphic entity.
0.C - Definitions 1 to 5 address the system validity issues raised by attachments (assignment, argument passing, assignment attempt); definitions 6 and 7 address the issues raised by creation instructions.
1.B - ``Polymorphic'' (here and in the other definitions using this term) is an abbreviation for ``potentially polymorphic''. In particular, in case 1.2 we treat all formal arguments of a routine r as potentially polymorphic because we may compile the class into a library and then clients can call the routine with actual arguments of arbitrary conforming types. And in case 1.1 we treat any entity of generic type as polymorphic because its actual type can be any type conforming to the generic constraint.
1.C - Current is considered polymorphic because it represents every different descendant of C.
1.D - An entity of anchored type may not be polymorphic (see 2.1).
1.E- In case 1.4 we treat any external function as polymorphic; this is inevitable since an Eiffel compiler will have no way to find out the exact type that an object may have if it is produced by an external function. See also the discussion of precompiled libraries in part 4.
1.F - Whether an entity is polymorphic can be entirely determined on the basis of the class's own text, complemented by the knowledge of what inherited features (attributes and functions) are polymorphic. This is fundamentally different from system-level validity in ETL chapter 22, which requires a system-wide transitive closure.
2.B - Eiffel semantics does not generally distinguish between assignment and argument passing, together called attachment. The notion of polymorphic assignment is an exception, due to the decision (1.2) to treat any formal argument ipso facto as polymorphic because of the bottom-up nature of the Eiffel method and its reliance on libraries.
3.A - An unqualified call has Current as its target (see ETL page 340) and hence is always polymorphic.
3.B - The target of a call can itself be a call to a function, polymorphic or not, as in (some_function (a)).some_routine (b).
4.A - Since the General Validity rule enables us to assume that the call is valid, the export status of f in B must make it available to C.
4.B - For the same reason, the call's actual argument signature conforms to the argument signature of f in B.
4.D - ``Cat'' stands for ``Changed Availability or Type''.
6.A - For creation instructions the notion of being polymorphic is less dynamic than for calls. A call is polymorphic as soon as its target is, somewhere, the target of a polymorphic assignment. This is rather difficult to trace exactly, so definitions 1 and 2 take a somewhat pessimistic view (for example any formal argument is considered polymorphic because we do not want to look - recursively - at the actual arguments in all possible calls). For creation instructions we can be more exact without performing a system-wide transitive closure: the instruction will be polymorphic only if its creation type is anchored, redefined somewhere, or (see next note) generic.
6.B - Case 6.2 extends beyond the current language definition, which prohibits the use of a creation type as formal generic parameter. I believe, however, that a number of people would like to see this restriction removed. 6.2 makes this change possible and safe (should it be decided). If generic creation types indeed become valid, the revised validity rule (the new VGCC, page 286) should require the Creation_call to use one of the creation procedures of the base class of the generic constraint. Because 6.2 is beyond the current language, I have written it, as well as 7.2 which provides the corresponding constraint, in [[double brackets]].
7.A - To take into account the case of creation instructions with no Creation_call (just create x rather than create x.p (...)), this rule assumes the following convention: create x is viewed as using a Creation_call to the fictitious procedure no_creation, and a class with no Creators clause is viewed as having a Creation clause of the form creation no_creation.
7.B - For a compiler implementor, the exclusion of broken creation instructions essentially means that the compiler must redo the type checking of a routine containing a creation instruction in any descendant that redefines the target's type, or in every descendant if that type is anchored or generic.
Many implementations will provide the ability to precompile a set of classes and distribute it without the full source code, but only with an interface specification - the ``flat-short form''.
In a class of such a library you may find a query returning results of a certain type. (A ``query'' is a function or an attribute; the flat-short form does not distinguish, for information hiding purposes, between functions and attributes.) If you write client software and do not have access to the library's source code, you cannot know whether the query is polymorphic.
This suggests two possible approaches for compiler developers:
Approach 9.A, the more restrictive one, is easier to implement. It is in line with the treatment of external functions (see note 1.E), but may be incompatible with the expected increasing reliance on precompiled libraries.
If approach 9.B is used, the convention could be an asterisk following the type name to indicate a polymorphic result, as in:
spanning_tree: TREE* [NODE]
(To avoid any confusion, remember that this is not a language extension, simply a convention for the output of the Eiffel flat-short tool that produces class interfaces. Roger Browne has suggested using, rather than the asterisk, a form such as polymorphic TREE, perhaps more in line with other Eiffel conventions.)