In Defense of Parameterized Types
Originally published: 2009-01-17
Last updated: 2009-01-17
Parameterized types, also known as generics, were added to the Java language with the 1.5 release (JLS 3rd edition), and have been a point of contention ever since. It doesn't take long to find a well-known pundit condemning parameterized types as “half implemented.” Usually, the condemnation is based on comparison with C++ templates, and how Java doesn't provide the same functionality.
In the interest of full disclosure, I have to say that I am a novice user of C++ templates. I was still working with C++ when they were introduced (in the second edition of Stroustrup), but did not make extensive use of them. As I recall, one reason was that they caused problems with shared library initialization in our cross-platform application. Since that time, the Standard Template Library has appeared, and I've read that the modern templating engine actually provides a Turing complete programming language. For all I know, it may be self-aware.
As a result, I focused on the benefits of Java's parameterized types rather than their limitations. However, before talking about those benefits, I need to face some of the limitations head-on.
The Ugly Bits
Not Actually Typesafe
One of the common uses for parameterized types is “typesafe collections”: collections that only accept values of a specific type. For example, the following code won't compile, because the list is parameterized with Integer, and the code is trying to add a String:
List<Integer> data = new ArrayList<Integer>();
data.add("a string");
However, the ArrayList instance doesn't know that it's only supposed to contain instances of Integer. That information is “erased” by the compiler; the bytecode deals with instances of Object. You can easily cast a parameterized list to a non-parameterized list and add your string, with only a warning from the compiler. And then, when you run the code, thinking that you have a truly typesafe list, a ClassCastException tells you that you don't.
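List data = new ArrayList<Integer>();
data.add("a string");              // compiles, with an "unchecked" warning
List<Integer> data2 = data;        // compiles, with an "unchecked" warning
Integer value = data2.get(0);      // throws ClassCastException at runtime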
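To see where the failure actually surfaces, here is a minimal runnable sketch of the same sequence. The names ErasureDemo, pollute, and readFails are made up for this illustration. The point it tries to show is that the add() succeeds, and the exception only appears later, at the cast that the compiler inserts when the element is read as an Integer.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {

    // Adds a String to a list created as ArrayList<Integer>, going through
    // a raw reference. The compiler warns but does not reject this.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Integer> pollute() {
        List raw = new ArrayList<Integer>();
        raw.add("a string");      // succeeds: at runtime the list just holds Objects
        return raw;               // unchecked conversion back to List<Integer>
    }

    // Returns true if reading the first element as an Integer fails.
    static boolean readFails(List<Integer> list) {
        try {
            Integer value = list.get(0);   // compiler-inserted cast to Integer
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> data = pollute();
        System.out.println("size after add: " + data.size());
        System.out.println("read fails: " + readFails(data));
    }
}
```

Running main should report a size of 1 and a failed read: the list happily stored the string, and only the typed read complained.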
While many people condemn this behavior, I consider it a minor flaw: if one part of your program is putting strings into a collection that another part thinks should contain numbers, you've got bigger problems. Problems that a truly typesafe collection wouldn't solve. And moreover, if you think that C++-style templates would prevent the problem, you might want to read this.
Collections Don't Follow Array Inheritance Model
When I first saw parameterized collections in a 1.5 preview presentation, I remember being astonished that they did not follow the inheritance model established by arrays. Consider the following:
Object[] foo = new String[] {};
ArrayList<Object> bar = new ArrayList<String>();             // rejected by the compiler
ArrayList<? extends Object> baz = new ArrayList<String>();
The first line compiles, because String is a subclass of Object, and therefore Java considers String[] to be a subclass of Object[]. To a programmer who thinks of lists as equivalent to arrays, the second line should compile as well, but the compiler rejects it with an “incompatible types” error. To make the compiler accept this assignment, you need to use the third line, which uses a wildcard parameterization (“extends Object” is unnecessary in this case, but it makes the example clearer).
What astonished me was that the compiler clearly recognizes that the two parameterizations are related, otherwise it wouldn't accept the wildcard. Why, then, is it unable to infer the relationship without the wildcard?
The answer is that the compiler doesn't know what the parameterized type does with its parameters. Unlike arrays, collection classes are not part of the language: to the compiler, a List is an arbitrary class that could do anything at all. To make this distinction clearer, consider the following:
Comparator<String> foo = String.CASE_INSENSITIVE_ORDER;    // for example; any implementation class will do
Comparator<Object> bar = foo;                               // rejected by the compiler
Clearly, a class designed to compare two strings could not be applied to arbitrary objects. Therefore, the compiler rightly rejects this assignment statement — which is equivalent to the second line of the previous example.
The wildcard gives the compiler more information about the use of the parameterized class: it tells the compiler that there is a relationship between instances with two differing parameterizations, provided that the types used for parameterization are compatible.
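A small sketch may help show what the two models cost and buy. The class and method names (VarianceDemo, arrayStoreFails, sum) are invented for this example. The first method shows that array covariance is only safe because the JVM checks every store at runtime; the second shows a wildcard parameter that accepts lists with any Number-compatible parameterization, with the compiler allowing reads but rejecting adds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VarianceDemo {

    // Arrays are covariant, so this compiles; the JVM checks the store at runtime.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1);    // not a String
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // Accepts List<Integer>, List<Double>, and so on. Elements can be read as
    // Number, but the compiler would reject list.add(someNumber) in this method.
    static double sum(List<? extends Number> list) {
        double total = 0;
        for (Number n : list) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(arrayStoreFails());
        List<Integer> ints = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        System.out.println(sum(ints));
    }
}
```

Running main should print true and then 6.0. Changing the parameter of sum() to List<Number> would make the call with a List<Integer> fail to compile, which is the same rejection as the second line of the earlier example.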
Too Verbose
To quote Emperor Joseph II: there are “too many notes.” Consider the following, which might be used to cache database rows by their primary key:
Map<List<Object>,List<Object>> data = new HashMap<List<Object>,List<Object>>();
To me, the object is lost in all of the angle brackets. And it just gets worse when such objects are passed to or returned from a method. I truly wish that the typedef keyword had been added to the language at the same time as parameterization:
typedef List<Object> Key;
typedef List<Object> Row;
Map<Key,Row> data = new HashMap<Key,Row>();
Not the end of the world, to be sure. And you could always create concrete classes Key and Row; in fact, if you're writing a database cache, you should probably do just that.
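As a rough sketch of what those concrete classes might look like, the following wraps the lists rather than aliasing them. Everything here is hypothetical: the article doesn't show Key or Row, and the enclosing RowCacheDemo class, the constructors, and the get() accessor are assumptions made for illustration. The one non-negotiable piece is that Key must define equals() and hashCode(), since it is used as a map key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RowCacheDemo {

    // Hypothetical primary key: an immutable list of column values.
    static final class Key {
        private final List<Object> values;

        Key(List<?> values) {
            this.values = Collections.unmodifiableList(new ArrayList<Object>(values));
        }

        @Override
        public boolean equals(Object obj) {
            return (obj instanceof Key) && values.equals(((Key) obj).values);
        }

        @Override
        public int hashCode() {
            return values.hashCode();
        }
    }

    // Hypothetical row: an immutable list of column values, read by position.
    static final class Row {
        private final List<Object> columns;

        Row(List<?> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<Object>(columns));
        }

        Object get(int index) {
            return columns.get(index);
        }
    }

    public static void main(String[] args) {
        Map<Key,Row> data = new HashMap<Key,Row>();
        data.put(new Key(Arrays.asList(42)), new Row(Arrays.asList(42, "Alice")));

        // an equal key, built separately, finds the cached row
        Row row = data.get(new Key(Arrays.asList(42)));
        System.out.println(row.get(1));
    }
}
```

The declaration of the map now reads the way the typedef version did, and the classes give you a place to hang behavior such as typed column accessors.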
The Good Stuff
At this point you might think my goal has been to “praise with faint damns,” but that's not the case. The ugly bits can get in the way of effectively using parameterized types, and I haven't even mentioned all of them. To better understand the mechanics of parameterization I recommend Angelika Langer's Generics FAQ, which is over 500 pages long.
However, even with the ugly bits, parameterized types are extremely useful. And in my mind, their true value comes when you replace the generic parameters with concrete types. When you do this, you are adding information to your program. Information that the compiler can use to check your work, and information that you as a programmer can use to understand the program flow.
Would you write a program in which each method specified its parameters and return type as Object? If not, then why would you do the same with Collection<T>? I find that it's very rare that I need to work with generic collections: they're typically confined to library methods. Elsewhere, I work with collections of concrete types, and it's senseless to leave those collections generified.
API with Typed Collections
A typical API — including the boundary layers within an application — deals with specific object types. If that API takes or returns a collection of objects, it's valuable to know the type of objects in the collection. As an example, consider the API of a workflow application: in particular, the object that represents a job, which has a method to retrieve the active tasks in that job.
public interface Job {
    List<Task> getActiveTasks();
}
This is a case where you may feel that “not really typesafe” is an issue. However, in practice it isn't: as I said above, if you can't trust that your library won't slip some incompatible object into a collection, you have bigger problems. Even in the case of the parameters of library methods, it's not an issue: the worst case is that the client passes invalid data, and gets a ClassCastException.
The only viable alternative is to use array types, and they come with their own limitations. Foremost of these is that arrays are fixed size, which is an inconvenience for both the client of our API (which can't easily remove tasks of no interest), and the server (which needs to know the number of tasks before building the array).
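For illustration, here is how a client of such an API might drop the tasks it doesn't care about. This is only a sketch: the article doesn't show any of Task's methods, so getOwner() is a made-up stand-in, as are the JobClientDemo class and its task() and job() factory helpers. The sketch also copies the returned list, on the assumption that we don't know whether the API hands out a modifiable one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class JobClientDemo {

    // Hypothetical stand-in: the article does not show Task's methods.
    interface Task {
        String getOwner();
    }

    interface Job {
        List<Task> getActiveTasks();
    }

    // Returns the active tasks that belong to the given owner. No casts are
    // needed, and the list shrinks in place as uninteresting tasks are removed.
    static List<Task> tasksOwnedBy(Job job, String owner) {
        // copy first: we don't assume that the returned list is modifiable
        List<Task> tasks = new ArrayList<Task>(job.getActiveTasks());
        for (Iterator<Task> itx = tasks.iterator(); itx.hasNext(); ) {
            if (!owner.equals(itx.next().getOwner())) {
                itx.remove();
            }
        }
        return tasks;
    }

    // Trivial implementations, so that the sketch can be run.
    static Task task(final String owner) {
        return new Task() {
            public String getOwner() { return owner; }
        };
    }

    static Job job(final List<Task> tasks) {
        return new Job() {
            public List<Task> getActiveTasks() { return tasks; }
        };
    }

    public static void main(String[] args) {
        List<Task> all = new ArrayList<Task>();
        all.add(task("alice"));
        all.add(task("bob"));
        all.add(task("alice"));
        System.out.println(tasksOwnedBy(job(all), "alice").size());
    }
}
```

With an array-based API, the same filtering would mean counting the matches, allocating a second array, and copying; with a raw List, every call to getOwner() would be preceded by a cast.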
Associating Type Information with Untyped Collections
JDK 1.5 was released in 2004, yet there are still many libraries that use non-parameterized collections — including parts of the JDK itself. Often you will know the type of object held by the collection, so there's no reason that your code should be littered with casts. Instead, use a method like this to “attach” parameterization to the collection:
@SuppressWarnings("unchecked")
public static <T> List<T> cast(List<?> list, Class<T> klass) {
    for (Object obj : list) {
        klass.cast(obj);
    }
    return (List<T>)list;
}
The point of this method is to limit the number of SuppressWarnings annotations in your code. The annotation is needed because the compiler isn't very happy about the cast in the return statement: it has no way to know that you validated the contents of the list before making that cast. While you could use this annotation in your mainline code, doing so would hide any real warnings.
The klass parameter serves two purposes. First, because it is parameterized, it establishes the parameterization returned by the method (note that this requires passing either an explicit X.class value or a variable that is similarly parameterized). Second, it allows us to verify that the list does indeed contain only instances of that class, ensuring that the unchecked cast is safe.
In almost all cases, this method is sufficient: it establishes the type of the list for the compiler, and the compiler won't let you subsequently add instances of any other type. If you are truly paranoid that something might add a different object, the Collections class provides the method checkedList(), which wraps the original list in a decorator that prevents addition of other objects. But again, unless your code intentionally bypasses the compiler's checks, this wrapper is unnecessary.
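The following sketch shows both approaches in use. The cast() method is repeated from above so that the example stands alone; legacyNames() is a hypothetical stand-in for a pre-1.5 library call that returns a raw List, and CastDemo, names(), and checkedListRejectsInteger() are likewise invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CastDemo {

    // The helper from the article, repeated so that this sketch is self-contained.
    @SuppressWarnings("unchecked")
    static <T> List<T> cast(List<?> list, Class<T> klass) {
        for (Object obj : list) {
            klass.cast(obj);
        }
        return (List<T>) list;
    }

    // Hypothetical stand-in for a pre-1.5 library method returning a raw List.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List legacyNames() {
        List names = new ArrayList();
        names.add("foo");
        names.add("bar");
        return names;
    }

    // Attaches the parameterization once; callers need no casts or annotations.
    static List<String> names() {
        List<?> untyped = legacyNames();
        return cast(untyped, String.class);
    }

    // Shows the checkedList() decorator refusing a non-String, even when the
    // add goes through a raw reference that the compiler cannot police.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean checkedListRejectsInteger() {
        List<String> checked = Collections.checkedList(names(), String.class);
        List raw = checked;
        try {
            raw.add(Integer.valueOf(1));
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (String name : names()) {
            System.out.println(name.toUpperCase());
        }
        System.out.println(checkedListRejectsInteger());
    }
}
```

Asking for the wrong type, as in cast(legacyNames-style list, Integer.class), fails immediately with a ClassCastException from klass.cast(), rather than at some later read far from the source of the problem.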
Template Methods
One of the “Gang of Four” design patterns, Template Method extracts boilerplate code into a superclass method, which then calls subclass methods to perform specific operations. It is useful when the same general sequence of steps can be implemented in different ways, with different data.
As an example, a Swing application might use a subclass of my AsynchronousOperation to perform long-running operations outside of the event dispatch thread. All such operations have the same two steps: (1) load data on the background thread, and (2) display data or handle exception on the event dispatch thread. The templated method in this case is the run() method, shown here sans exception handling:
public abstract class AsynchronousOperation<T> implements Runnable {
    public final void run() {
        final T result = performOperation();
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                onSuccess(result);
            }
        });
    }

    protected abstract T performOperation();

    protected abstract void onSuccess(T result);
}
A typical subclass (also heavily abridged) might read the contents of a file and use it to update a JTable in the UI:
public class FileLoader extends AsynchronousOperation<TableModel> {
    private final JTable _table;

    public FileLoader(JTable table) {
        _table = table;
    }

    @Override
    protected TableModel performOperation() {
        // load the file and populate the model
    }

    @Override
    protected void onSuccess(TableModel result) {
        _table.setModel(result);
    }
}
Note that the methods use concrete types; only the class declaration is parameterized. How, then, are these methods called from the superclass, which refers to them via (erased) type parameters? The answer is that the compiler creates bridge methods that call the actual methods defined by the subclass. Running javap on the subclass shows this:
public class com.kdgregory.example.generics.TemplateMethodExample extends net.sf.swinglib.AsynchronousOperation{
    private javax.swing.JTable _table;
    public com.kdgregory.example.generics.TemplateMethodExample(javax.swing.JTable);
    protected java.lang.Object performOperation() throws java.lang.Exception;
    protected javax.swing.table.TableModel performOperation() throws java.lang.Exception;
    protected void onSuccess(java.lang.Object);
    protected void onSuccess(javax.swing.table.TableModel);
}
The Java compiler creates bridge methods whenever you provide a concrete implementation of a parameterized method, including implementations of parameterized interfaces. The bridge method verifies the type of the actual argument (possibly throwing ClassCastException), then calls the concrete method. In the case of parameterized interfaces, such as Comparator<T>, this behavior is similar to a typical pre-1.5 implementation, where the parameter is defined by the interface as Object, and you cast it and assign to a typed variable for use.
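If you don't want to reach for javap, reflection can show the same thing. The sketch below is an illustration only: BridgeDemo, LengthComparator, countBridgeMethods, and rawCallFails are names invented here. It relies on Method.isBridge() from java.lang.reflect to pick out the compiler-generated method, and then calls the comparator through a raw reference to trigger the cast that the bridge performs.

```java
import java.lang.reflect.Method;
import java.util.Comparator;

final class BridgeDemo {

    // Only compare(String,String) is written here; the compiler adds a
    // compare(Object,Object) bridge that casts its arguments and delegates.
    static final class LengthComparator implements Comparator<String> {
        public int compare(String s1, String s2) {
            return s1.length() - s2.length();
        }
    }

    // Counts the compiler-generated bridge methods declared by a class.
    static int countBridgeMethods(Class<?> klass) {
        int count = 0;
        for (Method method : klass.getDeclaredMethods()) {
            if (method.isBridge()) {
                count++;
            }
        }
        return count;
    }

    // Calls the comparator through a raw reference, as pre-1.5 code would,
    // and reports whether the bridge method's cast rejected the arguments.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rawCallFails(Object arg1, Object arg2) {
        Comparator raw = new LengthComparator();
        try {
            raw.compare(arg1, arg2);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countBridgeMethods(LengthComparator.class));
        System.out.println(rawCallFails("a", "bb"));
        System.out.println(rawCallFails("a", Integer.valueOf(1)));
    }
}
```

Running main should print 1, false, and true: one bridge method, a successful call with two strings, and a ClassCastException when the second argument is an Integer.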
Closing Thoughts
You may have noticed that throughout this article I used the term “parameterized types.” Not “generics,” and certainly not “templates.” I did this as a reminder that the chief benefit to these types comes when you apply concrete parameters, rather than using the class in a generic fashion.
To the JVM, code that uses parameterization is no different from pre-1.5 code using explicit casts. The benefit accrues entirely to the programmer: first, by removing the clutter and finger fatigue of those explicit casts. And second, by giving the compiler enough information to double-check your assumptions.
Copyright © Keith D Gregory, all rights reserved