By Sven Amann (@svamann
), Sarah Nadi (@sarahnadi
), and Titus Winters (@TitusWinters
Associate Editor: Sridhar Chimalakonda (@ChimalakondaSri)
Software libraries and their Application Programming Interfaces (APIs) are essential for proper code reuse that leads to faster software development and less software bugs. However, a poorly designed API can end up causing more harm than good: clients of the API may incorrectly use certain functionality leading to runtime failures, performance degradations, or security issues.
Over the last couple of decades, several program analysis techniques have been proposed to catch bugs arising from incorrect API usage . Some of these techniques rely on mining patterns of API usages, whether from code or logs, and finding violations of these patterns, while others rely on checking that client code respects explicit annotations in the API code.
Some languages come with annotations that allow the API designer to indicate certain restrictions on expected values or behavior (e.g.,
in Java). Such annotations allow static checks of client code to warn the developer, e.g., when a null reference is potentially passed in a place expecting a non-null reference (e.g., IntelliJ has support for this https://www.jetbrains.com/help/idea/nullable-and-notnull-annotations.html
). However, unfortunately, such annotations are still not consistently used by API designers. Additionally, the space of possible violations and things that can go wrong is unbounded. Thus, there may be desired checks or API design problems that have no corresponding annotations. That is: even meta-design via introducing annotations cannot solve all design problems. At some level, some design problems can only be identified through experience.
Moreover, the above checks and similar analyses are all posteriori measures that typically catch only a subset of the problems. In this blog post, we take the stand that we should try building APIs that are hard to misuse in the first place. We provide a set of lessons learned from common examples of problematic API design decisions. These lessons are based on our own experiences, whether as part of industrial work or part of our research, as well as examples from various sources. This list is by no means comprehensive, but a start for collecting common pitfalls of API design. The idea for this blog post was born out of our discussions during the 2nd International Workshop on API Usage and Evolution (WAPI ‘18)
in May 2018 in Sweden.
Lessons from Bad API Design
Lesson 1: Avoid the Name/Action Mismatch
Most design guides (rightly) point out that types should be named as nouns (
), mutating/non-const methods on those types should be named like verbs/actions (
), and const accessors should be named like adjectives (
) or actions (
)). Sometimes, we don’t get that right.
In the C++ standard library, all containers (
, etc) have an
method on them. In English,
is both an adjective ("The glass is empty") and a verb ("Please empty the trash.") As such, it is entirely possible to encounter code that invokes
and does nothing with it.
std::vector<int> v = GetInts();
v.empty(); // Does nothing!
v.clear(); // Does what the author of the previous line probably meant
In the Java Class Library (JCL),
does not actually set the scale on the receiver, but instead returns a new object with a different scale
Another example is the
method on Java’s stream API. The name suggests that by calling this method, the code peeks at the first element in the stream. However, calling the method actually just returns a copy of the stream and nothing happens. The peeking only actually occurs when the stream is processed, e.g., by calling
Be very sure about your choices in naming. Make sure to consider all interpretations of your chosen API names. Be extra cautious about possible interpretations from non-native speakers. In the event that your chosen names can be misinterpreted, but changing the name isn’t an option, it may be useful to try to mitigate the problem. In C++, this has been done by adding the
attribute to the function. With this attribute, a compilation warning will be issued if the return value of the function is not used in some fashion. It’s likely that any pure function (
) and any side-effect free const method (
) should be tagged
to help identify bugs in client code.
Lesson 2: Avoid Poor/Unclear Type Invariants
In many respects, type design in an OO language is about identifying the logical state of a type (how you describe the type, or the mental model of a programmer using your type), the physical state of a type (the data members that implement that logical state), and a set of invariants on the logical and physical state.
For instance, a type like
is logically “a contiguous buffer of characters of length
, with at least one ‘\0’ character at position
.” Both these conditions can be considered the invariants of string; in other words, these are conditions that must hold at all times. Internally, a string can be represented in various ways by various standard library implementations, but a programmer does not need to know anything about the representation (the physical state) - the type provides an abstraction.
In order to do this, we rely on mechanisms such as data hiding - making data members private so that users of the type cannot arbitrarily change the internals. This is important: if the various pointers and lengths in a
were directly accessible and modifiable, it is only a matter of time before someone accesses something inappropriately and breaks our invariants - perhaps we are no longer null-terminated, or perhaps the allocation for our character buffer has been deleted. However, sometimes the invariants of a type are unclear or “poor”, in the sense that objects of the type may reach logical states in which the invariants do not hold. Especially for value types, having unclear invariants or a vague model of what a specific type actually is leads to types that are unnecessarily hard to use.
Many designers argue against “two step initialization.” That is, types that can be constructed separately from becoming usable.
Foo f; // Constructed, but not useful yet. Must call Init()
if (!f.Init(10)) throw Error(“Failed to initialize!”);
f.Run(); // Do whatever we’re supposed to do.
If we consider this from the perspective of invariants, this becomes clearer. In a two-step initialization design, every function needs to be labelled to describe whether it’s allowable to call that function before calling
. In the example above,
cannot be called unless the object is initialized using
. As the class gets bigger, merely documenting which methods are safe to call when becomes more complex - and the odds of users doing it wrong go up.
It’s much better when your type has simpler invariants: if it is constructed, it’s valid.
Foo f(10); // May throw if initialization failed
f.Run(); // Do whatever we’re supposed to do
Unfortunately, two-step initialization designs are still common, and the resulting types are consistently error prone. For instance, the design of
shows this design smell: the majority of the API for
cannot be used after construction, it requires a secondary call to
C++ Example 2:
In modern C++, there is also an inverse to two-step initialization. In C++11, the language added support for “move semantics” - an optimization of copying for an object where the source object is known to be unnecessary and can be consumed in the process of moving into the destination object.
std::string s1 = GetString();
std::string s2 = std::move(s1); // Contents of s1 has been moved into s2
s1.clear(); // s1 still exists, but is in an undefined state.
//You can only safely call methods that have no preconditions/make no assumption about its contents.
In order to avoid adding a “moved-from” state to every type, the language says that a moved-from object is left in a “valid but unspecified” state - the invariants of the type are still upheld, but you cannot reason about the state. This allows the object to be used for operations with no preconditions (you can still call
on a moved-from string, but you cannot assume whether or not a call to
Moved-from objects and invariants work just like two-step initialization - sometimes people design types where the moved-from state no longer obeys the rest of the type invariants. For instance, a smart pointer that expresses sole ownership (a la
) and whose content is declared to never be null. When such a type is moved-from (the owned pointer is moved away) the result either violates the sole-ownership invariant or the not-null invariant. Such a
is a common design with broken invariants.
Many of the Java cryptography APIs also rely on a two-step initialization. The
is one such example:
Cipher encryptCipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
byte cipherText = encryptCipher.doFinal(plaintTextByteArray);
Similar to the C++ example above, calling
will not work. Simply calling
does not guarantee that the invariants of the type hold. Instead, one has to call
to initialize the cipher and get an actual cipher object that respects the description of the class
: “This class provides the functionality of a cryptographic cipher for encryption and decryption”.
Be clear what your type is, and document its invariants. Make sure that those invariants are as simple as possible and that they actually hold. Don’t add invalid states - neither at construction nor destruction (nor moved-from).
Lesson 3: Avoid Bad Default Values
To give the developers flexibility in which values they want to specify, and to simplify usage, many APIs come with default values for certain parameters or required fields. In Java, this is typically achieved through providing various overloaded methods or the Builder design pattern. While default values make it easier to create the necessary objects, developers may not be aware of what these defaults are. If the defaults chosen in the API’s design are problematic for any reason, then the default behavior of the API becomes problematic.
In the Java Cryptography Extension (JCE) library, which contains Java’s default cryptography support, the default value for the encryption mode when using the popular AES cipher in the
is the Electronic Codebook Mode (ECB), which is insecure . Thus, client developers who rely on the API’s default values end up using insecure encryption in their applications. Code reviewers might easily miss the problem, because the insecure setting is not visible in the client code. API developers cannot easily fix this problem, because changing the default value would break existing client applications that already encrypted values using the (insecure) default value.
Choose reasonable default values for your API.
If any default value may become unreasonable (or even critical), consider making it an explicit required parameter. This would at least allow the bad default value to be spotted during code review.
Clearly document these default values.
Lesson 4: Avoid Stringly-typed Parameters
A parameter is stringly-typed
, if its type is string, but it expects a more specific value that could be more appropriately typed. As a result, client developers cannot make use of many assistance tools, such as static type checkers or completion engines, when programming against the API. This makes the API harder to learn and more easy to misuse.
In the Java Cryptography Extension (JCE) library, which contains Java’s default cryptography support, all parameters to instantiate a
are encoded in a single string parameter that expects a value of the form “//”, where the last or both substrings beginning with “/” may be omitted. It is difficult for client developers to know which values may be passed to this parameter. Moreover, when encountering a valid instantiation, such as:
Cipher cipher = Cipher.getInstance(“AES”);
a client developer cannot see that the parameter value specifies only one of three actual parameters.
Use specific types instead of encoding values as strings.
Declare separate parameters for separate values, instead of joining multiple values in one string.
Lesson 5: Avoid Incorrect Placement of Functionality
Placing a method on a class where it is unexpected may lead to clients using it in unintended ways.
Java’s primitive-type wrapper
has a method
. This method returns true if there is a system property with the given name and the value “true”, and false otherwise. Developers might easily confuse this method with parse(String), which converts a string (“true”/”false”) to the corresponding boolean value.
Place methods on an API where it’s functionality is likely to be expected, e.g., the System class for querying a system property.
Poorly designed APIs can be confusing to client developers, and may lead to potential misuse of the API. In the above, we gave some examples of cases we have personally seen in the past. This is by no means a comprehensive list, and we hope that more developers contribute such examples, which can help in creating a set of design principles for API creators.
 M. P. Robillard, E. Bodden, D. Kawrykow, M. Mezini and T. Ratchford, "Automated API Property Inference Techniques," in IEEE Transactions on Software Engineering, vol. 39, no. 5, pp. 613-637, May 2013. doi: 10.1109/TSE.2012.63
 Block Cipher Mode of Operation. Wikipedia. https://en.wikipedia.org/wiki/Blockcipher