The Power of Custom Types
You’re probably wondering why on earth you should read an article about something as dry as types in computer programming. That’s fair, it does sound dreadfully boring and if you’re not a programmer then for the love of cheese and biscuits leave now. However, if you are a programmer then I hope to at least make you think for a few minutes and maybe, just maybe, improve the way you code a little bit.
Please note: The key concepts in this article apply mostly to classically object-oriented languages that have inheritance and the ability to at least type hint function parameters. So for example, languages like PHP, Java, Kotlin, Dart, and D (and probably many others, such as C++ and Rust, though I’ve not used them so I can’t be sure).
On Confidence and Readability
I can’t remember which programmer said this or even what the exact words used were, but the idea was essentially this:
“You deserve to have confidence in your code”
This is a powerful sentiment and something I completely agree with. If you truly have confidence in your code then when that code goes into production, you (and your ops team) can sleep soundly.
In addition to having confidence in the integrity of your code, if you’ve been developing for a while then you know that readable code is gold dust. When code is readable, code reviewers will more easily pick up bugs. When code is readable, the business logic of your code is clear and easy to follow.
To improve the confidence level and readability of your code, I’m going to talk about a couple of well known strategies and then introduce the concept of making your own types.
Concept 1: Variable names matter
You’re probably aware that when it comes to types, there are two different approaches that programming languages use: “Strong Typing” (more precisely static typing, where types are declared and checked before the program runs) and “Dynamic Typing”.
When you write a program using a strongly typed language, you will declare variables that have a specific data type. These types generally include strings, integers, floats and booleans, and in fact some languages have a great many types. C for example has more than 27!
In contrast, dynamic languages like JavaScript, Python and PHP do not make you explicitly declare the type, which means you could encounter a function as delightfully confusing as this:
function registerUser(
$n,
$e,
$c,
$h) {
...
}
When a function is written as vaguely as this, it forces you to inspect the contents of the function in order to know how to use it. After all, you need to understand what the parameters are for and what data types should be used. This is frustrating and inefficient.
If instead the function looked more like this:
function registerUser(
$name,
$email,
$cat,
$human) {}
then things are somewhat better. You can take an educated guess about what the data types must be for “name” and “email” (both strings). However, “cat” and “human” are more difficult to guess. Is “cat” a name? A breed name? A number?
Things become clearer again when the parameter names themselves imply the data types.
function registerUser(
$name,
$email,
$numberOfCats,
$isHuman) {}
Now you can infer that $numberOfCats is an integer and $isHuman is likely a boolean value.
Recap: Without good naming, programmers may have to inspect the contents of functions to understand what the parameters are and how to use them, which is bad for developer ergonomics and efficiency.
A couple of job roles ago, one of our longest serving programmers was in the habit of prefixing every variable with a lower-case letter that indicated the variable type. Using her conventions, our method above would have been written like this:
function registerUser(
$sName,
$sEmail,
$iNumberOfCats,
$bIsHuman) {}
key: “s” = string, “i” = integer, “b” = boolean
This convention is definitely helpful and interesting, but it does add a bit of cognitive load separating out the types and words, and thankfully there are better solutions these days.
Concept 2: Strong Typing
One of the trends we’ve seen with dynamic languages in recent years is fixing the lack of strong typing support. In PHP, both method and object property type hinting have been introduced, and in the world of JavaScript and front-end development, TypeScript came along and made compile-time strong typing a reality.
Using strong typing, our method above could be defined like this:
function registerUser(
string $name,
string $email,
int $numCats,
bool $isHuman) {}
A few observations:
- When you encounter the function and read the parameters, you are immediately certain what kind of data is required, making it easy to integrate and work with; and
- As a consequence, it’s no longer as important that parameter names indicate the data type.
A further huge benefit is that your language runtime (or compiler) and your IDE can now know when you’ve made a mistake (e.g. you’ve used the wrong data type for a parameter). This means you will know about and fix bugs as you code, rather than those bugs being spotted in code review, or worse, in production.
To illustrate, if in PHP we call our registerUser function with an incorrect data type, using a string value ‘fish’ for numCats, e.g.
registerUser('Ada', 'ada.lovelace@example.com', 'fish', true);
then PHP’s runtime can see the error, and reports:
“Fatal error: Uncaught TypeError: registerUser(): Argument #3 ($numCats) must be of type int, string given”
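If you want to see this for yourself, here is a minimal sketch. The return type and the placeholder body are my own assumptions for the sake of a runnable example; only the parameter list comes from the function above. It catches the TypeError rather than letting it become a fatal error.

```php
<?php
declare(strict_types=1);

// Same signature as above; the body is a placeholder (assumption),
// the real function would register the user somewhere.
function registerUser(string $name, string $email, int $numCats, bool $isHuman): string
{
    return sprintf('%s registered with %d cat(s)', $name, $numCats);
}

try {
    // 'fish' is not an int, so PHP rejects the call before the body runs.
    registerUser('Ada', 'ada.lovelace@example.com', 'fish', true);
} catch (TypeError $e) {
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```

With strict_types enabled in the calling file, even a numeric string such as '3' is rejected for an int parameter. Without it, PHP would quietly convert '3' to 3, although 'fish' fails either way.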
So, with strong typing we are now in a considerably stronger position. More bugs will be caught by our IDEs and more bugs will be caught by static analysis. Moreover, your teammates will have a clearer understanding of what your code is doing and how it works, and they will reach this understanding more quickly and easily. This in turn means faster code reviews, easier code reuse and less need for documentation. That’s a lot of winning.
But wait, that’s not the end of the story.
Consider all of these uses of the registerUser method:
// Error: No name provided
registerUser('', 'ada.lovelace@example.com', 3, true);
// Error: An invalid email address provided
registerUser('Ada', 'invalid.email', 3, true);
// Error: -1 Cats… interesting
registerUser('Ada', 'ada.lovelace@example.com', -1, true);
// Error: a value that is too long for your database
registerUser('AdaAdaAdaAdaAdaAdaAdaAdaAdaAdaAdaAdaAdaAda', 'ada.lovelace@example.com', 3, true);
// Error: A highly impressive and improbable number of cats
registerUser('Ada', 'ada.lovelace@example.com', 99999999, true);
All of these uses are technically valid as far as your programming language and your IDE are concerned, but none of them are in line with your business rules.
The solutions commonly used to mitigate these problems include:
- Validating incoming data from HTTP requests using Framework validation libraries;
- Having extensive validation checks inside our methods to ensure values are reasonable;
e.g.
function registerUser(
string $name,
string $email,
int $numCats,
bool $isHuman) {
    // Ensure name is not empty and not too long
    // Ensure email is not empty and actually an email
    // Ensure numCats is not < 0 and not > 20

    // Now actual business logic follows
    …
}
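Filled in, those comments might look something like the sketch below. The 40 character limit, the exception type, the messages and the array returned at the end are all assumptions of mine, chosen purely so the example runs. Notice how far down you have to read before the business logic shows up.

```php
<?php
declare(strict_types=1);

function registerUser(string $name, string $email, int $numCats, bool $isHuman): array
{
    // Ensure name is not empty and not too long (40 is an assumed limit).
    $nameLength = mb_strlen(trim($name), 'UTF-8');
    if ($nameLength === 0 || $nameLength > 40) {
        throw new InvalidArgumentException('Name must be between 1 and 40 characters');
    }

    // Ensure email is not empty and actually an email.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        throw new InvalidArgumentException('Email address is not valid');
    }

    // Ensure numCats is not < 0 and not > 20.
    if ($numCats < 0 || $numCats > 20) {
        throw new InvalidArgumentException('Number of cats must be between 0 and 20');
    }

    // Only now does the business logic start (a placeholder in this sketch).
    return ['name' => $name, 'email' => $email, 'numCats' => $numCats, 'isHuman' => $isHuman];
}
```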
Adding validation to our methods certainly works and catches problems before they become bugs, but I’m going to argue that this approach is suboptimal. Here’s why.
Problem 1: Validation checks obscure the business logic.
This is an unavoidable consequence of having a bunch of if statements or framework calls that precede the point of your function. If a code reviewer or team member has to evaluate your function then they have to understand and check your validation logic before getting to your business logic. Let’s not forget, the business logic is the point of your function. The validation is necessary fluff that obscures it.
Problem 2: Coding defensively means duplicated validation
As a good programmer, you know that your code should be divided into layers (this is separation of concerns). A common pattern in enterprise software for example might be:
Controller → Service → Repository
that is, an endpoint route calls a controller, which reads in some POST parameters and in turn calls a service with those POST parameters, which in turn calls one or more repository methods with those POST parameters to persist or fetch data.
So for example, our innocent seeming “name” variable in our registerUser function might be passed through 3 or more different layers and functions.
If you are coding defensively then you ought to be validating the constraints of that “name” parameter in all 3 layers. Why? Because sure, your service might be called from a validated controller method right now, but it may also be called by a command line task, or an event handler in the future. Similarly, your repository methods might be called in many contexts and they in turn must be sure that the value of “name” is a string, and not empty and not too long.
Thus, you may find that your validation rules are invoked or implemented multiple times and end up obscuring your business logic in lots of places.
To rub salty sand into the wound, if you decide that actually your “name” variable should now be constrained to 75 characters rather than 50, you may have to trace all validation attempts for “name” through your app and update them all accordingly. You may have consolidated such validation rules in one place, but depending on your language and framework or your thinking at the time, maybe you didn’t. And that’s really painful and error prone.
Concept 3: Custom Typing
Now imagine our registerUser function looks like this:
function registerUser(
Name $name,
EmailAddress $email,
WholeNumber $numCats,
bool $isHuman) {
    // The business logic is right here: all the validation has gone.
}
What you’ll notice immediately is that:
- The primitive types like string and int have been replaced with custom types like “Name”, “EmailAddress” and “WholeNumber”; and
- The validation logic is gone from our function;
The core idea is this:
We create a series of custom objects that become types. In the example above, we would have a “Name” class. This class would hold a “name” string value but also:
- Ensure the value is a string;
- Ensure the value cannot be empty / blank;
- Ensure the value cannot be longer than say 40 characters (preventing database overflow);
If any of the above rules were broken, an exception would be thrown by the class constructor.
So to illustrate:
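$name = new Name('');
$name = new Name(45);
$name = new Name('THIS_VALUE_IS_TOO_LONG_AND_WE_WONT_ALLOW_IT');
would all fail and throw an exception. (One PHP caveat: if the constructor type hints its parameter as string, the integer 45 is only rejected when the calling file declares strict_types=1. Otherwise PHP quietly converts it to the string '45'.)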
In a similar fashion, the EmailAddress object would ensure that the value it holds is actually an email address, and the WholeNumber object would ensure that the value it holds is a value >= 0. No more -1 cats!
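To make this concrete, here is one way the three types could be sketched. Treat everything in it as an assumption rather than a reference implementation: the 40 character limit comes from the “say 40 characters” above, while the use of filter_var for email checking, the InvalidArgumentException, the toInt() method and the placeholder body of registerUser are my own choices. Typed properties need PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

final class Name
{
    private string $value;

    public function __construct(string $value)
    {
        // Reject blank names and names longer than the assumed 40 characters.
        $length = mb_strlen(trim($value), 'UTF-8');
        if ($length === 0 || $length > 40) {
            throw new InvalidArgumentException('Name must be between 1 and 40 characters');
        }
        $this->value = $value;
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

final class EmailAddress
{
    private string $value;

    public function __construct(string $value)
    {
        // filter_var returns false when the value is not a valid email address.
        if (filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Not a valid email address');
        }
        $this->value = $value;
    }

    public function __toString(): string
    {
        return $this->value;
    }
}

final class WholeNumber
{
    private int $value;

    public function __construct(int $value)
    {
        // No more -1 cats.
        if ($value < 0) {
            throw new InvalidArgumentException('Value must be zero or greater');
        }
        $this->value = $value;
    }

    public function toInt(): int
    {
        return $this->value;
    }
}

// The function itself now contains nothing but (placeholder) business logic.
function registerUser(Name $name, EmailAddress $email, WholeNumber $numCats, bool $isHuman): string
{
    return sprintf('%s (%s) has %d cat(s)', $name, $email, $numCats->toInt());
}
```

A caller has to build valid values first, e.g. registerUser(new Name('Ada'), new EmailAddress('ada.lovelace@example.com'), new WholeNumber(3), true), so by the time registerUser runs there is nothing left to validate.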
By creating our own type system and moving the validation of values into dedicated type classes we:
- Validate values just once;
- Remove validation entirely from our key business logic methods leaving them clean and as pithy as possible;
- Make it impossible to have errors like ‘name has an empty value’ or ‘name is too long’ in our service or repository layers;
- Ensure consistent application and treatment of types (e.g. ‘Name’ cannot be treated differently from one method to another);
- Allow us to write comprehensive unit tests for our types, making us extremely confident that they are handled consistently and fully;
When you create your own custom types what you’ll probably find is that you’ll end up writing the same kind of logic over and over in your type validations. For example, types like “FirstName” and “LastName” will have very similar validation code.
To mitigate this, define an abstract base class and make use of inheritance.
Here’s an example in PHP:
abstract class ConstrainedStringImmutable
{
    /** @var string */
    private $value;

    public function __construct(
        string $value,
        int $minLength = 0,
        int $maxLength = 0
    ) {
        $stringLen = mb_strlen($value, 'UTF-8');

        if (($minLength > 0) && ($stringLen < $minLength)) {
            throw new ConstraintException(sprintf('Invalid %s, value must be at least %d characters', static::class, $minLength));
        }

        if (($maxLength > 0) && ($stringLen > $maxLength)) {
            throw new ConstraintException(sprintf('Invalid %s, value must be no longer than %d characters', static::class, $maxLength));
        }

        $this->value = $value;
    }

    public function __toString(): string
    {
        return $this->value;
    }
}
What you’ll notice above is that we have a base class that can enforce minimum and maximum length constraints, taking into account UTF-8 encoding.
We can then extend the class to use it like this:
class FirstName extends ConstrainedStringImmutable
{
    public function __construct(string $value)
    {
        // Between 1 and 30 characters.
        parent::__construct($value, 1, 30);
    }
}

class LastName extends ConstrainedStringImmutable
{
    public function __construct(string $value)
    {
        // Between 2 and 40 characters.
        parent::__construct($value, 2, 40);
    }
}
We now have two custom types, FirstName and LastName that are slightly different. FirstName must be between 1 and 30 characters long. LastName must be between 2 and 40 characters long. Both share the same underlying validation logic so it’s written only once.
Hopefully you can see how easy it is to define classes that are custom types that are tailored to fit your business logic like a glove.
Once you start doing this, you will find yourself thinking “it’s not a string, it’s an EmailAddress”, or “it’s not a number, it’s an Age”. The primitive data types will start to feel vague and inadequate, just as having function parameters with poor names once did. Moreover, you’ll no longer resent having to consider validation in functions where you just want to crack on with the business logic.
One last benefit that should be mentioned is that once you create your custom types, if you are working in a company where domain concepts like data types are shared between projects, you can create a central code repository where you define your data types once and import them into the projects that need them. This helps to build a consistent domain language and ensures consistent treatment of data across your projects.
The Downsides
Obviously I’m a believer in this technique of creating custom data types and I think its advantages outweigh the considerations I’m about to raise. However, it wouldn’t be a fair article if I didn’t go into a few downsides.
Boilerplate
Nobody likes typing more than is necessary or having to create lots of additional files. When creating custom types, not only do you have to create the custom types themselves but you may also have to create other classes to help you convert data from one form to another.
For example, say you’re using a framework that uses the “Active Record” pattern for database access. You use a Model class to fetch a user like this:
$userModel = User::findById(1);
The user model object that is returned by the framework has primitive types (strings and ints).
E.g.
// This is a primitive string
$userModel->name;
// This is a primitive int.
$userModel->numCats;
Instead, you want a user object that has your custom types. To convert data from one form to another, you could make use of the Adapter pattern.
e.g.
$user = $userAdapter->fromModel($userModel);
The adapter method will probably look something like this:
public function fromModel($userModel): User
{
    return new User(
        new UserId($userModel->id),
        new FirstName($userModel->firstName),
        new LastName($userModel->lastName),
        new EmailAddress($userModel->emailAddress),
        …
    );
}
So, introducing custom types may lead to you needing to introduce additional classes like Adapter classes to convert data from one form to the other. I don’t find this a huge chore but others may vomit.
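For a feel of how much boilerplate is involved, here is a cut-down, self-contained sketch of the adapter idea. Every class in it is a hypothetical stand-in: the real UserId, FirstName and User would carry more fields and rules, the plain object used as the model only imitates what a framework might return, and the public properties are there purely to keep the example short.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for the custom types.
final class UserId
{
    public int $value;

    public function __construct(int $value)
    {
        if ($value < 1) {
            throw new InvalidArgumentException('User id must be positive');
        }
        $this->value = $value;
    }
}

final class FirstName
{
    public string $value;

    public function __construct(string $value)
    {
        if ($value === '' || mb_strlen($value, 'UTF-8') > 30) {
            throw new InvalidArgumentException('First name must be 1 to 30 characters');
        }
        $this->value = $value;
    }
}

// Domain object that only ever holds custom types.
final class User
{
    public UserId $id;
    public FirstName $firstName;

    public function __construct(UserId $id, FirstName $firstName)
    {
        $this->id = $id;
        $this->firstName = $firstName;
    }
}

final class UserAdapter
{
    // $userModel stands in for whatever row object the framework returns.
    public function fromModel(object $userModel): User
    {
        return new User(
            new UserId((int) $userModel->id),
            new FirstName((string) $userModel->firstName)
        );
    }
}

// A fake Active Record row holding primitive values.
$userModel = (object) ['id' => '1', 'firstName' => 'Ada'];
$user = (new UserAdapter())->fromModel($userModel);
```

The upside of paying this cost is that the adapter becomes the single place where primitive data crosses into your typed world, so bad rows blow up there rather than deep inside a service.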
Memory usage & Performance
When using custom types we end up creating lots of custom objects for our types. Most of the time this is not a problem but just be aware that there is an additional memory and potentially performance overhead. If you find yourself dealing with tens of thousands of rows from a database in a single process or endpoint, then the act of creating all those objects in a loop will end up chewing up quite a bit of memory and can also carry a noticeable performance penalty.
When this is the case, you can bypass the use of the custom objects for this particular context. It’s not great, but programming is always a compromise and you have to make sensible decisions based on your needs.
Sometimes validation is too much
Last but not least, creating custom types encourages you to put constraints on your data. This is usually an excellent thing, but you may find times when the data surprises you. This is particularly the case when you’re introducing types into an existing application and you don’t know the full history of the database and business rules.
Consider for example that as far as you know, a user must always have a “company name”. This makes sense as the user must enter the company name when they fill in the registration form.
However, years ago before you joined the company it turns out that users did not provide a company name at all.
In your local development database there are no instances where this is the case so you don’t notice it. However, in a testing environment or in production, there are users that have an empty or null company name. Unfortunately, when such a user tries to log in, thanks to your custom types, an exception is now thrown when trying to create your user object because the CompanyName custom type says “Hey you can’t have an empty company name!”. Uh oh, you’ve now stopped a paying user from logging in!
Tips for how to avoid this:
- Always check the database schema. If a company name can be null, the database schema will tell you so;
- If you have access to production / testing environments, check your assumptions there with a simple SQL query;
- Also check your assumptions with product managers and other developers that have been around in the company for a long time. They will more than likely know these edge cases and stop you from making an error of judgement;
- Match your size constraints to your database. If your database allows 50 characters for the first name, then match that with your type class so there’s a 1:1 correlation with your business rules and database storage.
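If your checks do turn up legacy rows with no company name, one possible way to cope (a sketch of my own, not the only option) is to keep the strict constructor and add a named constructor that turns a missing value into null. The fromLegacy name and the exception type are hypothetical.

```php
<?php
declare(strict_types=1);

final class CompanyName
{
    private string $value;

    public function __construct(string $value)
    {
        // New data must always carry a company name.
        if (trim($value) === '') {
            throw new InvalidArgumentException('Company name cannot be empty');
        }
        $this->value = $value;
    }

    // Hypothetical helper for old rows: a null or blank value becomes
    // "no company name" (null) instead of an exception.
    public static function fromLegacy(?string $value): ?self
    {
        if ($value === null || trim($value) === '') {
            return null;
        }
        return new self($value);
    }

    public function __toString(): string
    {
        return $this->value;
    }
}
```

The user object would then declare its company name as ?CompanyName, which makes the “might be missing” case visible in the type rather than hiding it behind an exception at login.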
So that’s it folks. Let me quickly recap the main points.
Using custom types increases code confidence and readability by:
- Moving validation rules into heavily unit tested type objects;
- Removing validation from where it doesn’t need to be, allowing functions to get straight into the business logic;
- Providing clear meaning and removing ambiguity, allowing faster code reviews and also allowing other developers to more easily understand how to use your functions;
- Allowing you to develop shared types that match your business domain and are used across the business in different repositories;
- Reducing programmer cognitive load by knowing that when they use these types in their own methods, they don’t have to worry about things like ‘what if this is empty’ or ‘what if this is the wrong type’.
The downsides were:
- The likelihood of you needing to write additional code such as Adapter classes;
- The impact of your objects on memory and performance;
- Validation causing issues due to false assumptions.
Thanks for reading. I hope you got something out of this article and I encourage you to give custom types a go, even just to see how it feels.