Don't call it *_id!

In a distributed system, we often need to store data that we do not own. We might use it as a unique identifier across domains, or the system we do own might need to proxy it to yet another service.

In many cases, the origin system exposes the data exactly as it is represented internally, for instance under the name of a key column in a relational database:

Fig. 1: local relational table, rev. 1

id	name	backend
42	Marley Spoon	elixir, ruby, python

Translated to JSON, we could send it over the wire as follows:

Fig. 2: example JSON payload

{ 
  "id": 42, 
  "name": "Marley Spoon",
  "backend": ["elixir", "ruby", "python"]
}

A service consuming the information may want to add prefixes to scope it for “companies” and store/cache it like so:

Fig. 3: local relational table, rev. 2

company_id	company_name	backend
42	Marley Spoon	elixir, ruby, python

But here’s the problem: as engineers, we look at _id fields and immediately think of them as integers that uniquely reference something. However, the consuming service has no control over the data it receives, and the data type is only assumed.

If we use that distributed ID field as a local foreign key, some external system controls the value, and an unforeseen change might break our setup.
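To make the failure mode concrete, here is a minimal sketch in Python. The function name `parse_company_assuming_int` is hypothetical; it stands in for any consumer that stores the received value in an integer column.

```python
import json


def parse_company_assuming_int(payload: str) -> dict:
    """Hypothetical consumer that assumes the upstream "id" is an integer."""
    data = json.loads(payload)
    return {
        "company_id": int(data["id"]),  # raises ValueError for non-numeric IDs
        "company_name": data["name"],
    }


# Works as long as the origin system sends integers ...
old_payload = '{"id": 42, "name": "Marley Spoon"}'
record = parse_company_assuming_int(old_payload)

# ... and breaks once the origin system switches to another ID format.
new_payload = '{"id": "328129ae-df4e-4168-94d3-2572b4b343ef", "name": "Marley Spoon"}'
try:
    parse_company_assuming_int(new_payload)
    broke = False
except ValueError:
    broke = True
```

Nothing in the payload contract changed from the origin system's point of view: the key is still called "id". Only the consumer's assumption about its type was violated.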

Identification

I have had good experiences using the pattern *_identifier instead. It indicates that…

it is some kind of a unique identifier
some other system has control over it

If the *_identifier value is to be stored, it should always be saved as a string type. Almost anything can be coerced into a string, so the origin system remains free to choose whatever format it wants for its unique identifier.

This pays off in particular if the origin system decides to move to UUIDs. A final version of the local relational table above could look like this:

Fig. 4: local relational table, rev. 3

company_identifier	company_name	…
328129ae-df4e-4168-94d3-2572b4b343ef	Marley Spoon	…
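As a rough sketch of the storage side, the following Python snippet uses an in-memory SQLite database with a TEXT column. The table name `companies` and the helper `to_identifier` are assumptions made for illustration, and the two rows simply represent the same company before and after a hypothetical ID migration.

```python
import sqlite3


def to_identifier(value) -> str:
    """Coerce an externally owned ID to a string without interpreting it."""
    return str(value)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE companies ("
    "company_identifier TEXT PRIMARY KEY, "
    "company_name TEXT NOT NULL)"
)

# Both the old integer ID and the new UUID fit into the same column.
incoming = [
    (42, "Marley Spoon"),
    ("328129ae-df4e-4168-94d3-2572b4b343ef", "Marley Spoon"),
]
for raw_id, name in incoming:
    conn.execute(
        "INSERT INTO companies (company_identifier, company_name) VALUES (?, ?)",
        (to_identifier(raw_id), name),
    )

stored = [
    row[0]
    for row in conn.execute(
        "SELECT company_identifier FROM companies ORDER BY company_identifier"
    )
]
```

Because the value is treated as an opaque string, the consumer needs no schema change when the origin system changes its ID format.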

Payload

If the system exposing the data is controlled by your own organisation, you can support this at the source.

It is a common pitfall of API designs, especially RESTful APIs, to expose a resource exactly as it is represented in the database. Doing so seems sensible because it reduces the cognitive load of the team maintaining the API. However, the data layer will inevitably change, rendering this point moot: the DB representation then has to be translated anyway to maintain a stable contract. Why not abstract from the data layer to begin with and name the keys in the payload in a system-agnostic way?

Fig. 5: JSON payload abstracted from persistence

{ 
  "company_identifier": "328129ae-df4e-4168-94d3-2572b4b343ef",
  "company_name": "Marley Spoon",
  "backend": ["elixir", "ruby", "python"]
}
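One way to get there is a small translation step between the persistence model and the payload. The sketch below is only an illustration: the `CompanyRow` class, its `uuid` column, and the `to_payload` function are hypothetical names, not part of any particular framework.

```python
import json
from dataclasses import dataclass


@dataclass
class CompanyRow:
    """Hypothetical internal persistence model; its columns may change."""
    id: int
    uuid: str
    name: str
    backend: list


def to_payload(row: CompanyRow) -> dict:
    """Translate the internal row into the stable external contract."""
    return {
        "company_identifier": str(row.uuid),  # always exposed as a string
        "company_name": row.name,
        "backend": list(row.backend),
    }


row = CompanyRow(
    id=42,
    uuid="328129ae-df4e-4168-94d3-2572b4b343ef",
    name="Marley Spoon",
    backend=["elixir", "ruby", "python"],
)
payload = json.dumps(to_payload(row))
```

The internal integer `id` never leaves the service, so renaming or retyping that column later does not affect consumers.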

Summary

Use *_identifier instead of *_id fields for externally owned data
Prefer a string type over integer for *_identifier values
Avoid a 1:1-map of your persistence model to your external API
