Riding the Risk Railway

2020-03-09

When building and operating a user-facing system, especially one that is open to the public, it is important to consider how risky a user is or, framed the other way, how trustworthy they are. The two are typically negatively correlated, with low trust indicating high risk and vice versa, but this is not always the case.

To better characterise this relationship, and to judge how risky a user is, we can model the user lifecycle (for example, from signup to checkout) as a railway, complete with signals and switches.


'Night on the Galactic Railroad', Copyright Group TAC / Nippon Herald Films 1985

A user may start out in a position of high risk and low trust, and can move away from it by demonstrating trustworthiness through their actions. Conversely, their actions may also demonstrate riskiness, in which case they must be treated differently. A somewhat useful model to keep in mind here is a credit score: by demonstrating your ability to be “responsible” with credit lines (at least, in the eyes of a credit agency), your score improves, and you have an easier time taking out further lines of credit.

An example

Let’s start by looking at how we might change the path a user takes after signup, based on the relative risk posed by the information they have provided. For this, we’ll use the actors Giovanni (Gio) and Campanella (Cam):

{
  "name": "Giovanni",
  "email": "giovanni.giorno@example.com",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
  "accept_language": "fr-CH, fr;q=0.9, en;q=0.8",
  "ip_source": "Acme ISP, Zurich, CH"
}
{
  "name": "Campanella",
  "email": "c.a.m.p.an.ell.a.12.97.10+0xBADCAFE@example.com",
  "user_agent": "Mozilla/5.0 (Linux; Android 7.0; SM-G925T Build/NRD90M; en-us) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Puffin/6.1.4.16005AP",
  "accept_language": "",
  "ip_source": "DigitalOcean, Frankfurt, DE"
}

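These are clearly contrived examples, but it should be apparent that Cam has a higher risk profile than Gio. Cam’s email address is padded with dots, digits and a plus-addressed tag, a common way to register many accounts against a single mailbox; the user agent is an outdated Chrome running inside Puffin, a browser that renders pages on remote servers; the Accept-Language value is empty; and the request comes from a hosting provider rather than a consumer ISP. Gio’s details, by contrast, are consistent with one another: an up-to-date desktop Chrome and a Swiss French locale arriving from a Zurich ISP. Factors such as these can be so telling that there is a black market in stolen browser profiles.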
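To make the comparison concrete, here is a minimal Python sketch that turns a signup profile, shaped like the JSON documents above, into boolean red flags. The flag names, the email heuristics and the `HOSTING_PROVIDERS` list are all hypothetical; a production system would use IP reputation or ASN data rather than matching on a provider label.

```python
# Hypothetical list of hosting providers. A real system would look the IP up
# in an IP-to-ASN or IP reputation database instead of matching a label.
HOSTING_PROVIDERS = ("DigitalOcean", "AWS", "Hetzner", "OVH")


def red_flags(profile: dict[str, str]) -> dict[str, bool]:
    """Return hypothetical boolean risk flags for one signup profile."""
    local_part = profile["email"].split("@", 1)[0]
    return {
        # Plus-addressing lets one mailbox register many accounts.
        "plus_addressed_email": "+" in local_part,
        # Lots of dots or digits in the local part suggests a generated address.
        "noisy_email": local_part.count(".") >= 3
        or sum(ch.isdigit() for ch in local_part) >= 4,
        # Mainstream browsers normally send Accept-Language; empty is unusual.
        "missing_accept_language": not profile["accept_language"].strip(),
        # Puffin renders pages on remote servers, hiding the real client.
        "proxy_browser": "Puffin/" in profile["user_agent"],
        # Signups from data centres are riskier than from consumer ISPs.
        "hosting_ip": profile["ip_source"].startswith(HOSTING_PROVIDERS),
    }
```

Applied to the two profiles above, every flag is false for Gio and true for Cam.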

At this point, we can define two key terms: signals and switches. On a railway, signals give the driver advance warning of the state of the line ahead. In our case, the “driver” is the system allowing a user to sign up, and the “signal” is an interpretation of the data provided at signup, represented by the two JSON documents above. For our purposes, we’re going to call this interpretation a score:

{
  "profile": "7595ffe4fad5ba43a21d93e48ae5330a657e8aa05aabd12ee9786e86a06337eb",
  "score": 0.13
}
{
  "profile": "da6ba8eee2101817c876d0f282f82bb5b36939e958fe1eba7c352b96ef6f5753",
  "score": 0.86
}

This does not correspond to any real-world system; it is only an illustrative example.
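In the same illustrative spirit, here is a minimal sketch of how such a score might be produced. It assumes a dict of boolean risk flags (for example, whether the request came from a hosting provider) combined with hypothetical hand-picked weights through a logistic function, and it assumes the profile identifier is a SHA-256 digest of the signup data serialised with sorted keys. None of these choices come from a real system, and the digests it produces will not match the values above.

```python
import hashlib
import json
import math

# Hypothetical weights; a real system would learn these from labelled outcomes.
WEIGHTS = {
    "plus_addressed_email": 0.8,
    "noisy_email": 1.0,
    "missing_accept_language": 1.2,
    "proxy_browser": 1.5,
    "hosting_ip": 1.5,
}
BIAS = -2.0  # keeps the score low when no flags are raised


def profile_id(profile: dict) -> str:
    """Stable identifier: SHA-256 of the profile serialised with sorted keys."""
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def score(profile: dict, flags: dict[str, bool]) -> dict:
    """Combine raised flags into a score in (0, 1) using a logistic function."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) for name, raised in flags.items() if raised)
    return {"profile": profile_id(profile), "score": round(1 / (1 + math.exp(-z)), 2)}
```

With these weights, a profile with no flags raised scores 0.12 and one with all five raised scores 0.98.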

In railway jargon, points are pieces of moveable track that are set by the signalling system to direct trains through the network and ensure they go where they are supposed to go. Our equivalent here is a switch: a system or sub-system that changes the flow of data depending on our signalling system.

In this example, when our signal is above a specific threshold, a switch is activated that forces the user to complete a challenge before their signup can finish. This is shown below:

A graph showing a user entering a signup system, with a fork where a score threshold is checked.

In this case, Gio is able to complete their signup without issue, but Cam must pass the additional challenge. This challenge is left undefined intentionally.
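A minimal sketch of this single switch, assuming the score from the signal and a hypothetical threshold of 0.5 (the text does not give one). The challenge itself stays undefined; the switch only decides which track the user takes.

```python
from enum import Enum


class Route(Enum):
    COMPLETE = "complete signup"
    CHALLENGE = "challenge, then complete signup"


# Hypothetical threshold; in practice it would be tuned from observed outcomes.
CHALLENGE_THRESHOLD = 0.5


def signup_switch(score: float, threshold: float = CHALLENGE_THRESHOLD) -> Route:
    """The points on the signup line: divert users scoring above the threshold."""
    return Route.CHALLENGE if score > threshold else Route.COMPLETE


print(signup_switch(0.13).name)  # COMPLETE (Gio)
print(signup_switch(0.86).name)  # CHALLENGE (Cam)
```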

Going forward

It’s possible to have many such systems or sub-systems, each with its own set of signals and switches, as a way to control the risk posed by users.

For example, to reduce fraud, a new user may be unable to make purchases over a certain value without additional identity verification, or a user who has made lots of returns may be prevented from making further purchases. This combination of systems can be thought of as the risk railway that users must ride, where some will make it to their destinations, some will not, and some will have to change at Mornington Crescent.
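The two purchase rules above can be sketched as a line of switches that every purchase rides through in order, with the first diversion winning. The `Account` and `Purchase` shapes, the limits and the ordering are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Account:
    age_days: int
    identity_verified: bool
    returns: int


@dataclass
class Purchase:
    account: Account
    value: float


# A switch returns a diversion, or None to let the purchase continue.
Switch = Callable[[Purchase], Optional[str]]

NEW_ACCOUNT_DAYS = 30           # hypothetical definition of a "new" user
UNVERIFIED_VALUE_LIMIT = 500.0  # hypothetical cap without identity checks
MAX_RETURNS = 10                # hypothetical returns limit


def new_user_limit(p: Purchase) -> Optional[str]:
    """New, unverified users must verify identity for high-value purchases."""
    is_new = p.account.age_days < NEW_ACCOUNT_DAYS
    if is_new and p.value > UNVERIFIED_VALUE_LIMIT and not p.account.identity_verified:
        return "verify identity"
    return None


def returns_limit(p: Purchase) -> Optional[str]:
    """Users with too many returns cannot make further purchases."""
    return "deny" if p.account.returns > MAX_RETURNS else None


# Denial is checked first so it takes precedence over a verification request.
CHECKOUT_LINE: list[Switch] = [returns_limit, new_user_limit]


def ride(p: Purchase, line: list[Switch] = CHECKOUT_LINE) -> str:
    """Pass the purchase through each switch in turn; the first diversion wins."""
    for switch in line:
        diversion = switch(p)
        if diversion is not None:
            return diversion
    return "proceed"
```

Adding a new rule is then a matter of appending another switch to the line, rather than growing one large conditional.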

Some folks may also recognise this as a kind of “circuit breaker” pattern: once a threshold is exceeded, further attempts to transition into a new state are blocked because of previous behaviour, such as a sudden spike in failing requests to a third-party endpoint. Rather than let the failures cascade, we trip the circuit breaker to insulate the calling component from the remote failure.

In our case, the previous behaviour is indicated by the signal, and the switch acts as a circuit breaker, either stopping the behaviour entirely or requiring further action in order to proceed (“half open”). Implementing switches also lets us log which users trip them and correlate the features those users share, creating a feedback loop for refining each switch’s threshold.
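As a minimal sketch of this mapping, assuming a score in [0, 1] and two hypothetical thresholds (`challenge_above` and `deny_above`), a switch can behave like a three-state breaker: closed lets the user through, half open requires further action, and open stops them. This follows the analogy in the text rather than the classic pattern, where half open means letting a few trial requests through after a cool-down. Every decision is logged, and the hypothetical `retune` method shows one crude feedback rule: lower the challenge threshold to just below the lowest score of any confirmed bad actor that was waved through.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # traffic flows normally
    HALF_OPEN = "half open"  # proceed only after further action, e.g. a challenge
    OPEN = "open"            # behaviour stopped entirely


@dataclass
class RiskSwitch:
    """Hypothetical switch acting as a circuit breaker on a risk score."""

    challenge_above: float = 0.5
    deny_above: float = 0.9
    log: list[tuple[float, State]] = field(default_factory=list)

    def decide(self, score: float) -> State:
        if score > self.deny_above:
            state = State.OPEN
        elif score > self.challenge_above:
            state = State.HALF_OPEN
        else:
            state = State.CLOSED
        # Record every decision so outcomes can later be correlated with scores.
        self.log.append((score, state))
        return state

    def retune(self, confirmed_bad_scores: list[float]) -> None:
        """Feedback loop: catch bad users who previously slipped through."""
        missed = [s for s in confirmed_bad_scores if s <= self.challenge_above]
        if missed:
            # Hypothetical rule: move the threshold just below the lowest miss.
            self.challenge_above = max(0.0, min(missed) - 0.01)
```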

Recap

Handling the risk posed by users, be it fraud, abuse or simply being a nuisance, can be tricky to model. A risk railway is one way of modelling this problem: signals inform a control system made up of switches, which helps users assert their trustworthiness and lets us confidently deny users deemed too risky. In the example above, Gio gets to ride the Galactic Railway, while Cam must get off at the Coalsack.