Eventsourcing Patterns: Forgettable Payloads

Store the sensitive payload of an event in a separate store to control access and removal.

By Mathias Verraes
Published on 13 May 2019




Problem

An Event Store is an append-only chronological database of Domain Events, so ideally we never remove any event or event data. This poses a problem when data needs to be deleted, for example after a customer invokes their right to be forgotten under the GDPR.

A different problem (with a similar solution) exists when the events in the Event Store are emitted to outside consumers, but some of those events contain information that must remain private to the producer.

Solution

(Note: the second problem can also be solved using Explicit Public Events or Segregated Event Layers.)

Remove the sensitive information from the definition of the event. Instead, store that information in a separate (logical) database. In the event, store a reference or URL that points to the payload’s location in the database.

Consumers still receive the events, but to read the sensitive data, they need to query the payload database. Tightly control access to that database, so that only consumers with the right permissions can read the sensitive information. Consumers without access can still depend on the rest of the event’s non-sensitive payload.
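The access rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the `PayloadStore` class, its `read` method, and the `allowed_consumers` allow list are hypothetical names, and a real system would use whatever authorisation mechanism its database or API already offers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PayloadStore:
    """Hypothetical in-memory payload database with a per-consumer allow list."""
    allowed_consumers: set = field(default_factory=set)
    _payloads: dict = field(default_factory=dict)

    def put(self, ref: str, payload: dict) -> None:
        self._payloads[ref] = payload

    def read(self, ref: str, consumer: str) -> Optional[dict]:
        # Consumers without permission get nothing back, and so does
        # anyone asking for a reference that does not exist.
        if consumer not in self.allowed_consumers:
            return None
        return self._payloads.get(ref)


store = PayloadStore(allowed_consumers={"billing"})
store.put("example.com/personalData/789", {"name": "Isaac Newton"})

# The event carries only the reference plus the non-sensitive fields.
event = {
    "type": "CustomerHasSignedUp",
    "customerId": 123,
    "personalData": "example.com/personalData/789",
    "status": "Premium member",
}

billing_view = store.read(event["personalData"], consumer="billing")
analytics_view = store.read(event["personalData"], consumer="analytics")
```

Here the hypothetical "analytics" consumer gets no personal data, yet it can still use `customerId` and `status` straight from the event.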

When a deletion request is received, remove the payload from the payload database, and leave the events in the Event Store untouched. Consumers that query a deleted payload get no result, and show a blank instead.

Instead of being deleted, a payload can also be replaced with a John Doe; that is, a set of fake data that substitutes for the real data.
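Both removal strategies can be sketched as follows. The function names (`forget`, `anonymise`, `display_name`), the dictionary used as a payload database, and the second record are assumptions made for illustration only.

```python
from typing import Optional

# Hypothetical in-memory payload database, keyed by the reference stored in the event.
payload_db: dict = {
    "example.com/personalData/789": {"customerId": 123, "name": "Isaac Newton"},
    "example.com/personalData/790": {"customerId": 124, "name": "Example Person"},
}


def forget(ref: str) -> None:
    """Delete the payload; the event that points at it is left untouched."""
    payload_db.pop(ref, None)


def anonymise(ref: str, request_id: int) -> None:
    """Replace the payload with John Doe data instead of deleting it."""
    original = payload_db[ref]
    payload_db[ref] = {
        "customerId": original["customerId"],
        "name": "John Doe",
        "note": f"(removed under removal request #{request_id})",
    }


def display_name(ref: str) -> str:
    """What a consumer shows: the stored name, or a blank if the payload is gone."""
    payload: Optional[dict] = payload_db.get(ref)
    return payload["name"] if payload else ""


forget("example.com/personalData/789")
anonymise("example.com/personalData/790", request_id=456)
```

After these two calls, a consumer that looks up reference 789 shows a blank, while one that looks up reference 790 shows "John Doe". In neither case is the Event Store modified.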

Example

Original event:

CustomerHasSignedUp {
    customerId: 123,
    name: "Isaac Newton",
    organisation: "Trinity College, Cambridge",
    birthday: 1643-01-04,
    hobbies: "Physics",
    signedUpOn: 2019-05-13,
    status: "Premium member"
}

Redesigned event:

CustomerHasSignedUp {
    customerId: 123,
    personalData: "example.com/personalData/789",
    signedUpOn: 2019-05-13,
    status: "Premium member"
}

Body of example.com/personalData/789:

{
    personalDataId: 789,
    customerId: 123,
    name: "Isaac Newton",
    organisation: "Trinity College, Cambridge",
    birthday: 1643-01-04,
    hobbies: "Physics"
}

After a removal request: 

{
    personalDataId: 789,
    customerId: 123,
    name: "John Doe",
    organisation: "(removed under GDPR removal request #456)",
    birthday: null,
    hobbies: "(removed under GDPR removal request #456)"
}

Weaknesses

The Forgettable Payloads pattern requires maintaining a separate store, so it breaks the concept of an Event Store as the Single Source of Truth in an eventsourced system. Consumers of the events can no longer depend on the events alone. Instead, after receiving an event, they need to query the payload database. This increases coupling and complexity.

A second weakness is that a consumer might receive the event, query the sensitive payload, and then store that information locally. When a deletion request arrives, the consumer also needs to remove its local copy. This problem is not unique to eventsourcing, but there’s an elegant solution: the producer emits a GDPRDeletionWasRequested {customerId} event, and the consumers react to it. A good habit is for consumers never to store the sensitive payloads locally, and always to query the payload database instead.
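A consumer that does keep a local copy might react to the deletion event roughly like this. The `MailingListConsumer` class and its `handle` method are hypothetical, and events are modelled as plain dictionaries for brevity.

```python
class MailingListConsumer:
    """Hypothetical consumer that keeps a local copy of customer names."""

    def __init__(self, fetch_payload):
        self._fetch_payload = fetch_payload  # callable: reference -> dict or None
        self.local_names = {}                # customerId -> name (the local copy)

    def handle(self, event: dict) -> None:
        if event["type"] == "CustomerHasSignedUp":
            payload = self._fetch_payload(event["personalData"])
            if payload is not None:
                self.local_names[event["customerId"]] = payload["name"]
        elif event["type"] == "GDPRDeletionWasRequested":
            # Remove the local copy; a missing entry is not an error.
            self.local_names.pop(event["customerId"], None)


# Stand-in for the payload database: a dict whose .get returns None when absent.
payloads = {"example.com/personalData/789": {"name": "Isaac Newton"}}
consumer = MailingListConsumer(payloads.get)

consumer.handle({
    "type": "CustomerHasSignedUp",
    "customerId": 123,
    "personalData": "example.com/personalData/789",
})
names_after_signup = dict(consumer.local_names)

consumer.handle({"type": "GDPRDeletionWasRequested", "customerId": 123})
names_after_deletion = dict(consumer.local_names)
```

The consumer holds the name after the signup event and nothing after the deletion event. Handling a repeated deletion event is harmless, because the `pop` call ignores customers it no longer knows.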

Finally, sometimes a person can be identified not only through personal information, but also through their associations and relations with other information. Say Alice exchanges messages with both Bob and Charlie. Alice requests a deletion and becomes Jane Doe in our records. We don’t know who Jane Doe is, but we know that Jane Doe is a friend of both Bob and Charlie. The Forgettable Payloads pattern offers no solution here.
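The leak through associations is easy to reproduce. In this sketch, the `MessageWasSent` event type, its `fromId` and `toId` fields, and the numeric customer IDs are all assumptions made for illustration.

```python
# Hypothetical message events: only IDs are stored, names live in the payload database.
events = [
    {"type": "MessageWasSent", "fromId": 1, "toId": 2},
    {"type": "MessageWasSent", "fromId": 1, "toId": 3},
]

# Payload database after Alice (customer 1) was replaced with Jane Doe.
names = {1: "Jane Doe", 2: "Bob", 3: "Charlie"}


def contacts_of(customer_id: int) -> set:
    """Everyone the given customer sent a message to, derived from the events alone."""
    return {names[e["toId"]] for e in events if e["fromId"] == customer_id}


jane_contacts = contacts_of(1)
```

The name is gone, but the untouched events still say that customer 1 wrote to Bob and Charlie, which may be enough to work out who Jane Doe is.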

Alternatives

The Crypto-Shredding pattern is an alternative solution to the Forgettable Payloads pattern.