At meshcloud we've implemented a declarative API and the biggest challenge has been the declarative deletion of objects.
That's why in this blog post I want to answer the question:
How do I implement deletion for a declarative API?
Below I cover the challenges we ran into, how other systems solve them and which solution we applied at meshcloud.
Head over to my blog post about implementing a declarative API if you want to start at the beginning of this two-part series.
Let's start with the advantages of declarative deletion.
Why handle deletion declaratively?
If your use-case fits a declarative API you should also think about the deletion of objects.
If an external system syncs objects into your system and also ensures that objects are deleted when they are no longer present in the primary system, a declarative API simplifies client code a lot.
If deletion could only be executed in an imperative way, the client would have to find out which objects actually have to be deleted in the target system to call the delete endpoint for those objects accordingly.
Take a group synchronization process as an example:
With imperative deletion, the client somehow needs to build a diff between the desired state and the actual state in the target system to determine which objects have to be deleted. Moving this complex logic to the server side means the implementation effort is spent once instead of once per client.
Additionally, it can be a big performance improvement, as quite a lot of data might be needed to compute the diff. If that data only needs to be handled by the backend, you avoid the network latency and bandwidth cost of transferring it to the client.
When the diff is computed at the client, you may also struggle with outdated data: fetching the current state and processing it takes time, during which the state in the target system may already have changed.
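To make the client-side burden concrete, here is a minimal sketch of what every client would roughly have to implement if only an imperative delete endpoint existed. The names delete_stale_groups, list_remote and delete_remote are hypothetical stand-ins for whatever read and delete calls the target system offers.

```python
from typing import Callable, Iterable, Set


def delete_stale_groups(
    desired_names: Iterable[str],
    list_remote: Callable[[], Set[str]],
    delete_remote: Callable[[str], None],
) -> Set[str]:
    """Client-side diff: delete every remote group the source no longer has."""
    actual = list_remote()  # full read of the actual state in the target system
    stale = actual - set(desired_names)
    for name in sorted(stale):
        delete_remote(name)  # one imperative delete call per stale object
    return stale


if __name__ == "__main__":
    # An in-memory set stands in for the target system.
    remote = {"admins", "devs", "ops"}
    removed = delete_stale_groups(
        ["admins", "devs"], lambda: set(remote), remote.discard
    )
    print(sorted(removed), sorted(remote))
```

Note that list_remote has to return the complete actual state before anything can be deleted. That is exactly the data transfer and staleness problem described above, and it would be repeated in every client.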
How to implement deletion for a declarative API
The central conceptual question for deletion is: "How can you identify which objects need to be deleted?"
As more than one client will be using your API, a central aspect is to group objects into a Declarative Set. If an item of this set is missing from a request, it will be deleted, while items from other sets remain untouched. To understand how to solve declarative deletion, let's have a look at actual implementations that are in productive use.
Another implementation challenge is an efficient algorithm for comparing the actual state with the desired state. This topic isn't covered in this blog post, but aspects like reducing the number of DB queries, doing bulk queries for the objects that are part of the Declarative Set, and bulk deletions should be considered.
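Although an efficient comparison algorithm is out of scope here, the basic shape of a set-scoped reconciliation can be sketched in a few lines. The example below is only an illustration under assumptions: the table layout, the column names and the functions open_store and reconcile are invented for this sketch, and SQLite stands in for whatever database a real backend uses.

```python
import sqlite3
from typing import Dict, List, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory table; the schema is an assumption for this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        " declarative_set TEXT NOT NULL,"
        " id TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (declarative_set, id))"
    )
    return conn


def reconcile(
    conn: sqlite3.Connection, declarative_set: str, desired: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Make the stored objects of one set match `desired` (id -> payload).

    Returns the ids that were written and the ids that were deleted.
    """
    # One bulk read, restricted to the set named in the request.
    rows = conn.execute(
        "SELECT id FROM objects WHERE declarative_set = ?", (declarative_set,)
    ).fetchall()
    existing = {row[0] for row in rows}
    stale = sorted(existing - set(desired))

    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO objects (declarative_set, id, payload)"
            " VALUES (?, ?, ?)",
            [(declarative_set, oid, payload) for oid, payload in desired.items()],
        )
        # The delete is scoped to the same set, so other sets stay untouched.
        conn.executemany(
            "DELETE FROM objects WHERE declarative_set = ? AND id = ?",
            [(declarative_set, oid) for oid in stale],
        )
    return sorted(desired), stale
```

The important design choice is that both the read and the delete are filtered by the set identifier, so a request can never remove objects that belong to another client's set.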
How do existing declarative systems handle declarative deletion?
Terraform
In Terraform, you define a tf file that contains the desired state. When this state is applied via terraform, it writes a tfstate locally or to a shared remote location. That way Terraform knows which state is actually expected in the target systems. It can create a diff between the actual state (tfstate) and what is expected to be the new state (tf). This requires an up-to-date state file. It deletes all resources that have been present in tfstate but are no longer present in the tf file.
The tf file is basically what I described as a Declarative Set before. So in case of a shared remote location that keeps track of the state, the state is always related to the tf file and Terraform will only ever delete resources tracked in that related state.
Terraform also provides a terraform plan command, which will show you the actual changes Terraform would apply. That way you can verify whether the intended changes will be applied.
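The idea of comparing a recorded state with a new configuration and previewing the outcome can be modelled in a few lines. This is a deliberately simplified sketch and not how Terraform is implemented: the Plan class and the plan function are hypothetical, resources are plain dictionaries keyed by an address, and everything else Terraform does (for example dependency ordering) is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Plan:
    """Preview of the changes, grouped by action. Nothing is executed."""
    create: List[str] = field(default_factory=list)
    update: List[str] = field(default_factory=list)
    delete: List[str] = field(default_factory=list)


def plan(recorded_state: Dict[str, dict], config: Dict[str, dict]) -> Plan:
    """Diff the recorded state against the desired configuration."""
    result = Plan()
    for address, attrs in sorted(config.items()):
        if address not in recorded_state:
            result.create.append(address)
        elif recorded_state[address] != attrs:
            result.update.append(address)
    # Anything tracked in the state but missing from the config is deleted.
    result.delete = sorted(set(recorded_state) - set(config))
    return result


if __name__ == "__main__":
    state = {"aws_s3_bucket.logs": {"acl": "private"}}
    config = {"aws_instance.web": {"type": "t3.micro"}}
    print(plan(state, config))
```

Because only addresses found in the recorded state can end up in the delete list, resources that were never tracked in that state are out of reach, which is the Declarative Set property described above.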
Additionally, Terraform provides an imperative CLI command to delete all resources in a tf file or, with the optional -target flag, only specific resources:
terraform destroy -target RESOURCE_TYPE.NAME -target RESOURCE_TYPE2.NAME
Kubernetes
The recommended approach in Kubernetes is using the imperative kubectl delete command to delete individual objects. They recommend it because it is explicit about which objects will be deleted (either a specific object or a reference to a yaml file).
But Kubernetes also supports the declarative approach. You have to kubectl apply a complete configuration folder. Additionally, you have to use the alpha option --prune, which allows deleting all objects that are no longer present in the yaml files inside the folder (see the Kubernetes docs, which also describe the different ways of object management very well).
The kubectl apply command also provides a dry-run option to verify whether the intended changes will be applied.
How Kubernetes actually handles declarative deletion is explained in great detail in their documentation.
Here's a brief summary:
Kubernetes saves the complete state on the server side. When doing a kubectl apply --prune you have to either provide a label with -l <label> or refer to all resources in the namespaces used in the folder's yaml files via --all. Both ways match what I described as a Declarative Set before. They make it possible for Kubernetes to know which objects actually belong to the set of resources that are part of the intended desired state. So when you apply again, e.g. with the same label, it will simply delete all objects with this label that no longer exist in the folder. Using the label is the safer approach, as deleting every resource that is no longer present, across all namespaces referenced in the yaml files, is rather dangerous.
Kubernetes also uses an interesting approach to determine what needs to be updated in an object. It stores the configuration that was present in the yaml file in a last-applied-configuration annotation in the object's metadata. That way it will only delete attributes that were present in a previous application but are no longer present now. It does not overwrite other attributes that have only been set via imperative commands. So it does an actual patch, based on what was applied before and what is currently present in the yaml file.
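The effect of remembering the last applied configuration can be illustrated with a small sketch. It is a simplification under assumptions: three_way_apply is a hypothetical helper, objects are flat dictionaries, and the nested structures and list merging that Kubernetes has to deal with are left out.

```python
from typing import Any, Dict


def three_way_apply(
    live: Dict[str, Any],
    last_applied: Dict[str, Any],
    new_config: Dict[str, Any],
) -> Dict[str, Any]:
    """Return the patched object without modifying `live`.

    live:         the object as it currently exists on the server
    last_applied: what the previous apply contained
    new_config:   what the current yaml file contains
    """
    result = dict(live)
    # Fields that were applied before but are gone from the file get removed.
    for key in last_applied.keys() - new_config.keys():
        result.pop(key, None)
    # Fields in the file are set; fields never applied via a file are kept.
    result.update(new_config)
    return result


if __name__ == "__main__":
    live = {"image": "app:1", "minReadySeconds": 5, "replicas": 3}
    last_applied = {"image": "app:1", "minReadySeconds": 5}
    new_config = {"image": "app:2"}
    print(three_way_apply(live, last_applied, new_config))
```

In the usage example, replicas was never part of an applied file (think of it as set by an imperative command), so it survives, while minReadySeconds is removed because it was applied before and has since been dropped from the file.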
Determining which changes need to be applied has traditionally been done in the kubectl CLI tool. But recently a server-side apply implementation has been released as GA.
Sadly, it doesn't cover multi-object apply yet, which remains in the kubectl CLI. So actual declarative object deletion is not part of the server-side implementation.
AWS CloudFormation
AWS CloudFormation groups the resources it creates in so-called Stacks. The Stack is what I called a Declarative Set before. You can update an existing Stack by applying a CloudFormation template to it. You can modify the template and re-apply it. This modification can also include removing resources, which will then be deleted in AWS when the template is applied to the Stack again.
How does meshStack handle it?
In meshStack we provide an API that takes a list of meshObjects and creates and updates them. If you want to apply declarative deletion, you have to provide a meshObjectCollection. This is how we decided to implement the Declarative Set I mentioned before. It is similar to adding a label to kubectl apply or using a Stack in AWS. meshObjects no longer present in the request will be deleted if they had been applied before using the same meshObjectCollection parameter. The check in our backend is rather simple, as we just added a meshObjectCollection field to our entities and can query by it. That way we can do a diff between what is applied in the request and what already existed before.
When actually executing the import of meshObjects, the result contains information about whether meshStack was able to successfully reconcile the desiredState or which error occurred for a specific meshObject. A failed import of one meshObject won't result in breaking the complete process. The response will only contain details about the import result of every single meshObject.
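The behaviour described in the last two paragraphs could look roughly like the following sketch. It is not meshStack's actual code or response format: ImportResult, import_objects, the status strings and the dictionary used as a store are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ImportResult:
    name: str
    status: str  # "SUCCESS", "FAILED" or "DELETED" (hypothetical values)
    error: Optional[str] = None


def import_objects(
    store: Dict[str, Optional[str]],  # object name -> collection it belongs to
    objects: Dict[str, dict],         # object name -> desired spec
    apply_one: Callable[[str, dict], None],
    collection: Optional[str] = None,
) -> List[ImportResult]:
    results: List[ImportResult] = []
    for name, spec in objects.items():
        try:
            apply_one(name, spec)
            store[name] = collection
            results.append(ImportResult(name, "SUCCESS"))
        except Exception as exc:
            # One failing object is reported but does not abort the import.
            results.append(ImportResult(name, "FAILED", str(exc)))

    # Declarative deletion only happens when a collection was provided.
    if collection is not None:
        stale = [
            name
            for name, owner in store.items()
            if owner == collection and name not in objects
        ]
        for name in stale:
            del store[name]
            results.append(ImportResult(name, "DELETED"))
    return results
```

An object that fails to import is still part of the request, so this sketch never treats it as stale. Whether a real system should behave that way is a design decision of its own.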
Currently we only support access to our API for a few external systems that integrate with meshStack. Once we provide a CLI and every user can use the API that way, meshObjectCollections will be assigned to projects. That way a clear separation of access to meshObjectCollections can be guaranteed.
Scalability of this approach is definitely a topic we will have to solve in the future. With the current approach, all objects have to be applied in a single request. If a huge number of objects has to be processed in one request, timeouts or other performance issues could arise. A possible solution we are thinking about is similar to what Kubernetes does: you define your intended objects in several yaml files inside a folder, and those are uploaded to meshStack via a CLI or a corresponding REST call. The objects are then processed asynchronously once all files are uploaded.
When to use declarative deletion
Deletion support in a declarative API can be a really nice comfort feature for your clients, as it removes a lot of complexity for them.
Removing objects that are no longer part of the request is handled by the declarative system.
The downside of declarative deletion is that it is implicit and can easily result in removing objects that were not intended to be removed, just because they were no longer part of the request. As long as you are managing stateless objects, it might be fine to take that risk, as you could simply recreate them with the next request. If objects are stateful (e.g. databases or volumes), it might be a bad idea to remove the resource and recreate it, because all its data will be gone. But even in the stateful use-cases you can reduce the risk by:
- making declarative deletion explicit via an additional parameter that needs to be provided, so the client is actually aware that declarative deletion will be applied.
- excluding certain objects from deletion (e.g. volumes and databases). Those can only be deleted in an imperative way. It could also be an option to let the end-user decide which objects should be excluded from declarative deletion by flagging them accordingly.
- implementing a dry-run functionality like terraform plan that shows the client which state will be applied in the end. This is a good option when providing access to the declarative API via a CLI, for example. In case of an automated integration between two systems, it is less helpful, as there is no one to check the result of the dry-run. Some automation might still be possible, but that would again require complex logic on the client side, which we wanted to avoid in the first place.
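The three safeguards from the list can be combined in one small sketch. All names here (plan_deletions, apply_deletions, enable_deletion, PROTECTED_KINDS) are hypothetical, and the kinds treated as protected are just the examples mentioned above.

```python
from typing import Dict, FrozenSet, Iterable, List

# Kinds that may only be deleted imperatively (assumption for this sketch).
PROTECTED_KINDS: FrozenSet[str] = frozenset({"database", "volume"})


def plan_deletions(
    existing: Dict[str, str],  # object name -> kind
    desired_names: Iterable[str],
    *,
    enable_deletion: bool = False,
    protected_kinds: FrozenSet[str] = PROTECTED_KINDS,
) -> List[str]:
    """List the objects that declarative deletion would remove."""
    if not enable_deletion:  # safeguard 1: deletion is an explicit opt-in
        return []
    desired = set(desired_names)
    return sorted(
        name
        for name, kind in existing.items()
        # safeguard 2: protected kinds are never deleted declaratively
        if name not in desired and kind not in protected_kinds
    )


def apply_deletions(
    existing: Dict[str, str],
    desired_names: Iterable[str],
    *,
    enable_deletion: bool = False,
    dry_run: bool = True,
) -> List[str]:
    """Delete stale objects, or only report them when dry_run is set."""
    doomed = plan_deletions(existing, desired_names, enable_deletion=enable_deletion)
    if not dry_run:  # safeguard 3: the default only previews the result
        for name in doomed:
            del existing[name]
    return doomed
```

Making both the opt-in and the dry-run the safe default means a client has to state twice that it really wants objects removed. That is intentionally inconvenient for a sketch and may be too strict for an automated integration.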
In general, it makes sense to additionally provide imperative deletion, as declarative deletion should always be an opt-in that the client explicitly has to choose. The client needs to be aware of the consequences of declarative deletion, in particular that it has to be extra careful to always provide a complete list of resources.