Merge pull request #573 from Human-Connection/docs-354-kubernetes_cron_job_for_backups

Docs 354 kubernetes cron job for backups
This commit is contained in:
Robert Schäfer 2019-05-15 18:52:53 +02:00 committed by GitHub
commit 5881a7d5df
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 247 additions and 23 deletions

View File

@ -27,7 +27,10 @@
* [HTTPS](deployment/digital-ocean/https/README.md)
* [Human Connection](deployment/human-connection/README.md)
* [Volumes](deployment/volumes/README.md)
* [Neo4J DB Backup](deployment/backup.md)
* [Neo4J Offline-Backups](deployment/volumes/neo4j-offline-backup/README.md)
* [Volume Snapshots](deployment/volumes/volume-snapshots/README.md)
* [Reclaim Policy](deployment/volumes/reclaim-policy/README.md)
* [Velero](deployment/volumes/velero/README.md)
* [Legacy Migration](deployment/legacy-migration/README.md)
* [Feature Specification](cypress/features.md)
* [Code of conduct](CODE_OF_CONDUCT.md)

View File

@ -17,6 +17,8 @@
human-connection.org/selector: deployment-human-connection-backend
template:
metadata:
annotations:
backup.velero.io/backup-volumes: uploads
labels:
human-connection.org/commit: COMMIT
human-connection.org/selector: deployment-human-connection-backend

View File

@ -15,6 +15,8 @@
human-connection.org/selector: deployment-human-connection-neo4j
template:
metadata:
annotations:
backup.velero.io/backup-volumes: neo4j-data
labels:
human-connection.org/selector: deployment-human-connection-neo4j
name: nitro-neo4j

View File

@ -7,7 +7,13 @@ At the moment, the application needs two persistent volumes:
As a matter of precaution, the persistent volume claims that set up these volumes
live in a separate folder. You don't want to accidentally lose all your data in
your database by running `kubectl delete -f human-connection/`, do you?
your database by running
```sh
kubectl delete -f human-connection/
```
or do you?
## Create Persistent Volume Claims
@ -19,24 +25,12 @@ persistentvolumeclaim/neo4j-data-claim created
persistentvolumeclaim/uploads-claim created
```
## Change Reclaim Policy
## Backup and Restore
We recommend changing the `ReclaimPolicy` so that if you delete the persistent
volume claims, the associated volumes will be released, not deleted:
```sh
$ kubectl --namespace=human-connection get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 1Gi RWO Delete Bound human-connection/neo4j-data-claim do-block-storage 4m24s
pvc-bd208086-66d0-11e9-be52-ba9c337f4551 2Gi RWO Delete Bound human-connection/uploads-claim do-block-storage 4m12s
```
Get the volume id from above, then change `ReclaimPolicy` with:
```sh
kubectl patch pv <VOLUME-ID> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# in the above example
kubectl patch pv pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pvc-bd208086-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```
We tested a couple of options for disaster recovery in Kubernetes. First,
there is the [offline backup strategy](./neo4j-offline-backup/README.md) of the
Neo4J community edition, which you can also run on a local installation.
Kubernetes also offers so-called [volume snapshots](./volume-snapshots/README.md).
Changing the [reclaim policy](./reclaim-policy/README.md) of your persistent
volumes can be an additional safety measure. Finally, there is also a
Kubernetes-specific disaster recovery tool called [Velero](./velero/README.md).

View File

@ -23,7 +23,10 @@ So, all we have to do is edit the kubernetes deployment of our Neo4J database
and set a custom `command` every time we have to carry out tasks like backup,
restore, seed etc.
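As a rough sketch of what setting such a custom `command` could look like (the deployment name `nitro-neo4j` is taken from this repository's manifests, and the idle command is only an illustration, not the prescribed procedure):
```sh
# keep the Neo4J container idle so maintenance commands can run against the data volume
kubectl --namespace=human-connection patch deployment nitro-neo4j --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["tail", "-f", "/dev/null"]}]'
```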
{% hint style="info" %} TODO: implement maintenance mode {% endhint %}
{% hint style="info" %}
TODO: implement maintenance mode
{% endhint %}
First bring the application into maintenance mode to ensure there are no
database connections left and nobody can access the application.

View File

@ -0,0 +1,30 @@
# Change Reclaim Policy
We recommend changing the `ReclaimPolicy` so that if you delete the persistent
volume claims, the associated volumes will be released, not deleted.
This procedure is optional and an additional safety measure. It might prevent
you from losing data if you accidentally delete the namespace and the persistent
volumes along with it.
```sh
$ kubectl --namespace=human-connection get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 1Gi RWO Delete Bound human-connection/neo4j-data-claim do-block-storage 4m24s
pvc-bd208086-66d0-11e9-be52-ba9c337f4551 2Gi RWO Delete Bound human-connection/uploads-claim do-block-storage 4m12s
```
Get the volume id from above, then change `ReclaimPolicy` with:
```sh
kubectl patch pv <VOLUME-ID> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# in the above example
kubectl patch pv pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pvc-bd208086-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```
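To verify that the change took effect, list the volumes again and check the `RECLAIM POLICY` column:
```sh
kubectl --namespace=human-connection get pv
```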
Given that you changed the reclaim policy as described above, you should be able
to create a persistent volume claim based on the content of a volume snapshot. See
the general Kubernetes documentation [here](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/)
and our specific documentation for snapshots [here](../volume-snapshots/README.md).
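For reference, the relevant part of such a persistent volume claim is the `dataSource` block; the full example lives in `deployment/volumes/volume-snapshots/neo4j-data.yaml`:
```yaml
spec:
  dataSource:
    name: neo4j-data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```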

View File

@ -0,0 +1,112 @@
# Velero
{% hint style="danger" %}
I tried Velero and it did not work reliably all the time. Sometimes the
Kubernetes cluster crashed during recovery or data was not fully recovered.
Feel free to test it out and update this documentation once you feel that it's
working reliably. It is very likely that Digital Ocean had some bugs when I
tried out the steps below.
{% endhint %}
We use [velero](https://github.com/heptio/velero) for on-premise backups. We
tested version `v0.11.0`; you can find the
documentation [here](https://heptio.github.io/velero/v0.11.0/).
Our Kubernetes configuration adds some annotations to pods. The annotations
define the important persistent volumes that need to be backed up. Velero will
pick them up and store the volume backups in the same cluster, but in another
namespace, `velero`.
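For reference, this is what the annotation looks like in the backend deployment template changed in this pull request; Velero's restic integration backs up the volumes listed there:
```yaml
template:
  metadata:
    annotations:
      backup.velero.io/backup-volumes: uploads
```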
## Prerequisites
You have to install the `velero` binary on your computer and get a tarball of
the release. We use `v0.11.0`, so visit the
[release](https://github.com/heptio/velero/releases/tag/v0.11.0) page, then
download and extract the archive for your platform, e.g. [velero-v0.11.0-linux-amd64.tar.gz](https://github.com/heptio/velero/releases/download/v0.11.0/velero-v0.11.0-linux-amd64.tar.gz).
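A minimal sketch of the download and installation, assuming Linux on amd64 and that the archive unpacks into a directory named `velero-v0.11.0-linux-amd64` (adjust names and paths for your platform):
```sh
# download and extract the release archive
curl -LO https://github.com/heptio/velero/releases/download/v0.11.0/velero-v0.11.0-linux-amd64.tar.gz
tar -xzf velero-v0.11.0-linux-amd64.tar.gz
# put the binary somewhere on your PATH; the extracted folder also contains the config/ manifests used below
sudo mv velero-v0.11.0-linux-amd64/velero /usr/local/bin/
```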
## Setup Velero Namespace
Follow their [getting started](https://heptio.github.io/velero/v0.11.0/get-started)
instructions to setup the Velero namespace. We use
[Minio](https://docs.min.io/docs/deploy-minio-on-kubernetes) and
[restic](https://github.com/restic/restic), so check out Velero's instructions
how to setup [restic](https://heptio.github.io/velero/v0.11.0/restic):
```sh
# run from the extracted folder of the tarball
$ kubectl apply -f config/common/00-prereqs.yaml
$ kubectl apply -f config/minio/
```
Once completed, you should see the new namespace in your Kubernetes dashboard.
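From the command line, a quick check could look like this, assuming the default namespace name `velero`:
```sh
# verify that the velero namespace and its pods exist
kubectl get namespace velero
kubectl --namespace=velero get pods
```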
## Manually Create an On-Premise Backup
When you create your deployments for Human Connection, the required annotations
should already be in place. So when you create a backup of the namespace
`human-connection`:
```sh
$ velero backup create hc-backup --include-namespaces=human-connection
```
That should back up your persistent volumes, too. When you run:
```sh
$ velero backup describe hc-backup --details
```
You should see the persistent volumes at the end of the log:
```
....
Restic Backups:
Completed:
human-connection/nitro-backend-5b6dd96d6b-q77n6: uploads
human-connection/nitro-neo4j-686d768598-z2vhh: neo4j-data
```
## Simulate a Disaster
Feel free to test whether you lose any data when you simulate a disaster and try
to restore the namespace from the backup:
```sh
$ kubectl delete namespace human-connection
```
Wait until the wrongdoing has completed, then:
```sh
$ velero restore create --from-backup hc-backup
```
Now, I keep my fingers crossed that everything comes back again. If not, I feel
very sorry for you.
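To follow the restore from the command line as well, something like this should work; Velero generates the restore name from the backup name plus a timestamp:
```sh
# list restores and their status
velero restore get
# inspect a specific restore
velero restore describe <RESTORE-NAME>
```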
## Schedule a Regular Backup
Check out the [docs](https://heptio.github.io/velero/v0.11.0/get-started). You
can create a regular schedule, e.g. with:
```sh
$ velero schedule create hc-weekly-backup --schedule="@weekly" --include-namespaces=human-connection
```
Inspect the created backups:
```sh
$ velero schedule get
NAME STATUS CREATED SCHEDULE BACKUP TTL LAST BACKUP SELECTOR
hc-weekly-backup Enabled 2019-05-08 17:51:31 +0200 CEST @weekly 720h0m0s 6s ago <none>
$ velero backup get
NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR
hc-weekly-backup-20190508155132 Completed 2019-05-08 17:51:32 +0200 CEST 29d default <none>
$ velero backup describe hc-weekly-backup-20190508155132 --details
# see if the persistent volumes are backed up
```

View File

@ -0,0 +1,50 @@
# Kubernetes Volume Snapshots
It is possible to back up persistent volumes through volume snapshots. This is
especially handy if you don't want to stop the database to create an [offline
backup](../neo4j-offline-backup/README.md), which would mean downtime.
Kubernetes announced this feature in a [blog post](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/). Please make yourself familiar with it before you continue.
## Create a Volume Snapshot
There is an example in this folder of how you can create a volume snapshot for
the persistent volume claim `neo4j-data-claim`:
```sh
# in folder deployment/volumes/volume-snapshots/
kubectl apply -f snapshot.yaml
```
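After applying the manifest, you can also check the snapshot object itself from the command line; the exact output depends on your snapshot controller:
```sh
kubectl --namespace=human-connection get volumesnapshot neo4j-data-snapshot
```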
If you are on Digital Ocean, the volume snapshot should show up in the Web UI:
![Digital Ocean Web UI showing a volume snapshot](./digital-ocean-volume-snapshots.png)
## Provision a Volume based on a Snapshot
Edit your persistent volume claim configuration and add a `dataSource` pointing
to your volume snapshot. [The blog post](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/) has an example in section "Provision a new volume from a snapshot with
Kubernetes".
There is also an example in this folder of what the configuration could look like.
If you apply the configuration, a new persistent volume claim will be provisioned
with the data from the volume snapshot:
```sh
# in folder deployment/volumes/volume-snapshots/
kubectl apply -f neo4j-data.yaml
```
## Data Consistency Warning
Note that volume snapshots do not guarantee data consistency. Quote from the
[blog post](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/):
> Please note that the alpha release of Kubernetes Snapshot does not provide
> any consistency guarantees. You have to prepare your application (pause
> application, freeze filesystem etc.) before taking the snapshot for data
> consistency.
In the case of Neo4J, this probably means that the enterprise edition is required,
which supports [online backups](https://neo4j.com/docs/operations-manual/current/backup/).
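If you stay on the community edition, one hedged workaround is to briefly scale the database down before taking the snapshot, accepting a short downtime; the deployment name `nitro-neo4j` is taken from this repository's manifests:
```sh
# scale the Neo4J deployment down so no writes happen during the snapshot
kubectl --namespace=human-connection scale deployment nitro-neo4j --replicas=0
kubectl --namespace=human-connection apply -f snapshot.yaml
# scale back up once the snapshot is ready
kubectl --namespace=human-connection scale deployment nitro-neo4j --replicas=1
```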

Binary file not shown.


View File

@ -0,0 +1,18 @@
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: neo4j-data-claim
namespace: human-connection
labels:
app: human-connection
spec:
dataSource:
name: neo4j-data-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi

View File

@ -0,0 +1,10 @@
---
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
name: neo4j-data-snapshot
namespace: human-connection
spec:
source:
name: neo4j-data-claim
kind: PersistentVolumeClaim