Merge pull request #573 from Human-Connection/docs-354-kubernetes_cron_job_for_backups

Docs 354 kubernetes cron job for backups
This commit is contained in:
Robert Schäfer 2019-05-15 18:52:53 +02:00 committed by GitHub
commit 5881a7d5df
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 247 additions and 23 deletions

View File

@ -27,7 +27,10 @@
* [HTTPS](deployment/digital-ocean/https/README.md)
* [Human Connection](deployment/human-connection/README.md)
* [Volumes](deployment/volumes/README.md)
* [Neo4J DB Backup](deployment/backup.md)
* [Neo4J Offline-Backups](deployment/volumes/neo4j-offline-backup/README.md)
* [Volume Snapshots](deployment/volumes/volume-snapshots/README.md)
* [Reclaim Policy](deployment/volumes/reclaim-policy/README.md)
* [Velero](deployment/volumes/velero/README.md)
* [Legacy Migration](deployment/legacy-migration/README.md)
* [Feature Specification](cypress/features.md)
* [Code of conduct](CODE_OF_CONDUCT.md)

View File

@ -17,6 +17,8 @@
human-connection.org/selector: deployment-human-connection-backend
template:
metadata:
annotations:
backup.velero.io/backup-volumes: uploads
labels:
human-connection.org/commit: COMMIT
human-connection.org/selector: deployment-human-connection-backend

View File

@ -15,6 +15,8 @@
human-connection.org/selector: deployment-human-connection-neo4j
template:
metadata:
annotations:
backup.velero.io/backup-volumes: neo4j-data
labels:
human-connection.org/selector: deployment-human-connection-neo4j
name: nitro-neo4j

View File

@ -7,7 +7,13 @@ At the moment, the application needs two persistent volumes:
As a matter of precaution, the persistent volume claims that set up these volumes
live in a separate folder. You don't want to accidentally lose all your data in
your database by running `kubectl delete -f human-connection/`, do you?
your database by running
```sh
kubectl delete -f human-connection/
```
or do you?
## Create Persistent Volume Claims
@ -19,24 +25,12 @@ persistentvolumeclaim/neo4j-data-claim created
persistentvolumeclaim/uploads-claim created
```
## Change Reclaim Policy
## Backup and Restore
We recommend changing the `ReclaimPolicy` so that if you delete the persistent
volume claims, the associated volumes will be released, not deleted:
```sh
$ kubectl --namespace=human-connection get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 1Gi RWO Delete Bound human-connection/neo4j-data-claim do-block-storage 4m24s
pvc-bd208086-66d0-11e9-be52-ba9c337f4551 2Gi RWO Delete Bound human-connection/uploads-claim do-block-storage 4m12s
```
Get the volume id from above, then change `ReclaimPolicy` with:
```sh
kubectl patch pv <VOLUME-ID> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# in the above example
kubectl patch pv pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pvc-bd208086-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```
We tested a couple of options for disaster recovery in Kubernetes. First,
there is the [offline backup strategy](./neo4j-offline-backup/README.md) of the
Neo4J community edition, which you can also run on a local installation.
Kubernetes also offers so-called [volume snapshots](./volume-snapshots/README.md).
Changing the [reclaim policy](./reclaim-policy/README.md) of your persistent
volumes can be an additional safety measure. Finally, there is also a
Kubernetes-specific disaster recovery tool called [Velero](./velero/README.md).

View File

@ -23,7 +23,10 @@ So, all we have to do is edit the kubernetes deployment of our Neo4J database
and set a custom `command` every time we have to carry out tasks like backup,
restore, seed etc.
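As a rough sketch of what setting such a custom `command` could look like (the deployment name `nitro-neo4j` is taken from this repository's manifests, and the idle command is only an illustration, not the prescribed procedure):
```sh
# keep the Neo4J container idle so maintenance commands can run against the data volume
kubectl --namespace=human-connection patch deployment nitro-neo4j --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["tail", "-f", "/dev/null"]}]'
```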
{% hint style="info" %} TODO: implement maintenance mode {% endhint %}
{% hint style="info" %}
TODO: implement maintenance mode
{% endhint %}
First bring the application into maintenance mode to ensure there are no
database connections left and nobody can access the application.

View File

@ -0,0 +1,30 @@
# Change Reclaim Policy
We recommend changing the `ReclaimPolicy` so that if you delete the persistent
volume claims, the associated volumes will be released, not deleted.
This procedure is optional and an additional safety measure. It might prevent
you from losing data if you accidentally delete the namespace and the persistent
volumes along with it.
```sh
$ kubectl --namespace=human-connection get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 1Gi RWO Delete Bound human-connection/neo4j-data-claim do-block-storage 4m24s
pvc-bd208086-66d0-11e9-be52-ba9c337f4551 2Gi RWO Delete Bound human-connection/uploads-claim do-block-storage 4m12s
```
Get the volume id from above, then change `ReclaimPolicy` with:
```sh
kubectl patch pv <VOLUME-ID> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# in the above example
kubectl patch pv pvc-bd02a715-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv pvc-bd208086-66d0-11e9-be52-ba9c337f4551 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```
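To verify that the change took effect, list the volumes again and check the `RECLAIM POLICY` column:
```sh
kubectl --namespace=human-connection get pv
```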
Given that you changed the reclaim policy as described above, you should be able
to create a persistent volume claim based on the content of a volume snapshot. See
the general Kubernetes documentation [here](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/)
and our specific documentation for snapshots [here](../volume-snapshots/README.md).
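For reference, the relevant part of such a persistent volume claim is the `dataSource` block; the full example lives in `deployment/volumes/volume-snapshots/neo4j-data.yaml`:
```yaml
spec:
  dataSource:
    name: neo4j-data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```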

View File

@ -0,0 +1,112 @@
# Velero
{% hint style="danger" %}
I tried Velero and it did not work reliably all the time. Sometimes the
Kubernetes cluster crashed during recovery or data was not fully recovered.
Feel free to test it out and update this documentation once you feel that it's
working reliably. It is very likely that Digital Ocean had some bugs when I
tried out the steps below.
{% endhint %}
We use [velero](https://github.com/heptio/velero) for on-premise backups. We
tested version `v0.11.0`; you can find the
documentation [here](https://heptio.github.io/velero/v0.11.0/).
Our Kubernetes configuration adds some annotations to pods. The annotations
define the important persistent volumes that need to be backed up. Velero will
pick them up and store the volume backups in the same cluster, but in another
namespace, `velero`.
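For reference, this is what the annotation looks like in the backend deployment template changed in this pull request; Velero's restic integration backs up the volumes listed there:
```yaml
template:
  metadata:
    annotations:
      backup.velero.io/backup-volumes: uploads
```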
## Prerequisites
You have to install the `velero` binary on your computer and get a tarball of
the release. We use `v0.11.0`, so visit the
[release](https://github.com/heptio/velero/releases/tag/v0.11.0) page, then
download and extract the archive for your platform, e.g. [velero-v0.11.0-linux-amd64.tar.gz](https://github.com/heptio/velero/releases/download/v0.11.0/velero-v0.11.0-linux-amd64.tar.gz).
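A minimal sketch of the download and installation, assuming Linux on amd64 and that the archive unpacks into a directory named `velero-v0.11.0-linux-amd64` (adjust names and paths for your platform):
```sh
# download and extract the release archive
curl -LO https://github.com/heptio/velero/releases/download/v0.11.0/velero-v0.11.0-linux-amd64.tar.gz
tar -xzf velero-v0.11.0-linux-amd64.tar.gz
# put the binary somewhere on your PATH; the extracted folder also contains the config/ manifests used below
sudo mv velero-v0.11.0-linux-amd64/velero /usr/local/bin/
```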
## Setup Velero Namespace
Follow their [getting started](https://heptio.github.io/velero/v0.11.0/get-started)
instructions to setup the Velero namespace. We use
[Minio](https://docs.min.io/docs/deploy-minio-on-kubernetes) and
[restic](https://github.com/restic/restic), so check out Velero's instructions
how to setup [restic](https://heptio.github.io/velero/v0.11.0/restic):
```sh
# run from the extracted folder of the tarball
$ kubectl apply -f config/common/00-prereqs.yaml
$ kubectl apply -f config/minio/
```
Once completed, you should see the new namespace in your Kubernetes dashboard.
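From the command line, a quick check could look like this, assuming the default namespace name `velero`:
```sh
# verify that the velero namespace and its pods exist
kubectl get namespace velero
kubectl --namespace=velero get pods
```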
## Manually Create an On-Premise Backup
When you create your deployments for Human Connection, the required annotations
should already be in place. So when you create a backup of the namespace
`human-connection`:
```sh
$ velero backup create hc-backup --include-namespaces=human-connection
```
That should back up your persistent volumes, too. When you run:
```sh
$ velero backup describe hc-backup --details
```
You should see the persistent volumes at the end of the log:
```
....
Restic Backups:
Completed:
human-connection/nitro-backend-5b6dd96d6b-q77n6: uploads
human-connection/nitro-neo4j-686d768598-z2vhh: neo4j-data
```
## Simulate a Disaster
Feel free to test whether you lose any data when you simulate a disaster and try
to restore the namespace from the backup:
```sh
$ kubectl delete namespace human-connection
```
Wait until the wrongdoing has completed, then:
```sh
$ velero restore create --from-backup hc-backup
```
Now, I keep my fingers crossed that everything comes back again. If not, I feel
very sorry for you.
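To follow the restore from the command line as well, something like this should work; Velero generates the restore name from the backup name plus a timestamp:
```sh
# list restores and their status
velero restore get
# inspect a specific restore
velero restore describe <RESTORE-NAME>
```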
## Schedule a Regular Backup
Check out the [docs](https://heptio.github.io/velero/v0.11.0/get-started). You
can create a regular schedule, e.g. with:
```sh
$ velero schedule create hc-weekly-backup --schedule="@weekly" --include-namespaces=human-connection
```
Inspect the created backups:
```sh
$ velero schedule get
NAME STATUS CREATED SCHEDULE BACKUP TTL LAST BACKUP SELECTOR
hc-weekly-backup Enabled 2019-05-08 17:51:31 +0200 CEST @weekly 720h0m0s 6s ago <none>
$ velero backup get
NAME STATUS CREATED EXPIRES STORAGE LOCATION SELECTOR
hc-weekly-backup-20190508155132 Completed 2019-05-08 17:51:32 +0200 CEST 29d default <none>
$ velero backup describe hc-weekly-backup-20190508155132 --details
# see if the persistent volumes are backed up
```

View File

@ -0,0 +1,50 @@
# Kubernetes Volume Snapshots
It is possible to back up persistent volumes through volume snapshots. This is
especially handy if you don't want to stop the database to create an [offline
backup](../neo4j-offline-backup/README.md), which would mean downtime.
Kubernetes announced this feature in a [blog post](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/). Please make yourself familiar with it before you continue.
## Create a Volume Snapshot
There is an example in this folder of how you can create a volume snapshot for
the persistent volume claim `neo4j-data-claim`:
```sh
# in folder deployment/volumes/volume-snapshots/
kubectl apply -f snapshot.yaml
```
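After applying the manifest, you can also check the snapshot object itself from the command line; the exact output depends on your snapshot controller:
```sh
kubectl --namespace=human-connection get volumesnapshot neo4j-data-snapshot
```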
If you are on Digital Ocean, the volume snapshot should show up in the Web UI:
![Digital Ocean Web UI showing a volume snapshot](./digital-ocean-volume-snapshots.png)
## Provision a Volume based on a Snapshot
Edit your persistent volume claim configuration and add a `dataSource` pointing
to your volume snapshot. [The blog post](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/) has an example in section "Provision a new volume from a snapshot with
Kubernetes".
There is also an example in this folder of what the configuration could look like.
If you apply the configuration, a new persistent volume claim will be provisioned
with the data from the volume snapshot:
```sh
# in folder deployment/volumes/volume-snapshots/
kubectl apply -f neo4j-data.yaml
```
## Data Consistency Warning
Note that volume snapshots do not guarantee data consistency. Quote from the
[blog post](https://kubernetes.io/blog/2018/10/09/introducing-volume-snapshot-alpha-for-kubernetes/):
> Please note that the alpha release of Kubernetes Snapshot does not provide
> any consistency guarantees. You have to prepare your application (pause
> application, freeze filesystem etc.) before taking the snapshot for data
> consistency.
In the case of Neo4J, this probably means that the enterprise edition is required,
which supports [online backups](https://neo4j.com/docs/operations-manual/current/backup/).
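If you stay on the community edition, one hedged workaround is to briefly scale the database down before taking the snapshot, accepting a short downtime; the deployment name `nitro-neo4j` is taken from this repository's manifests:
```sh
# scale the Neo4J deployment down so no writes happen during the snapshot
kubectl --namespace=human-connection scale deployment nitro-neo4j --replicas=0
kubectl --namespace=human-connection apply -f snapshot.yaml
# scale back up once the snapshot is ready
kubectl --namespace=human-connection scale deployment nitro-neo4j --replicas=1
```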

Binary file not shown.


View File

@ -0,0 +1,18 @@
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: neo4j-data-claim
namespace: human-connection
labels:
app: human-connection
spec:
dataSource:
name: neo4j-data-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi

View File

@ -0,0 +1,10 @@
---
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot
metadata:
name: neo4j-data-snapshot
namespace: human-connection
spec:
source:
name: neo4j-data-claim
kind: PersistentVolumeClaim