MongoDB scripts
This README explains how to directly access the production or staging database for backup or query purposes.
Backup script
The backup script is intended to run as a cron job or as a one-off command from your laptop. It opens an SSH tunnel to the remote host and dumps the MongoDB database onto your local machine. For this to work, your public SSH key must be installed on the remote machine.
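The tunnel-and-dump approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the local port `27018`, the remote MongoDB port `27017`, the archive naming, and the helper names `archive_name`, `run`, and `backup` are all hypothetical. The variables are the ones listed under Usage.

```shell
#!/usr/bin/env bash
# Sketch only: the real backup script may differ.

LOCAL_PORT="${LOCAL_PORT:-27018}"   # assumed local end of the SSH tunnel

# Build a dated archive name, e.g. human-connection-dump_2018-11-21.archive
archive_name() {
  echo "human-connection-dump_$(date +%F).archive"
}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

backup() {
  local archive
  archive="$(archive_name)"
  # Forward a local port to the (assumed) MongoDB port on the remote host;
  # -f sends ssh to the background, -N runs no remote command.
  run ssh -f -N -L "${LOCAL_PORT}:localhost:27017" "${SSH_USERNAME}@${SSH_HOST}"
  # Dump through the tunnel into a gzip-compressed archive on this machine.
  run mongodump --host localhost --port "${LOCAL_PORT}" \
    --username "${MONGODB_USERNAME}" --password "${MONGODB_PASSWORD}" \
    --db "${MONGODB_DATABASE}" --gzip --archive="${archive}"
}
```

Calling `backup` with `DRY_RUN=1` set prints the two commands instead of running them, which is a cheap way to check the tunnel target before touching a real database. Note that the dry run also prints the MongoDB password, so avoid it in shared terminals or logs.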
Usage
All parameters must be supplied as environment variables:
| Name | Required |
|---|---|
| SSH_USERNAME | yes |
| SSH_HOST | yes |
| MONGODB_USERNAME | yes |
| MONGODB_PASSWORD | yes |
| MONGODB_DATABASE | yes |
| NEO4J_USER | yes |
| NEO4J_PASSWORD | yes |
| OUTPUT | no |
| GPG_PASSWORD | no |
If you set GPG_PASSWORD, the resulting archive will be encrypted (symmetrically, with the given passphrase).
This is recommended for data security reasons if you dump the database onto your personal laptop.
After exporting these environment variables in your bash session, run:
./import-legacy-db.sh
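Before running the script, it can help to confirm that nothing required is missing. The sketch below exports placeholder values and then checks the required names from the table above. All values are made up, and `missing_env` is a hypothetical helper that is not part of the script.

```shell
# Placeholder values; replace them with your real credentials.
export SSH_USERNAME="deploy"
export SSH_HOST="db.example.org"
export MONGODB_USERNAME="backup"
export MONGODB_PASSWORD="change-me"
export MONGODB_DATABASE="hc_api"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="change-me"

# Hypothetical helper: print the name of every required variable
# that is unset or empty, one per line.
missing_env() {
  for name in SSH_USERNAME SSH_HOST MONGODB_USERNAME MONGODB_PASSWORD \
              MONGODB_DATABASE NEO4J_USER NEO4J_PASSWORD; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

if [ -z "$(missing_env)" ]; then
  echo "All required variables are set."
else
  echo "Missing required variables:" >&2
  missing_env >&2
fi
```

Keeping the check separate from the exports means the same helper can be reused from a cron wrapper, where the variables usually come from a protected file rather than an interactive shell.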
Import into your local MongoDB (optional)
Run the following, adjusting the file name to match your dump:
mongorestore --gzip --archive=human-connection-dump_2018-11-21.archive
If you previously encrypted your dump, run:
gpg --decrypt human-connection-dump_2018-11-21.archive.gpg | mongorestore --gzip --archive
Query remote MongoDB
Unlike the backup script, querying the database is expected to be done interactively and on demand. We therefore suggest using a tool like MongoDB Compass to query MongoDB through an SSH tunnel. Compass can export a collection as a .csv file, which you can then process further with a CSV tool like q.
Suggested workflow
Read the MongoDB Compass documentation on how to connect to the remote MongoDB database through SSH. As with the backup script above, you will need all the credentials and a public SSH key on the server.
Once you have a connection, use the MongoDB Compass query bar to query for the desired data. You can export the result as .json or .csv.
Once you have the .csv file on your machine, use standard SQL queries through the command-line tool q to process the data further.
For example:
q "SELECT email FROM ./invites.csv INTERSECT SELECT email FROM ./emails.csv" -H --delimiter=,
The q website explains its usage fairly well.
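To try the INTERSECT query above without real exports, the following sketch creates two tiny sample files and runs the same command against them. The addresses are invented for illustration, and the query is only executed if q is installed on your machine.

```shell
# Create two small sample files; the addresses are made up.
cat > invites.csv <<'EOF'
email
alice@example.org
bob@example.org
EOF

cat > emails.csv <<'EOF'
email
bob@example.org
carol@example.org
EOF

# -H treats the first line of each file as a header row,
# --delimiter=, tells q that the fields are comma-separated.
if command -v q >/dev/null 2>&1; then
  q "SELECT email FROM ./invites.csv INTERSECT SELECT email FROM ./emails.csv" -H --delimiter=,
fi
```

With these files, the query should list only the address that appears in both, which makes it easy to confirm the tool is set up correctly before pointing it at a real export.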