Robert Schäfer 57a6b259eb Copy remote-dump.sh as a starting point
Just added the two environment variables for neo4j.
2019-01-16 02:10:42 +01:00

MongoDB scripts

This README explains how to directly access the production or staging database for backup or query purposes.

Backup script

The backup script is intended to be run as a cron job or as a single command from your laptop. It opens an SSH tunnel to the remote host and dumps the MongoDB database to your machine. For the tunnel to work, your public SSH key must be installed on the remote machine.
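The tunnel-and-dump flow described above can be sketched roughly as follows. This is not the actual script, only an illustration under stated assumptions: the local port 27018, the control-socket path, the helper names `dump_filename` and `remote_dump`, the `human-connection` file prefix (borrowed from the restore examples further down), and the reading of OUTPUT as a target directory are all hypothetical.

```shell
# Sketch only: the local port and the control-socket path are assumptions,
# not values taken from the actual script.
LOCAL_PORT=27018
CONTROL_SOCKET="/tmp/mongo-tunnel-$$.sock"

# Build an archive name following the pattern used in the restore examples.
# $1: name prefix, $2: date (defaults to today)
dump_filename() {
  printf '%s-dump_%s.archive\n' "$1" "${2:-$(date +%F)}"
}

remote_dump() {
  # Assumption: OUTPUT names a target directory; default to the current one.
  output_file="${OUTPUT:-.}/$(dump_filename human-connection)"

  # Open a background tunnel: local LOCAL_PORT -> port 27017 on the remote host.
  ssh -M -S "$CONTROL_SOCKET" -f -N \
      -o ExitOnForwardFailure=yes \
      -L "${LOCAL_PORT}:localhost:27017" "${SSH_USERNAME}@${SSH_HOST}" || return 1

  # Dump through the tunnel into a gzip-compressed archive on this machine.
  mongodump --host localhost --port "$LOCAL_PORT" \
      --username "$MONGODB_USERNAME" --password "$MONGODB_PASSWORD" \
      --db "$MONGODB_DATABASE" --gzip --archive="$output_file"
  status=$?

  # Close the tunnel whether or not the dump succeeded.
  ssh -S "$CONTROL_SOCKET" -O exit "${SSH_USERNAME}@${SSH_HOST}"
  return "$status"
}
```

Nothing runs until `remote_dump` is called. Using an SSH control socket is one way to close the tunnel reliably afterwards; the real script may handle this differently.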

Usage

All parameters must be supplied as environment variables:

Name               Required
SSH_USERNAME       yes
SSH_HOST           yes
MONGODB_USERNAME   yes
MONGODB_PASSWORD   yes
MONGODB_DATABASE   yes
NEO4J_USER         yes
NEO4J_PASSWORD     yes
OUTPUT             no
GPG_PASSWORD       no

If you set GPG_PASSWORD, the resulting archive will be encrypted symmetrically with the given passphrase. This is recommended when you dump the database to your personal laptop, so that the data is not stored there unencrypted.
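As a rough illustration of the optional encryption step, the following sketch encrypts an archive only when GPG_PASSWORD is set. The function name `maybe_encrypt` is hypothetical, and the `--pinentry-mode loopback` flag assumes GnuPG 2.1 or newer; the actual script may call gpg differently.

```shell
# Encrypt the archive given as $1 when GPG_PASSWORD is set.
# Prints the name of the file that should be kept as the backup.
maybe_encrypt() {
  archive="$1"

  # No passphrase configured: leave the archive as it is.
  if [ -z "${GPG_PASSWORD:-}" ]; then
    printf '%s\n' "$archive"
    return 0
  fi

  # Pass the passphrase on stdin so it does not show up in the process list.
  printf '%s' "$GPG_PASSWORD" | gpg --batch --yes --pinentry-mode loopback \
      --passphrase-fd 0 --symmetric --output "${archive}.gpg" "$archive" || return 1

  # The unencrypted archive is left in place; remove it yourself if desired.
  printf '%s\n' "${archive}.gpg"
}
```

The resulting `.gpg` file is what `gpg --decrypt` expects in the restore instructions.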

After exporting these environment variables in your bash session, run:

./import-legacy-db.sh
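Because every parameter comes from the environment, a missing variable is easy to overlook. A small pre-flight check along these lines may help; the function name `check_required_vars` is hypothetical and not part of the script.

```shell
# Report every required variable that is unset or empty.
# Returns 0 when all are present, 1 otherwise.
check_required_vars() {
  missing=0
  for name in SSH_USERNAME SSH_HOST MONGODB_USERNAME MONGODB_PASSWORD \
              MONGODB_DATABASE NEO4J_USER NEO4J_PASSWORD; do
    # The names come from the fixed list above, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the function defined in your shell, `check_required_vars && ./import-legacy-db.sh` starts the script only when all required variables are present.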

Import into your local MongoDB (optional)

Run (but change the file name accordingly):

mongorestore --gzip --archive=human-connection-dump_2018-11-21.archive

If you previously encrypted your dump, run:

gpg --decrypt human-connection-dump_2018-11-21.archive.gpg | mongorestore --gzip --archive

Query remote MongoDB

In contrast to the backup script, querying the database is expected to happen interactively and on demand. We therefore suggest a tool like MongoDB Compass to query the database through an SSH tunnel. Compass can export a collection as a .csv file, which you can then process further with a CSV tool like q.

Suggested workflow

Read the MongoDB Compass documentation on how to connect to the remote MongoDB database through SSH. You will need the same credentials as for the backup script above, and your public SSH key must be on the server.

Once you have a connection, use the MongoDB Compass query bar to query for the desired data. You can export the result as .json or .csv.

Once you have the .csv file on your machine, use standard SQL queries through the command line tool q to further process the data.

For example:

q "SELECT email FROM ./invites.csv INTERSECT SELECT email FROM ./emails.csv" -H --delimiter=,

The q website explains the usage fairly well.
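To try the example query without touching production data, you can create two small CSV files by hand. The sketch below uses made-up addresses and, as a cross-check that works without q installed, computes the same intersection with the standard tools `sort` and `comm`. The function name `common_emails` is hypothetical, and the approach only works for single-column files with a header row.

```shell
# Made-up sample data, only to show the expected CSV shape: a header row
# with an "email" column, which is what the -H flag of q relies on.
workdir=$(mktemp -d)
printf 'email\nalice@example.org\nbob@example.org\n' > "$workdir/invites.csv"
printf 'email\nbob@example.org\ncarol@example.org\n' > "$workdir/emails.csv"

# Print the values that occur in both single-column CSV files.
# $1, $2: CSV files with one header line each
common_emails() {
  left=$(mktemp)
  right=$(mktemp)
  # Drop the header line, then sort and de-duplicate, as comm needs sorted input.
  tail -n +2 "$1" | sort -u > "$left"
  tail -n +2 "$2" | sort -u > "$right"
  # -12 suppresses lines unique to either file, leaving the intersection.
  comm -12 "$left" "$right"
  rm -f "$left" "$right"
}

common_emails "$workdir/invites.csv" "$workdir/emails.csv"
```

Running the q command from the example inside `$workdir` should list the same address, `bob@example.org`, since it is the only one present in both files.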