MongoDB on compute nodes of Shaheen#

On this page, we explore how to launch a MongoDB server on a Shaheen compute node and then connect to it interactively from another Shaheen compute node.

The server is submitted as a batch job to SLURM and can run for no more than 24 hours.

We will use mongo from a Singularity container. A Singularity definition file (def file) is available for creating a mongo image. The image can be built only on an Ibex compute node, because Shaheen does not support creating images from a Singularity definition file. You therefore need access to Ibex for this step, which only has to be done once.

Ibex Jobscript to create mongo image file#

Note

For this step, you need to be able to run a job on the Ibex cluster.

First clone the git repository containing the Singularity definition file to create the image:

cd $HOME
git clone https://github.com/singularityhub/mongo.git

The jobscript looks as follows:

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1

module load singularity
cd $HOME/mongo
export XDG_RUNTIME_DIR=$HOME
singularity build --fakeroot mongo.sif Singularity

A successful completion should result in creation of a singularity image file mongo.sif.

Working with the image:#

Since the /home filesystem is shared between Ibex and Shaheen, you can access this image file from a Shaheen login node as well.

Let’s switch back to Shaheen. Copy or move your image file mongo.sif to somewhere in your project directory. For example, I have copied mine to /project/k01/shaima0d/mongo_test. MongoDB requires a writable space to do some housekeeping for the database. We need to create a directory, e.g. data, and pass it as the database path when launching the database instance.

cd /project/k01/shaima0d/mongo_test
mkdir data

Here is what the database launch jobscript looks like:

#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --nodes=1


module load singularity
#Grep the IP address for Cray Aries interface
export IP_ADDR=$(ifconfig ipogif0 | grep inet | cut -d " " -f 10)
echo IP_ADDRESS=$IP_ADDR

cd /project/k01/shaima0d/mongo_test
singularity run ./mongo.sif mongod --noauth --bind_ip localhost,${IP_ADDR} --dbpath=$PWD/data
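
The `cut -d " " -f 10` step in the jobscript above depends on the exact column layout of the ifconfig output. As a rough sketch of a less layout-sensitive alternative, the address can be matched with a regular expression. The helper name `extract_inet_addr` and the sample text are illustrative assumptions, not output captured on Shaheen.

```python
import re

# Matches "inet 10.0.0.1" (newer net-tools) and "inet addr:10.0.0.1" (older net-tools).
# The whitespace required after "inet" keeps "inet6" lines from matching.
_INET_RE = re.compile(r"\binet\s+(?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})\b")


def extract_inet_addr(ifconfig_text):
    """Return the first IPv4 address found in ifconfig output, or None."""
    match = _INET_RE.search(ifconfig_text)
    return match.group(1) if match else None


# Hypothetical sample in the newer net-tools layout (assumed, not real Shaheen output).
SAMPLE = """ipogif0: flags=4291<UP,BROADCAST,RUNNING,NOARP,MULTICAST>  mtu 65520
        inet 10.109.197.13  netmask 255.252.0.0  broadcast 10.111.255.255
"""

if __name__ == "__main__":
    print(extract_inet_addr(SAMPLE))
```

Returning None when nothing matches lets the caller stop early instead of starting mongod with an empty bind address.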

The above jobscript launches a mongod daemon that listens on localhost and on the Aries interface of the compute node. Note that --noauth disables authentication, so any client that can reach this address can connect. Now we are ready to connect a client. Note the IP address printed in the slurm-xxxxxx.out file of the database server job, e.g. 10.109.197.13.

Load the singularity module and request an interactive session with the srun command:

module load singularity
srun --time=00:30:00 --nodes=1 --pty singularity exec -B $PWD/data:/data/db $PWD/container/mongo.sif mongosh --host 10.109.197.13

After the resources are allocated, you will see output like the following:

srun: job 22878644 queued and waiting for resources
srun: job 22878644 has been allocated resources
Current Mongosh Log ID:     6374e621ec85174afd042398
Connecting to:              mongodb://10.109.197.13:27017/?directConnection=true&appName=mongosh+1.6.0
Using MongoDB:              6.0.2
Using Mongosh:              1.6.0

For mongosh info see: https://docs.mongodb.com/mongodb-shell/


To help improve our products, anonymous usage data is collected and sent to MongoDB periodically (https://www.mongodb.com/legal/privacy-policy).
You can opt-out by running the disableTelemetry() command.

------
The server generated these startup warnings when booting
2022-11-16T16:16:36.057+03:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'
2022-11-16T16:16:36.058+03:00: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. We suggest setting it to 'never'
2022-11-16T16:16:36.058+03:00: vm.max_map_count is too low
------

------
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
------

test>

Note

Since the mongod launched in the jobscript listens on the Cray Aries interconnect, the client must run on a compute node to reach the IP address of the node where the server is running. The client cannot connect from a login node.

The legacy mongo shell is no longer included in server packages as of MongoDB 6.0. It has been superseded by mongosh: https://www.mongodb.com/docs/mongodb-shell/

Using pymongo Driver#

Once the MongoDB server is running using mongod as described above, we can interact with it using the pymongo driver, the de facto way to use MongoDB from within Python.

The following is an example Python script:

# Import the pymongo client plus helpers for arguments and timestamps
import datetime
import sys

from pymongo import MongoClient


# Return a handle to a database; MongoDB creates it lazily on the first write
def create_db(client, db_name="mydatabase"):
    return client[db_name]


# Return a handle to a collection in a particular database (also created lazily)
def create_collection(db, coll_name="mycol"):
    return db[coll_name]


if __name__ == "__main__":
    # The host (IP address) of the MongoDB server is the first command-line argument
    host = sys.argv[1]
    client = MongoClient(host)
    db = create_db(client, "myFirstDB")
    col = create_collection(db, "myFirstCol")

    # The entry we wish to add to our collection in the database
    post = {"author": "Mike",
            "text": "My first blog post!",
            "tags": ["mongodb", "python", "pymongo"],
            "date": datetime.datetime.now(datetime.timezone.utc)}
    post_id = col.insert_one(post).inserted_id

    print("post ID inserted: ", post_id)
    print("Existing databases:", client.list_database_names())
    print("Existing collections:", db.list_collection_names())

The above test can run in a separate jobscript. We need the IP address of the node where our MongoDB server is running. This is printed in the first line of the slurm output file of the MongoDB server job we submitted. For example, our server is running on IP address 10.128.0.95.
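As a small sketch of how that address could be picked up programmatically rather than copied by hand, the helper below scans the text of the server job's output for the `IP_ADDRESS=` line that the launch jobscript echoes. The function names `find_server_ip` and `read_server_ip` are hypothetical, and the code assumes the line has exactly the form `IP_ADDRESS=<address>`.

```python
def find_server_ip(slurm_output_text):
    """Return the value of the first 'IP_ADDRESS=' line, or None if absent or empty."""
    prefix = "IP_ADDRESS="
    for line in slurm_output_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            return value or None
    return None


def read_server_ip(path):
    """Read a slurm output file (path supplied by the caller) and extract the address."""
    with open(path) as fh:
        return find_server_ip(fh.read())
```

The returned string can then be passed to the client script in place of a hand-typed address.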

The following jobscript can be submitted to run the client, which launches the pymongo Python test.

#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=64
#SBATCH --hint=multithread

module load intelpython3
module load pymongo
DB_HOST=${1}
python pymongo_test.py ${DB_HOST}

Save the jobscript as client.slurm and submit it with the server IP address as its argument:

sbatch client.slurm 10.128.0.95

Output looks as follows:

post ID inserted:  60268b1ab9e7406373dd8442
Existing databases: ['admin', 'config', 'local', 'myFirstDB']
Existing collections: ['myFirstCol']
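
Before inserting data, it can be useful to confirm that the server is reachable from the client node. The following is a minimal sketch, assuming pymongo is available and the server listens on the default port 27017; the helper names `mongo_uri` and `server_is_reachable` and the 5 second timeout are illustrative choices rather than part of the setup above.

```python
def mongo_uri(host, port=27017):
    """Build a MongoDB connection URI; 27017 is the default mongod port."""
    return f"mongodb://{host}:{port}/"


def server_is_reachable(host, port=27017, timeout_ms=5000):
    """Return True if the server answers a ping within timeout_ms milliseconds."""
    # Imported lazily so that mongo_uri works even where pymongo is not installed.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(mongo_uri(host, port), serverSelectionTimeoutMS=timeout_ms)
    try:
        # "ping" is a lightweight command that forces server selection.
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

Calling `server_is_reachable("10.128.0.95")` from a compute node should return True while the server job is running. From a login node, or after the server job has ended, it is expected to return False once the timeout expires.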

Using mongodump#

To create a binary dump of the database and/or a collection, one can run mongodump as a separate job. The following example jobscript creates a gzip archive of an existing collection. It is assumed here that a MongoDB server is already running as described above, and that the IP address of its host is 10.128.0.95.

#!/bin/bash

#SBATCH --time=01:00:00
#SBATCH --nodes=1

module load singularity

srun singularity run ./mongo.sif mongodump --host=10.128.0.95 --db myFirstDB --collection myFirstCol --gzip --archive > data_$(date "+%Y-%m-%d").gz

This should create a file data_2021-02-24.gz (date may vary) in your present working directory.
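If the dump is driven from Python rather than from the shell, the same command can be assembled as an argument list. This is only a sketch: the helper names `archive_name` and `mongodump_command` are hypothetical, and it uses mongodump's `--archive=<file>` form to write the archive directly, instead of the stdout redirection used in the jobscript.

```python
import datetime
import shlex


def archive_name(day):
    """Return the dated archive file name, e.g. data_2021-02-24.gz."""
    return f"data_{day:%Y-%m-%d}.gz"


def mongodump_command(host, db, collection, day):
    """Assemble the mongodump argument list for one collection."""
    return [
        "mongodump",
        f"--host={host}",
        "--db", db,
        "--collection", collection,
        "--gzip",
        f"--archive={archive_name(day)}",
    ]


if __name__ == "__main__":
    cmd = mongodump_command("10.128.0.95", "myFirstDB", "myFirstCol",
                            datetime.date.today())
    # Print the command so it can be inspected or pasted after "singularity run ./mongo.sif".
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting issues if it is later handed to `subprocess.run`.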

You can also run the above command interactively in a salloc session:

> salloc
> module load singularity
> srun --pty singularity shell ./mongo.sif
> mongodump --host=10.128.0.95 --db myFirstDB --collection myFirstCol --gzip --archive > data_$(date "+%Y-%m-%d").gz
> exit
> exit

Using mongorestore#

Once you have a compressed dump of your database/collection, you can copy it to a remote destination and restore your database there. For instance, if we have a compressed file data_2021-02-24.gz, I can scp it to my workstation/laptop where I have a MongoDB installation and restore it there.

Note

I installed mongodb in a conda environment.

First, I start a new mongodb server on my local machine on localhost:

mkdir -p $PWD/data/db
mongod --dbpath ./data/db

Now we can start the restoration step in a new terminal:

gzip -d data_2021-02-24.gz
mongorestore --archive=data_2021-02-24
2021-02-24T17:26:59.010+0300        preparing collections to restore from
2021-02-24T17:26:59.019+0300        reading metadata for myFirstDB.myFirstCol from archive 'data_2021-02-24'
2021-02-24T17:26:59.084+0300        restoring myFirstDB.myFirstCol from archive 'data_2021-02-24'
2021-02-24T17:26:59.087+0300        no indexes to restore
2021-02-24T17:26:59.087+0300        finished restoring myFirstDB.myFirstCol (1 document)
2021-02-24T17:26:59.087+0300        done

Let us see if it has been ingested in our mongodb server:

mongo
MongoDB shell version v4.0.3
connecting to: mongodb://127.0.0.1:27017
Implicit session: session { "id" : UUID("de99ba6c-77e1-44d4-9c58-49af3270b992") }
MongoDB server version: 4.0.3
.......
> dbs
2021-02-24T17:27:26.160+0300 E QUERY    [js] ReferenceError: dbs is not defined :
@(shell):1:1
> db
test
> show dbs
admin      0.000GB
config     0.000GB
local      0.000GB
myFirstDB  0.000GB
> use myFirstDB
switched to db myFirstDB
> show collections
myFirstCol