Query a MongoDB Database using PyMongo!

Overview We will discuss how you can query a MongoDB database using the PyMongo library.

We will cover basic aggregation operations in MongoDB.

  Introduction Following the global spread of the internet, we are generating data at an unprecedented pace now.

Because performing any kind of analysis would require us to collect/query the necessary data from the database, it becomes of utmost importance that we choose the right tool to query the data.

Consequently, we cannot imagine ourselves using SQL to work with this volume of data as every single query will be expensive.

Query a MongoDB database using PyMongo This is precisely where MongoDB comes in.

MongoDB is an unstructured database that stores data in the form of documents.

Additionally, MongoDB is able to handle huge volumes of data very efficiently and is the most widely used NoSQL database as it offers rich query language and flexible and fast access to data.

In this article, we will see multiple examples of how to query a MongoDB database using PyMongo.

Further, we will see how to use the comparison operators and logical operators, regular expression, and the basics of aggregation pipelines.

This article is in continuation of the MongoDB in Python Tutorial for Beginners where we covered the challenges of unstructured databases, installation steps, and basic operations of MongoDB.

So if you are a complete beginner in MongoDB, I would recommend you go through that article first.

  Table of Contents What is PyMongo? Installation Steps Insert the data into the database Query the database Filter based on Fields Filter based on Comparison Operators Filter based on Logical Operators Regular Expressions Aggregation Pipelines End Notes   What is PyMongo? PyMongo is a Python library that enables us to connect with MongoDB.

Moreover, this is the most recommended way to work with MongoDB and Python.

Also, we have chosen Python to interact with MongoDB because it is one of the most commonly used and considerably powerful languages for data science.

PyMongo allows us to retrieve the data with dictionary-like syntax.

In case you are a beginner in Python, I will recommend you to enroll in this free course: Introduction to Python.

  Installation Steps Installing PyMongo is simple and straightforward.

Here, I am assuming you already have Python 3 and MongoDB installed.

The following command will help you to install PyMongo: pip3 install pymongo   Insert the data into the Database Now let us set things up before query a MongoDB database using PyMongo.

Firstly we will insert the data into the database.

The following steps will help you in this- Importing the libraries and connect to mongo client Start the MongoDB server on your machine.

I am assuming it is running file at localhost:27017.

Let’s start by importing some of the libraries that we are going to use.

By default, MongoDB server runs at port 27017 on the local machine.

Then, we will connect to the MongoDB client using the pymongo library.

Then get the database instance of the database sample_db.

In case it doesn’t exist, MongoDB will create one for you.

View the code on Gist.

  Create the Collections from the JSON files We will use the data from a meal delivery company that operates in multiple cities.

Additionally, they have various fulfillment centers in these cities for dispatching meal orders to their customers.

You can download the data and the code here.

weekly_demand: id: unique ID for each document week: Week Number center_id: Unique ID for fulfillment center meal_id: Unique ID for Meal checkout_price: Final price including discount, taxes & delivery charges base_price: Base price of the meal emailer_for_promotion: Emailer sent for promotion of meal homepage_featured: Meal featured at the homepage num_orders: (Target) Orders Count meal_info: meal_id: Unique ID for the meal category: Type of meal (beverages/snacks/soups….

) cuisine: Meal cuisine (Indian/Italian/…) Then we will create two collections in the sample_db database: View the code on Gist.

Insert data into Collections Now, the data we have is in the JSON format.

Then we will get the instance of the collection, read the data file, and insert the data using the insert_many function.

View the code on Gist.

  Finally, we have 456548 documents in the weekly_demand_collection and 51 documents in the meal info collection.

Now, let’s have a look at one document from each of these collections.

weekly_demand_collection View the code on Gist.

meal_info_collection View the code on Gist.

Now, our data is ready.

Let’s move on to query this database.

  Query the Database We can query a MongoDB database using PyMonfo with the find function to get all the results satisfying the given condition and also using the find_one function which will return only one result satisfying the condition.

The following is the syntax of the find and find_one: your_collection.

find( {<< query >>} , { << fields>>} ) You can query the database using the following filtering techniques- Filter based on fields For instance, you have hundreds of fields and you want to see only a few of them.

You can do that by just putting all the required field names with value 1.

For example- View the code on Gist.

  On the other hand, if you want to discard a few fields only from the complete document you can put the field names equal to 0.

Therefore, only those fields will be excluded.

Please note that you cannot use a combination of 1s and 0s to get the fields.

Either all should be one or all should be zero.

View the code on Gist.

Filter with a condition Now, in this section, we will provide a condition in the first braces and fields to discard in the second.

Consequently, it will return the first document with center_id is equal to 55 and meal_id is equal to 1885 and will also discard the fields _id and week.

  View the code on Gist.

  Filter based on Comparison Operators The following are the nine comparison operators in the MongoDB.

NAME DESCRIPTION $eq It will match the values that are equal to a specified value.

$gt It will match the values that are greater than a specified value.

$gte It will match all the values that are greater than or equal to a specified value.

$in It will match any of the values specified in an array.

$lt It will match all the values that are less than a specified value.

$lte It will match all the values that are less than or equal to a specified value.

$ne It will match all the values that are not equal to a specified value.

$nin It will match none of the values specified in an array.

The following are some examples of using these comparison operators- Equal to and Not Equal to We will find all the documents where center_id is equal to 55 and homepage_featured is not equal to 0.

Since we are going to use the find function, it will return the cursor for that command.

Further, use a for loop to iterate through the results of the query.

View the code on Gist.

In the List and Not in the List For instance, you need to match an element with multiple elements.

In that case, instead of using the $eq operator multiple times, we can use the $in operator.

We will try to find out all the documents where center_id is either 24 or 11.

View the code on Gist.

Then we, find all the documents where center_id is not present in the specified list.

The following query will return all the documents where center_id is not 24 and also not 11.

View the code on Gist.

Less than and Greater than Now, let us find all the documents where center_id is 55 and checkout_price is greater than 100 and less than 200.

Use the following syntax for this- View the code on Gist.

Filter based on Logical Operator NAME DESCRIPTION $and It will join query clauses with a logical AND and returns all documents that match both the conditions.

$not It will invert the effect of a query and returns documents that do not match the query expression.

$nor It will join the query clauses with a logical NOR and return all documents that fail to match the clauses.

$or It will join the query clauses with a logical OR and return all documents that match the conditions of either clause.

The following examples illustrate the use of logical operators- AND Operator The following query will return all the documents where the center_id is equal to 11 and also the meal_id is not equal to 1778.

The subqueries for the and operator will come inside a list.

View the code on Gist.

OR Operator The following query will return all the documents where either the center_id is equal to 11 or the meal_id is either 1207 or 2707.

Futher, the subqueries for the or operator will come inside a list.

View the code on Gist.

  Filter with Regular Expressions Regular Expressions are of great use when you have text fields and you want to search for documents with a specific pattern.

In case you want to learn more about regular expressions, I highly recommend you go through this article: Beginners Tutorial for Regular Expressions in Python.

It can be used with the operator $regex and we can provide value to the operator for the regex pattern to matc.

We will use the meal_info collection for this query and then we will find out the documents where the cuisine field starts with character C.

View the code on Gist.

  Let’s take another example of regular expressions.

We will find out all the documents in which category starts from the character “S” and the cuisine ends with “ian“.

View the code on Gist.

Aggregation Pipelines MongoDB’s aggregation pipeline provides a framework to perform a series of data transformations on a dataset.

The following is its syntax: your_collection.

aggregate( [ { <stage1> }, { <stage2> },.

] ) The first stage takes the complete set of documents as input, and from there each subsequent stage takes the previous transformation’s result set as input to the next stage and produces the output.

There are around 10 transformations available in the MongoDB aggregate out of which we will see $match and $group in this article.

We will discuss each of the transformations in detail in the upcoming article of the MongoDB.

For example, in the first stage, we will match the documents where center_id is equal to 11 and in the next stage, it will count the number of documents with center_id equal to 11.

Note that we have assigned the $count operator the value equalling total_rows in the second stage which is the name of the field that we want in the output.

View the code on Gist.

Now, let’s take another example where the first stage is the same as before i.

e.

center_id is equal to 11 and in the second stage, we want to calculate the average of the field num_orders for the center_id 11 and the unique meal_ids for the center_id 11.

View the code on Gist.

  End Notes The unfathomable amount of data generated today makes it necessary to find better alternatives like this to query data.

To summarize, in this article, we learned how to query a MongoDB database using PyMongo.

Also, we understood how to apply various filters as per the situation required.

In case you want to learn more about querying data, I recommend the following course – Structured Query Language (SQL) for Data Science In the upcoming article, we will discuss the aggregation pipelines in detail.

I encourage you to try things on your own and share your experiences in the comments section.

Additionally, if you face any problem with any of the above concepts, feel free to ask me in the comments below.

      You can also read this article on our Mobile APP Related Articles (adsbygoogle = window.

adsbygoogle || []).

push({});.

Leave a Reply