Friday, April 17, 2020

How To Query MongoDB Documents In Python

Introduction

When you need to find information in a MongoDB document, querying the right way matters. You want the right data as fast as possible so you can make the right decisions. There are several ways to find MongoDB documents, so it helps to know which one to use in order to save time. For example, you might use a multiple-condition query to find documents with PyMongo, and sorting the returned data may be helpful as well. Learn these techniques and more in this tutorial, which shows you how to query MongoDB documents in Python.

Prerequisites

  • MongoDB – Verify that it is installed and running. To do this, open a terminal window and use the command mongo --version. Alternatively, type mongo in the terminal window and press the Return key.
mongo --version
  • Python 3 – Confirm that you have it installed and that it runs. >NOTE: Python 2 has reached its end of life, so download and install Python 3 instead.
  • MongoDB Python driver – Install it with the package manager pip3.
pip3 install pymongo

Create a Python script directory

Get the environment ready to use the MongoDB server with Python. Make a directory for the script and its related files.
sudo mkdir python-mongo
>NOTE: For this tutorial, we’ll use a project example and call it python-mongo.

Make MongoDB class instances after importing PyMongo library

  • Import the MongoClient class from the PyMongo library.
  • Create a new client instance.
from pymongo import MongoClient

# A MongoDB instance for Python
mongo_client = MongoClient('mongodb://localhost:27017')

Make a PyMongo MongoDB database instance

  • Use this instance to access the database and, through it, the collection whose documents you will query.
# A database instance
db = mongo_client.some_database

Make a PyMongo MongoDB collection instance

  • Make a collection instance to get the database’s collection ready to query.
# A collection instance
col = db.some_collection
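Attribute-style access such as mongo_client.some_database only works when the name is a valid Python identifier. PyMongo also supports bracket-style access, which the full script at the end of this tutorial uses. The helper below is a minimal sketch of that pattern. get_collection() is a hypothetical name, and the nested dictionary in the usage lines is only a stand-in for a real MongoClient so that the sketch runs without a server.

```python
def get_collection(client, db_name, col_name):
    """Return a collection using bracket-style access.

    With a real MongoClient, client[db_name][col_name] is equivalent to
    attribute access, and it also works for names that contain
    characters such as hyphens.
    """
    return client[db_name][col_name]


# Stand-in for a MongoClient (an assumption for illustration only):
# plain dictionaries support the same bracket lookups.
fake_client = {"employees": {"new-hires": "collection placeholder"}}
col = get_collection(fake_client, "employees", "new-hires")
print(col)
```

With a real client, the same call would look like get_collection(mongo_client, "some_database", "some_collection").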

A basic example of a MongoDB collection PyMongo query

  • In the example below, a Python dictionary is passed to the find() method. The dictionary is the query filter, and the API call uses it to query the documents of a MongoDB collection.
result = col.find( {"some field": "FIND ME!"} )
  • The call returns a pymongo.cursor.Cursor object, which yields the matching documents.

Use the “$regex” operator to locate documents with a partial string match

  • To query documents by partial string match, make a nested dictionary. It has two parts: (1) the outer dictionary’s key is the field you’re querying, and (2) the inner dictionary’s key is "$regex", with the pattern as its value.
# A query dictionary object $regex
regex_query = { "field example" : {"$regex" : "PARTIAL STRING MATCH"} }
  • Next, pass the nested dictionary to the find() method.
result = col.find( regex_query )
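When the partial string comes from user input, characters such as . or * would be read as regular-expression syntax. The following is a minimal sketch, assuming you want a literal substring match: the hypothetical helper build_regex_query() escapes the fragment with Python's re.escape() and optionally adds MongoDB's $options flag "i" for a case-insensitive match.

```python
import re


def build_regex_query(field, fragment, ignore_case=False):
    """Build a $regex query that matches `fragment` literally within `field`."""
    condition = {"$regex": re.escape(fragment)}
    if ignore_case:
        # "i" is MongoDB's case-insensitive regex option
        condition["$options"] = "i"
    return {field: condition}


regex_query = build_regex_query("field example", "v1.0", ignore_case=True)
print(regex_query)
```

The resulting dictionary is passed to col.find() in the same way as a hand-written one.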

Use a Python iterator to print each document returned by MongoDB

  • You can retrieve every document by iterating over the result object as you would a Python list.
for doc in result:
    print (doc)
  • Expect output like this for every document returned:
{'_id': ObjectId('5ced203bd3c4454072c57040'), 'field 1': 'value', 'field 2': 'value'}

Get the values and fields of MongoDB documents

Obtain the fields and values of MongoDB documents. Each document that the iterator returns from the pymongo.cursor.Cursor object is a Python dictionary, so you can read any of its fields, including _id, by key.

How to access “_id” field of a MongoDB Python document

  • Each iterated document is a dictionary, so use its "_id" key to obtain the document’s _id.
# iterate the returned Cursor object
for doc in result:
    # print the document's _id to terminal
    print ("doc _id:", doc["_id"])
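Other fields are read the same way as _id. Documents in one collection need not share the same fields, so a sketch like the one below uses dict.get() to avoid a KeyError when a field is absent. The describe() helper and the sample documents are hypothetical stand-ins for what a Cursor would yield.

```python
def describe(doc, field, default="(missing)"):
    """Return a one-line summary of a single field of a document."""
    # dict.get() returns the default instead of raising KeyError
    return "{}: {}={}".format(doc["_id"], field, doc.get(field, default))


# sample documents standing in for a Cursor's results (illustration only)
sample_docs = [
    {"_id": 1, "field 1": "value"},
    {"_id": 2},
]
lines = [describe(doc, "field 1") for doc in sample_docs]
for line in lines:
    print(line)
```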

Inspect the Cursor object’s attributes with its __dict__ attribute

  • See every instance attribute of the Cursor object that was returned from the API call.
# the API call's results can show you all of the attributes of the Cursor object
print ("Cursor attr:", result.__dict__)
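Note that __dict__ holds only an object's instance attributes, while methods live on the class. Python's built-in dir() lists both. The sketch below shows the difference on a small stand-in class. FakeCursor is hypothetical and is not part of PyMongo; the same two calls should work on a real Cursor object.

```python
class FakeCursor:
    """A tiny stand-in class used only to illustrate introspection."""

    def __init__(self):
        self.batch_size = 0

    def sort(self, key):
        return self


cursor = FakeCursor()

# vars(obj) returns obj.__dict__: instance attributes only
instance_attrs = vars(cursor)
print("attributes:", instance_attrs)

# dir(obj) also includes methods defined on the class
public_names = [name for name in dir(cursor) if not name.startswith("_")]
print("public names:", public_names)
```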

Pass the collection’s find() call to the Python list() function to return a list of MongoDB documents

Pass the entire collection_object.find() API call into the list() function to have it return a list of all the documents in the collection that matched the query, along with their data.
Here’s a loop that goes over all of the document dictionaries returned in the list and prints each document’s _id:
# build a Python dictionary for query
query = {"search this field" : "find this value"}

documents = list(col.find(query))
for doc in documents:
    print ("\ndoc _id:", doc["_id"])
Screenshot of IDLE for Python using the list function to get all the documents in a MongoDB collection
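One reason to build a list is that a Cursor is consumed as you iterate it, so a second loop over the same Cursor yields nothing unless it is rewound, whereas a list can be traversed as often as needed. The sketch below imitates that behavior with a Python generator as a stand-in for the Cursor (an assumption for illustration; no server is contacted).

```python
def fake_find():
    """Stand-in for col.find(): yields documents once, like a Cursor."""
    yield {"_id": 1}
    yield {"_id": 2}


result = fake_find()
first_pass = [doc["_id"] for doc in result]
second_pass = [doc["_id"] for doc in result]  # already consumed
print(first_pass, second_pass)

# storing the documents in a list allows repeated passes
documents = list(fake_find())
ids_once = [doc["_id"] for doc in documents]
ids_again = [doc["_id"] for doc in documents]
print(ids_once, ids_again)
```

Keep in mind that list() loads every matching document into memory, so it suits small result sets best.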

Obtain the number of documents returned by the MongoDB API query

  • Keep in mind that the Cursor object’s count() method is deprecated in later 3.x versions of PyMongo and was removed in PyMongo 4.0, where calling it raises an error. In older versions, count() was all you needed to get the number of documents after a find() call returned a Cursor object.
# a query request result
result = col.find(some_query)

# the count() method
print ("number of docs:", result.count())
  • In those older versions, the count() method returned an integer for the number of documents matched by the query.

This example shows how the old count() method raises a DeprecationWarning

The Cursor object’s count() method is deprecated since v3.7 of PyMongo
Screenshot of Python's IDLE making a regex query to MongoDB and returning a result
  • There are two ways to get the document count. Call the count_documents() method on the collection object, which makes another request to the server, or count the documents while iterating over the result object with Python’s built-in enumerate().

How to use the count_documents() method

  • Call count_documents() on the collection instance and pass it the query dictionary.
doc_count = col.count_documents(some_query)
print ("doc_count:", doc_count)
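Because count_documents() is a separate request from find(), passing both the same query dictionary keeps the count and the results based on the same filter. A minimal sketch of that pattern follows. count_and_fetch() is a hypothetical helper, and StubCollection is a stand-in written only so the example runs without a MongoDB server (it supports equality matches only).

```python
def count_and_fetch(col, query):
    """Return (count, documents) for one query dictionary."""
    total = col.count_documents(query)
    documents = list(col.find(query))
    return total, documents


class StubCollection:
    """Minimal stand-in for a PyMongo collection (illustration only)."""

    def __init__(self, docs):
        self._docs = docs

    def _matches(self, doc, query):
        return all(doc.get(key) == value for key, value in query.items())

    def find(self, query):
        return iter([d for d in self._docs if self._matches(d, query)])

    def count_documents(self, query):
        return sum(1 for d in self._docs if self._matches(d, query))


col = StubCollection([{"_id": 1, "sex": "male"}, {"_id": 2, "sex": "female"}])
doc_count, docs = count_and_fetch(col, {"sex": "male"})
print("doc_count:", doc_count)
```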

How to iterate and count documents

  • Do this in one of two ways: keep your own running count as you iterate over the returned result object, or let enumerate() number the documents for you.
# start=1 makes num a running count; it stays 0 if nothing matched
num = 0
for num, doc in enumerate(result, start=1):
    print ("num:", num, "-- _id:", doc["_id"])
print ("total documents:", num)
Screenshot of Python using enumerate to iterate over MongoDB documents in a Cursor object

Use datetime library in Python to query PyMongo ranges

  • Some find() queries, such as date range queries, can be built from Python datetime objects.

How datetime objects in PyMongo are utilized

  • PyMongo can send datetime objects to the MongoDB server as native dates. This tutorial assumes the dates in the collection are stored as strings, so each datetime object is converted to a string before it is passed in the query.

When to import the datetime library

  • At the start of the script, import the datetime library like this:
import json, datetime
>NOTE: Python’s built-in ValueError exception is raised if you pass invalid month or day values to datetime.datetime(). Don’t pass a month integer over 12 or a day integer over 31 (or past the end of the given month). If the error appears, check those values first.
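The ValueError mentioned in the note can be caught before any query is built. The sketch below is one possible way to do that, with a hypothetical helper named make_start_date() that returns None for an invalid date instead of letting the exception propagate.

```python
import datetime


def make_start_date(year, month, day):
    """Return a datetime for the given parts, or None if they are invalid."""
    try:
        return datetime.datetime(year, month, day)
    except ValueError as err:
        print("invalid date:", err)
        return None


valid = make_start_date(2015, 4, 12)
invalid = make_start_date(2015, 13, 1)  # month 13 does not exist
print(valid, invalid)
```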

Pass a Python datetime string to MongoDB

  • First, make a datetime object with the datetime.datetime() constructor.
  • Then convert it to a string.
  • Next, pass it to MongoDB.

Create a new datetime string in Python to pass to the MongoDB request

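Use the datetime.datetime() constructor to create a datetime object for the query passed to PyMongo’s find() method. Convert the datetime object to a string explicitly before adding it to the query dictionary. The query_year, query_month, and query_day values are integers parsed from a request, as in the full script at the end of this tutorial: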
# use the parsed HTTP data and create a new datetime object
start_date = datetime.datetime(query_year, query_month, query_day)

# convert it explicitly to a datestring
start_date = str(start_date)

About the $gte and $gt MongoDB query selectors

  • The MongoDB query selector $gte means greater than or equal to, and $gt means greater than. The selector is the key of the inner dictionary in a nested Python dictionary.
query = { "join_date": {"$gt": start_date} }
  • The example below passes the query dictionary directly into the find() method:
# call the find() method to make a date range request
result = col.find({"join_date": {"$gt": start_date}}).sort("name")
  • The sort() method sorts the documents returned based on a particular field. In the above example, it’s the "name" field.
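A range usually has an upper bound as well. The inner dictionary can hold more than one selector, so the sketch below combines $gte with $lt, MongoDB's less-than selector. build_date_range_query() is a hypothetical helper, and the string comparison relies on the same assumption as the rest of this tutorial: the collection stores its dates as strings in the format that str() produces.

```python
import datetime


def build_date_range_query(field, start, end):
    """Match documents whose `field` is >= start and < end."""
    # str() gives e.g. '2015-04-12 00:00:00'; this assumes the collection
    # stores its dates as strings in that same format
    return {field: {"$gte": str(start), "$lt": str(end)}}


query = build_date_range_query(
    "join_date",
    datetime.datetime(2015, 4, 12),
    datetime.datetime(2016, 1, 1),
)
print(query)
```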

How to use the PyMongo MongoDB query operators

In order to use the $and, $nor, and $or MongoDB logical query operators, the following rules apply.
  • The outer dictionary key must be one of the query operators $and, $nor, or $or.
  • In addition, the condition dictionaries must be in a Python list, and that list must be the value of the key.
  • The $not operator works differently: it does not take a list, and instead wraps the operator expression of a single field.

Multiple conditions queries and PyMongo requests

  • A multiple-condition query with the $and operator can be written across several lines or on a single line.
  • Below is a multiple-line, multiple-condition query with the $and operator:
query = {
    "$and":
        [
            {
                "field 1": "MUST MATCH THIS"
            },
            {
                "field 2": "..AND THIS!"
            }
        ]
    }
  • Below is the same multiple-condition query on just one line:
query = {'$and': [{'field 1': 'MUST MATCH THIS'}, {'field 2': '..AND THIS!'}]}
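For simple equality conditions on different fields, MongoDB also treats a flat dictionary as an implicit AND, so the explicit $and form is mainly needed for more complex cases, such as several conditions on the same field. The sketch below builds the explicit forms with two hypothetical helpers, all_of() and any_of(), and shows the flat equivalent.

```python
def all_of(*conditions):
    """Combine condition dictionaries with the $and operator."""
    return {"$and": list(conditions)}


def any_of(*conditions):
    """Combine condition dictionaries with the $or operator."""
    return {"$or": list(conditions)}


explicit_and = all_of({"field 1": "MUST MATCH THIS"}, {"field 2": "..AND THIS!"})

# flat dictionary: an implicit AND of the same two conditions
implicit_and = {"field 1": "MUST MATCH THIS", "field 2": "..AND THIS!"}

either = any_of({"field 1": "MUST MATCH THIS"}, {"field 2": "..AND THIS!"})
print(explicit_and)
print(either)
```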

Multiple condition query and the find() method

  • There are no special steps for this one. Pass this type of PyMongo query to find() like the others. See below:
result = col.find( query ).sort("field 1")
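The sort() call above sorts in ascending order by one field. It also accepts a list of (field, direction) pairs for sorting on several fields. The sketch below builds such a list with a hypothetical helper, build_sort_spec(). The leading "-" for descending order is a convention of this sketch only, not something PyMongo understands.

```python
# PyMongo defines these as pymongo.ASCENDING and pymongo.DESCENDING;
# the integer values are repeated here so the sketch runs without PyMongo
ASCENDING = 1
DESCENDING = -1


def build_sort_spec(*fields):
    """Turn names like 'sex' or '-age' into (field, direction) pairs."""
    spec = []
    for name in fields:
        if name.startswith("-"):
            spec.append((name[1:], DESCENDING))
        else:
            spec.append((name, ASCENDING))
    return spec


sort_spec = build_sort_spec("sex", "-age")
print(sort_spec)
# with a real collection: result = col.find(query).sort(sort_spec)
```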

Image example of Python IDLE environment making a PyMongo multiple-condition query request using the find() method and $or query operator

Python idle screenshot making a query request to MongoDB to find one condition $or another

Conclusion

This tutorial explained how to query MongoDB documents in Python. You learned how to use the find() method to make a query request on a MongoDB collection. You also found out about the $gt and $gte query selectors for locating documents with the find() method. In addition, you saw how to import the datetime library to build date range queries. We also went over multiple-condition querying and sorting the returned results. These techniques should help you in your current and upcoming MongoDB projects.
For further reference, turn to the examples shown below for querying MongoDB documents in a Python script.

#!/usr/bin/env python3
#-*- coding: utf-8 -*-

# import the MongoClient class
from pymongo import MongoClient

# build a new client instance for MongoDB
mongo_client = MongoClient('localhost', 27017)

# get the employees database
db = mongo_client['employees']

# get the newly hired people
col = db['new_hires']


"""
LOGICAL OPERATORS FOR
MULTIPLE QUERY CONDITIONS
"""

# NOTE: each assignment below replaces the previous one,
# so only the last query ($or) is sent by find()

# find any document of an employee who is male AND is 26
multiple_param = { "$and": [ {"sex": "male"}, {"age": "26"}]}

# find any document of an employee who is NEITHER female NOR 22
multiple_param = { "$nor": [ {"sex": "female"}, {"age": "22"}]}

# find any document of an employee who is male OR is 25 years old
multiple_param = { "$or": [ {"sex": "male"}, {"age": "25"}]}

# call the find() method to make a query request and sort order
result = col.find( multiple_param ).sort("sex")

# get all of the attributes of the Cursor object returned by API
print ("Cursor attr:", result.__dict__, "\n\n")

# iterate the result Cursor object with documents
for num, doc in enumerate(result):
    print (num, "--", doc, "\n")


"""
DATE RANGE QUERY FROM
HTTP REQUEST MESSAGE PARAMETERS
"""

# import the JSON and datetime libraries
import json, datetime

# simulate an HTTP POST request string
http_request_post = '{"user_query": {"year": 2015, "month": 4, "day": 12}}'

# convert the HTTP message into a JSON object
json_date = json.loads(http_request_post)["user_query"]

# parse out the year, month, and day from the JSON object
query_year = json_date["year"]
query_month = json_date["month"]
query_day = json_date["day"]

# create a new datetime() object from the parsed HTTP data
start_date = datetime.datetime(query_year, query_month, query_day)

# you have to explicitly cast the datetime object as a string
# to convert the object to an actual datestring
start_date = str(start_date)

# call the find() method to make a date range request
# "$gt" means "greater than"
result = col.find({"join_date": {"$gt": start_date}}).sort("name")

# iterate over the result Cursor object with enumerate()
for num, doc in enumerate(result):
    print (num, "--", doc, "\n")


"""
ITERATE OVER THE DOCUMENTS
RETURNED BY A QUERY
"""

# get a MongoDB database instance
db = mongo_client['some_database']

# get a collection instance from the database
col = db['some_collection']

# declare a new dictionary for the query body
some_query = {"field to search" : "MUST MATCH THIS"}

# use Python's list() function to return the
# Cursor object's list of MongoDB documents
documents = list(col.find( some_query ))

# iterate over the document dictionaries in the list
for doc in documents:
    # access each document's "_id" key
    print ("\ndoc _id:", doc["_id"])

# print the length of the returned list
print ("total documents found:", len(documents))
