Django Queryset select_related | Boosting Query Performance

In this post, You’ll learn Django Queryset select_related() | Boosting Query Performance.

Table Of Contents

Introduction

Suppose we have a situation were 2 tables must be joined and get common object by selecting both tables for this purpose select_related() Queryset method is used. This method on the background does SQL INNER JOIN on foreign key or one-to-one field. Purpose of using this query is it enhances query performance.

Describe Case

To understand the select_related() method. We’ll use the below model as an example.
In models.py

class Country(models.Model):
    country_id = models.AutoField(primary_key=True)
    country_short = models.CharField(max_length=3, blank=True, null=True)
    country_name = models.CharField(max_length=150, blank=True, null=True)
    country_code = models.IntegerField(blank=True, null=True)

    class Meta:
        managed = True
        db_table = 'country'

class State(models.Model):
    state_id = models.AutoField(primary_key=True)
    state_name = models.CharField(max_length=30, blank=True, null=True)
    country = models.ForeignKey(Country,related_name="country",on_delete=models.CASCADE,null=True)
    is_deleted = models.IntegerField(default=0)

    class Meta:
        managed = True
        db_table = 'states'

We have created two models Country and State. The Country models have 246 rows and State models have 4,120 states information mapped with their respective
country model.

We’ll not be going in detail of creating of model our topic is rather focussed on letting developers get the benefit of select_related Queryset.

Retrieving data using all()

Firstly we’ll be using commonly used Queryset ways of getting data. In below example, we have a function loadStates() inside this function.
we have defined start_time so that we can measure the timing of Execution.

We know that Django querysets are lazy which means that calling State.objects.all() will not return all those 4,120 rows from the table.

Querysets only hit the database only when they are printed or evaluated which means that when we loop through each state on every loop the queryset calls the database and on each loop we also call the Country of the respective State object.

Now let us look at how much time it took for this process.

>>> from query_sets.models import Country, State
>>> import time
>>> def loadStates():
...     start_time = time.time()
...     states = State.objects.all()
...     state_list=[]
...     for state in states:
...         state_list.append({
...             "state_id" : state.state_id,
...             "state_name" : state.state_name,
...             "country_name" : state.country.country_name,
...         })
...     print("Execution time --- %s seconds ---" % (time.time() - start_time))
... 
>>> loadStates()
Execution time --- 4.360718250274658 seconds ---

OMG! it took a whopping 4.3607… seconds which executed 4,120 rows.

Now we’ll remove "country_name" : state.country.country_name, and check for execution time.

>>> from query_sets.models import Country, State
>>> import time
    def loadStates():
        start_time = time.time()
        states = State.objects.all()
        state_list=[]
        for state in states:
            state_list.append({
                "state_id" : state.state_id,
                "state_name" : state.state_name,
            })
        print("Execution time --- %s seconds ---" % (time.time() - start_time))

>>> loadStates()
Execution time --- 0.11328530311584473 seconds ---

GREAT! it took 0.113285… seconds which executed 4,120 rows. Now the reason for the increase in performance is that on
every iteration it does not hit the database.

Now we know take handling large data in the above method is the very inefficient task. Now we will explore how we can boost the performance of our query.
For this, we’ll be using select_related() Queryset method.

First, let us try the same query using select_related and we’ll answer what exactly select_related does.

>>> from query_sets.models import Country, State
>>> import time
>>> def loadStates():
        start_time = time.time()
        states = State.objects.select_related('country')
        state_list=[]
        for state in states:
            state_list.append({
                "state_id" : state.state_id,
                "state_name" : state.state_name,
                "country_name" : state.country.country_name,
            })
        print("Execution time --- %s seconds ---" % (time.time() - start_time))

>>> loadStates()
Execution time --- 0.18429231643676758 seconds ---

OOH, CAN YOU BELIEVE THIS! it took 0.18429… seconds to execute 4,120 rows.
Congratulations We have increased execution faster than the above queries.

Basically what select_related() queryset does is it creates joins with the specified column name. This only works with one-to-one or ForeignKey relationship.
By doing this we are reducing the number of hits to the database.

>>> states = State.objects.select_related('country')
>>> print(states.query)
SELECT `states`.`state_id`, `states`.`state_name`, `states`.`country_id`, `states`.`is_deleted`, `country`.`country_id`, `country`.`country_short`, `country`.`country_name`, `country`.`country_code` FROM `states` LEFT OUTER JOIN `country` ON (`states`.`country_id` = `country`.`country_id`)

We can make use of .query attribute to print out raw SQL used by Django to get results.

Note

For official documentation on the select_related() method visit Django Docs.

Conclusions

So we have come to end on this post on Django Queryset select_related() | Boosting Query Performance.
if you have any queries please comment and if you like our post don’t forget to give a thumbs up.

Related Posts

Summary
Review Date
Reviewed Item
Django Queryset select_related | Boosting Query Performance
Author Rating
51star1star1star1star1star
Software Name
Django Web Framework
Software Name
Windows Os, Mac Os, Ubuntu Os
Software Category
Web Development