Practical Tips for Django ORM Database Interaction

Preface

ORM is a powerful data operation tool. For developers, proficiency in using this tool is of great help for daily feature development. This article introduces some common scenarios and ORM tips. If you find this helpful, please like and follow to support me, Thanks♪(・ω・)ノ

If you have any questions or other suggestions, feel free to leave a comment.

PS: If you want to learn about the principles of ORM, you can check out my other article: One Article Is Enough To Understand ORM Frameworks

How to View Database Interaction

Have you ever encountered any of the following problems when developing with ORM tools?

  • You executed a complex query in Django ORM, but the result does not meet expectations, and you want to view the generated SQL query statement for debugging.
  • Some pages load slowly, you suspect this is caused by certain database queries, and you want to find out which queries consume a lot of time.
  • You wrote a complex query, but it returns an empty result set, and you want to confirm whether the query was correctly converted to SQL.
  • A query executed by your application returns an incorrect result, and you need to find which query caused the problem.

At this point, you need a tool to view database interactions.

Using connection.queries

connection.queries is a debugging tool in Django that can be used to view all database queries issued during a request. It can view the SQL statements executed by the current database connection and their corresponding execution time.

PS: connection.queries only takes effect when Debug=True

>>> from django.db import connection
>>> from django.contrib.auth.models import User
>>> User.objects.values('username','id').first()
>>> connection.queries[-1]
{'sql': 'SELECT `auth_user`.`username`, `auth_user`.`id` FROM `auth_user` ORDER BY `auth_user`.`id` ASC  LIMIT 1', 'time': '0.043'}

Applicable to scenarios where the query method cannot be used

>>> from django.contrib.auth.models import User
>>> from django.db.models import Count, Q
>>> print(User.objects.aggregate(
    total_users=Count('id'),
    total_active_users=Count('id', filter=Q(is_active=True)),
).query)
Traceback (most recent call last):
  File "", line 3, in 
AttributeError: 'dict' object has no attribute 'query'

>>> print(User.objects.aggregate(
total_users=Count(‘id’),
total_active_users=Count(‘id’, filter=Q(is_active=True)),
))
{‘total_users’: 232, ‘total_active_users’: 208}

>>> connection.queries[-1]
{‘sql’: ‘SELECT COUNT(auth_user.id) AS total_users, COUNT(CASE WHEN auth_user.is_active = 1 THEN auth_user.id ELSE NULL END) AS total_active_users FROM auth_user’, ‘time’: ‘0.042’}

Using the django-debug-toolbar Tool

For a related introduction, you can refer to this article: https://juejin.cn/post/6844903720304508935?searchId=202405151750017DC9DADF6C0D42A6F54B

The advantage of this tool is that it is powerful, but it is relatively heavier and requires some configuration changes. If you only need to analyze a small number of ORM statements, Django's built-in connection.queries tool can also meet your needs.

Related Object Query Optimization

When querying related objects, using select_related() and prefetch_related() for database queries can reduce additional database interactions and improve performance.

Let's first look at the result before optimization: each traversal of a related object adds an extra database IO.

>>> from django.contrib.auth.models import Permission
>>> from django.db import connection
>>> len(connection.queries)
0
>>> [p.content_type.id for p in Permission.objects.all()[:10]]
[1, 1, 1, 1, 28, 28, 28, 28, 111, 111]
>>> len(connection.queries) # Traversing 10 related objects adds 11 extra database IOs
11

Now compare the effect of using select_related to query related objects: traversing related objects does not add any extra database IO.

>>> [p.content_type.id for p in Permission.objects.select_related("content_type").all()[:10]]
[1, 1, 1, 1, 28, 28, 28, 28, 111, 111]
>>> len(connection.queries) # Traversing 10 related objects only adds 1 extra database IO
12

Similarly, prefetch_related() can also optimize queries for related objects. The main differences between the two methods are:

  1. select_related():

    • Used for one-to-one or many-to-one relationships.
    • Uses SQL's JOIN operation to fetch related objects in a single query.
    • Suitable for scenarios where you need to get the main object and its related objects in one query.
  2. prefetch_related():

    • Used for many-to-many or one-to-many relationships.
    • Uses additional queries to prefetch related objects.

Advanced Aggregation Queries

After Django 2.0, support for applying filter conditions in aggregation functions was added. This usage is very powerful, allowing you to apply more complex conditions in aggregation queries, rather than just simply counting the number of all records.

>>> from django.contrib.auth.models import User
>>> from django.db.models import Count, Q

>>> print(User.objects.aggregate(
total_users=Count(‘id’),
total_active_users=Count(‘id’, filter=Q(is_active=True)),
))
{‘total_users’: 232, ‘total_active_users’: 208}

The SQL equivalent of the above ORM query is as follows:

'SELECT COUNT(`auth_user`.`id`) AS `total_users`, COUNT(CASE WHEN `auth_user`.`is_active` = 1 THEN `auth_user`.`id` ELSE NULL END) AS `total_active_users` FROM `auth_user`'

Named Tuples (namedtuples)

The Alibaba Development Manual states:

[Mandatory] In table queries, never use * as the query field list, you must explicitly specify all required fields. Explanation: 1) Increases the parsing cost of the query analyzer. 2) Adding or removing fields can easily cause inconsistency with the resultMap configuration. 3) Unnecessary fields increase network consumption, especially for text type fields.

Therefore, the following approach is not recommended:

>>> from django.contrib.auth.models import Group
>>> print(Group.objects.all().query)
SELECT `auth_group`.`id`, `auth_group`.`name` FROM `auth_group`

So how do I specify fields for a query?

You can use the values and values_list methods

>>> print(Group.objects.values("name").query)
SELECT `auth_group`.`name` FROM `auth_group`
>>> print(Group.objects.values("name").first())
{'name': 'AI'}
>>> print(Group.objects.values_list("name").first())
('AI',)

The following approach is more recommended: convert query results to namedtuples. It is more convenient than the above two methods, more concise, and has better readability

>>> rst = Group.objects.values_list("name", named=True).first()
>>> rst.name
'AI'

Reference Links

https://juejin.cn/post/6844903558295322638


This is a separate discussion topic split from the original topic at https://juejin.cn/post/7369042628673765427