dstav blog

Django update or create objects in bulk

  • Published on

Django's update_or_create QuerySet method is a convenience method for updating an object or creating a new one based on a set of criteria. This allows you to update an existing object in the database or create a new one if it doesn't already exist, without having to write separate code for both cases.

This is fine when you are working on a single object, but does not apply when you need to update multiple objects at once. From the documentation, we can see that Django offers some bulk methods (in_bulk, bulk_create, and bulk_update) but it is not immediately apparent how to achieve the update_or_create operation.

Motivational example

Let's define an example to show the motivation behind this post. Imagine a toy application where a bookstore keeps a registry for their books. This contains the name of the book (let's assume this to be unique), the price of the book, and the amount of books in stock. The model could be something like:

# models.py
class Book(models.Model):
    name = models.CharField(max_length=100, unique=True)
    price = models.DecimalField(max_digits=6, decimal_places=2)
    stock = models.IntegerField()

Every so often, as the bookstore's web department, we receive a list of books in a CSV file. The list contains both new books that arrived in the bookstore, and existing books with updated information (the price or stock has changed). Here is an example of such a list:

name,price,stock
"Space Odyssey - The journey begins",9.99,20
"Alien Invasion - How to survive ",12.99,30
"Galactic Empire - Rise of the Tyrants",14.99,10
"Interstellar Travel - The possibilities ",11.99,25
"The Lost City of the Cosmos",14.99,5

In order to update the bookstore application we need to build the logic to import this list into the database. So we check the name of the book, if it exists then update the price and its stock numbers, or if it does not exist, create a new entry. We could of course iterate through the list and use update_or_create but that would run a query for every single book, and if the list is long, this process will be wasteful and slow. This is where the need of bulk update_or_create comes in!

PostgreSQL approach

Before we talk about the Django way of doing this, let's first see how to achieve this with PostgreSQL. After some researching, I found ON_CONFLICT clause on INSERT and the documentation says:

The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken. ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative action. ON CONFLICT DO UPDATE updates the existing row that conflicts with the row proposed for insertion as its alternative action.

Now back to our example. Based on the above excerpt, we set the conflict_target as the name column, and the conflict_action to update the price and stock. If we pass the list of books to the INSERT statement, then if a book does not exist, it will run the insert process normally. But if the same name is found, then it will trigger the update to run and therefore the existing book record will be updated.

Implementing in Django

In Django we can utilize the above through the bulk_create method. bulk_create works by passing a list of objects as arguments and it will create them in bulk in the database. However, in Django 4.1, the arguments update_fields and unique_fields were added, which enable the ON CONFLICT functionality of PostgreSQL. In our example, we need to pass a list of the books as objects, and then set name as the unique_fields, and price stock as the update_fields.

The code would look something like this:

import csv
from bookstore.models import Book

with open('books.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    books = [Book(name=row['name'], price=row['price'], stock=row['stock']) for row in reader]

Book.objects.bulk_create(books, update_fields=['price', 'stock'], unique_fields=['name'])

If we check the generated SQL query, we can see that the ON CONFLICT clause is used:

INSERT INTO "bookstore_book" ("name", "price", "stock")
VALUES ('Space Odyssey - The journey begins', 9.99, 20),
       ('Alien Invasion - How to survive ', 12.99, 30),
       ('Galactic Empire - Rise of the Tyrants', 14.99, 10),
       ('Interstellar Travel - The possibilities ', 11.99, 25),
       ('The Lost City of the Cosmos', 14.99, 5) ON CONFLICT ("name") DO
UPDATE
SET "price" = EXCLUDED."price",
    "stock" = EXCLUDED."stock"

Assuming our database is empty, the query will just create the five books.

>>> list(Book.objects.all().values_list())
[(1, 'Space Odyssey - The journey begins', Decimal('9.99'), 20),
 (2, 'Alien Invasion - How to survive ', Decimal('12.99'), 30),
 (3, 'Galactic Empire - Rise of the Tyrants', Decimal('14.99'), 10),
 (4, 'Interstellar Travel - The possibilities ', Decimal('11.99'), 25),
 (5, 'The Lost City of the Cosmos', Decimal('14.99'), 5)]

Let's assume that a new CSV file is received which includes both old books with updated price or stock, and new books.

name,price,stock
"Alien Invasion - How to survive ",13.99,10
"Galactic Empire - Rise of the Tyrants",9.99,5
"Alien Life forms - The Search Continues",15.99,20
"Journey to the center of the Universe",22.99,8

If we execute our code again, we will see that the query will update the existing books, and create the new ones. As we can see, there are 7 books in the database, all with updated price and stock.

>>> list(Book.objects.all().values_list().order_by("id"))
[(1, 'Space Odyssey - The journey begins', Decimal('9.99'), 20),
 (2, 'Alien Invasion - How to survive ', Decimal('13.99'), 10),
 (3, 'Galactic Empire - Rise of the Tyrants', Decimal('9.99'), 5),
 (4, 'Interstellar Travel - The possibilities ', Decimal('11.99'), 25),
 (5, 'The Lost City of the Cosmos', Decimal('14.99'), 5),
 (8, 'Alien Life forms - The Search Continues', Decimal('15.99'), 20),
 (9, 'Journey to the center of the Universe', Decimal('22.99'), 8)]

I hope you find this useful.