Django update or create objects in bulk
Django's update_or_create
QuerySet method is a convenience method for updating an object or creating a new one based on a set of criteria. This allows you to update an existing object in the database or create a new one if it doesn't already exist, without having to write separate code for both cases.
This is fine when you are working on a single object, but does not apply when you need to update multiple objects at once. From the documentation, we can see that Django offers some bulk methods (in_bulk
, bulk_create
, and bulk_update
) but it is not immediately apparent how to achieve the update_or_create
operation.
Motivational example
Let's define an example to show the motivation behind this post. Imagine a toy application where a bookstore keeps a registry for their books. This contains the name of the book (let's assume this to be unique), the price of the book, and the amount of books in stock. The model could be something like:
# models.py
class Book(models.Model):
name = models.CharField(max_length=100, unique=True)
price = models.DecimalField(max_digits=6, decimal_places=2)
stock = models.IntegerField()
Every so often, as the bookstore's web department, we receive a list of books in a CSV file. The list contains both new books that arrived in the bookstore, and existing books with updated information (the price or stock has changed). Here is an example of such a list:
name,price,stock
"Space Odyssey - The journey begins",9.99,20
"Alien Invasion - How to survive ",12.99,30
"Galactic Empire - Rise of the Tyrants",14.99,10
"Interstellar Travel - The possibilities ",11.99,25
"The Lost City of the Cosmos",14.99,5
In order to update the bookstore application we need to build the logic to import this list into the database. So we check the name of the book, if it exists then update the price and its stock numbers, or if it does not exist, create a new entry. We could of course iterate through the list and use update_or_create
but that would run a query for every single book, and if the list is long, this process will be wasteful and slow. This is where the need of bulk update_or_create
comes in!
PostgreSQL approach
Before we talk about the Django way of doing this, let's first see how to achieve this with PostgreSQL. After some researching, I found ON_CONFLICT
clause on INSERT
and the documentation says:
The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken. ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative action. ON CONFLICT DO UPDATE updates the existing row that conflicts with the row proposed for insertion as its alternative action.
Now back to our example. Based on the above excerpt, we set the conflict_target
as the name
column, and the conflict_action
to update the price and stock. If we pass the list of books to the INSERT statement, then if a book does not exist, it will run the insert process normally. But if the same name
is found, then it will trigger the update to run and therefore the existing book record will be updated.
Implementing in Django
In Django we can utilize the above through the bulk_create
method. bulk_create
works by passing a list of objects as arguments and it will create them in bulk in the database. However, in Django 4.1, the arguments update_fields
and unique_fields
were added, which enable the ON CONFLICT functionality of PostgreSQL. In our example, we need to pass a list of the books as objects, and then set name
as the unique_fields
, and price
stock
as the update_fields
.
The code would look something like this:
import csv
from bookstore.models import Book
with open('books.csv', newline='') as csvfile:
reader = csv.DictReader(csvfile)
books = [Book(name=row['name'], price=row['price'], stock=row['stock']) for row in reader]
Book.objects.bulk_create(books, update_fields=['price', 'stock'], unique_fields=['name'])
If we check the generated SQL query, we can see that the ON CONFLICT
clause is used:
INSERT INTO "bookstore_book" ("name", "price", "stock")
VALUES ('Space Odyssey - The journey begins', 9.99, 20),
('Alien Invasion - How to survive ', 12.99, 30),
('Galactic Empire - Rise of the Tyrants', 14.99, 10),
('Interstellar Travel - The possibilities ', 11.99, 25),
('The Lost City of the Cosmos', 14.99, 5) ON CONFLICT ("name") DO
UPDATE
SET "price" = EXCLUDED."price",
"stock" = EXCLUDED."stock"
Assuming our database is empty, the query will just create the five books.
>>> list(Book.objects.all().values_list())
[(1, 'Space Odyssey - The journey begins', Decimal('9.99'), 20),
(2, 'Alien Invasion - How to survive ', Decimal('12.99'), 30),
(3, 'Galactic Empire - Rise of the Tyrants', Decimal('14.99'), 10),
(4, 'Interstellar Travel - The possibilities ', Decimal('11.99'), 25),
(5, 'The Lost City of the Cosmos', Decimal('14.99'), 5)]
Let's assume that a new CSV file is received which includes both old books with updated price or stock, and new books.
name,price,stock
"Alien Invasion - How to survive ",13.99,10
"Galactic Empire - Rise of the Tyrants",9.99,5
"Alien Life forms - The Search Continues",15.99,20
"Journey to the center of the Universe",22.99,8
If we execute our code again, we will see that the query will update the existing books, and create the new ones. As we can see, there are 7 books in the database, all with updated price and stock.
>>> list(Book.objects.all().values_list().order_by("id"))
[(1, 'Space Odyssey - The journey begins', Decimal('9.99'), 20),
(2, 'Alien Invasion - How to survive ', Decimal('13.99'), 10),
(3, 'Galactic Empire - Rise of the Tyrants', Decimal('9.99'), 5),
(4, 'Interstellar Travel - The possibilities ', Decimal('11.99'), 25),
(5, 'The Lost City of the Cosmos', Decimal('14.99'), 5),
(8, 'Alien Life forms - The Search Continues', Decimal('15.99'), 20),
(9, 'Journey to the center of the Universe', Decimal('22.99'), 8)]
I hope you find this useful.