Django has a sophisticated system to trigger certain logic when you save or delete objects, change many-to-many relations, etc. People very often make use of such signals, and for some edge cases these are indeed the only effective solution, but there are only a few cases where using signals is appropriate.
Why is it a problem?
Signals have a variety of problems and unforeseen consequences. In the sections below, we list a few.
Signals can be circumvented
One of the main problems with signals is that signals do not always run. Indeed, the pre_save and post_save signals will not run when we save or update objects in bulk. For example, if we create multiple Post objects with:
Post.objects.bulk_create([
    Post(title='foo'),
    Post(title='bar'),
    Post(title='qux')
])
then the signals do not run. The same happens when you update posts, for example with:
Post.objects.all().update(views=0)
Often people assume that signals will run in that case, and for example perform calculations in the signal handlers: they recalculate a certain field based on the updated values. Since one can update a field without triggering the corresponding signals, this results in an inconsistent value. Signals thus give a false sense of security that the handler will indeed update the object accordingly.
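The distinction can be illustrated with a small plain-Python sketch (not Django code; all names here are made up for the illustration): a per-object save runs the registered handlers, while a bulk path writes fields directly and never calls them.

```python
# Sketch of why bulk operations bypass signal handlers: the per-object
# save() calls each registered handler, the bulk path does not.
post_save_handlers = []

class Post:
    def __init__(self, title, views=0):
        self.title = title
        self.views = views

    def save(self):
        # per-object save: every registered handler runs
        for handler in post_save_handlers:
            handler(self)

def bulk_update(posts, **fields):
    # bulk path: fields are written directly, no handler fires,
    # just like QuerySet.update() issues a single SQL UPDATE
    for post in posts:
        for name, value in fields.items():
            setattr(post, name, value)

seen = []
post_save_handlers.append(lambda post: seen.append(post.title))

post = Post('foo')
post.save()                    # the handler observes this save
bulk_update([post], views=0)   # the handler is never called
```

After this runs, only the single save was observed by the handler, even though the bulk path also changed the object.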
Signals are request unaware
Every now and then people try to implement a signal that needs a request object to perform a certain piece of logic. For example, one might want to send an email to the person that made the HTTP request each time a certain model instance is created.
Django's signal processing however does not capture the request. This makes sense since these signals get fired by models that are created, updated, removed, etc. and models are request unaware as well.
The fact that these are request unaware makes it harder to implement certain behavior. Take for example a signal that updates the owner field of a model object with the currently logged-in user. Strictly speaking it is possible to implement this, for example by inspecting the call stack: if the signal is triggered by a view, eventually one will find a call to that view, and thus one can obtain the request object. Another way to do this would be to implement middleware that keeps track of the user that makes the request, and then uses that, but that introduces a global state antipattern. This means that, while technically there are some ways to make signals request-aware, these solutions are often ugly, and furthermore they fail if, for example, a management command triggers the change.
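For illustration, a minimal sketch of that middleware approach (plain Python, all names hypothetical) makes the hidden global state explicit:

```python
import threading

# The "current user" is stashed in module-level thread-local storage,
# i.e. hidden global state that only exists during a request.
_state = threading.local()

class CurrentUserMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _state.user = request.user          # global, hidden state
        try:
            return self.get_response(request)
        finally:
            _state.user = None              # avoid leaking across requests

def get_current_user():
    # a signal handler would call this to obtain the "current" user;
    # it returns None for management commands, background tasks, etc.
    return getattr(_state, 'user', None)

# demonstration with a fake request object
class FakeRequest:
    user = 'alice'

captured = []
middleware = CurrentUserMiddleware(
    lambda request: captured.append(get_current_user())
)
middleware(FakeRequest())
```

Inside the request the handler sees the user, but outside any request `get_current_user()` silently returns None, which is exactly the fragility the text describes.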
Signals can raise exceptions and break the code flow
If the signals run, for example when we call .save() on a model object, then the handlers will run. Contrary to popular belief, signals do not run asynchronously, but in a synchronous manner: there is a list of functions, and these will all run, one after the other. A second problem is that these handlers might raise an error, and this will thus result in the function that triggered the signal raising that error. Developers often do not take this into account.
If such an error is raised, then eventually the .save() call will raise an error. Even if the developer takes this into account, it is hard to anticipate the consequences: if there are multiple handlers for the same signal, then some of the handlers may have made changes whereas others might not have been invoked. This makes it more complicated to repair the object, since the handlers might already have changed the object partially.
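A plain-Python sketch of synchronous dispatch (hypothetical names, not Django internals) shows the partial-update problem: handlers run one after another, so a raising handler aborts the save while earlier handlers' changes stay in place.

```python
# Handlers run synchronously and without isolation: an exception in
# one handler propagates to the caller and skips all later handlers.
class Signal:
    def __init__(self):
        self.handlers = []

    def send(self, instance):
        for handler in self.handlers:
            handler(instance)  # no isolation: exceptions propagate

signal = Signal()
state = {'counter': 0}

signal.handlers.append(lambda obj: obj.update(counter=1))   # runs
signal.handlers.append(lambda obj: 1 / 0)                   # raises
signal.handlers.append(lambda obj: obj.update(counter=99))  # never runs

failed = False
try:
    signal.send(state)
except ZeroDivisionError:
    failed = True
```

The object ends up in the intermediate state left by the first handler, which is exactly what makes repairing it complicated.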
Signals can result in infinite recursion
It is also rather easy to get stuck in an infinite loop with signals. Take for example a Profile model with a signal that will remove the User if we remove the Profile:
from django.conf import settings
from django.db import models
from django.db.models.signals import pre_delete
from django.dispatch import receiver

class Profile(models.Model):
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE
    )
    # …

@receiver(pre_delete, sender=Profile)
def delete_profile(sender, instance, using, **kwargs):
    instance.user.delete()
If we now remove a Profile, this will get stuck in an infinite loop. Indeed, first we start removing a Profile. This will trigger the signal to run, which will remove the related User object. But since the Profile refers to the User with on_delete=models.CASCADE, Django will then first remove the Profile, again triggering the signal. It is easy to end up with infinite recursion when defining signals, especially if we use signals on two models that are related to each other.
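Stripped of the ORM, the cycle looks like this plain-Python sketch (hypothetical classes), which indeed runs until Python's recursion limit is hit:

```python
# Deleting a Profile deletes its User, and deleting a User cascades
# back into the Profile, so the calls recurse forever.
class User:
    def __init__(self):
        self.profile = Profile(self)

    def delete(self):
        # on_delete=CASCADE: deleting the user deletes the profile
        self.profile.delete()

class Profile:
    def __init__(self, user):
        self.user = user

    def delete(self):
        # the pre_delete signal: also delete the related user
        self.user.delete()

recursed = False
try:
    User().delete()
except RecursionError:
    recursed = True
```

In the Django version the same mutual triggering happens at the database layer, where it manifests as an error or a hang rather than a clean RecursionError.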
Signals run before updating many-to-many relations
The pre_save and post_save signals of an object run immediately before and after the object is saved to the database. If we have a model with a ManyToManyField, then when we create that object and the signals run, the ManyToManyField is not yet populated. This is because a ModelForm first needs to create the object, before that object has a primary key and can thus start populating the many-to-many relation. If we for example have two models Author and Book with a many-to-many relation, and we want to use a signal that counts the number of books an Author has written, then the following signal will not work when we create an Author and the form also specifies the books:
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=Author)
def save_author(sender, instance, raw, using, update_fields, **kwargs):
    instance.num_books = instance.books.count()
Regardless of whether we use a pre_save or post_save signal, at that moment in time instance.books.all() is an empty queryset.
Signals make altering objects less predictable
Even if only one handler is attached to the signal, and that handler can never raise an error, the handler is still often not an elegant solution. Another developer might not be aware of its existence, since it has only a "weak" binding to the model, and thus it makes the effect of saving an object less predictable.
Signals do not run in data migrations
One can construct a data migration [Django-doc]; such a migration could populate a database table, for example:
from django.apps import apps as global_apps
from django.db import migrations
def forwards(apps, schema_editor):
    ModelName = apps.get_model('app_name', 'ModelName')
    ModelName.objects.create(first_name='foo', last_name='Bar')

class Migration(migrations.Migration):
    dependencies = [
        ('app_name', '1234_other_migration'),
    ]

    operations = [
        migrations.RunPython(forwards, migrations.RunPython.noop),
    ]
Here we thus create a ModelName record. We however use a historical model: that means a ModelName with the fields defined on the model at the state after applying all the dependencies, not necessarily the current ModelName model. Regardless of whether that model is equivalent to the current model, it is quite limited. One of the consequences is that signals are not attached to that model, so if there is for example a post_save signal defined for ModelName, then neither the historical nor the current signals will run, and hence the anticipated effects of these signals will not take place.
Signals do not run when other programs make changes
Finally, other programs can also make changes to the database; these will not trigger the signals, and this eventually could lead to the database being in an inconsistent state. Another program could for example create a new book for an author, but might not update the field in the Author model that keeps track of the number of books written by that author. It would be quite hard to "translate" all the handlers in Django to the other programs that interact with the same database.
What can be done to resolve the problem?
Often it is better to avoid using signals. One can implement a lot of logic without signals.
Calculating properties on-demand
The most robust way to count the number of Books of an Author is not to store the number of books in a field, but to use .annotate(…) [Django-doc] to annotate the Authors with the number of Books they have written each time we query. We thus can make a query that looks like:
from django.db.models import Count

Author.objects.annotate(
    num_books=Count('books')
)
If the number of Books is not that large, this will often still scale quite well, and it is more robust: if another program removed a book, or a view was triggered that somehow circumvented the update logic, the query will still yield the correct number of books.
Here of course we recalculate the number of Books per Author each time we query. If the number of Books and Authors grows, then this can become a performance bottleneck.
Encapsulating update logic in the view/form and ModelAdmin
Another option might be to encapsulate the handler logic in a specific function. For example, if we want to count the number of books of an Author each time we save/update a Book, we can implement the logic as:
def update_book(book):
    author = book.author
    author.num_books = author.books.count()
    author.save()
and then we can call this function in the views where we create/update the book. For example:
def my_view(request):
    if request.method == 'POST':
        form = BookForm(request.POST, request.FILES)
        if form.is_valid():
            book = form.save()
            update_book(book)
            # …
        # …
    # …
We can also call this function in class-based views, or in a ModelAdmin:
from django.contrib import admin

class MyModelAdmin(admin.ModelAdmin):
    def save_model(self, request, obj, form, change):
        # save first, so the recount includes this book
        super().save_model(request, obj, form, change)
        update_book(obj)
If the task takes too much time, you can set up a queue where a message is enqueued that will then trigger a task to update the data. This is however not specific to encapsulating logic in a function: if you work with signals, then these signal handlers can time out as well, and thus render the server unresponsive.
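A minimal plain-Python sketch of that queueing idea (in a real project one would use a task queue such as Celery or RQ; all names here are hypothetical):

```python
from queue import Queue

# The view enqueues a lightweight message; a worker performs the slow
# recount later, outside the request/response cycle.
task_queue = Queue()

def schedule_update(book_id):
    # called from the view instead of update_book(book)
    task_queue.put(book_id)

def worker_step(update_fn):
    # run by a worker process; update_fn stands in for the real
    # "look up the book and recount" logic
    book_id = task_queue.get()
    return update_fn(book_id)

processed = []
schedule_update(42)
worker_step(processed.append)
```

The request only pays for the cheap enqueue, while the expensive work happens in the worker.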
Extra tips
Signals can still be a good solution if you want to handle events raised by a third-party Django application; in many cases this is the only effective way to handle certain events. For example the auth module provides signals when the user logs in, logs out, or fails to log in [Django-doc]. These signals are typically more reliable, since they are not triggered by the ORM. For third-party applications, signals are often an effective way to communicate with these applications.
If you define signals, you should be aware that these will not run in data migrations. You will thus have to write the logic that is otherwise done by the signals in the data migration file as well.