People sometimes calculate the number of records by
using len(…)
. For example, if we have a
model where comments relate to a post, we can obtain the
number of comments with:
number_of_comments = len(Comment.objects.filter(post_id=id_of_post))
Why is it a problem?
Because it is inefficient. It means that Django will
evaluate the QuerySet
and thus
load all records into memory. When this is done, it will
determine the number of records. But this thus means
that if there are ten comments, we first will retrieve
ten records, deserialize these, and then look at the
length of the list. This means the database will send a
lot of data to the Django/Python layer.
If we do not iterate over the same queryset later on, then all that data has been retrieved without any use. Indeed, the database itself can determine the number of records. This will not only minimize the bandwidth, but a database often will determine the number of records more efficiently through indexing mechanisms than counting the individual records.
What can be done to resolve the problem?
A QuerySet
has a .count()
method [Django-doc]. This will make a
COUNT(*) FROM …
query such that the
database will determine the number of records. We thus
can transform query at the top to:
number_of_comments = Comment.objects.filter(post_id=id_of_post).count()
This is always more efficient, unless we
later iterate over the QuerySet
.
Indeed, if we would determine the number of comments and
then iterate over it, for example to print the comments,
using len(…)
is more efficient, because
then we retrieve and count the number of records with
the same query:
# we here iterate over the same queryset, so this is more
# efficient.
qs = Comment.objects.filter(post_id=id_of_post)
number_of_comments = len(qs)
for comment in qs:
print(comment)