Fork me on GitHub

People sometimes calculate the number of records by using len(…). For example, if we have a model where comments relate to a post, we can obtain the number of comments with:

number_of_comments = len(Comment.objects.filter(post_id=id_of_post))

Why is it a problem?

Because it is inefficient. It means that Django will evaluate the QuerySet and thus load all records into memory. When this is done, it will determine the number of records. But this thus means that if there are ten comments, we first will retrieve ten records, deserialize these, and then look at the length of the list. This means the database will send a lot of data to the Django/Python layer.

If we do not iterate over the same queryset later on, then all that data has been retrieved without any use. Indeed, the database itself can determine the number of records. This will not only minimize the bandwidth, but a database often will determine the number of records more efficiently through indexing mechanisms than counting the individual records.

What can be done to resolve the problem?

A QuerySet has a .count() method [Django-doc]. This will make a COUNT(*) FROM … query such that the database will determine the number of records. We thus can transform query at the top to:

number_of_comments = Comment.objects.filter(post_id=id_of_post).count()

This is always more efficient, unless we later iterate over the QuerySet. Indeed, if we would determine the number of comments and then iterate over it, for example to print the comments, using len(…) is more efficient, because then we retrieve and count the number of records with the same query:

# we here iterate over the same queryset, so this is more
# efficient.

qs = Comment.objects.filter(post_id=id_of_post)

number_of_comments = len(qs)
for comment in qs:
    print(comment)