We can chain querysets together with the chain(…) function [python-doc] of the itertools module [python-doc]. For example, if we have two querysets of Posts, we can chain the posts that have a publish_date with the ones where the publish_date is NULL:
from itertools import chain
qs1 = Post.objects.filter(publish_date__isnull=False).order_by('publish_date')
qs2 = Post.objects.filter(publish_date=None)
result = chain(qs1, qs2)
Why is it a problem?
The main problem is that the result is not a QuerySet, but a chain object. This means that none of the methods offered by a QuerySet can be used anymore. Indeed, say that we want to filter the Posts with:
result.filter(author=some_author)
then this will raise an AttributeError. Often such filtering is not done explicitly in the view, but, for example, by a FilterSet that the developer wants to use.
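A minimal sketch of this failure is shown here. It uses plain lists as stand-ins for the two querysets (an assumption, made so that the snippet runs without a Django project); the chain object behaves the same way regardless of what it wraps:

```python
from itertools import chain

# Plain lists stand in for qs1 and qs2 (assumption: no Django project needed)
qs1 = ['post with date 1', 'post with date 2']
qs2 = ['post without date']
result = chain(qs1, qs2)

error = None
try:
    # A chain object is a plain iterator: it has no QuerySet methods
    result.filter(author='some_author')
except AttributeError as exc:
    error = exc

print(type(error).__name__)  # AttributeError
```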
Another problem is that a chain cannot be iterated over more than once. Indeed:
>>> c = chain([1,4], [2,5])
>>> list(c)
[1, 4, 2, 5]
>>> list(c)
[]
This means that if multiple for loops are used, only the first one will iterate over the elements. We can work with list(…), and thus use:
result = list(chain(qs1, qs2))
to prevent this effect.
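As a small sketch of this workaround, again with plain lists standing in for the querysets (an assumption), we can check that a list can be looped over as often as we like. The price is that all elements are held in memory at once:

```python
from itertools import chain

# Lists stand in for the two querysets (assumption)
result = list(chain([1, 4], [2, 5]))

# Unlike a chain object, a list can be iterated over repeatedly
first_pass = [item for item in result]
second_pass = [item for item in result]

print(first_pass == second_pass)  # True
```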
Another problem is that result will, once it is iterated over, perform multiple queries. In this example there will be two queries. If we chain five querysets together, however, it results in (at least) five queries. This makes it more expensive.
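The effect can be simulated without a database. In the sketch below, FakeQuerySet is a hypothetical stand-in (not a Django class) that counts how often it is iterated over, which is roughly the moment a real QuerySet would hit the database:

```python
from itertools import chain


class FakeQuerySet:
    """Hypothetical stand-in for a QuerySet that counts simulated queries."""

    queries = 0  # shared counter of simulated database hits

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        # Each chained iterable is evaluated separately when chain reaches it
        FakeQuerySet.queries += 1
        return iter(self.rows)


querysets = [FakeQuerySet([i]) for i in range(5)]
result = list(chain(*querysets))

print(FakeQuerySet.queries)  # 5
```

Chaining five of these objects results in five simulated queries, one per chained item.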
What can be done to resolve the problem?
Group the queries together into a single queryset. If the order is of no importance, we can make use of the | operator:
result = qs1 | qs2
If the order is of importance, we can make use of .union(…) [Django-doc]:
result = qs1.union(qs2, all=True)
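At the SQL level, all=True turns the UNION into a UNION ALL, which keeps duplicate rows, whereas a plain UNION only returns distinct rows. The sketch below illustrates that difference with the sqlite3 module of the standard library rather than with the Django ORM; the post table and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, publish_date TEXT)'
)
conn.executemany(
    'INSERT INTO post (title, publish_date) VALUES (?, ?)',
    [('first', '2024-01-01'), ('second', '2024-02-01'), ('draft', None)],
)

# Both SELECTs match the post titled 'first', so their results overlap
overlapping = (
    "SELECT id, title FROM post WHERE publish_date IS NOT NULL "
    "{op} "
    "SELECT id, title FROM post WHERE title = 'first'"
)

# Each combined statement is sent to the database as a single query
union_rows = conn.execute(overlapping.format(op='UNION')).fetchall()
union_all_rows = conn.execute(overlapping.format(op='UNION ALL')).fetchall()

print(len(union_rows))      # 2: the duplicate row is removed
print(len(union_all_rows))  # 3: the duplicate row is kept
```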
Extra tips
We can use chain(…) when we query different models, for example:
from itertools import chain
qs1 = Post.objects.all()
qs2 = Author.objects.all()
result = list(chain(qs1, qs2))
But it is seldom the case that a collection contains elements of different types, especially since processing Posts will very often be different from processing Authors.