It often happens that we have a list of strings, and we want to obtain model objects where a certain field contains at least one of the given strings. Often this is also checked in a case-insensitive way, such that 'apple' and 'APPLE' are considered equivalent.

Usually we have a list of items that we want to match with that field, for example:

fruits = ['apple', 'blueberry', 'coconut', 'dragonfruit']

and we want to look for, for example, Post objects whose content field contains at least one of these elements, in a case-insensitive way.

What problems are solved with this?

We cannot use the __in lookup [Django-doc], since it only matches items whose value is exactly the name of one of the fruits, and, depending on the database and its collation, typically in a case-sensitive way. This means that the content should be 'apple', not 'APPLE', 'Apple', 'An apple', etc.:

Post.objects.filter(
    content__in=fruits
)

will thus only retrieve posts whose entire content is exactly the name of a fruit in the fruits list.
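To make the difference concrete, here is a plain-Python sketch with no database involved. The list of post contents is hypothetical, and Python's in operator on a list stands in for content__in. Python's string comparison is always case-sensitive; a database may behave differently depending on its collation.

```python
# Plain-Python sketch (no database): mimics how content__in=fruits
# compares each value against the list by exact equality.
fruits = ['apple', 'blueberry', 'coconut', 'dragonfruit']

# Hypothetical post contents, for illustration only.
contents = ['apple', 'APPLE', 'Apple', 'An apple', 'I like coconut']

# __in: the whole value must equal one of the fruits (case-sensitive here).
exact_matches = [c for c in contents if c in fruits]

# What we actually want: the content contains a fruit, ignoring case.
contains_matches = [
    c for c in contents
    if any(fruit.lower() in c.lower() for fruit in fruits)
]

print(exact_matches)     # ['apple']
print(contains_matches)  # ['apple', 'APPLE', 'Apple', 'An apple', 'I like coconut']
```

Only the first content survives the exact comparison, while every one of them contains a fruit name once case is ignored.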

With the pattern described here, we can perform a case-insensitive match and decide whether the content should start with the name of a fruit, end with the name of a fruit, or simply contain the name of a fruit.

What does this pattern look like?

Regular expressions can look for multiple items with a single expression. All we have to do is convert our fruits list into a regular expression that simultaneously looks for 'apple', 'blueberry', etc. Such a regular expression looks like '(apple|blueberry|coconut|dragonfruit)'. If we want to restrict this further, such that the content begins with an element of the fruits list, we can use a caret (^): '^(apple|blueberry|coconut|dragonfruit)' will only match contents that start with one of the elements. Similarly, the end anchor $ requires the content to end with one of the items, and using both anchors requires the content to be exactly one of the items.
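The effect of the anchors can be tried out locally with Python's own re module. In this minimal sketch, re.search with re.IGNORECASE stands in for the database's case-insensitive regex match, and the post contents are hypothetical. Databases use their own regex dialects, but for plain alternation and the ^ and $ anchors they should behave the same way.

```python
import re

fruits = ['apple', 'blueberry', 'coconut', 'dragonfruit']
# No escaping yet: these names contain no special regex characters.
alternation = '(' + '|'.join(fruits) + ')'  # (apple|blueberry|coconut|dragonfruit)

patterns = {
    'contains': alternation,
    'starts with': '^' + alternation,
    'ends with': alternation + '$',
    'exact': '^' + alternation + '$',
}

# Hypothetical post contents, for illustration only.
contents = ['Apple pie', 'I like coconut', 'Dragonfruit', 'banana']

# re.search looks for a match anywhere in the string, like a database
# regex match; re.IGNORECASE plays the role of the "i" in __iregex.
results = {
    name: [c for c in contents if re.search(pattern, c, re.IGNORECASE)]
    for name, pattern in patterns.items()
}

for name, matched in results.items():
    print(f'{name}: {matched}')
# contains: ['Apple pie', 'I like coconut', 'Dragonfruit']
# starts with: ['Apple pie', 'Dragonfruit']
# ends with: ['I like coconut', 'Dragonfruit']
# exact: ['Dragonfruit']
```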

There are, however, some problems we need to overcome. The names of the fruits could, for example, contain a dot (.), a pipe character (|), or other characters with a special meaning in a regular expression. If we simply join the items together, the regex might then match different items than intended. We can use the escape(…) function [python-doc] from Python's re module to escape the items in the fruits list, such that characters with a special meaning in a regex are matched literally.
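A small sketch of why escaping matters, using two hypothetical item names that contain a dot and a pipe character. As before, Python's re module stands in for the database's regex engine.

```python
import re

# Hypothetical item names containing regex metacharacters.
items = ['a.b', 'x|y']

naive = '(' + '|'.join(items) + ')'                          # (a.b|x|y)
escaped = '(' + '|'.join(re.escape(i) for i in items) + ')'  # (a\.b|x\|y)

samples = ['acb', 'yes', 'a.b', 'x|y']

naive_hits = [s for s in samples if re.search(naive, s, re.IGNORECASE)]
escaped_hits = [s for s in samples if re.search(escaped, s, re.IGNORECASE)]

# The unescaped '.' matches any character, and the unescaped '|' splits
# 'x|y' into the separate alternatives 'x' and 'y', so unrelated strings match.
print(naive_hits)    # ['acb', 'yes', 'a.b', 'x|y']
print(escaped_hits)  # ['a.b', 'x|y']
```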

Since we want to match case-insensitively, we should use the __iregex lookup [Django-doc]. Given a list fruits with the names of the fruits, we can construct a regex and filter with:

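from re import escape

# escape each fruit, join them with |, and anchor at both ends with ^ and $
myregex = f'^({"|".join(escape(fruit) for fruit in fruits)})$'

# __iregex performs a case-insensitive regular expression match
Post.objects.filter(
    content__iregex=myregex
)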

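The ^ and/or $ can be removed if the fruit does not need to appear at the start and/or end of the content, respectively. Without either anchor, the filter retrieves all posts whose content contains at least one of the fruits.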
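The construction can be wrapped in a small reusable function. The name any_of_regex and its start and end keyword arguments are hypothetical, chosen for this sketch. It also guards against an empty list: an empty alternation '()' matches the empty string, and thus every content when no anchors are used. The Django query is only shown as a comment, assuming the Post model from above; the checks use Python's re module instead.

```python
import re


def any_of_regex(items, *, start=False, end=False):
    """Return a regex that matches any of the given items literally.

    Hypothetical helper: start adds the ^ anchor, end adds the $ anchor.
    """
    items = list(items)
    if not items:
        # '()' matches the empty string, so an unanchored filter
        # would match everything; fail loudly instead.
        raise ValueError('at least one item is required')
    pattern = '(' + '|'.join(re.escape(item) for item in items) + ')'
    if start:
        pattern = '^' + pattern
    if end:
        pattern = pattern + '$'
    return pattern


fruits = ['apple', 'blueberry', 'coconut', 'dragonfruit']
contains_fruit = any_of_regex(fruits)
starts_with_fruit = any_of_regex(fruits, start=True)

# With Django, assuming the Post model from above:
# Post.objects.filter(content__iregex=contains_fruit)

print(bool(re.search(contains_fruit, 'An APPLE a day', re.IGNORECASE)))     # True
print(bool(re.search(starts_with_fruit, 'An APPLE a day', re.IGNORECASE)))  # False
```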