Changes to Django/Reading the model lookup process (filter, single table)

 [[Reading Django>Djangoを読む]]
 
 #contents
 
 *Introduction [#s96ea057]
 
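 Last time we looked at objects.all(). Even with no conditions the trace was quite long, but thanks to that I don't think we need to go through the overall processing flow again.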
 
 So this time let's look at what happens when a condition is specified, that is, when filter is called (and how the corresponding SQL is built). Concretely:
 
  >>> Question.objects.filter(question_text__startswith='What')
  <QuerySet [<Question: What's up?>]>
 
 This is the call we will trace.
 
 *QuerySet [#mb5e499d]
 
 As usual, we start from the QuerySet class.
 
 #code(Python){{
     def filter(self, *args, **kwargs):
         """
         Returns a new QuerySet instance with the args ANDed to the existing
         set.
         """
         return self._filter_or_exclude(False, *args, **kwargs)
 }}
 
 exclude is defined right below, and since it does almost the same thing, the two share a common implementation.
 
 #code(Python){{
     def _filter_or_exclude(self, negate, *args, **kwargs):
         clone = self._clone()
         if negate:
             clone.query.add_q(~Q(*args, **kwargs))
         else:
             clone.query.add_q(Q(*args, **kwargs))
         return clone
 }}
 
 Q. Very suspicious-looking (lol).
 
 The Q class is defined in django/db/models/query_utils.py.
 
 #code(Python){{
 class Q(tree.Node):
     """
     Encapsulates filters as objects that can then be combined logically (using
     `&` and `|`).
     """
     # Connection types
     AND = 'AND'
     OR = 'OR'
     default = AND
 
     def __init__(self, *args, **kwargs):
         super(Q, self).__init__(children=list(args) + list(kwargs.items()))
 }}
 
 The tree module lives in django.utils.
 
 #code(Python){{
 class Node(object):
     """
     A single internal node in the tree graph. A Node should be viewed as a
     connection (the root) with the children being either leaf nodes or other
     Node instances.
     """
 
     def __init__(self, children=None, connector=None, negated=False):
         """
         Constructs a new Node. If no connector is given, the default will be
         used.
         """
         self.children = children[:] if children else []
         self.connector = connector or self.default
         self.negated = negated
 }}
 
 It is an ordinary tree node.
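
 To make the shape of the Q object concrete, here is a minimal sketch using stripped-down stand-ins for Node and Q (not the real Django classes; the 'DEFAULT' placeholder is my own). It only shows that the keyword argument ends up as a (name, value) tuple in children.

```python
class Node(object):
    """Simplified stand-in for django.utils.tree.Node (illustration only)."""
    default = 'DEFAULT'

    def __init__(self, children=None, connector=None, negated=False):
        self.children = children[:] if children else []
        self.connector = connector or self.default
        self.negated = negated


class Q(Node):
    """Simplified stand-in for Q: args and keyword pairs become children."""
    AND = 'AND'
    OR = 'OR'
    default = AND

    def __init__(self, *args, **kwargs):
        super(Q, self).__init__(children=list(args) + list(kwargs.items()))


q = Q(question_text__startswith='What')
print(q.children)   # [('question_text__startswith', 'What')]
print(q.connector)  # AND
print(q.negated)    # False
```

 So for our filter call, the tree is a single AND node with one leaf tuple.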
 
 *Query [#sbd7e47d]
 
 Now that we know what a Q object is, let's move on to the Query class. Here is the add_q method.
 
 #code(Python){{
     def add_q(self, q_object):
         """
         A preprocessor for the internal _add_q(). Responsible for doing final
         join promotion.
         """
         # For join promotion this case is doing an AND for the added q_object
         # and existing conditions. So, any existing inner join forces the join
         # type to remain inner. Existing outer joins can however be demoted.
         # (Consider case where rel_a is LOUTER and rel_a__col=1 is added - if
         # rel_a doesn't produce any rows, then the whole condition must fail.
         # So, demotion is OK.
         existing_inner = set(
             (a for a in self.alias_map if self.alias_map[a].join_type == INNER))
         clause, _ = self._add_q(q_object, self.used_aliases)
         if clause:
             self.where.add(clause, AND)
         self.demote_joins(existing_inner)
 }}
 
 alias_map also came up last time, but in the current situation it should be empty, so we ignore it.
 used_aliases is also empty at this point.
 
 Before moving on to _add_q, let's check what where is. Here is the __init__ method of the Query class.
 
 #code(Python){{
     def __init__(self, model, where=WhereNode):
         # omitted
         self.where = where()
         self.where_class = where
 }}
 
 WhereNode is defined in django/db/models/sql/where.py.
 
 #code(Python){{
 class WhereNode(tree.Node):
 }}
 
 So the WHERE clause is represented as a tree as well.
 
 **_add_q [#we0a9eae]
 
 Now, back to the _add_q method.
 
 #code(Python){{
     def _add_q(self, q_object, used_aliases, branch_negated=False,
                current_negated=False, allow_joins=True, split_subq=True):
         """
         Adds a Q-object to the current filter.
         """
         connector = q_object.connector
         current_negated = current_negated ^ q_object.negated
         branch_negated = branch_negated or q_object.negated
         target_clause = self.where_class(connector=connector,
                                          negated=q_object.negated)
         joinpromoter = JoinPromoter(q_object.connector, len(q_object.children), current_negated)
         for child in q_object.children:
             if isinstance(child, Node):
                 child_clause, needed_inner = self._add_q(
                     child, used_aliases, branch_negated,
                     current_negated, allow_joins, split_subq)
                 joinpromoter.add_votes(needed_inner)
             else:
                 child_clause, needed_inner = self.build_filter(
                     child, can_reuse=used_aliases, branch_negated=branch_negated,
                     current_negated=current_negated, connector=connector,
                     allow_joins=allow_joins, split_subq=split_subq,
                 )
                 joinpromoter.add_votes(needed_inner)
             if child_clause:
                 target_clause.add(child_clause, connector)
         needed_inner = joinpromoter.update_join_types(self)
         return target_clause, needed_inner
 }}
 
 Hmm, it suddenly got complicated. First, everything with negated in its name is False in our case.
 Next, JoinPromoter: no JOIN happens in our case, so we can safely ignore it.
 That leaves the following as the key part.
 
 #code(Python){{
         for child in q_object.children:
             if isinstance(child, Node):
                 # not this branch
             else:
                 child_clause, needed_inner = self.build_filter(
                     child, can_reuse=used_aliases, branch_negated=branch_negated,
                     current_negated=current_negated, connector=connector,
                     allow_joins=allow_joins, split_subq=split_subq,
                 )
                 joinpromoter.add_votes(needed_inner)
             if child_clause:
                 target_clause.add(child_clause, connector)
 }}
 
 The arguments passed to build_filter are as follows.
 
 :child|('question_text__startswith', 'What')
 :can_reuse|set()
 :branch_negated|False
 :current_negated|False
 :connector|'AND'
 :allow_joins|True
 :split_subq|True
 
 **build_filter [#f76b4e25]
 
 build_filter is long, so parts of it are omitted.
 
 #code(Python){{
     def build_filter(self, filter_expr, branch_negated=False, current_negated=False,
                      can_reuse=None, connector=AND, allow_joins=True, split_subq=True):
         arg, value = filter_expr
         lookups, parts, reffed_expression = self.solve_lookup_type(arg)
 
         # Work out the lookup type and remove it from the end of 'parts',
         # if necessary.
         value, lookups, used_joins = self.prepare_lookup_value(value, lookups, can_reuse, allow_joins)
 
         clause = self.where_class()
 
         opts = self.get_meta()
         alias = self.get_initial_alias()
         allow_many = not branch_negated or not split_subq
 
         try:
             field, sources, opts, join_list, path = self.setup_joins(
                 parts, opts, alias, can_reuse=can_reuse, allow_many=allow_many)
         except MultiJoin as e:
             # omitted
 
         if can_reuse is not None:
             can_reuse.update(join_list)
         used_joins = set(used_joins).union(set(join_list))
         targets, alias, join_list = self.trim_joins(sources, join_list, path)
 
         if field.is_relation:
             # omitted
         else:
             col = targets[0].get_col(alias, field)
             condition = self.build_lookup(lookups, col, value)
             lookup_type = condition.lookup_name
 
         clause.add(condition, AND)
 
         require_outer = lookup_type == 'isnull' and value is True and not current_negated
         return clause, used_joins if not require_outer else ()
 }}
 
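 filter also lets you write conditions such as "a field of a related object has such-and-such value", so the code proceeds on the assumption that a JOIN might occur even in cases where none actually does. Of the calls above, the following five methods seem worth digging into.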
 
 -solve_lookup_type
 -prepare_lookup_value
 -setup_joins
 -trim_joins
 -build_lookup
 
 ***solve_lookup_type [#f762283c]
 
 First, solve_lookup_type. Partly omitted.
 
 #code(Python){{
     def solve_lookup_type(self, lookup):
         """
         Solve the lookup type from the lookup (eg: 'foobar__id__icontains')
         """
         lookup_splitted = lookup.split(LOOKUP_SEP)
         _, field, _, lookup_parts = self.names_to_path(lookup_splitted, self.get_meta())
         field_parts = lookup_splitted[0:len(lookup_splitted) - len(lookup_parts)]
         if len(lookup_parts) == 0:
             lookup_parts = ['exact']
         return lookup_parts, field_parts, False
 }}
 
 LOOKUP_SEP is '__' and the argument is 'question_text__startswith', so it is split into ['question_text', 'startswith'].
 
 Then the helper names_to_path is called, but it is long again, so here are just the essentials of the path we actually take.
 
 #code(Python){{
     def names_to_path(self, names, opts, allow_many=True, fail_on_missing=False):
         """
         Walks the list of names and turns them into PathInfo tuples. Note that
         a single name in 'names' can generate multiple PathInfos (m2m for
         example).
 
         'names' is the path of names to travel, 'opts' is the model Options we
         start the name resolving from, 'allow_many' is as for setup_joins().
         If fail_on_missing is set to True, then a name that can't be resolved
         will generate a FieldError.
 
         Returns a list of PathInfo tuples. In addition returns the final field
         (the last used join field), and target (which is a field guaranteed to
         contain the same value as the final field). Finally, the method returns
         those names that weren't found (which are likely transforms and the
         final lookup).
         """
         path, names_with_path = [], []
         for pos, name in enumerate(names):
             field = None
             try:
                 field = opts.get_field(name)
             except FieldDoesNotExist:
                 # omitted
 
             if hasattr(field, 'get_path_info'):
                 # omitted
             else:
                 # Local non-relational field.
                 final_field = field
                 targets = (field,)
                 break
         return path, final_field, targets, names[pos + 1:]
 }}
 
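 When no field of a related object is referenced, this is all there is to it. solve_lookup_type only uses the fourth element of the return value, which is the lookup separated out as ['startswith']. In the end, solve_lookup_type returns the following.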
 
  ['startswith'], ['question_text'], False
 
 That is the triple handed back to build_filter.
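
 The split can be sketched without Django as below. split_lookup and its field_names argument are hypothetical names of mine, and the sketch only covers the case traced here (a local, non-relational first field); the real names_to_path also resolves relations.

```python
LOOKUP_SEP = '__'


def split_lookup(lookup, field_names):
    """Hypothetical helper: split 'field__lookup' into (lookup_parts, field_parts).

    Only handles a local, non-relational first field, which is the one case
    traced in this article.
    """
    names = lookup.split(LOOKUP_SEP)
    if names[0] not in field_names:
        raise ValueError("Unknown field: %r" % names[0])
    # Everything after the first (non-relational) field is treated as
    # transforms plus the final lookup.
    lookup_parts = names[1:]
    field_parts = names[:len(names) - len(lookup_parts)]
    if not lookup_parts:
        # No explicit lookup means an exact match.
        lookup_parts = ['exact']
    return lookup_parts, field_parts


fields = {'question_text', 'pub_date'}
print(split_lookup('question_text__startswith', fields))
print(split_lookup('question_text', fields))
```

 The second call shows the default: with no lookup suffix, 'exact' is filled in.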
 
 ***prepare_lookup_value [#eb43ff2b]
 
 Next is prepare_lookup_value. Partly omitted, as usual.
 
 #code(Python){{
     def prepare_lookup_value(self, value, lookups, can_reuse, allow_joins=True):
         # Default lookup if none given is exact.
         used_joins = []
         # Interpret '__exact=None' as the sql 'is NULL'; otherwise, reject all
         # uses of None as a query value.
         if value is None:
             if lookups[-1] not in ('exact', 'iexact'):
                 raise ValueError("Cannot use None as a query value")
             lookups[-1] = 'isnull'
             value = True
         return value, lookups, used_joins
 }}
 
 When value is None, the lookup is rewritten so that the SQL becomes IS NULL; any other value is simply returned as is (the real method does a bit more preparation than shown here). In other words,
 
  'What', ['startswith'], []
 
 is what gets returned.
 
 ***setup_joins [#ea9d2415]
 
 Before setup_joins is called, the get_initial_alias method, which also appeared last time, is called. So the base table for the search has already been set up.
 
 Now, setup_joins. That said, this time we ignore the JOIN-related parts (in fact, the for loop never iterates). The docstring is a bit long, but I include it as is.
 
 #code(Python){{
     def setup_joins(self, names, opts, alias, can_reuse=None, allow_many=True):
         """
         Compute the necessary table joins for the passage through the fields
         given in 'names'. 'opts' is the Options class for the current model
         (which gives the table we are starting from), 'alias' is the alias for
         the table to start the joining from.
 
         The 'can_reuse' defines the reverse foreign key joins we can reuse. It
         can be None in which case all joins are reusable or a set of aliases
         that can be reused. Note that non-reverse foreign keys are always
         reusable when using setup_joins().
 
         If 'allow_many' is False, then any reverse foreign key seen will
         generate a MultiJoin exception.
 
         Returns the final field involved in the joins, the target field (used
         for any 'where' constraint), the final 'opts' value, the joins and the
         field path travelled to generate the joins.
 
         The target field is the field containing the concrete value. Final
         field can be something different, for example foreign key pointing to
         that value. Final field is needed for example in some value
         conversions (convert 'obj' in fk__id=obj to pk val using the foreign
         key field for example).
         """
         joins = [alias]
         # First, generate the path for the names
         path, final_field, targets, rest = self.names_to_path(
             names, opts, allow_many, fail_on_missing=True)
 
         # Then, add the path to the query's joins. Note that we can't trim
         # joins at this stage - we will need the information about join type
         # of the trimmed joins.
         for join in path:
             # omitted
         return final_field, targets, opts, joins, path
 }}
 
 In the end, the return values of names_to_path are passed back almost unchanged.
 
 CharField for question_text, (CharField for question_text,), Options for Question, [alias of Question's BaseTable], []
 
 ***trim_joins [#u97d9401]
 
 trim_joins. Again, we ignore the for loop.
 
 #code(Python){{
     def trim_joins(self, targets, joins, path):
         """
         The 'target' parameter is the final field being joined to, 'joins'
         is the full list of join aliases. The 'path' contain the PathInfos
         used to create the joins.
 
         Returns the final target field and table alias and the new active
         joins.
 
         We will always trim any direct join if we have the target column
         available already in the previous table. Reverse joins can't be
         trimmed as we don't know if there is anything on the other side of
         the join.
         """
         joins = joins[:]
         for pos, info in enumerate(reversed(path)):
             # omitted
         return targets, joins[-1], joins
 }}
 
 (CharField for question_text,), alias of Question's BaseTable, [alias of Question's BaseTable]
 
 is what gets returned.
 
 ***build_lookup [#s13390bf]
 
 Finally, build_lookup.
 
 #code(Python){{
     def build_lookup(self, lookups, lhs, rhs):
         """
         Tries to extract transforms and lookup from given lhs.
 
         The lhs value is something that works like SQLExpression.
         The rhs value is what the lookup is going to compare against.
         The lookups is a list of names to extract using get_lookup()
         and get_transform().
         """
         lookups = lookups[:]
         while lookups:
             name = lookups[0]
             # If there is just one part left, try first get_lookup() so
             # that if the lhs supports both transform and lookup for the
             # name, then lookup will be picked.
             if len(lookups) == 1:
                 final_lookup = lhs.get_lookup(name)
                 if not final_lookup:
                     # We didn't find a lookup. We are going to interpret
                     # the name as transform, and do an Exact lookup against
                     # it.
                     lhs = self.try_transform(lhs, name, lookups)
                     final_lookup = lhs.get_lookup('exact')
                 return final_lookup(lhs, rhs)
             lhs = self.try_transform(lhs, name, lookups)
             lookups = lookups[1:]
 }}
 
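 lhs is the Col created from the CharField, so the lookup is resolved against that field (as far as I can tell, the expression delegates get_lookup to its output_field). Searching CharField for get_lookup turns up no definition, but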
 
 #code(Python){{
 class Field(RegisterLookupMixin):
 }}
 
 so it looks like it is defined there. RegisterLookupMixin lives in the query_utils module; looking at it,
 
 #code(Python){{
     def get_lookup(self, lookup_name):
         from django.db.models.lookups import Lookup
         found = self._get_lookup(lookup_name)
         if found is None and hasattr(self, 'output_field'):
             return self.output_field.get_lookup(lookup_name)
         if found is not None and not issubclass(found, Lookup):
             return None
         return found
 
     def _get_lookup(self, lookup_name):
         try:
             return self.class_lookups[lookup_name]
         except KeyError:
             # omitted
         except AttributeError:
             # This class didn't have any class_lookups
             pass
         return None
 }}
 
 Registration into class_lookups is defined as a class method.
 
 #code(Python){{
     @classmethod
     def register_lookup(cls, lookup, lookup_name=None):
         if lookup_name is None:
             lookup_name = lookup.lookup_name
         if 'class_lookups' not in cls.__dict__:
             cls.class_lookups = {}
         cls.class_lookups[lookup_name] = lookup
         return lookup
 }}
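
 The registry idea can be sketched in isolation as follows. This is a simplified stand-in, not Django's implementation: register_lookup is copied from above, but get_lookup here is my own reduced version that just walks the MRO, and the StartsWith class is an empty placeholder.

```python
class RegisterLookupMixin(object):
    """Simplified stand-in: a per-class registry of lookup classes."""

    @classmethod
    def register_lookup(cls, lookup, lookup_name=None):
        if lookup_name is None:
            lookup_name = lookup.lookup_name
        if 'class_lookups' not in cls.__dict__:
            cls.class_lookups = {}
        cls.class_lookups[lookup_name] = lookup
        return lookup

    def get_lookup(self, lookup_name):
        # Walk the MRO so lookups registered on a parent class are found.
        for klass in type(self).__mro__:
            registry = klass.__dict__.get('class_lookups', {})
            if lookup_name in registry:
                return registry[lookup_name]
        return None


class Field(RegisterLookupMixin):
    pass


class CharField(Field):
    pass


class StartsWith(object):
    """Placeholder lookup class; only the name matters for this sketch."""
    lookup_name = 'startswith'


Field.register_lookup(StartsWith)

print(CharField().get_lookup('startswith') is StartsWith)  # True
print(CharField().get_lookup('no_such_lookup'))            # None
```

 Registering on Field is enough for every subclass such as CharField to see the lookup.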
 
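 The next question is "I get the mechanism, but when do lookups get registered?" It appears to happen when the module that defines the Lookup classes (django/db/models/lookups.py) is imported.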
 
 #code(Python){{
 class StartsWith(PatternLookup):
     lookup_name = 'startswith'
     prepare_rhs = False
 
     def process_rhs(self, qn, connection):
         rhs, params = super(StartsWith, self).process_rhs(qn, connection)
         if params and not self.bilateral_transforms:
             params[0] = "%s%%" % connection.ops.prep_for_like_query(params[0])
         return rhs, params
 Field.register_lookup(StartsWith)
 }}
 
 So the Lookup corresponding to startswith gets registered.
 A StartsWith object is then created, and with that the object construction that started with the filter call is finally complete.
 
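 The constructed StartsWith object is added to a method-local WhereNode at each step on the way back through build_filter → _add_q → add_q, but this does not produce nested levels; StartsWith ends up directly under self.where, as shown below.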
 
 self.where
   StartsWith(lhs=Col for question_text, rhs='What')
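
 The reason no nesting appears is, as far as I can tell, that Node.add squashes a non-negated child node into its parent when the connectors match or the child holds a single item. A minimal sketch of that behaviour, with a simplified Node of my own (the real add also handles the differing-connector case, which is left out) and a string standing in for the StartsWith object:

```python
class Node(object):
    """Simplified stand-in for django.utils.tree.Node (illustration only)."""

    def __init__(self, children=None, connector='AND', negated=False):
        self.children = children[:] if children else []
        self.connector = connector
        self.negated = negated

    def add(self, data, conn_type):
        # Squash: a non-negated child node that uses the same connector, or
        # holds a single child, is merged instead of nested.
        if (isinstance(data, Node) and not data.negated
                and self.connector == conn_type
                and (data.connector == conn_type or len(data.children) == 1)):
            self.children.extend(data.children)
        else:
            self.children.append(data)


condition = "StartsWith(question_text, 'What')"  # placeholder for the lookup

clause = Node()          # local node in build_filter
clause.add(condition, 'AND')
target_clause = Node()   # local node in _add_q
target_clause.add(clause, 'AND')
where = Node()           # self.where on the Query
where.add(target_clause, 'AND')

print(where.children)    # ["StartsWith(question_text, 'What')"]
```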
 
 *WhereNode [#c85bf6f1]
 
 Once the objects are built, processing continues as we saw last time and self.where gets compiled. That is, its as_sql is called.
 
 #code(Python){{
     def as_sql(self, compiler, connection):
         """
         Returns the SQL version of the where clause and the value to be
         substituted in. Returns '', [] if this node matches everything,
         None, [] if this node is empty, and raises EmptyResultSet if this
         node can't match anything.
         """
         result = []
         result_params = []
         if self.connector == AND:
             full_needed, empty_needed = len(self.children), 1
         else:
             full_needed, empty_needed = 1, len(self.children)
 
         for child in self.children:
             try:
                 sql, params = compiler.compile(child)
             except EmptyResultSet:
                 empty_needed -= 1
             else:
                 if sql:
                     result.append(sql)
                     result_params.extend(params)
                 else:
                     full_needed -= 1
             # omitted
         conn = ' %s ' % self.connector
         sql_string = conn.join(result)
         if sql_string:
             if self.negated:
                 # Some backends (Oracle at least) need parentheses
                 # around the inner SQL in the negated case, even if the
                 # inner SQL contains just a single expression.
                 sql_string = 'NOT (%s)' % sql_string
             elif len(result) > 1:
                 sql_string = '(%s)' % sql_string
         return sql_string, result_params
 }}
 
 It just compiles the child nodes and joins the results.
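
 Stripped of the EmptyResultSet bookkeeping, the joining step looks roughly like the sketch below. compile_where is a hypothetical function of mine that takes already-compiled (sql, params) pairs instead of child nodes, and the column names are made up.

```python
def compile_where(children, connector='AND', negated=False):
    """Hypothetical simplification of WhereNode.as_sql.

    'children' is a list of already-compiled (sql, params) pairs; the
    EmptyResultSet and "matches everything" bookkeeping is left out.
    """
    result = []
    result_params = []
    for sql, params in children:
        if sql:
            result.append(sql)
            result_params.extend(params)
    sql_string = (' %s ' % connector).join(result)
    if sql_string:
        if negated:
            sql_string = 'NOT (%s)' % sql_string
        elif len(result) > 1:
            # Parentheses are only needed when several conditions are joined.
            sql_string = '(%s)' % sql_string
    return sql_string, result_params


single = compile_where([('"t"."a" LIKE %s', ['What%'])])
double = compile_where([('"t"."a" LIKE %s', ['What%']), ('"t"."b" > %s', [3])])
print(single)
print(double)
```

 With a single child, as in our case, the child's SQL passes through unchanged.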
 
 **StartsWith [#e915ee33]
 
 Now let's look at the child node, StartsWith. First, the inheritance chain.
 
 #code(Python){{
 class StartsWith(PatternLookup):
 
 class PatternLookup(BuiltinLookup):
 
 class BuiltinLookup(Lookup):
 }}
 
 Going up to BuiltinLookup, we find the as_sql method.
 
 #code(Python){{
     def as_sql(self, compiler, connection):
         lhs_sql, params = self.process_lhs(compiler, connection)
         rhs_sql, rhs_params = self.process_rhs(compiler, connection)
         params.extend(rhs_params)
         rhs_sql = self.get_rhs_op(connection, rhs_sql)
         return '%s %s' % (lhs_sql, rhs_sql), params
 }}
 
 process_lhs is defined right above as_sql. It does some DBMS-specific processing, but in the end the Col object is compiled and we get '"polls_question"."question_text"'.
 
 process_rhs is overridden in StartsWith. bilateral_transforms should be an empty list, and, to jump ahead to the conclusion, params is ['What'], so a % is appended to give ['What%'], which makes this a prefix match.
 
 #code(Python){{
     def process_rhs(self, qn, connection):
         rhs, params = super(StartsWith, self).process_rhs(qn, connection)
         if params and not self.bilateral_transforms:
             params[0] = "%s%%" % connection.ops.prep_for_like_query(params[0])
         return rhs, params
 }}
 
 Next, the parent class version, process_rhs in Lookup. The value is just a plain string, so get_db_prep_lookup is called.
 
 #code(Python){{
     def process_rhs(self, compiler, connection):
         value = self.rhs
         if self.bilateral_transforms:
             # omitted
         # Due to historical reasons there are a couple of different
         # ways to produce sql here. get_compiler is likely a Query
         # instance, _as_sql QuerySet and as_sql just something with
         # as_sql. Finally the value can of course be just plain
         # Python value.
         if hasattr(value, 'get_compiler'):
             value = value.get_compiler(connection=connection)
         if hasattr(value, 'as_sql'):
             sql, params = compiler.compile(value)
             return '(' + sql + ')', params
         if hasattr(value, '_as_sql'):
             sql, params = value._as_sql(connection=connection)
             return '(' + sql + ')', params
         else:
             return self.get_db_prep_lookup(value, connection)
 
     def get_db_prep_lookup(self, value, connection):
         return ('%s', [value])
 }}
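
 Putting the two process_rhs steps together for a plain string gives roughly the following. startswith_rhs is a hypothetical helper of mine, and prep_for_like_query here is my own approximation of the backend's LIKE escaping (backslash, % and _), not a copy of Django's code.

```python
def prep_for_like_query(value):
    """Sketch of LIKE escaping: backslash first, then the % and _ wildcards."""
    return str(value).replace("\\", "\\\\").replace("%", r"\%").replace("_", r"\_")


def startswith_rhs(value):
    """Hypothetical helper mirroring the two process_rhs steps for a plain value."""
    # Step 1: what Lookup.get_db_prep_lookup returns for a plain value.
    rhs, params = '%s', [value]
    # Step 2: what StartsWith.process_rhs does to the first parameter.
    params[0] = "%s%%" % prep_for_like_query(params[0])
    return rhs, params


print(startswith_rhs('What'))   # ('%s', ['What%'])
print(startswith_rhs('100%_'))
```

 The second call shows why the escaping matters: wildcard characters in the user's value are escaped before the trailing % is appended, so only the appended % acts as a wildcard.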
 
 get_rhs_op is also defined in PatternLookup, but in our case it should fall through to the method in BuiltinLookup (the parent class).
 
 #code(Python){{
     def get_rhs_op(self, connection, rhs):
         return connection.operators[self.lookup_name] % rhs
 }}
 
 For sqlite3, operators is as follows. In other words, the prefix match is implemented with LIKE.
 
 #code(Python){{
     operators = {
         'exact': '= %s',
         'iexact': "LIKE %s ESCAPE '\\'",
         'contains': "LIKE %s ESCAPE '\\'",
         'icontains': "LIKE %s ESCAPE '\\'",
         'regex': 'REGEXP %s',
         'iregex': "REGEXP '(?i)' || %s",
         'gt': '> %s',
         'gte': '>= %s',
         'lt': '< %s',
         'lte': '<= %s',
         'startswith': "LIKE %s ESCAPE '\\'",
         'endswith': "LIKE %s ESCAPE '\\'",
         'istartswith': "LIKE %s ESCAPE '\\'",
         'iendswith': "LIKE %s ESCAPE '\\'",
     }
 }}
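
 Assembling the pieces by hand, the way BuiltinLookup.as_sql does, can be sketched with plain string formatting. The values for lhs_sql, rhs_sql and params are the ones traced above; only the startswith entry of operators is reproduced.

```python
operators = {'startswith': "LIKE %s ESCAPE '\\'"}

lhs_sql = '"polls_question"."question_text"'   # result of process_lhs
rhs_sql, params = '%s', ['What%']              # result of process_rhs

# get_rhs_op: substitute the right-hand placeholder into the operator template.
rhs_op = operators['startswith'] % rhs_sql
sql = '%s %s' % (lhs_sql, rhs_op)

print(sql)     # "polls_question"."question_text" LIKE %s ESCAPE '\'
print(params)  # ['What%']
```

 Note that the %s survives into the final SQL as a placeholder; the value 'What%' travels separately in the parameter list.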
 
 Ultimately, compiling the WhereNode yields the following string (and parameter list).
 
  '"polls_question"."question_text" LIKE %s ESCAPE \'\\\'', ['What%']
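
 To check that this condition really behaves as a prefix match, here is a small experiment with the standard sqlite3 module, outside Django. The table layout and the sample rows are made up for illustration, and the placeholder is written as ? because that is what the sqlite3 module expects.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE polls_question (id INTEGER PRIMARY KEY, question_text TEXT)')
conn.executemany(
    'INSERT INTO polls_question (question_text) VALUES (?)',
    [("What's up?",), ("Who's there?",), ("So what?",)],
)

# Same WHERE condition as above, with the parameter passed separately.
sql = ('SELECT "polls_question"."question_text" FROM "polls_question" '
       'WHERE "polls_question"."question_text" LIKE ? ESCAPE \'\\\'')
rows = conn.execute(sql, ['What%']).fetchall()
print(rows)  # [("What's up?",)]
```

 "So what?" contains the word but does not start with it, so only the first row matches.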
 
 *Conclusion [#id175bb9]
 
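 This time we looked at what happens when a search condition is given via the filter method. As last time, the principle of representing what was written as objects is applied thoroughly, which made the code quite hard to follow. Once the objects are built, though, compiling them is a mechanical translation.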
 
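 This time (and last time) I skipped the JOIN handling wholesale because no JOIN actually occurs, but my impression is that this part needs a proper look as well. Before that, next time I plan to look at reverse references on models (where the referenced model holds an access path to the models that point at it).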