Django/Reading the code behind a model query (all)

Introduction
django/db/models/query.py
- QuerySet
- ModelIterable
django.db.models.sql.compiler.SQLCompiler
Conclusion

Introduction

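At the start of "Playing with the API", the save method ultimately became an INSERT and the object was stored in the database. The tutorial then changes a value and calls save again. This time an UPDATE is issued, but in this situation Django does not "UPDATE only the fields that changed"; every field is included in the UPDATE, so there is little to learn from it and I will skip it. That makes the next thing to read:

# objects.all() displays all the questions in the database.
>>> Question.objects.all()
<QuerySet [<Question: Question object>]>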

django/db/models/query.py

QuerySet

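objects is an instance of the Manager class, as briefly mentioned last time. Manager exposes QuerySet's methods by proxying them, so the all method to read is the one written in the QuerySet class. (Strictly speaking, Manager seems to define its own small all that returns get_queryset() directly instead of proxying; either way the result is a fresh QuerySet, and QuerySet.all shows what such a call does.)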

    def all(self):
        """
        Returns a new QuerySet that is a copy of the current one. This allows a
        QuerySet to proxy for a model manager in some cases.
        """
        return self._clone()

Simple. Calling all only creates a copy of the QuerySet; no actual query runs at this point. _clone just copies attributes, so I will skip it. (That said, the copy code for the Query object is long. lol)
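
To make the "copy now, query later" behavior concrete, here is a minimal sketch. ToyQuerySet and its executed counter are made-up names for illustration, not Django code; the only thing borrowed from the source above is the idea that all returns _clone() and that nothing is fetched until the object is iterated.

```python
import copy


class ToyQuerySet:
    """Toy stand-in for QuerySet (hypothetical, not Django code)."""

    executed = 0  # counts how many times any instance "hit the database"

    def __init__(self, limits=(0, None)):
        self.limits = limits
        self._result_cache = None

    def _clone(self):
        # Copy the query state only; the clone starts with an empty cache.
        return ToyQuerySet(limits=copy.copy(self.limits))

    def all(self):
        return self._clone()

    def __iter__(self):
        # The "query" runs on first iteration and the rows are cached.
        if self._result_cache is None:
            ToyQuerySet.executed += 1
            self._result_cache = ["row-1", "row-2"]
        return iter(self._result_cache)


base = ToyQuerySet()
qs = base.all()
print(qs is base, ToyQuerySet.executed)  # a new object, nothing executed yet

list(qs)
list(qs)
print(ToyQuerySet.executed)  # iterating twice still executed only once
```

The counter stays at zero after all and only moves on the first iteration, which is the same shape as the real flow traced in the rest of this page.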

__repr__

In the shell, objects are displayed via the repr function, so let's look at the __repr__ method.

    def __repr__(self):
        data = list(self[:REPR_OUTPUT_SIZE + 1])
        if len(data) > REPR_OUTPUT_SIZE:
            data[-1] = "...(remaining elements truncated)..."
        return '<QuerySet %r>' % data
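
The "fetch one more than you show" trick in __repr__ is easy to try on a plain list. The sketch below assumes REPR_OUTPUT_SIZE is 20 (which matches the 21 that shows up later in this page); preview is a hypothetical helper name, not something in Django.

```python
REPR_OUTPUT_SIZE = 20  # assumed value; the page later computes REPR_OUTPUT_SIZE + 1 = 21


def preview(rows, limit=REPR_OUTPUT_SIZE):
    # Ask for one row more than we intend to show ...
    data = list(rows[:limit + 1])
    # ... so that the presence of the extra row signals truncation.
    if len(data) > limit:
        data[-1] = "...(remaining elements truncated)..."
    return data


short = preview(list(range(3)))
long = preview(list(range(100)))
print(short)
print(len(long), long[-1])
```

Because only limit + 1 rows are ever requested, the check costs one extra row rather than a separate count.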

__getitem__

On to __getitem__. The invalid-value checks at the top are omitted.

    def __getitem__(self, k):
        """
        Retrieves an item or slice from the set of results.
        """
        if self._result_cache is not None:
            return self._result_cache[k]
 
        if isinstance(k, slice):
            qs = self._clone()
            if k.start is not None:
                start = int(k.start)
            else:
                start = None
            if k.stop is not None:
                stop = int(k.stop)
            else:
                stop = None
            qs.query.set_limits(start, stop)
            return list(qs)[::k.step] if k.step else qs
 
        qs = self._clone()
        qs.query.set_limits(k, k + 1)
        return list(qs)[0]

Here k is a slice, with:

start: None
stop: REPR_OUTPUT_SIZE + 1 = 21
step: None

Let's look at set_limits just in case. query is a Query object from the sql package; confusingly, the class lives in the sql.query module (full path django.db.models.sql.query).

    def set_limits(self, low=None, high=None):
        """
        Adjusts the limits on the rows retrieved. We use low/high to set these,
        as it makes it more Pythonic to read and write. When the SQL query is
        created, they are converted to the appropriate offset and limit values.
 
        Any limits passed in here are applied relative to the existing
        constraints. So low is added to the current low value and both will be
        clamped to any existing high value.
        """
        if high is not None:
            if self.high_mark is not None:
                self.high_mark = min(self.high_mark, self.low_mark + high)
            else:
                self.high_mark = self.low_mark + high
        if low is not None:
            if self.high_mark is not None:
                self.low_mark = min(self.high_mark, self.low_mark + low)
            else:
                self.low_mark = self.low_mark + low
 
        if self.low_mark == self.high_mark:
            self.set_empty()

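Needlessly convoluted (lol). high_mark starts as None and low_mark starts as 0. With low=None and high=21, only the high branch runs and it takes its else arm (0 + 21), so the values end up as:

high_mark: 21
low_mark: 0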
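
To check the arithmetic by hand, here is a small sketch that copies just the two attributes and the branches of set_limits quoted above. Limits is a hypothetical stand-in class, and the set_empty() call is left out because it is not needed for these inputs.

```python
class Limits:
    """Only the two attributes set_limits touches (stand-in for sql.Query)."""

    def __init__(self):
        self.low_mark = 0      # initial value noted in the text
        self.high_mark = None  # initial value noted in the text

    def set_limits(self, low=None, high=None):
        # Same arithmetic as the method quoted above, minus set_empty().
        if high is not None:
            if self.high_mark is not None:
                self.high_mark = min(self.high_mark, self.low_mark + high)
            else:
                self.high_mark = self.low_mark + high
        if low is not None:
            if self.high_mark is not None:
                self.low_mark = min(self.high_mark, self.low_mark + low)
            else:
                self.low_mark = self.low_mark + low


q = Limits()
q.set_limits(None, 21)          # what self[:21] does inside __repr__
first = (q.low_mark, q.high_mark)
print(first)                    # (0, 21)

q.set_limits(5, 10)             # a second slice is relative to the current window
second = (q.low_mark, q.high_mark)
print(second)                   # (5, 10)
```

The second call shows why the method looks convoluted: limits are applied relative to whatever window is already set, and both marks are clamped to the existing high value.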

__iter__

Control now returns to QuerySet. No step was given, so __getitem__ returns the cloned QuerySet object as is. Next, __repr__ again:

    def __repr__(self):
        data = list(self[:REPR_OUTPUT_SIZE + 1])
        if len(data) > REPR_OUTPUT_SIZE:
            data[-1] = "...(remaining elements truncated)..."
        return '<QuerySet %r>' % data

list accepts any iterable, so __iter__ should be what gets called.

    def __iter__(self):
        """
        The queryset iterator protocol uses three nested iterators in the
        default case:
            1. sql.compiler:execute_sql()
               - Returns 100 rows at time (constants.GET_ITERATOR_CHUNK_SIZE)
                 using cursor.fetchmany(). This part is responsible for
                 doing some column masking, and returning the rows in chunks.
            2. sql/compiler.results_iter()
               - Returns one row at time. At this point the rows are still just
                 tuples. In some cases the return values are converted to
                 Python values at this location.
            3. self.iterator()
               - Responsible for turning the rows into model objects.
        """
        self._fetch_all()
        return iter(self._result_cache)

The docstring makes it look complicated, but as far as the code goes, it fetches the records, stores them in a cache, and builds an iterator over that cache.

_fetch_all

On to _fetch_all.

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = list(self.iterator())
        if self._prefetch_related_lookups and not self._prefetch_done:
            self._prefetch_related_objects()

I will ignore the prefetch part. On to the iterator method.

    def iterator(self):
        """
        An iterator over the results from applying this QuerySet to the
        database.
        """
        return iter(self._iterable_class(self))

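Just when I thought I had caught its tail, it slipped away again. The default for _iterable_class is ModelIterable. This looks like a hook for swapping out the row-building step when the fields to fetch are specified (for example via values()).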
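
Here is a rough sketch of how such a swap can work. Every class name below (ModelRows, DictRows, SwitchableQuerySet) is invented for illustration; the only parts taken from the source are the _iterable_class attribute and the iter(self._iterable_class(self)) call. My understanding is that Django's values() replaces _iterable_class in a similar way, but treat that as an assumption rather than a description of its code.

```python
import types


class ModelRows:
    """Default strategy: turn each raw row into an object with attributes."""

    def __init__(self, queryset):
        self.queryset = queryset

    def __iter__(self):
        for row in self.queryset.rows:
            yield types.SimpleNamespace(**row)


class DictRows:
    """Alternative strategy: hand back plain dicts instead of objects."""

    def __init__(self, queryset):
        self.queryset = queryset

    def __iter__(self):
        for row in self.queryset.rows:
            yield dict(row)


class SwitchableQuerySet:
    def __init__(self, rows, iterable_class=ModelRows):
        self.rows = rows
        self._iterable_class = iterable_class

    def values(self):
        # Same data, different row-building strategy.
        return SwitchableQuerySet(self.rows, iterable_class=DictRows)

    def iterator(self):
        return iter(self._iterable_class(self))


qs = SwitchableQuerySet([{"id": 1, "question_text": "What's up?"}])
as_object = next(qs.iterator())
as_dicts = list(qs.values().iterator())
print(as_object.question_text)
print(as_dicts)
```

The queryset itself never needs to know how rows become results; it only instantiates whichever class is currently stored in _iterable_class.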

ModelIterable

The ModelIterable class is defined near the top of django/db/models/query.py. Here is its __iter__ method, with the parts that do not matter here omitted:

    def __iter__(self):
        queryset = self.queryset
        db = queryset.db
        compiler = queryset.query.get_compiler(using=db)
        # Execute the query. This will also fill compiler.select, klass_info,
        # and annotations.
        results = compiler.execute_sql()
        select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
                                                  compiler.annotation_col_map)
        model_cls = klass_info['model']
        select_fields = klass_info['select_fields']
        model_fields_start, model_fields_end = select_fields[0], select_fields[-1] + 1
        init_list = [f[0].target.attname
                     for f in select[model_fields_start:model_fields_end]]
        related_populators = get_related_populators(klass_info, select, db)
        for row in compiler.results_iter(results):
            obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
            if related_populators:
                for rel_populator in related_populators:
                    rel_populator.populate(row, obj)
            if annotation_col_map:
                for attr_name, col_pos in annotation_col_map.items():
                    setattr(obj, attr_name, row[col_pos])
 
            # Add the known related objects to the model, if there are any
            # (omitted)
 
            yield obj

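- First, get the compiler and execute the SQL
- For each result row, build a model object and hand it over with yield

That is what happens here. At a glance related_populators looks like it will be empty, but let's go through the details.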

django.db.models.sql.compiler.SQLCompiler

This time query is a plain Query object, so the compiler that gets created is SQLCompiler. SQLCompiler is not an abstract class; it is the class that actually assembles the SELECT.

Let's look at the execute_sql method. It is long, so error handling and the like are omitted.

    def execute_sql(self, result_type=MULTI):
        """
        Run the query against the database and returns the result(s). The
        return value is a single data item if result_type is SINGLE, or an
        iterator over the results if the result_type is MULTI.
        """
        try:
            sql, params = self.as_sql()
        except EmptyResultSet:
            # (omitted)
 
        cursor = self.connection.cursor()
        try:
            cursor.execute(sql, params)
        except Exception:
            # (omitted)
 
        # (handling for result_type other than MULTI omitted)
 
        result = cursor_iter(
            cursor, self.connection.features.empty_fetchmany_value,
            self.col_count
        )
        if not self.connection.features.can_use_chunked_reads:
            try:
                # If we are using non-chunked reads, we return the same data
                # structure as normally, but ensure it is all read into memory
                # before going any further.
                return list(result)
            finally:
                # done with the cursor
                cursor.close()
        return result

What it does is:

- build the SQL
- execute it
- return the results as an iterator

Incidentally, with sqlite3 can_use_chunked_reads is False, so instead of an iterator the results are turned into a list right here and returned.

as_sql

as_sql is long too; with the unneeded parts omitted:

    def as_sql(self, with_limits=True, with_col_aliases=False, subquery=False):
        """
        Creates the SQL for this query. Returns the SQL string and list of
        parameters.
 
        If 'with_limits' is False, any limit/offset information is not included
        in the query.
        """
        refcounts_before = self.query.alias_refcount.copy()
        try:
            extra_select, order_by, group_by = self.pre_sql_setup()
 
            # This must come after 'select', 'ordering', and 'distinct' -- see
            # docstring of get_from_clause() for details.
            from_, f_params = self.get_from_clause()
 
            where, w_params = self.compile(self.where) if self.where is not None else ("", [])
            having, h_params = self.compile(self.having) if self.having is not None else ("", [])
            params = []
            result = ['SELECT']
 
            out_cols = []
            for _, (s_sql, s_params), alias in self.select + extra_select:
                params.extend(s_params)
                out_cols.append(s_sql)
 
            result.append(', '.join(out_cols))
 
            result.append('FROM')
            result.extend(from_)
            params.extend(f_params)
 
            if where:
                result.append('WHERE %s' % where)
                params.extend(w_params)
 
            # (GROUP BY, HAVING, ORDER BY handling omitted)
 
            if with_limits:
                if self.query.high_mark is not None:
                    result.append('LIMIT %d' % (self.query.high_mark - self.query.low_mark))
                if self.query.low_mark:
                    result.append('OFFSET %d' % self.query.low_mark)
 
            return ' '.join(result), tuple(params)
        finally:
            # Finally do cleanup - get rid of the joins we created above.
            self.query.reset_refcounts(refcounts_before)

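Even trimmed it is fairly long, but basically it just builds the SELECT statement step by step. No conditions were set this time, so where is empty. The compile method is evidently what produces the actual WHERE clause; I will look into that next time.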
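
The "append fragments to a list, then join" style of the trimmed as_sql can be reproduced in a few lines. This is only a sketch: build_select is a hypothetical function, and the table name polls_question and its columns are assumptions based on the tutorial's polls app, not something shown in the code above.

```python
def build_select(columns, table, where=None, low_mark=0, high_mark=None):
    """Assemble a SELECT the way the trimmed as_sql does (toy version)."""
    result = ["SELECT", ", ".join(columns), "FROM", table]
    params = []
    if where:
        where_sql, where_params = where
        result.append("WHERE %s" % where_sql)
        params.extend(where_params)
    # LIMIT is the window size, OFFSET is where the window starts.
    if high_mark is not None:
        result.append("LIMIT %d" % (high_mark - low_mark))
    if low_mark:
        result.append("OFFSET %d" % low_mark)
    return " ".join(result), tuple(params)


sql, params = build_select(
    ['"polls_question"."id"', '"polls_question"."question_text"'],
    '"polls_question"',
    high_mark=21,  # the value set_limits produced for __repr__
)
print(sql)

sliced_sql, sliced_params = build_select(
    ['"id"'], '"t"', where=('"id" > %s', [1]), low_mark=5, high_mark=10
)
print(sliced_sql, sliced_params)
```

Note how low_mark and high_mark are only converted to LIMIT and OFFSET at this last step, as the set_limits docstring says.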

pre_sql_setup

as_sql itself never sets self.select and friends, so on to pre_sql_setup.

    def pre_sql_setup(self):
        """
        Does any necessary class setup immediately prior to producing SQL. This
        is for things that can't necessarily be done in __init__ because we
        might not have all the pieces in place at that time.
        """
        self.setup_query()
        order_by = self.get_order_by()
        self.where, self.having = self.query.where.split_having()
        extra_select = self.get_extra_select(order_by, self.select)
        group_by = self.get_group_by(self.select + extra_select, order_by)
        return extra_select, order_by, group_by

Further on, into setup_query.

    def setup_query(self):
        if all(self.query.alias_refcount[a] == 0 for a in self.query.tables):
            self.query.get_initial_alias()
        self.select, self.klass_info, self.annotation_col_map = self.get_select()
        self.col_count = len(self.select)

tables has not come up so far, so it should be empty. Even so, all(()) is True for an empty iterable, so get_initial_alias does run.
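
The empty-iterable behavior of all is worth seeing once. The snippet below mimics the condition in setup_query with plain variables; the table name polls_question is an assumed example.

```python
alias_refcount = {}
tables = []  # nothing has been registered yet

# No elements means no counter-example, so all() over an empty iterable is True.
empty_case = all(alias_refcount[a] == 0 for a in tables)
print(empty_case)

# Once a table is registered with a non-zero refcount, the check turns False.
tables.append("polls_question")  # assumed table name
alias_refcount["polls_question"] = 1
used_case = all(alias_refcount[a] == 0 for a in tables)
print(used_case)
```

So the same condition covers both "no tables yet" and "tables exist but none are referenced".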

    def get_initial_alias(self):
        """
        Returns the first alias for this query, after increasing its reference
        count.
        """
        if self.tables:
            alias = self.tables[0]
            self.ref_alias(alias)
        else:
            alias = self.join(BaseTable(self.get_meta().db_table, None))
        return alias

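BaseTable is a class in the django.db.models.sql.datastructures module. This module appears to be where the FROM clause, joins included, takes shape.

Next, the join method. No join happens here, so that part is cut out wholesale.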

    def join(self, join, reuse=None):
        alias, _ = self.table_alias(join.table_name, create=True)
        join.table_alias = alias
        self.alias_map[alias] = join
        return alias

Hmm, this goes deep. I am tempted to write "rest omitted" at this point, but here is the table_alias method:

    def table_alias(self, table_name, create=False):
        """
        Returns a table alias for the given table_name and whether this is a
        new alias or not.
 
        If 'create' is true, a new alias is always created. Otherwise, the
        most recently created alias for the table (if one exists) is reused.
        """
        alias_list = self.table_map.get(table_name)
 
        # Create a new alias for this table.
        if alias_list:
            # (omitted)
        else:
            # The first occurrence of a table uses the table name directly.
            alias = table_name
            self.table_map[alias] = [alias]
        self.alias_refcount[alias] = 1
        self.tables.append(alias)
        return alias, True

So although it is called an alias, this time the table name itself is returned. To sum up so far, all of this registers the table name used by the query on the Query object.

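get_select

It took a lot of work just to set a table name (in fairness, that is because aliases and JOINs have to be accounted for), but now we finally reach what looks like the code that decides which fields to fetch. Back in SQLCompiler, get_select. It is a little hard to see, but when no fields are specified, default_cols is True and only that branch is processed.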

    def get_select(self):
        """
        Returns three values:
        - a list of 3-tuples of (expression, (sql, params), alias)
        - a klass_info structure,
        - a dictionary of annotations
 
        The (sql, params) is what the expression will produce, and alias is the
        "AS alias" for the column (possibly None).
 
        The klass_info structure contains the following information:
        - Which model to instantiate
        - Which columns for that model are present in the query (by
          position of the select clause).
        - related_klass_infos: [f, klass_info] to descent into
 
        The annotations is a dictionary of {'attname': column position} values.
        """
        select = []
        klass_info = None
        annotations = {}
        select_idx = 0
        if self.query.default_cols:
            select_list = []
            for c in self.get_default_columns():
                select_list.append(select_idx)
                select.append((c, None))
                select_idx += 1
            klass_info = {
                'model': self.query.model,
                'select_fields': select_list,
            }
 
        ret = []
        for col, alias in select:
            ret.append((col, self.compile(col, select_format=True), alias))
        return ret, klass_info, annotations

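Next, the get_default_columns method. As usual, the unneeded parts are omitted. Incidentally, the second paragraph of the method's docstring seems to be a leftover description of old behavior (it is misleading, so I removed it below).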

    def get_default_columns(self, start_alias=None, opts=None, from_parent=None):
        """
        Computes the default columns for selecting every field in the base
        model. Will sometimes be called to pull in related models (e.g. via
        select_related), in which case "opts" and "start_alias" will be given
        to provide a starting point for the traversal.
        """
        result = []
        if opts is None:
            opts = self.query.get_meta()
        if not start_alias:
            start_alias = self.query.get_initial_alias()
        # The 'seen_models' is used to optimize checking the needed parent
        # alias for a given field. This also includes None -> start_alias to
        # be used by local fields.
        seen_models = {None: start_alias}
 
        for field in opts.concrete_fields:
            model = field.model._meta.concrete_model
            # A proxy model will have a different model and concrete_model. We
            # will assign None if the field belongs to this model.
            if model == opts.model:
                model = None
            alias = self.query.join_parent_model(opts, model, start_alias,
                                                 seen_models)
            column = field.get_col(alias)
            result.append(column)
        return result

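The query's join_parent_model handles JOIN-related work, but in this case it simply returns the same thing as start_alias (that is, the alias of the target model's table).

Next, the field's get_col method, in django/db/models/fields/__init__.py. The alias equals the model's db_table and output_field defaults to the field itself, so the final branch returning cached_col should be taken.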

    def get_col(self, alias, output_field=None):
        if output_field is None:
            output_field = self
        if alias != self.model._meta.db_table or output_field != self:
            from django.db.models.expressions import Col
            return Col(alias, self, output_field)
        else:
            return self.cached_col
 
    @cached_property
    def cached_col(self):
        from django.db.models.expressions import Col
        return Col(self.model._meta.db_table, self)
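
The effect of that else branch is that the unaliased column object is built once per field and then reused. A minimal sketch of the pattern follows, using functools.cached_property from the standard library (Python 3.8+) as an analogue of Django's own cached_property; ToyField and ToyCol are hypothetical names, and the output_field check is left out.

```python
from functools import cached_property  # stdlib analogue of Django's decorator


class ToyCol:
    def __init__(self, alias, target):
        self.alias = alias
        self.target = target


class ToyField:
    def __init__(self, db_table, column):
        self.db_table = db_table
        self.column = column

    def get_col(self, alias):
        if alias != self.db_table:
            # Aliased table (e.g. a self-join): build a fresh column object.
            return ToyCol(alias, self)
        # Plain table name: reuse the cached object.
        return self.cached_col

    @cached_property
    def cached_col(self):
        return ToyCol(self.db_table, self)


field = ToyField("polls_question", "id")  # assumed table and column names
same = field.get_col("polls_question") is field.get_col("polls_question")
fresh = field.get_col("T2") is field.get_col("T2")
print(same, fresh)
```

The common case (no alias) pays for the object once, while the rarer aliased case gets a new object each time.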

Col: yet another new concept. As the in-method import shows, it is defined in django/db/models/expressions.py. For now, just the class definitions:

class Col(Expression):
 
class Expression(BaseExpression, Combinable):
    """
    An expression that can be combined with other expressions.
    """
 
class BaseExpression(object):
    """
    Base class for all query expressions.
    """
 
class Combinable(object):
    """
    Provides the ability to combine one or two objects with
    some connector. For example F('foo') + F('bar').
    """

Hmm. These appear to be classes that represent expressions, and Col stands for Column.

Back to SQLCompiler once more, for the compile method:

    def compile(self, node, select_format=False):
        vendor_impl = getattr(node, 'as_' + self.connection.vendor, None)
        if vendor_impl:
            sql, params = vendor_impl(self, self.connection)
        else:
            sql, params = node.as_sql(self, self.connection)
        if select_format and not self.subquery:
            return node.output_field.select_format(self, sql, params)
        return sql, params

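Here node is a Col object. compile was also called in as_sql. In other words, SELECT columns, WHERE conditions and so on are represented as Expressions, and calling as_sql on each one yields its SQL string. Col's as_sql is unremarkable, so I will skip it. With that we finally have the column names to SELECT. Incidentally, Field.select_format by default does nothing and returns sql and params unchanged.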
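
The getattr lookup at the top of compile is a small dispatch mechanism: a node may offer a backend-specific as_<vendor> method, and as_sql is the fallback. The sketch below reproduces only that part. All Toy* classes are hypothetical, the SQL strings are made up for the example, and "sqlite" and "postgresql" are used as sample vendor names.

```python
class ToyCol:
    """A node with only the generic as_sql."""

    def __init__(self, table, column):
        self.table = table
        self.column = column

    def as_sql(self, compiler, connection):
        return '"%s"."%s"' % (self.table, self.column), []


class ToyNow:
    """A node that overrides its SQL for one vendor."""

    def as_sql(self, compiler, connection):
        return "CURRENT_TIMESTAMP", []

    def as_sqlite(self, compiler, connection):
        return "datetime('now')", []


class ToyConnection:
    def __init__(self, vendor):
        self.vendor = vendor


class ToyCompiler:
    def __init__(self, connection):
        self.connection = connection

    def compile(self, node):
        # Prefer as_<vendor> when the node defines it, else fall back to as_sql.
        vendor_impl = getattr(node, "as_" + self.connection.vendor, None)
        if vendor_impl:
            return vendor_impl(self, self.connection)
        return node.as_sql(self, self.connection)


sqlite = ToyCompiler(ToyConnection("sqlite"))
postgres = ToyCompiler(ToyConnection("postgresql"))
col_sql = sqlite.compile(ToyCol("polls_question", "id"))
now_sqlite = sqlite.compile(ToyNow())
now_postgres = postgres.compile(ToyNow())
print(col_sql, now_sqlite, now_postgres)
```

Nothing here requires the nodes to share a base class; compile only cares that the method it looks up exists.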

get_from_clause

With the SELECT columns finally handled, next comes FROM. As usual, part of it is omitted.

    def get_from_clause(self):
        """
        Returns a list of strings that are joined together to go after the
        "FROM" part of the query, as well as a list any extra parameters that
        need to be included. Sub-classes, can override this to create a
        from-clause via a "select".
 
        This should only be called after any SQL construction methods that
        might change the tables we need. This means the select columns,
        ordering and distinct must be done first.
        """
        result = []
        params = []
        for alias in self.query.tables:
            try:
                from_clause = self.query.alias_map[alias]
            except KeyError:
                # Extra tables can end up in self.tables, but not in the
                # alias_map if they aren't in a join. That's OK. We skip them.
                continue
            clause_sql, clause_params = self.compile(from_clause)
            result.append(clause_sql)
            params.extend(clause_params)
        return result, params

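What is in alias_map? It was a while ago so the memory is fading, but this time it holds a BaseTable object. BaseTable is passed to compile (that is, its as_sql gets called), yet BaseTable is not an Expression. It has an as_sql method, though, so there is no problem. A nice example of duck typing :-)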

cursor_iter

That is how the SQL is built and executed; now for what happens after execution. Here is execute_sql again:

    def execute_sql(self, result_type=MULTI):
        """
        Run the query against the database and returns the result(s). The
        return value is a single data item if result_type is SINGLE, or an
        iterator over the results if the result_type is MULTI.
        """
        try:
            sql, params = self.as_sql()
        except EmptyResultSet:
            # (omitted)
 
        cursor = self.connection.cursor()
        try:
            cursor.execute(sql, params)
        except Exception:
            # (omitted)
 
        # (handling for result_type other than MULTI omitted)
 
        result = cursor_iter(
            cursor, self.connection.features.empty_fetchmany_value,
            self.col_count
        )
        if not self.connection.features.can_use_chunked_reads:
            try:
                # If we are using non-chunked reads, we return the same data
                # structure as normally, but ensure it is all read into memory
                # before going any further.
                return list(result)
            finally:
                # done with the cursor
                cursor.close()
        return result

cursor_iter is a function (a generator, to be precise).

def cursor_iter(cursor, sentinel, col_count):
    """
    Yields blocks of rows from a cursor and ensures the cursor is closed when
    done.
    """
    try:
        for rows in iter((lambda: cursor.fetchmany(GET_ITERATOR_CHUNK_SIZE)),
                         sentinel):
            yield [r[0:col_count] for r in rows]
    finally:
        cursor.close()

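Let's see. fetchmany returns a list of records (each record is itself a sequence, and there are several of them, so a list of sequences). So rows is one such chunk. The list comprehension trims each record r to its first col_count columns, and the whole trimmed chunk is yielded at once, not one record at a time. Why the extra slice? The __iter__ docstring quoted earlier says this stage is "responsible for doing some column masking", which is presumably this: any columns beyond col_count are dropped.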
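
The two-argument form iter(callable, sentinel) is the less familiar part: it keeps calling the callable until the return value equals the sentinel. Here is a runnable sketch with a fake cursor. FakeCursor is a hypothetical stand-in, chunk_size replaces GET_ITERATOR_CHUNK_SIZE to keep the data small, and the empty list is assumed as the sentinel.

```python
class FakeCursor:
    """Minimal stand-in for a DB-API cursor (hypothetical)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.closed = False

    def fetchmany(self, size):
        chunk, self._rows = self._rows[:size], self._rows[size:]
        return chunk

    def close(self):
        self.closed = True


def cursor_iter(cursor, sentinel, col_count, chunk_size=2):
    try:
        # iter(callable, sentinel) stops as soon as fetchmany returns sentinel.
        for rows in iter(lambda: cursor.fetchmany(chunk_size), sentinel):
            # One yield per chunk; each row is trimmed to col_count columns.
            yield [r[0:col_count] for r in rows]
    finally:
        cursor.close()


cursor = FakeCursor([(1, "a", "extra"), (2, "b", "extra"), (3, "c", "extra")])
chunks = list(cursor_iter(cursor, [], col_count=2))
print(chunks)
print(cursor.closed)
```

Three rows with a chunk size of two produce two chunks, the third column is masked out of every row, and the finally clause closes the cursor once the generator is exhausted.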

results_iter

The long execute_sql is finally done. How far back do we return? All the way to ModelIterable. Here it is again:

    def __iter__(self):
        queryset = self.queryset
        db = queryset.db
        compiler = queryset.query.get_compiler(using=db)
        # Execute the query. This will also fill compiler.select, klass_info,
        # and annotations.
        results = compiler.execute_sql()
        select, klass_info, annotation_col_map = (compiler.select, compiler.klass_info,
                                                  compiler.annotation_col_map)
        model_cls = klass_info['model']
        select_fields = klass_info['select_fields']
        model_fields_start, model_fields_end = select_fields[0], select_fields[-1] + 1
        init_list = [f[0].target.attname
                     for f in select[model_fields_start:model_fields_end]]
        related_populators = get_related_populators(klass_info, select, db)
        for row in compiler.results_iter(results):
            obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
            if related_populators:
                for rel_populator in related_populators:
                    rel_populator.populate(row, obj)
            if annotation_col_map:
                for attr_name, col_pos in annotation_col_map.items():
                    setattr(obj, attr_name, row[col_pos])
 
            # Add the known related objects to the model, if there are any
            # (omitted)
 
            yield obj

results_iter:

    def results_iter(self, results=None):
        """
        Returns an iterator over the results from executing this query.
        """
        converters = None
        if results is None:
            results = self.execute_sql(MULTI)
        fields = [s[0] for s in self.select[0:self.col_count]]
        converters = self.get_converters(fields)
        for rows in results:
            for row in rows:
                if converters:
                    row = self.apply_converters(row, converters)
                yield row

The converters do actually run, but I think this is far enough. It has been long (lol).
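
To tie the layers together, here is a sketch of the last two: flattening chunks into rows (with value conversion), then turning rows into objects. It is a simplification under stated assumptions: converters is a plain dict of column position to function, which is not the structure Django's get_converters returns, and Question is a namedtuple standing in for a model class.

```python
from collections import namedtuple
from datetime import date, datetime

# Hypothetical stand-in for a model class.
Question = namedtuple("Question", ["id", "pub_date"])


def to_date(value):
    # Toy converter: pretend the database returns dates as ISO strings.
    return datetime.strptime(value, "%Y-%m-%d").date()


def results_iter(chunks, converters):
    # Flatten chunks into single rows and convert column values.
    for rows in chunks:
        for row in rows:
            if converters:
                row = tuple(
                    converters.get(pos, lambda v: v)(value)
                    for pos, value in enumerate(row)
                )
            yield row


def model_iter(chunks, converters):
    # Turn each row tuple into an object, one at a time.
    for row in results_iter(chunks, converters):
        yield Question(*row)


# The chunked output of the cursor layer, written out by hand.
chunks = [[(1, "2017-01-01"), (2, "2017-01-02")], [(3, "2017-01-03")]]
questions = list(model_iter(chunks, {1: to_date}))
print(questions[0])
```

Each layer is a generator consuming the one below it, so in the chunked case rows flow through conversion and object creation without the whole result set being materialized first.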

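Conclusion

This time we looked at querying, the star attraction (?) of a database. Search conditions seemed likely to complicate things, so I started with plain all, and it was complicated enough (lol). Even for all, the flow that runs is the same; conditions simply are not set. Along the way, deciding the SELECT columns and the FROM table involved processing that is arguably overkill for this case. My takeaway is that this is what pursuing generality looks like.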