Django/Reading the Template System (Template Parsing)

Introduction
django/template/base.py
- Lexer
- Parser
  - Variables
  - Blocks
Tag libraries
- Loading tags
- How tag libraries work
Conclusion

Introduction

A recap of the previous article: calling get_template in django.template.loader eventually reaches the get_template method of the Loader class in django.template.loaders.base.

    def get_template(self, template_name, template_dirs=None, skip=None):
        """
        Calls self.get_template_sources() and returns a Template object for
        the first template matching template_name. If skip is provided,
        template origins in skip are ignored. This is used to avoid recursion
        during template extending.
        """
        tried = []
 
        args = [template_name]
        # RemovedInDjango20Warning: Add template_dirs for compatibility with
        # old loaders
        if func_supports_parameter(self.get_template_sources, 'template_dirs'):
            args.append(template_dirs)
 
        for origin in self.get_template_sources(*args):
            if skip is not None and origin in skip:
                tried.append((origin, 'Skipped'))
                continue
 
            try:
                contents = self.get_contents(origin)
            except TemplateDoesNotExist:
                tried.append((origin, 'Source does not exist'))
                continue
            else:
                return Template(
                    contents, origin, origin.template_name, self.engine,
                )
 
        raise TemplateDoesNotExist(template_name, tried=tried)

This time we step into the creation of the Template instance inside that method.

django/template/base.py

First, let's check what Template actually is.

from django.template import Origin, Template, TemplateDoesNotExist

django/template/__init__.py


# Template parts
from .base import (                                                     # NOQA isort:skip
    Context, Node, NodeList, Origin, RequestContext, StringOrigin, Template,
    Variable,
)

Hmm, that is a bit roundabout. In any case, the Template class is defined in base.py under django/template. Here is the __init__ method of Template:

class Template(object):
    def __init__(self, template_string, origin=None, name=None, engine=None):
        try:
            template_string = force_text(template_string)
        except UnicodeDecodeError:
            raise TemplateEncodingError("Templates can only be constructed "
                                        "from unicode or UTF-8 strings.")
        # If Template is instantiated directly rather than from an Engine and
        # exactly one Django template engine is configured, use that engine.
        # This is required to preserve backwards-compatibility for direct use
        # e.g. Template('...').render(Context({...}))
        if engine is None:
            from .engine import Engine
            engine = Engine.get_default()
        if origin is None:
            origin = Origin(UNKNOWN_SOURCE)
        self.name = name
        self.origin = origin
        self.engine = engine
        self.source = template_string
        self.nodelist = self.compile_nodelist()

Compilation appears to be done by the compile_nodelist method.

    def compile_nodelist(self):
        """
        Parse and compile the template source into a nodelist. If debug
        is True and an exception occurs during parsing, the exception is
        is annotated with contextual line information where it occurred in the
        template source.
        """
        if self.engine.debug:
            lexer = DebugLexer(self.source)
        else:
            lexer = Lexer(self.source)
 
        tokens = lexer.tokenize()
        parser = Parser(
            tokens, self.engine.template_libraries, self.engine.template_builtins,
            self.origin,
        )
 
        try:
            return parser.parse()
        except Exception as e:
            if self.engine.debug:
                e.template_debug = self.get_exception_info(e, e.token)
            raise

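Apart from a little extra handling for debugging, this is a typical compilation process. That is:

Lexer splits the source into tokens
Parser converts the tokens into nodes

Let's look at each in turn.

Lexer

Lexer is defined in the same base.py.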

    def tokenize(self):
        """
        Return a list of tokens from a given template_string.
        """
        in_tag = False
        lineno = 1
        result = []
        for bit in tag_re.split(self.template_string):
            if bit:
                result.append(self.create_token(bit, None, lineno, in_tag))
            in_tag = not in_tag
            lineno += bit.count('\n')
        return result

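tag_re is defined near the top of base.py. Text enclosed in {% and %} is a block (what is usually called a tag), text enclosed in {{ and }} is a variable reference, and text enclosed in {# and #} is a comment. (Pasted without syntax highlighting because it contains }}.)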

BLOCK_TAG_START = '{%'
BLOCK_TAG_END = '%}'
VARIABLE_TAG_START = '{{'
VARIABLE_TAG_END = '}}'
COMMENT_TAG_START = '{#'
COMMENT_TAG_END = '#}'

# match a variable or block tag and capture the entire tag, including start/end
# delimiters
tag_re = (re.compile('(%s.*?%s|%s.*?%s|%s.*?%s)' %
          (re.escape(BLOCK_TAG_START), re.escape(BLOCK_TAG_END),
           re.escape(VARIABLE_TAG_START), re.escape(VARIABLE_TAG_END),
           re.escape(COMMENT_TAG_START), re.escape(COMMENT_TAG_END))))

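The Python documentation describes the behavior of split on a regular expression object as follows:

If capturing parentheses are used in pattern, then the text of all groups
in the pattern are also returned as part of the resulting list.

In fact, running split shows the template being divided into alternating non-tag and tag parts, as below.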

['', '{% if latest_question_list %}',
 '\n    <ul>\n    ', '{% for question in latest_question_list %}',
 '\n        <li><a href="/polls/', '{{ question.id }}',
 '/">', '{{ question.question_text }}',
 '</a></li>\n    ', '{% endfor %}',
 '\n    </ul>\n', '{% else %}',
 '\n    <p>No polls are available.</p>\n', '{% endif %}']
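The effect of the capturing group can be checked with the standard re module alone. The following is a minimal sketch, assuming a hand-written pattern with the same shape as tag_re and a made-up one-line template; it is not Django's own code.

```python
import re

# Same delimiters and overall shape as the tag_re excerpt above,
# written out by hand instead of being built with re.escape.
tag_re = re.compile(r'(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})')

# Hypothetical template string used only for this illustration.
template = '{% if items %}<ul>{{ items }}</ul>{% endif %}'

parts = tag_re.split(template)
print(parts)
# ['', '{% if items %}', '<ul>', '{{ items }}', '</ul>', '{% endif %}', '']

# Without the capturing group, split() drops the tags themselves.
no_group_re = re.compile(r'\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\}')
print(no_group_re.split(template))
# ['', '<ul>', '</ul>', '']
```

With the group, every odd index of the result holds a tag and every even index holds literal text, including empty strings at the edges.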

create_token builds a Token object according to the kind of tag.

    def create_token(self, token_string, position, lineno, in_tag):
        """
        Convert the given token string into a new Token object and return it.
        If in_tag is True, we are processing something that matched a tag,
        otherwise it should be treated as a literal string.
        """
        if in_tag and token_string.startswith(BLOCK_TAG_START):
            # The [2:-2] ranges below strip off *_TAG_START and *_TAG_END.
            # We could do len(BLOCK_TAG_START) to be more "correct", but we've
            # hard-coded the 2s here for performance. And it's not like
            # the TAG_START values are going to change anytime, anyway.
            block_content = token_string[2:-2].strip()
            if self.verbatim and block_content == self.verbatim:
                self.verbatim = False
        if in_tag and not self.verbatim:
            if token_string.startswith(VARIABLE_TAG_START):
                token = Token(TOKEN_VAR, token_string[2:-2].strip(), position, lineno)
            elif token_string.startswith(BLOCK_TAG_START):
                if block_content[:9] in ('verbatim', 'verbatim '):
                    self.verbatim = 'end%s' % block_content
                token = Token(TOKEN_BLOCK, block_content, position, lineno)
            elif token_string.startswith(COMMENT_TAG_START):
                content = ''
                if token_string.find(TRANSLATOR_COMMENT_MARK):
                    content = token_string[2:-2].strip()
                token = Token(TOKEN_COMMENT, content, position, lineno)
        else:
            token = Token(TOKEN_TEXT, token_string, position, lineno)
        return token

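verbatim disables template processing: while it is active, tags that appear in the source are still treated as TOKEN_TEXT.

Now for in_tag, which says whether the current token_string is a tag or not. In the calling tokenize method it flips False→True→False on every pass through the loop. It may seem odd that this is enough, but as the split example above shows, the pieces always come out in the order text→tag→text (with an empty string in the text position when the template starts with a tag), so simply flipping the flag is enough to tell tags from text.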
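The flag flipping can be tried out in isolation. Below is a simplified stand-in for tokenize and create_token, assuming plain (kind, text) tuples instead of Token objects and leaving out verbatim handling and line numbers; the name toy_tokenize is hypothetical.

```python
import re

tag_re = re.compile(r'(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})')


def toy_tokenize(source):
    """Return (kind, text) pairs; a toy version of the Lexer logic."""
    tokens = []
    in_tag = False  # split() output always starts with a text piece
    for bit in tag_re.split(source):
        if bit:  # empty pieces produce no token...
            if not in_tag:
                kind, text = 'TEXT', bit
            else:
                if bit.startswith('{{'):
                    kind = 'VAR'
                elif bit.startswith('{%'):
                    kind = 'BLOCK'
                else:
                    kind = 'COMMENT'
                text = bit[2:-2].strip()  # strip the two-character delimiters
            tokens.append((kind, text))
        in_tag = not in_tag  # ...but the flag still flips on every piece
    return tokens


tokens = toy_tokenize('{% if x %}Hello {{ name }}{# note #}{% endif %}')
print(tokens)
```

Adjacent tags such as {{ name }}{# note #} are separated by an empty string in the split result, which is why the alternation holds even when no literal text sits between two tags.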

Parser

Parser is also defined in base.py. The parse method is long, so only the important parts are excerpted here.

Variables

            elif token.token_type == 1:  # TOKEN_VAR
                if not token.contents:
                    raise self.error(token, 'Empty variable tag on line %d' % token.lineno)
                try:
                    filter_expression = self.compile_filter(token.contents)
                except TemplateSyntaxError as e:
                    raise self.error(token, e)
                var_node = VariableNode(filter_expression)
                self.extend_nodelist(nodelist, var_node, token)

compile_filter does nothing more than create a FilterExpression object, so let's look at the __init__ method of FilterExpression (only the relevant parts are excerpted).

    def __init__(self, token, parser):
        self.token = token
        matches = filter_re.finditer(token)
        var_obj = None
        filters = []
        for match in matches:
            if var_obj is None:
                var, constant = match.group("var", "constant")
                if constant:
                    # (omitted)
                elif var is None:
                    # (omitted)
                else:
                    var_obj = Variable(var)
            else:
                # (omitted)
 
        self.filters = filters
        self.var = var_obj

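filter_re is defined just above the FilterExpression class. Because the regular expression also has to account for filter arguments it gets fairly involved, but it is built up hierarchically using % string formatting.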

In any case, a variable ends up with the following object structure:

VariableNode
  FilterExpression
    Variable
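The nesting can be mimicked with a few small classes. This is only a sketch: the Toy* names are hypothetical, and the naive split on '|' and ':' stands in for filter_re, which also copes with quoted arguments that this version does not handle.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    var: str  # e.g. "question.id"


@dataclass
class ToyFilterExpression:
    token: str
    var: ToyVariable
    filters: list = field(default_factory=list)  # [(name, arg or None), ...]


@dataclass
class ToyVariableNode:
    filter_expression: ToyFilterExpression


def toy_compile_variable(contents):
    # Naive parsing: the first piece is the variable, the rest are filters.
    head, *rest = contents.split('|')
    filters = []
    for item in rest:
        name, _, arg = item.partition(':')
        filters.append((name.strip(), arg.strip() or None))
    expr = ToyFilterExpression(contents, ToyVariable(head.strip()), filters)
    return ToyVariableNode(expr)


node = toy_compile_variable('question.question_text|truncatewords:3|upper')
print(node.filter_expression.var.var)   # question.question_text
print(node.filter_expression.filters)   # [('truncatewords', '3'), ('upper', None)]
```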

Blocks

            elif token.token_type == 2:  # TOKEN_BLOCK
                try:
                    command = token.contents.split()[0]
                except IndexError:
                    raise self.error(token, 'Empty block tag on line %d' % token.lineno)
                if command in parse_until:
                    # A matching token has been reached. Return control to
                    # the caller. Put the token back on the token list so the
                    # caller knows where it terminated.
                    self.prepend_token(token)
                    return nodelist
                # Add the token to the command stack. This is used for error
                # messages if further parsing fails due to an unclosed block
                # tag.
                self.command_stack.append((command, token))
                # Get the tag callback function from the ones registered with
                # the parser.
                try:
                    compile_func = self.tags[command]
                except KeyError:
                    self.invalid_block_tag(token, command, parse_until)
                # Compile the callback into a node object and add it to
                # the node list.
                try:
                    compiled_result = compile_func(self, token)
                except Exception as e:
                    raise self.error(token, e)
                self.extend_nodelist(nodelist, compiled_result, token)
                # Compile success. Remove the token from the command stack.
                self.command_stack.pop()

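Two things happen here.

The compile function registered for the command is looked up and called, and its result is appended to the NodeList.
When a terminating command (one listed in parse_until) is reached, parsing stops and the nodelist is returned. Judging from this, the call in the first step appears to invoke parse recursively.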
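The interplay between parse_until and the recursive call is easier to see in a stripped-down model. The sketch below assumes tokens are plain (kind, contents) tuples and supports only a hypothetical if/endif pair; ToyParser and compile_if are illustrative names, not Django's classes.

```python
class ToyParser:
    def __init__(self, tokens, tags):
        self.tokens = list(tokens)  # (kind, contents) pairs
        self.tags = tags            # command name -> compile function

    def next_token(self):
        return self.tokens.pop(0)

    def prepend_token(self, token):
        self.tokens.insert(0, token)

    def parse(self, parse_until=()):
        nodelist = []
        while self.tokens:
            token = self.next_token()
            kind, contents = token
            if kind == 'TEXT':
                nodelist.append(contents)
            elif kind == 'BLOCK':
                command = contents.split()[0]
                if command in parse_until:
                    # Hand the terminator back so the caller can inspect it.
                    self.prepend_token(token)
                    return nodelist
                compile_func = self.tags[command]
                nodelist.append(compile_func(self, token))
        return nodelist


def compile_if(parser, token):
    # The compile function re-enters parse() to collect the block body.
    body = parser.parse(parse_until=('endif',))
    end = parser.next_token()
    assert end == ('BLOCK', 'endif')
    return ('if', token[1].split()[1:], body)


tokens = [
    ('BLOCK', 'if a'), ('TEXT', 'A'),
    ('BLOCK', 'if b'), ('TEXT', 'B'), ('BLOCK', 'endif'),
    ('TEXT', 'C'), ('BLOCK', 'endif'), ('TEXT', 'D'),
]
tree = ToyParser(tokens, {'if': compile_if}).parse()
print(tree)
# [('if', ['a'], ['A', ('if', ['b'], ['B']), 'C']), 'D']
```

Each nested if consumes its own endif, so the outer call only ever sees the terminator that belongs to it.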

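From here on we look at the details.

Tag libraries

Loading tags

The template example in the tutorial used if and for. Needless to say, these are built-in tags. Let's check when they get set up. Working backwards, we start with the Parser constructor.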

    def __init__(self, tokens, libraries=None, builtins=None, origin=None):
        self.tokens = tokens
        self.tags = {}
        self.filters = {}
        self.command_stack = []
 
        if libraries is None:
            libraries = {}
        if builtins is None:
            builtins = []
 
        self.libraries = libraries
        for builtin in builtins:
            self.add_library(builtin)
        self.origin = origin

It is a little roundabout, but the add_library method is what adds entries to tags.

    def add_library(self, lib):
        self.tags.update(lib.tags)
        self.filters.update(lib.filters)
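Because add_library relies on dict.update, a library added later replaces any earlier tag or filter with the same name. A minimal sketch of that merging, assuming hypothetical ToyLibrary and ToyParser classes that keep only the attributes shown above:

```python
class ToyLibrary:
    def __init__(self, tags=None, filters=None):
        self.tags = dict(tags or {})
        self.filters = dict(filters or {})


class ToyParser:
    def __init__(self, builtins=()):
        self.tags = {}
        self.filters = {}
        for lib in builtins:
            self.add_library(lib)

    def add_library(self, lib):
        # Same two update() calls as in the excerpt above.
        self.tags.update(lib.tags)
        self.filters.update(lib.filters)


def first_if(parser, token):
    return 'first'


def second_if(parser, token):
    return 'second'


def do_for(parser, token):
    return 'for'


lib_a = ToyLibrary(tags={'if': first_if, 'for': do_for})
lib_b = ToyLibrary(tags={'if': second_if})
parser = ToyParser(builtins=[lib_a, lib_b])

print(sorted(parser.tags))            # ['for', 'if']
print(parser.tags['if'] is second_if)  # True: the later library wins
```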

Next is the place where the Parser is created. This is the compile_nodelist method of django.template.base.Template that we saw earlier.

        parser = Parser(
            tokens, self.engine.template_libraries, self.engine.template_builtins,
            self.origin,
        )

engine is passed in from django.template.loaders.base.Loader.

                return Template(
                    contents, origin, origin.template_name, self.engine,
                )

Now we look for whoever passed engine to the Loader. It is passed where the Loader object is created, a part we skipped last time (self here is django.template.engine.Engine).

    def find_template_loader(self, loader):
        if isinstance(loader, (tuple, list)):
            args = list(loader[1:])
            loader = loader[0]
        else:
            args = []
 
        if isinstance(loader, six.string_types):
            loader_class = import_string(loader)
            return loader_class(self, *args)
        else:
            raise ImproperlyConfigured(
                "Invalid value in template loaders configuration: %r" % loader)

So engine is a django.template.engine.Engine.

Let's check what template_builtins is set to.

class Engine(object):
    default_builtins = [
        'django.template.defaulttags',
        'django.template.defaultfilters',
        'django.template.loader_tags',
    ]
 
    def __init__(self, dirs=None, app_dirs=False, context_processors=None,
                 debug=False, loaders=None, string_if_invalid='',
                 file_charset='utf-8', libraries=None, builtins=None, autoescape=True):
        # (omitted)
        if builtins is None:
            builtins = []
 
        # (omitted)
        self.builtins = self.default_builtins + builtins
        self.template_builtins = self.get_template_builtins(self.builtins)

get_template_builtins only imports the listed modules, so let's look at django.template.defaulttags.
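What "only imports" amounts to can be sketched with importlib. This is an illustration under assumptions, not Django's implementation: it assumes each library module exposes a module-level register object (as defaulttags does), and it uses a stand-in module named toy_defaulttags so that it runs without Django installed. The toy_* function names are hypothetical.

```python
import importlib
import sys
import types

# Stand-in for a tag library module; a real one would define
# register = Library() and decorate its compile functions with it.
fake = types.ModuleType('toy_defaulttags')
fake.register = {'tags': {'if': 'do_if'}, 'filters': {}}
sys.modules['toy_defaulttags'] = fake


def toy_import_library(name):
    # Import the module by dotted path and hand back its register object.
    module = importlib.import_module(name)
    return module.register


def toy_get_template_builtins(builtin_names):
    return [toy_import_library(name) for name in builtin_names]


builtins_ = toy_get_template_builtins(['toy_defaulttags'])
print(builtins_[0]['tags'])  # {'if': 'do_if'}
```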

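How tag libraries work

In defaulttags.py, the first half consists of node class definitions that appear to correspond to the tags, and the second half defines the tag compile functions. Let's look at one of the compile functions.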

@register.tag('if')
def do_if(parser, token):
    # {% if ... %}
    bits = token.split_contents()[1:]
    condition = TemplateIfParser(parser, bits).parse()
    nodelist = parser.parse(('elif', 'else', 'endif'))
    conditions_nodelists = [(condition, nodelist)]
    token = parser.next_token()
 
    # {% elif ... %} (repeatable)
    # (omitted)
 
    # {% else %} (optional)
    if token.contents == 'else':
        nodelist = parser.parse(('endif',))
        conditions_nodelists.append((None, nodelist))
        token = parser.next_token()
 
    # {% endif %}
    assert token.contents == 'endif'
 
    return IfNode(conditions_nodelists)

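Going into TemplateIfParser would take too long, so it is skipped here. In any case, this produces a conditions_nodelists like the following:

[
  (object representing the expression written in the if tag, NodeList used when the if condition is true),
  (None, NodeList used for the else branch)
]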

Note that the register in @register is a Library object from django.template.library.

register = Library()

The tag method of the Library class:

    def tag(self, name=None, compile_function=None):
        if name is None and compile_function is None:
            # @register.tag()
            return self.tag_function
        elif name is not None and compile_function is None:
            if callable(name):
                # @register.tag
                return self.tag_function(name)
            else:
                # @register.tag('somename') or @register.tag(name='somename')
                def dec(func):
                    return self.tag(name, func)
                return dec
        elif name is not None and compile_function is not None:
            # register.tag('somename', somefunc)
            self.tags[name] = compile_function
            return compile_function
        else:
            # (omitted)

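In this case name is not None and is a str, while compile_function is None. So it works as follows:

Control enters the first elif and then its else branch, which returns the nested function dec. Up to this point it is just an ordinary function call.
The decorator mechanism then calls dec, which runs the tag method with two arguments.
The second elif branch runs and the function is registered in tags.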
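The two-step registration can be reproduced with a cut-down class. The sketch below keeps only the two branches involved in @register.tag('name'); ToyLibrary is a hypothetical name and the other call forms are deliberately left out.

```python
class ToyLibrary:
    def __init__(self):
        self.tags = {}

    def tag(self, name=None, compile_function=None):
        if name is not None and compile_function is None:
            # @register.tag('somename'): return a decorator that
            # re-enters tag() with both arguments.
            def dec(func):
                return self.tag(name, func)
            return dec
        if name is not None and compile_function is not None:
            # register.tag('somename', somefunc): the actual registration.
            self.tags[name] = compile_function
            return compile_function
        raise ValueError('call form not covered by this sketch')


register = ToyLibrary()


@register.tag('if')
def do_if(parser, token):
    return 'IfNode placeholder'


# The decorator syntax above is equivalent to:
#   do_if = register.tag('if')(do_if)
print(register.tags['if'] is do_if)  # True
```

Since tag returns compile_function unchanged, the decorated name do_if still refers to the original function after registration.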

Conclusion

We have now followed template file loading and parsing. All that remains is template rendering, which is covered in the next article.