Reading How Commons Digester Works

Introduction
Digester.addXXXXXXXX()
Digester.parse()
SetNestedPropertiesRule
Conclusion

Introduction

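Commons Digester is a tool that maps XML files onto Java objects. In Digester, you first set up rules describing what to do when a given tag is encountered, like this: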

Digester d = new Digester();
d.addObjectCreate("address-book/person", Person.class);
d.addSetProperties("address-book/person");
d.addSetNext("address-book/person", "addPerson");
d.addCallMethod("address-book/person/email", "addEmail", 2);
d.addCallParam("address-book/person/email", 0, "type");
d.addCallParam("address-book/person/email", 1);
d.addSetNestedProperties("address-book/person/address");

After setting up the rules, you push a root object and parse the file:

AddressBook book = new AddressBook();
d.push(book);
File srcfile = new File(filename);
d.parse(srcfile);

This builds the objects. (The code above comes from examples/api/addressbook, which is bundled with the Digester source.) Let's read through how Digester does this.

The Digester version read for this article is 1.8.


Digester.addXXXXXXXX()

First, let's look at how rules are added.

public void addObjectCreate(String pattern, Class clazz) {
    addRule(pattern,
            new ObjectCreateRule(clazz));
}

public void addSetProperties(String pattern) {
    addRule(pattern,
            new SetPropertiesRule());
}

So each of these methods appears to construct its own Rule implementation and register it as a rule. addRule() looks like this:

public void addRule(String pattern, Rule rule) {
    rule.setDigester(this);
    getRules().add(pattern, rule);
}
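To make the division of labour concrete, here is a minimal, self-contained sketch of the same idea: a convenience method builds a specific Rule object and delegates to a generic addRule(), which gives the rule a back-reference to the digester and files it under its pattern. All names here (MiniDigester, MiniRule, MiniObjectCreateRule) are hypothetical; the real Rule class has more hooks and takes namespace and attribute arguments, and MiniObjectCreateRule only imitates what ObjectCreateRule does via reflection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Digester's Rule class.
abstract class MiniRule {
    protected MiniDigester digester;

    void setDigester(MiniDigester digester) {
        this.digester = digester;
    }

    // Hooks for the start tag and end tag of a matching element (no-ops by default).
    void begin(String name) {}
    void end(String name) {}
}

// Hypothetical rule imitating ObjectCreateRule: create an object via reflection
// in begin() and pop it off the object stack again in end().
class MiniObjectCreateRule extends MiniRule {
    private final Class<?> clazz;

    MiniObjectCreateRule(Class<?> clazz) {
        this.clazz = clazz;
    }

    @Override
    void begin(String name) {
        try {
            // Reflection: invoke the no-argument constructor.
            digester.push(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz, e);
        }
    }

    @Override
    void end(String name) {
        digester.pop();
    }
}

class MiniDigester {
    private final Map<String, List<MiniRule>> rules = new HashMap<>();
    private final Deque<Object> stack = new ArrayDeque<>();

    // Convenience method: build the specific Rule, then delegate to addRule().
    void addObjectCreate(String pattern, Class<?> clazz) {
        addRule(pattern, new MiniObjectCreateRule(clazz));
    }

    // Generic registration: back-reference first, then store the rule under its pattern.
    void addRule(String pattern, MiniRule rule) {
        rule.setDigester(this);
        rules.computeIfAbsent(pattern, k -> new ArrayList<>()).add(rule);
    }

    List<MiniRule> rulesFor(String pattern) {
        return rules.getOrDefault(pattern, Collections.emptyList());
    }

    void push(Object o) { stack.push(o); }
    Object pop() { return stack.pop(); }
    Object peek() { return stack.peek(); }
}
```

The point of the split is that every addXxx() method stays a one-liner, while all per-rule behaviour lives in the Rule subclasses.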

By default, getRules() returns a RulesBase object, which implements the Rules interface. So let's look at RulesBase.add().

public void add(String pattern, Rule rule) {
    // ... (omitted) ...
    List list = (List) cache.get(pattern);
    if (list == null) {
        list = new ArrayList();
        cache.put(pattern, list);
    }
    list.add(rule);
    rules.add(rule);
    // ... (omitted) ...
}

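cache is a HashMap instance variable and rules is an ArrayList instance variable. cache appears to exist so that Rules.match() can look up the matching rules quickly, and rules so that Rules.rules() can be implemented.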
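The two-collection layout can be sketched in isolation as follows. MiniRulesBase is a hypothetical name, and this sketch handles only exact-path matching; the real RulesBase also supports tail-match patterns such as "*/email" and namespace filtering, which are left out here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, exact-match-only sketch of the RulesBase storage layout.
class MiniRulesBase<R> {
    // pattern -> rules registered for that pattern (serves match())
    private final Map<String, List<R>> cache = new HashMap<>();
    // every rule in registration order (serves rules())
    private final List<R> rules = new ArrayList<>();

    void add(String pattern, R rule) {
        List<R> list = cache.get(pattern);
        if (list == null) {
            list = new ArrayList<>();
            cache.put(pattern, list);
        }
        list.add(rule);
        rules.add(rule);
    }

    // Average O(1) lookup by the exact element path.
    List<R> match(String path) {
        List<R> list = cache.get(path);
        return list == null ? Collections.<R>emptyList() : list;
    }

    // All registered rules, in the order they were added.
    List<R> rules() {
        return Collections.unmodifiableList(rules);
    }
}
```

Keeping both collections trades a little memory for a fast per-element lookup, which matters because match() runs once for every start tag in the document.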


Digester.parse()

Next, let's look at parse(). There are many overloads of parse(), but every one of them builds an InputSource object and then calls

getXMLReader().parse(is);

The XMLReader here is org.xml.sax.XMLReader, so Digester evidently uses SAX to build Java objects from XML.
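The same SAX flow can be reproduced with nothing but the JDK: wrap the input in an InputSource, obtain an XMLReader, register a content handler, and call parse(). Digester plays the handler role itself (it extends DefaultHandler); in this sketch, the hypothetical SaxFlowDemo class uses an anonymous handler that just records the local name of each start tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxFlowDemo {
    // Parses the XML string and returns the local names of all start tags in document order.
    static List<String> startTags(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is filled in
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(localName);
            }
        });
        // Same shape as Digester's getXMLReader().parse(is)
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Everything interesting therefore happens inside the handler callbacks, which is why the rest of this section reads those methods.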

So let's look at the SAX event handler methods. First is startElement(), which the parser calls each time it reads a start tag.

public void startElement(String namespaceURI, String localName,
                         String qName, Attributes list)
        throws SAXException {
    // ... (omitted) ...
    // Save the body text accumulated for our surrounding element
    bodyTexts.push(bodyText);
    // ... (omitted) ...
    bodyText = new StringBuffer();

    // the actual element name is either in localName or qName, depending 
    // on whether the parser is namespace aware
    String name = localName;
    // ... (omitted) ...

    // Compute the current matching rule
    StringBuffer sb = new StringBuffer(match);
    if (match.length() > 0) {
        sb.append('/');
    }
    sb.append(name);
    match = sb.toString();
    // ... (omitted) ...

    // Fire "begin" events for all relevant rules
    List rules = getRules().match(namespaceURI, match);
    matches.push(rules);
    if ((rules != null) && (rules.size() > 0)) {
        // ... (omitted) ...
        for (int i = 0; i < rules.size(); i++) {
            try {
                Rule rule = (Rule) rules.get(i);
                // ... (omitted) ...
                rule.begin(namespaceURI, name, list);
            } catch (Exception e) {
                // ... (omitted) ...
            }
        }
    } else {
        // ... (omitted) ...
    }
}

startElement() appears to do three things:

build the current path,
look up the rules that match that path, and
call the begin() method of each of those rules.
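The path bookkeeping alone can be sketched like this. MatchPath is a hypothetical helper, not a Digester class: enter() mirrors how startElement() appends "/name" (without a leading slash for the root element), and leave() mirrors how endElement() strips the last segment with lastIndexOf('/').

```java
// Hypothetical sketch of the "match" string that Digester maintains while parsing.
class MatchPath {
    private String match = "";

    // startElement(): append the element name, separated by '/' unless this is the root.
    String enter(String name) {
        match = match.isEmpty() ? name : match + "/" + name;
        return match;
    }

    // endElement(): drop the last segment; an empty string means we left the root.
    String leave() {
        int slash = match.lastIndexOf('/');
        match = (slash >= 0) ? match.substring(0, slash) : "";
        return match;
    }

    String current() {
        return match;
    }
}
```

Because the path is a plain string like "address-book/person/email", looking up the rules for it is just a map lookup on the patterns registered earlier.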

Next, let's look at characters(), which is called when the text inside a tag is read.

public void characters(char buffer[], int start, int length)
       throws SAXException {
    // ... (omitted) ...
    bodyText.append(buffer, start, length);
}

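Nothing special here. But bodyText is a StringBuffer; why append to it instead of simply assigning the text? Because there is no telling how many times characters() will be called: a SAX parser is allowed to deliver the text of a single element in several chunks.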
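The following sketch combines the append-only characters() with the bodyTexts stack from startElement()/endElement(), so that a parent's text survives its children. BodyTextCollector is a hypothetical class built on the JDK's SAX API, not Digester code; it records the finished text of each element by qName in a map.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the bodyText / bodyTexts handling described above.
class BodyTextCollector extends DefaultHandler {
    private StringBuilder bodyText = new StringBuilder();
    private final Deque<StringBuilder> bodyTexts = new ArrayDeque<>();
    final Map<String, String> bodies = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        bodyTexts.push(bodyText);       // save the parent's text accumulated so far
        bodyText = new StringBuilder(); // fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        bodyText.append(ch, start, length); // may be called several times for one text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        bodies.put(qName, bodyText.toString()); // the text is complete only at the end tag
        bodyText = bodyTexts.pop();             // resume the parent's buffer
    }

    static Map<String, String> collect(String xml) throws Exception {
        BodyTextCollector handler = new BodyTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.bodies;
    }
}
```

Whether the parser hands over "y & z" in one call or in pieces (entity references are a common split point), the appended result is the same.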

Finally, endElement(), which is called when an end tag is read.

public void endElement(String namespaceURI, String localName,
                       String qName) throws SAXException {
    // ... (omitted) ...
    // the actual element name is either in localName or qName, depending 
    // on whether the parser is namespace aware
    String name = localName;
    // ... (omitted) ...

    // Fire "body" events for all relevant rules
    List rules = (List) matches.pop();
    if ((rules != null) && (rules.size() > 0)) {
        String bodyText = this.bodyText.toString();
        // ... (omitted) ...
        for (int i = 0; i < rules.size(); i++) {
            try {
                Rule rule = (Rule) rules.get(i);
                // ... (omitted) ...
                rule.body(namespaceURI, name, bodyText);
            } catch (Exception e) {
                // ... (omitted) ...
            }
        }
    } else {
        // ... (omitted) ...
    }

    // Recover the body text from the surrounding element
    bodyText = (StringBuffer) bodyTexts.pop();
    // ... (omitted) ...

    // Fire "end" events for all relevant rules in reverse order
    if (rules != null) {
        for (int i = 0; i < rules.size(); i++) {
            int j = (rules.size() - i) - 1;
            try {
                Rule rule = (Rule) rules.get(j);
                // ... (omitted) ...
                rule.end(namespaceURI, name);
            } catch (Exception e) {
                // ... (omitted) ...
            }
        }
    }

    // Recover the previous match expression
    int slash = match.lastIndexOf('/');
    if (slash >= 0) {
        match = match.substring(0, slash);
    } else {
        match = "";
    }
}

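So, because an element's body text is not complete until endElement() is called, each rule's body() method is called from endElement(). Another point that is easy to overlook: the end() methods are called in the reverse of registration order, that is, starting from the last rule registered. Am I the only one who thinks the for loop could simply have been written to count down?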
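The resulting call order for several rules on the same element can be sketched as below. EventOrderDemo and its Hook interface are hypothetical; fire() compresses startElement() and endElement() into one method, and it uses the plain counting-down loop that the equivalent index arithmetic in Digester (j = size - i - 1) amounts to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the begin/body/end ordering for rules matching one element.
class EventOrderDemo {
    interface Hook {
        void begin(List<String> log);
        void body(List<String> log, String text);
        void end(List<String> log);
    }

    // A hook that just logs each call with its id.
    static Hook named(String id) {
        return new Hook() {
            public void begin(List<String> log) { log.add("begin " + id); }
            public void body(List<String> log, String text) { log.add("body " + id + ":" + text); }
            public void end(List<String> log) { log.add("end " + id); }
        };
    }

    static List<String> fire(List<Hook> rules, String text) {
        List<String> log = new ArrayList<>();
        // startElement(): begin() in registration order
        for (Hook r : rules) r.begin(log);
        // endElement(): body() in registration order, once the text is complete
        for (Hook r : rules) r.body(log, text);
        // endElement(): end() in reverse registration order
        for (int i = rules.size() - 1; i >= 0; i--) rules.get(i).end(log);
        return log;
    }
}
```

With two rules "create" and "set", the log reads begin create, begin set, body create:x, body set:x, end set, end create. The reversal makes begin()/end() nest like brackets, which presumably matters for rules such as ObjectCreateRule: the object it pushed in begin() stays on the stack until every later-registered rule has finished its end().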

That is what Digester.parse() does. parse() only provides the framework; what to do for each path is left entirely to the rules. It's the Template Method pattern.


SetNestedPropertiesRule

The individual rules use reflection to create objects and call methods. Of these, SetNestedPropertiesRule struck me as interesting, so let me introduce it. Its begin() and body() methods look like this:

public void begin(String namespace, String name, Attributes attributes) 
                  throws Exception {
    Rules oldRules = digester.getRules();
    AnyChildRule anyChildRule = new AnyChildRule();
    anyChildRule.setDigester(digester);
    AnyChildRules newRules = new AnyChildRules(anyChildRule);
    newRules.init(digester.getMatch()+"/", oldRules);
    digester.setRules(newRules);
}

public void body(String bodyText) throws Exception {
    AnyChildRules newRules = (AnyChildRules) digester.getRules();
    digester.setRules(newRules.getOldRules());
}

So when begin() is called, it swaps out the digester's rule set. Incidentally, restoring the rule set seems to belong in end() rather than in body(), so why is it done in body()?
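The swap-and-restore trick on its own can be sketched as follows. All names are hypothetical: SwappableDigester stands in for the Digester's getRules()/setRules() pair, TemporaryRules remembers the rule set it replaced (compare getOldRules()), and SwapRule's begin()/restore() mirror the begin()/body() pair above. Unlike AnyChildRules, TemporaryRules ignores the old rules while it is active; it simply matches every path to one placeholder rule name.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical minimal Rules abstraction: path -> names of matching rules.
interface SimpleRules {
    List<String> match(String path);
}

class SwappableDigester {
    private SimpleRules rules;

    SwappableDigester(SimpleRules rules) { this.rules = rules; }

    SimpleRules getRules() { return rules; }
    void setRules(SimpleRules rules) { this.rules = rules; }
}

// Temporary rule set that remembers the one it replaced.
class TemporaryRules implements SimpleRules {
    private final SimpleRules oldRules;

    TemporaryRules(SimpleRules oldRules) { this.oldRules = oldRules; }

    SimpleRules getOldRules() { return oldRules; }

    @Override
    public List<String> match(String path) {
        // Simplification: while active, every path matches one placeholder rule.
        return Collections.singletonList("nested");
    }
}

class SwapRule {
    // Like SetNestedPropertiesRule.begin(): install the temporary rule set.
    void begin(SwappableDigester d) {
        d.setRules(new TemporaryRules(d.getRules()));
    }

    // Like SetNestedPropertiesRule.body(): put the original rule set back.
    void restore(SwappableDigester d) {
        TemporaryRules current = (TemporaryRules) d.getRules();
        d.setRules(current.getOldRules());
    }
}
```

Because the digester asks getRules() afresh at every start tag, replacing the object is enough to change how all nested elements are matched until it is restored.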

Next, let's look at the match() method of AnyChildRules.

public List match(String namespaceURI, String matchPath) {
    List match = decoratedRules.match(namespaceURI, matchPath);
    if ((matchPath.startsWith(matchPrefix)) &&
        (matchPath.indexOf('/', matchPrefix.length()) == -1)) {
        // The current element is a direct child of the element
        // specified in the init method, so we want to ensure that
        // the rule passed to this object's constructor is included
        // in the returned list of matching rules.
        if ((match == null || match.size()==0)) {
            // The "real" rules class doesn't have any matches for
            // the specified path, so we return a list containing
            // just one rule: the one passed to this object's
            // constructor.
            return rules;
        }
        else {
            // The "real" rules class has rules that match the current
            // node, so we return this list *plus* the rule passed to
            // this object's constructor.
            //
            // It might not be safe to modify the returned list,
            // so clone it first.
            LinkedList newMatch = new LinkedList(match);
            newMatch.addLast(rule);
            return newMatch;
        }
    }            
    else {
        return match;
    }
}

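So for a direct child of the specified path, AnyChildRule is included in the returned rules. As a result, the body text of each child element is set on the object at the top of the stack (by default as the property named after the child element). The original matching rules are still returned as well. This is the Decorator pattern.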
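The decorator logic of match() can be isolated as below. ChildRulesDecorator and PathRules are hypothetical names, and rules are reduced to strings; the structure follows the AnyChildRules.match() excerpt: delegate first, then add one extra rule only for direct children of the prefix, copying the delegate's list before modifying it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal Rules abstraction: path -> names of matching rules.
interface PathRules {
    List<String> match(String path);
}

// Hypothetical decorator modelled on AnyChildRules.match().
class ChildRulesDecorator implements PathRules {
    private final PathRules decorated;
    private final String prefix;    // parent path plus trailing '/', e.g. "p/address/"
    private final String extraRule; // stands in for AnyChildRule

    ChildRulesDecorator(PathRules decorated, String prefix, String extraRule) {
        this.decorated = decorated;
        this.prefix = prefix;
        this.extraRule = extraRule;
    }

    @Override
    public List<String> match(String path) {
        List<String> match = decorated.match(path);
        // Direct child: starts with the prefix and has no further '/' after it.
        boolean directChild = path.startsWith(prefix)
                && path.indexOf('/', prefix.length()) == -1;
        if (!directChild) {
            return match;
        }
        if (match == null || match.isEmpty()) {
            return Collections.singletonList(extraRule);
        }
        List<String> copy = new ArrayList<>(match); // don't mutate the delegate's list
        copy.add(extraRule);
        return copy;
    }
}
```

Since the decorator only adds to what the wrapped rules return, existing rules for nested paths keep working while the nested-property behaviour is layered on top.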


Conclusion

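This time we looked at how Commons Digester builds objects. My impressions after reading the code:

Keep the framework simple, and put each piece of processing in its own class.
Even so, provide proper extension points (such as swapping out the rule set).

That about sums it up. Happy code reading, everyone!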