monogatari (old)

Continued at atsushieno.hatenablog.com

the "true" checklists: XML Performance

I knew Microsoft had been writing something under the title Patterns & Practices, but I did not know they had gone as far as covering XML performance.

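First of all, I don't know any of the authors, but the content is terrible. I feel sorry for anyone who takes it as true and ends up having to write sluggish code, so here is a proper checklist.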

  • Design Considerations
    • Avoid XML as far as possible.
    • Avoid processing large documents.
    • Avoid validation. XmlValidatingReader is 2-3x slower than XmlTextReader.
    • Avoid DTD, especially IDs and entity references.
    • Use streaming interfaces.
    • Consider hard-coded processing, including validation.
    • Shorten node name length.
    • Consider sharing a NameTable, but only when the names are likely to be really common. As more and more irrelevant names are added, it becomes slower and slower.
  • Parsing XML
    • Use XmlTextReader and avoid validating readers.
    • When only a node is required, consider using XmlDocument.ReadNode() instead of loading the entire document with Load().
    • Set the XmlResolver property to null on some XmlReaders to avoid access to external resources.
    • Make full use of MoveToContent() and Skip(). They avoid extraneous name creation. However, the benefit almost disappears when you use XmlValidatingReader.
    • Avoid accessing Value for Text/CDATA nodes as far as possible.
  • Validating XML
    • Avoid extraneous validation.
    • Consider caching schemas.
    • Avoid identity constraint usage. Not only because it stores key/fields for the entire document, but also because the keys are boxed.
    • Avoid extraneous strong typing. It results in calls to XmlSchemaDatatype.ParseValue(). It could also let you avoid access to the Value string.
  • Writing XML
    • Write output directly as far as possible.
    • To save documents, an XmlTextWriter without indentation is better than TextWriter/Stream/file output (all of which indent), unless the output is for human reading.
  • DOM Processing
    • Avoid InnerXml. It internally creates XmlTextReader/XmlTextWriter. InnerText is fine.
    • Avoid PreviousSibling. XmlDocument is very inefficient for backward traversal.
    • Append nodes as soon as possible. Adding a big subtree results in a longer extraneous run to check ID attributes.
    • Prefer FirstChild/NextSibling and avoid accessing ChildNodes. Accessing it creates an XmlNodeList, which is not instantiated initially.
  • XPath Processing
    • Consider using XPathDocument, but only when you need the entire document. With XmlDocument you can use ReadNode(), but there is no equivalent for XPathDocument.
    • Avoid queries on the preceding-sibling and preceding axes, especially over XmlDocument. They would result in sorting, and for XmlDocument they need access to PreviousSibling.
    • Avoid // (descendant). The returned nodes are most likely to be irrelevant.
    • Avoid position(), last() and positional predicates (especially something like foo[last()-1]).
    • Compile the XPath string to an XPathExpression and reuse it for frequent queries.
    • Don't run XPath queries frequently. They are costly since they always have to Clone() XPathNavigators.
  • XSLT Processing
    • Reuse (cache) XslTransform objects.
    • Avoid id() and key() in XSLT. They can return all kinds of nodes, which prevents node-type-based optimization.
    • Avoid document(), especially with a dynamic argument.
    • PushPull style query is usually better than template match.
    • Minimize output size. More importantly, minimize input.
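
To make a few of the items above concrete (streaming with XmlTextReader, a null XmlResolver, MoveToContent()/Skip(), ReadNode() for only the subtrees you need, and a compiled XPathExpression), here is a minimal C# sketch. The class name, the method names, and the sample element names (catalog, skipme, item, name, items) are all made up for this illustration, and it assumes the input has only child elements directly under the root. Take it as one possible shape, not a benchmark-proven recipe.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class ChecklistSketch
{
    // Hypothetical input; the element names are invented for illustration.
    public const string Sample =
        "<catalog><skipme><a/><b/></skipme>" +
        "<item id='1'><name>x</name></item><item id='2'/></catalog>";

    // Streams over the children of the root element, skips everything that
    // is not <item>, and turns each <item> into a DOM node with ReadNode().
    public static XmlDocument LoadItemsOnly(TextReader input)
    {
        XmlTextReader reader = new XmlTextReader(input);
        reader.XmlResolver = null;            // no access to external resources
        XmlDocument doc = new XmlDocument();
        XmlElement root = doc.CreateElement("items"); // hypothetical wrapper element
        doc.AppendChild(root);

        reader.MoveToContent();               // position on the root element
        reader.Read();                        // step inside the root
        // The loop ends at the root's end tag (or at any text node).
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "item")
                root.AppendChild(doc.ReadNode(reader)); // builds the subtree, advances past it
            else
                reader.Skip();                // advances past the subtree without building nodes
        }
        reader.Close();
        return doc;
    }

    // Compiles the XPath string once; the XPathExpression could be kept
    // around and passed to Select() again for later queries.
    public static int CountNames(XmlDocument doc)
    {
        XPathNavigator nav = doc.CreateNavigator();
        XPathExpression expr = nav.Compile("items/item/name");
        return nav.Select(expr).Count;
    }
}
```

Usage would look like `ChecklistSketch.LoadItemsOnly(new StringReader(ChecklistSketch.Sample))`, which builds a small document holding only the two item subtrees instead of loading the whole input with Load().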

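Some of this overlaps with Microsoft's list, but some of it says exactly the opposite. It might be interesting, so I'll post it on monogatari later. Not today, since I have already written about DTLL.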