{"id":321,"date":"2011-03-22T13:56:44","date_gmt":"2011-03-22T13:56:44","guid":{"rendered":"http:\/\/www.ericwhite.com\/home2\/bm8qcmjy\/public_html\/blog\/?p=321"},"modified":"2011-04-08T05:45:47","modified_gmt":"2011-04-08T05:45:47","slug":"generating-open-xml-wordprocessingml-documents-using-xpath-expressions-in-content-controls","status":"publish","type":"post","link":"https:\/\/www.ericwhite.com\/blog\/2011\/03\/22\/generating-open-xml-wordprocessingml-documents-using-xpath-expressions-in-content-controls\/","title":{"rendered":"Generating Open XML WordprocessingML Documents using XPath Expressions in Content Controls"},"content":{"rendered":"<p>Over the last few days, I have completed a new prototype of an approach to Open XML WordprocessingML document generation. In this approach, I control the document generation process by placing XPath expressions in content controls. In contrast, the previous approach in this series of posts on document generation was controlled by writing C# code in content controls.<\/p>\n<p>This post is the 13th in a series of blog posts on generating Open XML documents. Here is the complete list: <a href=\"https:\/\/www.ericwhite.com\/blog\/map\/generating-open-xml-wordprocessingml-documents-blog-post-series\/\">Generating Open XML WordprocessingML Documents Blog Post Series<\/a><\/p>\n<p>When I started down this path of discovery around document generation, I would not have predicted it, but the <strong>XPath-in-Content-Controls<\/strong> approach is, in my opinion, <strong><span style=\"text-decoration: underline;\">much superior<\/span><\/strong> to the <strong>C#-in-Content-Controls<\/strong> approach. Going forward, I am going to abandon the <strong>C#-in-Content-Controls<\/strong> approach, and focus on this approach using XPath. There are some very cool places that we can take this approach.<\/p>\n<p>To compare and contrast, the <strong>C#-in-Content-Controls<\/strong> prototype consists of less than 400 lines of code. 
While it was not fully fleshed out, and many necessary refinements remain, I would expect a finished version to be perhaps 3000 lines of code.<\/p>\n<p>The <strong>XPath-in-Content-Controls<\/strong> prototype that I am introducing in this post is even smaller. It is fewer than 240 lines of code. It is simpler, more robust, and more amenable to polishing. I expect that the finished example, including integration into a document-level add-in for Word 2010, will be fewer than 1000 lines of code. I\u2019ll be posting V1 of the prototype with the next post in this series.<\/p>\n<h1 id=\"xmldoc\">Driven from an XML Document<\/h1>\n<p>One of the nice things about the <strong>C#-in-Content-Controls<\/strong> approach is that you could drive the document generation process from literally any data you could get your hands on from the .NET Framework. In contrast, with this approach, there is one and only one form of data source: an XML document. And in this first prototype, I am restricting the data to an XML document that contains XML in no namespace. Allowing for namespaces in the XML means that I would need to provide a mapping between namespace prefixes and namespaces, and that would get in the way of discussing the architecture and merits of this approach. 
I\u2019ll deal with this in the future.<\/p>\n<p>In the meantime, if you have XML that uses namespaces (or any other variety of data sources), your first task is to transform that data source to XML in no namespace.<\/p>\n<p>The XML document should look something like this:<br \/>\n<code><br \/>\n&lt;Customers&gt;<br \/>\n&nbsp;&nbsp;&lt;Customer&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;CustomerID&gt;1&lt;\/CustomerID&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;Name&gt;Andrew&lt;\/Name&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;HighValueCustomer&gt;True&lt;\/HighValueCustomer&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;Orders&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Bike&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;2&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;5\/1\/2002&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Sleigh&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;2&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;11\/1\/2000&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Plane&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;2&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;2\/19\/2000&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Orders&gt;<br 
\/>\n&nbsp;&nbsp;&lt;\/Customer&gt;<br \/>\n&nbsp;&nbsp;&lt;Customer&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;CustomerID&gt;2&lt;\/CustomerID&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;Name&gt;Bob&lt;\/Name&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;HighValueCustomer&gt;False&lt;\/HighValueCustomer&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;Orders&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Boat&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;2&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;8\/9\/2000&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Boat&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;4&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;3\/25\/2001&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Bike&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;1&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;6\/5\/2002&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Orders&gt;<br \/>\n&nbsp;&nbsp;&lt;\/Customer&gt;<br \/>\n&nbsp;&nbsp;&lt;Customer&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;CustomerID&gt;3&lt;\/CustomerID&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;Name&gt;Celcin&lt;\/Name&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;HighValueCustomer&gt;False&lt;\/HighValueCustomer&gt;<br 
\/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;Orders&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Bike&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;2&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;2\/24\/2001&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;ProductDescription&gt;Boat&lt;\/ProductDescription&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;Quantity&gt;4&lt;\/Quantity&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;OrderDate&gt;5\/6\/2001&lt;\/OrderDate&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Order&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/Orders&gt;<br \/>\n&nbsp;&nbsp;&lt;\/Customer&gt;<br \/>\n&lt;\/Customers&gt;<br \/>\n<\/code><\/p>\n<p>While it isn\u2019t required, it is more convenient to use a form where the <strong>Orders <\/strong>element is a child of the <strong>Customer<\/strong> element. The reason for this will become clear.<\/p>\n<h1>The XPath-in-Content-Controls Template Document<\/h1>\n<p>The next step in introducing this approach is to take a look at the template document that will drive document generation. While looking at this template, you can compare and contrast it to the template that contains C# code in content controls.<\/p>\n<p>In this template document, I am going to borrow some nomenclature from XSLT. One of the attributes of the <strong>xsl:apply-templates<\/strong> element is the <strong>select<\/strong> attribute. If you place an XPath expression in the optional <strong>select<\/strong> attribute, XSLT will apply templates to the set of nodes that are selected by the XPath expression. 
The XPath expression is evaluated relative to the current context, that is, the node that is currently being transformed by the sequence constructor. I am going to use a very similar approach in the template document. In effect, I am going to turn an Open XML WordprocessingML document into something that is analogous to an XSLT style sheet. Don\u2019t worry if this is not immediately clear. It will be before the end of this blog post series. The point of this paragraph is that I\u2019m going to use the term <strong>Select<\/strong> to indicate an XPath expression that will be evaluated, and the results of the evaluation will become the current context for other operations.<\/p>\n<p>As usual, I am going to show content controls in design mode. Here is the template document, in its entirety. Of course, I added the circles and arrows to aid in explanation.<\/p>\n<p><a href=\"https:\/\/www.ericwhite.com\/blog\/wp-content\/uploads\/2011\/03\/image.png\"><img loading=\"lazy\" decoding=\"async\" style=\"background-image: none; padding-left: 0px; padding-right: 0px; display: inline; padding-top: 0px; border: 0px;\" title=\"image\" src=\"https:\/\/www.ericwhite.com\/blog\/wp-content\/uploads\/2011\/03\/image_thumb.png\" border=\"0\" alt=\"image\" width=\"600\" height=\"951\" \/><\/a><\/p>\n<h2>The Config Content Control (*1)<\/h2>\n<p>Starting at the bottom of the document, there is the <strong>Config<\/strong> content control, which contains XML, with a root element of <strong>Config<\/strong>.<\/p>\n<p>The <strong>DataFileName<\/strong> element specifies the source XML document that contains the data that drives the document generation process.<\/p>\n<p>The <strong>SelectDocuments<\/strong> element specifies an XPath expression that, when evaluated against the root element of the data document, returns a collection of elements, each of which represents a document to be generated. 
In the case of the XML data file that I presented earlier, the XPath expression \u201c.\/Customer\u201d returns a collection of the <strong>Customer<\/strong> child elements of the root <strong>Customers<\/strong> element. Given that source data file, the document generation process will generate three documents.<\/p>\n<p>The <strong>DocumentGenerationInfo<\/strong> element and its child elements contain the information needed to control the physical generation of the documents: the directory where the documents will be placed, and a .NET <strong>StringFormat<\/strong> that works in conjunction with the <strong>SelectDocumentName<\/strong> XPath expression to assemble the generated <strong>FileName<\/strong>.<\/p>\n<p>As an aside, I initially played around with nested content controls instead of having a single content control that contains XML. While this approach works, maintaining nested content controls using the Word 2007 or Word 2010 user interface is idiosyncratic. I could write a pretty detailed bug report around the maintainability of nested content controls. Maintaining the XML in a single content control is a more satisfactory approach.<\/p>\n<h2>The SelectValue Content Control (*2)<\/h2>\n<p>At the top of the template document, you can see the <strong>SelectValue<\/strong> content controls. As mentioned in the last section, the <strong>SelectDocuments<\/strong> XPath expression selects multiple <strong>Customer<\/strong> elements. As each document is generated in turn, the corresponding <strong>Customer<\/strong> element becomes the current context. The <strong>SelectValue<\/strong> XPath expression is then evaluated in the context of each <strong>Customer<\/strong> element in turn. One of the circled <strong>SelectValue<\/strong> XPath expressions selects the <strong>Name<\/strong> child element of the <strong>Customer<\/strong> element. 
The other circled <strong>SelectValue<\/strong> XPath expression selects the <strong>CustomerID<\/strong> child element of the <strong>Customer<\/strong> element. In XPath, the string-value of an element is defined to be the concatenation of its descendant text nodes (in other words, its textual content). The document generation engine retrieves the value of the selected element and replaces the content control with the value.<\/p>\n<h2>The Table Content Control (*3)<\/h2>\n<p>Just as the <strong>SelectValue<\/strong> content control is evaluated in the context of a <strong>Customer<\/strong> element, the <strong>SelectRows<\/strong> content control is also evaluated in the context of a <strong>Customer<\/strong> element. The difference is that <strong>SelectValue<\/strong> is expected to select a single element, whereas the <strong>SelectRows<\/strong> expression is expected to select a collection of elements, one for each row in the table. For customer #1 (Andrew), the <strong>SelectRows<\/strong> XPath expression selects three <strong>Order<\/strong> elements. The XPath expressions (pointed to by *4) stored in the prototype row (the second row in the table) are evaluated in the context of each element selected by the <strong>SelectRows<\/strong> expression.<\/p>\n<p>You often see a similar pattern in properly written XSLT style sheets. One template is evaluated in the context of the root element, and it selects a set of elements. An <strong>xsl:apply-templates<\/strong> causes an XPath expression to be evaluated in the context of each element selected by the first template. 
And an <strong>xsl:apply-templates<\/strong> in the sequence constructor of the second template causes an XPath expression to be evaluated in the context of each element selected by the second template, thereby causing a third set of templates to be applied.<\/p>\n<p>Once you are familiar with this approach (sometimes called the \u2018push\u2019 approach), you never write XSLT style sheets in any other way. Inexperienced XSLT developers sometimes try to write style sheets by using loops and calling templates explicitly, instead of letting the pattern-matching power of XSLT do the heavy lifting. This incorrect approach is sometimes called the \u2018pull\u2019 approach.<\/p>\n<p>To summarize, the <strong>SelectDocuments<\/strong> expression selects multiple elements, one for each document. The <strong>SelectRows<\/strong> expression, evaluated in the context of the elements selected by <strong>SelectDocuments<\/strong>, selects multiple elements, one for each row. The XPath expressions in the prototype row are evaluated in the context of the row elements selected by <strong>SelectRows<\/strong>.<\/p>\n<h2>The Conditional Content Control (*5)<\/h2>\n<p>The conditional content control works in exactly the same way as <strong>SelectValue<\/strong> and <strong>SelectRows<\/strong>. The <strong>SelectTestValue<\/strong> expression is evaluated in the context of the <strong>Customer<\/strong> element. The retrieved value is compared to the contents of the <strong>Match<\/strong> content control. If there is a match, the <strong>Conditional<\/strong> content control is replaced by the contents of the <strong>Content<\/strong> content control in the generated document.<\/p>\n<h1>Advantages of the XPath-in-Content-Controls Approach<\/h1>\n<p>There are several advantages to the <strong>XPath-in-Content-Controls<\/strong> approach over the <strong>C#-in-Content-Controls<\/strong> approach:<\/p>\n<ul>\n<li>We eliminate the two-step process for generating documents. 
The program that processes the template (and processes all of the XPath expressions in the template) does the actual document generation. We don\u2019t need to generate code, and then compile and run the generated code.<\/li>\n<li>We can catch errors in the XPath expressions, and supply the template designer with good error messages that indicate the specific XPath expression that contains the error.<\/li>\n<li>We eliminate all of the issues associated with typing C# code into content controls. When entering C# code in Word, of course there is no IntelliSense. It can be difficult to catch errors in the C# code. The issues associated with Word replacing single or double quotes with smart quotes are significantly reduced. Note that the issues around quotes are not entirely eliminated. There are circumstances where the template designer may need to use single or double quotes in XPath expressions.<\/li>\n<\/ul>\n<p>In the next post, I\u2019ll show a video of this approach in action.<\/p>\n<p>Future posts:<\/p>\n<ul>\n<li>Show this approach at scale<\/li>\n<li>Review XPath semantics of LINQ to XML<\/li>\n<li>Examine the issues around namespaces in the source XML document<\/li>\n<li>Show the process of changing the schema<\/li>\n<li>Add robustness and error handling<\/li>\n<li>Integrate as a document-level managed add-in for Word 2010<\/li>\n<\/ul>\n<p>This is fun!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduces the approach of configuring the document generation process by entering XPath expressions in content controls in a template 
document.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_s2mail":"","footnotes":""},"categories":[7,3,5],"tags":[],"class_list":["post-321","post","type-post","status-publish","format-standard","hentry","category-document-generation-series","category-open-xml","category-wordprocessingml"],"_links":{"self":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/posts\/321","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/comments?post=321"}],"version-history":[{"count":7,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/posts\/321\/revisions"}],"predecessor-version":[{"id":345,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/posts\/321\/revisions\/345"}],"wp:attachment":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/media?parent=321"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/categories?post=321"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/tags?post=321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}