{"id":1866,"date":"2016-02-03T09:59:05","date_gmt":"2016-02-03T09:59:05","guid":{"rendered":"http:\/\/www.ericwhite.com\/home2\/bm8qcmjy\/public_html\/blog\/?page_id=1866"},"modified":"2016-03-18T10:39:46","modified_gmt":"2016-03-18T10:39:46","slug":"new-version-of-documentbuilder","status":"publish","type":"page","link":"https:\/\/www.ericwhite.com\/blog\/new-version-of-documentbuilder\/","title":{"rendered":"New Version of DocumentBuilder"},"content":{"rendered":"<p><span class=\"Back\"><a class=\"Back\" href=\"https:\/\/www.ericwhite.com\/blog\/documentbuilder-developer-center\/\">Return to the<br \/>DocumentBuilder<br \/>Developer Center<\/a><\/span>At long last, I have completed and posted a <b><i>much requested<\/i><\/b> and <b><i>sorely needed<\/i><\/b> update to <b>DocumentBuilder<\/b>.&nbsp; <b>DocumentBuilder<\/b> is code that is part of the <a href=\"https:\/\/github.com\/officedev\/open-xml-powertools\">Open-Xml-PowerTools<\/a> project that handles issues of interrelated markup, enabling you to generate new documents from existing documents in a variety of ways. &nbsp;For example, you can assemble a new document from just the first few paragraphs of an existing document. &nbsp;Another example is that you can effectively delete portions of a document by importing the portion of the document before the part you want deleted, and importing the portion after the part you want deleted.&nbsp; The resulting document contains merged styles, comments, fonts, bookmarks, endnotes and footnotes, and so on.&nbsp; If an image is not imported from a source document, then it is not included in the generated document, which is appropriate. 
&nbsp;There are a number of ways to pick specific portions of content of WordprocessingML documents and assemble them into a new document.<\/p>\n<p>To download the new version of <b>DocumentBuilder<\/b>, clone or fork the repo at <a href=\"https:\/\/github.com\/OfficeDev\/Open-Xml-PowerTools\">Open-Xml-PowerTools<\/a>.<\/p>\n<p>The primary difference is that this version is MUCH more robust.&nbsp; It handles many, many cases that the original <b>DocumentBuilder<\/b> did not, including images in headings, smart art everywhere, images in smart art, external relationships, and on and on.&nbsp; It works properly if you import the same content more than once.&nbsp; I completely ripped the old version apart and reassembled it using a recursive approach that handles many cases of related parts in a much more generic way.<\/p>\n<p>In the near future, I&#8217;ll be providing new documentation about <b>DocumentBuilder<\/b> here on OpenXMLDeveloper.org.&nbsp; I&#8217;ll post lots of information about the various use-cases, as well as a number of examples and sample documents that show the various ways that you can use <b>DocumentBuilder<\/b> to assemble documents.<\/p>\n<h1>Broken Backwards Compatibility<\/h1>\n<p>One key point to mention here is that I made a small adjustment to the <b>DocumentBuilder<\/b> programming interface that breaks existing programs.&nbsp; The new API is very similar to the old <b>DocumentBuilder<\/b> programming interface, and it is slightly easier to use.&nbsp; However, ease of use is not the reason I broke backwards compatibility.&nbsp; I did it because it was absolutely necessary.&nbsp; Here is why:<\/p>\n<p>In the programming interface to the old <b>DocumentBuilder<\/b> class, you created a <b>List&lt;Source&gt;<\/b> that contained a list of sources for the document to be built.&nbsp; Each <b>Source<\/b> object contained an open <b>WordprocessingDocument<\/b> object, which is of course the class in the Open XML SDK that you use to access and 
manipulate word-processing documents.&nbsp; Therein lies the rub.&nbsp; There is no way using the Open XML SDK to clone an open <b>WordprocessingDocument<\/b> object, but to enable the case where you want to import content from a single document more than once, it was highly desirable to be able to clone the document for each import.&nbsp; Further, <b>WordprocessingDocument<\/b> implements <b>IDisposable<\/b>, which really complicates the code.&nbsp; You must either use the <b>using<\/b> construct of C#, or you must explicitly dispose of those objects when done with them, raising the possibility of bugs where documents were not disposed of properly.<\/p>\n<p>Instead of using open <b>WordprocessingDocument<\/b> objects to specify sources, I decided that it was much more convenient to simply pass byte arrays around.&nbsp; It is super-easy to open a <b>WordprocessingDocument<\/b> from a byte array.&nbsp; It is also easy to clone, easy to serialize to disk, and easy to serialize to a SharePoint document library.&nbsp; To make it even easier to work with these byte arrays, I have defined a small class, <b>WmlDocument<\/b>, which encapsulates the small bits of functionality that you want around these byte arrays.&nbsp; The entire definition of the <b>WmlDocument<\/b> class looks like this:<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"prettyprint\">public class WmlDocument\r\n{\r\n    public byte[] RawDocument { get; set; }\r\n \r\n    public WordprocessingDocument GetWordprocessingDocument()\r\n    {\r\n        MemoryStream mem = new MemoryStream();\r\n        mem.Write(RawDocument, 0, RawDocument.Length);\r\n        WordprocessingDocument doc = WordprocessingDocument.Open(mem, true);\r\n        return doc;\r\n    }\r\n \r\n    public WmlDocument(WmlDocument original)\r\n    {\r\n        RawDocument = new byte[original.RawDocument.Length];\r\n        Array.Copy(original.RawDocument, RawDocument, original.RawDocument.Length);\r\n    }\r\n \r\n    public WmlDocument(string 
fileName)\r\n    {\r\n        RawDocument = File.ReadAllBytes(fileName);\r\n    }\r\n \r\n    public WmlDocument(byte[] byteArray)\r\n    {\r\n        RawDocument = new byte[byteArray.Length];\r\n        Array.Copy(byteArray, RawDocument, byteArray.Length);\r\n    }\r\n \r\n    public void Save(string fileName)\r\n    {\r\n        File.WriteAllBytes(fileName, RawDocument);\r\n    }\r\n}<\/pre>\n<p>&nbsp;<\/p>\n<p>Now that you&#8217;ve seen the <b>WmlDocument<\/b> class, you can see the new code to use <b>DocumentBuilder<\/b>.&nbsp; The following example code (taken from the example that is delivered with the new <b>DocumentBuilder<\/b> class) shows five cases:<\/p>\n<ul>\n<li>Generate a new document that contains just a sub-document consisting of 10 paragraphs, starting at paragraph 5.<\/li>\n<li>&#8216;Delete&#8217; a range of a document by importing the same WmlDocument twice, specifying two ranges that leave out a chunk of content between them.<\/li>\n<li>Concatenate two documents, using the section info (headers and footers) from the first document.<\/li>\n<li>Concatenate two documents, using the section info from the second document.<\/li>\n<li>Generate a new document from the first five paragraphs of one document, as well as the first five paragraphs of the second document.&nbsp; In this case, DocumentBuilder.BuildDocument returns a WmlDocument, which you can serialize to wherever you need to.<\/li>\n<\/ul>\n<p>You can see that the code to set up a list of sources and then fire off the <b>DocumentBuilder<\/b> is very similar to code for V1 of this class.<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"prettyprint\">string source1 = \"..\/..\/Source1.docx\";\r\nstring source2 = \"..\/..\/Source2.docx\";\r\nstring source3 = \"..\/..\/Source3.docx\";\r\nList&lt;Source&gt; sources = null;\r\n \r\n\/\/ Create new document from 10 paragraphs starting at paragraph 5 of Source1.docx\r\nsources = new List&lt;Source&gt;()\r\n{\r\n    new Source(new WmlDocument(source1), 5, 10, 
true),\r\n};\r\nDocumentBuilder.BuildDocument(sources, \"Out1.docx\");\r\n \r\n\/\/ Create new document from paragraph 1, and paragraphs 5 through end of Source3.docx.\r\n\/\/ This effectively 'deletes' paragraphs 2-4\r\nsources = new List&lt;Source&gt;()\r\n{\r\n    new Source(new WmlDocument(source3), 0, 1, false),\r\n    new Source(new WmlDocument(source3), 4, false),\r\n};\r\nDocumentBuilder.BuildDocument(sources, \"Out2.docx\");\r\n \r\n\/\/ Create a new document that consists of the entirety of Source1.docx and Source2.docx.  Use\r\n\/\/ the section information (headers and footers) from source1.\r\nsources = new List&lt;Source&gt;()\r\n{\r\n    new Source(new WmlDocument(source1), true),\r\n    new Source(new WmlDocument(source2), false),\r\n};\r\nDocumentBuilder.BuildDocument(sources, \"Out3.docx\");\r\n \r\n\/\/ Create a new document that consists of the entirety of Source1.docx and Source2.docx.  Use\r\n\/\/ the section information (headers and footers) from source2.\r\nsources = new List&lt;Source&gt;()\r\n{\r\n    new Source(new WmlDocument(source1), false),\r\n    new Source(new WmlDocument(source2), true),\r\n};\r\nDocumentBuilder.BuildDocument(sources, \"Out4.docx\");\r\n \r\n\/\/ Create a new document that consists of the first 5 paragraphs of Source1.docx and the first\r\n\/\/ five paragraphs of Source2.docx.  
This example returns a new WmlDocument, which you can then\r\n\/\/ serialize to a SharePoint document library, or use in some other interesting scenario.\r\nsources = new List&lt;Source&gt;()\r\n{\r\n    new Source(new WmlDocument(source1), 0, 5, false),\r\n    new Source(new WmlDocument(source2), 0, 5, true),\r\n};\r\nWmlDocument out5 = DocumentBuilder.BuildDocument(sources);\r\nout5.Save(\"Out5.docx\");  \/\/ save it to the file system, but we could just as easily\r\n                            \/\/ have done something else with it.\r\n<\/pre>\n<p>&nbsp;<\/p>\n<h1>Change in the Namespace<\/h1>\n<p>In the original version of <b>DocumentBuilder<\/b>, I placed it in the <b>OpenXml.PowerTools<\/b> namespace (notice the period between <b>OpenXml<\/b> and <b>PowerTools<\/b>).&nbsp; Subsequently, when releasing the <b>RevisionAccepter<\/b> class and the <b>MarkupSimplifier<\/b> class, I placed them in the <b>OpenXmlPowerTools<\/b> namespace.&nbsp; This was an oversight.&nbsp; In any case, the best namespace is <b>OpenXmlPowerTools<\/b> (without the period between <b>OpenXml<\/b> and <b>PowerTools<\/b>), so now all modules in the <b>PowerTools for Open XML<\/b> have the namespace of <b>OpenXmlPowerTools<\/b>.<\/p>\n<h1>Moving into the Future<\/h1>\n<p>One of my main goals over the next few months is to rationalize the various pieces of code in the <b>PowerTools for Open XML<\/b>.&nbsp; I want to make the C# code more consistent.&nbsp; I want to make it more robust, and I think that the new approach of using a byte array helps with this goal. 
&nbsp;And if the winds are favorable in our direction, we&#8217;ll have a new release of <b>PowerTools for Open XML<\/b> sometime in the near future.<\/p>\n<p>Please continue to give me feedback on this new version.&nbsp; Your feedback on the original version of DocumentBuilder was instrumental in helping me decide the direction to take.<\/p>\n<p>-Eric<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Return to theDocumentBuilderDeveloper CenterAt long last, I have completed and posted a much requested and sorely needed update to DocumentBuilder.&nbsp; DocumentBuilder is code that is part of the Open-Xml-PowerTools project that handles issues of interrelated markup, enabling you to generate new documents from existing documents in a variety of ways. &nbsp;For example, you can assemble [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_s2mail":"","footnotes":""},"class_list":["post-1866","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages\/1866","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/comments?post=1866"}],"version-history":[{"count":5,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages\/1866\/revisions"}],"predecessor-version":[{"id":2643,"href":"https:\/\/www.ericwhite.com\/blog\/
wp-json\/wp\/v2\/pages\/1866\/revisions\/2643"}],"wp:attachment":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/media?parent=1866"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}