Introducing WmlComparer, a Module in Open-Xml-PowerTools

WmlComparer compares two Open XML documents and produces a new document that expresses the precise differences as revision tracking markup. You can then iterate through those differences.

This module contains my latest thinking on how to process large documents character-by-character. The module converts the DOCX into a new form – an array – where every item in the array is a single piece of content, whether it be a character in a paragraph, the paragraph mark itself, an image, and so on. This thinking will go into the next version of OpenXmlRegex, so that it will be possible to include new lines in the search string, as well as replace matched strings with content that contains new lines.
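To make the "array of content" idea concrete, here is a minimal sketch in TypeScript. It is not the WmlComparer code: the `Paragraph`, `Inline`, and `Atom` types and both functions are hypothetical stand-ins for a greatly simplified document model. It only shows how paragraph marks and images become ordinary array items alongside characters.

```typescript
// Hypothetical, simplified document model (not the Open XML SDK or WmlComparer types).
type Inline =
  | { kind: "text"; value: string }
  | { kind: "image"; id: string };

interface Paragraph {
  content: Inline[];
}

// One array item per piece of content: a character, an image, or a paragraph mark.
type Atom =
  | { kind: "char"; value: string }
  | { kind: "image"; id: string }
  | { kind: "paragraphMark" };

function flatten(paragraphs: Paragraph[]): Atom[] {
  const atoms: Atom[] = [];
  for (const paragraph of paragraphs) {
    for (const inline of paragraph.content) {
      if (inline.kind === "text") {
        // split("") yields one item per UTF-16 code unit, which is enough for a sketch.
        for (const ch of inline.value.split("")) {
          atoms.push({ kind: "char", value: ch });
        }
      } else {
        atoms.push({ kind: "image", id: inline.id });
      }
    }
    // The paragraph mark is itself an item, so a search can match across it.
    atoms.push({ kind: "paragraphMark" });
  }
  return atoms;
}

// Render the atoms as one string: "\n" stands in for a paragraph mark and
// U+FFFC (object replacement character) stands in for an image.
function atomsToSearchText(atoms: Atom[]): string {
  return atoms
    .map((atom) => {
      switch (atom.kind) {
        case "char":
          return atom.value;
        case "paragraphMark":
          return "\n";
        case "image":
          return "\uFFFC";
      }
    })
    .join("");
}
```

Because the paragraph mark is just another item, a search string that contains a new line can match a span that crosses a paragraph boundary, which is the capability described above for OpenXmlRegex.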

This module is extremely useful when writing xUnit tests for code that processes Open XML markup – we can test that the code produced the exact document that we intended.


Open-Xml-PowerTools and JavaScript

Many developers who process Open XML in JavaScript need the important modules that are available in Open-Xml-PowerTools.

To date, almost none of the Open-Xml-PowerTools modules have been converted to JavaScript (in a publicly releasable form).

However, I designed the Open-Xml-Sdk for JavaScript with the specific intention to enable manual conversion of Open-Xml-PowerTools code from C# to JavaScript. The Ltxmljs library is semantically compatible with the .NET LINQ to XML library. The Open-Xml-Sdk for JavaScript was designed with the idea that the common idioms that we use in the Open-Xml-Sdk can be converted to JavaScript with ease.

The proof of the pudding is that I converted thousands of lines of code (from a preliminary version of the WmlToHtmlConverter) to JavaScript. This included the FormattingAssembler module, the ListItemRetriever module, the HtmlConverter module, and more. It worked very nicely. However, because this was based on a half-baked WmlToHtmlConverter, I decided to first polish those modules as written in C#, and then at some future point convert the finished modules to JavaScript. I fully expect to convert these modules to JavaScript, but it is not going to happen in the near future, as my current projects do not leave room for an effort of this magnitude.

Key points:

  • It is doable.
  • It is non-trivial. You should be expert in C#, HTML, CSS, JavaScript, LINQ to XML, functional programming, and Open XML markup. I don’t want to minimize what you must know in order to tackle conversion of one of these modules to JavaScript.

But it is doable. I have a goal to do this, but it must fit in with other priorities in my life, like keeping my kid and wife in groceries. 🙂

Cheers, Eric


Welcome to the new

Hi friends,

With the recent decision to shutter, I will be focusing my efforts to support Open XML developers here at

Forums: I have set up forums on this web site, so that we have a convenient place to discuss Open XML specific questions. I have set up forum categories based on my experience with – there are fewer categories, which should make it easier to pick the appropriate forum for your questions. Feel free to suggest new forum categories. Forum links are in the sidebar. You will need an account here to post in the forums. Please feel free to create one. At the top of each forum, you can subscribe to receive email notifications of activity.

Content: I have moved much of the content from to this web site. Over the next month or two, I’ll be moving the rest of the interesting content here.  In the process of moving this content over, I’ll be reviewing and updating it, making sure that it recommends current best practices for Open XML development.  In addition to moving the content over, I have created ‘Developer Centers’ around specific topics such as Open-Xml-PowerTools, DocumentAssembler, DocumentBuilder, PresentationBuilder, WmlToHtmlConverter, and HtmlToWmlConverter.

Blogging: I will be regularly blogging about my Open XML activities here.

Twitter: Follow me (@EricWhiteDev) on Twitter to get the latest news about Open XML development and tools.

LinkedIn: Connect, especially if you are an Open XML developer.

I will continue to enhance and support Open-Xml-PowerTools – it is a ‘live project’, with interesting enhancements planned for the near future.  I consider that Open-Xml-PowerTools is perhaps 25% complete – there is an awful lot that we can do to make server-side Open XML document generation and processing easier.  My plans are firming up for the next round of enhancements – stay tuned – I’ll keep you posted.

We have done nowhere near enough with the Open-Xml-Sdk-for-JavaScript. There are great advantages to processing Open XML with JavaScript: it is cross-platform, it can run in the browser, and it can run in Node.js. And we should take advantage of TypeScript, I think.

I love document formats.  I love the capacity to create and process rich content that we have gained due to the standardization of the Open XML format.  We have made great progress over the years, and I am looking forward to continuing to advance the state-of-the-art of document generation and processing.

As an aside, one thing that I am going to enjoy a lot is moving off of the old version of Telligent Community Server that we were using for – what a pain. is a WordPress site, updated to the latest version. It is a pleasure to move into a modern, blogging / content management platform.  The forums here are implemented using bbPress, which gives us a flexible, easy to use platform.

Cheers, Eric


Announcing Support for RTL in the HtmlConverter.cs Module

Hi all,

Today I am announcing support for RTL languages in the HtmlConverter module that is part of PowerTools for Open XML.


Please give it a try and let me know how it goes…

Cheers, Eric



PowerTools for Open XML Road Map, April 2014

I recently completed a new version of ListItemRetriever.cs – which is a super important module in PowerTools for Open XML, although it operates mostly behind the scenes.  This module is responsible for translating the various pieces of markup for numbered and bulleted lists into the actual text that HtmlConverter.cs will place in the generated HTML.  It was a test of my patience – I first patched the old version, then I re-wrote it, and then I threw it all out and re-wrote it again.  I am finally happy with it.
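To give a feel for what "translating markup into the actual text" involves, here is a small sketch in TypeScript. It is not the ListItemRetriever code, and every function name in it is hypothetical. It assumes a level text pattern with `%1`, `%2` placeholders (in the style of `w:lvlText`) and three number formats named after the `w:numFmt` values `decimal`, `lowerLetter`, and `lowerRoman`. The real module has to deal with much more, such as start values, restarts, and overrides.

```typescript
// Simplified stand-ins for three w:numFmt values.
type NumberFormat = "decimal" | "lowerLetter" | "lowerRoman";

function toLowerRoman(n: number): string {
  const table: [number, string][] = [
    [1000, "m"], [900, "cm"], [500, "d"], [400, "cd"],
    [100, "c"], [90, "xc"], [50, "l"], [40, "xl"],
    [10, "x"], [9, "ix"], [5, "v"], [4, "iv"], [1, "i"],
  ];
  let remaining = n;
  let result = "";
  for (const [value, symbol] of table) {
    while (remaining >= value) {
      result += symbol;
      remaining -= value;
    }
  }
  return result;
}

function toLowerLetter(n: number): string {
  // Assumption: after "z" the letter repeats (aa, bb, ...), which is how Word renders it.
  const letter = String.fromCharCode(97 + ((n - 1) % 26));
  const count = Math.floor((n - 1) / 26) + 1;
  let result = "";
  for (let i = 0; i < count; i++) {
    result += letter;
  }
  return result;
}

function formatCounter(n: number, format: NumberFormat): string {
  switch (format) {
    case "decimal":
      return String(n);
    case "lowerLetter":
      return toLowerLetter(n);
    case "lowerRoman":
      return toLowerRoman(n);
  }
}

// counters[i] and formats[i] describe list level i + 1.
// Placeholders that refer to a level we have no data for are left untouched.
function listItemText(levelText: string, counters: number[], formats: NumberFormat[]): string {
  return levelText.replace(/%(\d)/g, (match: string, digit: string) => {
    const counter = counters[Number(digit) - 1];
    const format = formats[Number(digit) - 1];
    if (counter === undefined || format === undefined) {
      return match;
    }
    return formatCounter(counter, format);
  });
}
```

For example, under these assumptions `listItemText("%1.%2)", [3, 2], ["decimal", "lowerLetter"])` gives `3.b)` for the second item at level two under the third item at level one.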

Now that this module is completed, it is time to jump back into some serious coding for the high-fidelity HtmlConverter.cs module. My next goal is to complete support for right-to-left (RTL) languages and East Asian languages.

I always try to do the hard stuff first, and because of my unfamiliarity with the Open XML markup that I need to parse, this is somewhat hard.  Also, because I don’t read any RTL or East Asian languages, I have to do this by pattern matching.  Sure would be easier if I could read them…  🙂

I don’t have a good idea of how long it will take.

Some time ago I wrote a crude program that uses search engines to find Open XML documents on the web and download them.  I have a pretty large collection of them – in general, my work will be to run HtmlConverter.cs on these documents and manually compare the docs in Word with the converted HTML in a browser.  Fun.

But where I’m going – I want to have this high-fidelity conversion from DOCX to HTML in really good shape in the next 2-3 months.

Following that, I want to re-write the portions of PowerTools that we use from PowerShell.  I want to re-write all of the cmdlets using the PowerShell language, not using C#.  After re-writing the cmdlets, I believe that the process to install and use the cmdlets will be a matter of dropping some files in a specific place.  It also will make it much easier for users of PowerTools to build new cmdlets, and to modify the existing cmdlets.

This is my vision for PowerTools for Open XML 3.0.

Cheers, Eric


New Focus for this Blog

This blog will have a new focus. Up until now, the blog has been pretty much only about Open XML. However, now I am going to go much further afield. Over the years I’ve gained some experience and insight into certain aspects of being a developer, and I’m going to start blogging about them – things like:

  • How I use PowerShell as a developer
  • My own approach to coverage testing, and why it is different and better
  • My approach to learning new programming languages quickly
  • Using screen-casts for intra-team communication – and how to create screen-casts super quickly
  • How I currently use the cloud as a development tool

I’m also going to discuss some of the more philosophical aspects of being a developer – I’ll even discuss what I did before I worked at Microsoft, and how and why I ended up doing what I’m doing.

And I’m certain I’ll discuss things that I haven’t contemplated yet.

This is going to be fun.


Whew! Comments are working again on this blog!

First of all, I apologize for neglecting this blog for so long.  I’ve been super busy doing stuff at, and after interacting there, haven’t had energy for blogging here.  But I’m turning over a new leaf – have some great non-Open-XML content to post here (as well as some Open XML content).

Also, as many of you have noticed, comments have not been working on this blog.  After learning far more than I wanted to about PHP and MySQL, I finally determined that I had a corrupted comments table in MySQL – I installed phpMyAdmin, and then was able to repair the table, and presto, comments started working again!

I will be working on responding to comments (even old ones) here over the next while.  If you leave new comments, I’ll respond to those first, of course!


Screen-cast series on writing Recursive Pure Functional Transformations

On, I am publishing a screen-cast series on writing recursive pure functional transforms, which is a powerful programming technique for transforming document-centric XML (such as Open XML WordprocessingML, XHTML, or ODT) into other forms of XML. I have posted four screen-casts in this series. I anticipate that there will be about 20 screen-casts before all is said and done. You can find the complete list here.


LINQ to XML for JavaScript – Gaining Perf thru Atomization

LINQ to XML for JavaScript uses the same approach for good perf as LINQ to XML for .NET – atomization.
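As a rough illustration of what atomization means, here is a minimal sketch in TypeScript. It is not the ltxml.js source: `AtomizedName`, `getName`, and `countMatches` are hypothetical names. The idea is that each distinct namespace and local name pair is created only once and cached, so that comparing two names becomes a reference comparison instead of a string comparison.

```typescript
// Hypothetical illustration of atomization; these are not the ltxml.js types.
interface AtomizedName {
  readonly namespaceName: string;
  readonly localName: string;
}

// Cache of every name created so far, keyed by "{namespace}localName".
const nameCache: { [key: string]: AtomizedName } = Object.create(null);

// Always returns the same object for the same namespace and local name.
function getName(namespaceName: string, localName: string): AtomizedName {
  const key = "{" + namespaceName + "}" + localName;
  let existing: AtomizedName | undefined = nameCache[key];
  if (existing === undefined) {
    existing = Object.freeze({ namespaceName: namespaceName, localName: localName });
    nameCache[key] = existing;
  }
  return existing;
}

// Because names are atomized, matching is a cheap identity check (===),
// with no string comparison in the inner loop.
function countMatches(names: AtomizedName[], target: AtomizedName): number {
  let count = 0;
  for (const candidate of names) {
    if (candidate === target) {
      count++;
    }
  }
  return count;
}
```

The payoff comes in queries that compare a target name against the name of every element in a large document: the cost of the string work is paid once, when the name is first created.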


LINQ to XML for JavaScript

I’ve been hard at work in my spare time for months now developing a small JavaScript library that enables you to write code in the style of LINQ to XML. I developed this because I needed to implement some interesting functionality in JavaScript, and I already basically had the code written in C#, so writing this library enabled me to translate that C# code to JavaScript with a minimum of fuss and trouble. ltxml.js – LINQ to XML for JavaScript

