The markup for fields in WordprocessingML documents is fairly involved. It often spans multiple runs and contains a marker for the start of the field, a marker that separates the field instruction text from the field result, and a marker for the end of the field. To further complicate matters, fields can be nested. This post presents some code that parses field markup accurately.

I’ve recorded two screencasts that explain field markup in detail. To understand the code attached to this article, please watch them first:

Video: Open XML WordprocessingML Fields and Hyperlinks Deep Dive

Video: Open XML WordprocessingML Fields and Hyperlinks Deep Dive – Part 2

As mentioned in the videos, one difficulty in processing fields correctly is that fields can be nested. Consider a field that contains a nested field. The list of XML elements for the containing field necessarily includes the elements of the nested field. In effect, those elements ‘belong’ to the nested field *and* to the containing field. The code attached to this post handles nested fields correctly.
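To make the nesting problem concrete, here is a minimal sketch of one way to do this kind of bookkeeping with a stack. It is not the code attached to this post: FieldInfo, FieldSketch, FindFields, Sample and Demo are hypothetical names. It assumes each run holds at most one w:fldChar and that the markup contains no tracked revisions. It also records only each field's own instruction text, so it does not reproduce the nested, brace-delimited text that the attached code produces.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

// Illustrative sketch only; all type and method names here are hypothetical.
public sealed class FieldInfo
{
    public int Id;
    public bool PastSeparator;                         // set once "separate" is seen
    public StringBuilder InstrText = new StringBuilder();
    public List<XElement> Runs = new List<XElement>();
}

public static class FieldSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumption: at most one w:fldChar per run, no tracked revisions.
    public static List<FieldInfo> FindFields(XElement root)
    {
        var all = new List<FieldInfo>();
        var open = new Stack<FieldInfo>();              // innermost field on top
        foreach (XElement run in root.Descendants(W + "r"))
        {
            XElement fldChar = run.Element(W + "fldChar");
            string type = (string)fldChar?.Attribute(W + "fldCharType");

            if (type == "begin")
            {
                var started = new FieldInfo { Id = all.Count };
                all.Add(started);
                open.Push(started);
            }

            // A run inside a nested field belongs to every field still open.
            foreach (FieldInfo f in open)
                f.Runs.Add(run);

            if (open.Count == 0)
                continue;

            FieldInfo top = open.Peek();
            XElement instr = run.Element(W + "instrText");
            if (instr != null && !top.PastSeparator)
                top.InstrText.Append(instr.Value);      // own instruction text only
            if (type == "separate") top.PastSeparator = true;
            if (type == "end") open.Pop();
        }
        return all;
    }

    static XElement Mark(string type) =>
        new XElement(W + "r",
            new XElement(W + "fldChar", new XAttribute(W + "fldCharType", type)));

    static XElement Text(string name, string text) =>
        new XElement(W + "r", new XElement(W + name, text));

    // Hand-built paragraph: an IF field whose instruction contains a DATE field.
    public static XElement Sample() =>
        new XElement(W + "p",
            Mark("begin"), Text("instrText", " IF "),
            Mark("begin"), Text("instrText", " DATE "), Mark("separate"),
            Text("t", "4/29/2011"), Mark("end"),
            Text("instrText", " = \"4/29/2011\" \"yes\" \"no\" "),
            Mark("separate"), Text("t", "yes"), Mark("end"));

    public static void Demo()
    {
        foreach (FieldInfo f in FindFields(Sample()))
            Console.WriteLine("Id: {0} Runs: {1} InstrText: {2}",
                f.Id, f.Runs.Count, f.InstrText);
    }
}
```

Calling FieldSketch.Demo() from a console program lists two fields. The outer IF field collects all eleven runs of the sample paragraph, including the five runs that also belong to the nested DATE field. This is the double ownership described above.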

One caveat: this code does not attempt to process fields in documents that have tracked revisions. It contains code that restricts field queries to documents without tracked revisions. If you need to query a document that contains tracked revisions, first accept the revisions in memory, and then query for fields.
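How the attached code detects tracked revisions is not shown in this post. As a rough sketch of what such a guard could look like, the following checks a part's XML for a few common revision-tracking elements. RevisionCheck and HasTrackedRevisions are hypothetical names, and the element list is deliberately not exhaustive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative sketch only; names are hypothetical and the list of
// revision-tracking elements below is an assumption, not a complete set.
public static class RevisionCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static readonly HashSet<XName> RevisionElements = new HashSet<XName>
    {
        W + "ins", W + "del", W + "moveFrom", W + "moveTo",
        W + "rPrChange", W + "pPrChange", W + "sectPrChange",
        W + "tblPrChange", W + "trPrChange", W + "tcPrChange",
    };

    // True if the tree contains any of the listed revision elements.
    public static bool HasTrackedRevisions(XElement root) =>
        root.DescendantsAndSelf().Any(e => RevisionElements.Contains(e.Name));
}
```

A caller could load the root element of the main document part into an XElement, call HasTrackedRevisions on it, and decline to query for fields when it returns true.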

The following document contains some fields. Before taking the screen clipping, I pressed Alt+F9 so that Word displayed the field codes instead of the field results.


The code to query a document for fields is pretty simple: you call Fields(), an extension method on OpenXmlPart.

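using System;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    // Fields() is the extension method from the code attached to this post.
    static void Main(string[] args)
    {
        // Open read-only (second argument false) and list every field.
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test.docx", false))
        {
            foreach (var f in doc.MainDocumentPart.Fields())
                Console.WriteLine("Id: {0} InstrText: {1}", f.Id, f.InstrText);
        }
    }
}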

If you query the above document, the output will be as follows:

Id: 0 InstrText: { HYPERLINK "http://www.ericwhite.com/blog" }
Id: 1 InstrText: { IF { DATE } = "4/29/2011" "It is April 29" "It is NOT April 29" }
Id: 2 InstrText: { DATE }

You can see that the example correctly shows that the IF field contains an instance of a DATE field.

Now that we have this code in place, it becomes straightforward to query a document for its hyperlinks, regardless of whether they are represented in the simple form or in the complex form. That will be the subject of my next post in this series.

-Eric White
EricWhite.com/blog