Changing the Schema for this Open XML Document Generation System

Flexibility in a document generation system is very important to its usability.  We all know how it works.  You’ve been commissioned by the marketing department to put together a mailing to 50,000 customers.  After doing the work of putting together the template document, the marketing department *will* come ask for changes to the data and to the template document.  In the following screen cast, I show the process of adjusting the XML data that drives the document generation system, as well as adjusting the template document to use that data.

This post is the 12th in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Shows changing the schema and the template document.

There are lots of disadvantages to this approach of editing C# code in content controls in a Word document:

  • It requires a developer to put together the template document.
  • If you write C# code that doesn’t compile in a content control, you don’t see any errors until you try to compile the generated program.
  • There is no Intellisense when editing this code.  In a couple of places, I ended up first getting a snippet of code to work in Visual Studio, and then pasting that code into the content control.  This is far from ideal.
  • The code that generates C# code from the template document is not long – only about 390 lines of code (see ProcessTemplate.cs in the Zip file).  However, it is a bit gnarly, particularly the bits that make it so that you can have Value, Table, or Conditional content controls within a Conditional content control.  However, the C# code that you write inside the template document is not so complex – just the code to generate the code.

There are advantages too:

  • The code is directly associated and stored with the document.  This is called ‘lexical proximity’ – you don’t need to find code in another file somewhere, and you don’t need to keep code and the template document in sync.
  • You can pull data from *any* data source.  I could easily modify the template document to use OData or the Managed Client Object Model to pull data from a SharePoint list.  I could also write some ADO.NET code to pull data from any SQL database.

It is not clear that the advantages outweigh the disadvantages.  In the next post in this series, I’m going to limit the data source to XML, and use XPath in the content controls.

Comments (1)

Data Warehouse Book Recommendations

Some time ago, while still at Microsoft, I was involved in a small business intelligence project.  Many, many years ago, I was a database application developer, but times have changed.  I had to get up to speed on how to build and use a data warehouse in a hurry.  Fortunately, one of my best friends is a data warehouse developer for a large insurance company, and knew exactly which books to study.  Recently, I was asked for book recommendations, so I’m passing along the books that Bob McClellan recommended to me.

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

The Microsoft Data Warehouse Toolkit: With SQL Server 2008 R2 and the Microsoft Business Intelligence Toolset

Comments

Getting Started with Open XML PowerTools Markup Simplifier

On OpenXmlDeveloper.org, in one of the forums, there is a thread about how to clean Word proofing errors clutter out of an Open XML WordprocessingML document.  In PowerTools, in the HtmlConverter project, there is a class called MarkupSimplifier, which can remove proofing errors.  In addition, it can simplify WordprocessingML markup in a variety of ways, including removal of comments, content controls, and etc.  The blog post, Enabling Better Transformations by Simplifying Open XML WordprocessingML Markup describes MarkupSimplifier in more detail.

Here is a small screen-cast that shows the use of MarkupSimplifier.  In the screen-cast, I use Open XML Package Editor Power Tool for Visual Studio 2010.

Walks through the process of downloading and compiling a sample for MarkupSimplifier.

Here is the listing of the small program that uses MarkupSimplifier

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenXmlPowerTools;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    static void Main(string[] args)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            SimplifyMarkupSettings settings = new SimplifyMarkupSettings
            {
                RemoveComments = true,
                RemoveContentControls = true,
                RemoveEndAndFootNotes = true,
                RemoveFieldCodes = false,
                RemoveLastRenderedPageBreak = true,
                RemovePermissions = true,
                RemoveProof = true,
                RemoveRsidInfo = true,
                RemoveSmartTags = true,
                RemoveSoftHyphens = true,
                ReplaceTabsWithSpaces = true,
            };
            MarkupSimplifier.SimplifyMarkup(doc, settings);
        }
    }
}

Comments (17)

Release of V1 of Simple DOCX Generation System

I have completed a preliminary version of this simple DOCX generation system, which you can download, unzip, and try.  You can find the zip file that contains all necessary bits here.

This post is the eleventh in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following 90 second screen-cast shows how to run the doc gen system after you download and unzip the zip file.

Demonstrates minimum number of steps to run the Open XML WordprocessingML document generator system

The following 2 1/2 minute video shows using the document generation system at scale.  I show generating 3000 documents in under a minute.

Comments (3)

Video of use of Document Generation Example

I have completed a rough first version of this document generation system that is driven by C# code that you write in content controls in a Word document.  As an intro, I’ve recorded a small screen-cast that shows the doc gen system in action.

This post is the tenth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Demonstrates an approach to Open XML WordprocessingML document generation that uses C# code in content controls.

V1 of the code that enables this approach to document generation is less than 400 lines of code, so this counts as simply an example program.  This shows the value of using functional programming, meta programming, and Open XML to reduce program size.

I have to note at this point – the example program contains almost no error handling.  If you mistype code in the content controls, you will encounter interesting compiler errors after loading the generated program.  In the long run, I expect to resolve these issues in an interesting way.  While at this point, I’m just playing around with document generation ideas, in the future, I want to build a system that is easy and convenient for non-developers to use.

I plan on posting this code sometime early next week, as well as a video that explains in more detail how the doc gen system works.

Comments (5)

A Super-Simple Template System

In the last post, I explored Text Templates (T4), and determined that using T4 text templates for my code generation needs would add complexity and not yield sufficient ROI (although I did determine that a doc gen example using T4 is interesting in its own right). However, my exploration into T4 text templates yielded one important point, which is that delimiting blocks of code using <# and #> is a good approach. This post details my super-simple template system, which will be more than adequate for building this first version of a doc gen system.

This post is the ninth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

For what it’s worth, I did a fair amount of reading of the C#, VB, and XML specs and determined to my own satisfaction that those combinations are fine. I could go on for about two pages, detailing exactly where the hash mark is allowed in all three languages, and why <# and #> are safe, but I’ll spare you the ordeal. In any case, these are the combinations that the T4 architects and program managers decided on, and I’m certain that an extraordinary amount of time was spent designing the T4 syntax.

I am going to make one more simplification, which is that in my super-simple template system, the <# must be the first two non-whitespace characters on a line, and that #> must be the last two non-whitespace characters. You will see that this makes the LINQ projection that processes the template very simple. Ultimately, this template system would be best implemented by defining a grammar and writing or using a real parser, but my main objective is to build a small example that enables us to explore document generation, so a shortcut is in order here.

To allow for further enhancements in the future, I’m going to specify that the contents will be a small XML document. While this makes the syntax a bit more verbose, we gain such advantages as XML schema validation, extensibility, and a familiar syntax. Following is an example of a template using this system. It contains two insertion blocks, one with the name of Using, and the other with the name of GeneratorMain:

<# <Insert Name="Using"/> #>

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
            <# <Insert Name="GeneratorMain"/> #>
        }
    }
}

The Using insertion block might be replaced with this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

The GeneratorMain insertion block could be replaced with this:

Console.WriteLine("Hello world");

Processing this template would then result in the following C# program:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
Console.WriteLine("Hello world");
        }
    }
}

The LINQ projection that processes this template, in its entirety, is:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

class Program
{
    // Simulated method that returns the text of a tagged content control.
    static string GetTextFromContentControl(string tag)
    {
        if (tag == "Using")
            return
@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;";
        if (tag == "GeneratorMain")
            return @"Console.WriteLine(""Hello world"");";
        return "error";
    }

    static void Main(string[] args)
    {
        string[] templateCode = File.ReadAllLines("template.txt");
        var filledTemplate = templateCode
            .Select(l =>
            {
                string trimmed = l.Trim();
                if (trimmed.StartsWith("<#") && trimmed.EndsWith("#>"))
                {
                    XElement insert = XElement.Parse(trimmed.Substring(2, trimmed.Length - 4));
                    string tag = insert.Attribute("Name").Value;
                    return GetTextFromContentControl(tag);
                }
                else
                    return l;
            })
            .ToArray();
        File.WriteAllLines("GeneratedDocGenProgram.cs", filledTemplate);

        // Print out the template for demonstration purposes.
        Console.WriteLine(File.ReadAllText("GeneratedDocGenProgram.cs"));
    }
}

To run this example, create a new C# console application. Save the template as a file named template.txt in the bin directory, and then run it. The example produces the following output:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace GenDocs
{
    class Generator
    {
        static void Main(string[] args)
        {
Console.WriteLine("Hello world");
        }
    }
}

Comments (5)

Text Templates (T4) and the Code Generation Process

As I was contemplating the process of generating the C# code that will do the document generation, I was drawn to the idea of using text templates, also known as T4. Text templates are a .NET code generation technology. I have never used text templates before, so I spent a few hours researching them to see their applicability in the Open XML WordprocessingML document generation process. The short version of this post is that I have decided against using text templates in this particular iteration of a document generation system. However, text templates are very cool, and have applicability in the Open XML document generation process. This post details my notes and thoughts on text templates, and gives my reasons for deciding against using them, although I am going to steal some ideas from them.

This post is the eighth in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Text templates are a very cool technology (introduced in VS2008, I believe) that makes it easy to generate code as part of the application build process in Visual Studio. In addition, you can use text templates to generate files at runtime. You code text templates in a way that is similar to coding ASP.NET pages. Some portions of a text template contain literal text that is copied verbatim to the generated file, while other portions (similar to code blocks and expression holes) contain C# or VB code that you can use to programmatically generate portions of the generated file. For example, the following text template generates a simple XML document:

<#@ template debug="false" hostspecific="true" language="C#" #>
<#@ output extension=".xml" #>
<#@ assembly name="System.Xml" #>
<#@ assembly name="System.Xml.Linq" #>
<#@ import namespace="System.Xml.Linq" #>

<#
XElement e = new XElement("ChildElement", "with some data");
#>

<Root>
  <#= e #>
</Root>

Here is the generated XML document:

<Root>
  <ChildElement>with some data</ChildElement>
</Root>

There is a fair amount to learn about text templates. The lines that start with <#@ are directives. The assembly directives tell the text template to link with the specified assemblies. The import directive serves the same purpose as the using directive in a C# program. The code between <# and #> is executed when the template is evaluated. The line <#= e => serves the same purpose as an expression hole in other similar technologies.

The principle reason that I am not going to use text templates in this current effort is that I am writing a pure functional transform from a WordprocessingML document to a bunch of pure functional C# code that will generate a number of WordprocessingML documents. I would not be using the most powerful feature of text templates, which are expression holes, at least with the intent with which they were designed. The text template then becomes simply a mechanism to combine some boiler-plate code with some code generated by the functional transform. Pulling in the additional complexity of text templates doesn’t pay.

That said, text templates are interesting in the domain of Open XML document generation. Instead of approaching the problem as I am in this current series of posts, the approach using text templates would be generating a Flat OPC document. You can use LINQ to XML handily in text templates. The development of a doc gen system using text templates becomes one of finding the interesting markup in the Flat OPC document and writing some expression holes to generate the variable parts of the document. This is definitely going on my list of blog posts to write in the near future. It will be an easy post to write – perhaps only an hour or two will be required to build a rudimentary doc gen system (one with quite different characteristics from the one I’m currently building).

There are a few more interesting points to note about text templates. The most powerful and common use of text templates is to facilitate code generation from within a Visual Studio project. The text template is evaluated whenever it is saved, and the resulting generated C# or VB source file is compiled whenever you build the project. This allows you to generate code as part of the editing process, and then use the generated code from other modules seamlessly. This is super-interesting, but not really relevant to the problem of document generation. Document generation should ultimately be in the hands of the domain experts – the marketing folks, the customer relationship departments, and whoever has industrial-strength document generation requirements. We don’t want to require Visual Studio for the design process.

You can generate text files at run-time by using a pre-processed text template. There are limitations on what you can do with this approach. Effectively, you can define additional properties for the generated class behind the template. You can then use those properties in a non-dynamic way in the generated file. This is somewhat interesting, but doesn’t justify the additional complexity.

One feature of text templates is that you can write a ‘Custom Host’ that allows you to kick off the transformation process programmatically. There is an interesting note in the topic Processing Text Templates by using a Custom Host:

We do not recommend using text template transformations in server applications. We do not recommend using text template transformations except in a single thread. This is because the text templating Engine re-uses a single AppDomain to translate, compile, and execute templates. The translated code is not designed to be thread-safe. The Engine is designed to process files serially, as they are in a Visual Studio project at design time.

It is probably possible to design a robust document generation system using text templates, but you would have to take care to avoid any complications related to the above warning. You would also want to do a lot of testing at scale.

For more info about text templates, see Code Generation and Text Templates (T4).

Comments (1)

Refinement: Generating C# code from an XML Tree using Virtual Extension Methods

I’ve made great progress on this example to generate WordprocessingML documents from a template that is driven from C# that is written inside of content controls.  It took me longer than I thought it would, but I’ve been mostly on vacation this past week, so had to fit in coding around lots of other activities.  Also, there is Hofstadter’s law to take into account.

This post is the seventh in a series of blog posts on generating Open XML documents. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

Along the way, I determined that I needed to refine the approach for generating C# code from an XML tree.  There were two refinements:

  • I needed to use the @”” syntax for strings (called verbatim string literals).  This handles cases such as insignificant white space that contains new lines.  My document generation example doesn’t need this functionality, but I dislike incomplete solutions.
  • I added an approach for injecting arbitrary code into the code that creates the XML tree.  I need to inject the code that the developer types into content controls into the code that creates the XML tree.  My approach is that if the transform encounters an element with the fully qualified name of {http://www.ericwhite.com/xmlcodegeneration}Literal then the text contents of that element are directly injected into the code to create the XML tree.

The following code shows what I mean.  The code creates an XML tree that contains an element with the special namespace/name:

XNamespace ewx = "http://www.ericwhite.com/xmlcodegeneration";
XDocument root = new XDocument(
    new XElement("Root",
        new XElement("Child", 123),
        new XElement(ewx + "Literal",
          @"new XElement(""Data"", 123),  // injected code" + Environment.NewLine),
        new XElement("Child", 345)));
Console.WriteLine(LtxToCode.XDocumentToCode(root));

When you run this code, it generates the following code that contains directly injected code:

new XDocument(
  new XElement("Root",
    new XElement("Child",
      new XText(@"123")
    ),
new XElement("Data", 123),  // injected code
    new XElement("Child",
      new XText(@"345")
    )
  )
)

To avoid having two versions of the code posted on my blog, I’ve altered the previous post to contain the corrected code. In the next post, I’ll discuss pros and cons of text templates.

Comments

Simulating Virtual Extension Methods

When considering the problem of how to generate code that will create some arbitrary XML tree, it is interesting to examine the LINQ to XML class hierarchy, which uses polymorphism. The XObject class is an abstract base class of both the XAttribute and XNode classes. The XNode class is an abstract base class of XContainer, XComment, XDocumentType, XProcessingInstruction, and XText. XContainer is the base class for XElement and XDocument.

This post is the sixth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

The following diagram shows the class hierarchy of the LINQ to XML classes that are important when considering serializing a LINQ to XML tree as code. There are other classes in LINQ to XML, but they don’t impact this discussion.

Image1

Software systems such as LINQ to XML are typically written using a recursive approach. The XNode class has an abstract virtual method WriteTo. Each derived class must override the WriteTo method and provide its own method for serializing. When serializing an XElement object by calling WriteTo, internally, LINQ to XML calls the WriteTo method for each of the child nodes of the XElement object. This is the approach that we want to use when generating code to create an arbitrary XML tree.

What would be ideal is to write ‘Virtual Extension Methods’. There would be an abstract virtual extension method for XObject named ToCode. This extension method would be overridden in each of the concrete classes. Classes that include child objects, such as XElement and XDocument can then iterate through their children and use the virtual extension method to convert each node in the XML tree to the appropriate C# code to create the node.

However, C# does not include support for the idea of a virtual extension method. When you call an extension method, it is bound at compile time to an extension method based on the type of the variable, not based on the type of the object in the variable. To show exactly what I mean, consider the following snippet:

public static class MyExtensions
{
    public static string ToCode(this XObject o)
    {
        return "Called extension method on XObject";
    }

    public static string ToCode(this XElement e)
    {
        return "Called extension method on XElement";
    }
}

class Program
{
    static void Main(string[] args)
    {
        XElement e = new XElement("Root");
        Console.WriteLine(e.ToCode());
        XObject o = e;
        Console.WriteLine(o.ToCode());
    }
}

The above snippet calls the ToCode extension method twice for the same object. However, even though the type of the object is XElement, when the object is assigned to a variable with type XObject (which is valid because XElement is derived from XNode, which derives from XObject), and the program calls the ToCode extension method, the extension method on XElement is not called. Instead, the extension method on XObject is called.

Because the LINQ to XML programming interface is fairly simple, we can simulate virtual extension methods by implementing an extension method on XObject that simply dispatches to the appropriate extension method based on the actual type of the object. The following listing shows the implementation of the ToCode extension method for XObject.

public static string ToCode(this XObject xObject)
{
    XAttribute a = xObject as XAttribute;
    if (a != null)
        return a.ToCode();
    XElement element = xObject as XElement;
    if (element != null)
        return element.ToCode();
    XCData cdata = xObject as XCData;
    if (cdata != null)
        return cdata.ToCode();
    XText text = xObject as XText;
    if (text != null)
        return text.ToCode();
    XComment comment = xObject as XComment;
    if (comment != null)
        return comment.ToCode();
    XProcessingInstruction pi = xObject as XProcessingInstruction;
    if (pi != null)
        return pi.ToCode();
    throw new CodeGenerationException("Internal error");
}

Next, we can examine the ToCode method that is implemented on XAttribute. The code calls an Indentation method that determines the number of spaces to precede the ‘new XAttribute’. This enables the code to be properly indented, so that it is easy to examine the generated code. Other than this, it is pretty straightforward code to generate the code to new up an XAttribute object.

public static string ToCode(this XAttribute attribute)
{
    return Indentation(attribute) +
        String.Format("new XAttribute(\"{0}\", \"{1}\")," + Environment.NewLine,
        attribute.Name, attribute.Value);
}

As initially written, the generated code to create an XAttribute or XElement object includes appending the comma before the new line.  One minor issue that must be solved is that when passing a number of parameters to a function that takes a params array, a comma immediately before the closing parenthesis is invalid.

Image2

The approach I took to solve this problem is that after assembling all of the code to create the child nodes of an XElement object, the code calls a method to trim off the final comma.

public static string ToCode(this XElement element)
{
    var c = element.Attributes().Cast<XObject>().Concat(element.Nodes().Cast<XObject>());
    if (c.Count() == 0)
        return Indentation(element) +
            String.Format("new XElement(\"{0}\")," + Environment.NewLine, element.Name);
    else
        return Indentation(element) +
            String.Format("new XElement(\"{0}\"," + Environment.NewLine, element.Name) +
            TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
            Indentation(element) + ")," + Environment.NewLine;
}

TrimFinalComma looks like this:

private static string TrimFinalComma(string code)
{
    if (code.EndsWith("," + Environment.NewLine))
        return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
            Environment.NewLine;
    return code;
}

You can see the rest of the ToCode extension methods in Generating C# code from an XML Tree using Virtual Extension Methods.

One Final Note

Some people will recognize the similarity in functionality between the Paste XML as LINQ sample and the code presented in this and the last post.  The code presented in the Paste XML as LINQ sample is imperative code that iterates through the nodes outputting code.  I need a completely different structure for my code.  By coding as a recursive functional transform, I can easily alter the transform as appropriate for special purpose content controls that contain C#.

Comments

Generating C# code from an XML Tree using Virtual Extension Methods

One integral part of my scheme for building a document generation system is to write some code that generates C# code to create an arbitrary XML tree. I want to transform the markup for the main document part into C# code that will produce that main document part, with the exception that at various points where I find content controls, I want to alter the transformation as appropriate. The first task is to write code that produces C# code that will create any arbitrary XML tree.

This post is the fifth in a series of blog posts. Here is the complete list: Generating Open XML WordprocessingML Documents Blog Post Series

To demonstrate what I mean by code that generates code, here is a small snippet that parses a string, creates an XML tree, and then prints the code that the XElementToCode method produces:

XElement e = XElement.Parse(
@"<Root xmlns='http://www.ericwhite.com'>
  <Child>This is a text node.</Child>
  <!--Here is a comment.-->
</Root>");
Console.WriteLine("var z = {0};", LtxToCode.XElementToCode(e));

This produces the following automatically written code:

var z = new XElement("{http://www.ericwhite.com}Root",
  new XAttribute("xmlns", @"http://www.ericwhite.com"),
  new XElement("{http://www.ericwhite.com}Child",
    new XText(@"This is a text node.")
  ),
  new XComment(@"Here is a comment.")
);

The code that I present in this post uses expanded XML names, which deserve a bit of an explanation.

Expanded XML Names

In LINQ to XML, an expanded name is an approach that enables specification of a namespace and local name in a single string. The gist of it (which you can see in the example above) is that the namespace is enclosed in curly braces, followed by the local name.

The normal idiom when working with names and namespaces in LINQ to XML is to declare and initialize an XNamespace object, and then use the overload of the ‘+’ operator to combine the namespace with a local name to create a fully qualified name:

XNamespace ew = "http://www.ericwhite.com";
XElement root = new XElement(ew + "Root");
Console.WriteLine(root);

This snippet is identical in functionality to the following example, which uses an expanded name:

XElement root = new XElement("{http://www.ericwhite.com}Root");
Console.WriteLine(root);

While the second approach is perhaps marginally slower than the first approach, it is far easier to generate code that uses the second approach. If I used the first approach, I would need to setup a dictionary that maps namespace names to XNamespace object names, and then appropriately generate code that uses the correct XNamespace objects. It is a fair amount of housekeeping. So instead of using that approach, the generated code specifies fully qualified names using expanded names.

One note about the LINQ to XML programming interface: When you call the ToString method on an XName object, the returned string is an expanded name. For instance, the following code prints the fully qualified name of an element:

XElement root = XElement.Parse("<Root xmlns='http://www.ericwhite.com'/>");
Console.WriteLine(root.Name);

This outputs the expanded name:

{http://www.ericwhite.com}Root

About the Example

The code in the following example uses extension methods to implement a recursive transform from the XML tree to code that will create the XML tree.  The example contains a few sample XML documents that it converts to code.  It produces a C# that you can compile and run – the C# code instantiates two XML trees – one using the XElement.Parse method, and another using the C# code that is generated by the example.  The example then uses DeepEquals to validate that the two trees are identical.

To run the example:

Create a new C# console application

Copy and paste the following code into Program.cs.

Run the example.  This produces a new file, GeneratedTestProgram.cs.  You can examine the generated code for each XML tree in the generated program.

Next, we want to validate that the generated code actually generates the XML tree that it should.  Create a new C# console application, replace Program.cs in the new program with the generated program, and then run it to validate that the generated code produced the correct XML tree.

This example simulates the use of virtual extension methods, which made the example very easy to write.  In the next post, I’ll explain virtual extension methods.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class LocalExtensions
{
    public static string StringConcatenate(this IEnumerable<string> source)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in source)
            sb.Append(item);
        return sb.ToString();
    }
}

public static class LtxToCode
{
    public static XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static XNamespace ew = "http://www.ericwhite.com/xmlcodegeneration";

    private static string Indentation(XObject xObject)
    {
        XAttribute attribute = xObject as XAttribute;
        if (attribute != null)
            return "".PadRight((attribute.Parent.Ancestors().Count() + 1 +
                (attribute.Parent.Document != null ? 1 : 0)) * 2, ' ');
        XElement element = xObject as XElement;
        if (element != null)
            return "".PadRight((element.Ancestors().Count() +
                (element.Document != null ? 1 : 0)) * 2, ' ');
        XDocument document = xObject as XDocument;
        if (document != null)
            return "";
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return "".PadRight((pi.Ancestors().Count() +
                (pi.Document != null ? 1 : 0)) * 2, ' ');
        XNode node = xObject as XNode;
        if (node != null)
            if (node.Parent != null)
                return "".PadRight((node.Parent.Ancestors().Count() +
                    1 + (node.Document != null ? 1 : 0)) * 2, ' ');
            else
                return "";
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XObject xObject)
    {
        XAttribute a = xObject as XAttribute;
        if (a != null)
            return a.ToCode();
        XElement element = xObject as XElement;
        if (element != null)
            return element.ToCode();
        XCData cdata = xObject as XCData;
        if (cdata != null)
            return cdata.ToCode();
        XText text = xObject as XText;
        if (text != null)
            return text.ToCode();
        XComment comment = xObject as XComment;
        if (comment != null)
            return comment.ToCode();
        XProcessingInstruction pi = xObject as XProcessingInstruction;
        if (pi != null)
            return pi.ToCode();
        throw new CodeGenerationException("Internal error");
    }

    public static string ToCode(this XDocument document)
    {
        var s = "new XDocument(" + Environment.NewLine +
            (document.Declaration != null ?
                String.Format("  new XDeclaration(\"{0}\", \"{1}\", \"{2}\")," +
                Environment.NewLine,
                document.Declaration.Version, document.Declaration.Encoding,
                document.Declaration.Standalone) :
                "") +
            TrimFinalComma(document
                .Nodes().Select(n => n.ToCode()).StringConcatenate()) +
            ")";
        return s;
    }

    public static string ToCode(this XElement element)
    {
        var c = element
            .Attributes()
            .Cast<XObject>()
            .Concat(element.Nodes().Cast<XObject>());
        if (element.Name == ew + "Literal")
            return element.Value;
        if (c.Count() == 0)
            return Indentation(element) +
                String.Format("new XElement(\"{0}\")," + Environment.NewLine,
                element.Name);
        else
            return Indentation(element) +
                String.Format("new XElement(\"{0}\"," + Environment.NewLine,
                    element.Name) +
                TrimFinalComma(c.Select(n => n.ToCode()).StringConcatenate()) +
                Indentation(element) + ")," + Environment.NewLine;
    }

    public static string ToCode(this XAttribute attribute)
    {
        return Indentation(attribute) +
            String.Format("new XAttribute(\"{0}\", @\"{1}\")," + Environment.NewLine,
            attribute.Name, attribute.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XText text)
    {
        return Indentation(text) +
            String.Format("new XText(@\"{0}\")," + Environment.NewLine, text.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XComment comment)
    {
        return Indentation(comment) +
            String.Format("new XComment(@\"{0}\")," + Environment.NewLine, comment.Value.Replace("\"", "\"\""));
    }

    public static string ToCode(this XProcessingInstruction pi)
    {
        return Indentation(pi) +
            String.Format("new XProcessingInstruction(@\"{0}\", @\"{1}\")," +
                Environment.NewLine, pi.Target.Replace("\"", "\"\""), pi.Data.Replace("\"", "\"\""));
    }

    public static string ToCode(this XCData cdata)
    {
        return Indentation(cdata) +
            String.Format("new XCData(@\"{0}\")," + Environment.NewLine, cdata.Value.Replace("\"", "\"\""));
    }

    private static string TrimFinalComma(string code)
    {
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length) +
                Environment.NewLine;
        return code;
    }

    public static string XElementToCode(XElement element)
    {
        string code = element.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public static string XDocumentToCode(XDocument document)
    {
        string code = document.ToCode();
        if (code.EndsWith("," + Environment.NewLine))
            return code.Substring(0, code.Length - ("," + Environment.NewLine).Length);
        return code;
    }

    public class CodeGenerationException : Exception
    {
        public CodeGenerationException(string message)
            : base(message)
        {
        }
    }
}

public class GenerateDocumentException : Exception
{
    public GenerateDocumentException(string message)
        : base(message)
    {
    }
}

class Program
{
    static string[] testXml = new[] {
@"<Root a=""1"" b=""2""/>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root>
  <Child> abc </Child>
  <Child xmlns:space=""preserve""> abc </Child>
</Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><![CDATA[foo]]></Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root><Child/></Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<Root/>",

@"<Root xmlns=""http://www.ericwhite.com/aaaaa"">
  <Child xmlns=""http://www.ericwhite.com/child"">
    <Element att=""1""
             b:att2=""2""
             xmlns:b=""http://www.ericwhite.com/bbbbb"">abc</Element>
  </Child>
</Root>",

@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc</a:Root>",

@"<a:Root xmlns:a=""http://www.ericwhite.com"">abc<!--a comment -->def</a:Root>",

@"<Root>abc</Root>",

@"<Root att1=""1"" att2=""2""/>",

@"<Root/>",

@"<Root att1=""1"">
  <Child>
    <Gc1>abc<b/>def</Gc1>
  </Child>
  <Child>
    <Gc2>abc</Gc2>
  </Child>
</Root>",

@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>

<?mso-application progid=""Word.Document""?>

<!--This is a comment at the root level.  There are also white space nodes at the root level.-->

<pkg:package xmlns:pkg=""http://schemas.microsoft.com/office/2006/xmlPackage"">
  <pkg:part pkg:name=""/_rels/.rels""
            pkg:contentType=""application/vnd.openxmlformats-package.relationships+xml""
            pkg:padding=""512"">
    <pkg:xmlData>
      <Relationships
          xmlns=""http://schemas.openxmlformats.org/package/2006/relationships"">
        <Relationship Id=""rId3""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/app.xml""/>
        <Relationship Id=""rId2""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""docProps/core.xml""/>
        <Relationship Id=""rId1""
                      Type=""http://schemas.openxmlformats.org""
                      Target=""word/document.xml""/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>",
};

    static void Main(string[] args)
    {
        string st1 = (@"using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
");
        StringBuilder sb = new StringBuilder();
        // test as XElement
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(
                String.Format("var xElementSourceTree{0} = XElement.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xElementCodeTree{0} = {1};",
                i, LtxToCode.XElementToCode(XElement.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "if (XNode.DeepEquals(xElementSourceTree{0}, xElementCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XElement Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }
        // test as XDocument
        for (int i = 0; i < testXml.Length; i++)
        {
            sb.Append(String.Format(
                "var xDocumentSourceTree{0} = XDocument.Parse(@\"{1}\");",
                i, testXml[i].Replace("\"", "\"\"")));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format("var xDocumentCodeTree{0} = {1};",
                i, LtxToCode.XDocumentToCode(XDocument.Parse(testXml[i]))));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
              "if (XNode.DeepEquals(xDocumentSourceTree{0}, xDocumentCodeTree{0}))", i));
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Passed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append("else");
            sb.Append(Environment.NewLine);
            sb.Append(String.Format(
                "    Console.WriteLine(\"XDocument Test {0} Failed\");", i));
            sb.Append(Environment.NewLine);
            sb.Append(Environment.NewLine);
        }
        string st2 = @"        }
    }
}";
        string fullProgram = st1 + sb.ToString() + st2;
        File.WriteAllText("GeneratedTestProgram.cs", fullProgram);
    }
}

Comments (6)

« Previous Page« Previous entries « Previous Page · Next Page » Next entries »Next Page »