Merging Manipulated Word Docs


Viewing 3 posts - 1 through 3 (of 3 total)
  • #2556
    Eric White
    Keymaster

    Hi,

    The WmlDocument is an abstraction for an unopened Open XML document. It is a thin wrapper over a byte array. If you want to process a document first with the Open-Xml-Sdk and then with DocumentBuilder, the best way is:

    1. Get the document as a byte array, perhaps by calling File.ReadAllBytes, or by serializing a stream to a byte array.
    2. Create a new MemoryStream
    3. Write the byte array to the memory stream. Note that you do not want to use the MemoryStream constructor that takes a byte array as an argument, because that creates a non-resizable memory stream, which means the document can’t grow when you make changes to it. Instead, new up a MemoryStream using the default constructor, and then write the byte array into the MemoryStream as the first line in your using block.
    4. Open the MemoryStream using the Open-Xml-Sdk. Make modifications to the document.
    5. Once you drop out of the using block for the WordprocessingDocument, the memory stream will be updated. You can get the byte array by calling ToArray() on the MemoryStream.
    6. You can then new up the WmlDocument using the byte array, and do DocumentBuilder operations on it.
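    The six steps can be sketched in C#. This is a minimal sketch under stated assumptions, not Eric’s code: so that it runs with only the .NET base class library, ZipArchive (System.IO.Compression) stands in for WordprocessingDocument, which is reasonable because a .docx is a ZIP-based package. In real code, step 4 would be WordprocessingDocument.Open(ms, true) and step 6 new WmlDocument(fileName, bytes). The helper names MakeSamplePackage and ModifyPackage, and the "modification" itself (adding one text entry), are hypothetical. The last part shows why step 3 matters: a MemoryStream built over a byte array cannot grow.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: builds a small ZIP package in memory to play the role
// of the bytes you would normally get from File.ReadAllBytes (step 1).
byte[] MakeSamplePackage()
{
    using var ms = new MemoryStream();
    using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
    {
        var entry = zip.CreateEntry("word/document.xml");
        using var writer = new StreamWriter(entry.Open());
        writer.Write("<w:document/>");
    }
    return ms.ToArray();
}

// Hypothetical helper for steps 2 to 5. ZipArchive stands in for
// WordprocessingDocument.Open(ms, true) so no NuGet packages are needed.
byte[] ModifyPackage(byte[] original)
{
    // Step 2: the default constructor gives an expandable stream.
    using var ms = new MemoryStream();
    // Step 3: copy the bytes in; the stream can now grow as the package changes.
    ms.Write(original, 0, original.Length);
    ms.Position = 0;

    // Step 4: open the package for editing and modify it.
    // leaveOpen: true keeps ms usable after the archive is disposed.
    using (var zip = new ZipArchive(ms, ZipArchiveMode.Update, leaveOpen: true))
    {
        var entry = zip.CreateEntry("customXml/note.txt");
        using var writer = new StreamWriter(entry.Open());
        writer.Write("added by the editing step");
    }
    // Step 5: the archive is disposed, so ms now holds the updated package.
    return ms.ToArray();
}

byte[] original = MakeSamplePackage();
byte[] updated = ModifyPackage(original);
// Step 6 in real code would be something like: new WmlDocument("modified.docx", updated)

// Why step 3 matters: a MemoryStream built over a byte array has a fixed capacity.
using var fixedStream = new MemoryStream(original);
fixedStream.Position = fixedStream.Length;
bool growFailed = false;
try
{
    fixedStream.WriteByte(0); // writing past the end requires growing the stream
}
catch (NotSupportedException)
{
    growFailed = true;
}

Console.WriteLine($"original: {original.Length} bytes, updated: {updated.Length} bytes, fixed stream grew: {!growFailed}");
```

    The same shape carries over to the real libraries: only the object opened in step 4 and the object created in step 6 change.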

    This sounds more complicated than it is. At any point in time, you have:

    • A byte array.
    • A memory stream.
    • An opened WordprocessingDocument.
    • A WmlDocument created from the byte array, or retrieved from an Open-Xml-PowerTools function. You can get the byte array from a public property of this class (DocumentByteArray).

    I agree, it isn’t ideal. The root of this division is a difference in programming models: LINQ to XML is a better programming model than some of the classes in the Open-Xml-Sdk for doing certain types of transforms, so all of Open-Xml-PowerTools is written using LINQ to XML. Further, to make DocumentBuilder as robust and useful as possible, it was easier to make DocumentBuilder responsible for actually opening the documents in the source list. In particular, if the source list references the same document twice, DocumentBuilder opens it twice; therefore, it is important to take unopened documents as input to DocumentBuilder.
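    To illustrate why unopened input matters, here is a minimal sketch, again using ZipArchive as a stand-in for an Open XML package so it runs on the base class library alone. It is not DocumentBuilder’s code; the helpers MakePackage and OpenCopy are hypothetical. Because the source is just bytes, each open gets its own private MemoryStream, so the same document can be opened twice and changed independently, which is harder to arrange when the caller hands over a single already-opened document.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: a tiny ZIP package playing the role of a WmlDocument's bytes.
byte[] MakePackage()
{
    using var ms = new MemoryStream();
    using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
    {
        zip.CreateEntry("word/document.xml");
    }
    return ms.ToArray();
}

// Hypothetical helper: each call opens a private, expandable copy of the bytes.
// The archive owns the MemoryStream and disposes it along with itself.
ZipArchive OpenCopy(byte[] bytes)
{
    var ms = new MemoryStream();
    ms.Write(bytes, 0, bytes.Length);
    return new ZipArchive(ms, ZipArchiveMode.Update);
}

byte[] source = MakePackage();

// "Open the same source twice", as DocumentBuilder may do for a repeated source.
using var first = OpenCopy(source);
using var second = OpenCopy(source);

first.CreateEntry("word/only-in-first.xml");

int firstCount = first.Entries.Count;   // includes the new entry
int secondCount = second.Entries.Count; // unaffected by changes made through `first`
Console.WriteLine($"first: {firstCount} entries, second: {secondCount} entries");
```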

    I have considered a much tighter integration of Open-Xml-PowerTools and the Open-Xml-Sdk, which would make your job easier, but this is not currently in my plans. But we never know; plans change.

    #3270
    AlanSMac
    Participant

    Thanks Eric.

    Bizarrely, I replied to this the day after you posted, and it never showed. I tried to post again immediately, and the server said it had detected a duplicate post, but the reply never ever showed up!

    Just wanted to say your response was really useful and much appreciated. I ended up creating a wrapper so I can conveniently switch between the two formats. The only catch is that the caller is responsible for disposing the WordprocessingDocument so that the bytes get updated, as you mentioned:

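    using System;
    using System.IO;
    using DocumentFormat.OpenXml.Packaging; // Open-Xml-Sdk: WordprocessingDocument
    using OpenXmlPowerTools;                // Open-Xml-PowerTools: WmlDocument

    public class InterchangeableWordProcessingDocument : IDisposable
    {
        public MemoryStream memoryStream { get; private set; }

        public InterchangeableWordProcessingDocument(string path)
        {
            var bytes = File.ReadAllBytes(path);
            CreateMemoryStream(bytes);
        }

        private MemoryStream CreateMemoryStream(byte[] bytes)
        {
            // Do not use the byte array constructor: that stream is not resizable,
            // i.e. it cannot grow when the document changes.
            memoryStream = new MemoryStream();
            memoryStream.Write(bytes, 0, bytes.Length);
            return memoryStream;
        }

        // The caller must dispose the returned document before calling
        // GetAsWmlDocument(); until then the changes are not in memoryStream.
        public WordprocessingDocument GetAsWordProcessingDocument()
        {
            return WordprocessingDocument.Open(memoryStream, true);
        }

        public WmlDocument GetAsWmlDocument()
        {
            // "dummy" is a placeholder file name; the content comes from the bytes.
            return new WmlDocument("dummy", memoryStream.ToArray());
        }

        public void Dispose()
        {
            memoryStream.Dispose();
        }
    }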
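    The disposal caveat can be seen in a minimal sketch that uses ZipArchive (System.IO.Compression) as a stand-in for WordprocessingDocument; like the WordprocessingDocument case Eric describes in step 5, the archive only writes its changes to the underlying stream when it is disposed, so reading the bytes too early gives you the unmodified package. MakePackage and HasEntry are hypothetical helpers, and the sketch does not use the wrapper class itself, which needs the Open-Xml-Sdk and Open-Xml-PowerTools packages.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: a tiny ZIP package standing in for a .docx.
byte[] MakePackage()
{
    using var ms = new MemoryStream();
    using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
    {
        zip.CreateEntry("word/document.xml");
    }
    return ms.ToArray();
}

// Hypothetical helper: does the package in `bytes` contain an entry called `name`?
bool HasEntry(byte[] bytes, string name)
{
    using var zip = new ZipArchive(new MemoryStream(bytes), ZipArchiveMode.Read);
    return zip.GetEntry(name) != null;
}

byte[] original = MakePackage();

// Same setup as the wrapper: an expandable MemoryStream holding the bytes.
using var memoryStream = new MemoryStream();
memoryStream.Write(original, 0, original.Length);

byte[] beforeDispose;
using (var package = new ZipArchive(memoryStream, ZipArchiveMode.Update, leaveOpen: true))
{
    package.CreateEntry("word/added.xml");
    // Analogous to calling GetAsWmlDocument() while the document is still open:
    // the change has not been written to memoryStream yet.
    beforeDispose = memoryStream.ToArray();
}
// Analogous to calling GetAsWmlDocument() after disposing the document.
byte[] afterDispose = memoryStream.ToArray();

bool presentBefore = HasEntry(beforeDispose, "word/added.xml");
bool presentAfter = HasEntry(afterDispose, "word/added.xml");
Console.WriteLine($"entry present before dispose: {presentBefore}, after dispose: {presentAfter}");
```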

    While I’m posting: for my particular task it would have been great if the TextReplacer class exposed the method that does all the hard work on an individual element, rather than its only public method replacing all instances in a whole document. Sorry, I forget the method name, but it looked like it would just be a case of changing the access modifier. I ended up having to write code to manually replace values like <<myvariable>>, because the << and >> would sometimes be broken into 2 or 3 elements, and I didn’t want to replace all instances in my particular scenario (different parts of the document were for different people and had different variable values depending on the person). I got the impression TextReplacer had had a lot of work and pain put into handling these types of things.
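    The workaround described here, replacing a placeholder such as <<myvariable>> within only part of a document even when Word has split it across runs, can be sketched with LINQ to XML. This is a hypothetical helper (ReplaceInParagraph), not TextReplacer’s or OpenXmlRegex’s API, and it simplifies: whenever the placeholder is found, all of the paragraph’s text is merged into the first run, so the formatting of later runs is lost. It only looks at runs that are direct children of the paragraph.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hypothetical helper: replaces `placeholder` in ONE paragraph, even when Word
// has split it across several runs (w:r). Simplification: if the placeholder
// is found, all of the paragraph's text is merged into the first w:t, so the
// formatting of later runs is lost. Runs inside hyperlinks, fields, etc. are
// not considered.
void ReplaceInParagraph(XElement paragraph, string placeholder, string value)
{
    var texts = paragraph.Elements(w + "r").Elements(w + "t").ToList();
    if (texts.Count == 0) return;

    string combined = string.Concat(texts.Select(t => t.Value));
    if (!combined.Contains(placeholder)) return;

    texts[0].Value = combined.Replace(placeholder, value);
    // Keep leading/trailing spaces in the merged text.
    texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");

    foreach (var t in texts.Skip(1))
    {
        var run = t.Parent;
        t.Remove();
        // Drop runs left with nothing but (optional) run properties.
        if (run != null && !run.Elements().Any(e => e.Name != w + "rPr"))
            run.Remove();
    }
}

// A paragraph in which "<<name>>" has been split across three runs.
var para = new XElement(w + "p",
    new XElement(w + "r", new XElement(w + "t", "Dear <<")),
    new XElement(w + "r", new XElement(w + "t", "name")),
    new XElement(w + "r", new XElement(w + "t", ">>, welcome.")));

ReplaceInParagraph(para, "<<name>>", "Alice");

string result = string.Concat(para.Descendants(w + "t").Select(t => t.Value));
Console.WriteLine(result); // Dear Alice, welcome.
```

    Because the helper takes a single paragraph, you choose which part of the document it touches, for example only the paragraphs that belong to one person.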

    Thanks again for your speedy and useful response.

    #3291
    Eric White
    Keymaster

    Also take a look at OpenXmlRegex. It is a superset of TextReplacer.

    http://www.ericwhite.com/blog/blog/openxmlregex-developer-center/
