Merging Manipulated Word Docs

Tagged: Merge WmlDocument WordprocessingDocument

This topic has 3 replies, 2 voices, and was last updated 10 years, 2 months ago by Eric White.

March 16, 2016 at 5:11 am #2556
Eric White
Keymaster
Hi,

The WmlDocument is an abstraction for an unopened Open XML document. It is a thin wrapper over a byte array. If you want to process a document first with the Open-Xml-Sdk and then with DocumentBuilder, the best way is:
1. Get the document as a byte array, perhaps by calling File.ReadAllBytes, or by serializing a stream to a byte array.
2. Create a new MemoryStream.
3. Write the byte array to the memory stream. Do not use the MemoryStream constructor that takes a byte array as an argument: it creates a non-resizable memory stream, which means you can’t make changes to the document. Instead, new up a MemoryStream using the default constructor, and then write the byte array into the MemoryStream as the first line in your using block.
4. Open the MemoryStream using the Open-Xml-Sdk. Make modifications to the document.
5. Once you drop out of the using block for the WordprocessingDocument, the memory stream will be updated. You can get the byte array by calling ToArray() on the MemoryStream.
6. You can then new up the WmlDocument using the byte array, and do DocumentBuilder operations on it.
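A minimal sketch of the six steps above, assuming the DocumentFormat.OpenXml and OpenXmlPowerTools packages are referenced. The names RoundTripSketch and ModifyThenWrap are made up for illustration, and the prepended paragraph is only a placeholder for whatever Open-Xml-Sdk changes you actually need:

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using Wp = DocumentFormat.OpenXml.Wordprocessing;

public static class RoundTripSketch
{
    // Hypothetical helper: edits a document with the Open-Xml-Sdk, then
    // returns it as a WmlDocument ready for DocumentBuilder.
    public static WmlDocument ModifyThenWrap(string path)
    {
        // Step 1: get the document as a byte array.
        byte[] bytes = File.ReadAllBytes(path);

        // Steps 2 and 3: default constructor (resizable), then write the bytes in.
        using (var stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);

            // Step 4: open the stream for editing and modify the document.
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                Wp.Body body = doc.MainDocumentPart.Document.Body;
                body.PrependChild(
                    new Wp.Paragraph(new Wp.Run(new Wp.Text("Added with the Open-Xml-Sdk."))));
            }

            // Step 5: the WordprocessingDocument has been disposed, so the
            // stream now holds the updated package.
            byte[] modified = stream.ToArray();

            // Step 6: wrap the bytes in a WmlDocument.
            return new WmlDocument(path, modified);
        }
    }
}
```

The important detail is the ordering: ToArray() is called only after the inner using block has closed the WordprocessingDocument.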
This sounds more complicated than it is. At any point in time, you have:
- A byte array.
- A memory stream.
- An opened WordprocessingDocument
- A WmlDocument created from the byte array, or retrieved from an Open-Xml-PowerTools function. You can get the byte array from a public property in this class.
I agree, it isn’t ideal. The root of this division is a difference in programming models: LINQ to XML is a better programming model than some of the classes in the Open-Xml-Sdk for doing certain types of transforms, so all of Open-Xml-PowerTools is written using LINQ to XML. Further, in order to make DocumentBuilder as robust and useful as possible, it was easier to write DocumentBuilder so that it is responsible for actually opening the documents in the source list. In particular, if the source list references the same document twice, DocumentBuilder opens it twice. It is therefore important that DocumentBuilder takes unopened documents as input.
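To make the "same document twice" case concrete, here is a small sketch of a source list that references one WmlDocument in two places. MergeSketch and Interleave are illustrative names only, and the start and count values are arbitrary; they index body-level elements such as paragraphs and tables:

```csharp
using System.Collections.Generic;
using OpenXmlPowerTools;

public static class MergeSketch
{
    // Hypothetical helper: puts the first body-level element of `first`,
    // then all of `second`, then the remainder of `first` into one document.
    public static WmlDocument Interleave(WmlDocument first, WmlDocument second)
    {
        var sources = new List<Source>
        {
            new Source(first, 0, 1, true),  // first element of `first`
            new Source(second, true),       // all of `second`
            new Source(first, 1, true),     // rest of `first` (same document again)
        };

        // DocumentBuilder opens each source itself, which is why it takes
        // unopened WmlDocument instances rather than WordprocessingDocument.
        return DocumentBuilder.BuildDocument(sources);
    }
}
```

The result is again a WmlDocument, so it can be written out with SaveAs or fed back into the byte array and MemoryStream steps for more Open-Xml-Sdk work.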

I have had an idea of doing a much tighter integration of Open-Xml-PowerTools and the Open-Xml-Sdk, making your job easier, but this is not currently in my plans. But we never know, plans change.
March 24, 2016 at 2:38 pm #3270

AlanSMac
Participant

Thanks Eric.

Bizarrely I replied to this the day after you posted and it never showed. I tried to post again immediately and the server said it detected a duplicate post and it never ever showed up!

Just wanted to say your response was really useful and much appreciated. I ended up creating a wrapper for conveniently interchanging between the formats. The only thing is that the caller is responsible for disposing the Word doc etc. to get the bytes to update, like you mentioned:

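using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Holds a document in a resizable MemoryStream so it can be opened with the
// Open-Xml-Sdk or handed to Open-Xml-PowerTools as a WmlDocument.
public class InterchangeableWordProcessingDocument : IDisposable
{
    public MemoryStream memoryStream { get; private set; }

    public InterchangeableWordProcessingDocument(string path)
    {
        var bytes = File.ReadAllBytes(path);
        CreateMemoryStream(bytes);
    }

    private MemoryStream CreateMemoryStream(byte[] bytes)
    {
        // Do not use the byte array constructor: that stream is not resizable, i.e. it cannot handle changes.
        memoryStream = new MemoryStream();
        memoryStream.Write(bytes, 0, bytes.Length);
        return memoryStream;
    }

    // The caller must dispose the returned document before calling
    // GetAsWmlDocument, otherwise the stream does not yet contain the changes.
    public WordprocessingDocument GetAsWordProcessingDocument()
    {
        return WordprocessingDocument.Open(memoryStream, true);
    }

    public WmlDocument GetAsWmlDocument()
    {
        return new WmlDocument("dummy", memoryStream.ToArray());
    }

    public void Dispose()
    {
        memoryStream.Dispose();
    }
}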
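The comment about the byte array constructor can be checked without any Open XML code at all. This is a small standalone sketch using only System.IO; MemoryStreamResizeDemo and its methods are hypothetical names for illustration:

```csharp
using System;
using System.IO;

public static class MemoryStreamResizeDemo
{
    // Tries to append one byte past the current end of the stream.
    // Returns false if the stream refuses to grow.
    public static bool CanAppend(MemoryStream stream)
    {
        try
        {
            stream.Seek(0, SeekOrigin.End);
            stream.WriteByte(0xFF);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when the stream has a fixed capacity and cannot expand.
            return false;
        }
    }

    // Byte array constructor: the stream wraps the array and cannot grow.
    public static MemoryStream FromConstructor(byte[] bytes)
    {
        return new MemoryStream(bytes);
    }

    // Default constructor plus Write: the stream owns a buffer that can grow.
    public static MemoryStream FromWrite(byte[] bytes)
    {
        var stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        return stream;
    }
}
```

CanAppend(FromConstructor(bytes)) returns false, while CanAppend(FromWrite(bytes)) returns true. Saving an edited document usually changes the package size, which is why the wrapper needs the second form.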

While I’m posting: for my particular task it would have been great if the TextReplacer class exposed the method that does all the hard work on an individual element, rather than the only public method being one that replaces all instances in a whole document. Sorry, I forget the method name, but it looked like it would just be a case of changing the accessor. I ended up having to put in code to manually handle replacing values like <<myvariable>>, because the << and >> would sometimes be broken into 2 or 3 elements, but I didn’t want to replace all instances in my particular scenario (different parts of the document were for different people, and the variables had different values depending on the person). I got the impression TextReplacer had had a lot of work and pain put into it to handle these types of things.

Thanks again for your speedy and useful response.

March 29, 2016 at 3:54 am #3291

Eric White
Keymaster

Also take a look at OpenXmlRegex. It is a superset of TextReplacer.

http://www.ericwhite.com/blog/blog/openxmlregex-developer-center/
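For the scenario described above (replacing <<myvariable>> only in some parts of a document), something along these lines may work. It is a sketch, not tested against every document shape: ScopedReplaceSketch, ReplaceIn and the inScope predicate are hypothetical names, and it assumes the OpenXmlRegex.Replace overload that takes the content elements, a Regex, the replacement text and a per-match callback:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class ScopedReplaceSketch
{
    // Hypothetical helper: replaces <<name>> with `value`, but only in the
    // paragraphs for which `inScope` returns true.
    public static WmlDocument ReplaceIn(
        WmlDocument source, string name, string value, Func<XElement, bool> inScope)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(source.DocumentByteArray, 0, source.DocumentByteArray.Length);

            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                // PowerTools works on the LINQ to XML view of the part.
                XDocument xDoc = doc.MainDocumentPart.GetXDocument();

                // Limit the replacement to the paragraphs the caller selected.
                List<XElement> paragraphs = xDoc.Descendants(W.p).Where(inScope).ToList();

                // OpenXmlRegex matches across run boundaries, so a placeholder
                // split over 2 or 3 elements should still be found.
                var regex = new Regex(Regex.Escape("<<" + name + ">>"));
                OpenXmlRegex.Replace(paragraphs, regex, value, (paragraph, match) => true);

                // Write the modified XDocument back into the part.
                doc.MainDocumentPart.PutXDocument();
            }

            return new WmlDocument(source.FileName, stream.ToArray());
        }
    }
}
```

A call such as ReplaceIn(wml, "myvariable", "Alice", p => p.Value.Contains("Alice")) would touch only paragraphs that already mention Alice; the predicate is whatever identifies "this person's part" in your documents.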