Eric White » Topic: Replace placeholders enclosed in {} in .docx file from MVC app

Replace placeholders enclosed in {} in .docx file from MVC app

Tagged: docx, openxml, Placeholders, replace

This topic contains 3 replies, has 3 voices, and was last updated by Anonymous 3 years, 6 months ago.

May 16, 2020 at 12:17 pm #8647

MayuriB
Participant

I have a requirement in my MVC application: I want to programmatically replace placeholders in a predefined document template with text or images from the database.
Scenario: The application needs to generate a report based on survey questionnaire answers. The report should be generated from a predefined document template (.docx) that already has all styling and formatting applied. I want to read this template, search for the placeholders, and replace them with the appropriate data. It should also repeat document sections dynamically when there are multiple categories, repeating the template contents once per category and replacing the placeholders with that category’s data. While doing this, it should not remove or disturb the report’s styling and formatting.

How can I achieve this using OpenXML SDK methods? I don’t want to use any third-party tools. I did some research on Google and tried a few options. I tried reading all the text through a Stream and then replacing, but that disturbs the document format. I also tried searching for placeholders in Paragraph, Run, and Text descendants, but that does not replace all the placeholders; some get ignored.

May 16, 2020 at 1:17 pm #8648

MayuriB
Participant

Sometimes the placeholder spans multiple Runs. I want a solution that finds placeholders regardless of their position, their container, or how they are split up in the document XML, because none of the solutions I have tried works for me.

May 27, 2020 at 10:38 pm #8688

Chunk
Participant

I think I had a similar case. I had placeholders in my template, and I used DocumentAssembler, provided by Eric White, for this purpose.
I needed to put AltChunks in those placeholders. The AltChunks were HTML pages with encoded images (I had to download the external sources and convert them to Base64 beforehand).
While generating each AltChunk I created a placeholder for it, so DocumentAssembler put only the chunk ID into the generated document as a string. Replacing that string with my AltChunk (you can replace it with anything) was then pretty easy.

I’ve written two methods for this purpose: one prepares the data, the other consumes it:
// Requires: System.IO, System.Linq, System.Text, System.Xml.Linq, System.Xml.XPath,
// DocumentFormat.OpenXml.Packaging.
private void PrepareEmbeddedContent()
{
    // Collect every non-empty HTML field from the data model.
    var htmlFields = DataModel.XPathSelectElements("//Fields/*[@Type='Html']")
        .Where(node => !string.IsNullOrWhiteSpace(node.Value))
        .ToList();
    if (!htmlFields.Any())
    {
        return;
    }

    using (var document = WordprocessingDocument.Open(_processedFilePath, true))
    {
        var mainDocumentPart = document.MainDocumentPart;

        foreach (var htmlField in htmlFields)
        {
            var workItemId = htmlField.XPathSelectElement("../ID").Value;
            var fieldName = htmlField.Name;
            var altChunkId = $"Chunk{workItemId}{fieldName}";
            var htmlText = XmlHelper.GetDataEmbeddedHtml(htmlField.Value, NetworkCredentials);

            // Create alternative format import part.
            var formatImportPart = mainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);

            // Feed HTML data into format import part (chunk).
            using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(htmlText)))
            {
                formatImportPart.FeedData(ms);
            }

            // Swap the HTML for a {altChunkId:...} marker that DocumentAssembler writes out as plain text.
            htmlField.Value = $"{{altChunkId:{altChunkId}}}";
        }
        // No explicit Close() is needed; leaving the using block disposes the document.
    }
}

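// Requires: System.Linq, System.Text.RegularExpressions, System.Xml.Linq,
// DocumentFormat.OpenXml.Packaging, OpenXmlPowerTools (for W and OpenXmlRegex).
private void InsertEmbeddedContent()
{
    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    using (var document = WordprocessingDocument.Open(_processedFilePath, true))
    {
        var mainDocumentXDoc = GetXDocument(document);
        // Escape the braces and stop at the first closing brace so that
        // two markers in one paragraph are not swallowed by a single match.
        var pattern = @"\{altChunkId:([^}]+)\}";
        var regex = new Regex(pattern);
        var paragraphs = mainDocumentXDoc.Descendants(W.p).ToList();

        OpenXmlRegex.Match(paragraphs, regex, (element, match) =>
        {
            var altChunkId = match.Groups[1].Value;
            var chunkElement = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );

            // Look for the whole marker, not just the ID, so that one ID
            // that is a prefix of another cannot pick the wrong paragraph.
            var chunkPlaceholder = paragraphs.First(p => p.Value.Contains(match.Value));
            chunkPlaceholder.ReplaceWith(chunkElement);
        });

        SaveXDocument(document, mainDocumentXDoc);
    }
}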

GetXDocument and SaveXDocument are helper methods provided by Eric White.
Basically, what should help with your requirement is the combination of Regex, OpenXmlRegex.Match, and ReplaceWith.
I hope this helps.

October 23, 2020 at 11:19 am #9848

Anonymous

You are already relying on third-party libraries simply by using MVC. You rely on a third-party JSON library for your calls. If your UI is any more complex than a simple LoB app, you’re probably relying on a lot of third-party client libraries like jQuery, Angular, Modernizr, etc. The MVC template you’re using pulls these in automatically out of the box. I have never really understood why people say they cannot rely on third-party libraries and therefore want to do things the hard way. If you need to merge data into a Word document, then you should be using the right tool, which may or may not be OpenXML. You should be ready to use the correct library for your task. Your end users care about performance and correctness, not which libraries you’re using.

Nevertheless if you want to use OpenXML you’re going to be in for a challenge mainly because Word wasn’t really designed for this. If you’re using fields in Word then it’d be easier. While I dislike PDFs a lot, they are the normal approach to doing this kind of thing. PDFs support fields and libraries like iTextSharp can replace fields in a document in a couple lines of code. Electronic signature tools like Docusign are completely based upon this. I think you should reconsider your approach and whether Word with OpenXML is really the best solution long term.

Normally I’d recommend you use mail merge from the Word API. It was designed to solve this problem. But a web app shouldn’t rely on the Office API, as it is not designed for server environments. The only other way to do this that I’m aware of is search and replace. However, there are lots of cases you’ll need to handle. For example, the word you want to replace might be part of a larger word (e.g. replacing state with TX would be incorrect in the word estate). It becomes even worse if case matters (e.g. replacing Country with US would not work on country). And there are cases where maybe the word shouldn’t be replaced at all (e.g. replace first name with ‘Bob’, but not in the sentence please enter your first name). Ultimately, if you intend to just do raw string replacement, then you should update your document to use some sort of delimiters around the text to find. This eliminates most of these issues and makes it easier to verify everything was replaced.
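To illustrate the delimiter idea, here is a minimal C# sketch that works on a plain string only. The class name PlaceholderText, the method Fill, and the {name} syntax restricted to letters, digits, and underscores are all assumptions made for this example; they are not part of OpenXML or any library mentioned in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: fills {name} placeholders in a plain string.
public static class PlaceholderText
{
    // Matches {name}, where name is one or more letters, digits, or underscores.
    private static readonly Regex Placeholder = new Regex(@"\{(\w+)\}");

    // Replaces every placeholder that has a value and records the names that do not.
    public static string Fill(
        string text,
        IReadOnlyDictionary<string, string> values,
        ISet<string> missing)
    {
        return Placeholder.Replace(text, m =>
        {
            var name = m.Groups[1].Value;
            if (values.TryGetValue(name, out var value))
            {
                return value;
            }

            missing.Add(name);
            return m.Value; // leave unknown placeholders untouched
        });
    }
}
```

Because only text inside braces is considered, {state} is replaced while the word estate is left alone. The missing set gives a simple way to check afterwards whether anything was left unreplaced. Name matching here follows the dictionary’s comparer, which is case-sensitive by default.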

As for the code, it is too long to post here, so I’ll provide the links. The quick and dirty solution is to simply use OpenXML to open the file, then read all the text into a string using StreamReader. You can then work with the contents like you’d do with a normal string. This is how the documentation does it. However, the problem with that approach, and most likely the problem with your code, is that it doesn’t take into account that Word will break up text arbitrarily across runs. So the text Dear {firstname} may be a single run or multiple. If it is multiple, then no amount of string replacement will work, as you have no idea where one run ends and the next begins. So instead you have to actually look at the text across runs. As you can imagine, this gets complicated fast (without using the Office API). Somebody on SO posted a solution that is supposed to handle text across runs, which you’ll want to look at instead. But that is a lot of code.
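As a rough illustration of looking at the text across runs, the sketch below uses only LINQ to XML on a single w:p element. It is not the SO solution referred to above; RunMerger and ReplaceInParagraph are hypothetical names, and the approach is deliberately simplified: it joins the text of every w:t in the paragraph, replaces {name} placeholders in the joined string, writes the result into the first w:t, and empties the others. That means text from later runs takes on the first run’s formatting, so it only suits paragraphs with uniform formatting.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: replaces placeholders that may be split across runs.
public static class RunMerger
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // lookup returns the replacement for a placeholder name, or null to leave it as-is.
    public static void ReplaceInParagraph(XElement paragraph, Func<string, string> lookup)
    {
        var texts = paragraph.Descendants(W + "t").ToList();
        if (texts.Count == 0)
        {
            return;
        }

        // Join the text of all runs so a split placeholder becomes visible as one string.
        var joined = string.Concat(texts.Select(t => t.Value));
        var replaced = Regex.Replace(
            joined, @"\{(\w+)\}", m => lookup(m.Groups[1].Value) ?? m.Value);
        if (replaced == joined)
        {
            return; // nothing matched, so leave the runs exactly as they were
        }

        // Simplification: all text moves into the first w:t; the other w:t elements are emptied.
        texts[0].Value = replaced;
        texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
        foreach (var t in texts.Skip(1))
        {
            t.Value = string.Empty;
        }
    }
}
```

For a paragraph whose runs contain "Dear {first", "name}" and ", hello", calling ReplaceInParagraph with a lookup that maps firstname to Bob leaves the paragraph text as "Dear Bob, hello". Paragraphs without a matching placeholder are not modified, so their run structure and formatting stay intact.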

Again, if you switch your Word document to using fields for the replaceable values, then I believe it becomes easier, because you can just set the fields, sort of like you might do in a PDF.

There is no impact of using OpenXML in AWS. You’re running a VM, so you can install anything you want. It is a regular machine as far as your app goes. Even if you hosted your app directly on AWS, OpenXML is simply a NuGet package, so there are no restrictions. You wouldn’t be able to use the Office API, of course, but you aren’t looking into that anyway.