Replacing Text in Revision Tracked Document


This topic contains 4 replies, has 2 voices, and was last updated by  AlanSMac 7 years, 8 months ago.

  • #3648

    AlanSMac
    Participant

    Hi Eric,

    I am replacing variables in docx templates with values in a Windows Service. To do this I use TextReplacer.SearchAndReplace. It turns out some of our customers have been using Revision Tracking with their templates and TextReplacer throws when it detects Revision Tracking elements. I’d like to handle this in our service.

    I have tried removing the elements, but that ends in a NullReferenceException:

    System.NullReferenceException: Object reference not set to an instance of an object.
    at OpenXmlPowerTools.PtOpenXmlExtensions.GetXDocument(OpenXmlPart part)

    Code:

        private void ProcessRevisionTracking(WordprocessingDocument wordDoc)
        {
            //var revisionTrackingOn = RevisionAccepter.HasTrackedRevisions(wordDoc);

            //RevisionAccepter.P
            var revisionTrackingElements = wordDoc.MainDocumentPart.DocumentSettingsPart.Settings.Descendants<TrackRevisions>();

            var revisionTrackingOn = revisionTrackingElements
                .Any(rte => rte.Val = new DocumentFormat.OpenXml.OnOffValue(true));

            if (revisionTrackingOn)
            {
                logger.LogDebug("Revision tracking " + (revisionTrackingOn ? "detected" : "NOT enabled"));
                foreach (var rte in revisionTrackingElements)
                    rte.Remove();

                List<OpenXmlPart> parts = new List<OpenXmlPart>();

                parts.Add(wordDoc.MainDocumentPart);
                parts.AddRange(wordDoc.MainDocumentPart.HeaderParts);
                parts.AddRange(wordDoc.MainDocumentPart.FooterParts);
                parts.Add(wordDoc.MainDocumentPart.EndnotesPart);
                parts.Add(wordDoc.MainDocumentPart.FootnotesPart);

                foreach (var part in parts)
                {
                    ProcessRevisionTracking(part);
                }

                // wordDoc.MainDocumentPart.Document.Save();

                logger.LogDebug(String.Format("Revision elements still found? - {0}", RevisionAccepter.HasTrackedRevisions(wordDoc)));
            }
            else
                logger.LogDebug("Revision tracking NOT detected");
        }

        private void ProcessRevisionTracking(OpenXmlPart part)
        {
            //var matchNames = RevisionAccepter.TrackedRevisionsElements

            var revisionElements = part.GetXDocument().Descendants()
                .Where(desc => RevisionAccepter.TrackedRevisionsElements.Contains(desc.Name))
                .ToArray();

            logger.LogDebug(revisionElements.Length + " revision elements found to remove.");
            foreach (var elem in revisionElements)
            {
                elem.Remove();
            }
        }
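
One plausible cause of the NullReferenceException above, which nothing in this thread confirms: MainDocumentPart.EndnotesPart and MainDocumentPart.FootnotesPart are null when the document has no endnotes or footnotes, so the parts list can hold null entries that later reach part.GetXDocument(). A minimal sketch that filters them out follows; PartGuardSketch and ExistingContentParts are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class PartGuardSketch
{
    // Hypothetical helper: returns only the content parts that actually exist.
    public static List<OpenXmlPart> ExistingContentParts(WordprocessingDocument wordDoc)
    {
        var main = wordDoc.MainDocumentPart;
        var parts = new List<OpenXmlPart> { main };

        // HeaderParts and FooterParts are sequences, so they are simply empty when absent.
        parts.AddRange(main.HeaderParts);
        parts.AddRange(main.FooterParts);

        // These two properties return null when the document has no such part.
        parts.Add(main.EndnotesPart);
        parts.Add(main.FootnotesPart);

        // Drop the null entries before anything calls GetXDocument() on them.
        return parts.Where(p => p != null).ToList();
    }
}
```

Separately, edits made to the XDocument returned by part.GetXDocument() are only written back to the part once part.PutXDocument() is called.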

    I came across this page https://msdn.microsoft.com/en-us/library/ee836138(v=office.12).aspx#AcceptRevisions_RemovingElements which suggests I would need to collapse some elements rather than simply remove them. Do you have any advice on how to replace the text easily, or do I need to code everything mentioned in that article? I don’t need Revision Tracking in the output document (though it can stay if that is easier), but I do need all the normal document elements.
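
To illustrate why collapsing differs from removing, here is a minimal sketch using only LINQ to XML on a hand-built paragraph. It handles just w:ins and w:del; real documents contain more kinds of revision elements (this thread mentions W.cellDel, for example), so it is not a substitute for the approach in the linked article. RevisionSketch and its members are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RevisionSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A paragraph with plain text, one tracked insertion and one tracked deletion.
    public static XElement SampleParagraph() =>
        new XElement(W + "p",
            new XElement(W + "r", new XElement(W + "t", "Dear ")),
            new XElement(W + "ins",
                new XElement(W + "r", new XElement(W + "t", "{contact}"))),
            new XElement(W + "del",
                new XElement(W + "r", new XElement(W + "delText", "{name}"))));

    // Naive approach: delete every w:ins and w:del element outright.
    // The inserted runs inside w:ins disappear together with the wrapper.
    public static string RemoveAll(XElement paragraph)
    {
        var copy = new XElement(paragraph);
        foreach (var e in copy.Descendants()
                     .Where(e => e.Name == W + "ins" || e.Name == W + "del")
                     .ToList())
            e.Remove();
        return VisibleText(copy);
    }

    // Accept-style approach: drop w:del, but replace w:ins with its children.
    public static string Accept(XElement paragraph)
    {
        var copy = new XElement(paragraph);
        foreach (var del in copy.Descendants(W + "del").ToList())
            del.Remove();

        // Unwrap one w:ins at a time so nested wrappers are handled too.
        XElement ins;
        while ((ins = copy.Descendants(W + "ins").FirstOrDefault()) != null)
            ins.ReplaceWith(ins.Nodes());
        return VisibleText(copy);
    }

    // Concatenates the w:t text that would remain visible in the paragraph.
    static string VisibleText(XElement e) =>
        string.Concat(e.Descendants(W + "t").Select(t => t.Value));
}
```

With the sample paragraph, RemoveAll leaves only "Dear " because the inserted variable is deleted along with its wrapper, while Accept keeps "Dear {contact}".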

    I was also looking at the method public static bool PartHasTrackedRevisions(OpenXmlPart part) in https://github.com/VisualOn/OpenXmlPowerTools/blob/master/RevisionAccepter.cs. I don’t understand how some of those elements, which don’t sound related to Revision Tracking, indicate that tracking is in use. For instance, is the existence of a W.cellDel really enough to signal that revision tracking is being used, and does that element relate to it?

    • This topic was modified 7 years, 9 months ago by  AlanSMac. Reason: Code did not format correctly
    #3650

    Eric White
    Keymaster

    Hi,

    You should be using OpenXmlRegex, not TextReplacer. OpenXmlRegex can do everything that TextReplacer can do, and a lot more, including replacing text in a document that contains tracked revisions.

    http://www.ericwhite.com/blog/blog/openxmlregex-developer-center/

    Cheers, Eric

    #3656

    AlanSMac
    Participant

    Hi Eric,

    Thanks for your reply. I am in the middle of converting my code over to use OpenXmlRegex. I followed the link and also watched one of your YouTube videos about it, but I can’t get the replace to work even though I think I am calling it correctly. I can find matches with the regex but cannot replace the value. I am in the UK and at home now, so I will try again tomorrow when I’m in the office. I think it might be because all my existing code was based on WordprocessingDocument and now I have had to call doc.MainDocumentPart.GetXDocument() and manipulate that. Maybe it’s not persisting back and I have to save to the same stream?

        public void ReplaceFirst(WordprocessingDocument doc, params KeyValuePair<string, string>[] kvps)
        {
            var xdoc = doc.MainDocumentPart.GetXDocument();

            foreach (var kvp in kvps)
            {
                //OOXML library does not like null or empty
                string value = string.IsNullOrEmpty(kvp.Value) ? " " : kvp.Value;

                logger.LogDebug("Applying value: [" + kvp.Key + "] " + value);
                //var content = doc.MainDocumentPart.Document.Body.Descendants<Text>();
                var content = xdoc.Descendants(W.p);

                logger.LogDebug("Found " + content.Count() + " text elements to search");
                //var regex = new Regex(VariablePrefix + kvp.Key + VariableSuffix);
                var regex = new Regex("contact");

                logger.LogDebug(OpenXmlRegex.Match(content, regex) + " matches");

                bool isFirstReplacement = true;
                OpenXmlRegex.Replace(content, regex, value, (xElement, match) =>
                {
                    if (isFirstReplacement)
                    {
                        isFirstReplacement = false;
                        logger.LogDebug("Replaced match");
                        return true;
                    }
                    logger.LogDebug("Did not replace match");
                    return false;
                });

                //TextReplacer.SearchAndReplace(doc, VariablePrefix + kvp.Key + VariableSuffix, value, false);
            }
        }

    Later on, the WordprocessingDocument is saved via wordDoc.MainDocumentPart.Document.Save().

    I am wondering whether I need to do something extra so that changes OpenXmlRegex makes to the XDocument show up in the WordprocessingDocument.

    Thanks

    #3668

    Eric White
    Keymaster

    Hi Alan,

    You may be seeing a problem associated with using the strongly-typed OM vs using LINQ to XML (which Open-Xml-PowerTools uses).

    The short answer – before and after using OpenXmlRegex, close and reopen the document.

    There is some strange caching in the strongly typed OM that doesn’t play nicely with using other XML technologies. I used to try to deal with this caching and avoid opening and closing the document. However, it is super cheap to open / close, and there are edge cases associated with caching that make it difficult, so my recommendation now is to close and reopen the document when you need to use OpenXmlRegex.
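
A minimal sketch of the close-and-reopen pattern described above, assuming the document lives in a seekable, writable stream (for example a MemoryStream). ReopenSketch and ReplaceFirst are hypothetical names, and the part.PutXDocument() call is an addition not mentioned in this thread: it writes the modified XDocument back into the part before the document is closed.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public static class ReopenSketch
{
    // Hypothetical helper. Call it only after any WordprocessingDocument that was
    // previously opened on this stream (for strongly typed OM work) has been disposed.
    public static int ReplaceFirst(Stream stream, string pattern, string value)
    {
        int result;

        // Open a fresh instance so no strongly typed OM tree is cached for the part.
        using (var doc = WordprocessingDocument.Open(stream, true))
        {
            var part = doc.MainDocumentPart;
            var xdoc = part.GetXDocument();
            var paragraphs = xdoc.Descendants(W.p).ToList();

            bool first = true;
            result = OpenXmlRegex.Replace(paragraphs, new Regex(pattern), value,
                (element, match) =>
                {
                    // Approve only the first match; decline all later ones.
                    if (!first) return false;
                    first = false;
                    return true;
                });

            // Write the LINQ to XML changes back into the part.
            part.PutXDocument();
        } // Disposing closes the document; reopen it for further strongly typed OM work.

        // Whatever OpenXmlRegex.Replace reported for this call.
        return result;
    }
}
```

Keeping the LINQ to XML edits inside their own open/close cycle means Document.Save() on a strongly typed tree is never called on the same instance that OpenXmlRegex modified.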

    Cheers, Eric

    #3706

    AlanSMac
    Participant

    Hi Eric,

    Thanks for your help. I got this working for the revision/change tracking code. Strangely, I had to leave some other code using TextReplacer instead of OpenXmlRegex because, for that one case only, I couldn’t get the two object trees (the strongly typed OM and LINQ to XML) to sync in memory. I have to keep the strongly typed OM because it seems easier for inserting new elements: I append breaks after newline characters, since otherwise the new lines don’t work, and I do a little paragraph object manipulation elsewhere to fix a superfluous blank page issue. Perhaps it would have been easier than I thought to do this via LINQ to XML, but I like the properties and methods on the OM classes.

    Thanks!
