SimplifyMarkup – Saved file corrupted

This topic contains 3 replies, has 2 voices, and was last updated by FRCMNS0 8 years, 6 months ago.

    #3981

    FRCMNS0
    Participant

    As an addendum: the other SimplifyMarkupSettings options (everything besides NormalizeXml) don’t cause errors. The problem occurs only when NormalizeXml is set.

    #3982

    Eric White
    Keymaster

    Hi,

    I think that there is something else causing this problem, not MarkupSimplifier.

    You are getting a failure parsing the XML in the /word/styles.xml part, not in the main document part, which is what NormalizeXml operates on. It looks as though your styles.xml may be empty. That could be caused by any of a variety of things, but probably not by MarkupSimplifier. That’s not to say that MarkupSimplifier doesn’t modify styles.xml (it might, I can’t recall), but it is not the first place I’d look for this bug. I’d look for what is writing to styles.xml, and see why the XML parser is failing on it.

    You can also manually examine the styles.xml file using the Open XML Package Editor Add-In for Visual Studio. That may provide a clue as to why the parser is failing to read the styles.xml part.
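
    If the add-in is not at hand, the same check can be scripted. The following is only a minimal sketch, not part of Open-Xml-PowerTools: it treats the .docx as a plain zip archive and tries to parse the styles part with the standard .NET XML parser. The class name StylesPartCheck and its Check method are hypothetical, and the entry name word/styles.xml is assumed to be the conventional location of the styles part (a package can place it elsewhere).

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Xml;
    using System.Xml.Linq;

    // Hypothetical helper, for illustration only.
    public static class StylesPartCheck
    {
        // Returns null when word/styles.xml parses cleanly,
        // otherwise a short description of what went wrong.
        public static string Check(Stream docxStream)
        {
            // A .docx file is a zip package; open it read-only and leave the caller's stream open.
            using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Assumption: the styles part lives at its conventional path.
                ZipArchiveEntry entry = zip.GetEntry("word/styles.xml");
                if (entry == null)
                    return "word/styles.xml is missing from the package";
                if (entry.Length == 0)
                    return "word/styles.xml is empty";

                try
                {
                    using (Stream part = entry.Open())
                    {
                        // Throws XmlException if the part is not well-formed XML.
                        XDocument.Load(part);
                    }
                    return null;
                }
                catch (XmlException ex)
                {
                    // The message includes the line and position of the failure.
                    return "word/styles.xml is not well-formed: " + ex.Message;
                }
            }
        }
    }
    ```

    To use it, open the saved file with File.OpenRead and pass the stream to StylesPartCheck.Check. A null result means the part is at least well-formed XML, in which case the failure is more likely about the content of the part than about its syntax.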

    Best, Eric

    #3983

    FRCMNS0
    Participant

    In this case, I am sure that MarkupSimplifier is modifying the styles.xml file. Here is the entire sample code used:

    using DocumentFormat.OpenXml.Packaging;
    using OpenXmlPowerTools;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    
    namespace DocxTest
    {
        class Program
        {
            static void Main(string[] args)
            {
                try
                {
                    // Overwrite any copy left by a previous run so the repro can be re-run.
                    File.Copy("PORTA.docx", "PORTA_copy.docx", true);
                    using (var docMaster = WordprocessingDocument.Open("PORTA_copy.docx", true))
                    {
                        SimplifyMarkupSettings settings = new SimplifyMarkupSettings
                        {
                            NormalizeXml = true, // Merges Run's in a paragraph with similar formatting
    
                            // Additional settings if required
                            RemoveBookmarks = true,
                            RemoveComments = true,
                            RemoveGoBackBookmark = true,
                            RemoveWebHidden = true,
                            RemoveContentControls = true,
                            RemoveEndAndFootNotes = true,
                            //RemoveFieldCodes = true,
                            RemoveLastRenderedPageBreak = true,
                            RemovePermissions = true,
                            RemoveProof = true,
                            RemoveRsidInfo = true,
                            RemoveSmartTags = true,
                            RemoveSoftHyphens = true,
                        };
    
                        MarkupSimplifier.SimplifyMarkup(docMaster, settings);
    
                        docMaster.Save();
                    }
    
                    Console.WriteLine("Done.");
                }
                catch(Exception ex)
                {
                    Console.WriteLine("Error: {0}", ex.ToString());
                }
                
                Console.ReadLine();
            }
        }
    }
    

    There’s nothing else being done to the document.
    Most of the differences are an extra space before the close of a tag, or reordered attributes.

    The major change is right at the beginning of the file, mostly additional namespace declarations.

    Here is a WinMerge report with the differences highlighted:
    https://drive.google.com/file/d/0B0ZNalzpb4uFRjdndWFidTduME0/view?usp=sharing
