SimplifyMarkup – Saved file corrupted
Tagged: Error, SimplifyMarkup
This topic contains 3 replies, has 2 voices, and was last updated by FRCMNS0 8 years, 6 months ago.
November 28, 2016 at 4:45 pm #3981
As an addendum: the extra SimplifyMarkupSettings options (everything besides NormalizeXml) don't cause errors. The problem occurs only when NormalizeXml is set.
November 28, 2016 at 5:37 pm #3982
Hi,
I think that there is something else causing this problem, not MarkupSimplifier.
You are getting a failure parsing the XML in the /word/styles.xml part, not in the main document part, which is what NormalizeXml operates on. It looks as though your styles.xml part may be empty. That could be caused by any of a variety of things, but probably not by MarkupSimplifier. That's not to say MarkupSimplifier doesn't modify styles.xml (it might, I can't recall), but it is not the first place I'd look for this bug. I'd look for whatever is writing to styles.xml and see why the XML parser is failing on it.
You can also manually examine the styles.xml file using the Open XML Package Editor Add-In for Visual Studio. That may provide a clue as to why the parser is failing on reading the styles.xml part.
Best, Eric
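As an alternative to the Package Editor add-in, the check Eric describes can be scripted. A .docx file is a ZIP-based package, so the styles part can be read straight out of the archive and handed to the XML parser to see whether it is empty and, if not, exactly where parsing fails. This is a minimal sketch using only System.IO.Compression and LINQ to XML (not the Open XML SDK). The class name StylesPartCheck is hypothetical, and the code assumes the part sits at the conventional path word/styles.xml; strictly, its location is whatever the main document part's relationship points to.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Linq;

// Hypothetical diagnostic helper, not part of Open-Xml-PowerTools or the Open XML SDK.
public static class StylesPartCheck
{
    // Assumption: the styles part is stored at its conventional path inside the ZIP.
    public const string StylesEntry = "word/styles.xml";

    // Reads the styles part from a .docx stream and reports missing / empty / parse error / ok.
    public static string Check(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry(StylesEntry);
            if (entry == null)
                return "missing: no " + StylesEntry + " entry in the package";

            // Length is the uncompressed size of the entry.
            if (entry.Length == 0)
                return "empty: " + StylesEntry + " has zero bytes";

            try
            {
                using (Stream s = entry.Open())
                {
                    XDocument doc = XDocument.Load(s);
                    return "ok: root element is " + doc.Root.Name.LocalName;
                }
            }
            catch (XmlException ex)
            {
                // XmlException carries the line and column where the parser gave up.
                return string.Format("parse error at line {0}, position {1}: {2}",
                    ex.LineNumber, ex.LinePosition, ex.Message);
            }
        }
    }

    // Convenience overload for a file on disk.
    public static string Check(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return Check(fs);
        }
    }
}
```

Running Console.WriteLine(StylesPartCheck.Check("PORTA.docx")); and the same call on PORTA_copy.docx would show whether the styles part in the saved copy is empty or malformed, and where.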
November 28, 2016 at 6:10 pm #3983
In this case, I am sure that MarkupSimplifier is modifying the styles.xml file. Here is the entire sample code used:
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace DocxTest
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                File.Copy("PORTA.docx", "PORTA_copy.docx");
                using (var docMaster = WordprocessingDocument.Open("PORTA_copy.docx", true))
                {
                    SimplifyMarkupSettings settings = new SimplifyMarkupSettings
                    {
                        NormalizeXml = true, // Merges runs in a paragraph with similar formatting
                        // Additional settings if required
                        RemoveBookmarks = true,
                        RemoveComments = true,
                        RemoveGoBackBookmark = true,
                        RemoveWebHidden = true,
                        RemoveContentControls = true,
                        RemoveEndAndFootNotes = true,
                        //RemoveFieldCodes = true,
                        RemoveLastRenderedPageBreak = true,
                        RemovePermissions = true,
                        RemoveProof = true,
                        RemoveRsidInfo = true,
                        RemoveSmartTags = true,
                        RemoveSoftHyphens = true,
                    };
                    MarkupSimplifier.SimplifyMarkup(docMaster, settings);
                    docMaster.Save();
                }
                Console.WriteLine("Done.");
            }
            catch (Exception ex)
            {
                Console.WriteLine("Error: {0}", ex.ToString());
            }
            Console.ReadLine();
        }
    }
}
There’s nothing else being done to the document.
Most of the differences are an extra space before a closing tag or reordered attributes. The major change is right at the beginning of the file: mostly additional namespace declarations.
Here is a WinMerge report with the differences highlighted:
https://drive.google.com/file/d/0B0ZNalzpb4uFRjdndWFidTduME0/view?usp=sharing
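Attribute order and whitespace before a tag's closing "/>" are not significant to an XML parser, so of the differences described, the added namespace declarations are the ones worth isolating. This is a minimal sketch, assuming both versions of styles.xml have already been extracted as strings (for example with a ZIP reader). The NamespaceDiff class is hypothetical and not part of Open-Xml-PowerTools; it only looks at declarations on the root element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: compares the xmlns / xmlns:prefix declarations on the root
// element of two versions of the same XML part (e.g. styles.xml before and after).
public static class NamespaceDiff
{
    // Returns "prefix=uri" entries declared on the root of afterXml but not on the
    // root of beforeXml, sorted ordinally. The default namespace is reported as "xmlns=uri".
    public static List<string> Added(string beforeXml, string afterXml)
    {
        HashSet<string> before = Declarations(XDocument.Parse(beforeXml).Root);
        return Declarations(XDocument.Parse(afterXml).Root)
            .Where(d => !before.Contains(d))
            .OrderBy(d => d, StringComparer.Ordinal)
            .ToList();
    }

    static HashSet<string> Declarations(XElement root)
    {
        // A default declaration (xmlns="...") has a name in no namespace;
        // a prefixed one (xmlns:w="...") has the prefix as its local name.
        return new HashSet<string>(
            root.Attributes()
                .Where(a => a.IsNamespaceDeclaration)
                .Select(a => (a.Name.Namespace == XNamespace.None ? "xmlns" : a.Name.LocalName)
                             + "=" + a.Value));
    }
}
```

Because the comparison is on parsed declarations rather than raw text, reordered attributes and extra spaces drop out, leaving only the prefixes that were actually introduced or changed.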