{"id":2695,"date":"2016-03-18T12:00:11","date_gmt":"2016-03-18T12:00:11","guid":{"rendered":"http:\/\/www.ericwhite.com\/home2\/bm8qcmjy\/public_html\/blog\/?page_id=2695"},"modified":"2016-03-18T12:01:36","modified_gmt":"2016-03-18T12:01:36","slug":"search-and-replace-text-in-an-open-xml-wordprocessingml-document","status":"publish","type":"page","link":"https:\/\/www.ericwhite.com\/blog\/search-and-replace-text-in-an-open-xml-wordprocessingml-document\/","title":{"rendered":"Search and Replace Text in an Open XML WordprocessingML Document"},"content":{"rendered":"<div class=\"post-content user-defined-markup\">\n<p>A recurring question around Open XML is how to search and replace text in a word-processing document. There have been several attempts at presenting example code to do this; however, until now I have not seen any examples that implement it correctly. This post presents some example code that implements a correct algorithm to search and replace text.<\/p>\n<p>The first challenge is to handle the case when the text you are searching for spans runs with different formatting. A simple example will demonstrate the problem. You want to replace \u2018Hello World\u2019 with \u2018Hi World\u2019. 
If, in the document, the word \u2018World\u2019 is bolded, then the markup will look something like this:<\/p>\n<pre class=\"prettyprint\"><span class=\"tag\">&lt;w:p&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span> <span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span><span class=\"pln\">Hello <\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:b<\/span> <span class=\"tag\">\/&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;\/w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">World<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n<\/span><span class=\"tag\">&lt;\/w:p&gt;<\/span><\/pre>\n<p>Even though the search text spans runs, the algorithm should find the text and replace it. The next challenge is to define exactly the semantics of searching and replacing text if the text that you are searching for spans runs with different formatting. In short, the replaced text takes on the run formatting of the run that contains the first character of the search string. An example makes this clear. 
In the following sentence, the first four characters of the word \u2018include\u2019 are bolded:<\/p>\n<p>On the Insert tab, the galleries <b>incl<\/b>ude items.<\/p>\n<p>If you replace \u2018include\u2019 with \u2018do not include\u2019, then the sentence should be formatted like this:<\/p>\n<p>On the Insert tab, the galleries <b>do not include<\/b> items.<\/p>\n<p>The replaced text takes on the formatting of the \u2018i\u2019 character of include, which was bolded.<\/p>\n<p>Here is a short screen-cast that walks through the algorithm and the code.<\/p>\n<p><iframe loading=\"lazy\" title=\"SearchAndReplace.mp4\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/w128hJUu3GM?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p><span style=\"font-size: 24px; font-weight: bold;\">Search and Replace Algorithm<\/span><\/p>\n<p>It certainly would be possible to carefully define an algorithm to search for text that spans runs, noting where the searched text intersects bookmarks, comments, and the like. However, this algorithm would be pretty complicated, and to be done properly, a test team would need to write extensive test specs, and supply a plethora of sample documents that exercise all edge cases. It is a non-trivial project.<\/p>\n<p>However, there is another approach that we can take that is pretty simple, easy to test, and yields the correct results in all cases. The algorithm consists of:<\/p>\n<ul>\n<li>Concatenate all text in a paragraph into a single string, and search for the search string in the concatenated text. If the search text is found, then continue with the following steps.<\/li>\n<li>Iterate through all runs in the paragraph, and break all runs into runs of a single character. 
There are a variety of special characters, such as carriage return, hard tab, break, and the non-breaking hyphen character. Normally, these special characters will coexist in runs with text elements. When breaking runs into runs of a single character, these special characters should also be placed into their own run. At the end of this process, no run will contain more than a single character, whether it is a character of text, or one of the special characters that are represented by an XML element.<\/li>\n<li>After breaking runs of text into multiple runs of single characters, it is then pretty easy to iterate through the runs looking for a string of runs that match the characters in the search string.<\/li>\n<li>If the algorithm finds a string of runs that match the search string, then it inserts a new run into the document. This new run contains the run properties of the first run in the string of runs that match the search string. In addition, the algorithm deletes the set of single-character runs that matched the search string. This process is repeated until no strings of runs are found that match the search string.<\/li>\n<li>After the algorithm replaces the single-character runs with a new run containing the replacement text, it coalesces adjacent runs with the same formatting into a single run.<\/li>\n<\/ul>\n<h1>Algorithm Walk-through<\/h1>\n<p>It will be helpful to walk through an example, and examine the markup at each step in the process. The following paragraph contains the text, \u201cSee <b>th<\/b>is markup.\u201d The letters \u2018th\u2019 in the word \u2018this\u2019 are bolded. 
We want to change the word \u2018this\u2019 to the word \u2018the\u2019.<\/p>\n<pre class=\"prettyprint\"><span class=\"tag\">&lt;w:p&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span> <span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span><span class=\"pln\">See <\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:b<\/span> <span class=\"tag\">\/&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;\/w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">th<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">is markup.<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n<\/span><span class=\"tag\">&lt;\/w:p&gt;<\/span><\/pre>\n<p>After splitting all runs into multiple runs of a single character each, the markup looks like this:<\/p>\n<pre class=\"prettyprint\"><span class=\"tag\">&lt;w:p<\/span> <span class=\"atn\">xmlns:w<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"http:\/\/schemas.openxmlformats.org\/wordprocessingml\/2006\/main\"<\/span><span class=\"tag\">&gt;<\/span><span 
class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">S<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">e<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">e<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span> <span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:b<\/span> <span class=\"tag\">\/&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;\/w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">t<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span 
class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:b<\/span> <span class=\"tag\">\/&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;\/w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">h<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">i<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">s<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span> <span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">m<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span 
class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">a<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">r<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">k<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">u<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">p<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">.<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n<\/span><span 
class=\"tag\">&lt;\/w:p&gt;<\/span><\/pre>\n<p>The algorithm can then iterate through the runs, finding the series of runs where the text of the runs matches \u2018t\u2019, \u2018h\u2019, \u2018i\u2019, \u2018s\u2019. The algorithm then inserts a new run containing the replacement text, taking the run properties from the run that contained the \u2018t\u2019 in the search string, which indicates that the run is bolded. It also removes the single-character runs that match the search string. The adjusted markup looks like this:<\/p>\n<pre class=\"prettyprint\"><span class=\"tag\">&lt;w:p<\/span> <span class=\"atn\">xmlns:w<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"http:\/\/schemas.openxmlformats.org\/wordprocessingml\/2006\/main\"<\/span><span class=\"tag\">&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">S<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">e<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">e<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"atn\">xml:space<\/span><span 
class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span> <span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:b<\/span> <span class=\"tag\">\/&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;\/w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">the<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span> <span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">m<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">a<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span 
class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">r<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">k<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">u<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">p<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">.<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n<\/span><span class=\"tag\">&lt;\/w:p&gt;<\/span><\/pre>\n<p>Finally, the algorithm iterates through the runs, coalescing adjacent runs with identical formatting.<\/p>\n<pre class=\"prettyprint\"><span class=\"tag\">&lt;w:p<\/span> <span class=\"atn\">xmlns:w<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"http:\/\/schemas.openxmlformats.org\/wordprocessingml\/2006\/main\"<\/span><span class=\"tag\">&gt;<\/span><span class=\"pln\">\r\n\u00a0 
<\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span><span class=\"pln\">See <\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:b<\/span> <span class=\"tag\">\/&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;\/w:rPr&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t&gt;<\/span><span class=\"pln\">the<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;w:r&gt;<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 <\/span><span class=\"tag\">&lt;w:t<\/span><span class=\"pln\">\r\n\u00a0 \u00a0 \u00a0 <\/span><span class=\"atn\">xml:space<\/span><span class=\"pun\">=<\/span><span class=\"atv\">\"preserve\"<\/span><span class=\"tag\">&gt;<\/span><span class=\"pln\"> markup.<\/span><span class=\"tag\">&lt;\/w:t&gt;<\/span><span class=\"pln\">\r\n\u00a0 <\/span><span class=\"tag\">&lt;\/w:r&gt;<\/span><span class=\"pln\">\r\n<\/span><span class=\"tag\">&lt;\/w:p&gt;<\/span><\/pre>\n<h1>Additional Notes<\/h1>\n<p>There are a few additional notes worth mentioning about this algorithm.<\/p>\n<ul>\n<li>This algorithm only works for paragraphs that do not contain tracked revisions. 
While it is certainly possible to write this functionality for content that contains tracked revisions, doing so significantly complicates the algorithm. The code as written checks for the existence of tracked revisions (using the code presented in Using XML DOM to Detect Tracked Revisions in Open XML WordprocessingML Documents), and throws an exception if they exist.<\/li>\n<li>If revision tracking is turned on for a document, the correct behavior would be to create the revision tracking markup, which is beyond the scope of this example. If revision tracking is turned on, the example code throws an exception.<\/li>\n<li>While my favorite way to write these types of algorithms is to use LINQ to XML, to make this code more widely applicable, I used System.Xml.XmlDocument, which is an implementation of XML DOM. This makes it easier to translate this code to a variety of other platforms, such as PHP or Java.<\/li>\n<li>The code searches and replaces text in the main document part, all headers, all footers, the endnote part, and the footnote part.<\/li>\n<\/ul>\n<\/div>\n<p>Download &#8211; <a href=\"https:\/\/www.ericwhite.com\/blog\/wp-content\/uploads\/2016\/03\/SearchAndReplace.zip\">Example Code<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A recurring question around Open XML is how to search and replace text in a word-processing document. There have been several attempts at presenting example code to do this; however, until now I have not seen any examples that implement it correctly. 
This post presents some example code that implements a correct algorithm to search [&hellip;]<\/p>\n","protected":false},"author":10567,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"_s2mail":"","footnotes":""},"class_list":["post-2695","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages\/2695","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/users\/10567"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/comments?post=2695"}],"version-history":[{"count":4,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages\/2695\/revisions"}],"predecessor-version":[{"id":2703,"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/pages\/2695\/revisions\/2703"}],"wp:attachment":[{"href":"https:\/\/www.ericwhite.com\/blog\/wp-json\/wp\/v2\/media?parent=2695"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}