Search and Replace Content in DOCX, PPTX using Regular Expressions

OpenXmlRegex is a class in PowerTools for Open XML that enables you to search and optionally replace content in DOCX and PPTX using regular expressions.  The following screen-cast demonstrates the OpenXmlRegex class, and explains some of the more interesting semantics of it.

To use the OpenXmlRegex class, you first query for a collection of paragraphs, and then pass that collection into the various methods.  This enables you to select a subset of a document and perform search or replace operations just in that subset.  Further, it enables you to perform search and replace operations on parts other than the main document part; you can query for paragraphs in a header or footer and perform search or replace operations in those parts.
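Selecting the paragraphs is ordinary LINQ to XML and does not involve OpenXmlRegex. The sketch below uses only the .NET base class library. The class and method names (`ParagraphSelection`, `ByStyle`, `BodyOnly`) are hypothetical, and the sketch assumes you already have the XDocument for the part you want to search, such as the main document part or a header or footer part. Each method returns an `IEnumerable<XElement>` of the kind the OpenXmlRegex methods accept.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helpers that select a subset of WordprocessingML paragraphs.
// Assumption: xDoc is the XDocument of a main document, header, or footer part.
public static class ParagraphSelection
{
    static readonly XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Paragraphs whose style ID (w:pPr/w:pStyle/@w:val) equals styleId.
    public static IEnumerable<XElement> ByStyle(XDocument xDoc, string styleId) =>
        xDoc.Descendants(w + "p")
            .Where(p => (string)p.Elements(w + "pPr")
                                 .Elements(w + "pStyle")
                                 .Attributes(w + "val")
                                 .FirstOrDefault() == styleId);

    // Only paragraphs that are direct children of w:body, so paragraphs
    // nested inside tables (w:tbl/w:tr/w:tc/w:p) are skipped.
    public static IEnumerable<XElement> BodyOnly(XDocument xDoc) =>
        xDoc.Descendants(w + "body").Elements(w + "p");
}
```

For example, `OpenXmlRegex.Match(ParagraphSelection.ByStyle(xDoc, "Heading1"), regex)` would limit a search to Heading1 paragraphs, assuming "Heading1" is the style ID that document uses.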

There are no separate methods for operating on PresentationML paragraphs.  You can pass in either a collection of WordprocessingML paragraphs or a collection of PresentationML paragraphs.  Behavior is identical for both XML vocabularies, with one exception: you cannot introduce revision tracking markup into PresentationML, because PresentationML does not support revision tracking.  The methods in OpenXmlRegex detect whether you have passed in WordprocessingML or PresentationML markup, and act accordingly.
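As a rough illustration of the vocabulary check described above, the hypothetical helper below classifies a paragraph element by its XML name. It relies on the file formats rather than on PowerTools: WordprocessingML paragraphs are `w:p` elements, while text on PresentationML slides is stored in DrawingML `a:p` elements. How OpenXmlRegex itself performs the check may differ.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of PowerTools: identifies which vocabulary a
// paragraph element belongs to, by namespace and local name.
public static class ParagraphVocabulary
{
    static readonly XNamespace wNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    // PresentationML slide text lives in DrawingML paragraphs (a:p).
    static readonly XNamespace aNs = "http://schemas.openxmlformats.org/drawingml/2006/main";

    public static string Classify(XElement paragraph)
    {
        if (paragraph.Name == wNs + "p") return "WordprocessingML";
        if (paragraph.Name == aNs + "p") return "PresentationML (DrawingML a:p)";
        throw new ArgumentException($"Not a paragraph: {paragraph.Name}");
    }

    // Per the text above, revision tracking only makes sense for WordprocessingML.
    public static bool SupportsTrackedRevisions(XElement paragraph) =>
        paragraph.Name == wNs + "p";
}
```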

To get OpenXmlRegex, go to the Downloads Tab at powertools.codeplex.com, and download version 2.7.04 or later.

There are four public methods in the OpenXmlRegex class, two overloads of Match and two overloads of Replace:

OpenXmlRegex.Match Method (IEnumerable<XElement>, Regex)

Counts the number of times that the regular expression matches text in the specified content.

Syntax

public static int Match(
    IEnumerable<XElement> content,
    Regex regex
)

Parameters

content

Type: IEnumerable<XElement>
The content to search.

regex

Type: System.Text.RegularExpressions.Regex
The regular expression to match.

Return Value

Type: int
The number of matches found in the content.

Remarks

If this method returns 0, then no matches were found.  If this method returns a value greater than zero, then matches were found.
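The counting behavior can be pictured with the simplified model below. It assumes, as the `^Video provides` pattern in the Replace example suggests, that the regular expression is applied to each paragraph's text as a single string, so a match may span several runs but not two paragraphs. This is an illustration under those assumptions, not the PowerTools implementation, and it ignores tabs, breaks, fields, and other non-text content. `MatchCountModel` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Simplified model (assumption, not PowerTools' code): a paragraph's text is the
// concatenation of its w:t elements; the regex runs over that string, and the
// per-paragraph match counts are summed.
public static class MatchCountModel
{
    static readonly XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Joins the text of all runs, so "Vid" + "eo" becomes "Video".
    public static string ParagraphText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(w + "t").Select(t => t.Value));

    // Total matches across all paragraphs; 0 means nothing matched.
    public static int CountMatches(IEnumerable<XElement> paragraphs, Regex regex) =>
        paragraphs.Sum(p => regex.Matches(ParagraphText(p)).Count);
}
```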

Example

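// xDoc is the XDocument of the main document part; W.p is the w:p XName from PowerTools.
var content = xDoc.Descendants(W.p).Take(1);
var regex = new Regex("Video");
int count = OpenXmlRegex.Match(content, regex);
Console.WriteLine("Example #1 Count: {0}", count);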

OpenXmlRegex.Match Method (IEnumerable<XElement>, Regex, Action<XElement, Match>)

Counts the number of times that the regular expression matches text in the specified content, calling the specified callback for each instance of matched text.

Syntax

public static int Match(
    IEnumerable<XElement> content,
    Regex regex,
    Action<XElement, Match> found
)

Parameters

content

Type: IEnumerable<XElement>
The content to search.

regex

Type: System.Text.RegularExpressions.Regex
The regular expression to match.

found

Type: Action<XElement, Match>
The callback to call with each match.

Return Value

Type: int
The number of matches found in the content.

Remarks

If this method returns 0, then no matches were found.  If this method returns a value greater than zero, then matches were found.

Typically, you write the found callback using a lambda expression.  In the lambda expression, you can write code to inspect each match.
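A found callback is an ordinary `Action<XElement, Match>`, so it can capture local state. The sketch below builds one that records each match's position and a named capture group. So that it runs without PowerTools, it also includes `InvokeFound`, a hypothetical stand-in for OpenXmlRegex.Match that calls the callback once per match in text the caller has already extracted. It assumes the XElement argument is the paragraph that contains the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FoundCallbackSketch
{
    // Builds a callback with the same shape as the found parameter. It records
    // the element, the match position, and the "verb" named group.
    public static Action<XElement, Match> RecordVerbs(
        List<(XElement Paragraph, int Index, string Verb)> sink) =>
        (element, match) => sink.Add((element, match.Index, match.Groups["verb"].Value));

    // Hypothetical stand-in for OpenXmlRegex.Match(content, regex, found):
    // runs the regex over paragraphText and invokes the callback per match.
    public static int InvokeFound(XElement paragraph, string paragraphText, Regex regex,
                                  Action<XElement, Match> found)
    {
        int count = 0;
        foreach (Match m in regex.Matches(paragraphText))
        {
            found(paragraph, m);
            count++;
        }
        return count;
    }
}
```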

Example

// Case-insensitive search; the callback prints each matched value.
var content = xDoc.Descendants(W.p).Take(1);
var regex = new Regex("video", RegexOptions.IgnoreCase);
int count = OpenXmlRegex.Match(content, regex, (element, match) =>
    Console.WriteLine("Example #3 Found value: >{0}<", match.Value));

OpenXmlRegex.Replace Method (IEnumerable<XElement>, Regex, string, Func<XElement, Match, bool>)

Replaces matched text in the specified content, calling the specified callback for each instance of matched text.  If the callback returns true for matched text, then the method replaces the matched text.  If the callback returns false, then the method does not replace the matched text.

Syntax

public static int Replace(
    IEnumerable<XElement> content,
    Regex regex,
    string replacement,
    Func<XElement, Match, bool> doReplacement
)

Parameters

content

Type: IEnumerable<XElement>
The content to search.

regex

Type: System.Text.RegularExpressions.Regex
The regular expression to match.

replacement

Type: string
The text that will replace the matched content.

doReplacement

Type: Func<XElement, Match, bool>
The callback to call with each match.  Return true to replace the matched text, or false to leave it unchanged.  If you pass null for this argument, then the method replaces all occurrences of matched text.

Return Value

Type: int
The number of replacements in the content.

Remarks

Typically, you write the doReplacement callback using a lambda expression.  In the lambda expression, you can write code to inspect each match and return true or false to decide whether it is replaced.
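The doReplacement contract (true replaces, false keeps, null replaces everything) can be modeled on a plain string with `Regex.Replace` and a `MatchEvaluator`. The sketch below is that model only, under hypothetical names. It drops the XElement argument because there is no markup, and it inserts the replacement text literally.

```csharp
using System;
using System.Text.RegularExpressions;

// String-level model of doReplacement (not the PowerTools implementation).
public static class DoReplacementModel
{
    public static (string Result, int Replaced) Replace(
        string text, Regex regex, string replacement, Func<Match, bool> doReplacement)
    {
        int replaced = 0;
        string result = regex.Replace(text, m =>
        {
            // Callback present and declined: keep the original matched text.
            if (doReplacement != null && !doReplacement(m))
                return m.Value;
            replaced++;
            return replacement; // inserted literally in this sketch
        });
        return (result, replaced);
    }
}
```

Passing `m => m.Index == 0`, for instance, replaces only a match at the very start of the text, which mirrors how a real doReplacement lambda can be selective.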

Example

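// Replace "Video provides" at the start of the second paragraph; null replaces every match.
var content = xDoc.Descendants(W.p).Skip(1).Take(1);
var regex = new Regex("^Video provides");
int count = OpenXmlRegex.Replace(content, regex, "Audio gives", null);
Console.WriteLine("Example #4 Replaced: {0}", count);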

OpenXmlRegex.Replace Method (IEnumerable<XElement>, Regex, string, Func<XElement, Match, bool>, bool, string)

Replaces matched text in the specified content, calling the specified callback for each instance of matched text.  If the callback returns true for matched text, then the method replaces the matched text.  If the callback returns false, then the method does not replace the matched text.

If you pass true for the trackRevisions argument, then this method introduces tracked revisions for all replacements.  In WordprocessingML, each tracked revision contains the name of the author who made the change.  For tracked revisions that are created by this method, the author of the tracked revisions is set to the value of the author argument.
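For orientation, the sketch below builds roughly the WordprocessingML markup that represents a tracked replacement: the old text in a `w:del` element (with `w:delText` in place of `w:t`) followed by the new text in a `w:ins` element, both stamped with `w:author` and `w:date`. The element names come from the WordprocessingML format; the markup OpenXmlRegex actually writes may differ in revision IDs, dates, and run properties. `TrackedReplacementMarkup` is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// Illustrative only: the general shape of a tracked replacement in WordprocessingML.
public static class TrackedReplacementMarkup
{
    static readonly XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the w:del and w:ins siblings that replace oldText with newText.
    // Assumption: dateUtc is a UTC time; it is written in ISO 8601 form.
    public static XElement[] Build(string oldText, string newText, string author,
                                   DateTime dateUtc, int firstId)
    {
        string stamp = dateUtc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);

        // Deleted text stays in a run but uses w:delText instead of w:t.
        var del = new XElement(w + "del",
            new XAttribute(w + "id", firstId),
            new XAttribute(w + "author", author),
            new XAttribute(w + "date", stamp),
            new XElement(w + "r",
                new XElement(w + "delText",
                    new XAttribute(XNamespace.Xml + "space", "preserve"), oldText)));

        // Inserted text is an ordinary run wrapped in w:ins.
        var ins = new XElement(w + "ins",
            new XAttribute(w + "id", firstId + 1),
            new XAttribute(w + "author", author),
            new XAttribute(w + "date", stamp),
            new XElement(w + "r",
                new XElement(w + "t",
                    new XAttribute(XNamespace.Xml + "space", "preserve"), newText)));

        return new[] { del, ins };
    }
}
```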

Syntax

public static int Replace(
    IEnumerable<XElement> content,
    Regex regex,
    string replacement,
    Func<XElement, Match, bool> doReplacement,
    bool trackRevisions,
    string author
)

Parameters

content

Type: IEnumerable<XElement>
The content to search.

regex

Type: System.Text.RegularExpressions.Regex
The regular expression to match.

replacement

Type: string
The text that will replace the matched content.

doReplacement

Type: Func<XElement, Match, bool>
The callback to call with each match.  Return true to replace the matched text, or false to leave it unchanged.  If you pass null for this argument, then the method replaces all occurrences of matched text.

trackRevisions

Type: bool
If true, then this method introduces tracked revisions when replacing content.  Revision tracking applies only to WordprocessingML; PresentationML does not support it.

author

Type: string
The author of the tracked revisions.

Return Value

Type: int
The number of replacements in the content.

Remarks

Typically, you write the doReplacement callback using a lambda expression.  In the lambda expression, you can write code to inspect each match and return true or false to decide whether it is replaced.

Example

// Same kind of replacement, recorded as tracked revisions authored by "John Doe".
var content = xDoc.Descendants(W.p).Skip(13).Take(1);
var regex = new Regex("Video provides ");
int count = OpenXmlRegex.Replace(content, regex, "Audio gives ", null, true, "John Doe");
Console.WriteLine("Example #16 Replaced: {0}", count);
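After a tracked replacement such as Example #16, one way to confirm that the author argument took effect is to count the `w:ins` and `w:del` elements per `w:author`. The read-only sketch below does that with LINQ to XML. `RevisionSummary` is a hypothetical name, and the sketch counts every `w:ins` and `w:del` element in the tree, including any revisions that existed before the call.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Read-only check, not part of OpenXmlRegex: tallies tracked insertions and
// deletions per author in a WordprocessingML tree.
public static class RevisionSummary
{
    static readonly XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static Dictionary<string, (int Insertions, int Deletions)> ByAuthor(XContainer root) =>
        root.Descendants()
            .Where(e => e.Name == w + "ins" || e.Name == w + "del")
            // Revisions without a w:author attribute are grouped as "(unknown)".
            .GroupBy(e => (string)e.Attribute(w + "author") ?? "(unknown)")
            .ToDictionary(
                g => g.Key,
                g => (Insertions: g.Count(e => e.Name == w + "ins"),
                      Deletions: g.Count(e => e.Name == w + "del")));
}
```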