Eric White » Topic: WordOpenXML and leading/trailing white space

WordOpenXML and leading/trailing white space – odd behavior

June 7, 2016 at 11:22 pm #3444

pchambre
Participant

Hi folks,

A co-worker and I are working on some code that handles data through WordOpenXML fragments from the Range’s WordOpenXML property. One odd piece of behavior we’ve found is that, if there is a single leading and/or trailing space, these spaces are trimmed from the XML; but if there are multiple spaces, they are preserved. The Range’s Text property shows the spaces regardless of how many or few there are. This is in Word 2013 (15.0.4823.1000).

For instance,

Selecting and exporting:
" Hello World "
(one leading and one trailing space)

results in this in the WordOpenXML:
<w:t>Hello World</w:t>

While selecting and exporting:
"  Hello World  "
(two leading and two trailing spaces)

results in:
<w:t xml:space="preserve">  Hello World  </w:t>

Is there a reason for this behavior, and is there any method to adjust/control it? We would like to preserve any leading and trailing white space, even if it is only a single space.

Thanks,
Paul

June 10, 2016 at 5:20 pm #3447

pchambre
Participant

One thing I’ve noticed while researching this is that the behavior seems to have changed a bit over time.

According to an article Brian Jones wrote about Word XML (WordprocessingML) in Word 2003 (https://blogs.msdn.microsoft.com/brian_jones/2005/07/18/intro-to-word-xml-part-2-simple-formatting/), white space at formatting-change boundaries used to be lost as well. When I try the same test in Word 2013, I do not lose the white space within the range; I only lose it when it is a single leading or trailing space for the whole range, or when there is a trailing CR (paragraph mark) at the end of the range.

If MS changed this behavior, does that mean that the original behavior was incorrect? What about the current behavior? It seems like a bug to me; is there some good reason for it?

June 11, 2016 at 12:40 am #3448

pchambre
Participant

After a bit more investigation, it looks like this behavior affects only three cases in Word 2013:

Single leading space before non-whitespace at the very beginning of the range.
Single trailing space after non-whitespace at the very end of the range.
And the even more special case: a range that consists of only a single space. In this case, the w:r and w:t elements are omitted entirely.

If a range begins and ends with non-text content, such as images, and the text in the middle of the range has a single leading and/or a single trailing space, those spaces are preserved in the WordprocessingML output.
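To check programmatically whether a range hits one of the cases above, one option is to compare the leading and trailing spaces of Range.Text with the concatenated text of the w:t elements in Range.WordOpenXML. The sketch below is a minimal illustration using LINQ to XML; the names EdgeSpaceCheck, RunText and LostEdgeSpaces are hypothetical, and it assumes every w:t in the returned package belongs to the selected text (content such as footnotes or comments pulled into the range could break that assumption).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EdgeSpaceCheck
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Concatenates the content of every w:t element in a WordOpenXML string.
    // PreserveWhitespace keeps whitespace-only nodes such as <w:t xml:space="preserve"> </w:t>.
    public static string RunText(string wordOpenXml) =>
        string.Concat(XDocument.Parse(wordOpenXml, LoadOptions.PreserveWhitespace)
                               .Descendants(W + "t")
                               .Select(t => t.Value));

    static int LeadingSpaces(string s) => s.Length - s.TrimStart(' ').Length;
    static int TrailingSpaces(string s) => s.Length - s.TrimEnd(' ').Length;

    // True if the markup has fewer leading or trailing spaces than the range's Text.
    // A trailing paragraph mark ("\r") in Range.Text has no w:t counterpart, so it is dropped first.
    public static bool LostEdgeSpaces(string rangeText, string wordOpenXml)
    {
        string expected = rangeText.TrimEnd('\r');
        string actual = RunText(wordOpenXml);
        return LeadingSpaces(actual) < LeadingSpaces(expected)
            || TrailingSpaces(actual) < TrailingSpaces(expected);
    }
}
```

In a VSTO add-in you would pass range.Text and range.WordOpenXML taken from the same Word.Range. A range consisting of a single space, where Word omits the w:r and w:t elements entirely, is also reported as a loss, because RunText then returns an empty string.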

June 13, 2016 at 2:20 pm #3459

Eric White
Keymaster

That blog post from Brian Jones is specifically about the need to include the xml:space="preserve" attribute on a w:t element whose text begins or ends with spaces.
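The rule described above can be sketched with LINQ to XML: when building a w:t element yourself, add xml:space="preserve" whenever its text begins or ends with whitespace. WordText and MakeText are hypothetical names for illustration, not part of any Office or Open XML SDK API.

```csharp
using System;
using System.Xml.Linq;

public static class WordText
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a w:t element and adds xml:space="preserve" when the text starts or
    // ends with whitespace, so a consumer does not strip those characters.
    public static XElement MakeText(string text)
    {
        var t = new XElement(W + "t", text);
        bool edgeSpace = text.Length > 0 &&
            (char.IsWhiteSpace(text[0]) || char.IsWhiteSpace(text[text.Length - 1]));
        if (edgeSpace)
            t.SetAttributeValue(XNamespace.Xml + "space", "preserve");
        return t;
    }
}
```

Note that this only governs markup you generate; as the rest of the thread shows, it does not change what Word itself emits from Range.WordOpenXML.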

This is, I believe, unrelated to the issue you are facing, which is variation in the markup that the JavaScript API returns depending on which combination of spaces before and after the words of interest you select.

Those JavaScript APIs are a ‘black box’ to a great extent. They return whatever markup they return for the user’s selection, and you have no control over it. It certainly does seem like fairly random behavior, but I believe there is little or nothing you can do to change it.

One idea – I’m not certain it will be fruitful, but it is worth a bit of research. It is possible that the source code for those JavaScript APIs is available on GitHub or elsewhere. It is certainly available as minified JavaScript, which you can retrieve because the URI to that JavaScript is in the source for your app. It would be interesting to see whether this behavior is implemented in the JavaScript API or at a deeper level. If it is implemented in the JavaScript API, you might be able to modify that code and alter the behavior. If it is implemented in the COM libraries that the JavaScript APIs call, there is most probably nothing you can do to change it.

June 13, 2016 at 2:25 pm #3460

pchambre
Participant

Hi Eric,

I’m confused about the JavaScript reference here. We are using VSTO, C#, and the COM Interop Word automation interfaces to access the WordOpenXML property of a Range. We’re not (directly) using any JavaScript. Are you saying that, internally, the Interop components implement JavaScript as a wrapper around the COM components?

Thanks,
Paul

June 13, 2016 at 2:39 pm #3461

Eric White
Keymaster

Hi Paul,

I’m sorry – I thought you were referring to the new JavaScript APIs in the new Apps for Office approach to extending Microsoft Office. My mistake. (For what it’s worth, I am a big fan of this new way to extend Office; it is vastly superior to extending with VSTO, and deployment is a breeze by comparison.)

So, sorry, forget what I said.

I think that what you are seeing is the behavior of the actual COM libraries. You are using the API in the COM libs that enable you to get the markup for the current selection, and that API returns what it returns, and you have no control over it. Altering the behavior of the COM libraries is not possible.

VSTO is a (mostly) thin managed wrapper over the COM libraries, so you are still at the mercy of the COM libraries when using VSTO.

June 13, 2016 at 2:52 pm #3463

pchambre
Participant

OK. Thanks. Sounds like it is a bug, then. I may open a PSS case to push on this.

Cheers,
Paul

June 13, 2016 at 10:43 pm #3474

pchambre
Participant

PSS case opened… I’ll update here with the results if/when…

July 12, 2016 at 9:55 pm #3563

BobH
Participant

I have found the option that affects the output for this issue.

Application.Options.SmartCutPaste = false;

Once this option is turned off, the selection no longer snaps to the whole word when there is a single leading or trailing space, and the space is preserved in the Open XML output:

<w:t xml:space="preserve"> Hello </w:t>

I believe this option changes the range object before certain actions are performed and may be the cause of other issues I have witnessed.
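Because SmartCutPaste is a Word option the user may rely on, and Bob suspects it affects other behavior, one cautious approach is to turn it off only while reading WordOpenXML and then restore the previous value. The helper below is a hypothetical sketch of that save/set/restore pattern; it is written against plain delegates so it runs without Word, and the commented call shows how it might be wired to Application.Options.SmartCutPaste and Range.WordOpenXML in a VSTO add-in. Whether toggling the option just before the read is enough for an already-defined range is an assumption based on Bob’s post, not something verified here.

```csharp
using System;

public static class OptionScope
{
    // Sets a boolean option to `value`, runs `read`, then restores the original
    // setting even if `read` throws.
    public static T WithOption<T>(Func<bool> get, Action<bool> set, bool value, Func<T> read)
    {
        bool original = get();
        set(value);
        try
        {
            return read();
        }
        finally
        {
            set(original);
        }
    }
}

// Possible VSTO usage (untested assumption; 'app' is the Word Application and
// 'range' a Microsoft.Office.Interop.Word.Range):
// string xml = OptionScope.WithOption(
//     () => app.Options.SmartCutPaste,
//     v => app.Options.SmartCutPaste = v,
//     false,
//     () => range.WordOpenXML);
```

The try/finally keeps the user’s setting intact even if the read fails, which limits the side effects Bob mentions to the duration of the call.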

Thanks for your help,
Bob