Archive for Uncategorized

Announcing OpenXmlSdkTs — The Open XML SDK for TypeScript

If you work with Office documents programmatically, you probably know the .NET Open-Xml-Sdk. It’s been the gold standard for reading, writing, and manipulating .docx, .xlsx, and .pptx files at the XML level. But if you’re working in TypeScript or JavaScript — building Office add-ins, server-side document processing in Node.js, or browser-based document tools — you’ve been on your own.

OpenXmlSdkTs brings the same programming model to TypeScript.

Why Build This?

For anyone building Word, Excel, or PowerPoint JavaScript/TypeScript add-ins, direct Open XML manipulation is often the most reliable approach. But the JavaScript ecosystem has lacked a library that provides the kind of structured, typed access to Open XML packages that .NET developers are accustomed to.

I built OpenXmlSdkTs to solve this problem. It provides the same class hierarchy, the same part navigation model, and the same LINQ to XML querying that makes the .NET SDK so productive — all in TypeScript, running in Node.js or the browser.

Note that when programming with the Open-Xml-Sdk in dotnet, I invariably use LINQ to XML, not the various classes (Paragraph, Body, etc.) that represent the markup in the Open-Xml-Sdk. I don’t care for those classes, and instead prefer to use LINQ to XML for working with the markup. Maybe one day I will detail everything that I find wrong with those classes, but that is not the purpose of this post. In any case, this TypeScript library includes only the functionality for working with markup using LINQ to XML (LtXmlTs in this case).

What You Get

Full document format support. Work with Word (.docx), Excel (.xlsx), and PowerPoint (.pptx) files.

OpenXmlPackage
├── WmlPackage        Word documents
├── SmlPackage        Excel spreadsheets
└── PmlPackage        PowerPoint presentations

OpenXmlPart
├── WmlPart           Word parts (document, styles, headers, footers, etc.)
├── SmlPart           Excel parts (workbook, worksheets, charts, etc.)
└── PmlPart           PowerPoint parts (presentation, slides, masters, etc.)

Three I/O modes. Open and save documents as binary blobs (the standard ZIP-based format), Flat OPC XML strings (required for Office JavaScript/TypeScript add-ins), or Base64 strings. The open() method auto-detects the format.
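As a rough illustration of how auto-detection across these three forms could work, here is a minimal sketch. It is not the library's implementation: detectFormat and PackageFormat are hypothetical names, and the heuristics are assumptions about what a detector might check (a ZIP archive starts with the bytes "PK", and Flat OPC is XML text, so it starts with "<").

```typescript
// Hypothetical helper, not part of OpenXmlSdkTs: guess which of the three
// forms an input is in.
type PackageFormat = "binary" | "flatOpc" | "base64";

function detectFormat(input: Uint8Array | string): PackageFormat {
  if (typeof input !== "string") {
    // ZIP archives begin with the signature bytes "PK" (0x50 0x4B).
    if (input.length >= 2 && input[0] === 0x50 && input[1] === 0x4b) {
      return "binary";
    }
    throw new Error("Byte input does not look like a ZIP-based package");
  }
  // Skip a byte-order mark and leading whitespace before inspecting the text.
  const text = input.replace(/^\uFEFF/, "").replace(/^\s+/, "");
  // Flat OPC is an XML document, so its first character is "<".
  if (text.charAt(0) === "<") {
    return "flatOpc";
  }
  // Text made up only of Base64 characters is treated as Base64.
  if (/^[A-Za-z0-9+/\s]+={0,2}\s*$/.test(text)) {
    return "base64";
  }
  throw new Error("Unrecognized package format");
}

console.log(detectFormat(new Uint8Array([0x50, 0x4b, 0x03, 0x04]))); // binary
console.log(detectFormat('<?xml version="1.0"?><pkg:package/>'));    // flatOpc
console.log(detectFormat("UEsDBBQABgAIAAAAIQ=="));                   // base64
```

A real detector would likely validate more than the first bytes, but the idea is that the caller does not need to say which form it is passing in.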

Pre-initialized namespace and element names. Static classes like W, S, P, and A provide pre-initialized XName and XNamespace objects for every element and attribute in the Open XML specification. Because these objects are atomized — two objects with the same namespace and local name are the same object — equality checks use identity comparison (===), giving excellent query performance. No more copy-pasting long namespace URIs.
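To make the idea of atomization concrete, here is a small self-contained sketch of name interning. AtomizedName is a hypothetical class written for this illustration; it is not the XName class from ltxmlts, which may be implemented differently.

```typescript
// Illustrative only: a tiny interned-name class, not the ltxmlts XName.
class AtomizedName {
  // One shared cache maps "{namespace}localName" to its single instance.
  private static readonly cache = new Map<string, AtomizedName>();

  private constructor(
    readonly namespaceName: string,
    readonly localName: string
  ) {}

  // Return the cached instance for this name, creating it on first use.
  static get(namespaceName: string, localName: string): AtomizedName {
    const key = `{${namespaceName}}${localName}`;
    let name = AtomizedName.cache.get(key);
    if (name === undefined) {
      name = new AtomizedName(namespaceName, localName);
      AtomizedName.cache.set(key, name);
    }
    return name;
  }
}

const wNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
const p1 = AtomizedName.get(wNs, "p");
const p2 = AtomizedName.get(wNs, "p");
const r = AtomizedName.get(wNs, "r");

console.log(p1 === p2); // true: same namespace and local name, same object
console.log(p1 === r);  // false: different local name
```

Because the constructor is private, the only way to obtain a name is through the cache, so comparing two names never requires comparing strings.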

Built on LINQ to XML for TypeScript. OpenXmlSdkTs is powered by ltxmlts, my TypeScript port of .NET’s LINQ to XML (which I wrote about in a previous post). You get the full API — elements(), descendants(), attributes(), functional construction — all the things that make XML manipulation so much more pleasant than working with DOMDocument.

What the Code Looks Like

Here’s a concrete example. This loads a Word document, navigates to the comments part to read comment data, adds a new bold paragraph, and saves the result:

import { WmlPackage, W, XElement } from "openxmlsdkts";
import * as fs from "fs";

const buffer = fs.readFileSync("MyDocument.docx");
const doc = await WmlPackage.open(new Blob([buffer]));

// Navigate to the main document part and get its XML
const mainPart = await doc.mainDocumentPart();
const xDoc = await mainPart!.getXDocument();

// Access comments using typed navigation
const commentsPart = await mainPart!.wordprocessingCommentsPart();
if (commentsPart) {
  const commentsXDoc = await commentsPart.getXDocument();
  for (const comment of commentsXDoc.root!.elements(W.comment)) {
    const author = comment.attribute(W.author)?.value ?? "(unknown)";
    const text = Array.from(comment.descendants(W.t))
      .map(t => t.value).join("");
    console.log(`Comment by ${author}: "${text}"`);
  }
}

// Add a bold paragraph using functional construction
const body = xDoc.root!.element(W.body);
const sectPr = body!.element(W.sectPr);
sectPr!.addBeforeSelf(
  new XElement(W.p,
    new XElement(W.r,
      new XElement(W.rPr, new XElement(W.b)),
      new XElement(W.t, "Added by OpenXmlSdkTs.")
    )
  )
);

// Save changes back
mainPart!.putXDocument(xDoc);
const savedBlob = await doc.saveToBlobAsync();
fs.writeFileSync("Modified.docx", Buffer.from(await savedBlob.arrayBuffer()));

If you’ve used the .NET Open XML SDK, this should feel immediately familiar. The pre-atomized names (W.p, W.r, W.t, W.b) are the same element names you already know, and the nesting hierarchy in the functional construction mirrors the XML being produced.

Office Add-ins: First-Class Support

One of the key motivations for this library is building Office JavaScript/TypeScript add-ins. For anyone targeting macOS or Office on the web, JavaScript add-ins are the only option. And if you need to manipulate the document at the Open XML level, you need to work with Flat OPC — a single-file XML representation of the entire package.
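For readers who have not seen Flat OPC, the sketch below builds a minimal one by hand: each part of the package becomes a pkg:part element carrying its name and content type, with XML parts wrapped in pkg:xmlData. The names toFlatOpc and FlatOpcPart are hypothetical and not OpenXmlSdkTs APIs. The sketch handles XML parts only (binary parts such as images are stored as Base64 in pkg:binaryData) and assumes part names and content types need no attribute escaping.

```typescript
// A minimal sketch of the Flat OPC envelope. toFlatOpc and FlatOpcPart are
// hypothetical names for illustration, not OpenXmlSdkTs APIs.
interface FlatOpcPart {
  name: string;        // part URI inside the package, e.g. "/word/document.xml"
  contentType: string; // the part's content type
  xml: string;         // the part's XML, without an XML declaration
}

const PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

function toFlatOpc(parts: FlatOpcPart[]): string {
  // Wrap each part's XML in a pkg:part / pkg:xmlData pair.
  const partXml = parts
    .map(p =>
      `<pkg:part pkg:name="${p.name}" pkg:contentType="${p.contentType}">` +
      `<pkg:xmlData>${p.xml}</pkg:xmlData></pkg:part>`)
    .join("");
  return `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
    `<pkg:package xmlns:pkg="${PKG_NS}">${partXml}</pkg:package>`;
}

const W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
const flatOpc = toFlatOpc([
  {
    // Package-level relationships: points at the main document part.
    name: "/_rels/.rels",
    contentType: "application/vnd.openxmlformats-package.relationships+xml",
    xml:
      `<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">` +
      `<Relationship Id="rId1" Target="word/document.xml" ` +
      `Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>` +
      `</Relationships>`,
  },
  {
    // The main document part with a single paragraph.
    name: "/word/document.xml",
    contentType:
      "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
    xml:
      `<w:document xmlns:w="${W_NS}"><w:body><w:p><w:r>` +
      `<w:t>Hello from Flat OPC</w:t></w:r></w:p></w:body></w:document>`,
  },
]);

console.log(flatOpc);
```

The point is that the whole package travels as one XML string, which is why it suits an add-in that exchanges document content as text.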

Lightweight and Focused

The library has three runtime dependencies: jszip for ZIP compression, ltxmlts for LINQ to XML, and sax. It works in Node.js 18+ and modern browsers.

How Claude Helped

Like ltxmlts, I built OpenXmlSdkTs using Claude Code as my primary development tool. The approach was the same: targeted, methodical prompts (one class, one area of functionality at a time), with unit tests written and reviewed at each step. Claude handled the heavy lifting of translating patterns I knew from the .NET side into idiomatic TypeScript. But every design decision, every API surface choice, and every test case came from my experience working with Open XML. This was intense, focused engineering using Claude as a powerful tool, not a magic wand.

MIT Licensed

OpenXmlSdkTs is released under the MIT License — the same license used by the C#/.NET Open-Xml-Sdk. Free for commercial and open-source use.

Get Started

Install from npm:

npm install openxmlsdkts

Both openxmlsdkts and its companion library ltxmlts are available on npmjs.

Full documentation is available in the GitHub repository, including an overview, per-class API reference docs, and runnable examples covering binary, Flat OPC, and Base64 round-tripping.

If you’re building Office add-ins, processing documents on the server, or doing anything with Open XML in TypeScript — give it a try.


LINQ to XML for TypeScript: Why I Built It, and How Claude Helped

XML transformation in JavaScript has meant wrestling with DOMDocument. There is a better way — and it now exists for TypeScript.

I am pleased to announce LtXmlTs — a TypeScript implementation of LINQ to XML, available now on GitHub at github.com/EricWhiteDev/LtXmlTs under the MIT license.

Here is why I built it, and why it took the form it did.

The case for JavaScript Word add-ins

For anyone targeting macOS or Word Online, JavaScript add-ins are the only option. And unlike VSTO add-ins — which require cumbersome deployment cycles — a JavaScript add-in can be updated on the fly. For certain use cases, there is no other path.

Very often, the only way to implement real functionality in such an add-in is through direct manipulation of Open XML markup. Some developers sidestep this by farming the work out to a .NET or Python service. I think there are cleaner approaches — but they require doing XML transformation in TypeScript. And that means confronting the available tools.

The problem with DOMDocument

DOMDocument works. But revisiting it months later involves matching variable names scattered across the page and deducing intent from imperative steps. LINQ to XML is better because the structure of your code directly mirrors the XML you are producing — the nesting is the documentation. Recursive pure functional transformations are also the only clean way to implement certain classes of Open XML functionality, short of XSLT (which I dislike) or sprawling DOMDocument code.

Criterion | DOMDocument | LINQ to XML
Code readability | Variable names must be traced up and down the page | Nesting hierarchy mirrors the XML being produced
Functional construction | Imperative; mutable nodes built step by step | Functional construction is declarative and easy to reason about
Recursive transforms | Difficult to express cleanly; code sprawls | Pure functional transformations are a first-class pattern
Long-term maintenance | Intent is buried; requires active reconstruction | Original intent is clear and on the surface
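To show what a recursive pure functional transformation looks like in practice, here is a self-contained sketch. It uses a tiny immutable element model (El and el are hypothetical, written for this example) rather than the LtXmlTs classes, so it illustrates the pattern and not the library's API. The transform has one rule, drop every w:b element, and copies everything else.

```typescript
// A minimal immutable element model for illustration; this is not LtXmlTs.
interface El {
  readonly name: string;
  readonly attrs: Readonly<Record<string, string>>;
  readonly children: ReadonlyArray<El | string>;
}

// Small constructor helper so trees can be written as nested calls.
const el = (
  name: string,
  attrs: Record<string, string> = {},
  ...children: Array<El | string>
): El => ({ name, attrs, children });

// Recursive pure transform: returns a new tree and never mutates its input.
// Rule: drop every w:b (bold) element; copy everything else unchanged.
function removeBold(node: El | string): Array<El | string> {
  if (typeof node === "string") return [node];
  if (node.name === "w:b") return [];
  const newChildren = ([] as Array<El | string>).concat(
    ...node.children.map(removeBold)
  );
  return [el(node.name, { ...node.attrs }, ...newChildren)];
}

const paragraph = el("w:p", {},
  el("w:r", {},
    el("w:rPr", {}, el("w:b")),
    el("w:t", {}, "Hello, Open XML")));

const [plain] = removeBold(paragraph);
console.log(JSON.stringify(plain, null, 2));
```

Each rule is one line at the top of the function, and the final line is the identity case that recurses into children. Adding another rule means adding another line, not restructuring the traversal.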

A look at the difference

Consider a paragraph with formatted text — something that appears constantly in Open XML documents:

const paragraph = new XElement(W.p,
  new XElement(W.pPr,
    new XElement(W.pStyle,
      new XAttribute(W.val, "Heading1")
    )
  ),
  new XElement(W.r,
    new XElement(W.rPr,
      new XElement(W.b)
    ),
    new XElement(W.t,
      new XAttribute(XML.space, "preserve"),
      "Hello, Open XML"
    )
  )
);

The indentation is the documentation. No variable names to trace, no appendChild calls to sequence. The code is the structure.

Why I was the right person to build this

Years ago, as a Microsoft employee, I worked on the LINQ to XML team. I was required to know every detail of the library’s semantics — every edge case, every design decision. I later built a JavaScript implementation that shipped as part of the Open-Xml-Sdk-JavaScript and used it for years. But JavaScript is not TypeScript, and other TypeScript implementations I evaluated left me uncertain about their semantic fidelity. So I built my own.

How Claude Code made this possible

I used Claude Code throughout, but the process was not casual “vibing.” It was over one hundred targeted prompts — one class, one area of functionality at a time — with unit tests written and reviewed at each step. Intense, focused engineering using Claude as a powerful tool, not a magic wand.


My knowledge of the LINQ to XML semantics was essential for verification. What Claude provided was velocity: implementations fleshed out rapidly, comprehensive test suites generated, iteration without writing every character by hand. The result was faster than manual development — and the code quality was higher.

The next phase is generating the Open XML manipulation code that uses this library. The engineering approach will be the same: precise, AI-assisted, and thoroughly reviewed. It has been a lot of fun.

There is documentation in the docs directory of the repo.


Looking for Contract Work or Full-Time Employee Work

Hi Friends,

After more than six years at Docugami, it is time to move on to the next adventure. I’m looking for either contract work, or work as a full-time employee. While I have lots of experience in many technologies, I would love to continue working in my main area of expertise, specifically Microsoft Office and Open XML.

Please connect with me on LinkedIn/ericwhitedev.

From my profile on LinkedIn:

.NET developer, technical writer, blogger. Open XML, Open XML SDK, JavaScript/TypeScript, Office client development, SharePoint 2010 development. Experience with many technologies, including ASP.NET, HTML, CSS, JavaScript, XML, XSLT, XSD, and document transformations.

Specialties: Open XML, C#, .NET, JavaScript/TypeScript, Office development, SharePoint development, technical writing

Over the years, I’ve written books, magazine articles, and blog posts, and recorded more than 200 screen-casts.

This blog Table-of-Contents contains links to much of what I’ve written and recorded over the years: https://www.ericwhite.com/blog/toc-expanded
Videos: https://www.youtube.com/user/OpenXML/videos

http://blogs.msdn.com/ericwhite
http://ericwhite.com/blog

Looking forward to connecting.

Eric


Docugami Lights Up Dark Data Trapped in Unstructured Documents

Hi Friends,

It is great fun to join a startup, doing new and incredibly innovative things with documents, but one downside, of course, is that there have been limitations on what I can publicly disclose. But now the wrappers are starting to come off. The folks at SignalFire, one of the investors in Docugami, have written a super interesting blog post about what we’re doing.

My work at Docugami runs the gamut from some fairly standard DOCX transformations to building parts of a super-cool Word add-in (JavaScript / TypeScript), but the most fun aspects of my job are tailing (virtually) the scientists who have built and are enhancing world-class AI for document analysis. Frankly, I’m having more fun than I have ever had in my career.

And by the way, we still have an open software engineer position and an open ML/AI scientist position. Seize the day!

-Eric


Docugami: The Document Engineering Company

Hi Friends,

After many years being an independent contractor for a variety of companies and organizations, I have taken a job as a developer at a company named Docugami! I’ve been an employee for a couple of months now. 😊

The focus of this company is a “multi-faceted AI solution that combines innovative Deep Learning, NLP, Bayesian, Evolutionary, and other AI techniques.” It is super fascinating to see the amazing technology that this team has put together.

The company recently raised $10M in a seed round, an unusually large early stage deal. Also interesting: Grammarly is an investor.

The fine-tuning of our product is proceeding apace toward a public rollout later this year. If you want to get a glimpse of the future and see what AI can do when applied to business agreements and contracts, you can get early access from the Web site.

The absolute best thing about this job is the amazing team of people involved. The business, engineering, marketing, and public relations leaders are top-notch. The scientists who work on the AI and Machine Learning are amazing. Docugami is the complete package. I am humbled and honored to be part of this team.

You can meet some of my amazing colleagues on the Docugami LinkedIn page.

There are still open developer positions, an open Software Engineering Manager position, and an open Machine Learning / Data Scientist position. This is an amazing team, and a good one to work with. If you are top-notch (you obviously are, since you are here 😊), go to the careers page at docugami.com.

The tagline for this company is “The Document Engineering Company.”

This is perfect for me!

Cheers, Eric


Welcome, Open-Xml-PowerTools Users

Welcome, Open-Xml-PowerTools users. As you may have learned from the repo at OfficeDev/Open-Xml-PowerTools, Microsoft is going to archive that repo and will not maintain it in the future. The good folks at Microsoft asked if I would maintain this repo going forward, and I said “yes, absolutely”.

At this moment, the github.com/EricWhiteDev/Open-Xml-PowerTools repo is an exact clone of the github.com/OfficeDev/Open-Xml-PowerTools repo, but that is going to change in the near future.

I plan to:

  • Merge vNext into Master
  • Make significant enhancements to vNext

For those of you awaiting the enhancements to DocumentAssembler, I am nearly done with them, and expect to post an updated, refactored version shortly.

Unfortunately, this development came right as I am starting a 10-day vacation, so I will need to balance making the necessary enhancements with my other vacation activities! But stay tuned, I expect to make progress in the next couple of days.

Warm Regards,

Eric White


Building Open-Xml-PowerTools when pulling the Open-Xml-Sdk via NuGet

Recently, there was an update to the vNext branch of Open-Xml-PowerTools on GitHub (https://github.com/OfficeDev/Open-Xml-PowerTools) that enabled pulling the Open-Xml-Sdk via NuGet.

This reduces the steps necessary to build the PowerTools, and is a step in the right direction, which is to use the NuGet package manager to simplify use of libraries. The following screen-cast shows the precise steps necessary in order to build the Open-Xml-PowerTools as they currently exist on GitHub.


Introducing WmlComparer, a Module in Open-Xml-PowerTools

WmlComparer compares two Open XML documents, producing a new document that contains the precise differences in revision tracking markup. You can then iterate through those differences.

This module contains my latest thinking on how to process large documents character-by-character. The module converts the DOCX into a new form – an array – where every item in the array is a single piece of content, whether it be a character in a paragraph, the paragraph mark itself, an image, and so on. This thinking will go into the next version of OpenXmlRegex, so that it will be possible to include new lines in the search string, as well as replace matched strings with content that contains new lines.
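To make the "array of single pieces of content" idea concrete, here is a small sketch in TypeScript. The types and names (Atom, Inline, Para, toAtoms) are hypothetical and greatly simplified; they are not the data structures WmlComparer actually uses, which is written in C# and works against real Open XML markup.

```typescript
// Illustrative only: hypothetical types, not WmlComparer's data structures.
type Atom =
  | { kind: "char"; value: string }
  | { kind: "paragraphMark" }
  | { kind: "image"; id: string };

// A simplified paragraph: a sequence of text runs and image references.
type Inline = { text: string } | { imageId: string };
type Para = Inline[];

// Flatten paragraphs into one array with one item per piece of content.
function toAtoms(paragraphs: Para[]): Atom[] {
  const atoms: Atom[] = [];
  for (const para of paragraphs) {
    for (const inline of para) {
      if ("text" in inline) {
        // Every character becomes its own item.
        for (const ch of Array.from(inline.text)) {
          atoms.push({ kind: "char", value: ch });
        }
      } else {
        atoms.push({ kind: "image", id: inline.imageId });
      }
    }
    // The paragraph mark is itself an item in the array.
    atoms.push({ kind: "paragraphMark" });
  }
  return atoms;
}

const atoms = toAtoms([[{ text: "Hi" }, { imageId: "rId5" }], [{ text: "!" }]]);
console.log(atoms.map(a => a.kind).join(","));
// char,char,image,paragraphMark,char,paragraphMark
```

Once the paragraph mark is just another item in a flat array, comparing or searching across paragraph boundaries becomes ordinary sequence processing.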

This module is extremely useful when writing xUnit tests for code that processes Open XML markup: we can test that the code produced the exact document that we intended.


Open-Xml-PowerTools and JavaScript

One issue that many Open XML developers face when doing Open XML development in JavaScript is the need for many of the important modules available in Open-Xml-PowerTools.

To date, almost none of the Open-Xml-PowerTools have been converted to JavaScript (in a publicly releasable form).

However, I designed the Open-Xml-Sdk for JavaScript with the specific intention to enable manual conversion of Open-Xml-PowerTools code from C# to JavaScript. The Ltxmljs library is semantically compatible with the .NET LINQ to XML library. The Open-Xml-Sdk for JavaScript was designed with the idea that the common idioms that we use in the Open-Xml-Sdk can be converted to JavaScript with ease.

Proof-in-the-pudding is that I converted thousands of lines of code (of a preliminary version of the WmlToHtmlConverter) to JavaScript. This included the FormattingAssembler module, the ListItemRetriever module, the HtmlConverter module, and more. It worked very nicely. However, because this was based on a half-baked WmlToHtmlConverter, I decided that I wanted to first polish those modules as written in C#, and then at some future point in time convert the finished modules to JavaScript. I fully expect to convert these modules to JavaScript (but it is not going to happen in the near future, as my current projects prohibit an effort of this magnitude).

Key points:

  • It is doable.
  • It is non-trivial. You should be expert in C#, HTML, CSS, JavaScript, LINQ to XML, functional programming, and Open XML markup. I don’t want to minimize what you must know in order to tackle conversion of one of these modules to JavaScript.

But it is doable. I have a goal to do this, but it must fit in with other priorities in my life, like keeping my kid and wife with groceries. 🙂

Cheers, Eric


Welcome to the new EricWhite.com

Hi friends,

With the recent decision to shutter OpenXmlDeveloper.org, I will be focusing my efforts to support Open XML developers here at EricWhite.com.

Forums: I have set up forums on this web site, so that we have a convenient place to discuss Open XML specific questions. I have set up forum categories based on my experience with OpenXmlDeveloper.org. There are fewer categories, which should make it easier to pick the appropriate forum for your questions. Feel free to suggest new forum categories. Forum links are in the sidebar. You will need to have an account here to post in the forums. Please feel free to create one. At the top of each forum, you can subscribe, which will send email notifying you of activity.

Content: I have moved much of the content from OpenXmlDeveloper.org to this web site. Over the next month or two, I’ll be moving the rest of the interesting content here.  In the process of moving this content over, I’ll be reviewing and updating it, making sure that it recommends current best practices for Open XML development.  In addition to moving the content over, I have created ‘Developer Centers’ around specific topics such as Open-Xml-PowerTools, DocumentAssembler, DocumentBuilder, PresentationBuilder, WmlToHtmlConverter, and HtmlToWmlConverter.

Blogging: I will be regularly blogging about my Open XML activities here.

Twitter: Follow me (@EricWhiteDev) on Twitter to get the latest news about Open XML development and tools.

LinkedIn: Connect, especially if you are an Open XML developer.

I will continue to enhance and support Open-Xml-PowerTools – it is a ‘live project’, with interesting enhancements planned for the near future.  I consider that Open-Xml-PowerTools is perhaps 25% complete – there is an awful lot that we can do to make server-side Open XML document generation and processing easier.  My plans are firming up for the next round of enhancements – stay tuned – I’ll keep you posted.

We have done nowhere near enough with the Open-Xml-Sdk-for-JavaScript. There are great advantages to processing Open XML with JavaScript: it is cross-platform, it can run in the browser, and it can run in Node.js. And we should take advantage of TypeScript, I think.

I love document formats.  I love the capacity to create and process rich content that we have gained due to the standardization of the Open XML format.  We have made great progress over the years, and I am looking forward to continuing to advance the state-of-the-art of document generation and processing.

As an aside, one thing that I am going to enjoy a lot is moving off of the old version of Telligent Community Server that we were using for OpenXmlDeveloper.org – what a pain.  EricWhite.com is a WordPress site, updated to the latest version. It is a pleasure to move into a modern, blogging / content management platform.  The forums here are implemented using bbPress, which gives us a flexible, easy to use platform.

Cheers, Eric

