NLP++ Version 3: Compiled Analyzers, Faster Execution, and Protected Code

For years, NLP++ and VisualText have given developers something no other NLP toolkit offers: a fully transparent, human-readable, 100% modifiable way to build text analyzers. Every rule, every dictionary entry, every knowledge base concept is right there in plain text, editable in VS Code. That glassbox philosophy is what makes NLP++ analyzers maintainable in ways that regex tangles and black-box neural models never can be.

With version 3 of the NLP Engine and VisualText, we’re adding something that, at first glance, might sound like the opposite of glassbox — but in practice opens up a whole new way of building and shipping NLP++ systems. You can now compile your analyzers and knowledge bases, and do it with a single click.

A capability that was always there — but never easy

Compiling NLP++ analyzers isn’t entirely new. Back in the legacy Windows-only days, the ability to compile an analyzer existed. But it was a tedious, complicated affair — the kind of task that required real sleuthing, a specific environment set up just so, and a lot of manual work to get right. It was available, but it was never approachable.

Then NLP++ moved to all three platforms — Windows, Linux, and Mac. That cross-platform move was a huge step forward for the language, but it came with a casualty: compiling was no longer easily available to general users at all. The mechanics of producing a compiled analyzer across three operating systems were complicated enough that, in practice, it stayed out of reach for almost everyone.

Version 3 fixes that. Compiling is now genuinely accessible — even to the non-programmer.

How it works now: one click, compiled on GitHub

The key change is how easy it is, and where the work happens. In VS Code, compiling an analyzer is now a single click. From a script or from the Python packages, it’s a single function call. Either way, you don’t assemble a build environment, you don’t wrangle a C++ toolchain — you just trigger the compile.

Behind that single click, the NLP Engine automatically generates the C++ for your analyzer locally and sends it to GitHub to be compiled — the same place the NLP Engine itself is compiled. What comes back is a C++ library that the local NLP Engine reads in directly, and because it loads as a prebuilt library, it loads extremely quickly. That’s what makes the one-click experience possible: instead of asking each user to reproduce a C++ build on their own machine across three operating systems, version 3 hands the job to the same proven build process that produces the engine. The user doesn’t need to be a programmer and doesn’t need to know how any of it is built.

Once compilation is complete, you have three ways to run your analyzer:

  • Fully interpreted. Both the analyzer sequence and the knowledge base are read at runtime. This is where you’ll usually be while you’re still writing and debugging — change a rule, rerun, see the result immediately, no build step in the way.
  • Fully compiled. Both the analyzer sequence and the knowledge base run as compiled C++ — the fastest option, and the one you reach for when an analyzer is mature and headed toward production.
  • Compiled knowledge base, interpreted analyzer. A middle path that compiles just the knowledge base while leaving the analyzer sequence interpreted. You keep iterating freely on your rules, but the knowledge base loads from a compiled library instead of being read in from source each run.

That third option is especially valuable when the knowledge base is large. Take the full English dictionary that ships with NLP++: roughly 190,000 words. Reading a knowledge base that size from source on every run adds up fast during development. Compiling just the knowledge base lets it load near-instantly while the analyzer sequence stays interpreted and freely editable, so you get the iteration speed of interpreted development without paying the knowledge-base load cost over and over.

All three modes run the exact same NLP++ logic you wrote; the only difference is how it executes.
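To make the three modes concrete, here is a minimal Python sketch of how they differ and when each one fits. It is an illustration only: RunMode, ModeSpec, and suggest_mode are hypothetical names invented for this post, not part of the NLP Engine or the Python package.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ModeSpec:
    """Which parts of an analyzer run as compiled C++ (hypothetical model)."""
    analyzer_compiled: bool
    kb_compiled: bool


class RunMode(Enum):
    # Names are illustrative; the real engine may label these differently.
    INTERPRETED = ModeSpec(analyzer_compiled=False, kb_compiled=False)
    COMPILED_KB = ModeSpec(analyzer_compiled=False, kb_compiled=True)
    COMPILED = ModeSpec(analyzer_compiled=True, kb_compiled=True)


def suggest_mode(still_editing_rules: bool, kb_is_large: bool) -> RunMode:
    """Pick a mode following the guidance in the list above."""
    if not still_editing_rules:
        # Mature analyzer headed toward production: compile everything.
        return RunMode.COMPILED
    if kb_is_large:
        # Keep rules editable, but avoid re-reading a big knowledge base.
        return RunMode.COMPILED_KB
    # Small knowledge base, rules still changing: no build step at all.
    return RunMode.INTERPRETED


if __name__ == "__main__":
    mode = suggest_mode(still_editing_rules=True, kb_is_large=True)
    print(mode.name, mode.value)
```

The sketch encodes only what the list says: the rules stay interpreted for as long as you are editing them, and the knowledge base is the first thing worth compiling once it gets large.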

Why this matters: speed

The most immediate benefit is execution speed. Interpreting rules and walking the knowledge base at runtime carries overhead that a compiled analyzer avoids. For analyzers that run over large volumes of text, whether you’re processing documents in a development loop or running a deployed system at scale, compiled execution is meaningfully faster. It is the same NLP++ logic you already wrote, now running as native code that loads as a library and gets straight to work.

That speedup helps in two places at once. During development, faster runs over your test corpora tighten the feedback loop — and with the compiled-knowledge-base option, even a 190,000-word dictionary stops being a drag on iteration. In deployment, faster execution means lower compute cost and higher throughput for the same analyzer.

Why this matters more: protecting your code

Here’s the part that changes the business of building NLP++ analyzers.

Because the interpreter reads NLP++ source directly, deploying an interpreted analyzer has always meant shipping your NLP++ code to wherever it runs — including to a client’s environment. For open-source and internal work, that’s no problem. But for developers and companies who build analyzers as a commercial product, it meant handing over the very thing they spent their expertise building.

Compiled analyzers solve this. When you compile your analyzer, its rules and knowledge base are no longer present as readable source in the deployed artifact — they’ve become a compiled C++ library. You can ship a working, fast analyzer to a customer without exposing the NLP++ code behind it.

This flips the maintenance model in a way that benefits everyone. The developer or company that built the analyzer remains its steward: the one who maintains it, improves it, and rolls out new versions. Clients get a fast, reliable analyzer that does its job. And the people with the deep NLP++ expertise stay in the loop as the ongoing maintainers and improvers, rather than handing off code they can no longer support or control. It’s a sustainable arrangement: expertise stays where the expertise is.

Built across the whole toolchain

Getting here meant updating the entire VisualText ecosystem for version 3, not just one repository. Over the last several weeks we’ve updated the NLP Engine itself, the VisualText VS Code language extension, the Python package, the platform-specific engine builds for Windows, Linux, and Mac, and the TypeScript and Python bindings. The compile-and-deploy path is now wired through the same tools you already use, on every platform NLP++ runs on.

Try it

The compile feature is available now in the VS Code extension and the engine. We’ve put together a video walking through how to use it — from compiling an analyzer to running it in compiled mode:

Watch the tutorial:

(Video link coming soon)

Version 3 keeps everything that made NLP++ and VisualText distinct (the transparency, the modifiability, the visual debugging) and adds a production path that’s faster, protects the work you put into your analyzers, and is finally easy enough for anyone to use. We’re excited to see what you build and ship with it.
