Full English Parser

NOTE: The latest VisualText is required to run and modify TAIParse. Even the compiled version needs the runtime libraries bundled with VisualText.

TAIParse also performs part-of-speech tagging with 94% accuracy on a blind test corpus of business articles.

Natural language processing (NLP) generally refers to the complete linguistic and conceptual processing of a text. To facilitate the construction of natural language processing products, TAI is now making available some general text analysis prototypes that can be used as a starting point for a host of applications, such as information extraction, categorization, summarization, and question parsing.

TAIParse is a general analyzer that emphasizes the minimal use of knowledge (“just-in-time” knowledge) to perform part-of-speech tagging, entity extraction, and shallow parsing.  TAIParse is an excellent starting point for customizing your own text analysis capabilities. For one thing, TAIParse includes a full lexicon with part-of-speech information within its knowledge base. For another, it illustrates the latest features of the NLP++® language in action. TAIParse further illustrates the ease of implementation of NLP systems with the VisualText® IDE (SDK, tools, etc.). TAIParse includes these capabilities and more:

  • Zoning and “parsing-per-line” to characterize regions and formats in text
  • Dynamic and context-dependent part-of-speech assignment and parsing
  • Successive segmentation of text in a “divide-and-conquer” strategy
  • Treatment of unknown words
  • Noun phrase extraction
  • A semantic and discourse processing framework that ties into an ontology and dynamic representation of the analysis within the knowledge base
  • Processing INDEPENDENT of capitalization, so that, for example, all-uppercase text regions can be analyzed
  • Robust analysis in the face of errors, misspellings, and ungrammatical text
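
To make two of the capabilities above more concrete (capitalization-independent processing and treatment of unknown words), here is a minimal Python sketch. It is not TAIParse code and does not use NLP++ or the VisualText API. The lexicon entries, the suffix heuristics, and the function names candidate_tags and tag_line are all hypothetical and chosen only for illustration.

```python
# Toy lexicon: lowercase word -> possible parts of speech.
# Hypothetical data; a real lexicon would be far larger.
TOY_LEXICON = {
    "the": ["det"],
    "company": ["noun"],
    "reports": ["noun", "verb"],
    "strong": ["adj"],
    "earnings": ["noun"],
}

# Hypothetical suffix heuristics for words missing from the lexicon.
SUFFIX_GUESSES = [
    ("ly", "adv"),
    ("ing", "verb"),
    ("tion", "noun"),
]


def candidate_tags(word):
    """Return candidate part-of-speech tags for a word, ignoring its case."""
    key = word.lower()  # lookup is independent of capitalization
    if key in TOY_LEXICON:
        return list(TOY_LEXICON[key])
    # Unknown word: fall back to a suffix-based guess.
    for suffix, tag in SUFFIX_GUESSES:
        if key.endswith(suffix):
            return [tag]
    # Assumed default: treat any other unknown word as a noun.
    return ["noun"]


def tag_line(line):
    """Pair each whitespace-separated token with its candidate tags."""
    return [(token, candidate_tags(token)) for token in line.split()]


if __name__ == "__main__":
    for token, tags in tag_line("THE COMPANY REPORTS STRONG EARNINGS QUICKLY"):
        print(token, tags)
```

Because the lookup lowercases each token first, an all-uppercase line gets the same candidate tags as its mixed-case equivalent. Ambiguous words such as "reports" keep several candidates, which a context-dependent later pass would have to narrow down.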