NOTE: The latest VisualText is required for running and modifying TAIParse. Even the compiled version needs runtime libraries bundled with VisualText. DOWNLOAD HERE.
TAIParse also performs part-of-speech tagging at 94% accuracy on a blind test business article corpus.
Natural language processing (NLP) generally refers to the complete linguistic and conceptual processing of a text. To facilitate the construction of natural language processing products, TAI is now making available some general text analysis prototypes that can be used as a starting point for a host of applications, such as information extraction, categorization, summarization, and question parsing.
TAIParse is a general analyzer that emphasizes the minimal use of knowledge (“just-in-time” knowledge) to perform part-of-speech tagging, entity extraction, and shallow parsing. TAIParse is an excellent starting point for customizing your own text analysis capabilities. For one thing, TAIParse includes a full lexicon with part-of-speech information within its knowledge base. For another, it illustrates the latest features of the NLP++® language in action. TAIParse further illustrates the ease of implementation of NLP systems with the VisualText® IDE (SDK, tools, etc.). TAIParse includes these capabilities and more:
- Zoning and “parsing-per-line” to characterize regions and formats in text
- Dynamic and context-dependent part-of-speech assignment and parsing
- Successive segmentation of text in a “divide-and-conquer” strategy
- Treatment of unknown words
- Noun phrase extraction
- A semantic and discourse processing framework that ties into an ontology and dynamic representation of the analysis within the knowledge base
- Processing INDEPENDENT of capitalization, so that, for example, all-uppercase text regions can be analyzed.
- Robust analysis in the face of errors, misspellings, and ungrammatical text