Text Mining and NLP APIs
RxNLP’s Text Mining and NLP APIs provide cloud access to core and advanced text analytics functionality. The APIs are intended for both commercial and research use, and our current users include both commercial and academic users. To use RxNLP’s APIs and their wrappers, register for a Mashape account and then subscribe to the API plan that works for you. Below is a list of our current APIs with links to documentation.
HTML2Text
The HTML2Text endpoint extracts only the body text from an HTML page, supplied either as raw HTML or as a URL.
- Test on Mashape
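To give a feel for what body-text extraction involves, here is a minimal local sketch using only Python's standard library. It is not the HTML2Text endpoint and makes no claim about how the service works: "body text" is assumed here to mean the visible text inside `<body>`, with `<script>` and `<style>` content dropped. The names `BodyTextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects visible text inside <body>, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <body> and outside script/style.
        if self.in_body and self.skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def html_to_text(html):
    """Return the visible body text of an HTML string (illustrative only)."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><title>T</title></head>"
        "<body><h1>Hello</h1><script>var x = 1;</script>"
        "<p>World  again</p></body></html>"
    )
    print(html_to_text(page))
```

A production extractor would typically also need to deal with navigation menus, footers and malformed markup, which this sketch ignores.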
N-Gram and Word Counting
The N-Gram and Word Counting endpoint generates word and n-gram counts in any language. The words or n-grams are returned with their frequencies, in descending order of frequency.
- N-Gram Counter API Documentation
- Test on Mashape
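The kind of output described above can be illustrated locally with a short Python sketch. This is not the endpoint itself: the function name `ngram_counts`, the lowercasing, and the `\w+` tokenizer are assumptions made for the example, and that tokenizer will not segment languages written without spaces between words.

```python
import re
from collections import Counter


def ngram_counts(text, n=1):
    """Count word n-grams and return (ngram, count) pairs, most frequent first."""
    # Assumption: tokens are lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    counts = Counter(grams)
    # Sort by descending frequency, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat"
    print(ngram_counts(sample, n=1))
    print(ngram_counts(sample, n=2))
```

With `n=1` the function returns word counts; with `n=2` or higher it returns counts of consecutive word sequences of that length.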
Text Similarity
The Text Similarity endpoint computes the similarity between two pieces of text (long or short) using well-known measures such as Jaccard, Dice and Cosine.
- Text Similarity API Documentation
- Simple Java Wrapper for Text Similarity
- Python Wrapper for Text Similarity
- Test on Mashape
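For readers unfamiliar with the three measures named above, here is a minimal local sketch of their textbook definitions over word tokens. It is an illustration only, not the endpoint's implementation: the tokenizer, the lowercasing, and the choice of word sets for Jaccard and Dice versus word-count vectors for Cosine are all assumptions, and the service may preprocess text differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def jaccard(a, b):
    """|A & B| / |A | B| over the sets of words in each text."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """2 * |A & B| / (|A| + |B|) over the sets of words in each text."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def cosine(a, b):
    """Cosine of the angle between the two word-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    x = "the quick brown fox"
    y = "the quick red fox"
    print(jaccard(x, y), dice(x, y), cosine(x, y))
```

All three scores fall between 0 (no shared words) and 1 (identical word sets or count vectors). Jaccard is the strictest of the three on partial overlap, since it divides by the size of the union.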
Sentence Clustering
The Sentence Clustering endpoint clusters texts such as tweets, customer support tickets, news articles, surveys and user reviews into logical sentence groups. The response includes the clusters, cluster scores and cluster labels.
- Sentence Clustering API Documentation
- Test on Mashape
- Wrapper Code in Python
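As a rough illustration of what grouping sentences means, the sketch below clusters sentences greedily by word overlap. It is a deliberately simple stand-in, not the algorithm behind the endpoint, and it produces neither cluster scores nor labels. The function names, the Jaccard overlap criterion, and the `threshold` value are all assumptions chosen for the example.

```python
import re


def tokens(text):
    """Assumed tokenizer: set of lowercase runs of word characters."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold=0.3):
    """Greedy single-pass clustering: each sentence joins the cluster whose
    vocabulary it overlaps most, or starts a new cluster if no overlap
    reaches the threshold."""
    clusters = []  # each entry: {"members": [sentences], "vocab": set of tokens}
    for sentence in sentences:
        toks = tokens(sentence)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(toks, cluster["vocab"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(sentence)
            best["vocab"] |= toks
        else:
            clusters.append({"members": [sentence], "vocab": set(toks)})
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    reviews = [
        "battery life is great",
        "the battery life is short",
        "screen is too dim",
        "the screen is dim outdoors",
    ]
    for group in cluster_sentences(reviews):
        print(group)
```

Because the pass is greedy, the result depends on input order and on the threshold; the sketch is meant only to make the idea of "logical sentence groups" concrete.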
Topics Extraction
The Topics Extraction endpoint helps you find the key topics in large amounts of text. It returns topics ranked by importance, along with snippets containing each topic.
- Topics Extraction Documentation
- Test on Mashape
Opinosis Summarization
The Opinosis Summarization endpoint generates short summaries of opinions and is designed to work for texts such as user reviews.
- Test on Mashape
- Sample Wrapper Code in Java
- Sample Code Utilizing Opinosis Summarization API in Python
- Related Paper
