The main features of the Newistic platform
The Newistic platform has a rich set of features that help customers gather insights about the online coverage of specific drugs, terms, topics, or companies.
Monitoring coverage: We gather news articles, forum posts, Twitter updates, blog posts, and comments from thousands of sources. We also dynamically increase the number of sources based on the queries our users run.
Sentiment Analysis: The platform has preliminary support for sentiment analysis. Newistic tags each article or post with a positive, neutral, or negative sentiment, depending on the kinds of words used in the content. Users can then search for positive or negative articles that mention certain drugs, organizations, people, or keywords, or generate cumulative graphs from them.
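To make the idea of word-based tagging concrete, here is a minimal sketch of a lexicon approach. It is an illustration only, not Newistic's actual algorithm: the word lists, the function name `tag_sentiment`, and the simple count-difference rule are all assumptions.

```python
import re

# Hypothetical word lists for illustration; a real lexicon would be far larger.
POSITIVE_WORDS = {"effective", "approved", "safe", "breakthrough", "improved"}
NEGATIVE_WORDS = {"recall", "lawsuit", "unsafe", "failed", "adverse"}


def tag_sentiment(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"
```

With these assumed lists, `tag_sentiment("The drug was approved and proved effective.")` returns `"positive"`, while a text with no listed words is tagged `"neutral"`.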
Named Entities: We extract the names of people, organizations, and geographical locations mentioned in each article, post or comment. Newistic users can then search for mentions of specific persons, organizations, or locations.
Geographical Names Disambiguation: There are at least 50 geographical locations named "Easton" in the world, so when an article mentions "Easton" we do not immediately know which of those 50 places it refers to (we say that "Easton" is ambiguous). Newistic has an algorithm that tries to determine exactly which locations an article mentions by looking at other clues in the text. Moreover, if we determine that the "Easton" in the text is actually the Easton in Pennsylvania, USA, we also store that Easton is in Pennsylvania, Pennsylvania is in the USA, the USA is in North America, etc. So a search for articles about events in Pennsylvania will return the article about Easton even if Pennsylvania is not mentioned in the text.
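The two steps described here, picking one candidate from context clues and then storing its containing regions, could be sketched as follows. This is a simplified illustration under stated assumptions, not the platform's algorithm: the `CANDIDATES` table, its clue words, and the `resolve` function are hypothetical, and scoring by counting matching clue words is a stand-in for whatever Newistic actually uses.

```python
import re

# Hypothetical gazetteer with two of the many places named "Easton".
# "path" runs from the place itself up to the continent;
# "clues" are words that suggest this particular candidate.
CANDIDATES = {
    "Easton": [
        {"path": ["Easton", "Pennsylvania", "USA", "North America"],
         "clues": {"Lehigh", "Allentown", "Pennsylvania"}},
        {"path": ["Easton", "Maryland", "USA", "North America"],
         "clues": {"Chesapeake", "Talbot", "Maryland"}},
    ],
}


def resolve(name, text):
    """Return the hierarchy of the best-matching candidate, or None.

    The candidate with the most clue words in the text wins; if no
    candidate has any clue in the text, the name stays ambiguous.
    """
    words = set(re.findall(r"\w+", text))
    best_path, best_score = None, 0
    for candidate in CANDIDATES.get(name, []):
        score = len(candidate["clues"] & words)
        if score > best_score:
            best_path, best_score = candidate["path"], score
    return best_path
```

Indexing the article under every element of the returned path is what lets a search for "Pennsylvania" find a text that only mentions Easton and, say, Allentown.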
Text clustering: The system is able to group articles which talk about the same event. We can then show the most important stories (and how many news sources cover each story) for each of the general categories. Also, for any article we can show a list of related articles.
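As a rough illustration of grouping articles about the same event, the sketch below uses Jaccard similarity over word sets with a greedy single pass. The technique, the threshold of 0.3, and the names `word_set`, `jaccard`, and `cluster` are assumptions chosen for brevity; the source does not say which clustering method Newistic uses.

```python
import re


def word_set(text):
    """Lowercased words longer than three letters, as a crude content signature."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(articles, threshold=0.3):
    """Greedy grouping: each article joins the first cluster whose first
    article is similar enough, otherwise it starts a new cluster.
    Returns clusters as lists of article indices."""
    clusters = []
    for index, text in enumerate(articles):
        words = word_set(text)
        for group in clusters:
            if jaccard(words, word_set(articles[group[0]])) >= threshold:
                group.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

Sorting the result with `sorted(clusters, key=len, reverse=True)` puts the most widely covered stories first, and the other members of an article's cluster serve as its related articles.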
Continuous monitoring: Newistic checks all its news sources for new articles every few minutes. The actual check rate for each source varies and is determined on the fly by the rate at which the source publishes new content. New articles become available at most five minutes after Newistic detects them.
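One simple way to adapt the check rate to a source's publishing rate is to shorten the interval after a check that finds new articles and lengthen it after one that finds none. The sketch below assumes that policy; the function name `next_interval`, the halving and 1.5x factors, and the 2 to 60 minute bounds are illustrative choices, not values taken from Newistic.

```python
def next_interval(current_minutes, new_articles, min_minutes=2.0, max_minutes=60.0):
    """Return the number of minutes to wait before checking a source again.

    Halve the interval when the last check found new articles, grow it
    by 50% when it found none, and clamp the result to the given bounds.
    """
    if new_articles > 0:
        proposed = current_minutes / 2
    else:
        proposed = current_minutes * 1.5
    return max(min_minutes, min(max_minutes, proposed))
```

A busy source quickly converges on the minimum interval, while a source that rarely publishes drifts toward the maximum and costs few requests.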
Information extraction: Our system is able to extract news articles from sources with or without RSS feeds. Our extraction algorithms extract the title, text, and images of each article directly from HTML pages, regardless of their design, with zero or minimal training. This allows us to collect and index complete articles, which would not be possible by looking only at RSS feeds, since these usually contain just one or two sentences per article. We are also able to collect articles from news sources that have no RSS feeds at all or whose feeds are invalid (a surprisingly large percentage of the sources we cover).
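For a sense of what extracting from HTML rather than RSS involves, here is a deliberately naive sketch built on Python's standard `html.parser`. It simply collects the `<title>`, the text of `<p>` elements, and `<img>` sources; the class name `ArticleExtractor` is hypothetical, and a design-independent extractor like the one described above would need far more than this (for example, separating article text from navigation and ads).

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the page title, paragraph texts, and image URLs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self.images = []
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._capture = tag
            self._buffer = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._capture = None
```

Usage: create an `ArticleExtractor`, call its `feed(html)` method with the page source, then read `title`, `paragraphs`, and `images`.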
Trend monitoring: We store all collected information for several weeks or months, so our users can use our database to detect trends.
Powerful API: Newistic can be accessed by third-party applications through an API. The API allows for complex searches that combine any of the above features. For example, it would be easy to search for articles that are related to healthcare, concern events in Vienna, mention a specific company, and were published in the last five hours. The API also has several functions that make it easy to draw complex graphs from searches. Custom interfaces serving many purposes can be built on top of this core.
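The combined search in the example can be pictured as a conjunction of filters over tagged articles. The sketch below is not the Newistic API, whose endpoints and parameters are not described here; it is a local filter over in-memory records, and the `Article` fields, the `search` function, and its parameter names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Article:
    # Hypothetical record layout for an already-tagged article.
    title: str
    category: str
    locations: frozenset
    organizations: frozenset
    published: datetime


def search(articles, *, category=None, location=None,
           organization=None, max_age=None, now=None):
    """Return the articles that satisfy every criterion that was given."""
    now = now or datetime.now()
    matches = []
    for article in articles:
        if category is not None and article.category != category:
            continue
        if location is not None and location not in article.locations:
            continue
        if organization is not None and organization not in article.organizations:
            continue
        if max_age is not None and now - article.published > max_age:
            continue
        matches.append(article)
    return matches
```

Under these assumptions, the query from the text becomes `search(articles, category="healthcare", location="Vienna", organization="Acme Pharma", max_age=timedelta(hours=5))`, where "Acme Pharma" is a placeholder company name.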
