Stop Thinking, Just Do!

Sung-Soo Kim's Blog

Smart Data Discovery


10 May 2016

Smart Data Discovery

Smart data discovery is a next-generation data discovery capability that provides insights from advanced analytics to business users or citizen data scientists without requiring them to have traditional data scientist expertise.

Examples [Market]

The different types of data discovery (smart, search-based, graph-based, Hadoop-based and visual-based) are already converging, with each type providing features of the others.

  • Many smart data discovery tools, such as DataRPM, run directly on Hadoop/Spark or can access Hadoop as a data source (some directly). They feature a natural-language query interface (one element of search-based data discovery) and interactive visualization (a hallmark of visual-based data discovery).

  • Most search-based data discovery tools, such as Oracle Endeca Information Discovery, Incorta and ThoughtSpot, also offer visual-based data discovery.

  • Some smart data discovery tools, such as Ayasdi and DataRPM, leverage graph analysis to relate entities and identify important relationships.

  • Most graph-based data discovery vendors, such as Cambridge Semantics and Centrifuge Systems, offer interactive visualization to explore relationships.

  • All Hadoop-based data discovery vendors, such as Platfora and Datameer, offer visual exploration.

  • Most visual-based data discovery vendors, such as MicroStrategy Visual Insight, Tableau, Qlik and TIBCO Spotfire, access data in Hadoop as well as other sources, and have roadmaps for expanded support.

  • Many visual-based data discovery vendors have natural-language query (for example, Qlik, IBM Watson Analytics and Microsoft Power BI), and some offer automated pattern detection or smart data discovery (SAS Visual Analytics and IBM Watson Analytics) as capabilities that they plan to mature over time.

Smart Data for Smart Labs

Hadoop-Based or Big Data Discovery

Hadoop-based data discovery enables business users to explore and find insights across diverse data (such as clickstreams, social, sensor and transaction data) that is stored and managed in Hadoop. It enables users to directly query the Hadoop Distributed File System (HDFS) without the extensive modeling required by traditional SQL-based approaches, the specialized skills to generate custom MapReduce, Hive or Pig queries, or the performance penalty of querying Hadoop through Hive. Vendors like Platfora, Datameer, FICO, Oracle, MicroStrategy, Pentaho and IBM are offering ways for users to directly explore data in Hadoop. Most data discovery and BI vendors offer access to Hadoop through Hive/HBase.

Hadoop based Big Data Discovery and Analytics

Graph-Based Data Discovery

Graph-based data discovery provides a set of analytic techniques that enable business users to visually explore relationships between entities of interest, such as organizations, people and transactions. Connections may be explicit or inferred and can indicate, for example, the strength of a relationship or influence, the frequency of interaction, or the probability of fraud. Analytics can be descriptive (Apriori, clustering and outlier detection), diagnostic (Bayesian reasoning), predictive (Markov chains, discrete-event simulation) or prescriptive (graph optimization and routing). Once a graph is generated, using size, color, shape and direction to represent relationship and node attributes, the user can interact directly with the graph elements to find insights.
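To make the idea of relationship strength concrete, here is a minimal sketch in plain Python. It is not taken from any of the vendors named in this post. It assumes a hypothetical list of interaction records and uses interaction frequency as the edge weight, one of the descriptive measures mentioned above. The entity names and the helper functions (`build_graph`, `weighted_degree`, `strongest_edges`) are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical interaction records: (entity, entity). Names are made up.
interactions = [
    ("alice", "acme"), ("alice", "acme"), ("alice", "bob"),
    ("bob", "acme"), ("carol", "acme"), ("carol", "dave"),
    ("alice", "acme"),
]

def build_graph(records):
    """Count interactions per undirected edge, keyed by a sorted node pair."""
    weights = Counter()
    for a, b in records:
        weights[tuple(sorted((a, b)))] += 1
    return weights

def weighted_degree(weights):
    """Sum the weights of all edges touching each node."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)

def strongest_edges(weights, top=1):
    """Return the `top` edges with the highest interaction frequency."""
    return weights.most_common(top)

edges = build_graph(interactions)
degree = weighted_degree(edges)

print(strongest_edges(edges))
print(sorted(degree.items(), key=lambda kv: -kv[1]))
```

In a visual tool, the edge weight would typically drive line thickness and the weighted degree would drive node size, so the most connected entities stand out before any further analysis is run.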

Augmenting Hadoop for Graph Analytics

Visual-Based Data Discovery

Visual-based data discovery is a BI platform architectural style that blends data from multiple sources into a proprietary in-memory store that is tightly coupled with an interactive visualization layer. It contrasts with the traditional BI platform, which relies on a more modular architecture built on three distinct technologies to integrate, store and present data. In other words, it combines in-memory analytics with interactive visualization. The visualization layer lets users explore data by manipulating chart images, with the color, brightness, size, shape and motion of visual objects representing aspects of the dataset being analyzed. These tools also provide an array of visualization options that go beyond pie, bar and line charts, including heat and tree maps, geographic maps, scatter plots and other special-purpose visuals.

Visual Data Discovery, Self Service Analytics and Corporate BI


  • Citizen Data Scientist
  • Governed Data Discovery
  • Search-Based Data Discovery


  • Smart Data Discovery Will Enable a New Class of Citizen Data Scientist, Gartner, 29 June 2015.
