Predictive Analytics with Endeca

December 6, 2013

The opportunities and challenges in delivering real-world predictive analytics are exciting. These projects are not trivial efforts, and they require a level of collaboration between business users and IT that can be rare. However, when the stars align, the forecasts they produce can be game changers. And a BI solution that doesn’t change the game for its business is arguably a waste of time.

Oracle Endeca Information Discovery (OEID) is not really a predictive analytics tool. The text mining through Lexalytics provides one powerful data-mining model, but that’s the only one. Plus, it’s part of the data ingest and upstream from Studio. We’ve interfaced enough with R from Integrator to know that, during the ETL stage, just about any external data-mining model is effectively available. Some level of classification and association might be suggested through Studio’s data exploration, but I’d argue this produces business questions, and is a far cry from the output of the complex algorithms behind proper cluster analysis and classification trees. OEID is, first and foremost, a tool for data discovery.

So what good is OEID when your company wants to add predictive analytics?

There are three reasons, starting with big data.

If you crunch through some of the facts collected by Marcia Conner, you can see data volumes are absolutely enormous and continuing to grow. Data mining of structured and massaged data isn’t hard, but there is a growing gold mine of unstructured and social media data that is ripe for analysis. Collecting all this data together requires robust and capable ETL processes that can handle data sets that are text rich and constantly changing. OEID provides the Integrator Acquisition System (IAS), the text enrichment components, and an inherent flexibility in the engine to support multi-assign attributes and ragged-width records without a long refactoring effort. Combining these delivers an ideal toolset to bring together data from multiple sources and formats.

The second consideration is the sheer effort involved in bridging the gap between business and IT. Data scientists are expected to know all the data-mining models but, like the rest of IT, they have no secret insight that lets them automatically understand all the business nuances. And the data-mining tools they use, such as R, do not always offer intuitive interfaces for business users to pick up. OEID Studio is a visual tool. The intuitive and friendly interface makes it ideal to search, explore, and even extract data of interest. That exploration can be used to select the data that feeds data-mining models, to create training sample sets and, of course, to explore the output the models generate. As a tool for communicating and collaborating, Studio can be ideal for reducing the gap between IT and business and helping keep any data-mining effort relevant and focused.
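To make the "training sample set" idea concrete, here is a minimal sketch in Python of what might happen to records once a business user has picked them out and exported them. Everything in it is an assumption for illustration: the `stratified_split` helper, the `churned` field and the made-up rows are hypothetical, and nothing here is an OEID or Studio API. It simply splits exported rows into a training set and a holdout set while keeping the mix of outcomes the same in both.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key, train_fraction=0.7, seed=42):
    """Split records into training and holdout lists, keeping label proportions."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable

    # Group the records by the outcome we eventually want to predict.
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    train, holdout = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        holdout.extend(group[cut:])
    return train, holdout


# Hypothetical rows standing in for records exported after exploration.
records = [{"id": i, "churned": "yes" if i % 4 == 0 else "no"} for i in range(40)]
train, holdout = stratified_split(records, label_key="churned")
print(len(train), len(holdout))
```

The point of the sketch is the division of labor: the business user decides which records matter, and the data scientist takes it from there with whatever modeling tool they prefer.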

The final major consideration is the growing demand for tools that deliver interactive and intuitive visualizations and simulations. At heart, I absolutely believe the rule that good data coupled with simple visualizations is best. Out of the box, OEID offers the essential histograms, line graphs, scatter plots and, of course, the ever-popular pie chart. With features like guided navigation, breadcrumbs and the search interface, OEID can provide a tool to explore your data and the output of any data-mining models. For those analysts who trust the pivot table and ad hoc queries, the alerts, results table and metrics components are easily accessible. For anything more, the Liferay framework that Studio is built upon can be extended with customizations, improved visualizations and even methods that support simulations and model-output comparisons. OEID is an ideal and flexible tool for business users to interact with the output of data-mining models.

Business users aren’t going to be able to use OEID to generate predictions. They aren’t going to write EQL queries that perform regression analysis. They won’t find a magic button to perform algorithmic analysis and produce coefficients, probabilities, margins of error and all the other statistical outputs proper data-mining models produce. They will, however, have a tool that can handle all the varieties and volumes of real-time data to feed into the data-mining exercise. They’ll have an architecture that can interface with data-mining engines to ingest the results of a model. They’ll have a powerful text-mining engine in a data world increasingly made up of unstructured text. And they’ll have an interactive and visual tool that lets them actively participate in preparing, tuning and applying the fruits of predictive analytics.

The only thing missing is the question: What do you want to predict for your business?