Friday, February 27, 2009

KNIME - Excellent GUI for Preliminary Data Analysis

The initial exploratory stage of data analysis and mining requires quite a bit of data visualization, data preparation and trying out multiple algorithms. Being a beginner in R, I find it cumbersome to do all of that exploration by writing R code. While searching for a nice GUI for R, I stumbled upon a piece of software called Knime.

Knime is based on the Eclipse platform. It provides a nice GUI for creating data analysis workflows. It lets you connect modules into chains and branches, progressively building up complex analytics. Knime comes with some built-in modules, integrates the Weka machine learning modules, and integrates with R as well.

To try Knime out, I ran some analysis on the Iris data set. These are the steps I followed:
  • Read the data in
  • Partition the data into a training and testing set
  • Use scatter plots, summary views and data views to visualize the data
  • Try different methods for classification and prediction with different parameters
  • Compare results to get the best technique
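For comparison, here is a rough sketch of the same steps written as code instead of as a Knime workflow. It is only an illustration, not what Knime runs internally: the function names (partition, knn_predict, accuracy, compare) are my own, k-nearest neighbours stands in for "different methods with different parameters", and the embedded sample holds just the first three rows of each species rather than all 150 rows. The visualization step is left out.

```python
import random
from collections import Counter
from math import dist

# First three rows of each species from the classic Iris data set
# (sepal length, sepal width, petal length, petal width, species).
# A real run would read all 150 rows from a file instead.
IRIS_SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def partition(rows, train_fraction=0.67, seed=0):
    """Shuffle the rows and split them into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: dist(row[:4], features))[:k]
    votes = Counter(row[4] for row in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, row[:4], k) == row[4] for row in test)
    return hits / len(test)


def compare(rows, ks=(1, 3, 5)):
    """Score each parameter setting on the same train/test partition."""
    train, test = partition(rows)
    return {k: accuracy(train, test, k) for k in ks}


if __name__ == "__main__":
    for k, score in compare(IRIS_SAMPLE).items():
        print(f"k={k}: accuracy={score:.2f}")
```

With only nine rows the scores mean very little. The point is the shape of the process: one fixed partition, several parameter settings, and one comparable number per setting.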

The workflow looks something like this:

Although Knime comes with a limited set of functionality, it integrates what it has really well. Knime comes with R built in, and it can also connect to a remote R server. The integration with R is not as exhaustive as the one with Weka. It essentially requires you to write R scripts, which it then executes on the R server. It provides predefined variable names that you can use to pass data to R and retrieve results from it.

I would use Knime for preliminary data analysis and visualization. Once beyond the exploratory stage, I'd still go back to R scripts and do the work purely in R. Maybe after that the R script can be plugged into Knime for use in larger workflows.

There is an exhaustive list of the different tools on the market, both commercial and open source, here.

