Friday, February 27, 2009

KNIME - Excellent GUI for Preliminary Data Analysis

The initial exploratory stage of data analysis and mining requires quite a bit of data visualization, data preparation and trying out multiple algorithms. Being a beginner in R, I find it cumbersome to do all of that exploration by writing R code. While searching for a nice GUI for R, I stumbled upon a piece of software called Knime.

Knime is based on the Eclipse platform. It provides a nice GUI for creating a data analysis workflow. It lets you connect modules into chains and branches, progressively building up complex analytics. Knime comes with some built-in modules, integrates the Weka machine learning modules, and integrates with R as well.

To try Knime out, I ran some analysis on the Iris data set. These are the steps I followed:
  • Read the data in
  • Partition the data into a training and testing set
  • Use scatter plots, summary views and data views to visualize the data
  • Try different methods for classification and prediction with different parameters
  • Compare results to get the best technique
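
For comparison, here is roughly what the partition, classify and compare steps look like when written out by hand. This is only a minimal sketch in plain Python, not what Knime runs internally. The k-nearest-neighbour classifier, the 70/30 split, the seed and all function names are my own assumptions for illustration, and the data is just the first five rows of each species from the Iris data set.

```python
import random
from collections import Counter

# First five rows of each species from the Iris data set:
# (sepal length, sepal width, petal length, petal width, species)
IRIS_SAMPLE = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (4.6, 3.1, 1.5, 0.2, "setosa"),
    (5.0, 3.6, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (5.5, 2.3, 4.0, 1.3, "versicolor"),
    (6.5, 2.8, 4.6, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
    (6.3, 2.9, 5.6, 1.8, "virginica"),
    (6.5, 3.0, 5.8, 2.2, "virginica"),
]


def partition(rows, train_fraction, seed):
    """Shuffle a copy of the rows and split it into (training, testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k):
    """Predict a species by majority vote among the k nearest training rows."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[:4], features))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(row[4] for row in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose species is predicted correctly."""
    hits = sum(knn_predict(train, row[:4], k) == row[4] for row in test)
    return hits / len(test)


def compare(train, test, ks):
    """Try the classifier with several values of k (the 'different parameters')."""
    return {k: accuracy(train, test, k) for k in ks}


if __name__ == "__main__":
    train, test = partition(IRIS_SAMPLE, 0.7, seed=42)
    for k, acc in compare(train, test, [1, 3, 5]).items():
        print(f"k={k}: accuracy={acc:.2f}")
```

Every function here corresponds to something that is a single node in the GUI, and swapping in a different classifier means writing more code, which is exactly the overhead I wanted to avoid while exploring.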

The workflow looks something like this:

Although Knime comes with a limited set of functionality, it integrates what it has really well. Knime comes with R built in, and it can also connect to a remote R server. The integration with R is not as exhaustive as the one with Weka: you essentially write R scripts, which Knime then executes on the R server. It provides predefined variable names for passing data to R and retrieving results from it.

I would use Knime for preliminary data analysis and visualization. Once beyond the exploratory stage, I'd still go back to R scripts and do the work purely in R. Maybe after that the R script can be plugged into Knime for use in larger workflows.

There is an exhaustive list of different tools on the market, both commercial and open source, here.
