CGB Roundtable Details
Geoffrey Fox
Robust High Performance Optimization for Clustering, Multi-
Dimensional Scaling and Mixture Models
January 22, 2008 at 12:00 PM
Myers 209
Description: http://www.infomall.org/salsa
We first review the pros and cons of various approaches to nonlinear
optimization in the presence of local minima, ill-conditioned
matrices, and an ambiguous choice of the appropriate number of degrees
of freedom (over- and under-fitting). We define constraints on these
approaches from the need to run well in parallel on systems of
multicore CPUs. We present a uniform approach to data clustering and
Gaussian mixture modeling that uses deterministic (not Monte Carlo)
annealing to mitigate the local minima problem and naturally relates
the appropriate number of parameters (clusters or mixture
components) to the scale at which the problem is examined. New clusters
(mixture components) are introduced at phase transitions as the
annealing temperature is lowered and the second derivative matrix becomes
singular. We contrast three ways of visualizing this structure in
low (2) dimensions: Principal Component Analysis (PCA), Generative
Topographic Mapping (GTM), and Multi-Dimensional Scaling (MDS), using
annealing to regularize GTM and MDS.
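
The annealing idea described above can be illustrated with a minimal
sketch. It is not the speaker's implementation: it assumes squared
Euclidean distance, soft assignments proportional to exp(-d/T), a fixed
number of centers k, and a geometric cooling schedule. The function
name da_cluster and all of its parameters are hypothetical. All centers
start at the data mean and behave as one cluster at high temperature;
in this particular formulation they begin to separate once T drops
below roughly twice the largest variance of the data.

```python
import math
import random


def da_cluster(points, k, t_start, t_min=1e-2, cooling=0.9,
               inner_iters=30, seed=0):
    """Deterministic annealing clustering sketch (illustrative only).

    points: list of equal-length tuples; k: number of centers (fixed here,
    an assumption; the abstract adds clusters at phase transitions).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # All centers start at the data mean: effectively one cluster at high T.
    centers = [list(mean) for _ in range(k)]
    t = t_start
    while t > t_min:
        # Tiny jitter so coincident centers can separate below a critical T.
        centers = [[c + 1e-3 * rng.uniform(-1.0, 1.0) for c in ctr]
                   for ctr in centers]
        for _ in range(inner_iters):
            sums = [[0.0] * dim for _ in range(k)]
            weights = [0.0] * k
            for p in points:
                d2 = [sum((p[d] - ctr[d]) ** 2 for d in range(dim))
                      for ctr in centers]
                d_min = min(d2)  # subtract the minimum for numerical stability
                w = [math.exp(-(v - d_min) / t) for v in d2]
                z = sum(w)
                for j in range(k):
                    r = w[j] / z  # soft assignment of point p to center j
                    weights[j] += r
                    for d in range(dim):
                        sums[j][d] += r * p[d]
            # Each center moves to the responsibility-weighted mean.
            centers = [
                [sums[j][d] / weights[j] for d in range(dim)]
                if weights[j] > 0.0 else centers[j]
                for j in range(k)
            ]
        t *= cooling  # lower the temperature
    return centers


if __name__ == "__main__":
    data = [(-0.5,), (0.0,), (0.5,), (9.5,), (10.0,), (10.5,)]
    print(sorted(da_cluster(data, k=2, t_start=100.0)))
```

The loop over points inside each iteration is independent per point,
which is one reason this style of algorithm is a natural candidate for
parallel execution; the sketch itself is serial.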
We currently have preliminary implementations of deterministic
annealing clustering and GTM that run well on multicore
systems. We have applied these techniques to Geographical
Information Systems (clustering demographic data in 2D) and
Cheminformatics in 1024 and lower dimensions. We would like to
understand other applications that can constrain and test these
techniques.