Murtagh, Fionn, Contreras Albornoz, Pedro and Downs, Geoff (2008) Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding. SIAM Journal on Scientific Computing, 30
Full text access: Open
Coding of data, usually upstream of data analysis, has crucial impli- cations for the data analysis results. By modifying the data coding – through use of less than full precision in data values – we can aid appre- ciably the effectiveness and efficiency of the hierarchical clustering. In our first application, this is used to lessen the quantity of data to be hierar- chically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second appli- cation, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.
This is a Submitted version This version's date is: 2008 This item is not peer reviewed
https://repository.royalholloway.ac.uk/items/198b0f36-fedd-580e-8f20-c478b1f9fe64/6/
Deposited by Research Information System (atira) on 22-Jul-2014 in Royal Holloway Research Online.Last modified on 22-Jul-2014