What is multi-objective clustering

Search for artificial 2D data to demonstrate the properties of clustering algorithms

I'm looking for datasets of two-dimensional data points (each data point is a vector of two values (x, y)) that follow different distributions and shapes. Code to generate such data would also be helpful. I want to use them to plot and visualize how some clustering algorithms perform. Here are some examples:


R ships with a lot of datasets, and it should not be hard to reproduce most of the examples you cited with just a few lines of code. The mlbench package may also be helpful, especially its synthetic datasets. Some illustrations are given below.

Further examples can be found in the Cluster task view on CRAN. The fpc package, for example, has a built-in generator for "face-shaped" cluster benchmark datasets.

The same goes for Python, where scikit-learn provides interesting benchmarks and datasets for clustering.
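As a minimal sketch of the Python route, the snippet below uses three generators from sklearn.datasets (make_moons, make_circles, make_blobs). The sample sizes, noise levels, and seed are arbitrary choices for illustration, not values from this thread.

```python
from sklearn.datasets import make_blobs, make_circles, make_moons

SEED = 0  # arbitrary fixed seed so repeated runs give the same points

# Each generator returns (X, y): X has shape (n_samples, 2), y holds the true labels.
datasets = {
    "moons": make_moons(n_samples=300, noise=0.05, random_state=SEED),
    "circles": make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=SEED),
    "blobs": make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=SEED),
}

for name, (X, y) in datasets.items():
    # Print the array shape and the distinct ground truth labels of each dataset.
    print(name, X.shape, sorted(set(y.tolist())))
```

Each X can be passed directly to a clustering estimator or to a scatter plot, and y can be used to color the points by their true cluster.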

The UCI Machine Learning Repository also hosts a lot of datasets, but you are probably better off simulating data yourself in the language of your choice.
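To illustrate what simulating the data yourself might look like, here is a rough sketch in plain Python (standard library only). The function names gaussian_blobs, concentric_rings, and two_spirals, and all parameter values, are made up for this example.

```python
import math
import random


def gaussian_blobs(centers, n_per_cluster, std, rng):
    """Isotropic Gaussian clusters around the given (x, y) centers."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, std), rng.gauss(cy, std)))
            labels.append(label)
    return points, labels


def concentric_rings(radii, n_per_ring, noise, rng):
    """One ring-shaped (non-convex) cluster per radius, centered at the origin."""
    points, labels = [], []
    for label, radius in enumerate(radii):
        for _ in range(n_per_ring):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius + rng.gauss(0.0, noise)  # radial jitter
            points.append((r * math.cos(angle), r * math.sin(angle)))
            labels.append(label)
    return points, labels


def two_spirals(n_per_arm, turns, noise, rng):
    """Two interleaved spiral arms; the second is the first rotated by 180 degrees."""
    points, labels = [], []
    for label in (0, 1):
        for i in range(n_per_arm):
            t = (i + 1) / n_per_arm  # position along the arm, in (0, 1]
            angle = 2.0 * math.pi * turns * t + label * math.pi
            x = t * math.cos(angle) + rng.gauss(0.0, noise)
            y = t * math.sin(angle) + rng.gauss(0.0, noise)
            points.append((x, y))
            labels.append(label)
    return points, labels


rng = random.Random(42)  # arbitrary seed for reproducibility
blobs, blob_labels = gaussian_blobs([(0, 0), (4, 4), (0, 5)], 100, 0.6, rng)
rings, ring_labels = concentric_rings([1.0, 3.0], 150, 0.1, rng)
spirals, spiral_labels = two_spirals(150, 1.5, 0.02, rng)
print(len(blobs), len(rings), len(spirals))
```

Each function returns a list of (x, y) tuples plus a parallel list of ground truth labels, so the output can be written to CSV or plotted with any tool.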

This toy clustering benchmark contains various datasets in ARFF format (easily converted to CSV), mostly with ground truth labels. The benchmark is meant to validate the basic desired properties of clustering algorithms. Most of the datasets come from clustering papers such as:

  • BIRCH - Zhang, Tian, Raghu Ramakrishnan and Miron Livny. "BIRCH: An efficient data clustering method for very large databases." ACM SIGMOD Record. Vol. 25. No. 2. ACM, 1996.
  • CURE - Guha, Sudipto, Rajeev Rastogi and Kyuseok Shim. "CURE: An Efficient Clustering Algorithm for Large Databases." ACM SIGMOD Record. Vol. 27. No. 2. ACM, 1998.
  • Chameleon - Karypis, George, Eui-Hong Han and Vipin Kumar. "Chameleon: Hierarchical Clustering Using Dynamic Modeling." Computer 32.8 (1999): 68-75.
  • The Fundamental Clustering Problem Suite - Ultsch, A.: Clustering with SOM: U*C, In Proc. Workshop on Self-Organizing Maps, Paris, France, (2005), pp. 75-82
  • MOCK - Handl, Julia and Joshua Knowles. "An evolutionary approach to multi-objective clustering." Evolutionary Computation, IEEE Transactions on 11.1 (2007): 56-76.
  • Robust path-based spectral clustering - Chang, Hong, and Dit-Yan Yeung. "Robust Path-Based Spectral Clustering." Pattern Recognition 41.1 (2008): 191-203.
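On the ARFF to CSV conversion mentioned above: the following is only a sketch using the Python standard library. It assumes a simple dense ARFF file with unquoted attribute names and comma-separated values; sparse rows, quoted names containing spaces, and quoted values containing commas are not handled. The function name arff_to_csv and the sample content are hypothetical.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Convert a simple dense ARFF document to CSV text (header row plus data rows)."""
    names, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True  # everything after this marker is a data row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical two-point toy file, made up for this example.
sample = """\
% toy example
@relation toy
@attribute x numeric
@attribute y numeric
@attribute class {0,1}
@data
0.1,0.2,0
3.4,3.1,1
"""
print(arff_to_csv(sample))
```

The last column of the result is the ground truth label, which can be compared against the output of a clustering algorithm.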

ELKI comes with some datasets (also check the unit tests, they contain a lot more than the ones on the website in addition to the parameter settings).

It also includes a fairly flexible data generator.

Here is a customizable cluster generator. It only covers a certain class of datasets, but it can certainly be used for investigating clustering algorithms.

Here is an example of the type of clusters that can be created:

The cluster membership is saved in a text file. The code is open source under MIT license.

This MATLAB script generates 2D data for clustering. It accepts several parameters, so the generated data can be tailored to user requirements.

I can't believe no one mentioned Fisher's Iris data.

I don't think I've seen any clustering technique where the iris data did not serve as an example.

Simply type "iris" in R to access the data.

Here is an example of a nice (and typical) iris display: http://ygc.name/2011/12/24/ml-class-7-kmeans-clustering/
