Visual Pattern Detection in High-dimensional Spaces
Visual analytics can be seen as guided exploration of data, where underlying statistical or machine learning algorithms present interesting views of high-dimensional data that users can interactively explore. This paradigm is used to investigate clusters, trends, and anomalies in a wide range of data domains, such as bioinformatics, network flow monitoring, and social networks. In this thesis, we develop visual analytic tools to explore high-dimensional data by discovering views that exhibit certain visual patterns. We start by introducing a framework, called Visual Model Based Transformations, to reveal interesting visual patterns in multivariate data. This novel framework can be used to describe and discover anomalies based on visual characterizations of high-dimensional data, thereby making visual analytics an integral part of the process. One instantiation of the framework is a set of features that characterize the shape of two-dimensional point clouds and answer the question: how can we identify anomalous scatterplots or bivariate distributions? In the process of extending this idea to identify and formally capture high-dimensional structures through a visual platform, we developed a highly accurate classification algorithm that leverages random projections and set covers in a new way. Building on the success of our classifier, we analyze how to visually explore these random projections and the subspaces they describe to gain insight into the data, and we developed a visual analytic explorer to delve into the internals of our classifier. Taking this further, we investigate how visual feature scores can be used to discover interesting structure in higher dimensions. Leveraging non-axis-parallel random projections to find subspaces that maximize some score has not previously been explored, and we show that it is an interesting alternative feature selection method for large multivariate data sets.
We integrate our proposed score functions and search algorithm into a novel visual platform that facilitates exploration of targeted subspaces of high-dimensional data. All the tools we develop support exploratory data analysis of high-dimensional data once target patterns of interest have been described. We describe how our methodology can be used for anomaly detection and supervised classification tasks.
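The idea of searching non-axis-parallel random projections for a subspace that maximizes a score can be illustrated with a minimal sketch. It is not the thesis's actual algorithm or score functions: the names `random_projection`, `separation_score`, and `search_projections` are hypothetical, and a simple between-class over within-class scatter ratio stands in for the visual feature scores, which the abstract does not specify.

```python
import numpy as np

def random_projection(n_features, rng):
    """Draw a random orthonormal 2D basis (generally not axis-parallel)."""
    gaussian = rng.standard_normal((n_features, 2))
    basis, _ = np.linalg.qr(gaussian)  # columns of basis are orthonormal
    return basis

def separation_score(points_2d, labels):
    """Assumed stand-in score: between-class over within-class scatter."""
    overall_mean = points_2d.mean(axis=0)
    between, within = 0.0, 0.0
    for cls in np.unique(labels):
        group = points_2d[labels == cls]
        centre = group.mean(axis=0)
        between += len(group) * np.sum((centre - overall_mean) ** 2)
        within += np.sum((group - centre) ** 2)
    return between / (within + 1e-12)  # small constant avoids division by zero

def search_projections(data, labels, n_trials=200, seed=0):
    """Keep the random 2D projection with the highest score."""
    rng = np.random.default_rng(seed)
    best_score, best_basis = -np.inf, None
    for _ in range(n_trials):
        basis = random_projection(data.shape[1], rng)
        score = separation_score(data @ basis, labels)
        if score > best_score:
            best_score, best_basis = score, basis
    return best_basis, best_score
```

Calling `search_projections(data, labels)` on an `(n_samples, n_features)` array returns a basis whose two columns define the selected 2D view, so `data @ basis` can be drawn as a scatterplot. Any other score over a 2D point cloud could be substituted for `separation_score` without changing the search loop.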