Science for Health
Gene expression data generated by high-throughput approaches, such as microarrays and next-generation sequencing, play a central role in biological knowledge discovery. However, the size and complexity of these types of data make their analysis challenging. Often the aim of these experiments is to identify patterns of gene expression and to define sets of co-regulated genes. For these purposes, clustering algorithms (eg, hierarchical clustering and k-means clustering) are frequently used. Although these methods have proved to be a powerful and efficient way to analyse gene expression data, they have limitations. One weakness is that they generally draw sharp boundaries between clusters of co-expressed genes, and different methods often produce very different classifications; the validity and logic of any given classification are rarely obvious or easy to investigate. A second drawback is that clustering algorithms do not reveal global patterns in the data, so it is usually difficult to understand how one cluster of co-regulated genes relates to another.
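The "sharp boundaries" point can be made concrete with a minimal sketch of Lloyd's k-means algorithm in Python with NumPy. The function name `kmeans`, its parameters and the toy data are hypothetical, chosen for illustration only; this is not part of the method described here. Each gene (a row of the expression matrix) is assigned to exactly one cluster, even a gene whose profile lies exactly midway between two groups, and a different random initialisation (`seed`) can in general yield a different partition.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means. Rows of X are genes, columns are samples.

    Returns one hard cluster label per gene plus the final centroids.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialisation: k distinct genes serve as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        # Hard assignment: each gene goes to its single nearest centroid.
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its genes (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up toy data: two groups of 5 genes measured in 4 samples, plus one
# gene whose profile lies exactly midway between the two groups.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.2, size=(5, 4)),
    rng.normal(10.0, 0.2, size=(5, 4)),
    np.full((1, 4), 5.0),
])
labels, centroids = kmeans(X, k=2)
print(labels)  # the midway gene (last entry) still receives a single label
```

Nothing in the output records that the last gene is a borderline case; that information is lost once the partition is drawn.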
To address these deficiencies, we work with Chris Watkins, a computer scientist at Royal Holloway, University of London, to develop easily implemented methods that allow an investigator to visualise and interact with gene expression data in an intuitive and flexible manner. We have developed a method that displays gene expression data as an interactive two-dimensional map that an investigator can explore. This method combines a non-linear dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), with a novel visualisation technique that highlights genes with related expression profiles. The result is an interactive map of gene expression data in which each point represents a gene, and the position of each gene-point is determined by how its expression profile compares with those of the other genes in the dataset. Genes with similar expression patterns are therefore located close together on the map.
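The embedding step can be illustrated with a deliberately small, exact t-SNE written in Python with NumPy. This is a minimal sketch under assumptions of our own, not the authors' implementation: Gaussian affinities calibrated to a chosen perplexity in expression space, a Student-t (one degree of freedom) kernel in the 2-D map, and plain gradient descent with momentum and early exaggeration on the Kullback-Leibler divergence. The names `tsne_2d` and `_row_affinities` and all default parameter values are hypothetical. The O(n²) cost limits it to a few hundred genes, the default learning rate suits toy-sized inputs, and the novel highlighting technique mentioned above is not reproduced. Whether to standardise each gene's profile before embedding is left to the caller.

```python
import numpy as np


def _row_affinities(sq_dists, perplexity, tol=1e-5, max_iter=50):
    """Gaussian conditional probabilities p(j|i) for one gene i.

    Binary-searches the precision beta = 1 / (2 sigma^2) so that the
    entropy of the distribution equals log(perplexity).
    """
    target = np.log(perplexity)
    shifted = sq_dists - sq_dists.min()  # numerical stability; p is unchanged
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        p = np.exp(-beta * shifted)
        z = p.sum()
        entropy = np.log(z) + beta * np.sum(shifted * p) / z
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too flat: sharpen the Gaussian
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:  # too peaked: widen the Gaussian
            hi = beta
            beta = (beta + lo) / 2.0
    return p / z


def tsne_2d(X, perplexity=30.0, n_iter=500, learning_rate=1.0,
            exaggeration=4.0, seed=0):
    """Embed the rows of X (genes x samples) as points in a 2-D map."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # 1. Pairwise squared distances between expression profiles.
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)

    # 2. Conditional affinities, symmetrised into joint probabilities P.
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        P[i, others] = _row_affinities(D[i, others], perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)

    # 3. Gradient descent on KL(P || Q), starting from a tiny random layout.
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, 2))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        alpha = exaggeration if it < 100 else 1.0  # early exaggeration
        momentum = 0.5 if it < 250 else 0.8
        sq_y = np.sum(Y ** 2, axis=1)
        # Student-t kernel (1 degree of freedom) between map points.
        num = 1.0 / (1.0 + sq_y[:, None] + sq_y[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)
        W = (alpha * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y


# Made-up toy data: two groups of 10 genes with distinct profiles over 6 samples.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 6)),
               rng.normal(5.0, 0.3, size=(10, 6))])
Y = tsne_2d(X, perplexity=5.0)  # perplexity must be well below the number of genes
print(Y.shape)  # (20, 2): one map position per gene
```

Because the Student-t kernel has heavier tails than the Gaussian used in expression space, moderately dissimilar genes are pushed further apart on the map, which is what lets groups of co-expressed genes appear as visually separate regions without any hard cluster assignment.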
We have found this approach helpful for exploring and analysing gene expression data. It performs better than many commonly used methods and can offer insight into underlying patterns of gene expression at both global and local scales. The method provides a way to identify clusters of similarly expressed genes visually and interactively, and to understand how clustering algorithms have partitioned the data. We aim to extend this method and develop further tools to support the analysis of gene expression data.
This work is the property of the author(s) and MRC, UK. You understand that this software may not have been tested on a system exactly the same as yours and you therefore use the software at your own risk.
© MRC National Institute for Medical Research
The Ridgeway, Mill Hill, London NW7 1AA