Computational Developments for High-Dimensional, Latent Data Applications in Pathology Imaging and Informatics
B.A., Cornell University - 2006
Thesis Advisor: David J. Foran, Ph.D.
Graduate Program in Biomedical Engineering
Cancer Institute of NJ (CINJ)
Tuesday, August 28, 2012
Digital pathology presents a wide spectrum of new opportunities that rely upon large-scale data analysis to advance investigative research and clinical practice. These opportunities make it possible to acquire, analyze, query, and assimilate imaged tissue samples, cytometry data, genomic data, and patient records. As the technical barriers to data acquisition and analysis are lowered, an expanding repertoire of new modalities and tests has become available to characterize the underlying pathology of certain cancers. We propose that, through advanced imaging and informatics methods, it is possible to systematically detect and evaluate subtleties in histopathology data that are not visible to the human eye, yet contribute to the characterization and classification of disease onset and progression. The supporting tools and computational methods developed for this research unlock latent information that is inaccessible to traditional methods of manual analysis. In doing so, they expand the scope and quality of evidence available to the pathologist for making informed clinical decisions.
As a first test of the proposed hypothesis, we examine multispectral imaging (MSI). A multispectral camera images histologic slides at multiple narrow bandwidths across the color spectrum, as opposed to the three color channels available with traditional cameras. We conducted an analysis to understand the cases in which the additional spectral data provided by MSI can improve computer-aided interpretation and diagnosis. Using the transformation between a spectrum and its perceived color, we determined cases in which MSI revealed information not present in a standard brightfield color image. Subsequent chapters support the use of MSI in certain circumstances, including an application to improve the classification of breast cancer in tissue microarrays.
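The idea that two different spectra can map to the same perceived color can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the six bands, the box-shaped sensitivities, and the names `SENSITIVITY`, `to_rgb`, and `is_metameric` are hypothetical and do not represent a real camera response or the transformation used in the dissertation.

```python
# Toy model (assumed): 6 narrow spectral bands collapsed into 3 broad
# colour channels. The box-shaped sensitivities are illustrative only.
SENSITIVITY = [
    [0, 0, 0, 0, 1, 1],  # "red"   channel sums bands 4 and 5
    [0, 0, 1, 1, 0, 0],  # "green" channel sums bands 2 and 3
    [1, 1, 0, 0, 0, 0],  # "blue"  channel sums bands 0 and 1
]


def to_rgb(spectrum):
    """Project a 6-band spectrum onto the three colour channels."""
    return [sum(w * s for w, s in zip(row, spectrum)) for row in SENSITIVITY]


def is_metameric(a, b, tol=1e-9):
    """True when two distinct spectra give the same three-channel colour."""
    same_colour = all(abs(x - y) <= tol for x, y in zip(to_rgb(a), to_rgb(b)))
    different_spectrum = any(abs(x - y) > tol for x, y in zip(a, b))
    return same_colour and different_spectrum


# Two hypothetical spectra that differ within the "blue" and "red" bands
# but integrate to identical channel values.
spectrum_a = [0.2, 0.6, 0.5, 0.5, 0.3, 0.7]
spectrum_b = [0.6, 0.2, 0.5, 0.5, 0.7, 0.3]

if __name__ == "__main__":
    print(to_rgb(spectrum_a), to_rgb(spectrum_b))
    print(is_metameric(spectrum_a, spectrum_b))
```

In this toy setting, a pair of pixels flagged by `is_metameric` is exactly a case where the multispectral stack carries information that the three-channel image has discarded.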
Complementing the MSI studies on brightfield images, the next section examines gene repositioning in fluorescence confocal images. Although individual gene markers are visible in the nucleus, gene repositioning is only apparent when nuclei are analyzed collectively. We present a reliable method to quantify spatial repositioning in a set of cancer-implicated genes. Repositioning was identified by labeling the gene sequences with fluorescence in situ hybridization and measuring their relative positions. A ranked-retrieval approach was developed and tested for its capacity to automate the selection of accurately delineated nuclei. Logistic regression was applied to features extracted from candidate segmented nuclei in order to rank them according to the likelihood that they were accurately segmented. In man-versus-machine studies, the automated method outperformed the baseline consensus among three human reviewers. Automated assessment by ranked retrieval was shown to reduce or even eliminate the time-consuming and error-prone manual selection of nuclei.
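A minimal sketch of ranking candidate nuclei by logistic regression is given here. The two features (a solidity-like score and an ellipse-fit error), the training values, and the function names are hypothetical and chosen only for illustration; the features and fitting procedure used in the dissertation are not stated in this abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_candidates(candidates, w, b):
    """Return (probability, name) pairs, most likely well segmented first."""
    scored = [
        (sigmoid(sum(wj * xj for wj, xj in zip(w, feats)) + b), name)
        for name, feats in candidates.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical training set: (solidity, ellipse-fit error) per nucleus,
# labelled 1 if a reviewer accepted the segmentation, 0 otherwise.
X_train = [(0.95, 0.05), (0.90, 0.10), (0.92, 0.08),
           (0.60, 0.40), (0.55, 0.50), (0.65, 0.35)]
y_train = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(X_train, y_train)

# Hypothetical unlabelled candidates to be ranked.
candidates = {
    "nucleus_a": (0.93, 0.07),
    "nucleus_b": (0.58, 0.45),
    "nucleus_c": (0.80, 0.20),
}
ranking = rank_candidates(candidates, weights, bias)

if __name__ == "__main__":
    for prob, name in ranking:
        print(f"{name}: {prob:.3f}")
```

In a ranked-retrieval setting only the top of the list would be passed on to the downstream measurements, so the ordering matters more than the absolute probabilities.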
To further demonstrate how informatics aids in uncovering latent information, we present experiments on non-image-based applications. Particular focus is given to high-dimensional data, which is not only complex to visualize and understand, but also presents a high level of difficulty for many pattern recognition and machine-learning techniques. Dimensionality reduction methods seek to overcome this impediment by embedding data into a low-dimensional subspace, a process that has proven useful in several computer-aided pathology applications. We developed a divisive edge-removal method, based on betweenness centrality, to identify manifold-shorting edges in graphs. These unwanted edges improperly link regions of one or more manifolds, causing errors in the embedding. A decrease in residual embedding error was demonstrated on both synthetic and real data, including a colon cancer gene microarray dataset.
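The general idea of divisive edge removal by betweenness centrality can be sketched as follows. This is an illustrative sketch only: it computes edge betweenness on an unweighted graph with Brandes-style accumulation and removes a fixed number of top-ranked edges, whereas the stopping criterion and graph construction used in the dissertation are not described in this abstract. The toy graph and the function names are assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        dist = {s: 0}
        sigma = {n: 0.0 for n in adj}
        sigma[s] = 1.0
        preds = {n: [] for n in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in bet.items()}


def remove_top_edges(adj, n_remove=1):
    """Repeatedly delete the edge with the highest betweenness."""
    adj = {u: set(vs) for u, vs in adj.items()}
    removed = []
    for _ in range(n_remove):
        bet = edge_betweenness(adj)
        if not bet:
            break
        edge = max(bet, key=bet.get)
        u, v = tuple(edge)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(tuple(sorted(edge)))
    return adj, removed


def build_toy_graph():
    """Two densely connected neighbourhoods joined by one shorting edge."""
    adj = {n: set() for n in range(8)}
    for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
        for u in group:
            for v in group:
                if u != v:
                    adj[u].add(v)
    adj[3].add(4)  # the unwanted link between the two regions
    adj[4].add(3)
    return adj


graph = build_toy_graph()
pruned, removed = remove_top_edges(graph, n_remove=1)

if __name__ == "__main__":
    print(removed)
```

In the toy graph every shortest path between the two groups crosses the single linking edge, so that edge accumulates the highest betweenness and is the first to be removed. Real neighbourhood graphs are less clear-cut, which is why a principled stopping rule would be needed in practice.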
In the final application, we develop an algorithm and supporting software to cluster and visualize high-dimensional flow cytometry data. Pathologists often perform a practice called gating, in which the physician appraises cell populations using two-dimensional plots of various biomarkers. The identification and subsequent gating of meaningful clusters is time-consuming, subjective, and limited by the constraints of viewing the data in two dimensions. To demonstrate how a computational approach can resolve these issues, we developed a novel clustering algorithm and associated software to facilitate hematologic immunophenotyping. We describe the implementation of the clustering software as a server-side application, using a cohort of chronic lymphocytic leukemia patient samples for validation.