Patrick Schmid's Computer Science PhD defense at the Massachusetts Institute of Technology. We introduce methods for creating a large, curated gene expression database geared toward data mining, and explore ways to expand this database efficiently using active learning. Leveraging our curated expression database, we adopt a holistic approach in which we characterize phenotypes in the context of a myriad of tissues and diseases. We present scalable methods that associate expression patterns with phenotypes in order to assign phenotype labels to new expression samples and to select phenotypically meaningful gene signatures. By using a nonparametric statistical approach, we identify signatures that are more precise than those from existing approaches and that accurately reveal biological processes hidden in case vs. control studies. We conclude the work by exploring the applicability of the heterogeneous expression database to analyzing clinical drugs for the purpose of drug repurposing.
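
To make the idea of nonparametric signature selection concrete, here is a minimal sketch. It is not the method from the thesis, which the abstract does not specify. It assumes a rank-based score (the Mann-Whitney U statistic scaled to [0, 1]) and a hypothetical cutoff of 0.9; the gene names, expression values, and function names are invented for illustration.

```python
from typing import Dict, List, Sequence


def rank_auc(case: Sequence[float], control: Sequence[float]) -> float:
    """Scaled Mann-Whitney U: the fraction of (case, control) pairs in which
    the case value is higher, with ties counted as half a pair."""
    wins = 0.0
    for x in case:
        for y in control:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case) * len(control))


def select_signature(
    expression: Dict[str, List[float]],
    labels: List[bool],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the source
) -> Dict[str, float]:
    """Keep genes whose rank score is near 1 (higher in samples with the
    phenotype) or near 0 (lower in samples with the phenotype)."""
    signature = {}
    for gene, values in expression.items():
        case = [v for v, has_phenotype in zip(values, labels) if has_phenotype]
        control = [v for v, has_phenotype in zip(values, labels) if not has_phenotype]
        score = rank_auc(case, control)
        if score >= threshold or score <= 1.0 - threshold:
            signature[gene] = score
    return signature


# Toy data (invented): three samples with the phenotype, three without.
expression = {
    "GENE_A": [9.1, 8.7, 9.4, 2.0, 2.3, 1.9],
    "GENE_B": [5.0, 4.8, 5.2, 5.1, 4.9, 5.0],
    "GENE_C": [1.0, 1.2, 0.9, 7.5, 7.9, 8.1],
}
labels = [True, True, True, False, False, False]

signature = select_signature(expression, labels)
print(signature)
```

Because the score depends only on the ordering of values, it makes no assumption about how expression levels are distributed, which is the property that matters when samples come from many different tissues and studies. A real analysis would also need a significance estimate and a correction for multiple testing, which this sketch omits.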