The Microarray Data Analysis System (MDAS) provides sophisticated microarray data analysis algorithms to the NIH intramural research community. Researchers who analyze gene-expression data typically perform some common operations on their data, such as normalization, clustering, and classification. In addition to these standard features, MDAS provides advanced features such as gene selection, automated annotation lookup, and microarray image simulation. HPCIO collaborates with users of MDAS and incorporates their prototype algorithms into the system, which is flexible enough to accommodate new algorithms written in a variety of languages. In addition, algorithms that take advantage of HPCIO’s high-performance computing resources are a unique aspect of MDAS. Such support for high-performance computing is not available in commercial packages for gene expression analysis.
The Microarray Data Analysis System offers a simple user interface to a powerful set of data analysis tools. Users can quickly navigate through the sets of available analyses and results. The format of the results depends on which algorithm is run, and includes two- and three-dimensional plots, dendrograms, and heat plots. Some algorithms also produce a movie that rotates a three-dimensional plot to show it from different perspectives.
In addition to searching for clusters, the hierarchical clustering feature sorts the data, bringing genes and experiments with similar profiles together. In this figure, data were simulated as two classes of experiments, each having a Gaussian distribution with constant and equal standard deviations for each gene in the experiment. There is no simulated correlation between the genes. The experiment name includes the simulated experiment class. This figure shows the data in the original order of the genes and experiments, before clustering.
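The simulation described above can be sketched as follows. This is a minimal illustration and not the code used to produce the figure: the function name `simulate_expression`, the number of genes, the number of experiments per class, the per-gene means, and the size of the shift between the two classes are all assumptions chosen for the example. Only the overall design (two classes, independent Gaussian genes, one shared standard deviation, class encoded in the experiment name) comes from the description.

```python
import random

def simulate_expression(n_genes=50, n_per_class=10, class_shift=2.0, sd=1.0, seed=0):
    """Simulate two classes of experiments with independent Gaussian genes.

    Returns (names, data), where data[i] holds the gene values for
    experiment names[i]. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Per-gene mean for class 1; class 2 shifts every gene by class_shift
    # (an assumed way of making the classes differ).
    base_means = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
    names, data = [], []
    exp_id = 0
    for cls in (1, 2):
        shift = class_shift if cls == 2 else 0.0
        for _ in range(n_per_class):
            exp_id += 1
            # Each gene is drawn independently with the same standard
            # deviation, so there is no simulated correlation between genes.
            profile = [rng.gauss(mean + shift, sd) for mean in base_means]
            # The experiment name carries the simulated class, e.g. "Exp. 5-1".
            names.append(f"Exp. {exp_id}-{cls}")
            data.append(profile)
    return names, data

names, data = simulate_expression()
```

With the default (assumed) sizes, this produces 20 experiments of 50 genes each, named so that the suffix identifies the simulated class.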
This figure shows the data ordered by hierarchical clustering with both the gene and experiment dendrograms present. The “red” cluster contains experiments simulated from class 1, and the “blue” cluster contains experiments simulated from class 2. As visible in the figure, Exp. 20-2 is placed into the wrong cluster due to its proximity to Exp. 5-1.
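The reordering effect of hierarchical clustering can be illustrated with a small sketch. It assumes Euclidean distance and average linkage, which may differ from the metric and linkage that MDAS uses, and the function names are hypothetical. Each cluster keeps its members in leaf order, so the final merged cluster gives the order in which rows would appear under the dendrogram.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_order(rows):
    """Average-linkage agglomerative clustering; returns the leaf order."""
    # Each cluster is a list of original row indices, kept in leaf order.
    clusters = [[i] for i in range(len(rows))]
    dist = [[euclidean(a, b) for b in rows] for a in rows]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]

# Toy data: rows 0 and 2 are similar, and rows 1 and 3 are similar.
rows = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
order = hierarchical_order(rows)
```

In the toy data, the returned order places rows 0 and 2 next to each other and rows 1 and 3 next to each other. Applying the same function to the transposed matrix would order the other axis, which is how both genes and experiments can be sorted.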
Visualization tools such as multidimensional scaling (MDS) reduce the dimensionality of the data to two or three dimensions. The dataset is the same as described above. The MPEG movie shows the three-dimensional plot from different angles, giving different perspectives on the data.
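As a rough illustration of how MDS reduces dimensionality, the sketch below implements classical (Torgerson) MDS using only the Python standard library. Whether MDAS uses this variant of MDS is not stated, so treat the choice as an assumption. The function name `classical_mds` and the use of power iteration to find the leading eigenvectors are also choices made for this example.

```python
import math
import random

def classical_mds(points, dims=2, iters=500, seed=0):
    """Classical MDS via power iteration; returns n x dims coordinates."""
    n = len(points)
    # Squared Euclidean distances between all pairs of points.
    d2 = [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in points]
          for p in points]
    row_mean = [sum(r) / n for r in d2]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the eigenvalue for the converged vector.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, bv))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Toy data: four points that lie in a plane embedded in three dimensions.
points = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [3.0, 4.0, 0.0]]
embedding = classical_mds(points, dims=2)
```

Because the toy points are intrinsically two-dimensional, the two-dimensional embedding preserves their pairwise distances. For real expression data the embedding is only an approximation, and passing `dims=3` would give coordinates for a three-dimensional plot.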
In FY 2006, HPCIO addressed three issues that our collaborators had raised about opening MDAS to the entire NIH intramural research program. The first issue was that when users went to MDAS, the first page they saw was the NIH single-sign-on page. Immediately redirecting users away from MDAS could cause confusion. We modified our security model and created an unprotected welcome page. Now when someone visits MDAS, they see a welcome page with a login button that redirects them to the NIH single-sign-on. The second issue was providing help to users, who were not always familiar with the algorithm parameters. MDAS now provides tool-tips on algorithm input parameters. When a user moves the mouse over an input field, a tool-tip box briefly describes the meaning of the input parameter. The third issue involved correcting error messages that were produced by some of the classification algorithm modules.
We have been approached by Dr. Javed Khan, a senior investigator from NCI, who requested that a new algorithm, and possibly a new interface, be added to MDAS. Dr. Khan proposed a collaborative effort between HPCIO and his lab to incorporate his artificial neural network into MDAS.
Current and Future Work
In FY 2007, we anticipate opening up MDAS to the entire NIH intramural research community. Having made a number of improvements to the site, we believe that MDAS is now ready for NIH-wide use. We plan to add at least two algorithms to MDAS in FY 2007. The first is the hybrid K-means-hierarchical clustering algorithm that HPCIO has been developing. This algorithm will allow users to perform hierarchical clustering on very large datasets by performing a K-means clustering first. The second is the artificial neural network that Dr. Khan’s lab developed. The artificial neural network is a classification algorithm very different from other classification algorithms we currently provide. We are also looking for new collaborators who are interested in developing novel microarray data analysis algorithms.
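The general idea behind a hybrid K-means-hierarchical approach can be sketched as below. This is not HPCIO's algorithm, whose details are not given here. It is a minimal illustration under stated assumptions: plain Lloyd's K-means reduces the dataset to `k` centroids, and average-linkage hierarchical clustering is then run on the centroids alone, so the expensive pairwise step scales with `k` rather than with the number of data points. All function names and parameters are hypothetical.

```python
import math
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels

def agglomerate(items, n_groups):
    """Average-linkage merging of items into n_groups lists of indices."""
    groups = [[i] for i in range(len(items))]

    def linkage(g1, g2):
        total = sum(math.sqrt(sqdist(items[i], items[j]))
                    for i in g1 for j in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > n_groups:
        _, a, b = min((linkage(groups[a], groups[b]), a, b)
                      for a in range(len(groups))
                      for b in range(a + 1, len(groups)))
        merged = groups[a] + groups[b]
        groups = [g for i, g in enumerate(groups) if i not in (a, b)]
        groups.append(merged)
    return groups

def hybrid_cluster(points, k, n_groups, seed=0):
    """K-means first, then hierarchical clustering of the k centroids."""
    centroids, labels = kmeans(points, k, seed=seed)
    groups = agglomerate(centroids, n_groups)
    # Map each K-means cluster to the hierarchical group containing it.
    centroid_to_group = {c: g for g, members in enumerate(groups)
                         for c in members}
    return [centroid_to_group[lab] for lab in labels]

# Toy data: two well-separated groups of one-dimensional points.
points = ([[i * 0.1] for i in range(10)]
          + [[1000.0 + i * 0.1] for i in range(10)])
groups = hybrid_cluster(points, k=4, n_groups=2)
```

In the toy data, the first ten points end up in one group and the last ten in the other. In practice `k` would be chosen much smaller than the number of genes but larger than the number of clusters of interest, which is a tuning decision this sketch does not address.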
We also plan to improve the reliability and robustness of the MDAS system by decreasing the number of system failures and by improving error handling. Better error handling will help users understand what went wrong and whether they can adjust input parameters to fix the problem. If a system error occurs, better error handling will warn operators of the error and help them resolve it.
- Andreas Baxevanis, Ph.D., Deputy Scientific Director, NHGRI
- Yidong Chen, Ph.D., Staff Scientist, Cancer Genetics Branch, NCI
- Javed Khan, M.D., Senior Investigator, Pediatric Oncology Branch, NCI
Weeraratna A.T., Kalehua A., DeLeon I., Bertak D., Maher G., Wade M.S., Lustig A., Becker K.G., Wood W. III, Walker D.G., Beach T.G., and Taub D.D., “Alterations in Immunological and Neurological Gene Expression Patterns in Alzheimer's Disease Tissues,” Experimental Cell Research, accepted for publication.
S.A. Amundson, R.A. Lee, C.A. Koch-Paiz, M.L. Bittner, P. Meltzer, J.M. Trent and A.J. Fornace, Jr., “Differential responses of stress genes to low dose-rate gamma-irradiation,” Molecular Cancer Research, 1(6), pp. 445-452, 2003.
C.A. Koch-Paiz, S.A. Amundson, M.L. Bittner, P.S. Meltzer and A.J. Fornace, Jr., “Functional genomics of UV radiation responses in human cells,” Mutat. Res., 549(1-2), pp. 65-78, 2004.
S.A. Amundson, M.B. Grace, C.B. McLeland, M.W. Epperly, A. Yeager, Q. Zhan, J.S. Greenberger and A.J. Fornace, Jr., “Human in vivo radiation-induced biomarkers: gene expression changes in radiotherapy patients,” Cancer Research, 64, pp. 6368-6371, 2004.
S.A. Amundson, K.T. Do, L. Vinikoor, C.A. Koch-Paiz, M.L. Bittner, J.M. Trent, P. Meltzer and A.J. Fornace, Jr., “Stress-specific signatures: Expression profiling of p53 wild-type and null human cells,” Oncogene, 24, pp. 4572-4579, 2005.
Martino, R.L., D.E. Russ, and C.A. Johnson, “Parallel Computing in the Analysis of Gene Expression Relationships,” Chapter 15 in: A.Y. Zomaya, editor, Parallel Computing Bioinformatics, New York, N.Y.: Wiley.
- Number of users: 55
- Number of jobs run: 970
- Number of algorithms: 23