MetageneCreator
Adrian Dobra, Quanli Wang and Mike West
NOTE: IF YOU HAVE DOWNLOADED THE SOFTWARE BEFORE August 15, 2004, PLEASE
DOWNLOAD IT AGAIN AND READ THE INSTRUCTIONS BELOW.
Overview
MetageneCreator identifies overlapping clusters of genes and generates the meta-genes
associated with these clusters in arbitrarily large datasets.
Download
HERE you can download MetageneCreator. To run this package, you need MATLAB
and the Statistics Toolbox.
Input Parameters
After you unzip the package, edit the file “parameters.m” and specify the
following information:
· “covfiles”: The directory where the covariance models generated with HdBCS
are located. MetageneCreator can run without the covariance models, but the
resulting clusters will not overlap and are likely to be smaller. HERE you can
download an example.
· “workdirectory”: The directory where your dataset is located and where the
output files are saved.
· “datafile”: The name of your gene expression dataset. Rows correspond to
samples and columns correspond to variables. The expression levels are assumed
to be on a log2 scale.
· “annotationfile”: This file gives a short description of each probe from
the “datafile”. The description of the variable in column k of “datafile” is
found on row k of “annotationfile”. The ID of this variable (probe) is k.
· “resultsfile”: The file where your clusters are saved. The first column
gives the ID of each probe. The second column gives the identifier of the
group each probe belongs to. The third column gives the description of each
probe, if you have loaded one (the parameter “annotationfile”).
· “metavarsfile”: The name of the file in which the meta-genes are saved. As
before, rows correspond to samples and columns correspond to meta-genes. The
meta-genes are quantile-normalized and scaled (sample mean equal to zero and
sample variance equal to one).
· “metavarslabelfile”: The file containing the label attached to each
meta-gene. If a group has at least two variables, its name starts with “M”.
Otherwise its name starts with “V”.
· “maxgroupsize”: The maximum number of variables to be processed at each
iteration. Make this number as large as the capabilities of your computer
allow.
· “maxclustersize”: The maximum size allowed for a cluster.
· “minpvexplained”: A cluster is not saved unless the first principal
component (the meta-gene) explains at least “minpvexplained” percent of the
variation within the group. A higher value of this parameter increases the
number of clusters produced.
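As a rough illustration of the “minpvexplained” check and of the scaling
described under “metavarsfile”, the following Python sketch computes the first
principal component of one cluster, the percentage of variation it explains,
and a standardized meta-gene. It is not part of the MATLAB package: the
function names, the toy data, the threshold value of 80 and the numbering after
the “M”/“V” prefix are assumptions made for illustration, the quantile
normalization step is omitted, and MetageneCreator's own implementation may
differ.

```python
import numpy as np


def first_pc_metagene(cluster_data):
    """Return (scores, percent_explained) for a samples-by-genes array.

    Hypothetical helper: the meta-gene is taken to be each sample's score
    on the first principal component of the column-centered data.
    """
    x = np.asarray(cluster_data, dtype=float)
    centered = x - x.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variance captured by each principal component.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    total = np.sum(s ** 2)
    percent_explained = 100.0 * s[0] ** 2 / total if total > 0 else 0.0
    return u[:, 0] * s[0], percent_explained


def standardize(v):
    """Scale to sample mean zero and sample variance one."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


def metagene_label(group_index, group_size):
    """Prefix "M" for groups of two or more variables, otherwise "V".

    The numbering after the prefix is an assumption of this sketch.
    """
    prefix = "M" if group_size >= 2 else "V"
    return f"{prefix}{group_index}"


# Toy cluster: 20 samples, 3 genes sharing one signal plus small noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=20)
cluster = np.column_stack(
    [signal + 0.1 * rng.normal(size=20) for _ in range(3)]
)

minpvexplained = 80  # hypothetical threshold, in percent
scores, pv = first_pc_metagene(cluster)
keep_cluster = pv >= minpvexplained
metagene = standardize(scores)
label = metagene_label(1, cluster.shape[1])
print(f"{label}: percent explained = {pv:.1f}, saved = {keep_cluster}")
```

In this sketch a cluster whose first principal component falls below the
threshold would simply not be saved; how the package handles the variables of
such a cluster is not shown here.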
Running MetageneCreator
You can either type “metagenecreator” in a MATLAB session, after setting the
current directory to the package directory, or run the program in batch mode
with the following command:
matlab < metagenecreator.m > mylogfile.log &
References
· Dobra, A., Wang, Q. and West, M. (2004). Graphical model-based gene
clustering and metagene expression analysis. Manuscript submitted to
Bioinformatics.