BIGS   BioInformatics Group Seville




SATuRNo: Supervised prognostic Approach Through Regression Networks

  • SATuRNo: a Java stand-alone application. The software is available upon request (inepomuceno at us dot es)
  • Motivation

    The application of information encoded in molecular networks for prognostic purposes is a crucial
    objective of systems biomedicine. This approach has not been widely investigated in the cardiovascular
    research area. Within this area, the prediction of clinical outcomes after suffering a heart attack would
    represent a significant step forward. We developed a new supervised prediction method for this prognostic
    problem based on the discovery of clinically relevant transcriptional association networks. The method
    integrates regression trees and clinical class-specific networks, and can be applied to other clinical domains.

    [Figure: Schematic view of the proposed method.]

  • Application: Download (Linux/Windows), a zip file that includes the Linux/Windows executables and an example dataset.
    IMPORTANT: Set the JAVA_HOME environment variable to point to the JDK installation, then
    use prepare_data.bat and SATuRNo.bat
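
    Before launching the scripts, it can help to confirm that JAVA_HOME is set and contains a Java launcher. The following is a minimal sketch in Python, not part of SATuRNo; the function name find_java is hypothetical, and the check only looks for bin/java (or bin/java.exe on Windows), so it cannot tell a JDK from a JRE.

```python
import os
from pathlib import Path


def find_java(env=None):
    """Return the Java launcher found under JAVA_HOME, or None if absent.

    `env` defaults to the process environment; passing a dict makes the
    function easy to try out without changing real settings.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None  # variable missing or empty
    for name in ("java", "java.exe"):  # Linux and Windows launcher names
        candidate = Path(java_home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    launcher = find_java()
    if launcher is None:
        print("JAVA_HOME is not set or does not contain bin/java")
    else:
        print(f"Java launcher found: {launcher}")
```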

  • Usage summary:

    The zip file contains the following folders:

    • \IN: a small example. This folder holds the two datasets, one for each of the two groups of patients (one group per clinical class)
    • \OUT: folder with output files. Here you can find the accuracy of the networks and the .sif files used to visualize the networks
  • To run the application

    First, SATuRNo prepares the dataset so that leave-one-out cross-validation can be applied
    java -jar prepare_data.jar [inputs]

    • Name of the file with the gene expression profiles of a group of patients
    • Clinical class or category of the group of patients
    • Example: java -jar prepare_data.jar dataExample_class1.arff ONE
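
    Leave-one-out cross-validation holds out one patient at a time and builds the model on all the others. The sketch below illustrates the idea in Python; it is a generic illustration, not the code inside prepare_data.jar, and the function name leave_one_out_splits is hypothetical.

```python
def leave_one_out_splits(n_patients):
    """Yield (training_indices, held_out_index), one pair per patient."""
    for held_out in range(n_patients):
        # Every patient except the held-out one is used for training.
        training = [i for i in range(n_patients) if i != held_out]
        yield training, held_out


if __name__ == "__main__":
    for training, held_out in leave_one_out_splits(4):
        print(f"train on {training}, test on patient {held_out}")
```

    Each patient is used exactly once as the test case, so a group of n patients yields n train/test splits.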

    Second, run the method
    java -jar SATuRNo.jar [inputs]

    • 1. Clinical class or category of the first group of patients
    • 2. Number of patients in the first clinical class
    • 3. Clinical class or category of the second group of patients
    • 4. Number of patients in the second clinical class
    • 5. Theta threshold (used in the pruning phase of the model trees)
    • 6. Distance between the true gene expression value (observed in the dataset) and the predicted value. There are two options: “absolute_error” and “relative_error”
    • Example: java -jar SATuRNo.jar ONE 12 TWO 10 35 absolute_error
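
    This page does not give formulas for the two distance options. The sketch below uses the common textbook definitions of absolute and relative error, which is an assumption and may differ from what SATuRNo computes internally. The Python function names simply mirror the option names.

```python
def absolute_error(observed, predicted):
    """Assumed definition: |observed - predicted|."""
    return abs(observed - predicted)


def relative_error(observed, predicted):
    """Assumed definition: |observed - predicted| / |observed|."""
    if observed == 0:
        # The ratio is undefined when the observed value is zero.
        raise ValueError("relative error is undefined for an observed value of 0")
    return abs(observed - predicted) / abs(observed)


if __name__ == "__main__":
    observed, predicted = 8.0, 6.0  # illustrative expression values
    print(absolute_error(observed, predicted))
    print(relative_error(observed, predicted))
```

    Under these definitions the relative error scales the difference by the observed expression level, so the same absolute gap counts for less on a highly expressed gene.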

    Finally, after running prepare_data.jar and SATuRNo.jar in a shell, the results are written to the folder \OUT

    • 1. A .txt file that reports the accuracy
    • 2. Two .sif files that contain the gene association networks (inferred from the gene expression data of patients with the same clinical category)
    • 3. Other binary files used by the tool
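
    The .sif files can be opened in a network viewer such as Cytoscape, or read directly. The sketch below is a minimal Python reader for the simple interaction format, where each line has the form "source interaction target [target ...]". The function name parse_sif, the gene names and the "pp" edge label in the sample are hypothetical, and the exact labels SATuRNo writes are not documented here. As a simplification, the delimiter is chosen per line (tab if the line contains one, otherwise any whitespace).

```python
def parse_sif(text):
    """Return a list of (source, interaction, target) edges from SIF text."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a lone node name declares a node without edges
        source, interaction, *targets = fields
        # One line may list several targets for the same source.
        for target in targets:
            edges.append((source, interaction, target))
    return edges


if __name__ == "__main__":
    sample = "GENE_A pp GENE_B\nGENE_B pp GENE_C GENE_D\nGENE_E\n"
    for edge in parse_sif(sample):
        print(edge)
```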

  • Datasets

    You can use

    • A small example in folder "\IN"
    • The benchmark dataset reported in [1]. The experiments reported in the paper focus on a pre-processed version of this dataset
    • The heart dataset: (blood-derived) gene expression data generated by microarray experiments at the Laboratory of Cardiovascular Research, Public Research Centre for Health, Luxembourg (Gene Expression Omnibus, GEO, accession number: GSE11947)

    [1] Dunckley et al. Gene expression correlates of neurofibrillary tangles in Alzheimer's disease. Neurobiol. Aging 2006;27:1359-1371.