We applied the same preprocessing steps to these inherited CNVs, resulting in 156 regions affecting 418 genes. To perform the NETBAG analysis, we built a background network connecting all pairs of human genes. Every gene pair in this network was assigned a score proportional to the log of the ratio of the likelihood that the two genes participate in the same
genetic phenotype to the likelihood that they do not (see Supplemental Experimental Procedures). Importantly, although similar in spirit to integrative methods that have been used previously to build functional networks in several model species (Lee et al., 2004; Lee et al., 2008), the edges in our network represent the likelihood to participate in the same genetic phenotype rather than to share a functional and molecular interaction. The likelihood network was built using, as a positive gold standard, the carefully curated set of human genes compiled recently by Feldman et al. (2008). This set contains 476 human genes associated with 132 different genetic phenotypes. As a negative gold standard, we used a set of randomly selected pairs of human genes that are not known to be associated
with identical disease phenotypes. Importantly, no genes previously implicated in ASD or any biologically related functions were used in the network construction. The likelihood score was derived by naive Bayesian integration of various descriptors of protein function: shared GO annotations, participation in the same KEGG pathways, shared protein domains in InterPro, direct protein-protein interactions and shared interaction partners from multiple
databases (BIND, BioGRID, DIP, HPRD, InNetDB, IntAct, BiGG, MINT, and MIPS), sequence homology between the gene pair calculated using BLAST (Altschul et al., 1997), and two measures of similarity in coevolutionary patterns: phylogenetic profile similarity and chromosomal coclustering across genomes (Chen and Vitkup, 2006). We cross-validated the quality of the background network by showing that it can be successfully used to prioritize (rank) the genes located within a chromosomal region across a variety of genetic phenotypes (see Supplemental Experimental Procedures for details). To score a cluster of genes in the network (Figure 1), we combined the scores for all gene pairs forming the cluster in two ways. First, we multiplied the corresponding likelihoods (network edges) directly, which is conceptually equivalent to assuming that all connections within the cluster are independent; we refer to this procedure as the naive scoring scheme. Second, we applied a simple deweighting scheme used previously for functional data integration (Lee et al., 2004).
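As an illustration only, the naive scoring scheme described above can be sketched in a few lines of Python. The sketch is not the NETBAG implementation. It assumes natural logarithms, a proportionality constant of 1, and per-descriptor likelihood ratios that combine multiplicatively under the naive Bayes independence assumption. The function names (`edge_log_lr`, `naive_cluster_score`), the gene labels, and all numerical values are hypothetical. Because each edge carries a log-likelihood ratio, multiplying the likelihoods over all gene pairs in a cluster corresponds to summing the edge scores.

```python
import math
from itertools import combinations


def edge_log_lr(feature_likelihood_ratios):
    """Edge score for one gene pair under a naive Bayes assumption.

    Each entry is a (hypothetical) likelihood ratio for one descriptor,
    e.g. shared GO annotation or shared KEGG pathway. Treating descriptors
    as independent, the ratios multiply, so their logs add.
    """
    return sum(math.log(lr) for lr in feature_likelihood_ratios)


def naive_cluster_score(genes, edge_scores):
    """Naive cluster score: sum of log-likelihood edge scores over all pairs.

    `genes` must contain distinct gene labels; `edge_scores` maps a
    frozenset of two labels to that pair's log-likelihood ratio.
    Summing logs is equivalent to multiplying the likelihood ratios,
    i.e. to treating all connections in the cluster as independent.
    """
    return sum(edge_scores[frozenset(pair)] for pair in combinations(genes, 2))


# Toy background network with made-up likelihood ratios (assumption).
toy_edges = {
    frozenset(("GENE_A", "GENE_B")): edge_log_lr([4.0, 2.0]),
    frozenset(("GENE_A", "GENE_C")): edge_log_lr([0.5, 2.0]),
    frozenset(("GENE_B", "GENE_C")): edge_log_lr([3.0]),
}

cluster_score = naive_cluster_score(["GENE_A", "GENE_B", "GENE_C"], toy_edges)
print(cluster_score)
```

In this toy example the combined likelihood ratios of the three edges are 8, 1, and 3, so the cluster score equals the logarithm of their product, 24. The deweighting scheme is not sketched here because its details are given in Lee et al. (2004) rather than in this text.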