    known cancer genes as heuristics to prioritize cancer-related genes. Zhou et al. [19] applied a systems biology approach that combined differential expression analysis and weighted gene co-expression network analysis (WGCNA) to detect colon cancer-related miRNA and gene modules.
    However, current methods still have limitations. Most studies handle coding genes and miRNAs separately, ignoring the regulatory relationships between them. In addition, most methods rely on a single data type because of computational complexity, especially for large datasets. Although large-scale data enable better identification of new cancer-related genes and miRNAs, few methods can process large datasets in a time-efficient way. Comprehensive and time-efficient methods are increasingly needed as more data become available.
    With the increasing availability of multi-dimensional biological datasets for the same samples (e.g., gene expression, miRNAs, and copy numbers), it becomes possible to systematically understand the regulatory mechanisms in cancer [20]. Many studies have been conducted in this way [21]. For example, Freiesleben et al. [22] combined analyses of miRNA and gene expression profiles to uncover pathways potentially involved in multiple sclerosis. Zhang et al. [23] developed a multiple nonnegative matrix factorization framework to integrate gene and miRNA expression data for identifying miRNA-gene regulatory comodules. Liu et al. [24] used mRNA and miRNA microarray datasets to identify potential pancreatic cancer-related genes. It is important to discover sets of co-expressed genes and miRNAs that represent a functional gene module [25]. Research shows that joint analysis of expression data on the same set of samples from multiple omics sources has the potential to achieve more comprehensive results than separate analyses [26,27]. On the other hand, many co-expression relationships are condition specific. Biclustering was developed to cluster a subset of genes that have similar expression in a subset of conditions [28]. For example, Fiannaca et al. [29] used a biclustering approach (the ISA algorithm) to simultaneously select a subset of features that characterizes a subset of samples based on a “local similarity” criterion, and applied it to the differential expression of miRNAs in breast cancer samples. They found 12 different miRNA clusters associated with specific groups of patients, and concluded that clustering miRNAs according to tumour subclass can help better define the potential role of miRNAs as prognostic, diagnostic and therapeutic markers. However, the biclustering problem is NP-complete [30] and very time-consuming to compute.
A promising method to address this issue is the recently developed unsupervised learning approach Rectified Factor Networks (RFN), a generalized alternating minimization algorithm based on the posterior regularization method. RFN efficiently constructs very sparse, non-linear, high-dimensional representations of the input via their posterior means [31]. To speed up computation, RFN performs a gradient step in both the E-step and the M-step and uses GPU implementations. As a result, RFN can extract thousands of biclusters from a very large matrix in a short time.
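To make the link between a sparse factor representation and biclusters concrete, the following toy sketch reads one bicluster off each factor: the samples in which the factor is active, together with the features that load strongly on it. It is an illustration only and is not the RFN algorithm; the function name `extract_biclusters`, the two thresholds, and the small matrices are hypothetical choices made for this example.

```python
# Toy illustration only: reading biclusters off a sparse factorization.
# This is NOT the RFN algorithm; names and thresholds are hypothetical.

def extract_biclusters(codes, loadings, code_thresh=0.0, loading_thresh=0.5):
    """Return one (features, samples) pair per factor.

    codes[s][k]    : non-negative activation of factor k in sample s
    loadings[f][k] : weight of feature f (coding gene or miRNA) on factor k
    """
    n_factors = len(codes[0])
    biclusters = []
    for k in range(n_factors):
        # Samples in which factor k is switched on.
        samples = [s for s, row in enumerate(codes) if row[k] > code_thresh]
        # Features whose absolute loading on factor k is large.
        features = [f for f, row in enumerate(loadings)
                    if abs(row[k]) > loading_thresh]
        biclusters.append((features, samples))
    return biclusters


# 4 samples x 2 factors (sparse, non-negative codes)
codes = [
    [1.2, 0.0],
    [0.8, 0.0],
    [0.0, 2.1],
    [0.0, 0.0],
]
# 5 features x 2 factors
loadings = [
    [0.9, 0.0],
    [0.7, 0.1],
    [0.0, -0.8],
    [0.0, 0.6],
    [0.1, 0.0],
]

biclusters = extract_biclusters(codes, loadings)
for features, samples in biclusters:
    print("features", features, "samples", samples)
```

In this toy input, the sparser the codes and loadings are, the smaller and more specific the resulting biclusters become, which is why a very sparse representation is attractive for this task.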
    In this paper, we propose a new method, rfnGMI (rectified factor network for cancer-related coding Gene, MiRNA and their Interactions detection), which applies RFN to the combined expression profiles of miRNAs and coding genes from the same set of samples (Supplemental File). By analyzing the combined expression profiles, RFN clusters sets of functionally related coding genes and miRNAs together, so that the regulatory relationships between miRNAs and coding genes can be identified. To detect cancer-related coding genes, miRNAs and their interactions, only biclusters specific to the studied cancer type (breast cancer in this study) are considered. The selected biclusters are prioritized by their differential expression and differential correlation values, protein–protein interaction (PPI) data, and overlaps with generally known cancer marker coding genes and miRNAs. To obtain a more robust result, a rank fusion process combines the multiple ranking results into a final comprehensive rank.
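As a minimal sketch of what a rank fusion step can look like, the example below combines several rankings of the same biclusters by their mean rank position (a Borda-style rule). The fusion rule actually used by rfnGMI is not specified in this section, so the mean-rank rule, the function name `fuse_ranks`, and the bicluster identifiers `B1` to `B4` are assumptions made purely for illustration.

```python
# Illustrative rank fusion by mean rank position (Borda-style).
# The rule and all names here are assumptions, not the published method.

def fuse_ranks(rankings):
    """Fuse several best-to-worst orderings of the same items.

    Returns the items sorted by mean rank position (lower is better);
    ties are broken alphabetically so the output is deterministic.
    """
    items = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != items:
            raise ValueError("all rankings must contain the same items")
    mean_rank = {
        item: sum(ranking.index(item) for ranking in rankings) / len(rankings)
        for item in items
    }
    return sorted(items, key=lambda item: (mean_rank[item], item))


# Hypothetical rankings of four biclusters under the four criteria.
by_diff_expression = ["B2", "B1", "B3", "B4"]
by_diff_correlation = ["B1", "B2", "B4", "B3"]
by_ppi = ["B2", "B3", "B1", "B4"]
by_known_markers = ["B2", "B1", "B3", "B4"]

fused = fuse_ranks(
    [by_diff_expression, by_diff_correlation, by_ppi, by_known_markers]
)
print(fused)
```

A mean-rank rule is chosen here because it needs no score normalization across criteria with different scales; other fusion rules (for example, weighted or median-based ones) would slot into the same interface.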