Provides R and C++ functions that enable the user to conduct multiple kernel learning (MKL) and cross-validation for support vector machine (SVM) models. Cross-validation can be used to identify kernel shapes and hyperparameter combinations that serve as candidate kernels for MKL. Three implementations are provided in this package: SimpleMKL, Simple and Efficient MKL, and Dual Augmented Lagrangian MKL. These methods identify the convex combination of candidate kernels that yields an optimal separating hyperplane.
Reference: Wilson, C. M., et al. (2019). "Multiple-kernel learning for genomic data mining and prediction." BMC Bioinformatics 20(1): 426.
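The convex combination of candidate kernels described above can be illustrated with a short C++ sketch. It assumes two hypothetical candidate kernels (a Gaussian RBF kernel and a linear kernel) and a fixed, user-supplied weight vector. It does not learn the weights, which is the step SimpleMKL and the other MKL solvers perform, and the function names are illustrative rather than the package's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Kernel (Gram) matrix stored as a vector of rows; X rows are samples.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical candidate 1: Gaussian (RBF) kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2)).
Matrix rbf_kernel(const Matrix& X, double sigma) {
    const std::size_t n = X.size();
    Matrix K(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sq = 0.0;
            for (std::size_t f = 0; f < X[i].size(); ++f) {
                const double d = X[i][f] - X[j][f];
                sq += d * d;
            }
            K[i][j] = std::exp(-sq / (2.0 * sigma * sigma));
        }
    }
    return K;
}

// Hypothetical candidate 2: linear kernel k(x, z) = <x, z>.
Matrix linear_kernel(const Matrix& X) {
    const std::size_t n = X.size();
    Matrix K(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t f = 0; f < X[i].size(); ++f)
                K[i][j] += X[i][f] * X[j][f];
    return K;
}

// Convex combination K = sum_m d_m * K_m with d_m >= 0 and sum_m d_m = 1.
Matrix combine_kernels(const std::vector<Matrix>& kernels, const std::vector<double>& weights) {
    assert(!kernels.empty() && kernels.size() == weights.size());
    double total = 0.0;
    for (double w : weights) {
        assert(w >= 0.0);  // weights must be nonnegative
        total += w;
    }
    assert(std::fabs(total - 1.0) < 1e-9);  // weights must lie on the simplex
    const std::size_t n = kernels[0].size();
    Matrix K(n, std::vector<double>(n, 0.0));
    for (std::size_t m = 0; m < kernels.size(); ++m) {
        assert(kernels[m].size() == n);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                K[i][j] += weights[m] * kernels[m][i][j];
    }
    return K;
}
```

Because every weight is nonnegative and the weights sum to one, the combined matrix stays positive semidefinite whenever each candidate kernel is, so it remains a valid kernel for the SVM.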
lncDIFF is a powerful differential analysis tool for low-abundance non-coding RNA expression data. The method is compatible with various existing RNA-Seq quantification and normalization tools. lncDIFF is implemented as an R package.
Download/Install at: https://github.com/qianli10000/lncDIFF.
References: Li Q, ... Wang X. lncDIFF: a novel quasi-likelihood method for differential expression analysis of non-coding RNA. BMC Genomics 20: 539 (2019).
Efficient procedures for adaptive LASSO and network-regularized regression for Gaussian, logistic, and Cox models. Provides a network estimation procedure (a combination of the methods proposed by Ucar et al. (2007) and Meinshausen and Buhlmann (2006)), cross-validation, and the stability selection methods proposed by Meinshausen and Buhlmann (2010) and Liu, Roeder and Wasserman (2010) <arXiv:1006.3316>. An interactive R app is also available.
CRAN link/download: https://cran.r-project.org/web/packages/glmaag/index.html
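As a rough illustration of the adaptive LASSO component only, the sketch below fits the Gaussian case by cyclic coordinate descent with adaptive weights w_j = 1/|b_init_j|^gamma. It assumes centered data with no intercept, omits the network penalty and the logistic and Cox losses, and uses hypothetical function names; it is not the glmaag implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = observations, columns = predictors

// Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0).
double soft_threshold(double z, double t) {
    if (z > t) return z - t;
    if (z < -t) return z + t;
    return 0.0;
}

// Adaptive weights w_j = 1 / |b_init_j|^gamma from an initial estimate
// (e.g. OLS or ridge); a zero initial coefficient gets an infinite penalty.
std::vector<double> adaptive_weights(const std::vector<double>& b_init, double gamma) {
    std::vector<double> w;
    for (double b : b_init)
        w.push_back(b == 0.0 ? std::numeric_limits<double>::infinity()
                             : 1.0 / std::pow(std::fabs(b), gamma));
    return w;
}

// Cyclic coordinate descent for the Gaussian adaptive LASSO objective
//   (1 / (2n)) * ||y - X b||^2 + lambda * sum_j w_j * |b_j|,
// assuming y and the columns of X are centered (no intercept term).
std::vector<double> adaptive_lasso(const Matrix& X, const std::vector<double>& y,
                                   const std::vector<double>& w, double lambda,
                                   int max_iter = 1000, double tol = 1e-10) {
    const std::size_t n = X.size();
    assert(n > 0 && y.size() == n && lambda >= 0.0);
    const std::size_t p = w.size();
    std::vector<double> b(p, 0.0);
    std::vector<double> r = y;  // current residual y - X b (b starts at zero)
    for (int it = 0; it < max_iter; ++it) {
        double max_change = 0.0;
        for (std::size_t j = 0; j < p; ++j) {
            if (std::isinf(w[j])) continue;  // infinite penalty keeps b_j at zero
            double xx = 0.0, xr = 0.0;
            for (std::size_t i = 0; i < n; ++i) {
                xx += X[i][j] * X[i][j];
                xr += X[i][j] * r[i];
            }
            if (xx == 0.0) continue;  // constant-zero column carries no information
            // Least-squares target for b_j on the partial residual r + x_j * b_j.
            const double z = (xr + xx * b[j]) / n;
            const double b_new = soft_threshold(z, lambda * w[j]) / (xx / n);
            const double delta = b_new - b[j];
            for (std::size_t i = 0; i < n; ++i) r[i] -= X[i][j] * delta;  // keep residual in sync
            b[j] = b_new;
            max_change = std::max(max_change, std::fabs(delta));
        }
        if (max_change < tol) break;  // converged
    }
    return b;
}
```

Large initial coefficients receive small weights and are shrunk less, while small ones are penalized heavily and tend to be set exactly to zero, which is what gives the adaptive LASSO its variable-selection behavior.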
Performs family-based association tests via a GEE kernel machine score test.
Latest GitHub link/install: https://github.com/xfwang/gskat
Wang X. et al. GEE-Based SNP Set Association Test for Continuous and Discrete Traits in Family-Based Association Studies. Genetic Epidemiology (2013) 37: 778-786
Wang X. et al. Rare variant association test in family-based sequencing studies. Briefings in Bioinformatics (2017) 18:954-961
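The general shape of a kernel machine score statistic can be sketched as Q = r' K r, where r holds null-model residuals and K = G W G' is a weighted linear kernel on the genotype matrix G. This C++ sketch assumes an intercept-only null model and an independence working correlation, so it ignores the within-family correlation that the GEE framework accounts for, and it stops before the p-value (for SKAT-type statistics this is typically obtained from a mixture-of-chi-squares approximation). The function name is hypothetical and not part of gskat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = subjects, columns = SNPs

// Q = r' G W G' r = sum_j w_j * (G_j' r)^2, with r = y - mean(y) under an
// intercept-only null model and a weighted linear kernel K = G W G'.
double kernel_score_statistic(const Matrix& G, const std::vector<double>& y,
                              const std::vector<double>& snp_weights) {
    const std::size_t n = y.size();
    assert(n > 0 && G.size() == n);
    const std::size_t p = snp_weights.size();
    double mu = 0.0;  // null-model fitted mean
    for (double v : y) mu += v;
    mu /= n;
    double q = 0.0;
    for (std::size_t j = 0; j < p; ++j) {
        double s = 0.0;  // j-th element of G' r (score for SNP j)
        for (std::size_t i = 0; i < n; ++i) {
            assert(G[i].size() == p);
            s += G[i][j] * (y[i] - mu);
        }
        q += snp_weights[j] * s * s;
    }
    return q;
}
```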
SCNV provides functions for performing CNV analysis with single-cell DNA sequencing data. The current pipeline primarily facilitates binless segmentation of single-cell sequencing data based on a nonhomogeneous Poisson process (NHPP). The resulting CNV breakpoints may be used as surrogates for SNVs.
Wang X. et al. DNA copy number profiling using single cell sequencing. Briefings in Bioinformatics (2018) 19:731-6
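To give a sense of what binless segmentation means, the sketch below treats read positions as events of a Poisson process with piecewise-constant intensity (one simple special case of an NHPP) and scans every read position as a candidate single breakpoint with a likelihood ratio statistic. It assumes sorted, distinct positions on one chromosome of known length and ignores mappability, GC bias, and multiple breakpoints; the names are hypothetical and this is not the SCNV algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct ChangePoint {
    std::size_t index;  // events 0..index fall in the left segment
    double position;    // breakpoint location tau
    double lr_stat;     // 2 * (split log-likelihood - null log-likelihood)
};

// Profile log-likelihood of m events on a segment of length len under a
// homogeneous Poisson rate (MLE rate m / len): m * log(m / len) - m, or 0 if m == 0.
double segment_loglik(double m, double len) {
    return m > 0.0 ? m * std::log(m / len) - m : 0.0;
}

// Binless scan: every read position is a candidate breakpoint, no binning needed.
ChangePoint scan_single_changepoint(const std::vector<double>& pos, double chrom_len) {
    const std::size_t n = pos.size();
    assert(n >= 2);
    for (std::size_t i = 0; i < n; ++i) {
        assert(pos[i] > 0.0 && pos[i] <= chrom_len);
        if (i > 0) assert(pos[i] > pos[i - 1]);  // sorted, distinct positions
    }
    const double null_ll = segment_loglik(static_cast<double>(n), chrom_len);
    ChangePoint best{0, pos[0], -std::numeric_limits<double>::infinity()};
    for (std::size_t k = 0; k + 1 < n; ++k) {
        const double tau = pos[k];
        const double left = static_cast<double>(k + 1);       // events in (0, tau]
        const double right = static_cast<double>(n - k - 1);  // events in (tau, chrom_len]
        const double ll = segment_loglik(left, tau) + segment_loglik(right, chrom_len - tau);
        const double stat = 2.0 * (ll - null_ll);
        if (stat > best.lr_stat) best = {k, tau, stat};
    }
    return best;
}
```

Recursively applying such a scan to each resulting segment (binary segmentation) is one common way to extend it to multiple breakpoints.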
CLOSE-R is a toolkit, implemented in R, for CNA and LOH analysis (as well as CLOnality analysis) with SEquencing data. The current pipeline primarily facilitates the analysis of paired tumor and normal samples. It consists of three major components: (1) ASCN (allele-specific copy number) estimation using a model-free approach (distance-based Chinese Restaurant Process) or a model-based approach (MAP, maximum a posteriori); (2) global purity and ploidy estimation; and (3) genome-wide ASCN visualization.
Reference: Wang X et al. Global copy number profiling of cancer genomes. Bioinformatics 2016 32:926-928
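The link between purity, ploidy, and allele-specific copy number can be made concrete with the widely used mixture model for a tumor sample of purity rho and ploidy psi: the expected depth ratio is (rho*(nA+nB) + 2(1-rho)) / (rho*psi + 2(1-rho)), and the expected B-allele fraction at a germline heterozygous SNP is (rho*nB + (1-rho)) / (rho*(nA+nB) + 2(1-rho)). The C++ sketch below evaluates these expectations. It assumes a diploid normal genome and is offered as a generic illustration with hypothetical names, not as CLOSE-R's own model or API.

```cpp
#include <cassert>

struct ExpectedSignal {
    double depth_ratio;  // tumor/normal read-depth ratio, normalized by ploidy
    double baf;          // B-allele fraction at a germline heterozygous SNP
};

// rho: tumor purity in (0, 1]; psi: average tumor ploidy;
// nA, nB: tumor allele-specific copy numbers of the segment.
ExpectedSignal expected_signal(double rho, double psi, int nA, int nB) {
    assert(rho > 0.0 && rho <= 1.0 && psi > 0.0 && nA >= 0 && nB >= 0);
    const double total = nA + nB;
    // Average copy number in the admixed sample (tumor part + diploid normal part).
    const double mixture_cn = rho * total + 2.0 * (1.0 - rho);
    assert(mixture_cn > 0.0);  // e.g. a pure homozygous deletion gives no reads
    ExpectedSignal s;
    s.depth_ratio = mixture_cn / (rho * psi + 2.0 * (1.0 - rho));
    // Normal cells contribute one B allele each at a heterozygous SNP.
    s.baf = (rho * nB + (1.0 - rho)) / mixture_cn;
    return s;
}
```

Purity and ploidy estimation then amounts to searching for the (rho, psi) pair whose expected signals, with integer nA and nB, best match the observed depth ratios and BAFs across segments.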
Currently available bisulfite sequencing tools frequently suffer from low mapping rates and low numbers of methylation calls, especially for data generated on the Illumina NextSeq sequencer. We introduce a sequential trimming-and-retrieving alignment approach for investigating DNA methylation patterns, which significantly improves the number of mapped reads and covered CpG sites.
Reference: Wang X, et al. A trimming-and-retrieving alignment scheme for bisulfite sequencing data. Bioinformatics (2015) 31(12):2040-2042.
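The general idea of iterative 3' trimming and realignment can be illustrated with a toy C++ sketch. Like many bisulfite aligners, it converts C to T in silico in both the read and the reference before matching; when a read fails to map, it trims bases from the 3' end and retries until a minimum length is reached. It assumes exact, unique matching on the forward strand only, does not model the retrieving step, and uses hypothetical names; it is not the published alignment scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In-silico bisulfite conversion: every C becomes T (forward strand only).
std::string c_to_t(std::string s) {
    for (char& ch : s)
        if (ch == 'C') ch = 'T';
    return s;
}

struct TrimAlignResult {
    bool mapped;
    std::size_t ref_pos;   // 0-based start on the reference
    std::size_t kept_len;  // read length remaining after 3' trimming
};

// Toy aligner: exact, unique match of the converted read against the converted
// reference. On failure, trim `step` bases from the 3' end and retry while the
// read is at least `min_len` bases long.
TrimAlignResult trim_and_align(const std::string& read, const std::string& ref,
                               std::size_t min_len, std::size_t step) {
    assert(step > 0);
    const std::string conv_ref = c_to_t(ref);
    std::string conv_read = c_to_t(read);
    while (!conv_read.empty() && conv_read.size() >= min_len) {
        const std::size_t hit = conv_ref.find(conv_read);
        // Accept only unique hits; multi-mapped reads are discarded.
        if (hit != std::string::npos &&
            conv_ref.find(conv_read, hit + 1) == std::string::npos) {
            return {true, hit, conv_read.size()};
        }
        if (conv_read.size() < step) break;
        conv_read.resize(conv_read.size() - step);  // trim from the 3' end
    }
    return {false, 0, 0};
}
```

For example, trim_and_align("ACGTTAGTGGC", "AACGTTAGCCGATTGCA", 6, 1) maps only after three 3' bases are trimmed, with the remaining 8 bases aligned at 0-based position 1.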