Bioinformatics

The bioinformatics team can provide routine support for your project as a service, and undertake more complex analyses on a collaborative basis.

Experimental design

We can advise on experimental design and appropriate choice of assays and technology. Please contact us when planning your study if you would like to discuss suitable NGS platforms, numbers of replicates, sample size estimates or options for downstream analysis.

Standard data processing and QC pipelines

Sequencing data generated at the IGF are automatically processed with our standard NGS pipelines, which are primarily intended for quality control but also produce output files, including genomic alignments, that may be useful in downstream analyses. Briefly, we demultiplex the raw Illumina data using bcl2fastq, remove generic adapters and generate FASTQ files. These files are then assessed against standard quality metrics using FastQC, FastQ Screen and MultiQC, and the results are summarised in an online report.
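As a rough illustration of what a per-read quality metric looks like, the sketch below decodes the quality line of each FASTQ record and reports its mean Phred score. It assumes four-line records and Phred+33 encoding, and the function names and example reads are hypothetical; it is not how FastQC computes its metrics, only a minimal picture of the underlying idea.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from an iterable of FASTQ lines.

    Assumes well-formed four-line records (header, sequence, '+', quality).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:].split()[0], seq, qual


def mean_phred(quality, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return mean(ord(char) - offset for char in quality)


# Hypothetical example reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "ACGT\n", "+\n", "!!II\n",
]
scores = {read_id: mean_phred(qual) for read_id, _, qual in parse_fastq(example)}
print(scores)
```

A real report aggregates many such metrics across millions of reads, which is why dedicated tools are used in practice.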

RNA-seq datasets are further trimmed with fastp and aligned to the genome with STAR; gene counts are generated with featureCounts and transcripts are quantified with RSEM. Post-alignment quality metrics are collated from the log files with MultiQC. For 10X single-cell data we run the standard Cell Ranger pipeline, collate appropriate Picard and Samtools quality metrics, and provide Scanpy QC and clustering data.
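For readers who want to see how the trimming, alignment and counting steps chain together, here is a minimal sketch that only builds the command lines for one paired-end sample. The sample name, file paths, index location and thread count are hypothetical placeholders, the options shown are a small subset of what each tool accepts, and the facility's actual parameters may well differ. RSEM is left out for brevity.

```python
import shlex


def rnaseq_commands(sample, r1, r2, star_index, gtf, threads=8):
    """Return fastp, STAR and featureCounts command lines for one paired-end sample.

    All file names and the thread count are hypothetical placeholders.
    """
    trimmed_r1 = f"{sample}_R1.trimmed.fastq.gz"
    trimmed_r2 = f"{sample}_R2.trimmed.fastq.gz"
    # STAR appends this suffix to --outFileNamePrefix for coordinate-sorted BAM output.
    bam = f"{sample}.Aligned.sortedByCoord.out.bam"

    fastp = ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2,
             "--json", f"{sample}.fastp.json", "--html", f"{sample}.fastp.html"]
    star = ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
            "--readFilesIn", trimmed_r1, trimmed_r2,
            "--readFilesCommand", "zcat",
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", f"{sample}."]
    # -p marks the library as paired-end; recent Subread releases also need
    # --countReadPairs to count fragments rather than individual reads.
    counts = ["featureCounts", "-p", "-T", str(threads),
              "-a", gtf, "-o", f"{sample}.counts.txt", bam]
    return [fastp, star, counts]


commands = rnaseq_commands("sampleA", "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz",
                           "star_index", "genes.gtf", threads=4)
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as lists keeps them easy to inspect or log before anything is run, for example with `subprocess.run(cmd, check=True)`.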

Mammalian DNA-seq datasets are further trimmed with fastp, aligned to the genome with BWA, and processed with Picard and Samtools to mark duplicates, add read groups and generate quality metrics that are summarised with MultiQC. For ChIP-seq and similar epigenomic sequencing, we also include metrics from Phantompeakqualtools and deepTools.
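To make the read group and duplicate-marking steps more concrete, the following sketch builds a read group header and the corresponding command lines. It is an illustration under assumptions: the read group is supplied at alignment time through the `-R` option of `bwa mem` rather than added afterwards with Picard as described above, a `picard` wrapper script is assumed to be on the path, and all sample names, flowcell identifiers and file names are hypothetical.

```python
def read_group(sample, flowcell, lane, library=None, platform="ILLUMINA"):
    """Build an @RG header string in the form expected by `bwa mem -R`.

    bwa expects literal backslash-t separators, which it converts to tabs.
    """
    fields = [
        f"ID:{flowcell}.{lane}",
        f"SM:{sample}",
        f"LB:{library or sample}",
        f"PL:{platform}",
        f"PU:{flowcell}.{lane}.{sample}",
    ]
    return "@RG\\t" + "\\t".join(fields)


def dnaseq_commands(sample, r1, r2, reference, rg, threads=8):
    """Return (bwa, sort, markdup) command lines; file names are placeholders."""
    sorted_bam = f"{sample}.sorted.bam"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", rg, reference, r1, r2]
    # The output of bwa would be piped into samtools sort, which reads stdin via "-".
    sort = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, "-"]
    markdup = ["picard", "MarkDuplicates", f"I={sorted_bam}",
               f"O={sample}.dedup.bam", f"M={sample}.dup_metrics.txt"]
    return bwa, sort, markdup


rg = read_group("sampleA", "HXXXXXXX", 1)
bwa, sort, markdup = dnaseq_commands("sampleA", "sampleA_R1.fastq.gz",
                                     "sampleA_R2.fastq.gz", "genome.fa", rg)
print(rg)
```

Consistent read group fields matter downstream because tools such as GATK use the sample (`SM`) and library (`LB`) tags when grouping reads.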

Data analysis

We carry out custom analyses on NGS and microarray data generated at the IGF and elsewhere. Common examples include differential expression and pathway analysis of RNA-seq data, variant calling from whole genome, exome or panel sequencing data, detection of somatic mutations in cancer data, and peak finding in ChIP-seq experiments. We can also help with downloading large datasets from (and submitting your own data to) public repositories, and applying for and managing controlled access data. Please contact us for further details.

Training

We can provide advice on using bioinformatics tools, databases, online resources and pipelines appropriate to your research. We also offer more formal training on commonly used NGS analysis techniques through periodic taught courses.

RNAseq

Sequencing data will be quality checked using FastQC and aligned with TopHat2. Alignment metrics will be generated using Picard tools. The final analysis will produce a gene count table using HTSeq, and differentially expressed genes and PCA plots using DESeq2. Detailed instructions on how to access and download the data and analysis file(s), and how to review the QC results, will be provided once the analysis is completed.
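As an illustration of what a gene count table is, the sketch below merges per-sample outputs in the two-column, tab-separated format written by `htseq-count` (gene identifier, then count) into a single table with one column per sample. The sample names and counts are made up for the example, and the helper names are hypothetical; the table delivered with an analysis may be laid out differently.

```python
import csv
import io


def read_htseq_counts(handle):
    """Read one htseq-count output into {gene_id: count}.

    Summary rows such as __no_feature and __ambiguous are skipped.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def merge_counts(per_sample):
    """Combine {sample: {gene_id: count}} into a header and sorted rows."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    header = ["gene_id"] + samples
    rows = [[gene] + [per_sample[s].get(gene, 0) for s in samples] for gene in genes]
    return header, rows


# Made-up example data standing in for two htseq-count output files.
ctrl = io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n")
treated = io.StringIO("geneA\t7\ngeneB\t3\n__ambiguous\t1\n")
header, rows = merge_counts({
    "ctrl": read_htseq_counts(ctrl),
    "treated": read_htseq_counts(treated),
})
print(header)
for row in rows:
    print(row)
```

A matrix of this shape, with genes as rows and samples as columns, is the usual starting point for differential expression tools such as DESeq2.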

Genomic DNA

Sequencing data will be quality checked using FastQC and aligned with BWA-MEM. Alignment metrics will be generated using Picard and GATK tools. The final analysis will produce Variant Call Format (VCF), variant annotation and copy number variation (CNV) files using GATK, ANNOVAR and ExomeDepth respectively. Detailed instructions on how to access and download the data and analysis file(s), and how to review the QC results, will be provided once the analysis is completed.
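For a first look at a delivered VCF file, a quick tally of variant types can be useful. The sketch below is a minimal example using only the fixed VCF columns (REF is the fourth column and ALT the fifth); the classification rules and the example records are simplified assumptions, and dedicated tools should be preferred for real analyses.

```python
def summarise_vcf(lines):
    """Count SNVs, indels and other alleles across the ALT column of a VCF."""
    summary = {"snv": 0, "indel": 0, "other": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the column header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            if alt.startswith("<") or alt in {"*", "."}:
                summary["other"] += 1  # symbolic, spanning-deletion or missing allele
            elif len(ref) == 1 and len(alt) == 1:
                summary["snv"] += 1
            elif len(ref) != len(alt):
                summary["indel"] += 1
            else:
                summary["other"] += 1  # e.g. multi-nucleotide substitutions
    return summary


# Made-up records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t300\t.\tC\tT,CA\t50\tPASS\t.",
]
print(summarise_vcf(example_vcf))
```

Multi-allelic sites contribute one count per alternate allele, so the totals can exceed the number of records.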

General enquiries


For general enquiries, please email igf@imperial.ac.uk

Funded by NIHR