Uploader

Data upload is the first step in the analysis. The Uploader is used to upload different data types to the platform.

Overview

Generally, an analysis requires four data types:

  1. Sequencing data (read files)
  2. Reference genomes
  3. Annotation files
  4. Metadata

Click the icon in the Utilities menu to access the Uploader window. Each data type is uploaded from a separate tab, and every upload follows three high-level steps:

  1. Define the data type with the correct tags
  2. Verify the files: check whether they already exist in the user's account, whether the format is allowed, and whether there are name conflicts
  3. Upload

Pre-configured references and annotation files are available for the model organisms.

Fig. Data upload interface. Each tab uploads a different data type to the data store. A: Sequence data; B: References; C: Annotations; D: Metadata.

 

UPLOAD SEQUENCE DATA

Sequencing files can be uploaded through the SEQUENCE DATA (Fig. 1) tab. Complete all the fields in the form before selecting the files.

Caution - Tags should contain alphanumeric characters only.
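A tag can be checked locally before the form is filled in. The sketch below is only an illustration: it assumes that "alphanumeric" means ASCII letters and digits with no spaces, underscores, or hyphens, and the function name is_valid_tag is hypothetical, not part of the platform.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# The platform's own check may be stricter or looser than this.
TAG_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_tag(tag: str) -> bool:
    """Return True if the tag is non-empty and purely alphanumeric."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["RNAseq2024", "rna-seq", "batch 1"]:
        print(candidate, is_valid_tag(candidate))
```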

HINT - Currently allowed data file formats are fq, fastq, bam, sam, ubam, cram, hdf5, and their zipped versions. 

Ensure that each sample file is compressed separately and that the compressed file has the same name as the sample file itself.
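File names can be screened against the allowed formats before upload. The following is a minimal sketch, not the platform's own check. It assumes that "zipped versions" means a trailing .gz or .zip extension, which the text does not specify, and the helper name has_allowed_format is hypothetical.

```python
from pathlib import Path

# Formats listed in the hint above.
ALLOWED_FORMATS = {"fq", "fastq", "bam", "sam", "ubam", "cram", "hdf5"}

# Assumption: "zipped versions" means a trailing .gz or .zip.
# The platform may accept other compression extensions.
COMPRESSION_SUFFIXES = {"gz", "zip"}


def has_allowed_format(filename: str) -> bool:
    """Return True if the file name ends in an allowed (optionally compressed) format."""
    parts = Path(filename).name.lower().split(".")
    # Drop one trailing compression extension, if present.
    if len(parts) >= 2 and parts[-1] in COMPRESSION_SUFFIXES:
        parts = parts[:-1]
    # A name and a format extension must both remain.
    return len(parts) >= 2 and parts[-1] in ALLOWED_FORMATS


if __name__ == "__main__":
    for name in ["sample_1.fastq.gz", "sample.bam", "notes.txt"]:
        print(name, has_allowed_format(name))
```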

HINT - Sequencing data from other sources (NCBI SRA) should be converted to fastq/fq format.

HINT - If sequencing is multiplexed, demultiplex the data before uploading.

Forward and reverse files are paired if their names carry one of the following suffix combinations: _1/_2; _F/_R; _f/_r; -1/-2; -F/-R; -f/-r; _R1/_R2.
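To preview which files would form pairs under these suffix rules, a sketch along the following lines can be run on a list of file names. It illustrates the naming convention only and is not the platform's actual pairing logic. It assumes that the suffix sits at the end of the name, directly before the first dot, and that both mates use the same suffix style. The function name find_pairs is hypothetical.

```python
from collections import defaultdict

# Forward/reverse suffix combinations taken from the text above.
SUFFIX_PAIRS = [
    ("_R1", "_R2"), ("_1", "_2"), ("_F", "_R"), ("_f", "_r"),
    ("-1", "-2"), ("-F", "-R"), ("-f", "-r"),
]


def find_pairs(filenames):
    """Return (pairs, unpaired): pairs is a list of (forward, reverse) tuples."""
    # Key: (sample name, index of suffix style) -> {"forward": file, "reverse": file}
    mates = defaultdict(dict)
    for name in filenames:
        # Assumption: everything after the first dot is the extension.
        stem = name.split(".", 1)[0]
        for idx, (fwd, rev) in enumerate(SUFFIX_PAIRS):
            if stem.endswith(fwd):
                mates[(stem[: -len(fwd)], idx)]["forward"] = name
                break
            if stem.endswith(rev):
                mates[(stem[: -len(rev)], idx)]["reverse"] = name
                break
    # Only keep entries where both mates were found.
    pairs = [(m["forward"], m["reverse"]) for m in mates.values() if len(m) == 2]
    paired = {f for pair in pairs for f in pair}
    unpaired = [f for f in filenames if f not in paired]
    return pairs, unpaired


if __name__ == "__main__":
    files = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2-F.fq", "s2-R.fq", "s3_1.fq", "s4.fq"]
    pairs, unpaired = find_pairs(files)
    print(pairs)
    print(unpaired)
```

A file such as s3_1.fq, whose mate is missing, is reported as unpaired rather than silently treated as a forward read.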

 

 

 

 


UPLOAD REFERENCES

The latest versions of reference genomes and transcriptomes are available on the platform for eight model organisms. Pre-configured references can be explored through REFERENCES (Fig.). The owner column distinguishes pre-configured references (owned by Stanome) from custom references (owned by users).

Click the REFERENCES tab in the Uploader window to upload genome and transcriptome files.

Fig. 1. REFERENCE FILES upload window. Fields shown in red are mandatory.

 

Complete all the fields in the form before selecting the files.

  1. Add to Existing: Adds additional files (for example, a transcriptome or a genome) to an existing genome/transcriptome. This parameter is optional (Default: No).
  2. Organism*: Select the organism name.
  3. Version*: Provide the genome build (version) name or number of the reference file(s). This helps to select the correct reference version during analysis.

HINT - Allowed formats for genome or transcriptome files are fa, fasta, fna, and their compressed formats. Each file should be compressed separately.

The following actions are performed on the uploaded references:

Successfully uploaded reference files are stored in REFERENCES. Custom references can be deleted from the reference details window.

UPLOAD ANNOTATIONS

Pathways, gene ontology (GO) terms, ABR genes, and VEP (Variant Effect Predictor) files are classified as annotations and can be uploaded through the ANNOTATIONS (Fig.) tab.

ABR and GO_OBO uploads have two fields:

  1. Organism: Name of the organism
  2. Tag: The version number of the annotation file

Gene Model, VEP, and Variations uploads have three fields:

  1. Organism: Name of the organism
  2. Reference version: Version of the reference file
  3. Tag: Version of the annotation file

GO and Pathway uploads have four fields:

  1. Organism: Name of the organism
  2. Reference version: Version of the reference file
  3. GTF version: Version of the gene model file
  4. Tag: Version of the annotation file
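The field requirements above can be summarized as a lookup table. The sketch below is illustrative only: the key names (organism, reference_version, gtf_version, tag) and the function missing_fields are hypothetical, and it assumes that every listed field must be filled in before upload.

```python
# Fields per annotation type, as listed in the text above.
# Key names are hypothetical labels, not platform identifiers.
TWO_FIELDS = ("organism", "tag")
THREE_FIELDS = ("organism", "reference_version", "tag")
FOUR_FIELDS = ("organism", "reference_version", "gtf_version", "tag")

ANNOTATION_FIELDS = {
    "ABR": TWO_FIELDS,
    "GO_OBO": TWO_FIELDS,
    "Gene Model": THREE_FIELDS,
    "VEP": THREE_FIELDS,
    "Variations": THREE_FIELDS,
    "GO": FOUR_FIELDS,
    "Pathway": FOUR_FIELDS,
}


def missing_fields(annotation_type: str, form: dict) -> list:
    """Return the fields that are absent or empty for the given annotation type."""
    # Assumption: all listed fields are required.
    return [f for f in ANNOTATION_FIELDS[annotation_type] if not form.get(f)]


if __name__ == "__main__":
    form = {"organism": "Homo sapiens", "reference_version": "GRCh38", "tag": "v1"}
    print(missing_fields("GO", form))
```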

The annotation types and their allowed formats are listed in the table below.

Tag            Data Type                   Format         Details
Gene models    Gene annotations            GTF, GFF3      Source: Ensembl. Should correspond to the Ensembl genome versions
Pathway        Pathways                    Tab, GMT       Source: WikiPathways
Gene Ontology  Gene Ontology associations  OBO, Tab, TXT  GO terms
VEP            Variant annotations         Custom         Source: Ensembl. Should correspond to the Ensembl genome versions
Variations     GATK variants               VCF            Source: Ensembl

Table. Annotation file formats. Compressed files are allowed as long as each file is compressed separately.

 

Fig. 1. ANNOTATION FILES upload window. One of the radio buttons must be selected to upload the data.

 

Similar to Sequence and reference file uploads, the following actions are performed on the uploaded annotation files:

Successfully uploaded files are stored in ANNOTATIONS and can be accessed while executing the pipelines.

 

UPLOAD METADATA

Any files associated with an experiment (excluding sequencing files, references, and annotations) can be uploaded through the METADATA (Fig.) tab. During upload, metadata files must always be associated with an organism and tagged appropriately, as listed in the table below.

Data Type           Tag Name        Format         Details
List of genes       Gene List       Tab, CSV, TXT  Ensembl Gene IDs only
Target markers      Hotspots        BED, VCF       SNPs, MNPs, INDELs
Amplicon ranges     Amplicon Range  BED, VCF       Target region with start and ends
Variants/genotypes  Genotypes       BED, VCF       Called variants

Table. Metadata file formats and associated upload tags. 

 

Similar to Sequence and reference file uploads, the following actions are performed on the uploaded metadata files:

 

Fig. 1. METADATA FILES upload window.


Successfully uploaded metadata files are stored in METADATA and can be accessed while executing the pipelines.