Uploader
Data upload is the first step in the analysis. Uploader is used to upload different data types on the platform.
Overview
Generally, four data types are required:
- Sequencing data (read files)
- Reference genomes
- Annotation files
- Metadata
Click the icon in the Utilities menu to access the Uploader window. Each data type is uploaded from a separate tab, and every upload follows the same three high-level steps:
- Define the data types with correct tags
- Verify the file format and check whether the file already exists in the user's account or conflicts with an existing name
- Upload
Pre-configured references and annotation files are available for the model organisms.
UPLOAD SEQUENCE DATA
Sequencing files can be uploaded through the SEQUENCE DATA (Fig. 1) tab. Complete all the fields in the form before selecting the files.
- MatePair: Select appropriate mate-pair information, if available. This is an optional parameter (Default: No).
- Strand: Select appropriate strand information, if available. This is an optional parameter (Default: No).
- Sequencing Platform*: Select the sequencing instrument used to generate the data (check with the provider).
- Organism*: Select the organism from the drop-down menu.
- HINT - Select Other if the organism is not listed.
- Tag*: Provide a tag name for the sample files that can be used to filter the data later.
- The user can select sample files through the selection dialog box.
HINT - Currently allowed data file formats are fq, fastq, bam, sam, ubam, cram, hdf5, and their zipped versions.
Ensure that each sample file is compressed separately and that the compressed file has the same name as the sample file it contains.
HINT - Sequencing data from other sources (NCBI SRA) should be converted to fastq/fq format.
HINT - If sequencing is multiplexed, demultiplex the data before uploading.
- Click to verify file types, check file existence in the user's account, and pair files (not applicable to Single-End data). File pairing information (i.e. forward/reverse sample) is retained for PE sample data.
Pairing is done if forward and reverse samples have the suffix combinations: _1/_2; _F/_R; _f/_r; -1/-2; -F/-R; -f/-r; _R1/_R2.
- Upon confirming the Data Store check, click to begin the file transfer (the progress bar indicates the upload status of each file). After a successful upload, the following four actions are performed; if any action fails, its error is displayed.
- Uncompression of the files
- Data quality check with FastQC
- Registration into the SEQUENCE DATA
- Status notification through email
- Go to the SEQUENCE DATA to access the uploaded files and quality report (Fig. 2). Occasionally, files are not visible immediately; in that case, refresh the window.
- Click on a sample name to access its additional details.
- Use the icon on the upper right corner of the sample details window to delete samples.
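The pairing rule described above can be checked locally before upload. The following is a minimal sketch, not the platform's actual implementation: the suffix combinations come from the list above, while the function names, the stripped extensions (.fq, .fastq, .gz, .zip), and the assumption that the suffix sits directly before the extension are illustrative.

```python
from collections import defaultdict

# Forward/reverse suffix combinations listed in this section.
SUFFIX_PAIRS = [("_1", "_2"), ("_F", "_R"), ("_f", "_r"),
                ("-1", "-2"), ("-F", "-R"), ("-f", "-r"),
                ("_R1", "_R2")]

# Assumption: only these extensions are stripped before matching a suffix.
COMPRESSION = (".gz", ".zip")
EXTENSIONS = (".fastq", ".fq")


def strip_extensions(name):
    """Remove one compression extension and one read-file extension."""
    for group in (COMPRESSION, EXTENSIONS):
        for ext in group:
            if name.endswith(ext):
                name = name[:-len(ext)]
                break
    return name


def split_suffix(stem):
    """Return (sample, convention, direction), or None if no suffix matches."""
    for fwd, rev in SUFFIX_PAIRS:
        if stem.endswith(fwd):
            return stem[:-len(fwd)], (fwd, rev), "forward"
        if stem.endswith(rev):
            return stem[:-len(rev)], (fwd, rev), "reverse"
    return None


def pair_files(filenames):
    """Group files into (forward, reverse) pairs; return (pairs, unpaired)."""
    groups = defaultdict(dict)
    unpaired = []
    for name in filenames:
        parsed = split_suffix(strip_extensions(name))
        if parsed is None:
            unpaired.append(name)
            continue
        sample, convention, direction = parsed
        groups[(sample, convention)][direction] = name
    pairs = {}
    for (sample, _), members in groups.items():
        if "forward" in members and "reverse" in members:
            pairs[sample] = (members["forward"], members["reverse"])
        else:
            unpaired.extend(members.values())
    return pairs, sorted(unpaired)


pairs, unpaired = pair_files(
    ["A_R1.fastq.gz", "A_R2.fastq.gz", "B_1.fq", "B_2.fq", "C_F.fastq", "D.fastq"]
)
```

Files that end up in `unpaired` either lack a recognized suffix or have no mate with the matching suffix; renaming them before upload avoids pairing failures for PE data.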
UPLOAD REFERENCES
The latest versions of reference genomes and transcriptomes are available on the platform for eight model organisms. Pre-configured references can be explored through REFERENCES (Fig.). The owner column distinguishes pre-configured references (owned by Stanome) from custom references (owned by users).
Click the REFERENCES tab on the Uploader window to upload genome and transcriptome files.
Complete all the fields in the form before selecting the files.
- Add to Existing: Adds additional files, such as a transcriptome or genome, to an existing genome/transcriptome. This is an optional parameter (Default: No).
- Organism*: Select the organism name.
- Version*: Provide the genome build (version) name or number of the reference file(s). This helps in selecting the correct reference version during analysis.
HINT - Allowed formats for genome or transcriptome files are fa, fasta, fna, and their compressed formats. Each file should be compressed separately.
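A quick local check of file names against the allowed formats can catch mistakes before the transfer starts. The sketch below is illustrative only: the base extensions come from the hint above, whereas the set of compression extensions (.gz, .zip) and the function name are assumptions, since the text does not list which compressed formats are accepted.

```python
from pathlib import Path

# Base extensions named in the hint above.
ALLOWED_REFERENCE_EXTENSIONS = {".fa", ".fasta", ".fna"}

# Assumption: the accepted compression formats are not listed in the text.
ASSUMED_COMPRESSION_EXTENSIONS = {".gz", ".zip"}


def is_allowed_reference(filename):
    """Return True if the name ends in an allowed extension, optionally compressed."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop a single trailing compression extension, if present.
    if suffixes and suffixes[-1] in ASSUMED_COMPRESSION_EXTENSIONS:
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1] in ALLOWED_REFERENCE_EXTENSIONS


candidates = ["GRCh38.p13.fa.gz", "transcripts.fasta", "genes.gtf", "genome.gz"]
rejected = [name for name in candidates if not is_allowed_reference(name)]
```

This only inspects file names; the platform still validates the format and integrity of each file after upload.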
The following actions are performed on the uploaded references:
- Uncompression of the files
- Validation of the format and integrity of the file
- Registration into the REFERENCES
- Status notification through email
Successfully uploaded reference files are stored in the REFERENCES. Custom references can be deleted using the delete icon on the reference details window.
UPLOAD ANNOTATIONS
Pathways, gene ontology (GO) terms, ABR genes, and VEP (Variant Effect Predictor) files are classified as annotations and can be uploaded through the ANNOTATIONS (Fig.) tab.
ABR and GO_OBO uploads have two fields:
- Organism: Name of the organism
- Tag: The version number of the annotation file
Gene Model, VEP, and Variations uploads have three fields:
- Organism: Name of the organism
- Reference version: Version of the reference file
- Tag: Version of the annotation file
GO and Pathway uploads have four fields:
- Organism: Name of the organism
- Reference version: Version of the reference file
- GTF version: Version of the gene model file
- Tag: Version of the annotation file
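The field requirements above can be summarized as a lookup table. The following sketch is a hypothetical pre-upload checklist, not part of the platform: the annotation types and field names are taken from the lists above, while the dictionary layout and the `missing_fields` helper are assumptions made for illustration.

```python
# Required form fields per annotation type, as listed in this section.
TWO_FIELDS = ("Organism", "Tag")
THREE_FIELDS = ("Organism", "Reference version", "Tag")
FOUR_FIELDS = ("Organism", "Reference version", "GTF version", "Tag")

REQUIRED_FIELDS = {
    "ABR": TWO_FIELDS,
    "GO_OBO": TWO_FIELDS,
    "Gene Model": THREE_FIELDS,
    "VEP": THREE_FIELDS,
    "Variations": THREE_FIELDS,
    "GO": FOUR_FIELDS,
    "Pathway": FOUR_FIELDS,
}


def missing_fields(annotation_type, form):
    """Return the required fields that are absent or blank in the form dict."""
    required = REQUIRED_FIELDS[annotation_type]
    return [field for field in required if not str(form.get(field, "")).strip()]


# Hypothetical form values, used only to demonstrate the check.
go_form = {"Organism": "Homo sapiens", "Tag": "v1"}
go_missing = missing_fields("GO", go_form)
```

Here `go_missing` lists the reference version and GTF version, which a GO upload needs in addition to the organism and tag.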
The different types of annotations and their allowed formats are explained in the table below.
As with sequence and reference file uploads, the following actions are performed on the uploaded annotation files:
- Uncompression of the files
- Validation of the format and integrity of the file and its compatibility with the genome/gene annotation file
- Registration into the ANNOTATIONS
- Status notification through email
Successfully uploaded files are stored in ANNOTATIONS and can be accessed while executing the pipelines.
UPLOAD METADATA
Any files associated with an experiment (excluding sequencing files, references, and annotations) can be uploaded through the METADATA (Fig.) tab. During upload, metadata files should always be associated with an organism and tagged appropriately, as explained in the table below.
As with sequence and reference file uploads, the following actions are performed on the uploaded metadata files:
- Uncompression of the files
- Validation of the format and integrity of the file and its compatibility with the reference genome
- Registration into the METADATA
- Status notification through email
Successfully uploaded metadata files are stored in METADATA and can be accessed while executing the pipelines.