What does the Gene Annotation File Format look like?


The Gene Annotation File Format (GFF):

To upload a Gene Annotation File to easyGWAS, the file must follow a slightly modified version of the GFF file format (version 2).
The fields in the file must be tab-separated. Empty fields should be denoted with a dot ('.'). The file must have the extension .gff.
The file must have 9 columns containing the following information:
  1. Chromosome ID -- This must be the same as in the MAP file
  2. Source, e.g. the source or version of the data
  3. Feature, e.g. chromosome, gene, CDS, exon ...
  4. Start, e.g. start position of the chromosome, gene, CDS, exon, ...
  5. End, e.g. end position of the chromosome, gene, CDS, exon, ...
  6. Score, not used by easyGWAS; use '.'
  7. Strand, use + (forward) or - (reverse); chromosome rows use '.'
  8. Frame, not used by easyGWAS; use '.'
  9. Attribute -- Two fields separated by a semicolon are required: ID, the unique ID of the chromosome or gene, and Name, the name of the gene or chromosome (e.g. ID=Chr1;Name=Chromosome1)
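As a rough illustration of the column rules above, here is a minimal Python sketch that parses one line into its nine fields and checks tab separation, the strand value and the required ID and Name attributes. It is not easyGWAS's own validator: the names GffRecord, parse_attributes and parse_gff_line are hypothetical, and the start <= end check is an extra assumption borrowed from general GFF conventions.

```python
from typing import Dict, NamedTuple


class GffRecord(NamedTuple):
    """One row of the nine-column layout listed above."""
    seqid: str                  # 1. chromosome ID (must match the MAP file)
    source: str                 # 2. source/version, e.g. TAIR9
    feature: str                # 3. chromosome, gene, mRNA, CDS, exon, ...
    start: int                  # 4. start position
    end: int                    # 5. end position
    score: str                  # 6. not used by easyGWAS, normally '.'
    strand: str                 # 7. '+', '-' (or '.' on chromosome rows)
    frame: str                  # 8. not used by easyGWAS, normally '.'
    attributes: Dict[str, str]  # 9. must contain ID and Name


def parse_attributes(field: str) -> Dict[str, str]:
    """Split an attribute field such as 'ID=Chr1;Name=Chromosome1' into a dict."""
    attrs: Dict[str, str] = {}
    for part in field.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {part!r}")
        attrs[key] = value
    return attrs


def parse_gff_line(line: str) -> GffRecord:
    """Parse one tab-separated line and check the column rules above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, feature, start, end, score, strand, frame, attr = fields
    if strand not in {"+", "-", "."}:
        raise ValueError(f"invalid strand {strand!r}")
    start_pos, end_pos = int(start), int(end)
    if start_pos > end_pos:  # assumption: general GFF convention
        raise ValueError(f"start {start_pos} is after end {end_pos}")
    attrs = parse_attributes(attr)
    missing = {"ID", "Name"} - set(attrs)
    if missing:
        raise ValueError(f"missing attribute(s): {sorted(missing)}")
    return GffRecord(seqid, source, feature, start_pos, end_pos,
                     score, strand, frame, attrs)


record = parse_gff_line(
    "Chr1\tTAIR9\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010")
print(record.attributes["ID"], record.start, record.end)  # AT1G01010 3631 5899
```

A line whose columns are separated by spaces instead of tabs splits into a single field and is rejected with a ValueError.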


Each chromosome must be defined at the beginning of the file, as follows:
Chr1	TAIR9	chromosome	1	30427671	.	.	.	ID=Chr1;Name=Chromosome1
Chr2	TAIR9	chromosome	1	19698289	.	.	.	ID=Chr2;Name=Chromosome2
Chr3	TAIR9	chromosome	1	23459830	.	.	.	ID=Chr3;Name=Chromosome3
Chr4	TAIR9	chromosome	1	18585056	.	.	.	ID=Chr4;Name=Chromosome4
Chr5	TAIR9	chromosome	1	26975502	.	.	.	ID=Chr5;Name=Chromosome5
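If the chromosome lengths are already known, the header rows can be generated rather than typed by hand. The following is a minimal Python sketch, assuming each chromosome is given as a hypothetical (ID, name, length) tuple; the function name chromosome_lines is illustrative only. It reproduces the TAIR9 rows shown above: start 1, end equal to the chromosome length, and '.' for score, strand and frame.

```python
from typing import Iterable, List, Tuple


def chromosome_lines(source: str,
                     chromosomes: Iterable[Tuple[str, str, int]]) -> List[str]:
    """Build the chromosome rows that open the GFF file.

    Each tuple is (chromosome ID, name, length in bp). Start is always 1,
    and score, strand and frame are '.', mirroring the TAIR9 example.
    """
    rows = []
    for chrom_id, name, length in chromosomes:
        fields = [chrom_id, source, "chromosome", "1", str(length),
                  ".", ".", ".", f"ID={chrom_id};Name={name}"]
        rows.append("\t".join(fields))  # columns must be tab-separated
    return rows


tair9 = [
    ("Chr1", "Chromosome1", 30427671),
    ("Chr2", "Chromosome2", 19698289),
    ("Chr3", "Chromosome3", 23459830),
    ("Chr4", "Chromosome4", 18585056),
    ("Chr5", "Chromosome5", 26975502),
]
for row in chromosome_lines("TAIR9", tair9):
    print(row)
```

Joining with "\t" guarantees that no stray spaces end up in the chromosome ID column, which would otherwise stop it from matching the MAP file.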

After the chromosome definitions, all other features can be defined as follows:
Chr1	TAIR9	gene	3631	5899	.	+	.	ID=AT1G01010;Name=AT1G01010
Chr1	TAIR9	mRNA	3631	5899	.	+	.	ID=AT1G01010.1;Name=AT1G01010.1
Chr1	TAIR9	protein	3760	5630	.	+	.	ID=AT1G01010.1-Protein;Name=AT1G01010.1
Chr1	TAIR9	gene	5928	8737	.	-	.	ID=AT1G01020;Name=AT1G01020
Chr1	TAIR9	mRNA	5928	8737	.	-	.	ID=AT1G01020.1;Name=AT1G01020.1
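The ordering rule (chromosome rows first, then features on those chromosomes) can be checked before uploading. Below is a minimal Python sketch; the function name check_feature_layout is hypothetical. Two of its checks go beyond what is stated here and are assumptions: that blank lines and lines starting with '#' may be skipped, and that every feature must lie within 1 and the length of its chromosome.

```python
from typing import Dict, List


def check_feature_layout(lines: List[str]) -> List[str]:
    """Return human-readable problems found in GFF lines (empty list if none)."""
    problems: List[str] = []
    lengths: Dict[str, int] = {}  # chromosome ID -> length from its chromosome row
    seen_feature = False          # becomes True at the first non-chromosome row
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        f = line.rstrip("\n").split("\t")
        if len(f) != 9:
            problems.append(f"line {n}: expected 9 columns, got {len(f)}")
            continue
        chrom, feature = f[0], f[2]
        try:
            start, end = int(f[3]), int(f[4])
        except ValueError:
            problems.append(f"line {n}: start/end are not integers")
            continue
        if feature == "chromosome":
            if seen_feature:
                problems.append(f"line {n}: chromosome {chrom} defined after other features")
            lengths[chrom] = end
            continue
        seen_feature = True
        if chrom not in lengths:
            problems.append(f"line {n}: unknown chromosome {chrom!r}")
        elif not 1 <= start <= end <= lengths[chrom]:
            # assumption: features must fit inside their chromosome
            problems.append(f"line {n}: {start}-{end} lies outside {chrom} (1-{lengths[chrom]})")
    return problems


example = [
    "Chr1\tTAIR9\tchromosome\t1\t30427671\t.\t.\t.\tID=Chr1;Name=Chromosome1",
    "Chr1\tTAIR9\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010",
    "Chr1\tTAIR9\tmRNA\t3631\t5899\t.\t+\t.\tID=AT1G01010.1;Name=AT1G01010.1",
]
print(check_feature_layout(example))  # []
```

Collecting all problems instead of stopping at the first one makes it easier to fix a large annotation file in a single pass.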

Detailed information about the GFF file format can be found on the GFF/GTF File Format page.

Note: Gene Annotation Files can only be uploaded for existing species and genotypes, so you must first upload the PED and MAP files. Chromosome identifiers in the GFF file must match the chromosome identifiers in the MAP file; no additional chromosome identifiers are allowed in the GFF file.
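To catch mismatched identifiers before uploading, the chromosome IDs in both files can be compared. This minimal Python sketch assumes the MAP file uses the PLINK-style whitespace-separated layout (chromosome, SNP ID, genetic distance, base-pair position); the function names and the sample MAP row are hypothetical. Any ID returned by extra_gff_chromosomes would violate the rule above.

```python
from typing import Iterable, Set


def map_chromosomes(map_lines: Iterable[str]) -> Set[str]:
    """Chromosome IDs from the first column of a MAP file.

    Assumes the PLINK-style whitespace-separated layout:
    chromosome, SNP ID, genetic distance, base-pair position.
    """
    return {line.split()[0] for line in map_lines if line.strip()}


def gff_chromosomes(gff_lines: Iterable[str]) -> Set[str]:
    """Chromosome IDs used in column 1 of any nine-column GFF row."""
    ids: Set[str] = set()
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            ids.add(fields[0])
    return ids


def extra_gff_chromosomes(gff_lines: Iterable[str],
                          map_lines: Iterable[str]) -> Set[str]:
    """GFF chromosome IDs missing from the MAP file; must be empty for upload."""
    return gff_chromosomes(gff_lines) - map_chromosomes(map_lines)


gff = ["Chr1\tTAIR9\tchromosome\t1\t30427671\t.\t.\t.\tID=Chr1;Name=Chromosome1"]
map_file = ["Chr1\tSNP1\t0\t657"]  # hypothetical MAP row
print(extra_gff_chromosomes(gff, map_file))  # set()
```

The comparison is an exact string match, so a MAP file that codes chromosomes as 1, 2, ... would not match GFF rows that use Chr1, Chr2, ...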