Phylogeny-based classification of plant retrotransposons
LTR retrotransposons are class I transposable element characterized by the presence of long terminal repeats (LTRs) directly flanking an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another location. Their mechanism of retrotransposition is shared with retroviruses, with the difference that most LTR-retrotransposons do not form infectious particles that leave the cells and therefore only replicate inside their genome of origin. Those they do (occasionally) from virus-like particles are classified under Ortervirales.
Their size ranges from a few hundred base pairs to 25kb, for example the Ogre retrotransposon in the pea genome.
In plant genomes, LTR retrotransposons are the major repetitive sequence class, for example, constituting more than 75% of the maize genome. LTR retrotransposons make up about 8% of the human genome and approximately 10% of the mouse genome.
https://github.com/maypoleflyn/Probe
- Identification of LTR-RTs in the genome using LTR_retriever
https://github.com/oushujun/LTR_retriever
/your_path_to/LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10
- Extracting LTR-RTs sequences from the outputs of LTR_retriever
perl ./scripts/list2fq.pl genome.pass.list prefix genome.fa
Inputs:
genome.pass.list: the outputs of LTR_retriever
prefix: the name you wanted
genome.fa: the reference genome
Outputs:
prefix.copia.fa
prefix.gyp.fa
- Analysis the functional domains of LTR-RTs
3.1 run interproscan
sh interproscan.sh -cpu 80 -i prefix.copia.fa -t n -appl Pfam,CDD
sh interproscan.sh -cpu 80 -i prefix.gyp.fa -t n -appl Pfam,CDD
3.2 extract RTs domain sequences
For copia:
perl ./scripts/extractRT.pl prefix.tsv prefix.gff c > copia.RT.fa
For Gypsy
perl ./scripts/extractRT.pl prefix.tsv prefix.gff g > gypsy.RT.fa
-
merge RT domain sequence with marker RT sequences
cat copia.RT.fa ./copia.marker.fa > copia.rt.fa cat gyp.RT.fa ./gyp.marker.fa > gyp.rt.fa
-
multiple alignment and construct tree
mafft copia.rt.fa > copia.rt.align
fastree -quote copia.rt.align > copia.rt.tree
mafft gyp.rt.fa > gyp.rt.align
fastree -quote gyp.rt.align > gyp.rt.tree
-
color the trees
draw the *itol file into ITOL