Phylogeny-based classification of plant retrotransposons

LTR retrotransposons are class I transposable element characterized by the presence of long terminal repeats (LTRs) directly flanking an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another location. Their mechanism of retrotransposition is shared with retroviruses, with the difference that most LTR-retrotransposons do not form infectious particles that leave the cells and therefore only replicate inside their genome of origin. Those they do (occasionally) from virus-like particles are classified under Ortervirales.

Their size ranges from a few hundred base pairs to 25kb, for example the Ogre retrotransposon in the pea genome.

In plant genomes, LTR retrotransposons are the major repetitive sequence class, for example, constituting more than 75% of the maize genome. LTR retrotransposons make up about 8% of the human genome and approximately 10% of the mouse genome.

https://github.com/maypoleflyn/Probe

  1. Identification of LTR-RTs in the genome using LTR_retriever
https://github.com/oushujun/LTR_retriever

/your_path_to/LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10 

  1. Extracting LTR-RTs sequences from the outputs of LTR_retriever
   perl ./scripts/list2fq.pl genome.pass.list prefix genome.fa 
   
   Inputs:
   
   genome.pass.list: the outputs of LTR_retriever
   prefix: the name you wanted 
   genome.fa: the reference genome
   
   Outputs:
   
   prefix.copia.fa
   prefix.gyp.fa
  1. Analysis the functional domains of LTR-RTs

3.1 run interproscan 

sh interproscan.sh  -cpu 80 -i prefix.copia.fa -t n -appl Pfam,CDD 

sh interproscan.sh  -cpu 80 -i prefix.gyp.fa -t n -appl Pfam,CDD 

3.2 extract RTs domain sequences

For copia:
perl ./scripts/extractRT.pl prefix.tsv prefix.gff c > copia.RT.fa

For Gypsy
perl ./scripts/extractRT.pl prefix.tsv prefix.gff g > gypsy.RT.fa
  1. merge RT domain sequence with marker RT sequences

    cat copia.RT.fa ./copia.marker.fa > copia.rt.fa
    cat gyp.RT.fa ./gyp.marker.fa > gyp.rt.fa 
    
  2. multiple alignment and construct tree

   mafft copia.rt.fa > copia.rt.align
   fastree -quote copia.rt.align > copia.rt.tree
   
   mafft gyp.rt.fa > gyp.rt.align
   fastree -quote gyp.rt.align > gyp.rt.tree
  1. color the trees

    draw the *itol file into ITOL