Commit b5211d19 authored by Niko (Nikolaos) Papadopoulos

repeat annotation

parent 8e820f3f
# Repeat analysis
The code in this folder covers the repeat analysis, which runs after the scaffolded draft
genome has been produced (i.e. after Juicebox).
### Repeat prediction
We modeled repeat families on the draft genome of _P. litorale_ using
[RepeatModeler](prep-repeat-modeler.sh) and (soft-)masked them with
[RepeatMasker](prep-repeat_masker.sh).
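
The masking script reads the `pycno-families.fa` library that the modeling script writes, so the two jobs have to run in that order. A minimal sketch of one way to enforce this with a Slurm job dependency is given here; the helper name `submit_repeat_pipeline` is hypothetical, and chaining the jobs this way is an assumption rather than something the repository does:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): submit RepeatModeler,
# then queue RepeatMasker so that it only starts if RepeatModeler succeeded.
submit_repeat_pipeline() {
    local model_job
    # --parsable makes sbatch print "jobid" (or "jobid;cluster") and nothing else
    model_job=$(sbatch --parsable prep-repeat-modeler.sh) || return 1
    # strip an optional ";cluster" suffix, then chain the masking job on success
    sbatch --dependency=afterok:"${model_job%%;*}" prep-repeat_masker.sh
}
```

Sourcing this snippet and calling `submit_repeat_pipeline` from this folder would submit both jobs. With `afterok`, the masking job stays pending until the modeling job exits with status 0.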
\ No newline at end of file
#!/usr/bin/env bash
#
#SBATCH --job-name=repeatmodeler_pycno
#SBATCH --cpus-per-task=32
#SBATCH --mem=20G
#SBATCH --time=30:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=nikolaos.papadopoulos@univie.ac.at
#SBATCH --output=/lisc/user/papadopoulos/log/pycno-repeats-%j.out
#SBATCH --error=/lisc/user/papadopoulos/log/pycno-repeats-%j.err
# the resource requests above will need optimising from scratch the next time this is run
module load repeatmodeler/2.0.5-5.40.0-1.0.6
# OLDDIR=/lisc/slurm/node-d02/tmp/slurm-9303351
cd "$TMPDIR" || exit 1
OUTDIR=/lisc/scratch/zoology/pycnogonum/genome/draft/repeats/repeat_modeller
SCAFFOLDS=/lisc/scratch/zoology/pycnogonum/genome/draft/draft.fasta
mkdir -p "$OUTDIR" || exit 1
BuildDatabase -name pycno "$SCAFFOLDS"
RepeatModeler -database pycno -threads 32 -LTRStruct > "$OUTDIR"/repeat_modeller_run.out
# tar "$TMPDIR"/RM_*/ -czf "$OUTDIR"/repeatmodeler.tar.gz
# cp "$TMPDIR"/run.out "$OUTDIR"/run.out
# if the run was successful, there should be three result files:
cp "$TMPDIR"/pycno-families.fa "$OUTDIR"/pycno-families.fa
cp "$TMPDIR"/pycno-families.stk "$OUTDIR"/pycno-families.stk
cp "$TMPDIR"/pycno-rmod.log "$OUTDIR"/pycno-rmod.log
\ No newline at end of file
#!/usr/bin/env bash
#
#SBATCH --job-name=repeatmasker_pycno
#SBATCH --cpus-per-task=4
#SBATCH --mem=500M
#SBATCH --time=4:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=nikolaos.papadopoulos@univie.ac.at
#SBATCH --output=/lisc/user/papadopoulos/log/pycno-repeat-masker-%j.out
#SBATCH --error=/lisc/user/papadopoulos/log/pycno-repeat-masker-%j.err
# RepeatMasker is a lot less resource-intensive than RepeatModeler, so it makes sense to run it as a separate script.
# NOTE: the job ran out of memory with --mem=500M; it is unclear how much the last step needs.
module load repeatmasker/4.1.6-3.12.4-5.40.0
SCAFFOLDS=/lisc/scratch/zoology/pycnogonum/genome/draft/draft.fasta
OUTDIR=/lisc/scratch/zoology/pycnogonum/genome/draft/repeats/repeat_masker_gff
mkdir -p "$OUTDIR" || exit 1
cd "$OUTDIR" || exit 1
FAMILIES="../repeat_modeller/pycno-families.fa"
RepeatMasker -pa 4 -xsmall -gff -dir "$TMPDIR" -lib "$FAMILIES" "$SCAFFOLDS"
# copy the results to the output directory:
cp "$TMPDIR"/*.masked "$OUTDIR"/
cp "$TMPDIR"/*.out "$OUTDIR"/
cp "$TMPDIR"/*.tbl "$OUTDIR"/
cp "$TMPDIR"/*.cat* "$OUTDIR"/
cp "$TMPDIR"/*.gff "$OUTDIR"/
\ No newline at end of file