Path to VCF OUTPUT : /mnt/scratch/users/doo513/from_fsp504/rna_seq_oct22/reads/RawData/Correct_Variant_calling
To study the diversity of the Moringa panel, the mRNA-Seq reads (FASTQ) were mapped against the CDS models of the reference genome (MoringaV2.cds.fa) using STAR [1]. This produced the BAM files required for variant calling with GATK version 4.3.0.0 [2]. Variants were hard-filtered with GATK using the following thresholds: QUAL < 30.0, MQRankSum < -2.5 and QD < 2.0. The VariantFiltration calls in the scripts below also include MQ < 40, FS > 60, SOR > 3 and ReadPosRankSum < -8.0. All the variants of the 36 samples were then merged, and the genotypes were filtered using VCFtools version 0.1.16 [3]. Sites were retained only if they met all of the following criteria:
- minor allele count ≥ 5;
- at least 2 and at most 3 alleles;
- quality ≥ 30;
- no indels;
- a called genotype in at least 25% of individuals (VCFtools --max-missing 0.25, i.e. up to 75% missing data).

Genotypes with GQ < 20 or DP < 10 were set to missing. The VCF files were converted to geno format with vcf2geno (R package LEA) and then used to assess population stratification with TASSEL [4] and PSIKO v2 [5]. The R package SNPRelate was used to perform a principal component analysis (PCA) of the variants (function snpgdsPCA). The correlations between SNPs and principal components were obtained with snpgdsPCACorr.
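The missing-data threshold is easy to misread. In VCFtools, --max-missing is the minimum fraction of individuals with a called genotype: 0 allows completely missing sites and 1 allows no missing data. So --max-missing 0.25 keeps sites genotyped in at least 25% of individuals. The sketch below reproduces that per-site call rate with awk on an uncompressed VCF. It is an illustration, not VCFtools' own code, and the function name call_rate_filter is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not vcftools itself): print the fraction of called genotypes per site
# and keep only sites at or above a threshold, mirroring vcftools --max-missing.
# Assumes an uncompressed VCF whose FORMAT column starts with GT (the VCF spec
# puts GT first when present); ".", "./." and ".|." count as missing.
call_rate_filter() {
  local vcf=$1 min_called=$2
  awk -v min="$min_called" 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                      # skip header lines
    {
      n = NF - 9; called = 0           # sample columns start at field 10
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")              # f[1] is the GT value
        if (f[1] != "." && f[1] != "./." && f[1] != ".|.") called++
      }
      rate = (n > 0) ? called / n : 0
      if (rate >= min) print $1, $2, rate
    }' "$vcf"
}

# Usage (hypothetical file name): sites genotyped in at least 25% of samples
# call_rate_filter all_combined.vcf 0.25
```

Each output line is CHROM, POS and the call rate of a retained site.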

RESULTS
After filtering, the total number of variants was 149,740 and the number of loci was 90,751.


STAR INDEX
#!/bin/bash
#SBATCH --job-name=makeidx      
#SBATCH --mail-type=ALL         
#SBATCH --mail-user=doo513@york.ac.uk      
#SBATCH --ntasks=1                       
#SBATCH --mem-per-cpu=6gb                       
#SBATCH --time=00:40:00
#SBATCH --cpus-per-task=10                  
#SBATCH --output=basic_job_%j.log        
#SBATCH --account=biol-moringa-2022


module load bio/STAR

mkdir moringa_v2_idx

STAR --runThreadN 10 --runMode genomeGenerate \
--genomeDir moringa_v2_idx --genomeFastaFiles MoringaV2.genome.fa --genomeSAindexNbases 12
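The STAR manual recommends scaling --genomeSAindexNbases down for small genomes to min(14, log2(GenomeLength)/2 - 1). A small awk helper can compute this from the FASTA before building the index. The name sa_index_nbases is hypothetical, and rounding down to an integer is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: suggested --genomeSAindexNbases for a FASTA reference, using the
# STAR manual rule for small genomes: min(14, log2(GenomeLength)/2 - 1).
# Rounding down to an integer is our own choice here.
sa_index_nbases() {
  awk '!/^>/ { len += length($0) }     # total sequence length, headers excluded
    END {
      v = int(log(len) / log(2) / 2 - 1)
      print (v < 14 ? v : 14)
    }' "$1"
}

# Usage: compare with the value passed to --genomeSAindexNbases above
# sa_index_nbases MoringaV2.genome.fa
```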



STAR ALIGNMENT
#!/bin/bash
#SBATCH --job-name=star     
#SBATCH --mail-type=ALL         
#SBATCH --mail-user=doo513@york.ac.uk      
#SBATCH --ntasks=1                       
#SBATCH --mem-per-cpu=6gb                       
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=40                  
#SBATCH --output=basic_job_%j.log        
#SBATCH --account=biol-moringa-2022


module load bio/STAR
module load bio/SAMtools

mkdir -p star_outputs

file=star_outputs/samples_aln

# Reference files used in this project (notes only, not commands):
# annotation: /mnt/scratch/users/doo513/from_fsp504/rna_seq_oct22/reads/RawData/Reference_MoringaV2/MoringaV2.gff3
# example BAM index: /mnt/scratch/users/doo513/Moringa23/Rawdata_Kenyagrp/star_outputs/IMo9.STAR.sorted.bam.bai


STAR --runMode genomeGenerate --runThreadN 40 --genomeDir moringa_v2_idx --genomeFastaFiles /mnt/scratch/users/doo513/from_fsp504/rna_seq_oct22/reads/RawData/Reference_MoringaV2/MoringaV2.cds.fa 

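for dir in */; do
    sample=${dir%/}
    # skip directories that are not samples (e.g. star_outputs/, moringa_v2_idx/)
    [ -f "${dir}${sample}_1.fq.gz" ] || continue
    STAR --runThreadN 20 --genomeDir moringa_v2_idx \
        --readFilesIn "${dir}${sample}_1.fq.gz" "${dir}${sample}_2.fq.gz" \
        --readFilesCommand gunzip -c \
        --outFileNamePrefix "star_outputs/${sample}.STAR" --outSAMmapqUnique 60 --outSAMtype BAM Unsorted
done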

######loop over the sample names listed in star_outputs/samples for the remaining steps
file=star_outputs/samples

#while read -r line; do
#samtools sort -@ 16 star_outputs/${line}.STARAligned.out.bam > star_outputs/${line}.STAR.sorted.bam
#samtools index -@ 16 star_outputs/${line}.STAR.sorted.bam 

#done < "$file"
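The loops in these scripts read sample names from star_outputs/samples, which does not appear to be created anywhere above. Here is a minimal sketch for generating it. It assumes, as the STAR loop does, that each sample has its own directory <sample>/ containing <sample>_1.fq.gz and <sample>_2.fq.gz. The function name make_sample_list is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list sample names for the "while read -r line" loops.
# Assumes one directory per sample holding <sample>_1.fq.gz and <sample>_2.fq.gz;
# other directories (star_outputs/, moringa_v2_idx/) are skipped.
make_sample_list() {
  local root=${1:-.} dir name
  for dir in "$root"/*/; do
    name=$(basename "$dir")
    if [ -f "${dir}${name}_1.fq.gz" ] && [ -f "${dir}${name}_2.fq.gz" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage: mkdir -p star_outputs && make_sample_list . > star_outputs/samples
```

Requiring both mates keeps half-copied samples out of the list. This way, a missing read file is noticed before mapping rather than as a STAR error.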


STAR BAM_SORTED

#!/bin/bash
#SBATCH --job-name=star_sorted_fasta     
#SBATCH --mail-type=ALL         
#SBATCH --mail-user=doo513@york.ac.uk      
#SBATCH --partition=himem
#SBATCH --time=48:00:00                                           
#SBATCH --cpus-per-task=40              
#SBATCH --output=basic_job_%j.log        
#SBATCH --account=biol-moringa-2022

module load bio/SAMtools
module load bio/picard/2.25.5-Java-13            
module load bio/NGS/2.10.5-foss-2018b-Java-1.8
module load  bio/GATK/4.3.0.0-GCCcore-11.3.0-Java-11
module load bio/STAR/2.7.10a-GCC-11.2.0 
module load bio/VCFtools/0.1.16-GCC-11.2.0
module load bio/BCFtools/1.15-GCC-11.2.0
module load bio/MAFFT/7.487-gompi-2021a-with-extensions
module load bio/trimAl/1.4.1-GCC-9.3.0
module load bio/IQ-TREE/2.2.1-gompi-2021b
module load bio/PLINK/2.00A2.3-GCC-10.3.0


#mkdir star_outputs

#STAR --runMode genomeGenerate --runThreadN 40 --genomeDir moringa_v2_idx --genomeFastaFiles /users/doo513/scratch/rna_seq_oct22/reads/MoringaV2.cds.fa 

####STAR for mapping in each folder of your samples
#for dir in */; do 
#STAR --runThreadN 20 --genomeDir moringa_v2_idx --readFilesIn ${dir}${dir%%/}_1.fq.gz ${dir}${dir%%/}_2.fq.gz \
#--readFilesCommand gunzip -c \
#--outFileNamePrefix star_outputs/${dir%%/}.STAR --outSAMmapqUnique 60 --outSAMtype BAM Unsorted
#done 

 
######loop over the sample names listed in star_outputs/samples for the remaining steps
file=star_outputs/samples

#mkdir VCFs
#samtools faidx /users/doo513/scratch/rna_seq_oct22/reads/MoringaV2.cds.fa
#samtools faidx /mnt/lustre/users/doo513/rna_seq_oct22/reads/MoringaV2.cds.fa
#java -Xmx40g -jar $EBROOTPICARD/picard.jar CreateSequenceDictionary R=/users/doo513/scratch/rna_seq_oct22/reads/MoringaV2.cds.fa O=/users/doo513/scratch/rna_seq_oct22/reads/MoringaV2.cds.dict

#while read -r line; do
#samtools sort -@ 16 star_outputs/${line}.STARAligned.out.bam > star_outputs/${line}.STAR.sorted.bam
#samtools index -@ 16 star_outputs/${line}.STAR.sorted.bam 
#samtools flagstat star_outputs/${line}.STAR.sorted.bam > star_outputs/flagstat_${line}.txt
#java -Xmx7g -jar $EBROOTPICARD/picard.jar MarkDuplicates --INPUT star_outputs/${line}.STAR.sorted.bam --METRICS_FILE star_outputs/${line}_aligned_sorted_duplicates --OUTPUT star_outputs/${line}_aligned_sorted_duplicates.out.bam
#java -Xmx7g -jar $EBROOTPICARD/picard.jar AddOrReplaceReadGroups I=star_outputs/${line}_aligned_sorted_duplicates.out.bam O=star_outputs/${line}_aligned_sorted_duplicates_corrected.out.bam RGID=${line} RGLB=lib1 RGPL=illumina RGPU=moringa RGSM=${line}
#samtools index star_outputs/${line}_aligned_sorted_duplicates_corrected.out.bam

#mkdir -p Filtered_Variants
while read -r line; do
    gatk --java-options "-Xmx7g" HaplotypeCaller \
        -I star_outputs/${line}_aligned_sorted_duplicates_corrected.out.bam \
        -R /users/doo513/scratch/rna_seq_oct22/reads/MoringaV2.cds.fa \
        -O VCFs/${line}_aligned_sorted_duplicates_corrected.output_variants2.g.vcf \
        -ERC GVCF
    gatk VariantFiltration \
        -R /users/doo513/scratch/rna_seq_oct22/reads/RawData/MoringaV2.cds.fa \
        -V /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/VCFs/${line}_aligned_sorted_duplicates_corrected.output_variants2.g.vcf \
        -O /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_non_ref.g.vcf \
        --filter-name "QD" \
        --filter-expression "QD < 2.0" \
        --filter-name "QUAL" \
        --filter-expression "QUAL < 30.0" \
        --filter-name "MQRankSum" \
        --filter-expression "MQRankSum < -2.5" \
        --filter-name "MQ" \
        --filter-expression "MQ < 40" \
        --filter-name "FS60" \
        --filter-expression "FS > 60" \
        --filter-name "SOR" \
        --filter-expression "SOR > 3" \
        --filter-name "ReadPosRankSum" \
        --filter-expression "ReadPosRankSum < -8.0"
done < "$file"
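To see which of the hard-filter thresholds above a record would trip, the FILTER decision can be previewed from QUAL and the INFO annotations. The awk sketch below is not GATK. It re-implements the same seven thresholds for quick inspection of small VCFs. A missing annotation simply does not trigger its filter here; how VariantFiltration itself treats missing annotations should be checked in the GATK documentation. The function name hard_filter_preview is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not GATK): list the filters each record would fail, using the same
# thresholds as the VariantFiltration call above. Uncompressed VCF input.
# A missing annotation does not trigger its filter in this sketch.
hard_filter_preview() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }
    {
      split("", info)                            # reset the INFO lookup table
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        split(kv[i], p, "=")
        info[p[1]] = p[2]
      }
      f = ""
      if (("QD" in info) && info["QD"] + 0 < 2.0) f = f ";QD"
      if ($6 != "." && $6 + 0 < 30.0) f = f ";QUAL"
      if (("MQRankSum" in info) && info["MQRankSum"] + 0 < -2.5) f = f ";MQRankSum"
      if (("MQ" in info) && info["MQ"] + 0 < 40) f = f ";MQ"
      if (("FS" in info) && info["FS"] + 0 > 60) f = f ";FS60"
      if (("SOR" in info) && info["SOR"] + 0 > 3) f = f ";SOR"
      if (("ReadPosRankSum" in info) && info["ReadPosRankSum"] + 0 < -8.0) f = f ";ReadPosRankSum"
      print $1, $2, (f == "" ? "PASS" : substr(f, 2))
    }' "$1"
}

# Usage (hypothetical file name): hard_filter_preview sample.vcf | cut -f3 | sort | uniq -c
```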

####combine all the files created 
#mkdir all_combined.vcf
#find ./Filtered_Variants/ -type f -name "*Filtered_Variants_output.g.vcf" >input.list 
##### the input.list file might need some modification of the path or the files will not be found later in CombineGVCFs

###give Java more memory (-Xmx20g) and plenty of time
#gatk CombineGVCFs --java-options "-Xmx20g" -R /users/doo513/scratch/rna_seq_oct22/reads/RawData/MoringaV2.cds.fa --variant input.list -O all_combined.vcf ####this file can not be modified
####now genotype the combined GVCF (GenotypeGVCFs)

######################filter the genotypes
#TEST
#vcftools --vcf all_combined.vcf --out OUTPUT_1_all_combined_filtered.vcf --max-missing 0.25 --minGQ 20 --minDP 10 --remove-indels --min-alleles 2 --max-alleles 3 --mac 5 --recode --recode-INFO-all
###--minGQ 20 = minimum genotype quality; --minDP 10 = minimum 10 reads per genotype (genotypes failing either are set to missing); --remove-indels = remove insertions and deletions
###--mac 5 = minimum minor allele count, i.e. the number of copies of the less common allele across all samples (not the minor allele frequency, which is --maf)
###--max-missing 0.25 ----> keep sites genotyped in at least 25% of your samples
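To make the --mac 5 criterion concrete: for a biallelic site, the minor allele count is the number of copies of the less common allele across all called genotypes. The awk sketch below computes it from diploid GT fields. It only handles biallelic records and is not VCFtools' implementation, so multiallelic sites are skipped. The function name minor_allele_count is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not vcftools): minor allele count per biallelic site from GT fields.
# Missing alleles (".") are not counted; multiallelic records are skipped.
minor_allele_count() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }
    $5 !~ /,/ {                          # biallelic records only
      ref = 0; alt = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                # f[1] is the GT value
        n = split(f[1], a, "[/|]")       # unphased or phased separator
        for (j = 1; j <= n; j++) {
          if (a[j] == "0") ref++
          else if (a[j] == "1") alt++
        }
      }
      print $1, $2, (ref < alt ? ref : alt)
    }' "$1"
}

# Usage: sites that --mac 5 would keep
# minor_allele_count all_combined_non_ref.vcf | awk -F'\t' '$3 >= 5'
```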



#####Part 2
#git clone https://bitbucket.org/tasseladmin/tassel-5-standalone.git
###to install tassel in the reads folder
#/mnt/lustre/users/doo513/rna_seq_oct22/reads/tassel-5-standalone/run_pipeline.pl -Xmx40g -SortGenotypeFilePlugin -inputFile /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/OUTPUT_1_all_combined_filtered.vcf.recode.vcf -outputFile OUTPUT_1_all_combined_filtered_sorted.vcf -fileType VCF
#/mnt/lustre/users/doo513/rna_seq_oct22/reads/tassel-5-standalone/run_pipeline.pl -Xmx40g -fork1 -vcf OUTPUT_1_all_combined_filtered_sorted.vcf -export OUTPUT_1_all_combined_filtered_sorted_tassel -exportType Hapmap -runfork1

#vcftools --vcf OUTPUT_1_all_combined_filtered_sorted.vcf --out vcf_final --plink

###in R to get .geno
#library(LEA)
#ped2geno("/users/sfo503/scratch/chloroplast_phylogeny_196_UK/ash_RNAseq_2017/vcf_rice.ped", force=TRUE, output.file="/users/sfo503/scratch/chloroplast_phylogeny_196_UK/ash_RNAseq_2017/vcf_rice.")
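The LEA documentation describes the geno format as one line per SNP and one character per individual. The values 0, 1 and 2 give the number of copies of the reference allele, and 9 marks a missing genotype. Below is a minimal awk sketch of that conversion for biallelic diploid sites, useful for checking small files by hand. vcf_to_geno is a hypothetical name. Its output should be compared with LEA's own vcf2geno/ped2geno output before being relied on.

```shell
#!/usr/bin/env bash
# Sketch: VCF -> geno (one line per SNP, one character per individual).
# 0/1/2 = copies of the REF allele, 9 = missing; biallelic diploid sites only.
vcf_to_geno() {
  awk 'BEGIN { FS = "\t" }
    /^#/ { next }
    $5 !~ /,/ {                              # biallelic sites only
      line = ""
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                    # f[1] is the GT value
        n = split(f[1], a, "[/|]")           # unphased or phased separator
        if (n != 2 || a[1] == "." || a[2] == ".") {
          line = line "9"                    # missing or non-diploid genotype
        } else {
          g = (a[1] == "0") + (a[2] == "0")  # copies of the REF allele
          line = line g
        }
      }
      print line
    }' "$1"
}

# Usage (hypothetical file names): vcf_to_geno vcf_final.vcf > vcf_final.geno
```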

###PSIKO not done
#/users/sfo503/scratch/Danish/PSIKOv2/PSIKOv2/PSIKOBinary/Linux/PSIKO -i vcf_rice.geno
#bcftools query -l vcf_all_combined_genotypes_SNPs_filtered_2_ordered.vcf >names

####GET fasta sequences from each vcf file using MoringaV2.cds.fa as reference
#file=star_outputs/samples
#mkdir fasta
#while read -r line; do
#bcftools view -f PASS /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output.g.vcf > /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.g.vcf
#gatk IndexFeatureFile -I /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.g.vcf
#bgzip < /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.g.vcf > /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.g.vcf.gz

#gatk IndexFeatureFile -I /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.g.vcf.gz

#bcftools consensus -f /users/doo513/scratch/rna_seq_oct22/reads/MoringaV2.cds.fa /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.g.vcf.gz > /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf.fasta
#done < "$file"


#####next step is to rename the fasta files so they only contain the name of your sample
#mkdir fasta
#cp ./Filtered_Variants/*.fasta fasta/

#rename "_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_pass_bcf" "" *
####prefix each FASTA header with the sample name (taken from the file name)
#for FILE in *.fasta;
#do
#awk '/>/{sub(">","&"FILENAME"_");sub(/\.fasta/ ,x)}1' $FILE > changed_${FILE}.fasta
#done

###delete the ones that do not have the correct name 
####rename again the changed ones
#rename "changed_" "" *
#rename ".fasta.fasta" ".fasta" *
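The awk-then-rename sequence above can also be done in one pass, writing renamed copies to a separate directory. This sketch assumes the per-sample files are already named <sample>.fasta, as after the first rename. It produces the same header style, e.g. >MO1_Morol01g00010. The function name prefix_headers and the output directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy each <sample>.fasta into out_dir with every header prefixed
# by "<sample>_", so no second rename step is needed.
prefix_headers() {
  local in_dir=$1 out_dir=$2 f sample
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.fasta; do
    [ -e "$f" ] || continue                  # no .fasta files: nothing to do
    sample=$(basename "$f" .fasta)
    awk -v s="$sample" '/^>/ { sub(/^>/, ">" s "_") } { print }' "$f" > "$out_dir/$sample.fasta"
  done
}

# Usage (hypothetical paths):
# prefix_headers fasta fasta/renamed && cat fasta/renamed/*.fasta > combined.txt
```

Writing to a separate directory leaves the originals untouched, so the step can be rerun safely.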
###create combined_fasta
#cat *.fasta > combined_fasta/combined.txt

#cat combined5.fasta > combined5.fasta/combined5.txt
 

####alignment and trimming of the cDNA
#mafft --thread 24 ./fasta/combined_fasta/combined.txt > ./fasta/combined_fasta/combined_aligned.txt
#trimal -in ./fasta/combined_fasta/combined_aligned.txt -out ./fasta/combined_fasta/combined_aligned_trimmed.txt -automated1 -phylip
#iqtree -s ./fasta/combined_fasta/combined_aligned_trimmed.txt --alrt 1000 -B 1000



####################REPEAT 
##file=star_outputs/samples
##while read -r line; do
##sed 's/<NON_REF>//g' /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output.g.vcf > /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/Filtered_Variants/${line}_aligned_sorted_duplicates.output_variants_Filtered_Variants_output_non_ref.g.vcf 
##done < "$file"



####combine all the files created 
#mkdir all_combined.vcf
#find ./Filtered_Variants/ -type f -name "*Filtered_Variants_output.g.vcf" >input.list 
##### the input.list file might need some modification of the path or the files will not be found later in CombineGVCFs

###give Java more memory (-Xmx40g) and plenty of time
gatk CombineGVCFs --java-options "-Xmx40g" -R /users/doo513/scratch/rna_seq_oct22/reads/RawData/MoringaV2.cds.fa --variant input.list -O all_combined_non_ref.vcf ####this file can not be modified
####now genotype the combined GVCF (GenotypeGVCFs)

######################filter the genotypes
#TEST
vcftools --vcf all_combined_non_ref.vcf --out OUTPUT_1_all_combined_filtered_non_ref.vcf --max-missing 0.25 --minGQ 20 --minDP 10 --remove-indels --min-alleles 2 --max-alleles 3 --mac 5 --recode --recode-INFO-all
###--minGQ 20 = minimum genotype quality; --minDP 10 = minimum 10 reads per genotype (genotypes failing either are set to missing); --remove-indels = remove insertions and deletions
###--mac 5 = minimum minor allele count, i.e. the number of copies of the less common allele across all samples (not the minor allele frequency, which is --maf)
###--max-missing 0.25 ----> keep sites genotyped in at least 25% of your samples



#####Part 2
#git clone https://bitbucket.org/tasseladmin/tassel-5-standalone.git
###to install tassel in the reads folder
/mnt/lustre/users/doo513/rna_seq_oct22/reads/tassel-5-standalone/run_pipeline.pl -Xmx40g -SortGenotypeFilePlugin -inputFile /mnt/lustre/users/doo513/rna_seq_oct22/reads/RawData/OUTPUT_1_all_combined_filtered_non_ref.vcf.recode.vcf -outputFile OUTPUT_1_all_combined_filtered_sorted_non_ref.vcf -fileType VCF
/mnt/lustre/users/doo513/rna_seq_oct22/reads/tassel-5-standalone/run_pipeline.pl -Xmx40g -fork1 -vcf OUTPUT_1_all_combined_filtered_sorted_non_ref.vcf -export OUTPUT_1_all_combined_filtered_sorted_tassel_non_ref -exportType Hapmap -runfork1

bcftools view -H OUTPUT_1_all_combined_filtered_sorted_non_ref.vcf | cut -f 1 | uniq | awk '{print $0"\t"$0}' > OUTPUT_1_all_combined_filtered_sorted_non_ref.chrom-map.txt
vcftools --vcf OUTPUT_1_all_combined_filtered_sorted_non_ref.vcf --plink --chrom-map OUTPUT_1_all_combined_filtered_sorted_non_ref.chrom-map.txt --out vcf_Dorcas


#####check if non_ref is still there
 #tail -10 all_combined_non_ref.vcf
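Checking only the last 10 lines can miss <NON_REF> alleles elsewhere in the file. <NON_REF> is the symbolic allele written by HaplotypeCaller in -ERC GVCF mode, and joint genotyping with GenotypeGVCFs normally drops it. A whole-file count is a more reliable check. count_non_ref is a hypothetical helper that counts data lines whose ALT column still contains it.

```shell
#!/usr/bin/env bash
# Sketch: count records whose ALT column (field 5) still contains <NON_REF>.
# Header lines (including the ##ALT=<ID=NON_REF,...> definition) are ignored.
count_non_ref() {
  awk 'BEGIN { FS = "\t"; n = 0 }
    /^#/ { next }
    $5 ~ /<NON_REF>/ { n++ }
    END { print n }' "$1"
}

# Usage: count_non_ref all_combined_non_ref.vcf    (0 means none left)
```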

GATK VARIANT CALLING SCRIPT

#!/bin/bash
#SBATCH --job-name=Rs      # Job name
#SBATCH --mail-type=END,FAIL               # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=sara.francoortega@york.ac.uk        # Where to send mail
#SBATCH --ntasks=1                         # Run a single task...
#SBATCH --cpus-per-task=20                 # ...with 20 cores
#SBATCH                        # Job memory request
#SBATCH --time=10:00:00                    # Time limit hrs:min:sec
#SBATCH --output=threaded_job_%j.log       # Standard output and error log
#SBATCH --account=biol-ralstonph-2023           # Project account

echo My working directory is `pwd`
echo Running job on host:
echo -e '\t'`hostname` at `date`
echo $SLURM_CPUS_ON_NODE CPU cores available
echo
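#module load lang/Java    # not available on this cluster (Lmod error in the job log below); Java is provided by the GATK and picard modules
#module load bio/minimap2
#module load bio/SAMtools
#module load bio/Flye
#module load bio/Racon
#module load bio/canu     # not available on this cluster (Lmod error in the job log below)
#module load bio/BEDTools
##module load bio/medaka
#module load bio/quast
#module load bio/BWA      # not available on this cluster (Lmod error in the job log below); only needed for the commented-out bwa mem steps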
module load GATK/4.3.0.0-GCCcore-11.3.0-Java-11 
module load picard/2.25.5-Java-13
module load VCFtools/0.1.16-GCC-11.2.0
module load BCFtools/1.15.1-GCC-11.3.0 

#ls ./*.bam > list_files
#sed 's/_aligned_sorted_duplicates_corrected.out.bam//g' list_files > list_files2
#sed 's\./\\g' list_files2 > list_files3

#mkdir /users/sfo503/scratch/Dorcas/vcf_rep
#mkdir ./alignment_cdna

#java -jar $EBROOTPICARD/picard.jar CreateSequenceDictionary R=MoringaV2.cds.fa O=MoringaV2.cds.dict

#file=list_files3
#while read -r line; do
 ##bwa mem -t 8 ./MoringaV2.cds.fa /trimmed/${line}.fastq.gz > ./alignment_cdna/${line}_aligned_gene.sam
 ##samtools view -S -b ./alignment_cdna/${line}_aligned_gene.sam >  ./alignment_cdna/${line}_aligned_gene.bam  
 ##samtools sort ./alignment_cdna/${line}_aligned_gene.bam  --threads $SLURM_CPUS_PER_TASK -o  ./alignment_cdna/${line}_aligned_gene_sorted.bam 
 ##samtools index  ./alignment_cdna/${line}_aligned_gene_sorted.bam
 ##samtools flagstat  ./alignment_cdna/${line}_aligned_gene_sorted.bam > flagstat_${line}.txt
 #java -Xmx7g -jar /users/sfo503/scratch/chloroplast_phylogeny_196_UK/ash_RNAseq_2017/picard.jar MarkDuplicates --INPUT  ./bwa/${line}.sorted.bam --METRICS_FILE /users/sfo503/scratch/rice_2022/vcf/${line}_metrics --OUTPUT /users/sfo503/scratch/rice_2022/vcf/${line}_duplicates.out.bam
 #samtools sort /users/sfo503/scratch/rice_2022/vcf/${line}_duplicates.out.bam --threads $SLURM_CPUS_PER_TASK -o /users/sfo503/scratch/rice_2022/vcf/${line}_duplicates_sorted.out.bam 
 #java -Xmx7g -jar /users/sfo503/scratch/chloroplast_phylogeny_196_UK/ash_RNAseq_2017/picard.jar ValidateSamFile I=/users/sfo503/scratch/rice_2022/vcf/${line}_duplicates_sorted.out.bam  MODE=SUMMARY
 #java -Xmx7g -jar /users/sfo503/scratch/chloroplast_phylogeny_196_UK/ash_RNAseq_2017/picard.jar AddOrReplaceReadGroups I=/users/sfo503/scratch/rice_2022/vcf/${line}_duplicates_sorted.out.bam O=/users/sfo503/scratch/rice_2022/vcf/${line}_duplicates_sorted_corrected.out.bam RGID=${line} RGLB=lib1 RGPL=illumina RGPU=Rice RGSM=${line}
 #samtools index /users/sfo503/scratch/rice_2022/vcf/${line}_duplicates_sorted_corrected.out.bam
 #gatk --java-options "-Xmx7g" HaplotypeCaller -I ./${line}_aligned_sorted_duplicates_corrected.out.bam -R MoringaV2.cds.fa -O vcf_rep/${line}_duplicates_sorted_corrected_variants.g.vcf -ERC GVCF
 #gatk VariantFiltration \
 #  -R MoringaV2.cds.fa \
 #  -V vcf_rep/${line}_duplicates_sorted_corrected_variants.g.vcf \
 #  -O vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep.g.vcf \
 #  --filter-name "QD" \
 #  --filter-expression "QD < 2.0" \
 #  --filter-name "QUAL" \
 #  --filter-expression "QUAL <30.0" \
 #  --filter-name "MQRankSum" \
 #  --filter-expression "MQRankSum < -2.5" \
 #  --filter-name "MQ" \
 #  --filter-expression "MQ < 40" \
 #  --filter-name "FS60" \
 #  --filter-expression "FS > 60" \
 #  --filter-name "SOR" \
 #  --filter-expression "SOR > 3" \
 #  --filter-name "ReadPosRankSum" \
 #  --filter-expression "ReadPosRankSum < -8.0"
#done < "$file"
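A single dropped line-continuation backslash silently truncates a long VariantFiltration call. Everything after the break is run as a separate (failing) command. One way to avoid this is to keep the filter names and expressions in bash arrays and expand them into one argument list. The array and function names below are hypothetical. The gatk call is only shown as a comment, and the thresholds are the ones used in these scripts.

```shell
#!/usr/bin/env bash
# Sketch: hard-filter definitions as parallel arrays (same thresholds as above).
filter_names=(QD QUAL MQRankSum MQ FS60 SOR ReadPosRankSum)
filter_exprs=("QD < 2.0" "QUAL < 30.0" "MQRankSum < -2.5" "MQ < 40" "FS > 60" "SOR > 3" "ReadPosRankSum < -8.0")

# Build --filter-name/--filter-expression pairs into the global array filter_args.
build_filter_args() {
  local i
  filter_args=()
  for i in "${!filter_names[@]}"; do
    filter_args+=(--filter-name "${filter_names[i]}" --filter-expression "${filter_exprs[i]}")
  done
}

build_filter_args
# Usage (not executed here):
# gatk VariantFiltration -R MoringaV2.cds.fa -V in.g.vcf -O out.g.vcf "${filter_args[@]}"
```

Quoting "${filter_args[@]}" keeps each expression, spaces included, as a single argument.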

#mkdir all_vcf
#find ./vcf_rep -type f -name "*filtered_output_rep.g.vcf" >input.list ###the paths in input.list may need the full /users/sfo503/... prefix added


###give Java more memory (-Xmx20g) and plenty of time
#gatk CombineGVCFs --java-options "-Xmx20g" -R MoringaV2.cds.fa  --variant input.list -O vcf_all_combined_rep.vcf
#gatk GenotypeGVCFs --java-options "-Xmx40g" -R MoringaV2.cds.fa  --variant  vcf_all_combined_rep.vcf -O  vcf_all_combined_genotypes.vcf
#gatk CountVariants --java-options "-Xmx20g" -V  vcf_all_combined_genotypes.vcf
#
#vcftools --vcf vcf_all_combined_genotypes.vcf --minQ 15 --max-alleles 3 --remove-indels --out  vcf_all_combined_genotypes_SNPs_filtered  --recode --recode-INFO-all
#vcftools --vcf vcf_all_combined_genotypes_SNPs_filtered.recode.vcf --min-alleles 2 --mac 3 --out vcf_all_combined_genotypes_SNPs_filtered_2 --recode --recode-INFO-all




#vcftools --vcf vcf_all_combined_genotypes_SNPs_filtered_2.recode.vcf --out vcf_moringa --plink
####now in R
module load R/4.2.1-foss-2022a 

###in R to get .geno
#library(LEA)
#ped2geno("/users/sfo503/scratch/Dorcas/vcf_moringa.ped", force=TRUE, output.file="/users/sfo503/scratch/Dorcas/vcf_moringa.geno")

###PSIKO not working
###Try SPAGeDi?
##################don't use vcftools; use TASSEL for the HapMap below
####TASSEL step NOT DONE!
#/mnt/scratch/users/sfo503/Dorcas/tassel-3.0-src/run_pipeline.pl -Xmx40g -SortGenotypeFilePlugin -inputFile vcf_all_combined_genotypes_SNPs_filtered_2.recode.vcf -outputFile vcf_all_combined_genotypes_SNPs_filtered_2_ordered -fileType VCF
#/mnt/scratch/users/sfo503/Dorcas/tassel-3.0-src/run_pipeline.pl -Xmx40g -fork1 -vcf vcf_all_combined_genotypes_SNPs_filtered_2_ordered.vcf -export vcf_all_combined_genotypes_SNPs_filtered_2_ordered_tassel -exportType Hapmap -runfork1


########GET FASTA FOR EACH
####GET fasta sequences from each vcf file using MoringaV2.cds.fa as reference
file=list_files3
mkdir fasta
while read -r line; do
 bcftools view -f PASS /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep.g.vcf > /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.g.vcf
 gatk IndexFeatureFile -I /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.g.vcf
 bgzip < /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.g.vcf > /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.g.vcf.gz
 gatk IndexFeatureFile -I  /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.g.vcf.gz
 cat MoringaV2.cds.fa| bcftools consensus  /users/sfo503/scratch/Dorcas/vcf_rep/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.g.vcf.gz >  /users/sfo503/scratch/Dorcas/fasta/${line}_duplicates_sorted_corrected_variants_filtered_output_rep_bcf.fasta
done < "$file"
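The consensus loop assumes every sample in list_files3 has a filtered GVCF in vcf_rep/. A pre-flight check can report missing or empty inputs before the job spends time on the loop. This is a sketch. check_inputs is a hypothetical function, and the suffix argument is whatever the loop appends to each sample name.

```shell
#!/usr/bin/env bash
# Sketch: for each sample in a list, check that <dir>/<sample><suffix> exists
# and is non-empty. Prints missing paths to stderr; returns 1 if any are missing.
check_inputs() {
  local list=$1 dir=$2 suffix=$3 sample missing=0
  while read -r sample; do
    [ -n "$sample" ] || continue                 # ignore blank lines
    if [ ! -s "$dir/${sample}${suffix}" ]; then
      printf 'missing: %s\n' "$dir/${sample}${suffix}" >&2
      missing=$((missing + 1))
    fi
  done < "$list"
  [ "$missing" -eq 0 ]
}

# Usage, before the consensus loop:
# check_inputs list_files3 /users/sfo503/scratch/Dorcas/vcf_rep \
#   _duplicates_sorted_corrected_variants_filtered_output_rep.g.vcf || exit 1
```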

##change the names of each file to only the ID using rename
#rename "_duplicates_sorted_corrected_variants_filtered_output_rep_bcf" "" *

LAST FILTERING STEP (job log of the GATK GenotypeGVCFs run)
My working directory is /users/sfo503/scratch/Dorcas
Running job on host:
	node089.viking2.yor.alces.network at Thu Jan 4 07:53:14 GMT 2024
40 CPU cores available

Lmod has detected the following error: The following module(s) are unknown:
"lang/Java"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "lang/Java"

Also make sure that all modulefiles written in TCL start with the string
#%Module



Lmod has detected the following error: The following module(s) are unknown:
"bio/canu"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "bio/canu"

Also make sure that all modulefiles written in TCL start with the string
#%Module



Lmod has detected the following error: The following module(s) are unknown:
"bio/BWA"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "bio/BWA"

Also make sure that all modulefiles written in TCL start with the string
#%Module



To execute picard run: java -jar $EBROOTPICARD/picard.jar

The following have been reloaded with a version change:
  1) Java/11.0.20 => Java/13.0.2


The following have been reloaded with a version change:
  1) GCCcore/11.3.0 => GCCcore/11.2.0
  2) XZ/5.2.5-GCCcore-11.3.0 => XZ/5.2.5-GCCcore-11.2.0
  3) binutils/2.38-GCCcore-11.3.0 => binutils/2.37-GCCcore-11.2.0
  4) bzip2/1.0.8-GCCcore-11.3.0 => bzip2/1.0.8-GCCcore-11.2.0
  5) libreadline/8.1.2-GCCcore-11.3.0 => libreadline/8.1-GCCcore-11.2.0
  6) ncurses/6.3-GCCcore-11.3.0 => ncurses/6.2-GCCcore-11.2.0
  7) zlib/1.2.12-GCCcore-11.3.0 => zlib/1.2.11-GCCcore-11.2.0


The following have been reloaded with a version change:
  1) GCC/11.2.0 => GCC/11.3.0
  2) GCCcore/11.2.0 => GCCcore/11.3.0
  3) HTSlib/1.14-GCC-11.2.0 => HTSlib/1.15.1-GCC-11.3.0
  4) XZ/5.2.5-GCCcore-11.2.0 => XZ/5.2.5-GCCcore-11.3.0
  5) binutils/2.37-GCCcore-11.2.0 => binutils/2.38-GCCcore-11.3.0
  6) bzip2/1.0.8-GCCcore-11.2.0 => bzip2/1.0.8-GCCcore-11.3.0
  7) cURL/7.78.0-GCCcore-11.2.0 => cURL/7.83.0-GCCcore-11.3.0
  8) zlib/1.2.11-GCCcore-11.2.0 => zlib/1.2.12-GCCcore-11.3.0

Using GATK jar /opt/apps/eb/software/GATK/4.3.0.0-GCCcore-11.3.0-Java-11/gatk-package-4.3.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx40g -jar /opt/apps/eb/software/GATK/4.3.0.0-GCCcore-11.3.0-Java-11/gatk-package-4.3.0.0-local.jar GenotypeGVCFs -R MoringaV2.cds.fa --variant vcf_all_combined_rep.vcf -O vcf_all_combined_genotypes.vcf
07:53:27.407 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/apps/eb/software/GATK/4.3.0.0-GCCcore-11.3.0-Java-11/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
07:53:27.641 INFO  GenotypeGVCFs - ------------------------------------------------------------
07:53:27.641 INFO  GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.3.0.0
07:53:27.641 INFO  GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
07:53:27.641 INFO  GenotypeGVCFs - Executing as sfo503@node089.viking2.yor.alces.network on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
07:53:27.641 INFO  GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v13.0.2+8
07:53:27.641 INFO  GenotypeGVCFs - Start Date/Time: January 4, 2024 at 7:53:27 AM GMT
07:53:27.641 INFO  GenotypeGVCFs - ------------------------------------------------------------
07:53:27.641 INFO  GenotypeGVCFs - ------------------------------------------------------------
07:53:27.642 INFO  GenotypeGVCFs - HTSJDK Version: 3.0.1
07:53:27.642 INFO  GenotypeGVCFs - Picard Version: 2.27.5
07:53:27.642 INFO  GenotypeGVCFs - Built for Spark Version: 2.4.5
07:53:27.642 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
07:53:27.642 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:53:27.642 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:53:27.642 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:53:27.642 INFO  GenotypeGVCFs - Deflater: IntelDeflater
07:53:27.643 INFO  GenotypeGVCFs - Inflater: IntelInflater
07:53:27.643 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
07:53:27.643 INFO  GenotypeGVCFs - Requester pays: disabled
07:53:27.643 INFO  GenotypeGVCFs - Initializing engine
07:53:28.072 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch/users/sfo503/Dorcas/vcf_all_combined_rep.vcf
07:53:28.601 INFO  GenotypeGVCFs - Done initializing engine
07:53:28.805 INFO  ProgressMeter - Starting traversal
07:53:28.806 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
07:53:38.842 INFO  ProgressMeter -    Morol01g04730:198              0.2                252000        1506576.3
07:53:48.852 INFO  ProgressMeter -   Morol01g08410:1100              0.3                547000        1637234.4
07:53:58.867 INFO  ProgressMeter -    Morol01g12540:451              0.5                831000        1658627.5
07:54:08.881 INFO  ProgressMeter -    Morol01g16690:625              0.7               1123000        1681347.5
07:54:18.887 INFO  ProgressMeter -   Morol01g20800:2237              0.8               1408000        1686867.3
07:54:28.909 INFO  ProgressMeter -    Morol01g24840:144              1.0               1692000        1689100.4
07:54:38.939 INFO  ProgressMeter -   Morol01g28870:1786              1.2               1978000        1692213.4
07:54:48.955 INFO  ProgressMeter -    Morol01g33740:575              1.3               2260000        1691848.9
07:54:58.966 INFO  ProgressMeter -    Morol02g04110:311              1.5               2550000        1696983.1
07:55:08.966 INFO  ProgressMeter -    Morol02g07960:732              1.7               2830000        1695287.5
07:55:18.967 INFO  ProgressMeter -   Morol02g11960:1247              1.8               3116000        1697152.3
07:55:28.979 INFO  ProgressMeter -    Morol02g16390:116              2.0               3392000        1693558.5
07:55:34.669 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol02g18690:159
07:55:38.990 INFO  ProgressMeter -    Morol02g20820:514              2.2               3672000        1692373.9
07:55:49.020 INFO  ProgressMeter -   Morol03g01990:1553              2.3               3949000        1689845.5
07:55:59.028 INFO  ProgressMeter -   Morol03g06090:1424              2.5               4227000        1688312.6
07:56:09.039 INFO  ProgressMeter -    Morol03g09720:879              2.7               4516000        1691037.4
07:56:19.044 INFO  ProgressMeter -   Morol03g13830:1284              2.8               4806000        1693863.9
07:56:24.376 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol03g16100:660
07:56:26.097 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol03g16920:347
07:56:29.044 INFO  ProgressMeter -    Morol03g17820:896              3.0               5090000        1694426.3
07:56:39.044 INFO  ProgressMeter -   Morol03g21870:2494              3.2               5372000        1694298.7
07:56:49.073 INFO  ProgressMeter -    Morol04g03420:448              3.3               5656000        1694537.8
07:56:56.958 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol04g06800:826
07:56:59.077 INFO  ProgressMeter -     Morol04g07720:77              3.5               5941000        1695240.9
07:57:09.106 INFO  ProgressMeter -     Morol04g12460:47              3.7               6219000        1693781.2
07:57:17.902 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol05g01880:427
07:57:19.117 INFO  ProgressMeter -    Morol05g02350:727              3.8               6502000        1693883.5
07:57:29.136 INFO  ProgressMeter -    Morol05g06500:232              4.0               6780000        1692672.6
07:57:39.139 INFO  ProgressMeter -   Morol05g11560:2808              4.2               7058000        1691673.5
07:57:49.141 INFO  ProgressMeter -   Morol05g15060:1239              4.3               7343000        1692357.9
07:57:59.141 INFO  ProgressMeter -   Morol06g02180:1131              4.5               7609000        1688793.5
07:58:09.157 INFO  ProgressMeter -   Morol06g05920:1749              4.7               7876000        1685601.3
07:58:19.171 INFO  ProgressMeter -   Morol06g10000:1006              4.8               8165000        1687186.8
07:58:29.197 INFO  ProgressMeter -   Morol06g14100:1068              5.0               8460000        1689797.6
07:58:39.203 INFO  ProgressMeter -   Morol07g00760:2014              5.2               8751000        1691575.6
07:58:49.229 INFO  ProgressMeter -     Morol07g05470:49              5.3               9029000        1690702.6
07:58:59.244 INFO  ProgressMeter -   Morol07g09870:1216              5.5               9317000        1691754.6
07:59:05.336 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol08g03000:954
07:59:09.273 INFO  ProgressMeter -   Morol08g04210:2863              5.7               9595000        1690912.8
07:59:19.278 INFO  ProgressMeter -    Morol08g08550:337              5.8               9883000        1691946.9
07:59:20.102 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol08g08960:370
07:59:29.298 INFO  ProgressMeter -     Morol08g12450:58              6.0              10176000        1693685.3
07:59:39.318 INFO  ProgressMeter -   Morol08g16840:3611              6.2              10472000        1695815.5
07:59:45.101 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol09g00570:836
07:59:49.320 INFO  ProgressMeter -  Morol09g02350:12074              6.3              10754000        1695706.3
07:59:57.271 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol09g05320:319
07:59:59.321 INFO  ProgressMeter -    Morol09g06350:980              6.5              11042000        1696528.9
08:00:09.344 INFO  ProgressMeter -   Morol09g10350:1350              6.7              11321000        1695869.1
08:00:19.376 INFO  ProgressMeter -    Morol09g15020:543              6.8              11608000        1696373.3
08:00:29.404 INFO  ProgressMeter -   Morol10g02830:2965              7.0              11888000        1695871.1
08:00:39.407 INFO  ProgressMeter -    Morol10g07310:373              7.2              12163000        1694794.0
08:00:49.408 INFO  ProgressMeter -   Morol10g11290:4878              7.3              12451000        1695547.7
08:00:59.409 INFO  ProgressMeter -    Morol10g15500:670              7.5              12737000        1695994.0
08:01:09.413 INFO  ProgressMeter -    Morol11g02880:702              7.7              13022000        1696283.4
08:01:19.433 INFO  ProgressMeter -    Morol11g06750:640              7.8              13310000        1696885.2
08:01:29.442 INFO  ProgressMeter -     Morol12g00980:89              8.0              13591000        1696630.5
08:01:39.460 INFO  ProgressMeter -   Morol12g04880:3306              8.2              13884000        1697815.6
08:01:44.483 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol12g06800:234
08:01:49.481 INFO  ProgressMeter -   Morol12g08790:1403              8.3              14163000        1697268.7
08:01:59.514 INFO  ProgressMeter -   Morol13g01310:1321              8.5              14429000        1695176.1
08:02:05.808 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol13g03640:1308
08:02:09.523 INFO  ProgressMeter -    Morol13g05180:358              8.7              14710000        1694970.6
08:02:19.539 INFO  ProgressMeter -   Morol13g08920:2144              8.8              14997000        1695432.0
08:02:25.160 WARN  MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location Morol63g00020:243
08:02:25.217 INFO  ProgressMeter -    Morol63g00050:289              8.9              15147446        1694310.4
08:02:25.217 INFO  ProgressMeter - Traversal complete. Processed 15147446 total variants in 8.9 minutes.
08:02:25.290 INFO  GenotypeGVCFs - Shutting down engine
[January 4, 2024 at 8:02:25 AM GMT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 8.97 minutes.
Runtime.totalMemory()=3221225472
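In this stretch of the log, GenotypeGVCFs emits a WARN line from MinimalGenotypingEngine for every site it skips because more than 50 alleles would have to be genotyped. The bash sketch below pulls those skipped sites out of a saved copy of the job output. The function name `list_skipped_sites` is illustrative, and the sketch assumes stdout was captured to a file whose name you supply.

```bash
# List the sites GenotypeGVCFs skipped for having more than 50 alleles.
# Usage: list_skipped_sites <log_file>
list_skipped_sites() {
  local log=$1
  # The location is the last whitespace-separated field of each WARN line,
  # e.g. "... Site will be skipped at location Morol03g16100:660"
  grep 'Site will be skipped at location' "$log" | awk '{print $NF}'
}
```

Piping the output through `wc -l` gives the number of skipped sites for the whole run, not just the part of the log shown here.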
Using GATK jar /opt/apps/eb/software/GATK/4.3.0.0-GCCcore-11.3.0-Java-11/gatk-package-4.3.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx20g -jar /opt/apps/eb/software/GATK/4.3.0.0-GCCcore-11.3.0-Java-11/gatk-package-4.3.0.0-local.jar CountVariants -V vcf_all_combined_genotypes.vcf
08:02:26.725 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/apps/eb/software/GATK/4.3.0.0-GCCcore-11.3.0-Java-11/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:02:26.893 INFO  CountVariants - ------------------------------------------------------------
08:02:26.893 INFO  CountVariants - The Genome Analysis Toolkit (GATK) v4.3.0.0
08:02:26.894 INFO  CountVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
08:02:26.894 INFO  CountVariants - Executing as sfo503@node089.viking2.yor.alces.network on Linux v4.18.0-477.15.1.el8_8.x86_64 amd64
08:02:26.894 INFO  CountVariants - Java runtime: OpenJDK 64-Bit Server VM v13.0.2+8
08:02:26.894 INFO  CountVariants - Start Date/Time: January 4, 2024 at 8:02:26 AM GMT
08:02:26.894 INFO  CountVariants - ------------------------------------------------------------
08:02:26.894 INFO  CountVariants - ------------------------------------------------------------
08:02:26.894 INFO  CountVariants - HTSJDK Version: 3.0.1
08:02:26.894 INFO  CountVariants - Picard Version: 2.27.5
08:02:26.895 INFO  CountVariants - Built for Spark Version: 2.4.5
08:02:26.895 INFO  CountVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:02:26.895 INFO  CountVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:02:26.895 INFO  CountVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:02:26.895 INFO  CountVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:02:26.895 INFO  CountVariants - Deflater: IntelDeflater
08:02:26.895 INFO  CountVariants - Inflater: IntelInflater
08:02:26.895 INFO  CountVariants - GCS max retries/reopens: 20
08:02:26.895 INFO  CountVariants - Requester pays: disabled
08:02:26.895 INFO  CountVariants - Initializing engine
08:02:26.971 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/scratch/users/sfo503/Dorcas/vcf_all_combined_genotypes.vcf
08:02:27.275 INFO  CountVariants - Done initializing engine
08:02:27.275 INFO  ProgressMeter - Starting traversal
08:02:27.275 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
08:02:28.297 INFO  ProgressMeter -    Morol58g00020:547              0.0                184268       10828677.8
08:02:28.297 INFO  ProgressMeter - Traversal complete. Processed 184268 total variants in 0.0 minutes.
08:02:28.297 INFO  CountVariants - Shutting down engine
[January 4, 2024 at 8:02:28 AM GMT] org.broadinstitute.hellbender.tools.walkers.CountVariants done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=1136656384
Tool returned:
184268
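CountVariants reported 184268 records in `vcf_all_combined_genotypes.vcf`. For an uncompressed VCF, counting the non-header lines should give the same number, so the figure can be checked quickly without starting the JVM. Here is a minimal sketch. The helper name `count_vcf_records` is illustrative.

```bash
# Count data records (non-header lines) in an uncompressed VCF.
# Header lines start with '#'; every other line is one variant record.
count_vcf_records() {
  grep -vc '^#' "$1"
}
```

For a gzipped VCF, the equivalent is `zcat file.vcf.gz | grep -vc '^#'`.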

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf vcf_all_combined_genotypes.vcf
	--recode-INFO-all
	--max-alleles 3
	--minQ 15
	--out vcf_all_combined_genotypes_SNPs_filtered
	--recode
	--remove-indels

Warning: Expected at least 2 parts in FORMAT entry: ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
Warning: Expected at least 2 parts in FORMAT entry: ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
Warning: Expected at least 2 parts in FORMAT entry: ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
Warning: Expected at least 2 parts in INFO entry: ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
After filtering, kept 32 out of 32 Individuals
Outputting VCF file...
After filtering, kept 149740 out of a possible 184268 Sites
Run Time = 8.00 seconds
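VCFtools reports how many sites survive each run as "After filtering, kept X out of a possible Y Sites". A small awk sketch can turn that line into a percentage, which makes successive filtering passes easier to compare. The function name is illustrative. For this log the arithmetic gives 149740/184268 (81.26%) for the first pass and 91397/149740 (61.04%) for the second, so 91397 of the original 184268 sites (49.60%) remain overall.

```bash
# Print the fraction of sites a VCFtools run kept, using its summary line
# "After filtering, kept <kept> out of a possible <total> Sites".
# The "Individuals" summary line does not match and is ignored.
site_retention() {
  awk '/^After filtering, kept [0-9]+ out of a possible [0-9]+ Sites/ {
         printf "%d/%d sites kept (%.2f%%)\n", $4, $9, 100 * $4 / $9
       }' "$1"
}
```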

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
	--vcf vcf_all_combined_genotypes_SNPs_filtered.recode.vcf
	--recode-INFO-all
	--mac 3
	--min-alleles 2
	--out vcf_all_combined_genotypes_SNPs_filtered_2
	--recode

After filtering, kept 32 out of 32 Individuals
Outputting VCF file...
After filtering, kept 91397 out of a possible 149740 Sites
Run Time = 5.00 seconds
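The script that launched the two VCFtools passes is not part of this section. The sketch below reconstructs them from the "Parameters as interpreted" blocks in the log, with the thresholds copied verbatim: `--remove-indels --max-alleles 3 --minQ 15`, then `--mac 3 --min-alleles 2`. The wrapper function `run_snp_filters` is hypothetical, so treat this as an assumption about how the commands were issued rather than the original script.

```bash
#!/bin/bash
# Hypothetical reconstruction of the two VCFtools passes echoed in the log.
set -euo pipefail

run_snp_filters() {
  local in_vcf=${1:-vcf_all_combined_genotypes.vcf}

  # Pass 1: drop indels, keep sites with at most 3 alleles,
  # and apply the site quality threshold --minQ 15.
  vcftools --vcf "$in_vcf" \
    --remove-indels --max-alleles 3 --minQ 15 \
    --recode --recode-INFO-all \
    --out vcf_all_combined_genotypes_SNPs_filtered

  # Pass 2: keep sites with at least 2 alleles and a minor allele count >= 3.
  vcftools --vcf vcf_all_combined_genotypes_SNPs_filtered.recode.vcf \
    --mac 3 --min-alleles 2 \
    --recode --recode-INFO-all \
    --out vcf_all_combined_genotypes_SNPs_filtered_2
}
```

Run `run_snp_filters` from the directory that holds the combined VCF. With `--recode`, VCFtools appends `.recode.vcf` to each `--out` prefix, so the final file is `vcf_all_combined_genotypes_SNPs_filtered_2.recode.vcf`.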

============================
 Job utilisation efficiency
============================

Job ID: 2156096
Cluster: viking2.yor.alces.network
User/Group: sfo503/clusterusers
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 40
CPU Utilized: 00:10:07
CPU Efficiency: 2.68% of 06:18:00 core-walltime
Job Wall-clock time: 00:09:27
Memory Utilized: 2.84 GB
Memory Efficiency: 1.36% of 209.57 GB
 Requested wall clock time: 2-00:00:00
    Actual wall clock time: 00:09:27
Wall clock time efficiency: 0.3%
           Job queued time: 00:00:15
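The efficiency figures in this report are simple ratios. CPU efficiency is CPU time used divided by (cores × wall-clock time), and memory efficiency is memory used divided by memory allocated. The minimal sketch below reproduces the CPU figure from the HH:MM:SS values. The helper names are illustrative.

```bash
# Convert HH:MM:SS to seconds (10# forces base 10 so "08" and "09" parse).
# Does not handle the D-HH:MM:SS form used for the requested wall time.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# CPU efficiency (%) = CPU time used / (cores * wall-clock time) * 100
cpu_efficiency() {
  local cpu_used=$1 cores=$2 wall=$3
  awk -v u="$(to_seconds "$cpu_used")" -v c="$cores" -v w="$(to_seconds "$wall")" \
      'BEGIN { printf "%.2f%%\n", 100 * u / (c * w) }'
}
```

`cpu_efficiency 00:10:07 40 00:09:27` gives 607 / (40 × 567) = 2.68%, matching the report. Likewise, 2.84 / 209.57 GB gives the 1.36% memory figure. The 607 CPU-seconds over 567 wall-seconds amount to roughly one busy core out of the 40 allocated.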

