Practicals on SCZ GWAS 2022 =========================== Overview -------- In this practicals, we are going to analyze the GWAS from Schizophrenia 2022 to showcase how to use the new modules from FUMA v2.0.0 Download and Format GWAS ------------------------- - Download the GWAS summary statistics for the Schizophrenia GWAS from 2022 (https://pubmed.ncbi.nlm.nih.gov/35396580/) from https://pgc.unc.edu/for-researchers/download-results/ - Download the file `PGC3_SCZ_wave3.european.autosome.public.v3.vcf.tsv.gz` .. tip:: ALWAYS inspect the GWAS summary statistics file and format it before submitting to FUMA - Use `zless PGC3_SCZ_wave3.european.autosome.public.v3.vcf.tsv.gz` to view the content of the file. You will see that the file contains many lines with the `#` symbol before the actual data: .. image:: scz_2022_sumstat.png :width: 800 1. Check which build, GRCh37 or GRCh38 - Spot check a few variants on gnomad, the chromosome and position matches with GRCh37. For example: - https://gnomad.broadinstitute.org/variant/8-101592213-C-T?dataset=gnomad_r2_1 - https://gnomad.broadinstitute.org/variant/8-109954636-T-C?dataset=gnomad_r2_1 2. Check which alleles are reference vs alternate - From the image above, it says A1 is the SNP reference allele and A2 is the SNP alternate allele. This matches with the information from gnomad as well. 3. Check other columns - p value is found under column name `PVAL` - beta is found under column name `BETA` - Sample sizes: there are 3 columns for sample size in this file: `NCAS`, `NCON`, and `NEFF`. For this practicals, I will define the sample size as equal to the sum of `NCAS` and `NCON`. - `FCAS` and `FCON` are the frequency of A1 in cases and controls, respectively. 4. Let's prepare the input GWAS sumstat for running SNP2GENE and FLAMES. - For simplicity, I will prepare the input GWAS sumstat for running FLAMES, which I will also use for running a standard SNP2GENE analysis - Follow the instruction in https://fuma-docs.readthedocs.io/en/latest/flames/quick_start.html#submit-a-snp2gene-job - Based on the instruction, I will subset the file `PGC3_SCZ_wave3.european.autosome.public.v3.vcf.tsv.gz` to contain the following columns: `CHROM`, `ID`, `POS`, `A1`, `A2`, `BETA`, and `PVAL`. Example codes: .. code-block:: python import gzip outfile = open("scz2022_sumstat_fuma.txt", "w") with gzip.open("PGC3_SCZ_wave3.european.autosome.public.v3.vcf.tsv.gz", "rt") as f: for line in f: if line.startswith("#"): continue fields = line.strip().split("\t") chrom = fields[0] pos = fields[2] id = fields[1] a1 = fields[3] a2 = fields[4] beta = fields[8] pval = fields[10] print("\t".join([chrom, id, pos, a1, a2, beta, pval]), file=outfile) Run FLAMES ---------- - In this section we will try to identify the effect genes by running FLAMES Step 1. Run SNP2GENE with MAGMA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Submit a SNP2GENE job - Make sure to click on the button to keep the input gwas sumstat file for easy implementation of FLAMES .. image:: scz_2022_upload.png :width: 800 - Make sure to put in an integer value for the sample size .. image:: scz_2022_samplesize.png :width: 800 - Section 3-5 can be left as default - Make sure to check MAGMA in section 6 .. image:: scz_2022_magma.png :width: 800 - Put in a title and submit .. image:: scz_2022_submit.png :width: 800 - Job finished successfully: .. image:: scz_2022_snp2gene_for_flame_ok.png :width: 800 Step 2. Submit a FLAMES job ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. image:: scz_2022_submit_flames.png :width: 800 Run a SNP2GENE job with xQTLs mapping ------------------------------------- - In this section we will run a SNP2GENE job with xQTLs mapping - In theory, you can combine this with the run for FLAMES. However, it could be the case that MAGMA take a long time. Because each job has a limit of 8 hours, I recommend to run the MAGMA job separately. Step 1. Submit a SNP2GENE job ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Make sure to leave the button to keep the input gwas sumstat file **UNCHECKED** (this is the default) - In step 3, click on **Perform xQTLs Mapping** to expand the options and select the datasets - Because I know apriori that schizophrenia is a brain related phenotype, I will select brain related xQTLs datasets - eQTLs from the brain .. image:: scz_2022_brain_eqtls.png :width: 800 - sQTLs from the brain .. image:: scz_2022_brain_sqtls.png :width: 800 - pQTLs from the brain .. image:: scz_2022_brain_pqtls.png :width: 800 - sceQTLs from the brain .. image:: scz_2022_brain_sceqtls.png :width: 800 Analysis: Find predicted relevant genes per genomic risk locus -------------------------------------------------------------- - In this section, we are going to perform follow-up analyses after: - running a SNP2GENE job to map genes using positional mapping and xQTLs mapping - running a FLAMES job which predicts an effector gene per genomic risk locus - We are interested in: which genes are predicted as being relevant for our GWAS when using: - positional mapping - xQTLs mapping - FLAMES 1. Download the result files from SNP2GENE - Download `Gene table (mapped genes)` and `xQTLs mapping results` .. image:: scz_2022_download.png :width: 800 - Then, unzip the folder 2. Download the result file from FLAMES .. image:: scz_2022_download_flames.png :width: 800 3. Create a directory called `scz_2022` and copy and renames these files .. code-block:: bash mv genes.txt scz_2022/positional_mapped_genes.txt mv mv xqtls.txt scz_2022/xqtls_mapped_genes.txt mv FLAMES_scores_fmt.pred scz_2022/flames_mapped_genes.txt - Then, run script `list_predicted_genes_per_locus.py` (available at: https://github.com/tanyaphung/commonly_used_codes/tree/master/fuma_related/list_predicted_genes) .. code-block:: bash python list_predicted_genes_per_locus.py --filedir scz_2022/ - The above script returns: - column 1: genomic risk locus - column 2: whether it is positional (for positional mapping), xqtls (for xqtls mapping), or flames - column 3: the predicted or relevant gene - column 4: only in the xqtls mapping, it returns the name of the datasets .. tip:: Analyze each genomic risk locus one at a time 4. Example from genomic risk locus 6 - 3 genes were mapped using positional mapping: .. code-block:: bash awk -F "," '$1==6{print}' mapped_genes.txt | grep positional 6,positional,PTPRF, 6,positional,KDM4A, 6,positional,ST3GAL3, - 18 genes where the GWAS hits were also xQTLs: .. code-block:: bash awk -F "," '$1==6{print}' mapped_genes.txt | grep xqtls | awk -F "," '{print$3}' | sort | uniq ARTN ATP6V0B CCDC24 CDC20 DPH2 ERI3 FAM183A HYI IPO13 KDM4A MED8 PPIH PTPRF ST3GAL3 SZT2 TIE1 TMEM125 YBX1 - FLAMES predicted `PTPRF` to be the effector gene .. code-block:: bash awk -F "," '$1==6{print}' mapped_genes.txt | grep flames 6,flames,PTPRF, 5. We then can run an QTLs Analysis module on FUMA to gain additional insights on the effects of GWAS variants within genomic risk locus 6 Example running QTLs Analysis ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1. Obtain the range for genomic risk locus 6 - You can obtain this from the Results table, under tab `Genomic risk loci`: .. image:: scz_2022_locus6.png :width: 800 - the range is chr1:43982527-44378198 2. Subset the GWAS sumstat for this range - For illustration purpose, let's use `FCON` as the MAF - Example code: .. code-block:: bash import gzip outfile = open("scz2022_sumstat_fuma.txt", "w") with gzip.open("PGC3_SCZ_wave3.european.autosome.public.v3.vcf.tsv.gz", "rt") as f: for line in f: if line.startswith("#"): continue fields = line.strip().split("\t") chrom = fields[0] pos = fields[2] id = fields[1] a1 = fields[3] a2 = fields[4] beta = fields[8] pval = fields[10] print("\t".join([chrom, id, pos, a1, a2, beta, pval]), file=outfile) 3. Submit a QTLs Analysis job for genomic risk locus 6 - We can use the results from the `xQTLs mapping` to inform our selection of datasets. For example: .. code-block:: bash eQTL:gtex_v10:Brain_Cerebellar_Hemisphere eQTL:gtex_v10:Brain_Cerebellum eQTL:gtex_v10:Brain_Cortex eQTL:gtex_v10:Brain_Frontal_Cortex_BA9 eQTL:metabrain:Brain_cerebellum eQTL:metabrain:Brain_cortex sceQTL:brainscope:Brain_brainscope_L2.3.IT sceQTL:bryois2022Brain:Brain_bryois2022Brain_Astrocytes sceQTL:bryois2022Brain:Brain_bryois2022Brain_Endothelial.cells sceQTL:bryois2022Brain:Brain_bryois2022Brain_Excitatory.neurons sceQTL:bryois2022Brain:Brain_bryois2022Brain_OPCs sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D11.FPP sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D11.P_FPP sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D30.DA sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D30.Epen1 sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D30.FPP sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D30.Sert sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D52.Epen1.ROT_treated sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D52.Epen1.untreated sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D52.Sert.ROT_treated sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D52.Sert.untreated sceQTL:jerber2021Dopaminergic:Brain_jerber2021Dopaminergic_D52.pseudobulk.untreated sceQTL:singlebrain:Brain_singlebrain_Ast sceQTL:singlebrain:Brain_singlebrain_Ext3 sceQTL:singlebrain:Brain_singlebrain_Ext5 sceQTL:singlebrain:Brain_singlebrain_Ext7 sceQTL:singlebrain:Brain_singlebrain_MG2 sceQTL:singlebrain:Brain_singlebrain_MiGA3 - Note that the datasets from single brain can only be implemented with LAVA currently. - Submit a QTLs Analysis job: .. image:: scz_2022_locus6_qtlsAnalysis.png :width: 800