GSS 2019 Preparations - Differential Expression between infected and uninfected

Today, I continued to prepare a presentation for GSS 2019 (this Thursday - my talk is at 1:30pm). I met with Steven and discussed a plan for results, and he shared with me the analyses he’s done during the past couple of days. Details of results and what I’ve done in the post. I also tried doing edgeR stuff yesterday and didn’t get super far, but details will be at the end of the post for my future self (won’t do edgeR for GSS).

Stuff SR did (as far as I can understand at the moment)

GitHub Issue #790

sr320/nb-2019/C_bairdi/11-Deseq2.html

He joined three files

  1. Abundance-merge.txt - made from the outputs I made in kallisto (bairdisamples-kallisto-updates).
  2. sr320/nb-2019/Crab_DEGlist.txt - a list of all the Differentially Expressed Genes
  3. Cb_v1_blastx_sp_imac.tab - which is the blastoutput from the first assembled transcriptome against uniprot

The combined files resulted in: bigtable.txt

sr320/nb-2019/C_bairdi/11-Deseq2.html

In the above link, Steven used DESeq2 to get the differentially expressed genes from the Abundance-merge.txt table.

DEGs (p-value < 0.1):

## out of 137634 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 4529, 3.3%
## LFC < 0 (down)     : 59567, 43%
## outliers [1]       : 858, 0.62%
## low counts [2]     : 10635, 7.7%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
  • Total DEGs with adj p-value <0.1 = 64096
  • DEGs in the infected group (down; negative LFC values) = 59567
  • DEGs in the uninfected group (up; positive LFC values) = 4529

DEGs (p-value < 0.05):

## out of 137634 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 3062, 2.2%
## LFC < 0 (down)     : 55406, 40%
## outliers [1]       : 858, 0.62%
## low counts [2]     : 13314, 9.7%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
  • Total DEGs with adj p-value <0.05 = 58468
  • DEGs in the infected group (down; negative LFC values) = 55406
  • DEGs in the uninfected group (up; positive LFC values) = 3062

Results for GSS

After talking with Steven, it’s been decided to focus on simpler results since GSS is on Thursday, and I have to submit my slides by tomorrow at 5pm.

I’ll model my talk after what I did in September at PCSGA, but I’ll add the results from the work Steven did.

I’ll talk about:

  • Number of DEGs
  • Number of up DEGs (uninfected)
  • Number of down DEGs (infected)
  • Annotate
  • Enrichment

**I will use the DEG values from adj p-value <0.05, because that is what SR used to get the deg.annot.txt.

Other things I did today:

DAVID

Steven provided this annotated DEG list: deg.annot.txt

I separated out the DEGs for the infected (negative LFC) and the uninfected (positive LFC) so that I had two annotated files.

I then used DAVID to get enriched GO terms for both lists.

DAVID resulted in the following tables:

Revigo

From the two enrichedGO-davidout files listed directly above, I took the GO IDs, and the associated Fold Enrichments (for infected and uninfected separately) in order to get some visualizations in revigo, and I saved the revigo output tables.

Infected:

Uninfected:

Some interesting things from this:

  • There’s a lot more DEGs in infected - we’re getting Hematodinium stuff likely!
  • In the infected DEGs, there’s a lot of genes that are to do with asexual reproduction (dinoflagellates reproduce asexually); cilium and flagellum movement (dinoflagellates move via flagella/cilia!); __

edgeR stuff from yesterday:

GitHub Issue #793

Jupyter notebook: 1111-blastx-uniprot-file-merging-for-EdgeR.ipynb

Work done on Hummingbird.

My plan:

  • Finish up with the above linked notebook –> it will result in a file that will work in edgeR
  • Use R to perform tasks based on this notebook from EIMD 2019: NonzosteraEdgeR_7_15.R –> will result in tables that can be used to create heatmaps
  • Use R to perform code to create heatmaps modeledd from notebook from EIMD 2019: 2019-07-15-Gene-Expression-Heatmaps.Rmd

There’s likely lots more that I can and will do, but this is what I’m aware of and thinking about at the moment. But, again, I won’t do this stuff for GSS.

Written on November 12, 2019