Today, I continued preparing my presentation for GSS 2019 (this Thursday; my talk is at 1:30pm). I met with Steven to discuss a plan for results, and he shared the analyses he’s done over the past couple of days. Details of those results and what I did today are below. I also tried some edgeR stuff yesterday and didn’t get very far; notes on that are at the end of the post for my future self (I won’t do edgeR for GSS).
Stuff SR did (as far as I can understand at the moment)
GitHub Issue #790
sr320/nb-2019/C_bairdi/11-Deseq2.html
He joined three files
- Abundance-merge.txt - made from the outputs I made in kallisto (bairdisamples-kallisto-updates).
- sr320/nb-2019/Crab_DEGlist.txt - a list of all the Differentially Expressed Genes
- Cb_v1_blastx_sp_imac.tab - the blast output from the first assembled transcriptome against uniprot
The combined files resulted in: bigtable.txt
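For my future self, here is a minimal sketch of what a three-way join like this could look like. This is not SR's actual code (I don't know exactly how he joined them). The column names, transcript IDs, and values are all made up, and the blast table is cut down to three columns for brevity.

```python
import csv
import io

# Tiny made-up stand-ins for the three files (hypothetical IDs, columns, values).
abundance_txt = (
    "target_id\tsample_A\tsample_B\n"
    "TRINITY_DN1_c0_g1_i1\t12\t0\n"
    "TRINITY_DN2_c0_g1_i1\t3\t40\n"
)
deg_txt = (
    "target_id\tlog2FoldChange\tpadj\n"
    "TRINITY_DN2_c0_g1_i1\t-3.1\t0.001\n"
)
# Blast tabular output has no header row, so column names are supplied below.
blast_txt = "TRINITY_DN2_c0_g1_i1\tsp|P12345|EXAMPLE_HUMAN\t1e-50\n"


def read_tsv(text, fieldnames=None):
    """Parse tab-delimited text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t", fieldnames=fieldnames))


def left_join(base_rows, other_rows, key):
    """Keep every base row; add the other table's columns ("" where no match)."""
    other_cols = [c for c in other_rows[0] if c != key] if other_rows else []
    lookup = {row[key]: row for row in other_rows}  # last duplicate key wins
    joined = []
    for row in base_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for col in other_cols:
            merged[col] = match[col] if match else ""
        joined.append(merged)
    return joined


abundance = read_tsv(abundance_txt)
degs = read_tsv(deg_txt)
blast = read_tsv(blast_txt, fieldnames=["target_id", "sseqid", "evalue"])

bigtable = left_join(left_join(abundance, degs, "target_id"), blast, "target_id")
for row in bigtable:
    print(row)
```

A left join keeps every transcript from the abundance table, so transcripts that are not DEGs or have no blast hit just get empty cells.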
sr320/nb-2019/C_bairdi/11-Deseq2.html
In the above link, Steven used DESeq2 to get the differentially expressed genes from the Abundance-merge.txt table.
DEGs (adjusted p-value < 0.1):
## out of 137634 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 4529, 3.3%
## LFC < 0 (down) : 59567, 43%
## outliers [1] : 858, 0.62%
## low counts [2] : 10635, 7.7%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
- Total DEGs with adj p-value <0.1 = 64096
- DEGs in the infected group (down; negative LFC values) = 59567
- DEGs in the uninfected group (up; positive LFC values) = 4529
DEGs (adjusted p-value < 0.05):
## out of 137634 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 3062, 2.2%
## LFC < 0 (down) : 55406, 40%
## outliers [1] : 858, 0.62%
## low counts [2] : 13314, 9.7%
## (mean count < 1)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
- Total DEGs with adj p-value <0.05 = 58468
- DEGs in the infected group (down; negative LFC values) = 55406
- DEGs in the uninfected group (up; positive LFC values) = 3062
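As a quick sanity check, the totals and percentages can be recomputed from the up/down counts in the two DESeq2 summaries above. This is just arithmetic on the numbers already in this post; the dictionary layout and function name are my own.

```python
# Counts copied from the two DESeq2 summaries in this post.
TOTAL_NONZERO = 137634  # "out of 137634 with nonzero total read count"

summaries = {
    "padj < 0.1": {"up": 4529, "down": 59567},
    "padj < 0.05": {"up": 3062, "down": 55406},
}


def describe(counts, total=TOTAL_NONZERO):
    """Return the DEG total and the up/down share of all nonzero-count transcripts."""
    return {
        "total_degs": counts["up"] + counts["down"],
        "pct_up": round(100 * counts["up"] / total, 1),
        "pct_down": round(100 * counts["down"] / total, 1),
    }


results = {label: describe(counts) for label, counts in summaries.items()}
for label, summary in results.items():
    print(label, summary)
```

The totals come out to 64096 and 58468, matching the bullets above.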
Results for GSS
After talking with Steven, we decided to focus on simpler results, since GSS is on Thursday and I have to submit my slides by tomorrow at 5pm.
I’ll model my talk after what I did in September at PCSGA, but I’ll add the results from the work Steven did.
I’ll talk about:
- Number of DEGs
- Number of up DEGs (uninfected)
- Number of down DEGs (infected)
- Annotate
- Enrichment
Note: I will use the DEG counts from adj p-value <0.05, because that is the cutoff SR used to get deg.annot.txt.
Other things I did today:
DAVID
Steven provided this annotated DEG list: deg.annot.txt
I separated out the DEGs for the infected (negative LFC) and the uninfected (positive LFC) so that I had two annotated files.
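A minimal sketch of that split by LFC sign, in case I want to script it next time. The column names and rows are made up; the real deg.annot.txt may use different headers.

```python
import csv
import io

# Made-up rows standing in for deg.annot.txt (hypothetical column names).
deg_annot_txt = (
    "target_id\tlog2FoldChange\tpadj\tuniprot_acc\n"
    "tx1\t-4.2\t0.001\tP11111\n"
    "tx2\t2.7\t0.010\tP22222\n"
    "tx3\t-1.3\t0.030\tP33333\n"
)


def split_by_lfc_sign(rows, lfc_col="log2FoldChange"):
    """Negative LFC goes to the infected list, positive LFC to the uninfected list."""
    infected, uninfected = [], []
    for row in rows:
        lfc = float(row[lfc_col])
        if lfc < 0:
            infected.append(row)
        elif lfc > 0:
            uninfected.append(row)
    return infected, uninfected


rows = list(csv.DictReader(io.StringIO(deg_annot_txt), delimiter="\t"))
infected, uninfected = split_by_lfc_sign(rows)
print(len(infected), "infected;", len(uninfected), "uninfected")
```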
I then used DAVID to get enriched GO terms for both lists.
- Background: uniprot accession IDs from Cb_v1_blastx_sp_imac.tab
- List for infected: uniprot accession IDs from “infected” tab deg.annot.xlsx
- List for uninfected: uniprot accession IDs from “uninfected” tab deg.annot.xlsx
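A sketch of pulling a de-duplicated accession list out of blast tabular output for the DAVID background. It assumes the subject ID is in the second column and looks like `sp|ACCESSION|ENTRY_NAME` (the usual Swiss-Prot style); I have not checked that Cb_v1_blastx_sp_imac.tab is laid out this way, and the example lines are made up.

```python
def accession_from_sseqid(sseqid):
    """Return the middle field of 'sp|ACCESSION|ENTRY_NAME'; otherwise the ID unchanged."""
    parts = sseqid.split("|")
    return parts[1] if len(parts) >= 3 else sseqid


def unique_accessions(blast_lines, subject_col=1):
    """Collect accessions in first-seen order, skipping blank lines and repeats."""
    seen = set()
    ordered = []
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        acc = accession_from_sseqid(fields[subject_col])
        if acc not in seen:
            seen.add(acc)
            ordered.append(acc)
    return ordered


# Made-up lines: query ID, subject ID, e-value.
example_lines = [
    "tx1\tsp|P11111|AAA_HUMAN\t1e-30",
    "tx2\tsp|P22222|BBB_MOUSE\t1e-12",
    "tx3\tsp|P11111|AAA_HUMAN\t1e-08",
]
background = unique_accessions(example_lines)
print("\n".join(background))
```

The same function would work for the infected and uninfected lists, as long as those tables carry the subject ID in a known column.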
DAVID resulted in the following tables:
Revigo
From the two enrichedGO-davidout files listed directly above, I took the GO IDs and their associated Fold Enrichment values (infected and uninfected separately) to make visualizations in Revigo, and I saved the Revigo output tables.
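A sketch of that GO ID plus Fold Enrichment extraction. It assumes the DAVID export is tab-delimited with a `Term` column formatted like `GO:0006412~translation` and a `Fold Enrichment` column; I did this step by hand, so check the headers before relying on it. The fold enrichment numbers below are made up.

```python
import csv
import io

# Made-up DAVID-style rows (only the columns this sketch needs).
david_txt = (
    "Category\tTerm\tFold Enrichment\n"
    "GOTERM_BP_DIRECT\tGO:0006412~translation\t2.5\n"
    "KEGG_PATHWAY\thsa03010:Ribosome\t3.1\n"
    "GOTERM_BP_DIRECT\tGO:0007018~microtubule-based movement\t4.0\n"
)


def revigo_lines(david_text):
    """Return 'GO_ID<TAB>value' lines, skipping rows whose term is not a GO term."""
    lines = []
    for row in csv.DictReader(io.StringIO(david_text), delimiter="\t"):
        term = row["Term"]
        if not term.startswith("GO:"):
            continue
        go_id = term.split("~", 1)[0]  # drop the '~description' part
        lines.append(f"{go_id}\t{row['Fold Enrichment']}")
    return lines


revigo_input = revigo_lines(david_txt)
print("\n".join(revigo_input))
```

Revigo takes one GO ID per line, optionally followed by a number, so the printed lines can be pasted straight into the input box.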
Infected:
Uninfected:
Some interesting things from this:
- There are a lot more DEGs in infected - we’re likely getting Hematodinium stuff!
- In the infected DEGs, there are a lot of genes to do with asexual reproduction (dinoflagellates reproduce asexually); cilium and flagellum movement (dinoflagellates move via flagella/cilia!); __
edgeR stuff from yesterday:
GitHub Issue #793
Jupyter notebook: 1111-blastx-uniprot-file-merging-for-EdgeR.ipynb
Work done on Hummingbird.
My plan:
- Finish up with the above linked notebook –> it will result in a file that will work in edgeR
- Use R to perform tasks based on this notebook from EIMD 2019: NonzosteraEdgeR_7_15.R –> will result in tables that can be used to create heatmaps
- Use R to create heatmaps modeled on the notebook from EIMD 2019: 2019-07-15-Gene-Expression-Heatmaps.Rmd
There’s likely lots more that I can and will do, but this is what I’m aware of and thinking about at the moment. But, again, I won’t do this stuff for GSS.