Post detailing results from OrthoFinder comparing the protein lists from P. helianthoides, P. ochraceus, and D. imbricata.
R code: project-pycno-multispecies-2023/code/37-orthofinder.rmd
All OrthoFinder results live on Grace Crandall’s laptop- /Users/graciecrandall/Documents/
What is OrthoFinder?
New paper published: OrthoFinder: improved phylogenetic orthology inference with enhanced accuracy and scalability.
Emms, D.M., Liu, Y., Belcher, L. et al. OrthoFinder: improved phylogenetic orthology inference with enhanced accuracy and scalability. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03126-6
OrthoFinder GitHub Repo (see readme.md file for the repository for more info here)
I think I need to re-run OrthoFinder, but it’s still a BETA release, so I’ll stick with the current latest (Version 2.5.5) that is the latest downloadable non-BETA version of OrthoFinder.
How it works:
You take protein lists from the species of interest (in my case: Pycnopodia helianthoides, Pisaster ochraceus and Dermasterias imbricata), and run them using OrthoFinder (R Code detailing how I ran it on my laptop using command line here).
Tutorial here: tutorials.html
What are Orthogroups and Orthologs?
Source: https://plantsandpython.github.io/PlantsAndPython/P_9_COMPARATIVE_GENOMICS/1_Practice.html
What will OrthoFinder tell us in relation to this Multi-Species Experiment?
The Multi-Species experiment involved three sea star species (Pycnopodia helianthoides, Pisaster ochraceus and Dermasterias imbricata) co-housed in tanks - 1 individual per species (each tank had three sea stars, one of each species).
OrthoFinder will tell us
Running OrthoFinder:
2026-06-24
R code with notes on how I run it: code/27-orthofinder.Rmd)
Note - I’m re-running OrthoFinder because I know it’s quick and it helps me to remember what all the files and results are when I go through the process again. The original OrthoFinder work that I did in December is this R code (which is essentially what I am re-doing now): code/37-orthofinder.rmd
Downloand and install OrthoFinder
Download and install instructions
In terminal on my laptop, I ran wget https://github.com/davidemms/OrthoFinder/releases/latest/download/OrthoFinder.tar.gz to get the latest version of OrthoFinder downloaded. Previous version I used was OrthoFinder version 2.5.5 Copyright (C) 2014 David Emms.
Extract the package:
tar xzvf OrthoFinder.tar.gz
cd OrthoFinder/
It prompts you to then change directories into OrthoFinder:
cd OrthoFinder/. run pwd: /Users/graciecrandall/OrthoFinder.
Run ` orthofinder -h` to see the help file.
OrthoFinder version 2.5.5 Copyright (C) 2014 David Emms. The newest release (v3.0.1b1) is still BETA.
DIAMOND
I have this downloaded from a while ago. Version: diamond version 2.1.16.
Running OrthoFinder
Files Needed
Key files multispecies: Key-Files
Files needed:
Pycno protein fasta:
https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.aa
Head:
>g3450.t1
MNYRTDDEIYEDEVDEETKLHHRIDERGVNGTDSRDEPTRVPPAKQLKIQAPVLASRLQL
EAVRRPHPPPLHPPQIHLYLEPHLEALQQRCDSYKGIVG*
>g3451.t1
MKINKEKSKVMHLSRCKTPTDEHLPHYTLQETQSKLFCRLKGLRDYNFRSFILAFLQYNP
SDLISVSDAFLHPFLARTSCSKVPCPLLPKMKIRRRQLGITHHRKTT*
Pisaster protein fasta:
https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/data/Pisaster_ochraceus_v2_gene_models/augustus.hints.aa.gz
Head:
>g19253.t1
XPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNP
NPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNP
NPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNP
NPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNPNP
Dermasterias protein fasta:
https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/data/Newest_Derm_files/Dermasterias_imbricata/augustus.hints.aa.gz
Head:
>g7941.t1
MQYLPFRGVLCVFWGFLLIETFRPANAGDFALVADLRNGTIYAGSIGQSLADIAPLPLTG
VIRPLAVEYDPVEKMVYWTDVNSLPSPKITRAHVNGSGQMTLVDQLHLPDGLALDVESRL
VYWTDGVLGYIGRTRMNGTGARETIVVGLDQPRAIITDSGFIYWTDWGNSSRIERAGLDG
SNRTTLITGNLVWPNGLFKDGNNLYWCDAKLDKIERSDLLGNNREIVIDLTSYPQIHPFD
LAVYDEYIYWTDWGYTTLIRVHTSGRGEQNYGPSVFQQSGGLHIQKEPNYCNSSPCQNGA
ICIDVINGFSCICPSEHQGITRSENPSGSGPCVNGGTCTTIPGGFTCQCPAGYDGPTCTI
EFIVKKRDREIAGKCKRLH*
Running OrthoFinder:
Then in terminal,
(base) Graces-MacBook-Air-81:Documents graciecrandall$ mkdir orthofinder2026
(base) Graces-MacBook-Air-81:Documents graciecrandall$ cd orthofinder2026/
(base) Graces-MacBook-Air-81:orthofinder2026 graciecrandall$ ls
here - copy and paste the three aa files into the folder (make sure named for each species)
(base) Graces-MacBook-Air-81:orthofinder2026 graciecrandall$ ls derm-protein.faa pisaster-protein.faa pycno-protein.faa
*Note: I previously renamed the file suffixes from .aa to .faa.
Run:
orthofinder -f . -S diamond -t 16
Started 2026-06-24 14:42:05
Ended:
2026-06-24 15:15:10 : Done orthologues
Results:
/Users/graciecrandall/Documents/orthofinder2026/OrthoFinder/Results_Jun24/
CITATION: When publishing work that uses OrthoFinder please cite: Emms D.M. & Kelly S. (2019), Genome Biology 20:238
If you use the species tree in your work then please also cite: Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278 Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914
Results:
Online resource for exploring results: Tutorial: exploring-orthofinders-results.html
Results live on Grace Crandall’s laptop in: /Users/graciecrandall/Documents/orthofinder2026/OrthoFinder/Results_Jun24/
I make some results tables in this R code: project-pycno-multispecies-2023/code/27-orthofinder.Rmd
Results data tables:
All are in project-pycno-multispecies-2023/output/27-orthofinder
- Orthogroups.tsv: table copied straight from OrthoFinder output on my laptop. A table of all the orthogroups across all three species’ protein lists.
- shared_orthologs_all_species-long.tsv: long format (columns: Orthogroup; Species; Protein) table of the orthogroups that are shared across ALL THREE SPECIES
- shared_orthologs_all_species-wide.tsv: wide format (columns: Orthogroup; Dermasterias; Pisaster; Pycnopodia) table of orthogroups that are shared across ALL THREE SPECIES
- GOSlim table of Orthogroups shared across all three species:
- P. helianthoides: Annotated Orthogroup list with GO and blast: pycno_orthogroups_GO_annotations.tab
- P. ochraceus: Annotated Orthogroup list with GO and blast: pisaster_orthogroups_GO_annotations.tab
- D. imbricata: Annotated Orthogroup list with GO and blast: derm_orthogroups_GO_annotations.tab
- P. helianthoides: GOSlim for all P. helianthoides orthogroups: GOSlim_all_pycno_orthogroups.tab
- P. ochraceus: GOSlim for all P. ochraceus orthogroups: GOSlim_all_pisaster_orthogroups.tab
- D. imbricata: GOSlim for all D. imbricata orthogroups: GOSlim_all_derm_orthogroups.tab
Summary table from OrthoFinder:
Input files to OrthoFinder are the protein lists from each species.
Also note, that there are 12,547 orthogroups shared across all three species.
| Dermasterias imbricata | Pisaster ochraceus | Pycnopodia helianthoides | |
|---|---|---|---|
| Number of genes | 25030 | 35696 | 26581 |
| Number of genes in orthogroups | 21692 | 30464 | 22253 |
| Number of unassigned genes | 3338 | 5232 | 4328 |
| Percentage of genes in orthogroups | 86.7 | 85.3 | 83.7 |
| Percentage of unassigned genes | 13.3 | 14.7 | 16.3 |
| Number of orthogroups containing species | 14691 | 16503 | 15488 |
| Percentage of orthogroups containing species | 79.9 | 89.8 | 84.3 |
| Number of species-specific orthogroups | 896 | 1183 | 552 |
| Number of genes in species-specific orthogroups | 3176 | 6375 | 1669 |
| Percentage of genes in species-specific orthogroups | 12.7 | 17.9 | 6.3 |
Results figures:
Venn Diagram of ALL Orthogroups shared across all the species:

P. helianthoides orthogroups in proportion to all shared orthogroups (12547) across all three species

Top 10 GOSlim terms for the orthogroups unique to P. helianthoides. Y-axis is proportion of those orthogroups related to the 12,547 shared orthogroups across all three species. X-axis is the 10 top GOSlim terms - top meaning most counts. Color-coded by Term.
P. ochraceus orthogroups in proportion to all shared orthogroups (12547) across all three species

Top 10 GOSlim terms for the orthogroups unique to P. ochraceus. Y-axis is proportion of those orthogroups related to the 12,547 shared orthogroups across all three species. X-axis is the 10 top GOSlim terms - top meaning most counts. Color-coded by Term.
D. imbricata orthogroups in proportion to all shared orthogroups (12547) across all three species

Top 10 GOSlim terms for the orthogroups unique to D. imbricata. Y-axis is proportion of those orthogroups related to the 12,547 shared orthogroups across all three species. X-axis is the 10 top GOSlim terms - top meaning most counts. Color-coded by Term.
Proportion plot notes:
All three figures are very similar… each species has about 70 GOSlim terms, but that’s a lot to include in a figure.
Not sure of how else I can visualize these differences.
I’m also not accounting for any DEGs in these figures –> these are just orthogroups based on each species’ genome protein lists.