Pan-genome construction and analysis is still a young and rapidly changing field. Well-documented, reproducible pipelines are needed but rarely found.
One of the many considerations when choosing or developing an analysis pipeline is the quality of the assemblies and annotations included in the analysis. Poor-quality assemblies or annotations are likely to skew the results and introduce errors. However, lower-quality datasets may still contain important information, and high-quality datasets may not exist for some members of the pan-genome. Finding ways to include such datasets without lowering the overall quality of the results is therefore likely to be important.
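As one illustration of screening assemblies before they enter an analysis, the sketch below computes N50, a widely used contiguity metric, and separates assemblies that fall below a cutoff so they can be handled separately rather than silently mixed in. This is only a minimal sketch: the function names, the input layout (a dictionary of contig lengths per assembly), and the cutoff value are assumptions made for illustration and are not taken from any particular pipeline. N50 measures contiguity only and says nothing about completeness or annotation quality.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, going from longest to
    shortest, the running total first reaches half the assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("assembly has no contigs")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def flag_low_contiguity(assemblies, min_n50):
    """Split assembly names into (kept, flagged) lists.

    `assemblies` maps a name to a list of contig lengths (assumed layout).
    `min_n50` is a hypothetical cutoff; a sensible value depends on the
    organism and the sequencing technology.
    """
    kept, flagged = [], []
    for name, lengths in assemblies.items():
        if n50(lengths) >= min_n50:
            kept.append(name)
        else:
            flagged.append(name)
    return kept, flagged


if __name__ == "__main__":
    # Toy contig lengths, not real data.
    example = {
        "accession_A": [100, 50],
        "accession_B": [10, 10, 10],
    }
    kept, flagged = flag_low_contiguity(example, min_n50=50)
    print("kept:", kept)
    print("flagged:", flagged)
```

Flagged assemblies do not have to be discarded. They could, for example, be analyzed in a second pass and their results reported separately from those of the higher-quality set.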
A good review of the state of pan-genome research as of 2020 can be found in Vernikos GS. A Review of Pangenome Tools and Recent Studies. 2020 May 1. In: Tettelin H, Medini D, editors. The Pangenome: Diversity, Dynamics and Evolution of Genomes. Cham (CH): Springer; 2020. Available here: https://www.ncbi.nlm.nih.gov/books/NBK558826/
A slide deck from 2022 summarizing pan-genome analysis and visualization software can be seen here.
[Add a note about use of assembly metrics?]