class: middle, center, title-slide
count: false
.huge.blue[Matthew Feickert]
.large[(University of Wisconsin-Madison)]
[email protected]
Software Citation and Recognition in HEP Workshop 2022
November 23rd, 2022
.middle-logo[]
.kol-1-2.large[
- In Tuesday's session, Daniel Katz already gave a very nice high-level overview of software citation .bold[principles] and .bold[tools]
- This is an .red[opinionated] summary of the tooling landscape and examples of workflows
- Full disclosure: Opinions formed from pyhf development and from Scikit-HEP community discussions (cf. Eduardo's talk)
- Meant to be recommendations to software developers on making your work as .bold[easy to cite as possible]
- These recommendations can transfer to experiment software as well
]
.kol-1-2[
.center.width-100[] .center[Daniel Katz's talk] ]
.kol-1-2.large[
- The easiest, but least robust, way: if you have a particular citation that you want people to use, put it .bold[everywhere]
   - Version control repository README
   - Online software documentation (landing page, how to cite page)
   - Package distribution websites (e.g. PyPI)
- Keep a single source of truth for citations: the version control repository, from which all other sources derive
- Make your citation preferences clear to the world and to search engines. Do not rely on people emailing to ask (they shouldn't have to).
]
.kol-1-2[
.center.width-100[]
.center[pyhf's "Use and Citations" page in documentation]
]
.kol-3-5.large[
- Adopt the Citation File Format as a common standard and add a `CITATION.cff` to the project repository
   - Human- and machine-readable file format in YAML
   - Has a well-defined, versioned schema
   - Convertible to other citation formats (BibTeX, CodeMeta, EndNote, RIS, schema.org, Zenodo, APA)
   - Supported by GitHub, Zenodo, and Zotero!
- Web tool initializer for easily creating a first `CITATION.cff`
- Tooling for validation
.tiny[
$ python -m pip install cffconvert
$ cffconvert --validate
Citation metadata are valid according to schema version 1.2.0.
] ] .kol-2-5[ .tiny[
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Druskat
    given-names: Stephan
    orcid: https://orcid.org/0000-0003-4925-7248
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2021-08-11
]
.center[Example of a minimal `CITATION.cff`
]
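As a rough illustration of why a structured file is convertible to other formats, the sketch below renders the fields of the minimal example as a BibTeX entry. The function name `cff_to_bibtex`, the citation key scheme, and the choice of BibTeX fields are assumptions made for illustration; in practice `cffconvert` performs the conversion and its output may differ.

```python
def cff_to_bibtex(cff: dict) -> str:
    """Render already-parsed CITATION.cff metadata as a BibTeX @misc entry."""
    authors = " and ".join(
        f"{a['family-names']}, {a['given-names']}" for a in cff["authors"]
    )
    year = str(cff["date-released"])[:4]
    # Hypothetical key scheme: first author's family name plus release year.
    key = f"{cff['authors'][0]['family-names']}{year}".replace(" ", "")
    fields = {
        "author": authors,
        "title": cff["title"],
        "version": cff["version"],
        "doi": cff["doi"],
        "year": year,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# The minimal example, written as the dict a YAML parser would produce.
example = {
    "cff-version": "1.2.0",
    "authors": [{"family-names": "Druskat", "given-names": "Stephan"}],
    "title": "My Research Software",
    "version": "2.0.4",
    "doi": "10.5281/zenodo.1234",
    "date-released": "2021-08-11",
}
print(cff_to_bibtex(example))
```

Because every field has a fixed name in the schema, the same dictionary could feed any of the other export formats without the author maintaining each one by hand.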
.large[
- As plain text, it is very easy to update version information when cutting a release
- Can use a tool to control the version update to make it easier
   - Example: `tbump`
$ tbump <version target>
- Also possible to have automated version bump workflows using continuous integration
- (Jumping ahead a slide) What about the Zenodo DOI?
   - For simplicity, use the project-level DOI and not the version-level DOI
]
.smaller[
cff-version: 1.2.0
message: "Please cite the following works when using this software."
type: software
...
title: "mylibrary: v1.2.3"
version: 1.2.3
doi: 10.5281/zenodo.1123456
repository-code: "https://github.com/myorg/mylibrary/releases/tag/v1.2.3"
url: "https://mylibrary.readthedocs.io/en/v1.2.3/"
]
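To show what a version bump amounts to for a plain-text file, here is a minimal sketch that rewrites the top-level `version` and `date-released` keys. The helper `bump_cff` is hypothetical and is not how `tbump` is implemented; it only illustrates the kind of search and replace such a tool automates.

```python
import re
from datetime import date


def bump_cff(text: str, new_version: str, released: date) -> str:
    """Return CITATION.cff text with top-level version and date-released updated."""
    # Anchoring at line start leaves "cff-version:" and indented keys untouched.
    text, n_version = re.subn(
        r"^version:.*$", f"version: {new_version}", text, flags=re.MULTILINE
    )
    text, n_date = re.subn(
        r"^date-released:.*$",
        f"date-released: {released.isoformat()}",
        text,
        flags=re.MULTILINE,
    )
    if n_version != 1 or n_date != 1:
        raise ValueError("expected exactly one top-level version and date-released")
    return text


old = (
    "cff-version: 1.2.0\n"
    'title: "My Research Software"\n'
    "version: 2.0.4\n"
    "doi: 10.5281/zenodo.1234\n"
    "date-released: 2021-08-11\n"
)
new = bump_cff(old, "2.0.5", date(2022, 11, 23))
print(new)
```

Fields that embed the version elsewhere, such as the `title`, `repository-code`, and `url` values in the example above, would each need their own pattern, which is the bookkeeping a dedicated bump tool takes over.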
- Zenodo is open source (but your files can be closed access)
- Versioned archival of everything: code, documents, data products, data sets
]
.kol-1-2[
.center.width-75[]
]
- Everything on Zenodo has a DOI
- Provides both a .bold[project] DOI (resolves to latest) and .bold[version specific] DOI
- Enable the GitHub integration to automatically preserve work from GitHub (can also upload directly, but you lose out on automation)
- Benefit from having a DOI for .bold[every version] regardless of the state of the software paper landscape
- Once you have a DOI, put it .bold[everywhere] (again)
- Recommend sharing the project DOI and letting users select a specific version if they want it
.center.large[CITATION.cff used by Zenodo importer to fully define Zenodo archive metadata]
.kol-1-2[ .center.width-85[] ] .kol-1-2[ .center.width-110[] ]
.kol-1-2[
.center.width-100[]
]
.kol-1-2[
.center.width-100[]
]
.kol-2-3.large[
- In addition to providing standard formats, it is helpful to give users a language API or CLI to get the citation information for the version of the tool they are running
   - Users don't have to check whether the information they find online matches their version
- Historically, this was done by printing a banner with citation or copyright information when the library was used
   - This should .bold[not] be done now. It creates noise for users, and if multiple tools did this your terminal would fill up with banners.
   - Most libraries that used to do this have now abandoned this approach.
- Opinion: There are tools in the broader scientific ecosystem that provide citation information for their dependencies as well. While very conscientious, I think this is .bold[unnecessary] and can be confusing to users.
]
.kol-1-3[
# CLI API
$ mytool --citation
$ mytool --cite
# Python API
import mytool
mytool.utils.citation()
.center.large[Example APIs] ]
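One way the example APIs could be wired up is sketched below, assuming a hypothetical package `mytool` with a placeholder version and DOI. The names `citation`, `main`, and `_BIBTEX` are illustrative only, and the citation is returned on request rather than printed as a banner on import.

```python
import argparse

__version__ = "1.2.3"  # placeholder version of the hypothetical "mytool"

# Placeholder record; ideally generated from CITATION.cff at release time.
_BIBTEX = """@misc{{mytool,
  title = {{mytool: v{version}}},
  version = {{{version}}},
  doi = {{10.5281/zenodo.1123456}}
}}"""


def citation() -> str:
    """Return the BibTeX citation matching the installed version."""
    return _BIBTEX.format(version=__version__)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="mytool")
    # Both spellings from the example map to the same flag.
    parser.add_argument(
        "--citation", "--cite", action="store_true", help="print citation and exit"
    )
    args = parser.parse_args(argv)
    if args.citation:
        print(citation())
        return 0
    parser.print_help()
    return 0


main(["--cite"])  # prints the BibTeX entry for version 1.2.3
```

Keeping the text in one function means the Python API and the CLI flag cannot drift apart, and the version in the citation always matches the code the user actually has installed.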
.kol-2-3.huge[
- Build community practices on top of .bold[established standards]
- If citation of your software is important to you, .bold[make it easy] for a user to find your citation information
- Modern standards like `CITATION.cff` allow for a .bold[single source of citation information] that can be exported as needed
- Long term archives + FAIR practices
class: end-slide, center
.huge[Backup]
As mentioned, these opinions have been formed from developing pyhf, and the citation count for the JOSS paper has increased each year.
class: end-slide, center
count: false
The end.