Dataset request: aC_JCP2023 #10

Open
jvita opened this issue Aug 26, 2023 · 2 comments
Labels
dataset request (Suggest a dataset to include in Colabfit Exchange), duplicate (This issue or pull request already exists)

Comments

@jvita
Member

jvita commented Aug 26, 2023

Contribute content

Contributor/requester

Contact information about the person contributing/requesting the data. Used for communication purposes.

Name: Josh Vita
Email: [email protected]

Dataset

Any information necessary to help ColabFit find and access the data, and to correctly cite relevant material. The "name" and "description" will be used when publishing to the ColabFit Exchange, and should be human-readable. The author list should include full first names, unless the author is normally attributed by initials. Links should include relevant publications and the online location of the dataset, if available.

Name: aC_JCP2023

Authors: Emi Minamitani, Ippei Obayashi, Koji Shimizu, Satoshi Watanabe

Links:

Description:
The amorphous carbon dataset was generated using ab initio calculations with the VASP software. We used the LDA exchange-correlation functional and the PAW potential for carbon. Melt-quench simulations were performed to create amorphous and liquid-state structures. A simple cubic lattice of 216 carbon atoms was chosen as the initial state. Simulations were conducted at densities of 1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 to produce a variety of structures. The NVT ensemble was employed for all melt-quench simulations, and the density was adjusted by modifying the size of the simulation cell. A time step of 1 fs was used for the simulations. For all densities, only the Γ point was sampled in k-space. To increase structural diversity, six independent simulations were performed.
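
The relation between target density and cell size can be sketched as follows. This is only an illustration and not the authors' script: it assumes a cubic simulation cell and uses the standard atomic weight of carbon, and the function and constant names are hypothetical.

```python
# Physical constants
CARBON_MASS_U = 12.011            # standard atomic weight of carbon, in u
U_IN_GRAMS = 1.66053906660e-24    # one atomic mass unit in grams
CM3_IN_A3 = 1.0e24                # cubic angstroms per cubic centimetre


def cubic_cell_edge(n_atoms: int, density_g_cm3: float) -> float:
    """Edge length (angstrom) of a cubic cell holding n_atoms carbon atoms.

    Assumption: the cell is cubic; the description only says the cell size
    was modified to set the density.
    """
    mass_g = n_atoms * CARBON_MASS_U * U_IN_GRAMS
    volume_a3 = mass_g / density_g_cm3 * CM3_IN_A3
    return volume_a3 ** (1.0 / 3.0)


# Densities listed in the description, in g/cm3
DENSITIES = [1.5, 1.7, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.5]

for rho in DENSITIES:
    print(f"{rho:.1f} g/cm3 -> {cubic_cell_edge(216, rho):.2f} A")
```

Lower densities give larger cells, so the same 216 atoms cover the whole density range without changing the atom count.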

In the melt-quench simulations, the temperature was raised from 300 K to 9000 K over 2 ps to melt carbon. Equilibrium molecular dynamics (MD) was conducted at 9000 K for 3 ps to create a liquid state, followed by a decrease in temperature to 5000 K over 2 ps, with the system equilibrating at that temperature for 2 ps. Finally, the temperature was lowered from 5000 K to 300 K over 2 ps to generate an amorphous structure.
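
The schedule above can be written down as a piecewise temperature profile. The sketch below assumes linear ramps within each heating or cooling stage, which the description does not state explicitly; `SEGMENTS` and `target_temperature` are hypothetical names.

```python
# (duration in ps, start temperature in K, end temperature in K), in order
SEGMENTS = [
    (2.0, 300.0, 9000.0),   # heat to melt
    (3.0, 9000.0, 9000.0),  # equilibrate the liquid at 9000 K
    (2.0, 9000.0, 5000.0),  # first quench
    (2.0, 5000.0, 5000.0),  # equilibrate at 5000 K
    (2.0, 5000.0, 300.0),   # final quench to the amorphous state
]


def target_temperature(t_ps: float) -> float:
    """Target temperature (K) at time t_ps, assuming linear ramps."""
    start = 0.0
    for duration, t_begin, t_end in SEGMENTS:
        if t_ps <= start + duration:
            frac = (t_ps - start) / duration
            return t_begin + frac * (t_end - t_begin)
        start += duration
    return SEGMENTS[-1][2]  # past the end: hold the final temperature


total_ps = sum(duration for duration, _, _ in SEGMENTS)
n_steps = round(total_ps * 1000)  # number of MD steps at a 1 fs time step
print(total_ps, n_steps)
```

Under these assumptions one melt-quench run spans 11 ps, i.e. 11,000 MD steps at the stated 1 fs time step.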

During the melt-quench simulation, 30 snapshots were taken from the equilibrium MD trajectory at 9000 K, 100 from the cooling process between 9000 and 5000 K, 25 from the equilibrium MD trajectory at 5000 K, and 100 from the cooling process between 5000 and 300 K. This yielded a total of 16,830 data points.

Data for diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3 were also prepared. Further data on the diamond structure were obtained from 80 snapshots taken from the 2 ps equilibrium MD trajectory at 300 K, resulting in 560 data points.

To validate predictions for larger structures, we generated data for 512-atom systems using the same procedure as for the 216-atom systems. A single simulation was conducted for each density. The number of data points was 2,805 for amorphous and liquid states.
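
As a consistency check, the quoted totals can be reproduced from the snapshot counts. The sketch assumes the snapshot counts are per simulation and that the six independent simulations were run at each of the 11 densities; this reading is an inference, although it does reproduce the stated numbers.

```python
# Snapshots per melt-quench run: 9000 K hold, first quench, 5000 K hold, final quench
SNAPSHOTS_PER_RUN = 30 + 100 + 25 + 100

N_DENSITIES = 11          # 1.5 ... 3.5 g/cm3
N_RUNS_216 = 6            # independent simulations (assumed: per density)
N_RUNS_512 = 1            # a single simulation per density

amorphous_216 = SNAPSHOTS_PER_RUN * N_DENSITIES * N_RUNS_216
crystal_216 = 80 * 7      # 80 snapshots at each of 7 diamond densities
amorphous_512 = SNAPSHOTS_PER_RUN * N_DENSITIES * N_RUNS_512

total = amorphous_216 + crystal_216 + amorphous_512
print(amorphous_216, crystal_216, amorphous_512, total)
```

The three subtotals sum to the 20,195 configurations reported for the dataset.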

Calculations

Details regarding how the data was computed in order to improve reproducibility. Provide as much information as possible. Input files are highly encouraged. Additional details might include functional, basis set, energy cutoff, k-point grid, reference energy, etc.

Method: DFT
Software: VASP
Additional details: LDA XC-functional, PAW potential
Files: None

Included properties

See the current list of ColabFit property definitions. If you believe your data does not match one of the existing definitions, then you must submit a new property definition following the template provided in the examples folder.

| Name | Units | Notes |
| --- | --- | --- |
| potential-energy | eV | Appear to be supercell energies |
| atomic-forces | eV/A | |
| free-energy | eV | Appear to be supercell energies |

Configurations

Basic information explaining the types of configurations in the dataset, and how they are organized.
Elements should be listed by chemical symbol

Elements: C
Number of configurations: 20,195
Storage format: ASE

Naming convention

If your configurations have names, please describe where their names can be found (e.g., as a field in an ASE.Atoms.info dictionary).

Names can be generated by assigning indices to the configurations, prepended with their full path. For example: 216atom_amorphous/batch1/0, 216atom_amorphous/batch1/1, 216atom_amorphous/batch1/....
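
A minimal sketch of this naming scheme, assuming indices start at 0 and follow the order in which configurations are read; `configuration_names` is a hypothetical helper, not part of ColabFit or ASE.

```python
def configuration_names(directory: str, n_configs: int) -> list[str]:
    """Build names of the form '<directory>/<index>', with indices from 0."""
    return [f"{directory}/{i}" for i in range(n_configs)]


names = configuration_names("216atom_amorphous/batch1", 3)
print(names)
```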

Configuration sets

Configuration sets are used to define a conceptual grouping over a collection of atomic configurations. Configuration sets are constructed via regex filtering on specified keys.

| Key | Regex | Description |
| --- | --- | --- |
| name | 216atom_crystal/* | Diamond structures containing 216 atoms at densities of 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, and 3.5 g/cm3. |
| name | 216atom_amorphous/* | Trajectories from melt-quench simulations for configurations with 216 atoms |
| name | 512atom_amorphous/* | Trajectories from melt-quench simulations for configurations with 512 atoms |
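
How these patterns would select configurations can be sketched with Python's `re` module. This assumes the patterns are applied as regular expressions anchored at the start of the name, which may differ from how ColabFit actually evaluates them. Read as a regex, the trailing `/*` means "zero or more slashes", so each pattern effectively acts as a prefix match.

```python
import re

# Patterns copied from the table above
CONFIGURATION_SET_PATTERNS = [
    "216atom_crystal/*",
    "216atom_amorphous/*",
    "512atom_amorphous/*",
]


def matching_sets(name: str) -> list[str]:
    """Return every pattern that matches the start of a configuration name."""
    return [p for p in CONFIGURATION_SET_PATTERNS if re.match(p, name)]


print(matching_sets("216atom_amorphous/batch1/0"))
print(matching_sets("512atom_amorphous/batch1/17"))
```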

Configuration labels

Configuration labels can be attached to your data to improve interpretability. This is done via regex matching on specified keys.

| Key | Regex | Label |
| --- | --- | --- |
| process | crystal | diamond, crystal |
| process | quench1 | amorphous |
| process | quench2 | amorphous |
| process | buffer_low | amorphous |
| process | buffer_high | amorphous |
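
The label rules can be applied in the same spirit. The sketch assumes each configuration carries a string-valued `process` field (for example in an ASE `Atoms.info` dictionary) and that rules are matched with `re.search`; both are assumptions, and `labels_for` is a hypothetical helper.

```python
import re

# (regex on the "process" key, labels to attach), copied from the table above
LABEL_RULES = [
    ("crystal", ["diamond", "crystal"]),
    ("quench1", ["amorphous"]),
    ("quench2", ["amorphous"]),
    ("buffer_low", ["amorphous"]),
    ("buffer_high", ["amorphous"]),
]


def labels_for(process: str) -> set[str]:
    """Collect the labels of every rule whose regex matches the process value."""
    labels: set[str] = set()
    for pattern, rule_labels in LABEL_RULES:
        if re.search(pattern, process):
            labels.update(rule_labels)
    return labels


print(labels_for("quench1"))
print(labels_for("crystal"))
```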

Distribution License

The license under which the content will be distributed (e.g. Creative Commons Zero)

Creative Commons Attribution 4.0 International

@jvita
Member Author

jvita commented Aug 26, 2023

Some notes that I took while I was going through this process:

  • We should specify a format for titles of issues/PRs
  • We should have a recommended format for DS names. e.g., _
  • It could be helpful if there was the option for a "short" DS description. We'd like the descriptions to be as rich as possible, but a shorter version would help with browsing.
  • Add a section for storage format (XYZ, HDF5, JSON, ...)
  • Change CS/CO regex matching to allow matching over a key other than the CO name
  • It was easy to mess up the formatting when inputting information. e.g., accidentally adding my information into the quote blocks
  • It might be helpful to have a way to flag information that the contributor isn't sure about. For example, I'm not 100% sure the units are eV and eV/A.

@jvita added the "dataset request" label on Nov 3, 2023
@gpwolfe added the "duplicate" label on Nov 3, 2023
@gpwolfe
Collaborator

gpwolfe commented Nov 3, 2023

This is a duplicate, but keeping open for comments regarding issues template
