Improvements for featurizer #24

@PicoCentauri

I just talked with @sofiia-chorna about two enhancements of the PET-MAD featurizer:

1. Default None for environments

When running

import ase.io
import chemiscope
from pet_mad.explore import PETMADFeaturizer

featurizer = PETMADFeaturizer(version="latest")

frames = ase.io.read("dataset.xyz", ":")

Following the chemiscope docs, I would naively pass only the list of Atoms:

features = featurizer(frames)

However, this raises an error because the function requires a second argument, environments:

https://github.com/lab-cosmo/pet-mad/blob/821776092c5b75ecd7a71f5084f13ddf5373cd0e/src/pet_mad/explore/_featurizer.py#L134

I would default environments to None to improve the usability.
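To illustrate the proposed call pattern, here is a minimal sketch using a hypothetical stand-in class. `ToyFeaturizer`, its mean-position "features", and the `(structure_index, atom_index)` layout of `environments` are assumptions made for this example only; this is not the actual `PETMADFeaturizer` code.

```python
import numpy as np


class ToyFeaturizer:
    """Hypothetical stand-in for PETMADFeaturizer (illustration only)."""

    def __call__(self, frames, environments=None):
        # frames: list of (n_atoms, 3) position arrays, standing in for ase.Atoms.
        if environments is None:
            # Default: one feature row per structure (here, the mean position).
            return np.array([np.mean(frame, axis=0) for frame in frames])
        # Assumed layout: each environment is a (structure_index, atom_index) pair.
        # One feature row per selected atom (here, simply its position).
        return np.array([frames[s][a] for s, a in environments])


frames = [np.zeros((2, 3)), np.ones((3, 3))]
featurizer = ToyFeaturizer()

per_structure = featurizer(frames)  # works without a second argument
per_atom = featurizer(frames, environments=[(1, 0), (1, 2)])
```

With `environments=None` as the default, the one-argument call from the chemiscope docs works, while callers that want per-atom features can still pass the second argument explicitly.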

2. Convenience function for FULL feature vector

The feature vector currently returned by the featurizer is a three-dimensional sketchmap projection. For sample selection with FPS/CUR and friends, it would be very useful to have an option to get the full feature vector. We could add a parameter such as reduce that sets what is done with the features before they are returned. I could, for example, imagine these:

featurizer(frames, reduce="sketchmap")  # current behaviour
featurizer(frames, reduce=None)  # full features
featurizer(frames, reduce="pca")  # 2-feature vector
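A rough sketch of how such a `reduce` switch could dispatch internally is shown below. The helper name `reduce_features` is hypothetical, the PCA branch is a plain NumPy SVD written for this example, and the sketchmap branch is left as a placeholder because the existing projection lives inside the featurizer.

```python
import numpy as np


def reduce_features(features, reduce):
    """Hypothetical helper: post-process full features according to `reduce`."""
    features = np.asarray(features, dtype=float)
    if reduce is None:
        # Full feature vectors, e.g. for FPS/CUR sample selection.
        return features
    if reduce == "pca":
        # Project onto the first two principal components via SVD.
        centered = features - features.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt[:2].T
    if reduce == "sketchmap":
        # Placeholder: the current sketchmap projection would be called here.
        raise NotImplementedError("sketchmap projection is not part of this sketch")
    raise ValueError(f"unknown reduce option: {reduce!r}")


rng = np.random.default_rng(0)
full = rng.normal(size=(10, 5))  # 10 structures, 5 made-up features each

unreduced = reduce_features(full, reduce=None)
projected = reduce_features(full, reduce="pca")
```

Raising on unknown options keeps typos such as `reduce="PCA"` from silently falling back to another behaviour.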
