Removing plantcv.utils module #1833

@joshqsumner

Description

Is your feature request related to a problem? Please describe.
We talked in the dev meeting about removing the utils subpackage to simplify some of plantcv's dependencies and remove code that isn't really used.

Describe the solution you'd like

There are 3 modules in plantcv.utils:

cli:

  • This just wraps the other functions so they can be used on the command line.
  • Maybe this could just go away, but I think at least sample_images makes sense to run from the command line, since that is a natural place to move files around. I don't think json2csv or tabulate_bayes_classes need to be called from the CLI. If tabulate_bayes_classes does need a CLI, that is still possible in the learn module, which already has a cli module. The tutorial calls it from Python without that seeming clunky, but we could preserve the CLI option pretty easily.
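
To make the "keep only sample_images on the command line" idea concrete, here is a minimal sketch of what a standalone entry point could look like. The program name `plantcv-sample-images`, the `build_parser` function, and the flag names are all assumptions for illustration, not the existing plantcv-utils interface.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical standalone sample_images command."""
    parser = argparse.ArgumentParser(
        prog="plantcv-sample-images",  # hypothetical name, not an existing entry point
        description="Copy a random sample of images to a new directory.",
    )
    # Flag names below are illustrative assumptions
    parser.add_argument("-s", "--source", required=True, help="Input image directory.")
    parser.add_argument("-o", "--outdir", required=True, help="Output directory.")
    parser.add_argument("-n", "--number", type=int, default=100,
                        help="Number of images to sample.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would sample {args.number} images from {args.source} into {args.outdir}")
```

json2csv and tabulate_bayes_classes would then only be reachable from Python, unless the learn cli picks one of them up.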

sample_images:

  • This I would want to rework a lot. It should keep a command line utility, but I would split it into two kinds of helpers: one for a file path input and one or two for a plantcv.parallel.workflowconfig/jupyterconfig class object.
    • Path input would do what it does now.
    • Config input would find the images that would be selected for the workflow.
    • Both, I think, would rebuild the directory structure in your dest_path (could be a bool argument?).
    • I might be misreading these, but it looks like the existing helpers are almost entirely pulled from plantcv.parallel.parsers. If so, I could probably apply nearly the same logic as in inspect_dataset and just import those functions, since I'd be moving this to parallel. It is hard to imagine a non-parallel application, and if you have one, loading parallel and calling this function isn't overly burdensome.
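
As a rough sketch of the path-input helper with the bool argument for rebuilding the directory structure, something like the following could work. The function name `sample_images_from_path`, its parameters, and the list of image extensions are assumptions for illustration. It uses only the standard library and does not reuse plantcv.parallel.parsers, which the real version probably would.

```python
import random
import shutil
from pathlib import Path


def sample_images_from_path(source_path, dest_path, num=10, keep_structure=True, seed=None):
    """Copy a random sample of images from source_path into dest_path (hypothetical helper)."""
    source = Path(source_path)
    dest = Path(dest_path)
    extensions = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}  # assumed set of image types
    # Collect every image under source, sorted so a fixed seed gives a repeatable sample
    images = sorted(p for p in source.rglob("*")
                    if p.is_file() and p.suffix.lower() in extensions)
    rng = random.Random(seed)
    chosen = rng.sample(images, min(num, len(images)))
    copied = []
    for img in chosen:
        # Mirror the relative path under dest, or flatten to the bare file name.
        # Note: flattening can overwrite files that share a name across subdirectories.
        target = dest / img.relative_to(source) if keep_structure else dest / img.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, target)
        copied.append(target)
    return copied
```

The config-input helper could share the copy loop and differ only in how the list of candidate images is built.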

converters:

  • I think tabulate_bayes_classes can just be dropped into plantcv.learn as it is.
  • json2csv could move to parallel or to main plantcv.
    • Benefit of putting it in parallel is that it's simpler for the current use case.
    • Benefit of putting it in main plantcv is that we could still import it into parallel, and we could wrap it in another function that collects non-parallel outputs and turns them into a csv (like parallel.process_results, but for main plantcv when you've run a jupyter notebook a few times).
      • I think moving this to parallel first makes more sense, so that the PR for this wouldn't touch main plantcv. Later on we can have a separate PR to add plantcv.process_results and an accompanying plantcv.json2csv helper function, which could then be imported into parallel if that makes sense.
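
For the later "collect non-parallel outputs into a csv" wrapper, a minimal sketch of the idea is below. The name `collect_results` is hypothetical, and the sketch assumes each JSON file holds a flat mapping of trait name to scalar value. Real plantcv output files are nested, so an actual implementation would need to flatten them first, presumably through json2csv.

```python
import csv
import json
from pathlib import Path


def collect_results(json_paths, csv_path):
    """Merge several flat JSON result files into one CSV (hypothetical helper)."""
    rows = []
    for path in json_paths:
        with open(path) as fh:
            record = json.load(fh)  # assumed to be a flat {trait: value} mapping
        row = {"source_file": Path(path).name}
        row.update(record)
        rows.append(row)
    # Use the union of all keys as columns, in first-seen order
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="") as fh:
        # restval fills in traits that a given run did not record
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="NA")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Each notebook run becomes one row, and traits missing from a run are written as NA.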

Describe alternatives you've considered
We could leave it as is. utils isn't a huge burden, it is just a little tangential, and v5 is a good time to remove it.

Labels

new feature (New feature ideas and solutions)
