submission of PP: allow list upload of existing lines #756

wjurkowski · 2017-11-22T12:58:16Z

When a new PP overlaps heavily with an existing PP, manual selection is overly time-consuming and repetitive.

Currently, an attempt to upload a mixed (new and existing) list produces an error:
"Ignored row for BnASSYST-229 since a plant line with that name is already present in BIP.Please use the 'Plant line list' field to add this existing plant line to the submitted population."

Optionally, file-based submission could allow updating a PL record with the accessions of new projects.

kammerer · 2017-12-11T13:29:45Z

@Nuanda @wjurkowski Do you remember if there was any reason for not allowing selection of existing plant lines in this way (other than time constraints)?

Nuanda · 2017-12-11T13:51:27Z

I strongly advise reading #494 from top to bottom, just to refresh memory. There are some posts discussing this. The main issue arose from new/existing PL/PA treatment, so the decision was to let the user add existing PLs inside the manual form, and submit only new PLs here.

I guess some kind of simple heuristic will be needed if BIP is to accept existing PLs in the CSV. The most obvious one is to ignore all other columns related to a given PL if this PL is found (by its name) in the database. This reflects the behavior of the manual method, but may be counter-intuitive ("I've just uploaded this PL with a different attribute and now BIP lists it with another value for that attribute?").

kammerer · 2017-12-11T15:42:57Z

I think it makes sense to ignore the other columns and print a warning about that row. Alternatively, we could reject rows containing any data inconsistent with the existing content, but that probably would not improve efficiency without substantial changes to the UI (allowing users to resolve inconsistencies directly).
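The ignore-and-warn heuristic discussed above could be sketched roughly as follows. This is only an illustration, not BIP code: the `plant_line_name` column, the `EXISTING_PLANT_LINES` lookup and the `classify_rows` function are all hypothetical names, and the database is replaced by an in-memory dictionary.

```python
import csv
import io

# Hypothetical stand-in for the plant lines already stored in BIP, keyed by name.
EXISTING_PLANT_LINES = {
    "BnASSYST-229": {"plant_variety": "Example variety"},
}


def classify_rows(csv_text, existing=EXISTING_PLANT_LINES):
    """Split uploaded rows into new PLs and references to existing PLs.

    For a PL whose name is already known, every other column is ignored
    and a warning is recorded if any of those columns was filled in.
    """
    new_rows, reused_names, warnings = [], [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        name = (row.get("plant_line_name") or "").strip()
        if not name:
            continue  # skip rows without a plant line name
        if name not in existing:
            new_rows.append(row)
            continue
        reused_names.append(name)
        # Columns other than the name that carry a value in this row.
        ignored = [
            col
            for col in reader.fieldnames
            if col != "plant_line_name" and (row.get(col) or "").strip()
        ]
        if ignored:
            warnings.append(
                f"{name}: plant line already present, ignored columns: "
                + ", ".join(ignored)
            )
    return new_rows, reused_names, warnings
```

With this shape, a row naming an existing PL never blocks the upload; the user only sees a warning listing the columns that were not applied.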

wjurkowski · 2017-12-12T09:54:56Z

To recap previous discussions: the complexity arises from the fact that a given plant line 1) could be present in different population types and 2) could refer to different seed material used in different trials years apart. The meta-data we expect to be provided can be divided into two types depending on the type of relationship (1 to 1 or 1 to many).

A) Taxonomy term/Species, Crop type, Plant variety, Female parental line, Male parental line, Genetic status, Sequence

In this case we expect unique relationships, i.e. there should be one specific value associated with the PL. Let's assume a new set of values is submitted. If the relevant fields are so far empty, populate them. For instance, in a population submitted without information about parental lines, some of the lines could actually have known parents, so these parents could be added in a new submission. If the fields are not empty, either ignore the new content or compare the content and stop if a conflict exists (ask the user to curate/double-check the content). Then, discussions between the user and BIP could help correct the existing content when substantial evidence is provided or, in most cases I presume, changes in the new data would resolve the issue and allow the user to resubmit.

B) Accession and related meta-data will, by 'definition', vary between projects simply because stocked seeds will have different project-specific identifiers assigned. In fact the seeds might undergo different real changes depending on the number of production cycles, conditions, natural irradiation, interactions with the microbiome etc., so it actually makes sense to keep a one PL to many PA relationship. PL - PA consistency is typically only important within a specific project. A PL - PA conflict with previous projects (e.g. an old PA accidentally reused) will not have any impact, as data for meta-analysis can be connected by PL only.

Of course the situation would change if the PA had a clear relationship with seed identifiers (coming from a well-managed seed bank); we should check for PL - PA correctness in such a case. Still, even in this case the sequence (e.g. SRA identifier) will be a much more robust way to track the provenance of genetic material across projects.

When a PL is submitted together with PA data, we could just add it without checking against existing content, for the above reasons. On the other hand, to be strict we should check against existing content: 1) whether the PA exists in a relationship with a different PL; 2) whether the PA meta-data is different. In both cases, submission should stop until the conflict is resolved. Ad 1: most likely the PA reuse is incorrect due to human error. Ad 2: if someone is using an existing PA they should keep the original ownership. If they claim new seed ownership, it indicates that a new PA should be generated.

Finally, if someone is not concerned with the existing content and does not wish to change anything but just wants to use existing PLs in bulk, they would leave all additional columns empty. This would be equivalent to selecting existing PLs manually but would be more time-efficient.

I hope this makes sense and that I have covered all aspects.
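As a rough illustration of the rules proposed in this comment, the sketch below fills empty 1-to-1 fields, stops on conflicting values, and applies the two strict PA checks. The field names in `ONE_TO_ONE_FIELDS`, the `SubmissionConflict` exception and both functions are hypothetical and do not reflect the actual BIP schema or code.

```python
# Hypothetical names for the 1-to-1 (type A) attributes of a plant line.
ONE_TO_ONE_FIELDS = (
    "taxonomy_term",
    "crop_type",
    "plant_variety",
    "female_parent_line",
    "male_parent_line",
    "genetic_status",
    "sequence",
)


class SubmissionConflict(Exception):
    """Raised when submitted data contradicts the stored record."""


def merge_plant_line(existing, submitted):
    """Fill empty type A fields; stop if a non-empty field would change."""
    merged = dict(existing)
    conflicts = []
    for field in ONE_TO_ONE_FIELDS:
        new = (submitted.get(field) or "").strip()
        old = (existing.get(field) or "").strip()
        if not new:
            continue  # empty column: same as selecting the existing PL manually
        if not old:
            merged[field] = new  # field was empty so far: populate it
        elif old != new:
            conflicts.append(field)
    if conflicts:
        raise SubmissionConflict("conflicting fields: " + ", ".join(conflicts))
    return merged


def check_accession(accessions, pl_name, pa_id, pa_meta):
    """Strict PA check against stored accessions.

    `accessions` maps a PA identifier to {"plant_line": ..., "meta": {...}}.
    Returns "new" or "existing"; raises SubmissionConflict otherwise.
    """
    known = accessions.get(pa_id)
    if known is None:
        return "new"
    if known["plant_line"] != pl_name:
        # Check 1: the PA is already related to a different PL.
        raise SubmissionConflict(f"{pa_id} already belongs to {known['plant_line']}")
    if known["meta"] != pa_meta:
        # Check 2: the PA meta-data differs from the stored record.
        raise SubmissionConflict(f"{pa_id} meta-data differs from the stored record")
    return "existing"
```

The less strict alternative mentioned in the comment (adding the PA without checking) would simply skip `check_accession`.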

kammerer · 2018-02-28T16:20:46Z

Partially solved by #781. Existing plant lines can be added by file upload. However, they cannot yet be amended with new data. The uploaded data must match existing data.

wjurkowski added the enhancement label Nov 22, 2017

wjurkowski assigned kammerer Nov 22, 2017

kammerer mentioned this issue Feb 22, 2018

Existing plant lines upload #781

Merged
