submission of PP: allow list upload of existing lines #756
Comments
@Nuanda @wjurkowski Do you remember if there was any reason for not allowing selection of existing plant lines in this way (other than time constraints)? |
I strongly advise reading #494 from top to bottom, just to refresh memory. There are some posts discussing this. The main issue arose from new/existing PL/PA treatment, so the decision was to let the user add existing PLs inside the manual form, and submit only new PLs here. I guess some kind of simple heuristic will be needed if BIP is to accept existing PLs in the CSV. The most obvious one is to ignore all other columns related to a given PL if this PL is found (by its name) in the database. This mirrors the behavior of the manual method, but may be counter-intuitive ("I've just uploaded this PL with a different attribute and now BIP lists it with another value for that attribute?"). |
I think it makes sense to ignore other columns and print a warning about that row. Otherwise, we could reject rows containing any data inconsistent with previously existing content, but that probably would not improve efficiency without substantial changes to the UI (allowing users to resolve inconsistencies directly). |
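For illustration only, here is a minimal sketch of the "ignore other columns and warn" heuristic discussed above. It is written in Python and is not BIP code; the function `process_rows`, the column name `plant_line_name`, and the `existing_names` set (standing in for a database lookup by name) are all assumptions.

```python
import csv
import io


def process_rows(csv_text, existing_names, name_column="plant_line_name"):
    """Split uploaded rows into new PLs and references to existing PLs.

    All names here are hypothetical; `existing_names` stands in for a
    lookup of plant line names already stored in the database.
    """
    new_lines, existing_refs, warnings = [], [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 of the file (line 1 is the header).
    for row_number, row in enumerate(reader, start=2):
        name = (row.get(name_column) or "").strip()
        if name in existing_names:
            existing_refs.append(name)
            # Any other non-empty column is ignored, but reported to the user.
            ignored = [
                col for col, val in row.items()
                if col is not None and col != name_column and val and str(val).strip()
            ]
            if ignored:
                warnings.append(
                    f"Row {row_number}: '{name}' already exists; "
                    f"ignored columns: {', '.join(ignored)}"
                )
        else:
            new_lines.append(row)
    return new_lines, existing_refs, warnings
```

A row that names an existing PL and leaves every other column empty produces no warning, which matches selecting the PL manually.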
To recap previous discussions: the complexity arises from the fact that a
given plant line 1) could be present in different population types and 2)
could refer to different seed material used in different trials years
apart.
The meta-data we expect to be provided could be divided into two types
depending on the type of relationship (1 to 1 or 1 to many).
A)
Taxonomy term/Species
Crop type
Plant variety
Female parental line
Male parental line
Genetic status sequence
In this case we expect unique relationships, i.e. there should be one
specific value associated with the PL.
Let's assume a new set of values is submitted. If the relevant fields are
so far empty, populate them. For instance, in a population submitted
without information about parental lines, some of the lines could actually
have known parents, so these parents could be added in a new submission.
If not empty, either ignore the new content or compare the content and stop
if a conflict exists (ask the user to curate/double-check the content).
Then user - BIP discussions could help correct the existing content when
substantial evidence is provided or, in most cases I presume, changes in
the new data would resolve the issue and allow the user to resubmit.
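The rule for type A fields could be sketched roughly as follows. This is a Python illustration under assumptions, not BIP's implementation: `merge_one_to_one`, `ConflictError`, and the dictionary keys are hypothetical, and the sketch takes the stricter "compare and stop" option rather than silently ignoring new content.

```python
class ConflictError(ValueError):
    """Raised when submitted values contradict values already stored."""


def merge_one_to_one(existing, submitted, fields):
    """Fill empty one-to-one fields; stop if a filled field would change.

    `existing` and `submitted` are plain dicts keyed by (hypothetical)
    field names; `fields` lists the one-to-one fields to consider.
    """
    merged = dict(existing)
    conflicts = {}
    for field in fields:
        old_value = (existing.get(field) or "").strip()
        new_value = (submitted.get(field) or "").strip()
        if not new_value:
            continue  # nothing submitted for this field
        if not old_value:
            merged[field] = new_value  # field was empty so far: populate it
        elif old_value != new_value:
            conflicts[field] = (old_value, new_value)
    if conflicts:
        # Stop and ask the user to curate/double-check the content.
        raise ConflictError(conflicts)
    return merged
```

Raising with the full set of conflicting fields, instead of stopping at the first one, lets the user fix everything in one resubmission.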
B)
Accession and related meta-data will by 'definition' vary between projects,
simply because stocked seeds will have different project-specific
identifiers assigned. In fact they might undergo different real changes
depending on the number of production cycles, conditions, natural
irradiation, interactions with the microbiome etc., so it actually makes
sense to keep a one PL - many PA relationship.
PL - PA consistency is typically only important within a specific project.
A PL - PA conflict with previous projects (e.g. old PA accidentally reused)
will not have any impact, as data for meta-analysis could be connected by
PL only. Of course the situation would change if the PA had a clear
relationship with seed identifiers (coming from a well-managed seed bank);
we should check for PL - PA correctness in such a case. Still, even in this
case the sequence (e.g. SRA identifier) will be a much more robust way to
track the provenance of genetic material across projects.
In case of submission of a PL accompanied by PA data we could just add it
without checking against existing content, for the above reasons. On the
other hand, to be strict we should check against existing content: 1)
whether the PA exists in a relationship with a different PL; 2) whether the
PA meta-data is different. In both cases, submission should stop until the
conflict is resolved. Ad.1: Most likely the PA reuse is incorrect due to
human error; Ad.2: If someone is using an existing PA they should keep the
original ownership. If they claim new seed ownership, it indicates that a
new PA should be generated.
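The two strict checks could look something like the sketch below. It is Python for illustration only; `check_accession` and the shape of `existing_accessions` (a dict from PA identifier to its PL name and meta-data) are assumptions, not BIP's data model.

```python
def check_accession(pl_name, pa_id, pa_metadata, existing_accessions):
    """Return a list of problems that should stop the submission.

    `existing_accessions` is a hypothetical mapping:
        pa_id -> {"plant_line": <PL name>, "metadata": {...}}
    """
    known = existing_accessions.get(pa_id)
    if known is None:
        return []  # a new PA: nothing to check against
    problems = []
    # Check 1: the PA already belongs to a different PL (likely human error).
    if known["plant_line"] != pl_name:
        problems.append(
            f"PA '{pa_id}' is already linked to PL '{known['plant_line']}'"
        )
    # Check 2: the PA is reused but its meta-data (e.g. ownership) differs,
    # which suggests that a new PA should be created instead.
    if known["metadata"] != pa_metadata:
        problems.append(f"PA '{pa_id}' meta-data differs from the stored record")
    return problems
```

An empty list means the row can proceed; anything else would be shown to the user before resubmission.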
Finally, if someone is not concerned with existing content and does not
wish to change anything but just wants to use existing PLs in bulk, they
would leave all additional columns empty. This would be equivalent to
selecting existing PLs manually but would be more time-efficient.
I hope this makes sense and I covered all aspects.
On 11 December 2017 at 15:42, Tomasz Szymczyszyn wrote:
I think it makes sense to ignore other columns and print a warning about
that row. Otherwise, we could reject rows containing any data inconsistent
with previously existing content but it probably would not improve
efficiency without substantial changes to the UI (allowing to resolve
inconsistencies directly).
|
Partially solved by #781. Existing plant lines can be added by file upload. However, they cannot yet be amended with new data. The uploaded data must match existing data. |
When a new PP overlaps heavily with an existing PP, manual selection is overly time-consuming and repetitive.
Currently, an attempt to upload a mixed list (new and existing lines) produces an error:
"Ignored row for BnASSYST-229 since a plant line with that name is already present in BIP.Please use the 'Plant line list' field to add this existing plant line to the submitted population."
Optionally, file-based submission could allow updating a PL record with accessions from new projects.