A trial analysis API client stub #681

Nuanda · 2016-11-29T11:29:54Z

When finished it fixes #680

@teatree1212 per your request I started that with a client stub. Currently it finds the PT (by name), gets all TDs, PSUs and TSes related to that PT, and outputs a CSV document which pretty much resembles the one available at https://bip.earlham.ac.uk/trial_scorings/5.

Please take it from here and add further data. When in doubt how to query an individual table, see the 'Q' marker in the docs, or the definition of permitted_params inside individual model classes in the sources, to see what field is available for filtering.

teatree1212 · 2016-11-29T15:32:02Z

Thanks! In this case, it would also be good to have the option of a .json file as output, as I will feed some of the objects into a workflow I am developing.

There is the Active Model Serializers gem which, to my understanding, trims down the JSON output. I think this would be good, as some of the objects are not necessary. Have you used it before?

teatree1212 · 2016-11-29T17:14:26Z

When running the script, I get an error:
trial_analysis.rb:114:in `block (3 levels) in <main>': undefined method `[]' for nil:NilClass (NoMethodError)
from trial_analysis.rb:114:in `map'
from trial_analysis.rb:114:in `block (2 levels) in <main>'
from trial_analysis.rb:112:in `each'
from trial_analysis.rb:112:in `block in <main>'
from /Users/hildegaa/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/csv.rb:1157:in `generate'
from trial_analysis.rb:110:in `<main>'
To have some output to look at, I went through the script and displayed the response using puts JSON.pretty_generate(response) or JSON.pretty_generate(trait_scores) after the API queries, to see what had been queried, commenting out everything below with =begin ... =end.
I am starting to understand what things are doing.
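The inspection technique described above can be sketched as follows. The response hash here is a hypothetical stand-in (its keys and values are assumptions, not the actual BIP response), so only the JSON.pretty_generate call itself reflects what the comment describes.

```ruby
require 'json'

# Hypothetical stand-in for a parsed API response; the real structure and
# field names returned by the server may differ.
response = {
  'plant_trials' => [
    { 'id' => 1, 'plant_trial_name' => 'whri_2005_GE2_02' }
  ]
}

# pretty_generate returns an indented, multi-line JSON string, which is much
# easier to read than the single-line output of to_json.
pretty = JSON.pretty_generate(response)
puts pretty

# Parsing the pretty-printed text gives back an equal structure.
puts JSON.parse(pretty) == response
```

Printing intermediate results this way, right after each query, shows which keys are actually present before the CSV step tries to read them.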

I won't have time to do anything else until Monday. But be prepared for questions then (:

Nuanda · 2016-11-30T15:31:01Z

Are you sure you haven't changed that file? Mine has 111 lines and yours, guessing from the error output, has at least 114. Anyway, please try with the trial name 'whri_2005_GE2_02' on the BIP public server - it should work (I just ran it and it worked fine).

Nuanda · 2016-11-30T15:38:17Z

Re. JSON output - first you'd need to decide how to structure the output (by PSUs? by TDs?). Active Model is not available in the vanilla client script unless you load it, and even then it can't be used here, because what you get back are plain hashes, not Active Model objects. I think using the JSON generator methods, as you do, is the way to go.

teatree1212 · 2016-12-05T14:52:20Z

You are right, I must have changed the script before running it.
To be super sure, I went back to master and checked out the remote branch again, to have your version. I get the same error:

N80569:client_example hildegaa$ ruby trial_analysis.rb U.Nottm_2016_RIPRleafminerals_REMLmeans api-key

Finding the Plant Trial

Found, plant_trial_id = 47

Loading all Trait Scores for this Plant Trial.

Progress: ......................................................
10780 Trait Scores loaded

Finding Trait Descriptors

The Trait Descriptors scored in this Plant Trial: ["Leaf Silver concentration", "Leaf Aluminium concentration", "Leaf Arsenic concentration", "Leaf Boron concentration", "Leaf Calcium concentration", "Leaf Cadmium concentration", "Leaf Chromium concentration", "Leaf Caesium concentration", "Leaf Copper concentration", "Leaf Iron concentration", "Leaf Potassium concentration", "Leaf Magnesium concentration", "Leaf Manganese concentration", "Leaf Molybdenum concentration", "Leaf Sodium concentration", "Leaf Nickel concentration", "Leaf Phosphorus concentration", "Leaf Lead concentration", "Leaf Rubidium concentration", "Leaf Sulphur concentration", "Leaf Selenium concentration", "Leaf Strontium concentration", "Leaf Titanium concentration", "Leaf Uranium concentration", "Leaf Vanadium concentration", "Leaf Zinc concentration", "mineral and ion content related trait", "cobalt concentration"]

Iterating through Plant Scoring Units
Generating output CSV to STDOUT
trial_analysis.rb:105:in `block (3 levels) in <main>': undefined method `[]' for nil:NilClass (NoMethodError)
from trial_analysis.rb:105:in `map'
from trial_analysis.rb:105:in `block (2 levels) in <main>'
from trial_analysis.rb:104:in `each'
from trial_analysis.rb:104:in `block in <main>'
from /Users/hildegaa/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/csv.rb:1157:in `generate'
from trial_analysis.rb:102:in `<main>'

but I don't get an error when using your suggested trial.

I will now use the rest of my day to add to the script and test it on your trial name.

teatree1212 · 2016-12-05T17:18:30Z

I am failing to add more columns to the current CSV output on stdout. Could you have a look and give me a hint? I was trying to add the Plant_Accession_name as a column between the Plant Scoring Unit and the Trait Scores.

Re. JSON output, I thought I would start with plant_trials, then PSU, and within each PSU have PA and PL as well as TS and TD (and associated objects). Does that make sense, do you think?

Nuanda · 2016-12-06T11:46:52Z

Done - see the changes. One more thing about the PAs: you probably don't get them all, as the API returns only the first 50 hits by default (you can raise that to 200 per request, but not beyond - an anti-attack measure). So you need to implement a loop - see the one for trait scores and try to do something similar for PAs.
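A paging loop of the kind suggested here might look like the sketch below. It runs offline: fetch_page is a hypothetical stand-in for one GET request, and the page and per_page parameter names are assumptions, not confirmed names from the BIP API. Only the 200-record cap comes from the comment above.

```ruby
MAX_PER_PAGE = 200 # the per-request cap mentioned above

# Stand-in for the remote table: 450 fake plant accession records.
ALL_RECORDS = (1..450).map { |i| { 'id' => i, 'plant_accession' => "PA_#{i}" } }

# Hypothetical fetcher simulating one paged request. A real client would make
# an HTTP call here instead of slicing a local array.
def fetch_page(page, per_page)
  per_page = [per_page, MAX_PER_PAGE].min # simulate the server-side cap
  ALL_RECORDS.each_slice(per_page).to_a[page - 1] || []
end

# Keep requesting pages until a short (or empty) page signals the end.
def fetch_all(per_page = MAX_PER_PAGE)
  per_page = [per_page, MAX_PER_PAGE].min # never expect more than the cap
  records = []
  page = 1
  loop do
    batch = fetch_page(page, per_page)
    records.concat(batch)
    break if batch.size < per_page
    page += 1
  end
  records
end

plant_accessions = fetch_all
puts "#{plant_accessions.size} Plant Accessions loaded"
```

Clamping the page size on the client side as well matters for the stop condition: if the client expected 300 records per page while the server returned at most 200, the first full page would look like a short one and the loop would stop too early.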

teatree1212 · 2016-12-06T13:29:23Z

thanks, will have a look.
would it be possible to get the active_support gem ? I keep reading that this is the best way to .select or .extract key- value pairs from a hash.

Nuanda · 2016-12-06T13:36:53Z

I guess you can do that, but remember that it introduces an external dependency: all script users will need to install at least rubygems and the activesupport gem, along with its own dependencies. For their sake, I'd advise trying to stay with vanilla Ruby.
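For reference, picking a few key-value pairs out of a hash needs nothing beyond core Ruby. The sketch below uses a hypothetical trait score record (the 'numeric' value and the ids are made up for illustration) and sticks to methods that exist in Ruby 2.2.

```ruby
# Hypothetical trait score record; the values are invented for illustration.
trait_score = {
  'id' => 101,
  'score_value' => '4.629',
  'value_type' => 'numeric',
  'trait_descriptor_id' => 7
}

wanted = %w[score_value value_type]

# Hash#select with a block keeps only the pairs for which the block returns
# true, and returns a new Hash.
subset = trait_score.select { |key, _value| wanted.include?(key) }
p subset

# Hash#values_at returns just the values, in the order the keys were given.
score_value, value_type = trait_score.values_at(*wanted)
puts "#{score_value} (#{value_type})"

# The same selection applied to every record in an array.
trait_scores = [trait_score, trait_score.merge('id' => 102, 'score_value' => '0.6052')]
subsets = trait_scores.map { |ts| ts.select { |key, _| wanted.include?(key) } }
p subsets
```

Hash#select leaves the original hash untouched, so the full record stays available for later steps.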

teatree1212 · 2016-12-06T14:13:52Z

okay..... I can't manage to select multiple keys (e.g. score_value and value_type).
I am also looping over the entire hash in a not very pretty manner. I think Ruby can do better, but I can't..

Pleeeaase have a look for me, am at the banging my head stage (:

see recent commit

teatree1212 · 2016-12-06T14:41:48Z

With regard to your commit "showing PAs in the CSV output":
how do you make sure you link the correct plant_accession name to the correct scoring unit name? And how do you call the plant_accession?
I don't think I really understand what the "data" object is.

Nuanda · 2016-12-06T16:00:49Z

PA - PSU link. When I get PSUs from BIP (step 4), they include the FKey value for their related PAs (the FKey column name is plant_accession_id). I make sure to save these in the outputs hash. Then, in step 7, I find (detect, in Ruby nomenclature) the correct PA using this FKey value - see current line 132. Having the correct PA, I extract its plant_accession column value in line 134.
outputs is a Hash (an associative array, see https://en.wikipedia.org/wiki/Associative_array) with keys and values. I use PSU.scoring_unit_name as keys and Hashes as values (yes, it's a Hash of Hashes, or a nested Hash - quite common in Ruby). Each value may have multiple keys of its own, recording data about a given PSU. At the moment we have plant_accession_id (for the reason given above) and trait_scores, containing all trait score objects for a given PSU.
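The structure described above can be sketched in isolation like this. The records are hard-coded, hypothetical stand-ins for what the API calls return, and the code is a simplified illustration rather than the script's actual lines 132-134.

```ruby
# Hypothetical stand-ins for the PSU and PA records fetched from the API.
plant_scoring_units = [
  { 'scoring_unit_name' => 'unit_A', 'plant_accession_id' => 1 },
  { 'scoring_unit_name' => 'unit_B', 'plant_accession_id' => 2 }
]
plant_accessions = [
  { 'id' => 1, 'plant_accession' => 'accession_one' },
  { 'id' => 2, 'plant_accession' => 'accession_two' }
]

# Build the Hash of Hashes: scoring unit name => data about that PSU.
# The FKey is stored so the matching PA can be found later.
outputs = {}
plant_scoring_units.each do |psu|
  outputs[psu['scoring_unit_name']] = {
    'plant_accession_id' => psu['plant_accession_id'],
    'trait_scores' => []
  }
end

# Later step: for each PSU, detect the PA whose id equals the stored FKey.
# In the block, scoring_unit_name is the key and data is the nested Hash.
rows = outputs.map do |scoring_unit_name, data|
  pa = plant_accessions.detect { |a| a['id'] == data['plant_accession_id'] }
  [scoring_unit_name, pa ? pa['plant_accession'] : '']
end
rows.each { |row| puts row.join(',') }
```

Because the link is made through the stored id rather than through array position, the accession name stays attached to the right scoring unit regardless of the order in which the records arrive.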

teatree1212 · 2016-12-12T15:21:00Z

Could you have a look at '5. -Finding Plant Accessions... '?
I tried it in the manner of your trait_scores loop, but I only get an empty JSON output when printing it to stdout. When running the commented-out section, I get 50 objects, as you said, but no more. Nothing comes up in the .csv.
Could you please have a look @Nuanda?

Nuanda · 2016-12-13T09:27:38Z

@teatree1212 for demonstration I tried a different technique this time. Notable changes:

- removed the unnecessary second PSU loop and used the former one to record all PA_ids
- uniq! removes all duplicates from an array 'in situ'
- each_slice is a nice utility that chops an array into pieces - I use chunks of 200 as this is the maximum for a single BIP API get request
- you can use a similar technique to retrieve all PLs in step 6.
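The two array utilities mentioned above can be tried on their own, as in the sketch below. The id list is made up, and the actual request per chunk is left as a comment since its exact form is not shown in this thread.

```ruby
# Hypothetical FKey values collected while looping over PSUs, with repeats.
pa_ids = (1..450).to_a + [1, 2, 3]

# uniq! removes duplicates from the array in place. Note that it returns nil
# when there was nothing to remove, so do not chain calls onto its result.
pa_ids.uniq!

# each_slice yields consecutive chunks of at most 200 elements.
chunks = pa_ids.each_slice(200).to_a
chunks.each_with_index do |ids, i|
  # A real client would issue one GET request per chunk here.
  puts "request #{i + 1}: #{ids.size} ids (#{ids.first}..#{ids.last})"
end
```

Compared with paging through a whole table, this asks only for the records that are actually referenced, in as few requests as the 200-record cap allows.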

teatree1212 · 2016-12-13T16:47:19Z

a few notes on the next commit:
100 plant_lines are now being called, but not using the each_slice or uniq! function.

-data['plant_line_id'] doesn't exist or is empty or something in between (:
which is why it is not being displayed in the .csv

-data is part of the outputs hash, and as no connection is established between PL and PSU, there is no plant_line information in the outputs hash in general.

-I don't yet understand how to add the PLs to the outputs hash, as I find that that table is "too far away" ( via Plant_accessions)

to do:
apply Nuanda's similar technique to PLs in step 6.

to be able to generate a meaningful json output:

add accession_name to outputs
add plant_line and sequence_id to outputs.

teatree1212 · 2016-12-19T11:31:42Z

Notes:

outputs.each do |scoring_unit_name, data|

scoring_unit_name = key
data = value
the value is itself a hash, with keys and values of its own.

Need to add plant_lines hashes to PSU.scoring_unit_name keys

http://www.slideshare.net/harkamalsingh355/ruby-data-types-and-objects
slide 9

teatree1212 · 2016-12-19T17:08:05Z

@Nuanda

Problem 1:
I have managed to link the entire PA object with the PSU.scoring_unit_name key. I intended to use this in the same fashion as you did for the trait_descriptors, to map the trait_descriptors object to the values of data['trait_scores'] where the trait_descriptor_id matches.
I get the following error: undefined method `[]' for nil:NilClass (NoMethodError)
(my least favourite error message in Ruby)
Can you have a look and tell me what is wrong, please?

Problem 2:
What I ideally want is to create a more sophisticated outputs hash, so that I can get a nice JSON output with what I need for the workflow I am building.
-- having it look like this:
[
  {"plant_trial_name": "whri_2005_GE2_02",
    {
      [
        "su.WHRI2006_A215.P4_01": {
          "plant_accession_id": 1459,
          "trait_scores": [
            {
              "score_value": "4.629",
              "trait_descriptor": "leaf Silver content"
            },
            {
              "score_value": "0.6052",
              "trait_descriptor": "leaf Iron content"
            }
          ],
          "plant_accession_name": "whri2005_A215",
          "plant_line_name": "a_name",
          "sequence_identifier": "SRRXXXX"
        },
        "su.WHRI2006_A215.P4_01": {
          .....
        }
      ]
    }

I was trying to build something in line 162/3, where I want to put the content of plant_accessions['plant_accession'] into outputs[plant_scoring_unit['scoring_unit_name']]['plant_accession_name'].
Can you explain to me why it is not working with the detect function?
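One way to get well-formed JSON of roughly that shape from a nested hash is sketched below. The data is hard-coded from the example values above, and the 'plant_scoring_units' key is an invented name: JSON needs some key to hold the per-unit objects, and the thread does not settle on one.

```ruby
require 'json'

# Hypothetical in-memory data, shaped like the outputs hash discussed in this
# thread and filled with the example values from the sketch above.
outputs = {
  'su.WHRI2006_A215.P4_01' => {
    'plant_accession_id' => 1459,
    'plant_accession_name' => 'whri2005_A215',
    'trait_scores' => [
      { 'score_value' => '4.629', 'trait_descriptor' => 'leaf Silver content' },
      { 'score_value' => '0.6052', 'trait_descriptor' => 'leaf Iron content' }
    ]
  }
}

# 'plant_scoring_units' is an assumed key name, chosen only for illustration.
document = {
  'plant_trial_name' => 'whri_2005_GE2_02',
  'plant_scoring_units' => outputs
}

# Nested Hashes and Arrays map directly onto JSON objects and arrays.
json = JSON.pretty_generate(document)
puts json
```

Keying the units by scoring_unit_name inside one object also means each name can appear only once, which matches how the outputs hash already behaves.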

Nuanda · 2016-12-19T23:19:23Z

Dear @teatree1212

- I don't think you need the loop in step "7.", since the PA you are looking for is set in line 181 (in the plant_accession variable, in the context of each PSU you loop through).
- Also, please note an important difference between the detect and select array methods: the former returns the first matching element (or nil, if none is found), while the latter returns all matching elements (or an empty array, if nothing was found). Even if there is only 1 matching element, select still returns an array, not just the element.
- Regarding "Problem 1" - I'm not sure I follow. What I do for TDs in line 190 is there because we need to traverse all TDs (for this given PT) for each PSU, since each PSU (probably) has scores for each TD. Remember that lines 181-191 are executed in the context of a single PSU and they, in fact, output a single CSV row representing this PSU. So this is probably not what you want to do for PAs.
- If you need to output more columns about the PA (organisation, year produced, or similar) you can use the technique from line 185 (the ternary ?: operator is there to make sure the script does not trip on PSUs with no PA assigned).
- If you need to output PL columns, you can add another detect right after the mentioned line 181, in this vein: plant_line = plant_accession ? plant_lines.detect{ |pl| pl['id'] == plant_accession['plant_line_id'] } : nil. Then, things like [plant_line ? plant_line['sequence_id'] : ''] should start to work.
- If this is not what you are after, please tell me exactly what you want to do.
- "Problem 2" - one problem at a time ;).
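The detect/select difference and the guarded plant line lookup can be checked with a small standalone sketch. The records below are hypothetical, and sequence_id_for is a helper name introduced here for illustration only; the lookup inside it follows the expression quoted in the comment above.

```ruby
# Hypothetical records standing in for what the API returns.
plant_accessions = [
  { 'id' => 1, 'plant_accession' => 'accession_one', 'plant_line_id' => 10 },
  { 'id' => 2, 'plant_accession' => 'accession_two', 'plant_line_id' => 10 }
]
plant_lines = [{ 'id' => 10, 'sequence_id' => 'seq_ten' }]

# detect returns the first matching element, or nil when nothing matches.
first_hit = plant_accessions.detect { |pa| pa['plant_line_id'] == 10 }
no_hit = plant_accessions.detect { |pa| pa['plant_line_id'] == 99 }

# select always returns an Array, possibly empty.
all_hits = plant_accessions.select { |pa| pa['plant_line_id'] == 10 }
no_hits = plant_accessions.select { |pa| pa['plant_line_id'] == 99 }

p first_hit['id'], no_hit, all_hits.size, no_hits

# Illustrative helper: the ternaries keep a missing PA or PL from raising
# "undefined method `[]' for nil:NilClass".
def sequence_id_for(plant_accession, plant_lines)
  plant_line = plant_accession ? plant_lines.detect { |pl| pl['id'] == plant_accession['plant_line_id'] } : nil
  plant_line ? plant_line['sequence_id'] : ''
end

puts sequence_id_for(first_hit, plant_lines)
puts sequence_id_for(nil, plant_lines).inspect
```

Indexing the result of select with a string key, as if it were a single record, raises a TypeError, because an Array only accepts integer indices. That is one common way to trip over the difference.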

Nuanda · 2017-02-15T08:33:56Z

@teatree1212 Annemarie - since @kammerer created #702 I am closing this pull request. I will, however, leave the git branch in case you'd like to use it further.

a trial analysis API client stub

4ea83c3

Nuanda added the feature label Nov 29, 2016

Nuanda assigned teatree1212 Nov 29, 2016

teatree1212 assigned Nuanda and unassigned teatree1212 Nov 29, 2016

Nuanda assigned teatree1212 and unassigned Nuanda Nov 30, 2016

add Accession and Line queries

747c8b6

teatree1212 assigned Nuanda and unassigned teatree1212 Dec 5, 2016

showing PAs in the CSV output

829ea79

Nuanda assigned teatree1212 and unassigned Nuanda Dec 6, 2016

subhash exercise

146bc2b

teatree1212 assigned Nuanda Dec 6, 2016

Annemarie_Eckes added 2 commits December 12, 2016 12:20

adding plant_accession loop

a72d0e2

alter plant accession loop

e6d6483

Tomasz Gubała added 2 commits December 13, 2016 09:55

removed superfluous syntax

f3afcd3

full pa info

e6c90c1

more PLs are being selected

d2ee2d6

link PA-object and PSU in outputs

f5ab74e

changing name for one key in outputs

beb0a6e

kammerer added 6 commits February 3, 2017 10:57

Add modified trial analysis export script for York

3ad26d0

Skip unscored samples from the output CSV for York

67cc69e

Output plant line instead of accession in trial scoring CSV for York

ea94c7c

Relax Ruby version requirement to 2.2.x

ba94f70

Update York script messages

9da1a51

Tidy up York script a bit

f873040

kammerer mentioned this pull request Feb 14, 2017

Add a script for fetching trial scoring in TSV format for YorkAT #702

Merged

Nuanda closed this Feb 15, 2017

kammerer deleted the 680_trial_analysis_script branch January 18, 2018 19:53
