Correct the code to handle single item outcome factors #38

Open
roman-bm opened this issue Sep 21, 2016 · 18 comments

Comments

@roman-bm

Dear Nicholas,
I have just received the following error:

predictionMetrics <- validatePredict(data, smMatrix, mmMatrix,noFolds=10)
Error in dependentMatrix[testIndexes, ] : incorrect number of dimensions

How can I sort it out?
Thanks.
regards,
roman

@roman-bm
Author

PS: In the validatePredict function, in the block commented "#Segment your data by fold using the which() function" (rows 64-65), the code
depTestData <- dependentMatrix[testIndexes, ]
depTrainData <- dependentMatrix[-testIndexes, ]

returns "Error in dependentMatrix[testIndexes, ] : incorrect number of dimensions"

My "testIndexes" and "dependentMatrix" are as follows:

testIndexes
[1] 1 2 3 4 5 6 7 8 9
dependentMatrix
[1] 9.81 11.00 4.77 10.14 10.27 1.39 10.40 2.48 2.20 8.77 10.36 3.89 11.83 1.61 3.81 11.33 8.81 8.94 10.09 4.23 3.09
[22] 9.12 9.43 11.36 10.26 9.23 10.58 4.95 4.09 9.54 10.49 9.37 2.20 9.40 2.08 9.84 4.68 9.27 8.76 9.70 10.68 5.75
[43] 9.56 3.14 10.28 10.29 10.53 11.54 9.32 9.10 10.33 5.43 11.25 10.98 3.83 6.62 8.58 10.48 9.30 9.86 9.89 11.31 2.83
[64] 7.93 10.41 9.69 7.71 9.92 8.69 4.49 6.62 10.68 9.93 5.88 3.81 8.68 9.37 5.05 3.76 9.90 11.52 1.39 10.01

Could you please advise why?
many thanks in advance.
regards,
roman
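
The console output above shows dependentMatrix printed as a plain numeric vector rather than a matrix. A minimal sketch of why row/column indexing fails in that case, using a shortened copy of the values (the variable contents here are illustrative, not the full dataset):

```r
# A plain numeric vector, like the dependentMatrix printed above (first 10 values)
dependentMatrix <- c(9.81, 11.00, 4.77, 10.14, 10.27, 1.39, 10.40, 2.48, 2.20, 8.77)
testIndexes <- 1:3

# A vector has no dim attribute, so [rows, cols] indexing is not defined for it
is.matrix(dependentMatrix)   # FALSE
dim(dependentMatrix)         # NULL

# Capture the error message instead of stopping the script
msg <- tryCatch(
  dependentMatrix[testIndexes, ],
  error = function(e) conditionMessage(e)
)
print(msg)

# Single-index subsetting is what works on a vector
depTestData <- dependentMatrix[testIndexes]
print(depTestData)
```

Checking is.matrix() or dim() on the object just before the failing line is a quick way to confirm that this is the situation being hit.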

@NicholasDanks
Collaborator

Hi Roman

Thanks for your interest, and apologies for the slow response to your queries. I will get on top of this ASAP and should have an answer for you tomorrow morning, China time.

Kind regards,
Nick


@NicholasDanks
Collaborator

Hi Roman
I cannot reproduce your error. Please try the following:

  1. Run the example.R code.
  2. When you encounter your error, copy and paste the console output from the point where you declare your variables (the arguments to validatePredict: smMatrix, mmMatrix, etc.).

This will help me to problem-solve.
Alternatively, send me your data and model and I will fit it and return it to you so that you can play with it.
Kind regards,
Nick


@roman-bm
Author

Hello Nick,

Many thanks for the reply. There is only one error so far (Error in dependentMatrix[testIndexes, ] : incorrect number of dimensions), which occurs when calling
predictionMetrics <- validatePredict(russett, smMatrix, mmMatrix,noFolds=10)

(I use your variable names with my data.)

Please find the scripts with the models and a dataset on Google Drive:
https://drive.google.com/drive/folders/0B7hmTYx_ULtscXV4cEtmSzFWUEE?usp=sharing

Many thanks in advance.

Best regards,
roman

@NicholasDanks
Collaborator

Hi Roman
Thanks for your data and for helping us to improve our function.
First: You've found a bug in the function. It expects an outcome factor measured by more than one item, so the code I wrote expects a matrix and does not handle a single-item construct. I will correct this.
Second: Are you sure that PLS is the best solution for this dataset? An alternative data mining technique might be better suited to predicting on this data.
Third: If you are just using this data to play around and get a feeling for this function, please feel free to use another dataset with multiple items on the outcome factor. You are also welcome to fork and modify the code; we always appreciate collaborators.
I will let you know when I have attended to the bug.
Kind regards,
Nick
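
One possible way to make the fold-splitting lines tolerate a single-item outcome is sketched below. It assumes the vector arises because selecting one column from a data frame drops the result to a vector, which is a guess about the cause and not confirmed from the package source. The names data and outcomeItems are hypothetical stand-ins, and this is not necessarily the patch that was applied in the repository.

```r
# Hypothetical stand-in for the user's data: the outcome factor has a single item
data <- data.frame(y1 = c(9.81, 11.00, 4.77, 10.14, 10.27, 1.39))
outcomeItems <- "y1"   # hypothetical name for the outcome's item columns

# drop = FALSE keeps a one-column data frame; as.matrix() turns it into an n x 1 matrix
dependentMatrix <- as.matrix(data[, outcomeItems, drop = FALSE])
testIndexes <- 1:2

# drop = FALSE again, so the subsets stay matrices even with one column
depTestData  <- dependentMatrix[testIndexes, , drop = FALSE]
depTrainData <- dependentMatrix[-testIndexes, , drop = FALSE]

dim(depTestData)    # 2 1
dim(depTrainData)   # 4 1
```

Keeping every intermediate object a matrix means the same code path serves both single-item and multi-item outcome factors.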


@NicholasDanks changed the title from "Error in dependentMatrix[testIndexes, ] : incorrect number of dimensions" to "Correct the code to handle single item outcome factors" on Sep 26, 2016
@roman-bm
Author

Hello Nick,

Many thanks for the reply. It is great that you have identified the bug.
Unfortunately, the project requires that this be done with PLS.

It will be highly appreciated if you have time and possibility to fix this bug.
Thank you for understanding.

Best regards,
roman

@roman-bm
Author

Hello Nick,

I've failed to fix the error.
I know that you are very busy but could I ask you please to be so kind to correct it?

Thank you.

Best regards,
roman

@soumyaray
Collaborator

@roman-bm: sorry, I haven't been following this discussion too closely, but out of curiosity: do you have mediating factors in the model you ultimately wish to test? I ask because the pls-predict algorithm currently does not handle mediators too well (we will be adding that feature later this year).

@roman-bm
Author

@soumyaray thank you for the message. Not really.
The model is quite simple. You can find the structure of the model shared on Google Drive: https://drive.google.com/drive/folders/0B7hmTYx_ULtscXV4cEtmSzFWUEE?usp=sharing

@soumyaray
Collaborator

@roman-bm the document illustrating the model contains a mediator: PRE-RELEASE COMMUNITY BUZZ. I'm afraid our algorithm currently cannot handle this model appropriately. Given that fixing the one-item factor issue will take some time, and also that we will not be able to handle mediators until much later, I would suggest a different route.

You could take the factor scores from your estimated model (from R or SmartPLS) and run a linear regression or logistic regression on the factor scores. If you set up such a model in R, you can run the model through R's predict(..) function. Those predictions would be a bit better than the PLS predictions our system would generate, even if it could generate predictions right now.

I apologize for letting you down on this, but I sense that time is pressing for your project. My suggestion above would let you generate very good predictions, if indeed predictions is all you need at this time.
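
The route suggested above (regress on factor scores, then call predict) could look roughly like the sketch below. The factor scores are simulated here and the column names X1, X2 and Y are hypothetical; in practice the scores would be exported from the estimated model in R or SmartPLS. The sketch uses a single linear regression and does not model the mediator.

```r
set.seed(1)

# Simulated factor scores standing in for scores exported from the estimated model
scores <- data.frame(
  X1 = rnorm(83),
  X2 = rnorm(83)
)
scores$Y <- 0.6 * scores$X1 + 0.3 * scores$X2 + rnorm(83, sd = 0.5)

# Hold out the last rows so the predictions are out-of-sample
train <- scores[1:70, ]
test  <- scores[71:83, ]

# Linear regression on the factor scores, then predictions for the held-out rows
fit  <- lm(Y ~ X1 + X2, data = train)
pred <- predict(fit, newdata = test)

# Out-of-sample root mean squared error
rmse <- sqrt(mean((test$Y - pred)^2))
print(rmse)
```

For a binary outcome, glm(..., family = binomial) with predict(..., type = "response") would be the logistic-regression counterpart.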

@roman-bm
Author

@soumyaray thank you very much for your reply.
I need out-of-sample prediction to test the model's predictive ability. I was struggling with the cross-validation.
Could I please ask whether I did it the right way? It doesn't look right...
The file new.R contains the new estimates along with the cross-validation. Many thanks in advance.

@NicholasDanks
Collaborator

@roman-bm I did a quick patch job on your bug and I got the code working.
Please change branch to single_item_bug and use the libraries in this branch for now. I will test fully and resolve by this weekend.

@roman-bm
Author

@NicholasDanks Thank you Nick! predictionMetrics doesn't fully work but returns the statistics, right?

@NicholasDanks
Collaborator

@roman-bm it returns the k-fold cross-validated RMSE, MAPE and MAD. You can specify k, the number of folds. If you want straight point predictions, use Plspredict, and if you want the prediction intervals, use predictionInterval. My example also has a nice pair of visualizations to use, but these might need tweaking to suit your data (xlim, ylim, etc.).
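
For readers unfamiliar with the three metrics named above, here is a minimal sketch of how they are commonly defined. It assumes MAPE is expressed as a percentage and MAD means the mean absolute deviation of the prediction errors; the exact formulas used inside validatePredict are not shown in this thread. The function names are hypothetical.

```r
# Root mean squared error
rmseMetric <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean absolute percentage error, in percent (assumes no actual value is zero)
mapeMetric <- function(actual, predicted) 100 * mean(abs((actual - predicted) / actual))

# Mean absolute deviation of the errors (named to avoid masking stats::mad)
madMetric <- function(actual, predicted) mean(abs(actual - predicted))

actual    <- c(10, 20, 40)
predicted <- c(12, 18, 40)

rmseValue <- rmseMetric(actual, predicted)
mapeValue <- mapeMetric(actual, predicted)   # 10
madValue  <- madMetric(actual, predicted)

print(c(RMSE = rmseValue, MAPE = mapeValue, MAD = madValue))
```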


@roman-bm
Author

@NicholasDanks Many thanks!!

@soumyaray
Collaborator

@roman-bm : hope this works for you! But please do keep in mind that if you have mediation in your model, the results might change in future versions of pls-predict.

@roman-bm
Author

roman-bm commented Oct 6, 2016

@NicholasDanks are the statistics for each of these 10 folds saved in the PLSSAD, PLSSAPE and PLSSSE matrices? Thank you.

@roman-bm
Author

roman-bm commented Oct 6, 2016

@NicholasDanks I mean, how can I report the statistics for each of the 10 fold estimations?
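
The thread does not answer this question, and the sketch below says nothing about what PLSSAD, PLSSAPE or PLSSSE hold inside validatePredict. It only illustrates, under stated assumptions, one generic way to keep one row of statistics per fold in a hand-written cross-validation loop. A simple lm fit on simulated data stands in for the PLS model, and the names toy and foldStats are hypothetical.

```r
set.seed(42)
n <- 83
noFolds <- 10

# Simulated data; a plain linear model stands in for the PLS estimation step
toy <- data.frame(x = rnorm(n))
toy$y <- 5 + 2 * toy$x + rnorm(n)

# Assign each row to one of noFolds contiguous folds
folds <- cut(seq_len(n), breaks = noFolds, labels = FALSE)

# One row per fold, one column per metric
foldStats <- matrix(
  NA_real_, nrow = noFolds, ncol = 3,
  dimnames = list(paste0("fold", seq_len(noFolds)), c("RMSE", "MAPE", "MAD"))
)

for (i in seq_len(noFolds)) {
  # Segment the data by fold using which()
  testIndexes <- which(folds == i)
  fit <- lm(y ~ x, data = toy[-testIndexes, ])
  err <- toy$y[testIndexes] - predict(fit, newdata = toy[testIndexes, ])
  foldStats[i, ] <- c(
    sqrt(mean(err^2)),                          # RMSE for this fold
    100 * mean(abs(err / toy$y[testIndexes])),  # MAPE for this fold, in percent
    mean(abs(err))                              # MAD for this fold
  )
}

print(round(foldStats, 3))   # per-fold statistics
print(colMeans(foldStats))   # averages across folds
```

Storing the per-fold values in a matrix makes it straightforward to report both the individual folds and their averages.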


3 participants