About csvLoader.loadDataSource #342
Comments
Tribuo has a row-wise view of data and doesn't provide a data frame style interface. If you want something more like a dataframe in Java, I think JTablesaw is supposed to be good for that, but I've not used it much.
Hi there, thanks for your quick reply. P.
You can inspect the examples after they have been loaded to make sure the pipeline is valid. I recommend looking at We don't currently support loading from JTablesaw into Tribuo because we can't capture the necessary provenance and reproducibility information from a Tablesaw dataset. It would be pretty useful to have, but because of the provenance issues we haven't got around to it.
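To make the "row-wise view" concrete, here is a minimal sketch in plain Java of what inspecting loaded rows amounts to: each CSV row is treated as one example with named feature values plus a response, and you print the first few. It deliberately uses only the standard library, not Tribuo's own classes, so the class name `RowWisePreview`, the method `preview`, the output format, and the toy values are all assumptions made for illustration. It also splits on plain commas and does not handle quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

public class RowWisePreview {

    /**
     * Formats up to `limit` data rows as "response=value <- {name=value, ...}".
     * The first line of csvLines is treated as the header row.
     * Hypothetical helper for illustration; not part of Tribuo.
     */
    public static List<String> preview(List<String> csvLines, String responseName, int limit) {
        List<String> out = new ArrayList<>();
        if (csvLines.isEmpty()) {
            return out;
        }
        String[] header = csvLines.get(0).split(",");
        for (int row = 1; row < csvLines.size() && out.size() < limit; row++) {
            String[] cells = csvLines.get(row).split(",");
            StringBuilder features = new StringBuilder();
            String response = "?"; // stays "?" if the response column is missing
            for (int col = 0; col < header.length && col < cells.length; col++) {
                String name = header[col].trim();
                String value = cells[col].trim();
                if (name.equals(responseName)) {
                    response = value;
                } else {
                    if (features.length() > 0) {
                        features.append(", ");
                    }
                    features.append(name).append('=').append(value);
                }
            }
            out.add(responseName + "=" + response + " <- {" + features + "}");
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy rows with made-up values, not taken from the real wine dataset.
        List<String> lines = List.of(
            "alcohol,pH,quality",
            "9.4,3.51,5",
            "9.8,3.20,6");
        for (String line : preview(lines, "quality", 5)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is the shape of the data: there is no table object to print, only a sequence of rows that you iterate over and check one at a time.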
Hi, thanks again. Yes, I think it would be really good to have something like JTablesaw load the CSV first and then pass it on to something like the CSVDataSource. That hands responsibility for the "integrity" of the data to the data science person: they are the subject matter experts, and they should be able to look into the DataFrame (in this case JTablesaw) and decide whether the data is in proper shape to pass into the CSVDataSource data structure. Allowing for "human intervention", especially at the data source stage of the data pipeline, is very valuable because it gives the data science person more control over data quality. I think this should be an available option in Tribuo. I just wanted to elaborate on my thinking on this. P
Hi there,
From this tutorial on regression:
https://github.com/oracle/tribuo/blob/main/tutorials/regression-tribuo-v4.ipynb
var wineSource = csvLoader.loadDataSource(Paths.get("winequality-red.csv"),"quality");
This wineSource is a data structure, but I don't see enough documentation for it.
I am assuming that wineSource is a tabular data structure, and I'm hoping it is similar to a Python pandas DataFrame.
If that is the case, is there a print method, so one can print the data to the terminal?
There is not much out there on this.
Kind Regards,
Pablo