-
Did I understand this right - the intention is to keep
-
This is a great idea! Even though those lists above are long, I'd venture that 80% of the use cases of Scikit Learn are one of Models:
Model Selection
And All preprocessors (which are mostly done in Danfo)
So I think we can be productive here, and move some of the machine learning into the JS space. Also, I think we should probably call this repo "Scikit.js" if it's not already taken. Then we win the "marketing" battle of having people easily find the repo when they search on npm or Google. Anywho, the best plan of action might just be creating the new repo and everybody taking a different Estimator above.
-
@dcrescim and anyone who's interested, the new base repo is up (https://github.com/opensource9ja/scikit.js). I also ported the encoders and scalers we have in Danfo.js. You can start adding issues and feature ideas. See code style guide here
-
@risenW this is a cool idea, and based on the list from @dcrescim, wouldn't it be cool to go with https://github.com/imgcook/datacook, since they are doing something related and the scope is still small? Or is the direction of https://github.com/opensource9ja/scikit.js different from that of datacook?
-
@steveoni @risenW Holy moly, I love this https://github.com/imgcook/datacook project! Anywho, after taking a more in-depth look through datacook, I think it's amazing, but I have some ideas that I think would make it even better. Maybe we can even hop on a Zoom call so I can pitch 'em. The main points are as follows:

1. I think the name should be related to scikit-learn somehow. Both the project name and the npm name.

I don't think this is a minor point. I think there is a large contingent of users who google things like "Scikit learn js" or "scikit in javascript", and we would want to capture that attention and collect users and contributors. The network effect is real: the more people find it on npm or Google, the better the code gets, as new users and contributors are drawn to it, which will make it even better, which will .... you get the idea. A never-ending virtuous cycle of improvement is contingent on usage. In terms of failing the "search problem", I'm living proof of that. I just found out datacook exists today (and it's great), and I've routinely searched the above queries from time to time and was never pointed in the direction of datacook (despite it having nearly identical high quality Estimators).

2. I think it should have a near identical API to scikit-learn.

Obviously a large majority of users who do machine learning do it in Python. The 3 biggest libraries for machine learning are probably TensorFlow, PyTorch, and Scikit-Learn. While the first two have ways of taking your model and shipping it to the browser / mobile phone (e.g. TensorFlow.js), the 3rd does not. And I think there is a real need for that. Something like 87% of all machine learning models die in a Jupyter Notebook. They don't make it to production. The reason is that it is hard to ship these models to the client.

Imagine for a second that you work at a company that trains models in scikit-learn and you are tasked with creating a JS version because it needs to exist on the client. Wouldn't it be nice to have the same "get_params", "set_params" API that exists on every Estimator in scikit-learn? Imagine how awesome it would be to train a model in Python, then call "get_params" to export the coefficients. Then create the identical Estimator in our library and simply take the coefficients and call "set_params" on the JS Estimator. Voila! This all depends on whether our estimators have the same "get_params", "set_params" functions. Looking at datacook, the LinearRegressor doesn't have that yet. That's a simple fix though.

But why stop there? Imagine a user who, like our hypothetical above, trains models in Python and then ships them to JS. Let's even imagine that they found out about that slick "get_params", "set_params" trick. I'm sure they are happy for a while, but I'm also sure that they would get tired of copying/pasting coefficients between languages. So they begin to peruse our docs, and lo and behold, the API is near identical. Meaning that they can take their training code in Python (which let's estimate is hopefully under 300 lines of preprocessing, validation, and model building), and within an hour completely switch over to the equivalent classes in JS. BAMMM!! We just got ourselves another user, and a useful one! This person will try to smooth the rough edges between JS and Python. And they'll try to make things faster, because they are used to faster speeds in Python (mostly because they have more access to low level libraries). So great! The virtuous cycle of improvement continues.

So to make all of these concerns concrete with an example, let's consider the LinearRegressor in datacook. If it had the exact same methods as the one in Scikit-Learn, namely fit, predict, get_params, set_params, and score, I'm sure more folks would jump at the opportunity to use it and it would be easier to grow the library. What do you guys think?
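To make the "export in Python, import in JS" idea a bit more tangible, here is a rough TypeScript sketch of what the receiving side could look like. Everything in it is an assumption for illustration: `LinearRegressionSketch`, `getParams`, `setParams`, and the JSON shape are hypothetical names, not the API of datacook or scikit.js. One caveat worth flagging: in scikit-learn itself, `get_params` returns constructor settings, while the fitted weights live in the `coef_` and `intercept_` attributes, so a real export script would probably read those attributes. The sketch simply follows the proposal above and routes the weights through the param methods.

```typescript
// Hypothetical sketch: none of these names come from scikit.js or datacook.
interface LinearParams {
  coef: number[];    // stands in for scikit-learn's fitted attribute coef_
  intercept: number; // stands in for scikit-learn's fitted attribute intercept_
}

class LinearRegressionSketch {
  private params: LinearParams = { coef: [], intercept: 0 };

  // Return a copy so callers cannot mutate internal state by accident.
  getParams(): LinearParams {
    return { coef: [...this.params.coef], intercept: this.params.intercept };
  }

  // Accept a partial update, e.g. values exported from a Python model.
  setParams(update: Partial<LinearParams>): this {
    if (update.coef !== undefined) this.params.coef = [...update.coef];
    if (update.intercept !== undefined) this.params.intercept = update.intercept;
    return this;
  }

  // y = X . coef + intercept, one prediction per row of X.
  predict(X: number[][]): number[] {
    const { coef, intercept } = this.params;
    return X.map((row) => {
      if (row.length !== coef.length) {
        throw new Error(`expected ${coef.length} features, got ${row.length}`);
      }
      return row.reduce((sum, x, j) => sum + x * coef[j], intercept);
    });
  }

  // Coefficient of determination R^2, the metric that scikit-learn's
  // LinearRegression.score reports.
  score(X: number[][], y: number[]): number {
    const yHat = this.predict(X);
    const mean = y.reduce((a, b) => a + b, 0) / y.length;
    const ssRes = y.reduce((s, yi, i) => s + (yi - yHat[i]) * (yi - yHat[i]), 0);
    const ssTot = y.reduce((s, yi) => s + (yi - mean) * (yi - mean), 0);
    return 1 - ssRes / ssTot;
  }
}

// Pretend this JSON string was written out on the Python side
// from a fitted model (made-up numbers).
const exported = '{"coef": [2, -1], "intercept": 0.5}';
const model = new LinearRegressionSketch().setParams(JSON.parse(exported));
const predictions = model.predict([[1, 1], [3, 0]]);
```

The point of the design is that no training happens in JS at all: the client only needs `setParams` and `predict`, which is why matching the method names and the parameter layout across the two languages matters so much for this workflow.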
-
Do you have a linear algebra library you plan to depend on for matrix operations, or do you want to stick to vanilla JavaScript? If there is a linear algebra library to depend on, please let us know, as it can affect possible contributions.
-
A new repo for ML-related stuff. This will consolidate all ML features like Scalers, Encodings, and Estimators into a separate repo, backed by the Danfo DataFrame and TensorFlow.js Tensors, and it can be managed from there.
We can model it like Scikit-learn, which we already do with Scalers and Encodings.
Feature ideas can be:
General ideas and contributions are welcome here.
cc @GantMan @javierluraschi @steveoni @callmekatootie