Updated files to include the notebooks and updated .yml file #2
base: main
Hi @iamleonie, please review the pull request and let me know if there are any changes that I need to make.
Hi @tdubon, could you please add the following to the PR description:
Could you please remove the binary files from the PR (.DS_store and __pycache__)? I will review the PR in detail tomorrow.
Hi @iamleonie, sure. I deleted the named files and here is the additional information you requested: The original demo import.py was integrated into two Jupyter notebooks. The differences are as follows:
The original .yml file was updated to include the information needed for the new module. Let me know if you have any questions or need any other information.
Added text to Usage and Example queries sections.
fixed list
Hi @tdubon,
Thank you for your contribution. These demo projects are intended to be runnable applications rather than Jupyter Notebooks, so it would be great if we could convert your notebooks into Python files that other contributors (for example, those working on the frontend) can build on.
So, we would like to keep the helper.py and import.py files. It would be great if you could move the contents of your Jupyter Notebooks into those files and then remove the notebooks.
I apologize for the additional effort. Thank you for your understanding.
Please don't delete these files as other contributors are building their solution on these files.
Docker_Weaviate.ipynb
Outdated
These demos are not intended to include Jupyter Notebooks. Instead, we are aiming for a standalone demo application.
Since you did a great job at describing each step, maybe it would be nice to add your explanations as comments in the import.py and helper.py files?
To add to the comments from @iamleonie - I see that certain files (like `helper.py` and `import.py`) have been removed.
The notebook as it is will throw an error at `import helper` because `helper.py` is missing. That will be remedied by restoring those files - but, just as a reminder, it's good to check that the notebook runs from start to finish.
Docker_Weaviate.ipynb
Outdated
"1. Run your virtual environment: conda activate /Users/your_path/environment_name OR source path_to_your_VR/bin/activate\n",
"2. Download and run the yml image doc in this repo\n",
"3. Run docker-compose up -d\n",
"4. Run pip install -r requirements.txt"
Did you create a requirements.txt? It would be nice if you could commit it as well.
Docker_Weaviate.ipynb
Outdated
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following cells we load the locally stored data (in json format) and create a function definition for an add_podcast object. \n",
This would be great to move to import.py as a descriptive comment.
Docker_Weaviate.ipynb
Outdated
"source": [
"In the cell below we define the batch and the uuid.\n",
"\n",
"Batch definition is helpful because it's \"a way of importing/creating objects and references in bulk using a single API request to the Weaviate server.\" "
This would be great to move to import.py as a descriptive comment.
FYI a good starting batch size is ~50-100 or so. A 'batch' sends data objects in groups to speed up import, so a batch size of 1 removes the benefit of using batches.
Typically the only time you might use a batch size of 1 is to troubleshoot.
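To make the batch-size point concrete, here is a minimal sketch in plain Python that does not use the Weaviate client. The helper names `iter_batches` and `count_requests` are hypothetical and exist only for illustration. It counts how many requests an import would need for a given batch size.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def count_requests(n_objects: int, batch_size: int) -> int:
    """Number of batch requests needed to send `n_objects` objects."""
    return sum(1 for _ in iter_batches(range(n_objects), batch_size))


if __name__ == "__main__":
    # Hypothetical import of 250 podcast transcripts.
    for size in (1, 50, 100):
        print(f"batch_size={size}: {count_requests(250, size)} requests")
```

With a batch size of 1, every object costs its own round trip, which is why that setting is mostly useful for troubleshooting. In the v3 Python client the size would typically be set with `client.batch.configure(batch_size=100)` before entering the batch context.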
Docker_Weaviate.ipynb
Outdated
"cell_type": "markdown",
"metadata": {},
"source": [
"Next you implement the pipeline and query your data, such as semantic search, generative search, question/answering. In this example we use nearText with the module text2vec-openai which implments text-embedding-ada-002. "
It would be great if you could create a file called "query.py" and add this part there.
Docker_Weaviate.ipynb
Outdated
"metadata": {},
"outputs": [],
"source": [
"client.schema.delete_all()\n",
Suggest using client.schema.delete_class("Podcast")
Docker_Weaviate.ipynb
Outdated
"    \"title\": d[\"title\"],\n",
"    \"transcript\": d[\"transcript\"]\n",
"  }\n",
"  podcast_uuid = generate_uuid5('podcast', d[\"title\"] + d[\"transcript\"])\n",
`podcast_uuid` here does not get used. Recommend using it like so:
batch.add_data_object(
    data_object=properties,
    class_name="Podcast",
    uuid=podcast_uuid
)
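As a rough illustration of why passing the UUID matters, the sketch below derives a deterministic ID per podcast and hands it to `add_data_object`. It makes several assumptions: `RecordingBatch` is a hypothetical stand-in that only records calls (it is not the Weaviate batch object), and the standard library's `uuid.uuid5` is used in place of `weaviate.util.generate_uuid5`, so the resulting IDs will not match the ones the client helper would produce.

```python
import uuid
from typing import Any, Dict, Iterable, List


class RecordingBatch:
    """Hypothetical stand-in for a Weaviate batch; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def add_data_object(self, data_object: Dict[str, Any], class_name: str, uuid: str) -> None:
        self.calls.append(
            {"data_object": data_object, "class_name": class_name, "uuid": uuid}
        )


def podcast_uuid_for(item: Dict[str, str]) -> str:
    """Return the same UUID every time for the same title and transcript.

    Assumption: stdlib uuid5 is a stand-in for weaviate.util.generate_uuid5.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, item["title"] + item["transcript"]))


def add_podcasts(batch: Any, data: Iterable[Dict[str, str]]) -> None:
    """Add each podcast to the batch, passing the deterministic UUID along."""
    for d in data:
        properties = {"title": d["title"], "transcript": d["transcript"]}
        batch.add_data_object(
            data_object=properties,
            class_name="Podcast",
            uuid=podcast_uuid_for(d),
        )


if __name__ == "__main__":
    demo = [{"title": "Episode 1", "transcript": "Hello and welcome."}]
    batch = RecordingBatch()
    add_podcasts(batch, demo)
    add_podcasts(batch, demo)  # a second import run yields the same ID
    print(batch.calls[0]["uuid"] == batch.calls[1]["uuid"])
```

Because the ID is a function of the content, running the import twice targets the same object ID instead of generating a fresh random one on each run.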
Docker_Weaviate.ipynb
Outdated
"metadata": {},
"outputs": [],
"source": [
"#Question answering - search \n",
FYI the query here is a semantic search. Question answering is a separate feature. So I would recommend updating the comment here.
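To illustrate the distinction, the sketch below builds the two GraphQL query strings side by side without contacting a server. The function names are hypothetical, the `Podcast` class and `title` property come from the notebook under review, and the `ask` form assumes a question answering module is enabled on the Weaviate instance, which this PR does not confirm.

```python
import json
from typing import List


def near_text_query(concepts: List[str], limit: int = 3) -> str:
    """Semantic search: rank Podcast objects by similarity to the concepts."""
    return (
        "{ Get { Podcast("
        f"nearText: {{concepts: {json.dumps(concepts)}}}, limit: {limit}"
        ") { title } } }"
    )


def ask_query(question: str, limit: int = 1) -> str:
    """Question answering: additionally request an extracted answer.

    Assumption: a question answering module is enabled on the instance.
    """
    return (
        "{ Get { Podcast("
        f"ask: {{question: {json.dumps(question)}}}, limit: {limit}"
        ") { title _additional { answer { result } } } } }"
    )


if __name__ == "__main__":
    print(near_text_query(["machine learning"]))
    print(ask_query("What is the podcast about?"))
```

The first query only returns the closest objects, while the second asks the server to extract an answer from them, so a comment above a `nearText` query should describe it as semantic search.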
Embedded_Weaviate.ipynb
Outdated
@@ -0,0 +1,221 @@
{
Please see the comments on the Docker based file as I think they apply here also.
Docker_Weaviate.ipynb
Outdated
"metadata": {},
"source": [
"In your terminal: \n",
"1. Run your virtual environment: conda activate /Users/your_path/environment_name OR source path_to_your_VR/bin/activate\n",
I think the language here needs to be improved.
The canonical conda syntax is `conda activate myenv`, where `myenv` can be the name or the path (source: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment).
Also, this line is confusing (`path_to_your_VR`) - what is VR?
Instructions 2 and 3 are confusing as they look like parts of the same instruction. If they cloned the repo, they would not need to separately download this file.
I would suggest something like:
1. Create and activate a virtual environment, for example using conda or venv
2. Install the required libraries with `pip install -r requirements.txt`
3. Run Weaviate using Docker, for example with `docker-compose up -d`
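After step 3, it can help to confirm that Weaviate is actually accepting requests before running the import. The following is a minimal sketch, assuming the container exposes Weaviate on `http://localhost:8080` (this depends on the docker-compose file, which is not shown in this thread); the function name `weaviate_is_ready` is hypothetical.

```python
import urllib.error
import urllib.request


def weaviate_is_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the Weaviate readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: treat as not ready.
        return False


if __name__ == "__main__":
    if weaviate_is_ready():
        print("Weaviate is ready")
    else:
        print("Weaviate is not reachable yet; check `docker-compose up -d`")
```

The check uses only the standard library, so it works before the requirements from step 2 are installed.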
Hi @iamleonie and @databyjp, I've reflected the changes that have been communicated in the data_import.py, query.py, and README files. Please note that the helper.py and import.py files should be deleted for the reasons mentioned before. Finally, please note that this is the last that I am able to contribute to the project. Any further changes will have to be delegated to someone else, if you so choose. Thanks again
Thank you for incorporating the feedback. I will review the PR shortly. Please note that this may take a few days.
Hi @tdubon, thank you for implementing most of the requested changes. I did an in-depth review of your changes and from my point of view, I would still require the following changes to merge this PR:
As you mentioned that you won't be able to make any further modifications, please let us know how you'd like to proceed.
(TO DO)

## Setup instructions
1. Set-up Weaviate: `docker-compose up -d`*
I'd actually prefer to keep this
![Screenshot 2022-03-29 191123](https://user-images.githubusercontent.com/72981484/160694464-38a49b47-cd8f-4492-ae25-1cffaa7d85c2.jpg)
Why does this have to be removed?
message = str(item["title"]) + ' imported'
helper.log(message)
Please don't remove the logging functionality. This is a very helpful output in the console.
urllib3==2.0.6
validators==0.22.0
wcwidth==0.2.8
weaviate-client==3.24.2
Do we really need all the above packages? I am assuming weaviate-client is sufficient and the rest could be removed?
Hi, I updated the files but I need help testing them as I don't have any OpenAI credits to get the final output.
Refers to issue: #1