
Updated files to include the notebooks and updated .yml file #2

Open · wants to merge 30 commits into base: main
Conversation


@tdubon tdubon commented Oct 10, 2023

Hi, I updated the files but I need help testing them as I don't have any OpenAI credits to get the final output.

Refers to issue: #1


tdubon commented Oct 11, 2023

Hi @iamleonie, please review the pull request and let me know if there are any changes that I need to make.

@tdubon tdubon closed this Oct 11, 2023
@tdubon tdubon reopened this Oct 11, 2023
@iamleonie (Collaborator):

Hi @tdubon, could you please add the following to the PR description:

  • a short description of the changes you made
  • a link to the issue this PR refers to

Could you also please remove the binary files from the PR (.DS_Store and __pycache__)?

I will review the PR in detail tomorrow.

@iamleonie iamleonie self-requested a review October 11, 2023 16:11

tdubon commented Oct 11, 2023

Hi @iamleonie, sure. I deleted the named files and here is the additional information you requested:

The original demo import.py was integrated into two Jupyter notebooks. The differences are as follows:

  • Method of connecting to the server: one notebook uses the embedded Weaviate client and the other uses Docker.
  • The text2vec-openai vectorization module is used instead of the transformers module, which requires a GPU.
  • Narrative explanations of each step.

The original .yml file was updated to include the information needed for the new module.

Let me know if you have any questions or need any other information.

@iamleonie (Collaborator) left a comment:

Hi @tdubon,
Thank you for your contribution. Since these demo projects are intended to be runnable applications rather than Jupyter Notebooks, it would be great to convert your notebooks into Python files so that other contributors, e.g. those working on the frontend, can build on your work.

Specifically, we would like to keep the helper.py and import.py files, move the contents of your notebooks there, and remove the notebooks.

I apologize for the additional effort. Thank you for your understanding.

Embedded_Weaviate.ipynb (outdated review thread)
helper.py (review thread)
Collaborator:

Please don't delete these files, as other contributors are building their solutions on top of them.

docker-compose.yml (outdated review thread)
Collaborator:

These demos are not intended to include Jupyter Notebooks. Instead, we are aiming for a standalone demo application.
Since you did a great job at describing each step, maybe it would be nice to add your explanations as comments in the import.py and helper.py files?


To add to the comments from @iamleonie: I see that certain files (like helper.py and import.py) have been removed.

The notebook as it is will throw an error at import helper because helper.py is missing. That will be remedied by restoring those files - but, just as a reminder, it's good to check that the notebook runs from start to finish.

"1. Run your virtual environment: conda activate /Users/your_path/environment_name OR source path_to_your_VR/bin/activate\n",
"2. Download and run the yml image doc in this repo\n",
"3. Run docker-compose up -d\n",
"4. Run pip install -r requirements.txt"
Collaborator:

Did you create a requirements.txt? It would be nice if you could commit it as well.

Docker_Weaviate.ipynb (outdated review thread)
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following cells we load the locally stored data (in json format) and create a function definition for an add_podcast object. \n",
Collaborator:

This would be great to move to import.py as a descriptive comment.
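
For illustration, a sketch of how this step could look once moved into import.py as code plus a descriptive comment (assuming the repository's podcast_ds.json file and the title/transcript fields used in the notebook; the path and function name are illustrative):

    import json

    # Load the locally stored podcast data (JSON format). Each record is
    # expected to carry at least a "title" and a "transcript" field.
    with open("podcast_ds.json", "r") as f:
        data = json.load(f)

    def add_podcast(batch, podcast):
        # Queue a single podcast object for import into the "Podcast" class.
        properties = {
            "title": podcast["title"],
            "transcript": podcast["transcript"],
        }
        batch.add_data_object(data_object=properties, class_name="Podcast")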

"source": [
"In the cell below we define the batch and the uuid.\n",
"\n",
"Batch definition is helpful because it's \"a way of importing/creating objects and references in bulk using a single API request to the Weaviate server.\" "
Collaborator:

This would be great to move to import.py as a descriptive comment.


FYI a good starting batch size is ~50-100 or so. A 'batch' sends data objects in groups to speed up the import, so a batch size of 1 removes the benefit of using batches.

Typically the only time you might use a batch size of 1 is to troubleshoot.
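
A minimal sketch of that recommendation with the v3 Python client (assuming the client object already created in the notebook):

    # Send objects to Weaviate in groups; ~50-100 per batch is a
    # reasonable starting point.
    client.batch.configure(batch_size=100)

    with client.batch as batch:
        for d in data:
            ...  # add_data_object calls go here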

"cell_type": "markdown",
"metadata": {},
"source": [
"Next you implement the pipeline and query your data, such as semantic search, generative search, question/answering. In this example we use nearText with the module text2vec-openai which implments text-embedding-ada-002. "
Collaborator:

It would be great if you could create a file called "query.py" and add this part there.
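
A minimal sketch of what query.py could contain (assuming a local Weaviate instance with the text2vec-openai module enabled and a Podcast class with title/transcript properties; the example query text is illustrative):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    # Semantic search: nearText vectorizes the query with text2vec-openai
    # (text-embedding-ada-002) and returns the closest podcasts.
    response = (
        client.query.get("Podcast", ["title", "transcript"])
        .with_near_text({"concepts": ["vector databases"]})
        .with_limit(3)
        .do()
    )
    print(response)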

"metadata": {},
"outputs": [],
"source": [
"client.schema.delete_all()\n",


Suggest using client.schema.delete_class("Podcast")
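
For context, the difference between the two calls (v3 client):

    # Drops every class in the schema, which would wipe data other demos rely on.
    client.schema.delete_all()

    # Drops only the Podcast class and leaves the rest of the schema untouched.
    client.schema.delete_class("Podcast")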

" \"title\": d[\"title\"],\n",
" \"transcript\": d[\"transcript\"]\n",
" }\n",
" podcast_uuid = generate_uuid5('podcast', d[\"title\"] + d[\"transcript\"])\n",


podcast_uuid here does not get used. Recommend using it like so:

            batch.add_data_object(
                data_object=properties, 
                class_name= "Podcast",
                uuid=podcast_uuid
            )
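
In context, the surrounding import loop could then look roughly like this (a sketch assuming the notebook's data list and the batch configuration above):

    from weaviate.util import generate_uuid5

    with client.batch as batch:
        for d in data:
            properties = {
                "title": d["title"],
                "transcript": d["transcript"],
            }
            # Deterministic ID derived from the content, so re-importing the
            # same record reuses the same ID instead of creating a duplicate.
            podcast_uuid = generate_uuid5('podcast', d["title"] + d["transcript"])
            batch.add_data_object(
                data_object=properties,
                class_name="Podcast",
                uuid=podcast_uuid,
            )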

"metadata": {},
"outputs": [],
"source": [
"#Question answering - search \n",


FYI the query here is a semantic search. Question answering is a separate feature. So I would recommend updating the comment here.
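
For reference, the two queries look like this with the v3 client (a sketch; the question-answering variant assumes a QA module such as qna-openai is enabled, and the query strings are illustrative):

    # Semantic search: retrieves the podcasts closest in meaning to the query.
    semantic = (
        client.query.get("Podcast", ["title", "transcript"])
        .with_near_text({"concepts": ["vector databases"]})
        .with_limit(2)
        .do()
    )

    # Question answering: extracts an answer span from the transcripts;
    # requires a question-answering module to be enabled.
    qa = (
        client.query.get("Podcast", ["title"])
        .with_ask({"question": "What is a vector database?", "properties": ["transcript"]})
        .with_additional(["answer {hasAnswer result}"])
        .with_limit(1)
        .do()
    )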

@@ -0,0 +1,221 @@
{


Please see the comments on the Docker-based file, as I think they apply here also.

"metadata": {},
"source": [
"In your terminal: \n",
"1. Run your virtual environment: conda activate /Users/your_path/environment_name OR source path_to_your_VR/bin/activate\n",
@databyjp databyjp Oct 12, 2023:

I think the language here needs to be improved.

The canonical conda syntax is `conda activate myenv`, where myenv can be the name or the path (source: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment).

Also, this line is confusing (path_to_your_VR) - what is VR?

Instructions 2 and 3 are confusing as they look like parts of the same instruction. If they cloned the repo, they would not need to separately download this file.

I would suggest something like:

1. Create and activate a virtual environment, for example using conda or venv
2. Install the required libraries with `pip install -r requirements.txt`
3. Run Weaviate using Docker, for example with `docker-compose up -d`


tdubon commented Oct 17, 2023

Hi @iamleonie and @databyjp, I've incorporated the requested changes in the data_import.py, query.py, and README files. Please note that the helper.py and import.py files should be deleted for the reasons mentioned before.

Finally, please note that this is the last contribution I am able to make to the project. Any further changes will have to be delegated to someone else, if you so choose.

Thanks again

@iamleonie (Collaborator):

Thank you for incorporating the feedback. I will review the PR shortly. Please note that this may take a few days.

@iamleonie (Collaborator):

Hi @tdubon, thank you for implementing most of the requested changes.

I did an in-depth review of your changes and, from my point of view, the following changes are still required before this PR can be merged:

  • Revert the vectorizer from text2vec-openai back to text2vec-transformers (see the sketch after this list). From what I can see in the original docker-compose.yml, GPU usage is disabled, so using the transformers module does not require a GPU.
  • Remove the new files podcast_ds2.json and the accompanying data_import.py. From what I understand, data_import.py is very similar to import.py, with the difference that the reduced podcast_ds2.json file is used instead of the original podcast_ds.json file. Having a reduced file for local testing can be helpful, but I would not commit it to the repository.
  • I will leave some smaller comments directly in the code.
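
A sketch of what the reverted class definition could look like (assuming the Podcast class and the title/transcript properties used in the notebooks; the property types are illustrative):

    podcast_class = {
        "class": "Podcast",
        # Revert from "text2vec-openai" back to the local transformers module;
        # the original docker-compose.yml runs it with GPU usage disabled.
        "vectorizer": "text2vec-transformers",
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "transcript", "dataType": ["text"]},
        ],
    }

    client.schema.create_class(podcast_class)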

As you mentioned that you won't be able to make any further modifications, please let us know how you'd like to proceed.

README.md (review thread)
(TO DO)

## Setup instructions
1. Set-up Weaviate: `docker-compose up -d`*
Collaborator:

I'd actually prefer to keep this.

README.md (review thread)

![Screenshot 2022-03-29 191123](https://user-images.githubusercontent.com/72981484/160694464-38a49b47-cd8f-4492-ae25-1cffaa7d85c2.jpg)
Collaborator:

Why does this have to be removed?

README.md (outdated review thread)
import.py (review thread)
import.py (outdated review thread)

message = str(item["title"]) + ' imported'
helper.log(message)
Collaborator:

Please don't remove the logging functionality. This is a very helpful output in the console.

query.py (outdated review thread)
urllib3==2.0.6
validators==0.22.0
wcwidth==0.2.8
weaviate-client==3.24.2
Collaborator:

Do we really need all the above packages? I am assuming weaviate-client is sufficient and the rest could be removed?
