Commit

version 0.0.1
r-cemper committed Mar 21, 2024
1 parent 697a3b2 commit 691feee
Showing 12 changed files with 3,216 additions and 70 deletions.
6 changes: 0 additions & 6 deletions .iris_init

This file was deleted.

81 changes: 61 additions & 20 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
# mini-docker
The package creates a very basic IRIS instance in Docker
It's a proposal for an instance independent of IPM versions.
## Description
This repository provides a generic development environment
for coding productively with InterSystems ObjectScript.
This template:
* Runs InterSystems IRIS Community Edition in a docker container
* besides ZPM it includes WEBTERMINAL and PASSWORDLESS package
* the namespace defaults to USER
* any additional setting is provided by additional package related installation
### Usage
The container is built directly from **intersystemsdc/iris-community** without any Dockerfile
- **bscript.sh** runs BEFORE IRIS is started
- **script.sh** is executed AFTER the start of IRIS and executes **iris.script** by default
- changing of port mapping happens in **docker-compose.yml**
# Vector-inside-IRIS
This is an attempt to run a vector search demo completely in IRIS.
There are no external tools, and all you need is a terminal / console
and the Management Portal.
Special thanks to [Alvin Ryanputra](https://community.intersystems.com/user/alvin-ryanputra),
as his package [iris-vector-search](https://openexchange.intersystems.com/package/iris-vector-search)
was the inspiration and the source for the test data.
My package is based on the IRIS 2024.1 release and requires attention to your processor.

I attempted to write the demo in pure ObjectScript. Only the calculation of
the description vector is done in embedded Python.
Calculating a vector with 384 dimensions for each of 2247 records takes time.
My Docker container took 01:53:14 to generate them.
So I adjusted that step to be reentrant, which allows pausing the generation.
Every 50 records you are offered the chance to stop.
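
The reentrant generation described above can be sketched in plain Python. This is only an illustration of the idea, not the code shipped in the package: the names `generate_vectors`, `fake_embed` and `want_stop` are hypothetical, and `fake_embed` stands in for the real 384-dimension embedding. Records that already have a vector are skipped, so a second call resumes where the first one stopped.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 50  # the README mentions a stop offer every 50 records


def generate_vectors(
    records: Dict[int, str],
    vectors: Dict[int, List[float]],
    embed: Callable[[str], List[float]],
    want_stop: Callable[[int], bool],
) -> bool:
    """Fill in missing vectors; return True once every record has one."""
    # Only records without a vector are pending, which makes the step reentrant.
    pending = [rec_id for rec_id in sorted(records) if rec_id not in vectors]
    for count, rec_id in enumerate(pending, start=1):
        vectors[rec_id] = embed(records[rec_id])
        more_left = count < len(pending)
        # Offer a stop after each full batch, but only if work remains.
        if more_left and count % BATCH_SIZE == 0 and want_stop(count):
            return False
    return True


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for the real sentence embedding.
    return [float(len(text)), float(text.count(" "))]
```

Calling `generate_vectors` a second time with the same `vectors` dictionary continues with the first record that has no vector yet.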

Any suggestions for enhancements are very welcome.

### Prerequisites
Make sure you have [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) and [Docker desktop](https://www.docker.com/products/docker-desktop) installed.
### Installation
Clone/git pull the repo into any local directory
```
$ git clone https://github.com/r-cemper/mini-docker.git
$ git clone https://github.com/rcemper/Vector-inside-IRIS.git
```
To build and start the container run:
```
@@ -41,5 +41,46 @@ To access IRIS System Management Portal
http://localhost:42773/csp/sys/UtilHome.csp
```
### How to use it
This presents OEX package [xxxxxxx]() using the actual IPM module
All user documentation is found there in the [original repo]()
From terminal just start
```
USER>do ^A.DemoV
Test Vector Search
=============================
1 - Initialize Tables
2 - Generate Data
3 - VECTOR_COSINE
4 - VECTOR_DOT_PRODUCT
5 - Create Scotch
6 - Load Scotch.csv
7 - generate VECTORs
8 - VECTOR search
Select Function or * to exit : 8
Default search:
Let's look for a scotch that costs less than $100,
and has an earthy and creamy taste
change price limit [100]: 50
change phrase [earthy and creamy taste]: earthy
calculating search vector
Total below $50: 222
ID price name
1990 40 Wemyss Vintage Malts 'The Peat Chimney,' 8 year old, 40%
1785 39 The Famous Jubilee, 40%
1868 40 Tomatin, 15 year old, 43%
2038 45 Glen Grant, 10 year old, 43%
1733 29 Isle of Skye, 8 year old, 43%
5 Rows(s) Affected
```
Steps 1..4 show the basic functionality of Vectors.
Steps 5..8 are the search example.
Step 6, the import of the test data, is straight ObjectScript.
SQL LOAD DATA was far too sensitive to irregularities in the input CSV.

I suggest also following the examples in the Management Portal to see how Vectors operate.


82 changes: 82 additions & 0 deletions SQLSyntax.md
@@ -0,0 +1,82 @@

# Using Vectors in IRIS SQL

### Note: Please refer to the internal confluence page for updated docs.

## VECTOR (type, length)
**Optional parameters:**

- `type` - Optional, defaults to DOUBLE. The datatype of elements allowed to be stored in the vector. Can be DECIMAL, DOUBLE, INTEGER, TIMESTAMP, or STRING.
- `length` - Optional, can be specified only if type is also specified. An integer for the number of elements allowed to be stored in the vector. If specified, length restriction for INSERT INTO the vector column will be imposed.

### Creating a table with vector columns:
```sql
CREATE TABLE Test.Demo (vec1 VECTOR(DOUBLE,3))
CREATE TABLE Test.Demo (vec1 VECTOR(DOUBLE))
CREATE TABLE Test.Demo (vec1 VECTOR)
```
### Inserting into a table with vector columns:
```sql
INSERT INTO Test.Demo (vec1) VALUES ('0.1,0.2,0.3')
```
This query will succeed following any of the above three table creations. It will default to the table's vector type.

### Selecting from a table with vector columns:
```sql
SELECT * FROM Test.Demo
```


## SQL Functions

### TO_VECTOR (input, type, length)
**Parameters:**

- `input` - String value (VARCHAR) representing the vector contents in either of the supported input formats, "val1,val2,val3" (recommended), or "[ val1,val2, val3]"
- `type` - Optional, defaults to DOUBLE. The datatype of elements in the array, can be DECIMAL, DOUBLE, INTEGER, TIMESTAMP, or STRING.
- `length` - Optional. When specified, input will be padded with NULL values or truncated to the specified length, such that the result is a VECTOR of the specified length. The two-argument version of this function simply returns a vector with as many elements as the supplied list.

**Returns:** the corresponding vector to be added to tables or used in other vector operations.

**Example:**
```sql
INSERT INTO Test.Demo (vec1) VALUES (TO_VECTOR('0.1,0.2,0.3',DOUBLE, 3))
```
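
To make the input formats and the `length` behaviour described above concrete, here is a small Python sketch. It is not how IRIS implements `TO_VECTOR`; `parse_vector` is a hypothetical helper that only mirrors the description: it accepts both string formats and pads with `None` (standing in for NULL) or truncates when a length is given.

```python
from typing import List, Optional


def parse_vector(text: str, length: Optional[int] = None) -> List[Optional[float]]:
    """Parse 'v1,v2,v3' or '[ v1,v2, v3]' into a list of floats."""
    body = text.strip()
    # Drop the optional surrounding brackets of the second input format.
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    values: List[Optional[float]] = [
        float(part) for part in body.split(",") if part.strip()
    ]
    if length is not None:
        # Truncate to the requested length, then pad with None if too short.
        values = values[:length] + [None] * (length - len(values))
    return values
```

For example, `parse_vector('1,2', 4)` yields a four-element list whose last two entries are `None`.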
### VECTOR_COSINE (vec1, vec2)
**Parameters:**

- `vec1, vec2` - vectors

**Returns:** a double value of the cosine similarity between the two vectors, ranging from -1 to 1.

**Example:**
```sql
SELECT * FROM Test.Demo WHERE (VECTOR_COSINE(vec1, TO_VECTOR('0.4,0.5,0.6')) < 0)
```
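
As a reference for what the value means, the following Python sketch computes the cosine of two vectors from the textbook formula: the dot product divided by the product of the two norms. The function names are hypothetical and the code is independent of IRIS; it only illustrates why the result lies between -1 and 1.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products of two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product normalized by both vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)
```

Vectors pointing in the same direction give a value close to 1, orthogonal vectors give 0, and opposite vectors give a value close to -1.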
### VECTOR_DOT_PRODUCT (vec1, vec2)
**Parameters:**

- `vec1, vec2` - vectors

**Returns:** a double value of the dot product of two vectors.

**Example:**
```sql
SELECT * FROM Test.Demo WHERE (VECTOR_DOT_PRODUCT(vec1, TO_VECTOR('0.4,0.5,0.6')) > 10)
SELECT * FROM Test.Demo WHERE (VECTOR_DOT_PRODUCT(vec1, vec1) > 10)
```
## Nearest Neighbor Search
Getting the top 3 most similar vectors (to an input vector) from a table

**Using Cosine Similarity:**
```sql
SELECT TOP 3 * FROM Test.Demo ORDER BY VECTOR_COSINE(vec1, TO_VECTOR('0.2,0.4,0.6', DOUBLE)) DESC
```
**Using Dot Product:**
```sql
SELECT TOP 3 * FROM Test.Demo ORDER BY VECTOR_DOT_PRODUCT(vec1, TO_VECTOR('0.2,0.4,0.6', DOUBLE)) DESC
```
Note that we use 'DESC', since a higher value for dot product/cosine similarity means the vector is more similar.

This can be combined with 'WHERE' clauses to add filters on other columns.
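
The combination of a filter with a similarity ranking can be sketched outside of SQL as well. The Python below is an illustration only: the row layout (`id`, `price`, `vec`) and the function `top_k` are assumptions made for the example, not part of the package. Rows are filtered first, as a WHERE clause would do, and the remaining rows are ranked by cosine similarity in descending order.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    rows: List[Dict],
    query: Sequence[float],
    k: int = 3,
    max_price: Optional[float] = None,
) -> List[Dict]:
    """Return the k rows most similar to query, optionally below max_price."""
    # Filter step, comparable to a WHERE clause on another column.
    candidates = [
        row for row in rows if max_price is None or row["price"] < max_price
    ]
    # Ranking step, comparable to ORDER BY ... DESC combined with TOP k.
    return heapq.nlargest(k, candidates, key=lambda row: cosine(row["vec"], query))
```

With a handful of rows this scans every candidate; that is adequate for a sketch, while large tables would call for the database to do the ranking.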

1 change: 1 addition & 0 deletions ascript.sh
@@ -1,4 +1,5 @@
cd /home/irisowner/dev
python3 -m pip install --target /usr/irissys/mgr/python sentence_transformers
iris view
iris session iris < iris.script
exit 0
2 changes: 0 additions & 2 deletions bscript.sh

This file was deleted.

9 changes: 2 additions & 7 deletions docker-compose.yml
@@ -1,14 +1,9 @@
version: '3.6'
services:
iris:
image: intersystemsdc/iris-community
# image: intersystemsdc/irishealth-community
image: intersystemsdc/iris-community:preview
restart: no
command: /iris-main
- -a /home/irisowner/dev/ascript.sh
- -b /home/irisowner/dev/bscript.sh
- --ISCAgent false

command: -a /home/irisowner/dev/ascript.sh
ports:
- 41773:1972
- 42773:52773
6 changes: 2 additions & 4 deletions iris.script
@@ -14,10 +14,8 @@
// this should be the place for individual application code.

zn "USER"
zpm "install MDX2JSON"
zpm "install samples-bi"
do EnableDeepSee^%SYS.cspServer("/csp/user/")
zpm "load /home/irisowner/dev/ -v":1
zpm "list"

write !,$ZV,!
halt

13 changes: 13 additions & 0 deletions module.xml
@@ -0,0 +1,13 @@
<?xml version="1.0" encoding="UTF-8"?>
<Export generator="Cache" version="25">
<Document name="vector-inside.ZPM">
<Module>
<Name>vector-inside</Name>
<Description>Vector Search with COS+ePy</Description>
<Version>0.0.1</Version>
<Packaging>module</Packaging>
<SourcesRoot>src</SourcesRoot>
<Resource Name="A.PKG"/>
</Module>
</Document>
</Export>