Persisting vector of strings in the database #170
-
Hi Silvian, if I were to concatenate the list of vectors into a single string, how could I search for a substring in the Cottontail DB database? That is, which RelationalOperator could I use? Best, Ribin
Replies: 4 comments 3 replies
-
Maybe @ppanopticon can chime in on how
-
Currently, the LIKE operator in Cottontail DB is executed as if it were a SQL LIKE, i.e., only the wildcards % and _ are supported. Internally, these expressions are converted to a regular expression or a Lucene query, depending on whether an index is available. The RelationalOperators in Cineast (LIKE, NLIKE, RLIKE etc.) are all mapped to the Cottontail DB LIKE operator, since there is currently no support for other types of LIKE in Cottontail DB. If you want to query for a simple substring, an expression such as LIKE "%substring%" should do the trick. If we wanted to extend support to, for example, the full Lucene Query Parser syntax, we would need to add it to Cottontail DB first, which is entirely possible and simply a matter of proper specification.
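For readers who want to see what SQL-style LIKE semantics look like once translated to a regular expression, here is a minimal Java sketch. It is not Cottontail DB's actual implementation; the class `LikeToRegex` and its methods are hypothetical names made up for this illustration, and only the standard `java.util.regex` API is used. It shows why `%substring%` matches anywhere in the value while a pattern without wildcards has to match the whole value.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Cottontail DB or Cineast.
final class LikeToRegex {

    private LikeToRegex() {}

    /** Translates a SQL-style LIKE pattern (% and _ wildcards) into a regex. */
    static Pattern compile(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '%' || c == '_') {
                flush(literal, regex);
                // % matches any sequence of characters, _ exactly one character.
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flush(literal, regex);
        // DOTALL so that wildcards also span line breaks inside the value.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Appends the pending literal text, quoted so regex metacharacters stay literal. */
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    /** Returns true if the whole value matches the LIKE pattern. */
    static boolean like(String value, String pattern) {
        return compile(pattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("cat;dog;bird", "%dog%")); // true
        System.out.println(like("cat;dog;bird", "dog"));   // false
        System.out.println(like("cat", "c_t"));            // true
    }
}
```

Under these regex semantics a bare `dog` only matches a value that is exactly `dog`, so the surrounding `%` wildcards are what turn the pattern into a substring search.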
-
Thanks, using LIKE worked to query for a substring (even without the wildcards). But what would I need to implement to achieve the negation (i.e. NOT LIKE) operator? Maybe it would help if I explain my concrete use case:
-
To summarize for future reference: There are two options:
`AbstractTextRetriever`, which is to use `AttributeType.TEXT` for the specific column and then use `selector.getFulltextRows(k, attributename, config, terms...)`, which allows you to use the full spectrum of text retrieval features provided by Lucene (which is what Cottontail DB uses for text retrieval). This needs a text index, which is automatically created when you use the entity creator. It also means you need to optimize the entity after your import / extraction has finished.
The solution that worked best for the specific use case was to go with a different schema, which made queries and persisting massively simpler.