(EAI-483): ingest:all command deletes pages and embeddings not in data sources (#562)

* ingest:all command deletes pages and embeddings not in data sources, default behavior is soft delete pages

* refactor tests to use mongodb-memory-server, add tests for delete in ingest:all command
yakubova92 authored Dec 13, 2024
1 parent 7f4e4a8 commit 1dd81fc
Showing 10 changed files with 585 additions and 138 deletions.
221 changes: 221 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.

20 changes: 16 additions & 4 deletions packages/mongodb-rag-core/src/contentStore/EmbeddedContent.ts
@@ -60,6 +60,21 @@ export interface EmbeddedContent {
   chunkAlgoHash?: string;
 }

+export type DeleteEmbeddedContentArgs = {
+  /**
+    The page for which to delete embedded content.
+   */
+  page?: Page;
+  /**
+    The names of the data sources for which to delete embedded content.
+   */
+  dataSources?: string[];
+  /**
+    If true, delete pages that do NOT match the data sources in the query.
+   */
+  inverseDataSources?: boolean;
+};
+
 /**
   Data store of the embedded content.
  */
@@ -72,10 +87,7 @@ export type EmbeddedContentStore = VectorStore<EmbeddedContent> & {
   /**
     Delete all embedded content for the given page and/or data sources.
    */
-  deleteEmbeddedContent(args: {
-    page?: Page;
-    dataSources?: string[];
-  }): Promise<void>;
+  deleteEmbeddedContent(args: DeleteEmbeddedContentArgs): Promise<void>;

   /**
     Replace all embedded content for the given page with the given embedded content.
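To illustrate how the new `inverseDataSources` flag might change which records `deleteEmbeddedContent` targets, here is a minimal in-memory sketch. It is not the implementation from this commit: `StoredChunk`, `matchesDeleteArgs`, the simplified `Page` shape, and the array-based store are all hypothetical stand-ins, and the choice to delete nothing when neither `page` nor `dataSources` is given is an assumption made for safety in the sketch.

```typescript
// Hypothetical, simplified stand-ins for the real Page and EmbeddedContent types.
type Page = { url: string; sourceName: string };
type StoredChunk = { url: string; sourceName: string; text: string };

// Same shape as the type added in this commit.
type DeleteEmbeddedContentArgs = {
  page?: Page;
  dataSources?: string[];
  inverseDataSources?: boolean;
};

// Returns true when a stored chunk should be deleted for the given args.
function matchesDeleteArgs(
  chunk: StoredChunk,
  args: DeleteEmbeddedContentArgs
): boolean {
  const { page, dataSources, inverseDataSources = false } = args;

  // Assumption: with no filter at all, match nothing rather than everything.
  if (page === undefined && dataSources === undefined) {
    return false;
  }

  // If a page is given, the chunk must belong to that page.
  if (page !== undefined) {
    if (chunk.url !== page.url || chunk.sourceName !== page.sourceName) {
      return false;
    }
  }

  // If data sources are given, match chunks inside the list,
  // or outside the list when inverseDataSources is true.
  if (dataSources !== undefined) {
    const inList = dataSources.indexOf(chunk.sourceName) !== -1;
    const selected = inverseDataSources ? !inList : inList;
    if (!selected) {
      return false;
    }
  }

  return true;
}

// In-memory analogue of deleteEmbeddedContent: returns the chunks that remain.
function deleteEmbeddedContent(
  store: StoredChunk[],
  args: DeleteEmbeddedContentArgs
): StoredChunk[] {
  return store.filter((chunk) => !matchesDeleteArgs(chunk, args));
}
```

Under these assumptions, passing the names of the currently configured data sources together with `inverseDataSources: true` removes only the chunks whose source is no longer configured, which is the cleanup behavior the commit message describes for `ingest:all`.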
