QA release for 2023-08-24 (#106)
* Implement page titles and prepending frontmatter to chunks (#64)

* Implement page titles and prepending frontmatter to chunks

* Implement updating front matter + tests

* Leave title undefined if not specified by snooty data

* (DOCSP-31639): fix timeout err in tests (#70)

fix test err

* (DOCSP-32066): Q&A server reduce amount of backoff on embedding (#68)

reduce amount of backoff on embedding

* Update Chat UI Packaging (#71)

* Switch css to Emotion

* Reduce ChatGPT API flakiness + faster timeout when flaky (#69)

* reduce flakiness + faster timeout when flaky

* implement CB feedback

* (DOCSP-31925): Support tables in parsed snooty MD  (#67)

* handle tables as HTML tables

* start tests

* nested tables

* cleanup

* Fix list parsing within tables

Use a state enum rather than bool for parent table

* Update ingest/src/snootyAstToMd.ts

Co-authored-by: Nick Larew <[email protected]>

* Implement multiple header rows (??)

---------

Co-authored-by: Chris Bush <[email protected]>
Co-authored-by: Chris Bush <[email protected]>
Co-authored-by: Nick Larew <[email protected]>

* (DOCSP-32070): LLM preprocessing on user queries (#72)

* make llm preprocessor

* Add todo

* add 1 word expansion

* add frontmatterUpdater() to chat-core

* append metadata to vector search query

* hook up preprocessor to chat

* test non-processing negative response

* trigger rebuild

* clean up

* frontmatter -> frontMatter

* rename

* rename

* implement CB feedback

* edits

* Add more context

* small fixes

* Refactor `ConversationService` class as `makeConversationService` func (#76)

refactor conversation service as make func

* (DOCSP-31111) [UI] publish react component to npm (#73)

* (DOCSP-31575): LLM qualitative testing framework (#74)

* start stubbing

* add check response quality func

* works in editor, but doesn't compile

* runs but doesn't look correct in editor

* working in VSCode and compiling

* all working

* working again with separate jest suite

* remove console log

* remove typechat from top lvl

* Add back to tsconfig

* remove await

* Fixes

* add test using framework

* comment out skipped tests to appease linter

* Add database index creation scripts (#75)

* (DOCSP-31106) [DEL] Set up TTL on user conversations

* (DOCSP-31622) [INGEST] Indexes for embedded_content and pages collections

* (DOCSP-32183 & DOCSP-32226): Automate test creation based on YAML file (#77)

* Add yaml tests

* hypothesis -> name

* correct path to yaml file

* create scripts project

* script to create test YAML files

* remove build dir

* (DOCSP-32217): Change chunker to chunk based on number of tokens + add max context tokens per message (#79)

* add max context tokens per message

* Trigger

* trigger

* strip frontmatter before sending docs to LLM

* refactor removeFrontMatter to use  package

* Refactor includeChunksForMaxTokensPossible() per CB feedback to use findIndex

* (DOCSP-31440, DOCSP-32075): Add tags to chunk metadata (#78)

* (DOCSP-31440): Add tags to chunk metadata

* (DOCSP-32075): Replace 'tags' with 'metadata' in embedded content document

- This probably has no practical effect, but would allow for more flexibility in future vector searching or chunk filtering.

* Fix seed-data for new format

* Update

* Remove explicit fetch import

* Define engines to remove EBADENGINE warning (#82)

* (DOCSP-32194): Fix drone and k8s for ingest/chat-server staging (#80)

* Rebuild services

* Trigger staging deploy

* Trigger ingest build

* fix build err

* clean up drone file + PR

* (DOCSP-32206): Strip comments from rst -> md (#84)

* (DOCSP-32434) [UI] Add error text for non-modal input (#87)

* (DOCSP-32253): Add more semantically relevant product names to metadata (#83)

* Rebuild services

* Trigger staging deploy

* Trigger ingest build

* clean up drone file + PR

* refactor with Page.metadata

* add test for arbitrary metadata

* update EmbeddedContent description

* update MongoDbUserQueryPreprocessorResponse for greater semantic meaning

* update pre-processor tests

* implement review feedback

* fix broken tests

* Fix lint err

* (DOCSP-32362): Improve conversation request logging  (#89)

improve request logging

* (DOCSP-31343): Add system diagram and info for ingest (#86)

* Add ingest system diagram

* Move diagram to README and add some info

* (DOCSP-32227): Improve OpenAPI spec ingestion (#88)

* checkpoint

* integrate changes w project

* clean up + add todo

* Draft atlas spec handling

* make func async for network call

* finish tests

* fix broken tests

* implement cb feedback + merge fixes

* Restore 'tags' and use 'tagsIn' only internally (#91)

* (DOCSP-32242): Handle tabbed Snooty content (#92)

* add tabs to page + handle chunking

* add more table-based delimiters

* CB feedback + clean up

* (DOCSP-32490) [UI] Link to current commit on GitHub (#90)

* (DOCSP-32243) Handle and test pages with page-level code block select (#94)

* handle and test pages with page-level codeblock select

* clean up comment

* add spacing around tables

* refactor per CB feedback

* Refactor to remove undefined case which should never occur

* (DOCSP-32155) [UI] UX Feedback (#81)

* (DOCSP-32363) [UI] handle cases where the LLM stops mid code example (#85)

* Drone fix (#97)

* update drone file for testing

* add trigger

* fix and trigger

* fix and trigger

* trigger chat server build

* clean up tmp changes

* remove triggers

* fix hanging promise

* fix hanging promise in async recursive operation (#98)

fix hanging promise

* fix handling of openapi specs (#99)

* (DOCSP-32247): Fix preprocessed content not added to DB (#100)

fix preprocessed content not added to DB

* (DOCSP-32104): Do not serve demo site in prod environment (#101)

only serve staging site if env not prod

* (DOCSP-32596): Set up CORS on the server (#103)

* draft CORS setup

* add tests

* (DOCSP-32452): Dev Center data source remove `<img>` and `<div>` tags + youtube directive (#93)

dev center data source remove img and div tags + youtube directive

* Update QA server DB (#105)

update QA server DB

---------

Co-authored-by: Chris Bush <[email protected]>
Co-authored-by: Nick Larew <[email protected]>
Co-authored-by: Chris Bush <[email protected]>
Co-authored-by: Nick Larew <[email protected]>
5 people authored Aug 24, 2023
1 parent 5e6866a commit 569a9c6
Showing 122 changed files with 41,134 additions and 1,074 deletions.
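The commit message for PR #79 mentions adding a maximum number of context tokens per message and refactoring `includeChunksForMaxTokensPossible()` to use `findIndex`. That function is not among the files shown in this diff, so the following is only a minimal sketch under stated assumptions: each chunk carries a precomputed token count (the hypothetical `tokenCount` field), chunks arrive already sorted by relevance, and the goal is to keep the longest prefix that fits the budget. `includeChunksForMaxTokens` is a hypothetical name, not the repository's function.

```typescript
// Hypothetical chunk shape: text plus a precomputed token count.
interface ScoredChunk {
  text: string;
  tokenCount: number;
}

/**
  Returns the longest prefix of `chunks` whose total token count does not
  exceed `maxTokens`. Illustrative sketch of a findIndex-based approach.
 */
function includeChunksForMaxTokens(
  chunks: ScoredChunk[],
  maxTokens: number
): ScoredChunk[] {
  let runningTotal = 0;
  // Find the first chunk that would push the running total over budget.
  const overflowIndex = chunks.findIndex((chunk) => {
    runningTotal += chunk.tokenCount;
    return runningTotal > maxTokens;
  });
  // -1 means every chunk fits; otherwise keep everything before the overflow.
  return overflowIndex === -1 ? chunks : chunks.slice(0, overflowIndex);
}
```

Stopping at the first chunk that overflows, rather than skipping it and trying smaller ones, preserves relevance order at the cost of sometimes leaving part of the budget unused.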
51 changes: 42 additions & 9 deletions .drone.yml
@@ -118,6 +118,13 @@ trigger:
- promote
target:
- staging
paths:
include:
- chat-server/**/*
- chat-core/**/*
- chat-ui/**/*
branch:
- main

steps:
# Deploys docker image associated with staging build that triggered promotion
@@ -128,15 +135,12 @@ steps:
chart_version: 4.12.3
add_repos: [mongodb=https://10gen.github.io/helm-charts]
namespace: docs
release: chat-server
release: chat-server-staging
values: image.tag=git-${DRONE_COMMIT_SHA:0:7}-staging,image.repository=795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/${DRONE_REPO_NAME}-chat-server,ingress.enabled=true,ingress.hosts[0]=chat-server.docs.staging.corp.mongodb.com
values_files: ["chat-server/environments/staging.yml"]
api_server: https://api.staging.corp.mongodb.com
kubernetes_token:
from_secret: staging_kubernetes_token
when:
branch:
- main
---
depends_on: ["test-all"]
kind: pipeline
@@ -200,6 +204,8 @@ trigger:
- promote
target:
- staging
ref:
- refs/tags/chat-server-qa-*

steps:
# Deploys docker image associated with staging build that triggered promotion
@@ -216,9 +222,6 @@ steps:
api_server: https://api.staging.corp.mongodb.com
kubernetes_token:
from_secret: staging_kubernetes_token
when:
ref:
- refs/tags/chat-server-qa-*

---
depends_on: ["test-all"]
@@ -365,6 +368,12 @@ trigger:
- promote
target:
- staging
paths:
include:
- ingest/**/*
- chat-core/**/*
branch:
- main

steps:
# Deploys docker image associated with staging build that triggered promotion
@@ -382,8 +391,7 @@ steps:
api_server: https://api.staging.corp.mongodb.com
kubernetes_token:
from_secret: staging_kubernetes_token
branch:
- main

---
depends_on: ["test-all"]
kind: pipeline
@@ -463,3 +471,28 @@ steps:
api_server: https://api.prod.corp.mongodb.com
kubernetes_token:
from_secret: prod_kubernetes_token
# ---
# Chat UI
# ---
---
kind: pipeline
name: publish-chat-ui

trigger:
event:
- tag
ref:
include:
- refs/tags/chat-ui-v*

steps:
- name: npm
image: plugins/npm
settings:
folder: chat-ui
username:
from_secret: mongodb_chatbot_ui_npm_username
email:
from_secret: mongodb_chatbot_ui_npm_email
password:
from_secret: mongodb_chatbot_ui_npm_token
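The new `trigger` blocks gate pipelines on path globs (`chat-server/**/*`, `ingest/**/*`) and tag refs (`refs/tags/chat-server-qa-*`, `refs/tags/chat-ui-v*`). To check by hand which changed files or tags would match, here is a minimal glob-to-RegExp sketch. It assumes doublestar-style semantics (`**/` spans zero or more directories, `*` stays within one path segment), which approximates Drone's matching but is not guaranteed to be identical; `globToRegExp` and `matchesAny` are hypothetical helpers.

```typescript
/**
  Converts a glob into an anchored RegExp. Assumed semantics:
  - `**/` matches zero or more whole directories
  - `**` elsewhere matches any characters, including `/`
  - `*` matches any characters except `/`
  - `?` matches a single character except `/`
 */
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          source += "(?:.*/)?"; // `**/`: zero or more directories
          i += 2;
        } else {
          source += ".*"; // bare `**`: anything
          i += 1;
        }
      } else {
        source += "[^/]*"; // `*`: within one path segment
      }
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape regex metacharacters in literal text (e.g. `.` in versions).
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

function matchesAny(value: string, globs: string[]): boolean {
  return globs.some((glob) => globToRegExp(glob).test(value));
}

// Path includes from the chat-server staging promotion trigger.
const chatServerPaths = ["chat-server/**/*", "chat-core/**/*", "chat-ui/**/*"];
matchesAny("chat-core/src/Page.ts", chatServerPaths); // true
matchesAny("ingest/src/index.ts", chatServerPaths); // false
matchesAny("refs/tags/chat-server-qa-0.0.42", ["refs/tags/chat-server-qa-*"]); // true
```

Under these assumptions, a change confined to `ingest/` would not trigger the chat-server promotion, while a change to `chat-core/` triggers both the chat-server and ingest promotions.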
89 changes: 67 additions & 22 deletions README.md
@@ -6,32 +6,77 @@ Repo holding resources related to the MongoDB AI Chatbot. The Chatbot uses the M

### Staging

Changes are added to staging environment when you merge into `main` branch. Handled by the CI.
We run a staging server that uses the latest commit on the `main` branch. When
you merge new commits into `main`, a CI/CD pipeline automatically builds and
publishes the updated staging server and demo site.

### Production
### QA Server & Demo Site

This project uses `release-it` to create production releases.
We run a QA server that serves a specific build for testing before we release to
production.

#### `chat-server` and `ingest`
To publish to QA:

For `chat-server` and `ingest`, both of which are published to a MongoDB server environment,
production releases are triggered by creating a git tag prefaced with the package name (e.g. `chat-server-v{version-number}`).
1. Check out the `qa` branch and pull any upstream changes. Here, `upstream` is
the name of the `mongodb/docs-chatbot` remote repo.

```sh
git fetch upstream
git checkout qa
git pull upstream qa
```

2. Apply any commits you want to build to the branch. In many cases you'll just
build from the same commits as `main`. However, you might want to QA only a
subset of commits from `main`.

3. Add a tag to the latest commit on the `qa` branch using the following naming scheme: `chat-server-qa-<Build ID>`

```sh
git tag -a chat-server-qa-0.0.42
```

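4. Push the branch and the new tag to the upstream GitHub repo. The
`--follow-tags` flag also pushes annotated tags that point at the pushed commits.

```sh
git push --follow-tags upstream qa
```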

Once you've added the tag to the upstream repo, the Drone CI automatically
builds and deploys the branch to the QA server.

### Production Deployments

We use a tool called `release-it` to prepare production releases for the
`chat-server`, `ingest`, and `chat-ui` projects.

Production releases are triggered by creating a git tag prefixed with the
package name (e.g. `chat-server-v{version-number}`).

To create a new production release:

1. Pull latest code you want to release (probably `git pull upstream main`).
1. Checkout a new branch for your release. The branch should have the following naming convention:
`package-name@semver-for-release` (e.g `git checkout -b [email protected]`).
1. In the relevant package directory (e.g `chat-server`), run the command: `npm run release`
1. This will get the package ready for release, including creating a draft Github release.
The URL for the release draft is present in the output of CLI operation.
You can use this later.
1. Create a pull request for the branch. Get it reviewed using the standard review process.
1. Once the PR is approved and merged, publish the release draft corresponding to the changes in the PR.
You can find the release draft in the draft tag: <https://github.com/mongodb/docs-chatbot/releases>.
1. When the release is published, the Drone CI will pick up the corresponding git tag,
and trigger a deploy from it.

#### `chat-ui`

TBD
1. Pull latest code you want to release.

```sh
git pull upstream main
```

2. In the relevant package directory (e.g. `chat-server`), run the release
command. This gets the package ready for a new release.

```sh
npm run release
```

When prompted, create a draft GitHub release. The URL for the release draft
appears in the output of the CLI operation. Keep it for later.

3. Create a pull request for the branch. Get it reviewed using the standard
review process.

4. Once the PR is approved and merged, publish the draft release. You can find
the release draft, marked as a draft, on the releases page:
<https://github.com/mongodb/docs-chatbot/releases>.

When the release is published, the Drone CI picks up the corresponding git tag
and then automatically builds and deploys the branch to production.
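Production tags follow the `<package>-v<version>` convention described in the README, while QA tags use `chat-server-qa-<Build ID>`. The following minimal sketch shows how a script might tell them apart. `parseReleaseTag` is hypothetical, the package list is taken from the README, and the version pattern is a simplified semver assumption, not a rule enforced by `release-it` or Drone.

```typescript
// Packages the README says are released with `release-it`.
const RELEASABLE_PACKAGES = ["chat-server", "ingest", "chat-ui"] as const;
type ReleasablePackage = (typeof RELEASABLE_PACKAGES)[number];

interface ReleaseTag {
  packageName: ReleasablePackage;
  version: string;
}

/**
  Parses a production tag of the form `<package>-v<semver>`.
  Returns undefined for anything else, including QA tags such as
  `chat-server-qa-0.0.42`.
 */
function parseReleaseTag(tag: string): ReleaseTag | undefined {
  // Simplified semver: MAJOR.MINOR.PATCH with an optional pre-release suffix.
  const match = /^(.+)-v(\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?)$/.exec(tag);
  if (!match) return undefined;
  const [, packageName, version] = match;
  // Reject prefixes that are not one of the known packages.
  if (!(RELEASABLE_PACKAGES as readonly string[]).includes(packageName)) {
    return undefined;
  }
  return { packageName: packageName as ReleasablePackage, version };
}
```

For example, `parseReleaseTag("chat-ui-v1.2.3")` yields `{ packageName: "chat-ui", version: "1.2.3" }`, while `parseReleaseTag("chat-server-qa-0.0.42")` yields `undefined`.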
13 changes: 10 additions & 3 deletions chat-core/package.json
@@ -2,6 +2,13 @@
"name": "chat-core",
"version": "0.0.1",
"description": "Common elements used by the chatbot components.",
"author": "MongoDB, Inc.",
"license": "MIT",
"keywords": [],
"engines": {
"node": ">=18",
"npm": ">=8"
},
"main": "./build/index.js",
"scripts": {
"clean": "rm -rf build",
@@ -10,8 +17,6 @@
"test": "jest",
"docs": "npx typedoc --excludePrivate --exclude '**/*.test.ts' --out docs src"
},
"author": "MongoDB, Inc.",
"license": "MIT",
"devDependencies": {
"@babel/preset-typescript": "^7",
"@babel/types": "^7",
@@ -35,13 +40,15 @@
"typescript": "^5"
},
"dependencies": {
"@azure/openai": "^1.0.0-beta.2",
"@azure/openai": "^1.0.0-beta.3",
"common-tags": "^1",
"dotenv": "^16.3.1",
"exponential-backoff": "^3.1.1",
"front-matter": "^4.0.2",
"mongodb": "^5.6.0",
"openai": "^3",
"winston": "^3",
"yaml": "^2.3.1",
"zod": "^3.21.4"
}
}
1 change: 1 addition & 0 deletions chat-core/src/CoreEnvVars.ts
@@ -9,4 +9,5 @@ export const CORE_ENV_VARS = {
OPENAI_CHAT_COMPLETION_DEPLOYMENT: "",
OPENAI_CHAT_COMPLETION_MODEL_VERSION: "",
VECTOR_SEARCH_INDEX_NAME: "",
NODE_ENV: "",
};
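`NODE_ENV` joins the core environment variables, and the commit message notes that the demo site should no longer be served in the prod environment (#101). The sketch below shows one way such a list of required names could be validated and then used for that check. `readEnvVars` and `shouldServeDemoSite` are hypothetical helpers, and treating `"production"` as the prod value is an assumption; the diff does not show how `CORE_ENV_VARS` is actually consumed.

```typescript
/**
  Hypothetical helper: returns the value of each required variable, or throws
  listing every missing name. Empty strings count as unset, matching the
  empty-string placeholders in CORE_ENV_VARS.
 */
function readEnvVars<K extends string>(
  names: readonly K[],
  env: Record<string, string | undefined>
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const values = {} as Record<K, string>;
  for (const name of names) {
    values[name] = env[name] as string;
  }
  return values;
}

// Assumption: production deployments set NODE_ENV to "production".
function shouldServeDemoSite(nodeEnv: string): boolean {
  return nodeEnv !== "production";
}
```

Collecting every missing name before throwing surfaces all configuration gaps in a single failed deploy instead of one per attempt.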
16 changes: 12 additions & 4 deletions chat-core/src/DatabaseConnection.test.ts
@@ -51,7 +51,9 @@ describe("DatabaseConnection", () => {
body: "foo",
format: "md",
sourceName: "source1",
tags: [],
metadata: {
tags: [],
},
updated: new Date(),
url: "/x/y/z",
};
@@ -113,7 +115,9 @@ describe("DatabaseConnection", () => {
body: "foo",
format: "md",
sourceName: "source1",
tags: [],
metadata: {
tags: [],
},
updated: new Date(),
url: "/x/y/z",
};
@@ -153,7 +157,9 @@ describe("DatabaseConnection", () => {
body: "The Matrix (1999) comes out",
format: "md",
sourceName: "",
tags: [],
metadata: {
tags: [],
},
updated: new Date("1999-03-31"),
url: "matrix1",
},
@@ -162,7 +168,9 @@ describe("DatabaseConnection", () => {
body: "The Matrix: Reloaded (2003) comes out",
format: "md",
sourceName: "",
tags: [],
metadata: {
tags: [],
},
updated: new Date("2003-05-15"),
url: "matrix2",
},
4 changes: 3 additions & 1 deletion chat-core/src/DatabaseConnection.ts
@@ -32,7 +32,9 @@ export const makeDatabaseConnection = async ({
}: MakeDatabaseConnectionParams): Promise<
DatabaseConnection & PageStore & EmbeddedContentStore
> => {
const client = await new MongoClient(connectionUri).connect();
const client = await new MongoClient(connectionUri, {
serverSelectionTimeoutMS: 30000,
}).connect();
const db = client.db(databaseName);
const embeddedContentCollection =
db.collection<EmbeddedContent>("embedded_content");
7 changes: 4 additions & 3 deletions chat-core/src/EmbeddedContent.ts
@@ -12,7 +12,7 @@ export interface EmbeddedContent {
sourceName: string;

/**
The original text.
The text represented by the vector embedding.
*/
text: string;

@@ -32,9 +32,10 @@ export interface EmbeddedContent {
updated: Date;

/**
Arbitrary tags associated with the content.
Arbitrary metadata associated with the content. If the content text has
metadata in Front Matter format, this metadata should match that metadata.
*/
tags?: string[];
metadata?: { tags?: string[]; [k: string]: unknown };

/**
The order of the chunk if this content was chunked from a larger page.
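This commit moves tags from a top-level `tags` field into `metadata.tags` on `EmbeddedContent` (and on `Page`), and the test fixtures change from `tags: []` to `metadata: { tags: [] }` accordingly. The diff does not show how existing documents are reshaped, so the following is only a sketch of a transform that could do it. `migrateTagsToMetadata` and the trimmed-down interfaces are hypothetical; all other document fields are omitted.

```typescript
// Old shape of the field (other EmbeddedContent fields omitted).
interface LegacyTaggedContent {
  text: string;
  tags?: string[];
}

// New shape, mirroring `metadata?: { tags?: string[]; [k: string]: unknown }`.
interface MetadataContent {
  text: string;
  metadata?: { tags?: string[]; [k: string]: unknown };
}

/**
  Hypothetical migration helper: moves a top-level `tags` array into
  `metadata.tags` and leaves documents without tags otherwise unchanged.
 */
function migrateTagsToMetadata(doc: LegacyTaggedContent): MetadataContent {
  // Split off `tags` so it does not survive at the top level.
  const { tags, ...rest } = doc;
  return tags === undefined ? rest : { ...rest, metadata: { tags } };
}
```

Because `metadata` also allows arbitrary keys, front matter fields beyond tags could later be added alongside `tags` without another schema change.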
24 changes: 21 additions & 3 deletions chat-core/src/Page.ts
@@ -3,20 +3,38 @@
*/
export type Page = {
url: string;

/**
A human-readable title.
*/
title?: string;

/**
The text of the page.
*/
body: string;
format: "md" | "txt";

format: PageFormat;

/**
Data source name.
*/
sourceName: string;

/**
Arbitrary tags.
Arbitrary metadata for page.
*/
tags: string[];
metadata?: {
/**
Arbitrary tags.
*/
tags?: string[];
[k: string]: unknown;
};
};

export type PageFormat = "md" | "txt" | "openapi-yaml";

export type PageAction = "created" | "updated" | "deleted";

/**
2 changes: 2 additions & 0 deletions chat-core/src/index.ts
@@ -9,6 +9,8 @@ export * from "./integrations/mongodb";
export * from "./integrations/openai";
export * from "./services/logger";
export * from "./services/conversations";
export * from "./updateFrontMatter";
export * from "./removeFrontMatter";

// Everyone share the same mongodb driver version
export * from "mongodb";
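`chat-core` now exports `updateFrontMatter` and `removeFrontMatter`, and the commit message describes prepending front matter to chunks and stripping it before sending documents to the LLM. Their implementations are not part of this diff (the `front-matter` and `yaml` packages are added as dependencies), so the following dependency-free sketch only illustrates the idea. `prependFrontMatter` and `stripFrontMatter` are hypothetical names, and serializing values with `JSON.stringify` is a simplification of real YAML output that assumes plain identifier keys.

```typescript
/**
  Hypothetical helper: prepends a YAML front matter block to `text`.
  Values are serialized with JSON.stringify, which produces valid YAML
  flow scalars and sequences for strings, numbers, booleans, and arrays.
 */
function prependFrontMatter(
  text: string,
  metadata: Record<string, unknown>
): string {
  const lines = Object.entries(metadata)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  if (lines.length === 0) return text;
  return `---\n${lines.join("\n")}\n---\n\n${text}`;
}

/**
  Hypothetical helper: removes a leading front matter block delimited by
  `---` lines, plus the blank lines after it, leaving only the body.
 */
function stripFrontMatter(text: string): string {
  return text.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n+/, "");
}
```

Embedding metadata such as the page title and tags in each chunk lets that context influence the chunk's embedding, while stripping it before prompting keeps the LLM context focused on the body text.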
8 changes: 7 additions & 1 deletion chat-core/src/integrations/openai.ts
@@ -33,7 +33,13 @@ export class OpenAiChatClient implements OpenAiChatClientInterface {
this.deployment = deployment;
this.openAiClient = new OpenAIClient(
basePath,
new AzureKeyCredential(apiKey)
new AzureKeyCredential(apiKey),
{
retryOptions: {
maxRetries: 2,
maxRetryDelayInMs: 5000,
},
}
);
}

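The new `retryOptions` limit the Azure OpenAI client to two retries with at most 5000 ms between attempts, which bounds how long a flaky request can stall a conversation. The sketch below computes a capped exponential backoff schedule to show how such settings interact. It assumes a 1000 ms base delay that doubles on each retry; the actual Azure SDK retry policy may use different defaults and adds random jitter, so treat the numbers as illustrative. `backoffSchedule` is a hypothetical helper.

```typescript
/**
  Sketch of a capped exponential backoff schedule. Assumes the delay before
  retry n (1-based) is baseDelayMs * 2^(n - 1), capped at maxDelayMs.
  Real SDK retry policies typically also add random jitter, omitted here.
 */
function backoffSchedule(
  maxRetries: number,
  baseDelayMs: number,
  maxDelayMs: number
): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    delays.push(Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1)));
  }
  return delays;
}

// Values from the diff (maxRetries: 2, maxRetryDelayInMs: 5000) with an
// assumed 1000 ms base delay.
const schedule = backoffSchedule(2, 1000, 5000); // [1000, 2000]
const worstCaseWaitMs = schedule.reduce((sum, delay) => sum + delay, 0); // 3000
```

Under these assumptions the 5000 ms cap would only take effect from the fourth retry onward, so with two retries the retry count, not the cap, is what bounds the added latency.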