Commit 65bb368

new doc
1 parent c288403 commit 65bb368

3 files changed: +50 -0 lines changed

202309/20230928_01.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
## TimescaleDB Releases an Enhanced Vector Index Based on DiskANN

### Author
digoal

### Date
2023-09-28

### Tags
PostgreSQL , PolarDB , embedding , diskann

----

## Background
https://www.timescale.com/blog/how-we-made-postgresql-the-best-vector-database/

Introducing Timescale Vector, PostgreSQL++ for production AI applications. Timescale Vector enhances pgvector with faster search, higher recall, and more efficient time-based filtering, making PostgreSQL your new go-to vector database. Timescale Vector is available today in early access on Timescale’s cloud data platform. Keep reading to learn why and how we built it. Then take it out for a ride: try Timescale Vector for free today, with a 90-day extended trial.

https://www.microsoft.com/en-us/research/project/project-akupara-approximate-nearest-neighbor-search-for-large-scale-semantic-search/

https://github.com/Microsoft/DiskANN

Deep learning-based embeddings are widely used for “dense retrieval” in information retrieval, computer vision, and NLP, among other fields, owing to their ability to capture diverse types of semantic information. This paradigm constructs embeddings so that semantically similar items are closer together in a high-dimensional metric space. The first step to enabling search and recommendation with such embeddings is to index the embeddings of the corpus and support approximate nearest-neighbor search (ANNS), a.k.a. vector search, for query embeddings. While ANNS is a fundamental problem that has been studied for decades, existing algorithms suffer from two main drawbacks: either their search accuracy is low, which degrades the quality of downstream results, or their memory (DRAM) footprint is enormous, making it hard to serve them at web scale.
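
To make "search accuracy" concrete, here is a minimal Python sketch of how recall is commonly measured for approximate nearest-neighbor search: compare the approximate top-k against an exact brute-force top-k. Everything in it is an assumption for illustration only. The helper names `exact_top_k` and `recall_at_k`, the 16-dimensional random data, and the "search a random half of the corpus" stand-in for an index are all hypothetical, and none of it reflects how DiskANN, pgvector, or Timescale Vector are actually implemented.

```python
import math
import random


def exact_top_k(query, corpus, k):
    """Return indices of the k corpus vectors closest to query (Euclidean)."""
    order = sorted(range(len(corpus)), key=lambda i: math.dist(query, corpus[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate result found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
# Hypothetical toy corpus: 1000 random 16-dimensional vectors.
corpus = [[random.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
query = [random.gauss(0, 1) for _ in range(16)]

# Ground truth: scan every vector (exact, but too slow at web scale).
exact = exact_top_k(query, corpus, k=10)

# Crude stand-in for an ANN index (an assumption, not a real algorithm):
# look at only a random half of the corpus, so some true neighbors are missed.
sample = random.sample(range(len(corpus)), 500)
approx = sorted(sample, key=lambda i: math.dist(query, corpus[i]))[:10]

print("recall@10:", recall_at_k(approx, exact))
```

A real ANN index trades a little recall for far fewer distance computations than the full scan; the recall figure is what a claim such as "95% search accuracy" usually refers to.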

In this project, we are designing algorithms to address the challenges of scaling ANNS for web and enterprise search and recommendation systems. Our goal is to build systems that serve trillions of points in a streaming setting cost-effectively. Below is a summary of the associated research directions:

DiskANN: an ANNS algorithm that achieves both high accuracy and a low DRAM footprint by making suitable use of auxiliary SSD storage, which is significantly more cost-effective than DRAM. Using DiskANN, we can index 5-10X more points per machine than state-of-the-art DRAM-based solutions: for example, DiskANN can index up to a billion vectors while achieving 95% search accuracy with 5 ms latencies, whereas existing DRAM-based algorithms peak at 100-200M points for similar latency and accuracy.
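
The numbers in that paragraph can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below is only illustrative: the vector dimension (128) and the 4-byte float32 component size are assumptions that the source does not state, the helper `raw_vector_bytes` is hypothetical, and the estimate covers raw vector data only, not graph edges or any compression that a real index would add or apply.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_component=4):
    """Bytes needed to hold the uncompressed vectors alone (no index overhead)."""
    return num_vectors * dims * bytes_per_component


DIMS = 128                 # assumed dimension, not given in the source
BILLION = 1_000_000_000    # "up to a billion vectors" from the text

# Raw float32 data for one billion vectors, in decimal gigabytes.
total_gb = raw_vector_bytes(BILLION, DIMS) / 1e9
print(f"raw vectors for 1B points at {DIMS} dims: {total_gb:.0f} GB")

# The text says DRAM-based algorithms peak at 100-200M points per machine.
# Dividing one billion by those figures reproduces the quoted 5-10X range.
ratios = [BILLION / dram_points for dram_points in (200_000_000, 100_000_000)]
print("points-per-machine ratio:", ratios)
```

Under these assumptions the raw vectors alone are far larger than the DRAM of a typical single server, which is the motivation for keeping most of the data on SSD.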
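
It is claimed to easily support vectors at the billion scale, with an index as small as one-tenth the size of a pgvector HNSW index, performance slightly better than pgvector HNSW, and build times slightly faster than pgvector. For now it can only be tried on the Timescale cloud version.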

#### [What features would you like PostgreSQL | open-source PolarDB to add?](https://github.com/digoal/blog/issues/76 "269ac3d1c492e938c0191101c7238216")

#### [PolarDB: cloud-native distributed open-source database](https://github.com/ApsaraDB "57258f76c37864c6e6d23383d05714ea")

#### [PolarDB learning map: boot camps, training and certification, online interactive labs, solutions, kernel development open courses, ecosystem partnerships, write a review and win prizes](https://www.aliyun.com/database/openpolardb/activity "8642f60e04ed0c814bf9cb9677976bd4")

#### [PostgreSQL solutions collection](../201706/20170601_02.md "40cff096e9ed7122c512b35d8561d9c8")

#### [德哥 (Dege) / digoal's github - Public service is a lifelong commitment.](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")

![digoal's wechat](../pic/digoal_weixin.jpg "f7ad92eeba24523fd47a6e1a0e691b59")

202309/readme.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

### Article List
----
+ ##### 20230928_01.md ["TimescaleDB Releases an Enhanced Vector Index Based on DiskANN"](20230928_01.md)
##### 20230927_01.md ["DuckDB Releases New Version 0.9.0"](20230927_01.md)
##### 20230926_01.md ["PostgreSQL 17 preview - Add GUC: event_triggers . for temporarily disabling event triggers"](20230926_01.md)
##### 20230924_01.md ["Using LFS to Store Large Git Files and Downloading Large Files Stored in LFS"](20230924_01.md)

README.md

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ digoal's|PostgreSQL|Articles|Categories

### All documents are listed below
----
+ ##### 202309/20230928_01.md ["TimescaleDB Releases an Enhanced Vector Index Based on DiskANN"](202309/20230928_01.md)
##### 202309/20230927_01.md ["DuckDB Releases New Version 0.9.0"](202309/20230927_01.md)
##### 202309/20230926_01.md ["PostgreSQL 17 preview - Add GUC: event_triggers . for temporarily disabling event triggers"](202309/20230926_01.md)
##### 202309/20230924_01.md ["Using LFS to Store Large Git Files and Downloading Large Files Stored in LFS"](202309/20230924_01.md)
