This repository was archived by the owner on Jun 6, 2024. It is now read-only.

Commit 0ef4e25

Merge pull request #21 from cmu-db/final-presentation
Final presentation
2 parents: 3e51cab + 7d9a9b1

21 files changed (+1171 -1 lines)

proposal/15721-project.json (+21 lines)

{
  "info": {
    "title": "Eggstrain & Async Buffer Pool Manager",
    "github": "https://github.com/cmu-db/15721-s24-ee1",
    "description": "An asynchronous vectorized push-based execution and asynchronous buffer pool manager written in Rust.",
    "students": [
      {
        "name": "Kyle Booker",
        "url": "https://www.linkedin.com/in/ktbooker/"
      },
      {
        "name": "Sarvesh Tandon",
        "url": "https://www.linkedin.com/in/sarvesh-tandon/"
      },
      {
        "name": "Connor Tsui",
        "url": "https://www.linkedin.com/in/connortsui/"
      }
    ]
  }
}

proposal/final-presentation.html (+247 lines)

Large diffs are not rendered by default.

proposal/final-presentation.md (+289 lines)

---
marp: true
theme: default
#class: invert # Remove this line for light mode
paginate: true
---

# Eggstrain

Vectorized Push-Based-Inspired Execution Engine
Asynchronous Buffer Pool Manager

<br>

## **Authors: Connor, Sarvesh, Kyle**

---

# Original Proposed Goals

- 75%: First 7 operators working + integration with other components
- 100%: All operators listed above working
- 125%: TPC-H benchmark working

---

# Design Goals

- Robustness
- Modularity
- Extensibility
- Forward Compatibility

We made heavy use of `tokio` and `rayon` in our implementation.
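
As a sketch of how the two fit together (illustrative only, with hypothetical names, not code from `eggstrain` itself): `rayon` handles CPU-bound work on its own thread pool while `tokio` keeps driving asynchronous tasks.

```rust
use tokio::sync::oneshot;

/// Hypothetical CPU-bound kernel: sum a column of values in parallel with rayon.
fn sum_column(values: &[i64]) -> i64 {
    use rayon::prelude::*;
    values.par_iter().sum()
}

/// Run the CPU-bound work on rayon without blocking the tokio worker threads.
async fn sum_column_async(values: Vec<i64>) -> i64 {
    let (tx, rx) = oneshot::channel();
    rayon::spawn(move || {
        // Executes on rayon's thread pool.
        let _ = tx.send(sum_column(&values));
    });
    // The tokio task yields here instead of blocking.
    rx.await.expect("rayon task panicked")
}

#[tokio::main]
async fn main() {
    let total = sum_column_async((0..1_000_000).collect()).await;
    println!("total = {total}");
}
```
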
---

# Refresher on Architecture

![bg right:60% 100%](./images/architecture.drawio.svg)

---

# Refresher on Operators

- `TableScan`
- `Filter`
- `Projection`
- `HashAggregation`
- `HashJoin` (`HashProbe` + `HashBuild`)
- `OrderBy`
- `TopN`
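
For illustration, a push-based operator interface over `tokio` channels might look like the sketch below. The `Batch` type, trait name, and `NonEmptyFilter` operator are hypothetical stand-ins (a real engine would push Arrow record batches), not the engine's actual definitions.

```rust
use tokio::sync::mpsc::{Receiver, Sender};

/// Stand-in for a columnar batch of tuples (e.g. an Arrow `RecordBatch`).
pub struct Batch {
    pub columns: Vec<Vec<i64>>,
}

/// A hypothetical push-based operator interface: child operators push batches
/// in, and this operator pushes its results out as soon as they are ready.
#[async_trait::async_trait]
pub trait UnaryOperator: Send + Sync {
    async fn execute(&self, input: Receiver<Batch>, output: Sender<Batch>);
}

/// A trivial `Filter`-like operator: it only forwards non-empty batches.
pub struct NonEmptyFilter;

#[async_trait::async_trait]
impl UnaryOperator for NonEmptyFilter {
    async fn execute(&self, mut input: Receiver<Batch>, output: Sender<Batch>) {
        while let Some(batch) = input.recv().await {
            if batch.columns.iter().any(|c| !c.is_empty()) {
                // Push the batch onward; stop early if the parent hung up.
                if output.send(batch).await.is_err() {
                    break;
                }
            }
        }
        // Dropping `output` here closes the channel, signaling end-of-stream.
    }
}
```
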
---

# Example Operator Workflow

![bg right:70% 80%](./images/hashjoin.svg)

---

# Progress Towards Goals

- 100%: All operators implemented, excluding `HashJoin`
- 125%: TPC-H benchmark working for Q1

---

# Execution Engine Benchmarks

Hardware:

- M1 Pro, 8 cores, 16GB RAM

---

![bg 90%](./images/csvreader.png)

---

# Correctness Testing and Code Quality Assessment

We tested correctness by comparing our results against DataFusion's results for the same queries.

Our code quality is high with respect to documentation, integration tests, and code review.

However, we lack unit tests for individual operators; instead, we tested operators integrated inside full queries.
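
A correctness check in this style can run the reference query through DataFusion and compare the formatted output. The table name and CSV path below are hypothetical; this is a sketch, not our actual test harness.

```rust
use datafusion::prelude::{CsvReadOptions, SessionContext};

/// Run a query through DataFusion and return its results as formatted text.
/// (The table name and CSV path are hypothetical.)
async fn datafusion_reference(sql: &str) -> datafusion::error::Result<String> {
    let ctx = SessionContext::new();
    ctx.register_csv("lineitem", "data/lineitem.csv", CsvReadOptions::new())
        .await?;
    let batches = ctx.sql(sql).await?.collect().await?;
    // Pretty-print the record batches so they can be compared as strings.
    Ok(datafusion::arrow::util::pretty::pretty_format_batches(&batches)?.to_string())
}

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let expected = datafusion_reference("SELECT COUNT(*) FROM lineitem").await?;
    // In a real test, the engine's own output would be rendered the same way
    // and asserted equal to `expected`.
    println!("{expected}");
    Ok(())
}
```
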
---

# Problem: In Memory?

We found that we needed to spill data to disk to handle large queries.

However, to take advantage of our asynchronous architecture, we needed to implement an **asynchronous buffer pool manager**.

---

# Recap: Buffer Pool Manager

A buffer pool manager synchronizes data between volatile memory and persistent storage.

* In charge of bringing data from storage into memory in the form of pages
* In charge of synchronizing reads and writes to the memory-local page data
* In charge of writing data back out to disk so it stays synchronized
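
As a toy sketch of the idea (an in-memory map standing in for the frame pool, with comments marking where real disk I/O would be awaited; the types and method names are hypothetical):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock;

pub const PAGE_SIZE: usize = 4096;
pub type Page = [u8; PAGE_SIZE];

/// Logical identifier of a page on disk (hypothetical).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct PageId(pub u64);

/// A toy stand-in for the buffer pool: an in-memory map plays the role of the
/// frames, and the `async` methods mark where real disk I/O would be awaited.
pub struct BufferPool {
    frames: RwLock<HashMap<PageId, Page>>,
}

impl BufferPool {
    pub fn new() -> Arc<Self> {
        Arc::new(Self { frames: RwLock::new(HashMap::new()) })
    }

    /// Bring a page into memory. A real implementation would check for a
    /// cached frame and otherwise await an asynchronous read from storage.
    pub async fn read_page(&self, id: PageId) -> Page {
        let frames = self.frames.read().await;
        frames.get(&id).copied().unwrap_or([0u8; PAGE_SIZE])
    }

    /// Update a page. A real implementation would mark the frame dirty and
    /// eventually await an asynchronous write back to storage.
    pub async fn write_page(&self, id: PageId, data: Page) {
        self.frames.write().await.insert(id, data);
    }
}

#[tokio::main]
async fn main() {
    let pool = BufferPool::new();
    pool.write_page(PageId(0), [7u8; PAGE_SIZE]).await;
    let page = pool.read_page(PageId(0)).await;
    assert_eq!(page[0], 7);
}
```
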
---

# Traditional Buffer Pool Manager

![bg right:50% 100%](images/traditional_bpm.png)

Traditional BPMs will use a global hash table that maps page IDs to memory frames.

* Source: _LeanStore: In-Memory Data Management Beyond Main Memory (2018)_
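
In simplified form, the traditional scheme is a single shared map that every page lookup must lock, which is exactly the global contention point later slides try to remove (an illustrative sketch, not any particular system's code):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

type PageId = u64;
type FrameId = usize;

/// The classic design: one global page table that every thread contends on.
struct TraditionalBufferPool {
    page_table: Mutex<HashMap<PageId, FrameId>>,
    frames: Vec<Mutex<[u8; 4096]>>,
}

impl TraditionalBufferPool {
    /// Every page access starts by locking the shared page table,
    /// making it a global point of contention.
    fn lookup(&self, page_id: PageId) -> Option<MutexGuard<'_, [u8; 4096]>> {
        let table = self.page_table.lock().unwrap();
        let frame_id = *table.get(&page_id)?;
        Some(self.frames[frame_id].lock().unwrap())
    }
}
```
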
---

# Recap: Blocking I/O

Additionally, traditional buffer pool managers will use blocking reads and writes to send data between memory and persistent storage.

Blocking I/O is heavily reliant on the Operating System.

> The DBMS can almost always manage memory better than the OS

* Source: 15-445 Lecture 6 on Buffer Pools

---

# Recap: I/O System Calls

What happens when we issue a `pread()` or `pwrite()` call?

* We stop what we're doing
* We transfer control to the kernel
* _We are blocked waiting for the kernel to finish and transfer control back_
* _A read from disk is *probably* scheduled somewhere_
* _Something gets copied into the kernel_
* _The kernel copies that something into userspace_
* We come back and resume execution
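
In Rust, that blocking path looks like this: `read_at` bottoms out in a `pread()` system call, and the calling thread is parked until the kernel has copied the data into the user-space buffer (the file path is hypothetical):

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // provides `read_at`, which wraps `pread()`

fn main() -> io::Result<()> {
    let file = File::open("data/table.db")?; // hypothetical file
    let mut page = [0u8; 4096];

    // The thread blocks inside this call until the kernel has scheduled the
    // read, copied the data into kernel space, and copied it out to `page`.
    let n = file.read_at(&mut page, 0)?;
    println!("read {n} bytes while doing nothing else");
    Ok(())
}
```
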
---

# Blocking I/O for Buffer Pool Managers

Blocking I/O is fine for most situations, but might be a bottleneck for a DBMS's Buffer Pool Manager.

- Typically, optimizations are implemented to offset the cost of blocking:
  - Pre-fetching
  - Scan-sharing
  - Background writing
  - `O_DIRECT`
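
For example, opening a file with `O_DIRECT` (via the `libc` crate) bypasses the OS page cache so the DBMS's own buffer pool is the only cache; a minimal sketch with a hypothetical file path:

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::unix::fs::OpenOptionsExt;

fn main() -> io::Result<()> {
    // O_DIRECT bypasses the kernel page cache; the buffer pool becomes the
    // only cache, which is exactly what the DBMS wants to control.
    // (Buffers must then be properly aligned and sized for the device.)
    let _file = OpenOptions::new()
        .read(true)
        .write(true)
        .custom_flags(libc::O_DIRECT)
        .open("data/table.db")?; // hypothetical file
    Ok(())
}
```
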
---

# Non-blocking I/O

What if we could do I/O _without_ blocking? There exist a few ways to do this:

- `libaio`
- `io_uring`
- SPDK
- All of these allow for _asynchronous I/O_

---

# `io_uring`

![bg right:50% 90%](images/linux_io.png)

This Buffer Pool Manager is going to be built with asynchronous I/O using `io_uring`.

* Source: _What Modern NVMe Storage Can Do, And How To Exploit It... (2023)_
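
As one concrete way to drive `io_uring` from Rust, the `tokio-uring` crate issues the read below without blocking the thread (a sketch with a hypothetical file path; the actual buffer pool may use `io_uring` at a lower level):

```rust
use tokio_uring::fs::File;

fn main() -> std::io::Result<()> {
    // tokio-uring runs a current-thread runtime backed by an io_uring instance.
    tokio_uring::start(async {
        let file = File::open("data/table.db").await?; // hypothetical file

        // Ownership of the buffer is handed over for the duration of the
        // operation and returned together with the result.
        let buf = vec![0u8; 4096];
        let (res, _buf) = file.read_at(buf, 0).await;
        let n = res?;

        println!("read {n} bytes without blocking the thread");
        Ok(())
    })
}
```
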
---

# Asynchronous I/O

Asynchronous I/O really only works when the programs running on top of it implement _cooperative multitasking_.

* Normally, the kernel gets to decide which thread gets to run
* Cooperative multitasking allows the program to decide which task gets to run
* Context switching between tasks is a _much more_ lightweight maneuver
* If one task is waiting for I/O, we can cheaply switch to a different task!
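
A toy `tokio` example: while the first task is parked awaiting (simulated) I/O, the runtime cheaply switches to the second task on the same OS thread:

```rust
use std::time::Duration;

// A task that "waits for I/O" (simulated with a timer): awaiting yields
// control back to the runtime instead of blocking the thread.
async fn io_bound_task() -> &'static str {
    tokio::time::sleep(Duration::from_millis(100)).await;
    "io task done"
}

// Another task that can make progress while the first one is waiting.
async fn other_task() -> &'static str {
    for _ in 0..3 {
        // Explicitly yield so the scheduler can interleave tasks.
        tokio::task::yield_now().await;
    }
    "other task done"
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Both tasks run cooperatively on a single OS thread.
    let (a, b) = tokio::join!(io_bound_task(), other_task());
    println!("{a}, {b}");
}
```
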
---

# Eggstrain

The key thing here is that our Execution Engine `eggstrain` fully embraces asynchronous execution.

* Rust has first-class support for asynchronous programs
* Using `async` libraries is almost as simple as plug-and-play
* The `tokio` crate is an easy runtime to get set up
* We can easily create a buffer pool manager in the form of a Rust library crate

---

# Goals

The goal of this system is to _fully exploit parallelism_.

* NVMe drives have gotten really, really fast
* Blocking I/O simply cannot match the full throughput of an NVMe drive
* They are _completely_ bottlenecked by today's software
* If we can fully exploit parallelism in software _and_ hardware...
* **We can actually get close to matching the speed of in-memory systems, _while using persistent storage_**

---

![bg 60%](images/modern_storage.png)

---

# Proposed Design

The next slide has a proposed design for a fully asynchronous buffer pool manager. The full (somewhat incomplete) writeup can be found [here](https://github.com/Connortsui20/async-bpm).

- Heavily inspired by LeanStore
  - Eliminates the global page table and uses tagged pointers to data (sketched below)
- Even more inspired by this paper:
  - _What Modern NVMe Storage Can Do, And How To Exploit It: High-Performance I/O for High-Performance Storage Engines (2023)_
  - Gabriel Haas and Viktor Leis
- The goal is to _eliminate as many sources of global contention as possible_
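
A rough sketch of what a LeanStore-style tagged pointer ("swip") could look like: each parent page stores a reference that says whether the child page is resident in memory or only on disk, so no global page table is needed to find it. This uses a simplified enum rather than the bit-packed pointer a real implementation (or the actual `async-bpm` design) would use:

```rust
use std::sync::Arc;

pub struct Frame {/* the in-memory page data, latches, etc. */}

/// A LeanStore-style "swip": a tagged reference stored *inside* parent pages,
/// replacing the global page-id -> frame hash table.
pub enum Swip {
    /// The page is resident: the reference points directly at its frame.
    Hot(Arc<Frame>),
    /// The page has been unswizzled and lives only on disk at this page ID.
    Cold(u64),
}

impl Swip {
    /// Resolving a hot swip is just a pointer dereference; only a cold swip
    /// forces us to go to the buffer pool (and possibly to disk).
    pub fn is_resident(&self) -> bool {
        matches!(self, Swip::Hot(_))
    }
}
```
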
---

![bg 90%](images/bpm_design.png)

---

# BPM Benchmarks

Hardware:

* Cray/Appro GB512X - 32 Threads Xeon E5-2670 @ 2.60GHz, 64 GiB DDR3 RAM, 1x 240GB SSD, Gigabit Ethernet, QLogic QDR Infiniband
* We will benchmark against RocksDB as a buffer pool manager

---

![bg 90%](./images/zip1.1dist.png)

---

![bg 90%](./images/20w80r.png)

<!--
zipfian distribution, alpha = 1.01 -->

---

![bg 90%](./images/80w20r.png)

---

![bg 90%](./images/uniform20w80r.png)

---

![bg 90%](./images/uniform80w20r.png)

---

![bg 90%](./images/uniform5050.png)

<!-- zipfian distribution, alpha = 1.01 -->

<!-- ---

![bg 90%](./images/zip1.1.png)

zipfian distribution, alpha = 1.1 -->

<!-- ---

![bg 90%](./images/zip1.2.png)
zipfian distribution, alpha = 1.2 -->

---

# Future Work

- Asynchronous BPM ergonomics and API
- Proper `io_uring` polling and batch evictions
- Shared user/kernel buffers and file descriptors (avoiding `memcpy`)
- Multiple NVMe SSD support (Software-implemented RAID 0)
- Optimistic hybrid latches

proposal/final-presentation.pdf (1.04 MB)

Binary file not shown.

0 commit comments
