|
| 1 | +## 聊一聊双十一背后的技术 - 不一样的秒杀技术, 裸秒 |
| 2 | + |
| 3 | +### 作者 |
| 4 | +digoal |
| 5 | + |
| 6 | +### 日期 |
| 7 | +2016-11-17 |
| 8 | + |
| 9 | +### 标签 |
| 10 | +PostgreSQL , 秒杀 , 裸秒 , ad lock |
| 11 | + |
| 12 | +---- |
| 13 | + |
| 14 | +### 双十一背后的技术系列文章 |
| 15 | +[《聊一聊双十一背后的技术 - 物流, 动态路径规划》](https://yq.aliyun.com/articles/57857) |
| 16 | + |
| 17 | +[《聊一聊双十一背后的技术 - 分词和搜索》](https://yq.aliyun.com/articles/64240) |
| 18 | + |
| 19 | +[《聊一聊双十一背后的技术 - 不一样的秒杀技术, 裸秒》](https://yq.aliyun.com/articles/64351) |
| 20 | + |
| 21 | +## 背景 |
| 22 | +秒杀在商品交易中是一个永恒的话题,从双十一,到一票难求,比的仅仅是手快吗? |
| 23 | + |
| 24 | +其实对于交易平台来说,面对的不仅仅是人肉,还有很多脚本,外挂自动化的抢购系统,压力可想而知。 |
| 25 | + |
| 26 | +秒杀的优化手段很多,就拿数据库来说,有用排队机制的,有用异步消息的,有用交易合并的。 |
| 27 | + |
| 28 | +今天我要给大家介绍一种更极端的秒杀应对方法,裸秒。 |
| 29 | + |
| 30 | +(其实我很久以前就写过类似的文章,趁双十一跟大伙再练练) |
| 31 | + |
| 32 | +目前可能只有PostgreSQL可以做到裸秒,也即是说,来吧,强奸我吧,一起上。 有点淫荡,但是确实就是这么暴力。 |
| 33 | + |
| 34 | +PostgreSQL提供了一种ad lock,可以让用户尽情的释放激情,以一台32核64线程的机器为例,每秒可以获取、探测约130万次的ad lock。 |
| 35 | + |
| 36 | +试想一下,对单条记录的秒杀操作,达到了单机100万/s的处理能力后,秒杀算什么?100台机器就能处理1亿/s的秒杀请求,不行我的小心脏受不了了,下面听我娓娓道来。 |
| 37 | + |
| 38 | +## 秒杀场景简介 |
| 39 | +虽然秒杀已经很普遍了,但是出于文章的完整性,还是简单介绍一下秒杀的业务背景。 |
| 40 | + |
| 41 | +例如,Iphone的1元秒杀,如果我只放出1台Iphone,我们把它看成一条记录,秒杀开始后,谁先抢到(更新这条记录的锁),谁就算秒杀成功。 |
| 42 | + |
| 43 | +对数据库来说,秒杀瓶颈在于并发的对同一条记录的多次更新请求,只有一个或者少量请求是成功的,其他请求是以失败或更新不到记录而告终。 |
| 44 | + |
| 45 | +例如有100台IPHONE参与秒杀,并发来抢的用户有100万,对于数据库来说,最小粒度的为行锁,当有一个用户在更新这条记录时,其他的999999个用户是在等待中度过的,以此类推。 |
| 46 | + |
| 47 | +除了那100个幸运儿,其他的用户的等待都是无谓的,甚至它们不应该到数据库中来浪费资源。 |
| 48 | + |
| 49 | +传统的做法,使用一个标记位来表示这条记录是否已经被更新,或者记录更新的次数(几台Iphone)。 |
| 50 | + |
| 51 | +``` |
| 52 | +update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; -- 假设可以秒杀5台 |
| 53 | +``` |
| 54 | + |
| 55 | +这种方法的弊端: |
| 56 | + |
| 57 | +获得锁的用户在处理这条记录时,可能成功,也可能失败,或者可能需要很长时间,(例如数据库响应慢)在它结束事务前,其他会话只能等着。 |
| 58 | + |
| 59 | +等待是非常不科学的,因为对于没有获得锁的用户,等待是在浪费时间。 |
| 60 | + |
| 61 | +## 常用的秒杀优化手段 |
| 62 | +1\. 一般的优化处理方法是先使用for update nowait的方式来避免等待,即如果无法即可获得锁,那么就不等待。 |
| 63 | + |
| 64 | +``` |
| 65 | +begin; |
| 66 | +select 1 from tbl where id=pk for update nowait; -- 如果用户无法即刻获得锁,则返回错误。从而这个事务回滚。 |
| 67 | +update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; |
| 68 | +end; |
| 69 | +``` |
| 70 | + |
| 71 | +这种方法可以减少用户的等待时间,因为无法即刻获得锁后就直接返回了。 |
| 72 | + |
| 73 | +2\. 合并请求,即将多个更新合并到一个更新的请求,这种做法需要修改内核,同时会破坏ACID,因为如果合并后的请求失败了,会导致合并中的所有人的请求失败。(与分组提交不一样,分组提交是不会破坏ACID的)。 |
| 74 | + |
| 75 | +那么接下来我们看看AD LOCK。 |
| 76 | + |
| 77 | +## 什么是ad lock |
| 78 | +手册中的说明,AD LOCK是一种面向用户的轻量级锁,锁的目标是一个整型,分为事务级和会话级的锁,以及共享和排他锁。 |
| 79 | + |
| 80 | +在单个DB内,只要锁的整型值不一样,就可以获得锁,如果值一样,可以使用TRY来加锁,没有获得则立即返回FALSE。 |
| 81 | + |
| 82 | +https://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS |
| 83 | + |
| 84 | +Table 9-87. Advisory Lock Functions |
| 85 | + |
| 86 | +Name| Return Type| Description |
| 87 | +---|---|--- |
| 88 | +pg_advisory_lock(key bigint)| void| Obtain exclusive session level advisory lock |
| 89 | +pg_advisory_lock(key1 int, key2 int)| void| Obtain exclusive session level advisory lock |
| 90 | +pg_advisory_lock_shared(key bigint)| void| Obtain shared session level advisory lock |
| 91 | +pg_advisory_lock_shared(key1 int, key2 int)| void| Obtain shared session level advisory lock |
| 92 | +pg_advisory_unlock(key bigint)| boolean| Release an exclusive session level advisory lock |
| 93 | +pg_advisory_unlock(key1 int, key2 int)| boolean| Release an exclusive session level advisory lock |
| 94 | +pg_advisory_unlock_all()| void Release| all session level advisory locks held by the current session |
| 95 | +pg_advisory_unlock_shared(key bigint)| boolean| Release a shared session level advisory lock |
| 96 | +pg_advisory_unlock_shared(key1 int, key2 int)| boolean| Release a shared session level advisory lock |
| 97 | +pg_advisory_xact_lock(key bigint)| void| Obtain exclusive transaction level advisory lock |
| 98 | +pg_advisory_xact_lock(key1 int, key2 int)| void| Obtain exclusive transaction level advisory lock |
| 99 | +pg_advisory_xact_lock_shared(key bigint)| void| Obtain shared transaction level advisory lock |
| 100 | +pg_advisory_xact_lock_shared(key1 int, key2 int)| void| Obtain shared transaction level advisory lock |
| 101 | +pg_try_advisory_lock(key bigint)| boolean| Obtain exclusive session level advisory lock if available |
| 102 | +pg_try_advisory_lock(key1 int, key2 int)| boolean| Obtain exclusive session level advisory lock if available |
| 103 | +pg_try_advisory_lock_shared(key bigint)| boolean| Obtain shared session level advisory lock if available |
| 104 | +pg_try_advisory_lock_shared(key1 int, key2 int)| boolean| Obtain shared session level advisory lock if available |
| 105 | +pg_try_advisory_xact_lock(key bigint)| boolean| Obtain exclusive transaction level advisory lock if available |
| 106 | +pg_try_advisory_xact_lock(key1 int, key2 int)| boolean| Obtain exclusive transaction level advisory lock if available |
| 107 | +pg_try_advisory_xact_lock_shared(key bigint)| boolean| Obtain shared transaction level advisory lock if available |
| 108 | +pg_try_advisory_xact_lock_shared(key1 int, key2 int)| boolean| Obtain shared transaction level advisory lock if available |
| 109 | + |
| 110 | +通常数据库支持的最小粒度的锁(指开放给用户的)是行锁,行锁相比LWLOCK,SPINLOCK等是非常重的,所以传统的行锁在秒杀中会成为非常大的瓶颈,包括锁的等待。 |
| 111 | + |
| 112 | +## ad lock的用途 |
| 113 | +ad lock的用途,除了我接下来要说的秒杀,其实还有很多用途,例如 |
| 114 | + |
| 115 | +并发的安全性检查, |
| 116 | + |
| 117 | +递归调用中用于UPSERT的场景, |
| 118 | + |
| 119 | +业务逻辑设计中用来确保原子操作等。 |
| 120 | + |
| 121 | +## ad lock的性能指标 |
| 122 | +因为AD LOCK很轻量化,不需要访问数据,不需要执行冗长的代码,所以很高效。 |
| 123 | + |
| 124 | +32核64线程机器测试可以达到131万次/s的锁请求。 |
| 125 | + |
| 126 | +``` |
| 127 | +vi test.sql |
| 128 | +\set id random(1,100000000) |
| 129 | +select pg_try_advisory_xact_lock(:id); |
| 130 | +
|
| 131 | +pgbench -M prepared -n -r -P 1 -f ./test.sql -c 96 -j 96 -T 100 |
| 132 | +
|
| 133 | +transaction type: ./test.sql |
| 134 | +scaling factor: 1 |
| 135 | +query mode: prepared |
| 136 | +number of clients: 96 |
| 137 | +number of threads: 96 |
| 138 | +duration: 100 s |
| 139 | +number of transactions actually processed: 131516823 |
| 140 | +latency average = 0.072 ms |
| 141 | +latency stddev = 0.070 ms |
| 142 | +tps = 1314529.211060 (including connections establishing) |
| 143 | +tps = 1315395.309707 (excluding connections establishing) |
| 144 | +script statistics: |
| 145 | + - statement latencies in milliseconds: |
| 146 | + 0.001 \set id random(1,100000000) |
| 147 | + 0.074 select pg_try_advisory_xact_lock(:id); |
| 148 | +``` |
| 149 | + |
| 150 | +## ad lock用于秒杀的例子 |
| 151 | +在数据库中,商品通常有唯一ID,我们可以对这个ID加锁,(当然,如果对不同的表这个ID有重叠的可能,我们可以加偏移量或者其他的手段来达到无冲突)。 |
| 152 | + |
| 153 | +加锁成功才会去对行加锁,执行更新,这样就能规避掉无效的行锁等待,以及冗长的查询代码。 |
| 154 | + |
| 155 | +使用 AD LOCK 对单条记录的并发更新处理QPS可以达到39.1万/s,被秒杀的商品很快就会变成售罄状态,不会再浪费数据库的资源。 |
| 156 | + |
| 157 | +``` |
| 158 | +create table test(id int primary key, crt_time timestamp); |
| 159 | +insert into test values (1); |
| 160 | +``` |
| 161 | + |
| 162 | +``` |
| 163 | +vi test.sql |
| 164 | +update test set crt_time=now() where id=1 and pg_try_advisory_xact_lock(1); |
| 165 | +
|
| 166 | +pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 |
| 167 | +
|
| 168 | +transaction type: ./test.sql |
| 169 | +scaling factor: 1 |
| 170 | +query mode: prepared |
| 171 | +number of clients: 64 |
| 172 | +number of threads: 64 |
| 173 | +duration: 100 s |
| 174 | +number of transactions actually processed: 39104368 |
| 175 | +latency average = 0.163 ms |
| 176 | +latency stddev = 0.216 ms |
| 177 | +tps = 391012.743072 (including connections establishing) |
| 178 | +tps = 391175.983419 (excluding connections establishing) |
| 179 | +script statistics: |
| 180 | + - statement latencies in milliseconds: |
| 181 | + 0.163 update test set crt_time=now() where id=1 and pg_try_advisory_xact_lock(1); |
| 182 | +``` |
| 183 | + |
| 184 | +此时数据库主机还有66.2%的空闲CPU资源可用使用。 |
| 185 | +``` |
| 186 | +top - 13:12:43 up 51 days, 18:41, 2 users, load average: 1.12, 0.97, 0.78 |
| 187 | +Tasks: 1463 total, 28 running, 1435 sleeping, 0 stopped, 0 zombie |
| 188 | +Cpu(s): 24.5%us, 9.3%sy, 0.0%ni, 66.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st |
| 189 | +Mem: 529321832k total, 235226420k used, 294095412k free, 903076k buffers |
| 190 | +Swap: 0k total, 0k used, 0k free, 62067636k cached |
| 191 | +``` |
| 192 | + |
| 193 | +### 对比传统的例子 |
| 194 | +传统的消除等待的做法是这样的,通过select for update nowait。 |
| 195 | + |
| 196 | +``` |
| 197 | +begin; |
| 198 | +select 1 from tbl where id=pk for update nowait; -- 如果用户无法即刻获得锁,则返回错误。从而这个事务回滚。 |
| 199 | +update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; |
| 200 | +end; |
| 201 | +``` |
| 202 | + |
| 203 | +在PG中,可以使用do语句,把以上合成到一个块里面操作。 |
| 204 | + |
| 205 | +使用传统的方法,每秒可以处理8.6万。 |
| 206 | + |
| 207 | +``` |
| 208 | +vi test.sql |
| 209 | +do language plpgsql $$ declare begin with t as (select * from test where id=1 for update nowait) update test set crt_time=now() from t where t.id=test.id; exception when others then return; end; $$; |
| 210 | +
|
| 211 | +pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 100 |
| 212 | +
|
| 213 | +transaction type: ./test.sql |
| 214 | +scaling factor: 1 |
| 215 | +query mode: prepared |
| 216 | +number of clients: 64 |
| 217 | +number of threads: 64 |
| 218 | +duration: 100 s |
| 219 | +number of transactions actually processed: 8591222 |
| 220 | +latency average = 0.744 ms |
| 221 | +latency stddev = 0.713 ms |
| 222 | +tps = 85888.823884 (including connections establishing) |
| 223 | +tps = 85924.666940 (excluding connections establishing) |
| 224 | +script statistics: |
| 225 | + - statement latencies in milliseconds: |
| 226 | + 0.744 do language plpgsql $$ declare begin with t as (select * from test where id=1 for update nowait) update test set crt_time=now() from t where t.id=test.id; exception when others then return; end; $$; |
| 227 | +``` |
| 228 | + |
| 229 | +CPU剩余54.5% |
| 230 | + |
| 231 | +``` |
| 232 | +top - 13:13:48 up 51 days, 18:42, 2 users, load average: 8.14, 2.69, 1.37 |
| 233 | +Tasks: 1464 total, 21 running, 1442 sleeping, 0 stopped, 1 zombie |
| 234 | +Cpu(s): 41.7%us, 3.8%sy, 0.0%ni, 54.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st |
| 235 | +Mem: 529321832k total, 235256052k used, 294065780k free, 903176k buffers |
| 236 | +Swap: 0k total, 0k used, 0k free, 62068308k cached |
| 237 | +``` |
| 238 | + |
| 239 | +## ad lock相比其他秒杀优化的优势 |
| 240 | + |
| 241 | + |
| 242 | +使用AD LOCK可以使得CPU开销最小化,等待最小化,从本文的测试CASE来看,单条记录的更新可以达到39.1万/s。 |
| 243 | + |
| 244 | +传统的手段只能达到8.6万/s。 |
| 245 | + |
| 246 | +使用AD LOCK不破坏ACID,单个请求单个事务,不影响其他的事务。 |
| 247 | + |
| 248 | +合并优化,本质上是破坏了ACID的,如果合并失败,会导致所有相关的请求失败。 |
| 249 | + |
| 250 | +如果你对PG感兴趣,可以再了解一下 |
| 251 | + |
| 252 | +[《德哥的PostgreSQL私房菜 - 史上最屌PG资料合集》](https://yq.aliyun.com/articles/59251) |
| 253 | + |
| 254 | +[Count](http://info.flagcounter.com/h9V1) |
| 255 | + |
0 commit comments