<!DOCTYPE HTML>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Ameer Haj-Ali</title>
<meta name="author" content="Ameer Haj-Ali">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="images/favicon/favicon.ico" type="image/x-icon">
<link rel="stylesheet" type="text/css" href="stylesheet.css">
</head>
<body>
<table style="width:100%;max-width:800px;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr style="padding:0px">
<td style="padding:0px">
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr style="padding:0px">
<td style="padding:2.5%;width:63%;vertical-align:middle">
<p class="name" style="text-align: center;">
Ameer Haj-Ali
</p>
<p> <b>I am currently exploring starting a company in the AI space</b>.
<br>I love connecting with smart people. Please feel free to reach out!
</p>
<p>
<a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications"></a>
Previously, I was part of the founding team of <a href="https://www.anyscale.com">Anyscale</a>, where I helped grow the company from 0 to 150, headed the Platform, Infrastructure, and Gen AI organizations, and led major releases such as <a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications">Multi-Cloud Infrastructure</a>, <a href="https://www.youtube.com/watch?v=Q1t9qeDJquI">General Availability</a>, and <a href="https://www.youtube.com/watch?v=r-NYSeAXCko&t=1048s">LLM Endpoints</a>.
</p>
<p>
I completed my CS PhD at UC Berkeley in <a href="https://www.youtube.com/watch?v=6P1ldaiX20g">2 years</a> (the fastest at the university), working on AI and systems with Professors <a href="https://people.eecs.berkeley.edu/~istoica/">Ion Stoica</a> and <a href="https://people.eecs.berkeley.edu/~krste/">Krste Asanovic</a>. I received the valedictorian honor from the Technion twice, for my <a href="https://www.youtube.com/watch?v=r1YwUp-PA9M&t=21s">M.Sc.</a> and <a href="https://www.youtube.com/watch?v=IvArpBPUIhM&t=1s">B.Sc.</a>
</p>
<p style="text-align:center">
<a href="mailto:[email protected]">Email</a> /
<a href="data/AmeerHajAli-CV.pdf">CV</a> /
<a href="data/AmeerHajAli-bio.txt">Bio</a> /
<a href="https://scholar.google.co.il/citations?user=jJBqJxwAAAAJ&hl=en">Scholar</a> /
<a href="https://www.linkedin.com/in/ameer-haj-ali/">LinkedIn</a> /
<a href="https://twitter.com/aha_ml">Twitter</a> /
<a href="https://github.com/AmeerHajAli/">Github</a>
</p>
</td>
<td style="padding:2.5%;width:40%;max-width:40%">
<a href="images/hajali_ameer.jpg"><img style="width:100%;max-width:100%;object-fit: cover; border-radius: 50%;" alt="profile photo" src="images/hajali_ameer.jpg" class="hoverZoomLink"></a>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:100%;vertical-align:middle">
<h1>Professional Experience</h1>
<br>
I thrive on building and leading lean, fast-executing, high-performing teams to achieve ambitious goals.
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src='images/anyscale.png' width=100%>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<b>Founding Engineer. Head of Platform, Infrastructure & Endpoints Engineering, 2019-2024.</b>
<br>
<p> Helped grow the company from 0 to 150. Throughout my career at Anyscale, I built teams responsible for <a href="https://docs.ray.io/en/latest/serve/index.html">Ray Serve/Inference</a>, <a href="https://docs.ray.io/en/latest/cluster/getting-started.html">Ray Autoscaler and Cluster</a> (video <a href="https://www.youtube.com/watch?v=BJ06eJasdu4">here</a>), <a href="https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html">Ray Client</a>, <a href="https://docs.anyscale.com/services/get-started">Anyscale Services</a>, <a href="https://www.youtube.com/watch?v=r-NYSeAXCko&t=1048s">Gen AI</a>, <a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications">Multi-cloud Infrastructure</a>, <a href="https://ray-project.github.io/kuberay/">KubeRay</a>, and proprietary <a href="https://github.com/vllm-project/vllm">vLLM</a>. </p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src='images/intel-labs.jpeg' width=100%>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<p><b>AI Researcher in the Brain Inspired Computing Lab (internship), 2019.</b></p>
<p><a href="https://dl.acm.org/doi/10.1145/3368826.3377928">"NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning"</a>. Published in CGO 2020 (the premier conference in compilers).</p>
<p><a href="https://people.eecs.berkeley.edu/~krste/papers/RLDRM-netsoft2020.pdf">"RLDRM: Closed Loop Dynamic Intel RDT Resource Allocation with Deep Reinforcement Learning"</a>. Published in NetSoft 2020 (received <font color="red"><strong>Best Paper Award</strong></font>).</p>
<p>"A View on Deep Reinforcement Learning in System Optimization". Available on <a href="https://arxiv.org/abs/1908.01275">arXiv</a>.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src='images/nvidia.png' width=100%>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<p> <b>Chip Design Engineer, 2015-2016.</b></p>
<p> While completing my studies, I built design and automation tools that streamlined the formal and dynamic verification process, working primarily with Python, scripting languages, C++, and Verilog.</p>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:100%;vertical-align:middle">
<h1>Publications</h1>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/gemmini.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://github.com/ucb-bar/gemmini">
<span class="papertitle">Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration</span>
</a>
<br>
Hasan Genc, Seah Kim, Alon Amid, <strong>Ameer Haj-Ali</strong>, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
<br>
<em>DAC 2021</em>. Nominated for <font color="red"><strong>Best Paper Award</strong></font>.
<br>
<a href="https://github.com/ucb-bar/gemmini">project page</a>
/
<a href="https://www.youtube.com/watch?v=zhO8iUBpnCc">video</a>
/
<a href="https://www.youtube.com/watch?v=Q6gfthExSts">full tutorial</a>
/
<a href="https://dl.acm.org/doi/10.1109/DAC18074.2021.9586216">paper</a>
/
<a href="https://arxiv.org/abs/1911.09925">arXiv</a>
<p></p>
<p>
We present Gemmini, an open-source, systolic array based full-stack (hardware and software) DNN accelerator generator.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/tenset.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://github.com/tlc-pack/tenset">
<span class="papertitle">TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers</span>
</a>
<br>
Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph E. Gonzalez, Ion Stoica, <strong>Ameer Haj Ali</strong>
<br>
<em>NeurIPS 2021</em>.
<br>
<a href="https://github.com/tlc-pack/tenset">project page</a>
/
<a href="https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/a684eceee76fc522773286a895bc8436-Paper-round1.pdf">paper</a>
<p></p>
<p>
TenSet is a large-scale tensor program performance dataset containing 52 million records collected on six hardware platforms. We provide an in-depth analysis of learned cost models trained on TenSet and show that they can speed up tensor compiler search by up to tenfold.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/protuner.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://arxiv.org/abs/2005.13685">
<span class="papertitle">ProTuner: Tuning Programs with Monte Carlo Tree Search</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica
<br>
<a href="https://arxiv.org/abs/2005.13685">arXiv</a>
<p></p>
<p>
We show that Monte Carlo Tree Search (MCTS), when applied to the challenging task of tuning programs for deep learning and image processing using the Halide framework, outperforms the state-of-the-art beam search by evaluating complete schedules and incorporating real-time execution measurements.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/ansor.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="">
<span class="papertitle">Ansor: Generating High-Performance Tensor Programs for Deep Learning</span>
</a>
<br>
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, <strong>Ameer Haj-Ali</strong>, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph Gonzalez, Ion Stoica
<br>
<em>OSDI 2020</em>.
<br>
<a href="https://www.youtube.com/watch?v=A2hJ_Mj02zk">Video</a>
/
<a href="https://www.usenix.org/system/files/osdi20-zheng.pdf">paper</a>
/
<a href="https://arxiv.org/abs/2006.06762">arXiv</a>
<p></p>
<p>
We introduce Ansor, a tensor program generation framework that surpasses existing methods by exploring a broader range of optimization combinations and fine-tuning them with evolutionary search and a learned cost model, significantly enhancing the execution performance of deep neural networks on various hardware platforms.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/neurovectorizer.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://dl.acm.org/doi/abs/10.1145/3368826.3377928">
<span class="papertitle">NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Nesreen Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica
<br>
<em>CGO 2020</em>.
<br>
<a href="https://dl.acm.org/doi/abs/10.1145/3368826.3377928">paper</a>
/
<a href="https://arxiv.org/abs/1909.13639">arXiv</a>
/
<a href="https://github.com/intel/neuro-vectorizer">code</a>
/
<a href="https://www.youtube.com/watch?v=GwnFmFh2phI&list=PLTPaZLQlNIHrdv_yu6myVGBWABj4rNY45&index=24&t=0s">video</a>
<p></p>
<p>
NeuroVectorizer is a framework that uses deep reinforcement learning to automate the vectorization process in compilers, significantly improving performance on modern processors. Work done in a summer internship at Intel Labs.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/autophase.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://proceedings.mlsys.org/book/292.pdf">
<span class="papertitle">AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Qijing Huang, William Moses, John Xiang, Krste Asanovic, John Wawrzynek, Ion Stoica
<br>
<em>MLSys 2020</em>.
<br>
<a href="https://proceedings.mlsys.org/book/292.pdf">paper</a>
/
<a href="https://arxiv.org/abs/2003.00671">arXiv</a>
/
<a href="https://github.com/ucb-bar/autophase">code</a>
/
<a href="https://www.youtube.com/watch?v=bl1J1gsGAcw&list=PLTPaZLQlNIHqLyiLUZe8Vrk1EwPfAHPVJ&index=37">video</a>
<p>
AutoPhase leverages deep reinforcement learning to efficiently explore phase ordering in high-level synthesis (HLS), achieving optimal performance for various applications by dynamically learning effective phase sequences.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/autockt.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://github.com/ksettaluri6/AutoCkt">
<span class="papertitle">AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs</span>
</a>
<br>
Keertana Settaluri, <strong>Ameer Haj-Ali</strong>, Qijing Huang, Suhong Moon, Kourosh Hakhamaneshi, Ion Stoica, Krste Asanovic, Borivoje Nikolic
<br>
<em>DATE 2020</em>.
<br>
<a href="https://github.com/ksettaluri6/AutoCkt">code</a>
/
<a href="https://dl.acm.org/doi/10.5555/3408352.3408464">paper</a>
/
<a href="https://arxiv.org/abs/2001.01808">arXiv</a>
<p>
AutoCkt introduces a deep reinforcement learning-based approach to automate and optimize the design of analog circuits, demonstrating substantial improvements in design quality and efficiency.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/rldrm.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://people.eecs.berkeley.edu/~krste/papers/RLDRM-netsoft2020.pdf">
<span class="papertitle">RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization</span>
</a>
<br>
Bin Li, Yipeng Wang, Ren Wang, Charlie Tai, Ravi Iyer, Zhu Zhou, Andrew Herdrich, Tong Zhang, <strong>Ameer Haj-Ali</strong>, Ion Stoica, Krste Asanovic
<br>
<em>NetSoft 2020</em>. <font color="red"><strong>Best Paper Award</strong></font>.
<br>
<a href="https://people.eecs.berkeley.edu/~krste/papers/RLDRM-netsoft2020.pdf">paper</a>
<p></p>
<p>
RLDRM employs deep reinforcement learning to dynamically allocate cache resources in network function virtualization, enhancing system performance and adaptability in real-time network environments. Work done in a summer internship at Intel Labs.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/deep-rl-system-optimization.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://arxiv.org/abs/1908.01275">
<span class="papertitle">A View on Deep Reinforcement Learning in System Optimization</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Nesreen Ahmed, Ted Willke, Joseph Gonzalez, Krste Asanovic, Ion Stoica
<br>
<em>arXiv preprint, 2019</em>.
<br>
<a href="https://arxiv.org/abs/1908.01275">arXiv</a>
<p></p>
<p>
This paper critically reviews and evaluates the application of deep reinforcement learning to system optimization, proposing key metrics for future assessments and discussing the method's relative effectiveness, challenges, and potential directions compared to traditional and heuristic approaches. Work done in a summer internship at Intel Labs.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/autophase-hls.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8735549">
<span class="papertitle">AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek
<br>
<em>FCCM, 2019</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8735549">paper</a>
/
<a href="https://arxiv.org/abs/1901.04615">arXiv</a>
/
<a href="https://github.com/ucb-bar/autophase">code</a>
/
<a href="https://www.youtube.com/watch?v=bl1J1gsGAcw&list=PLTPaZLQlNIHqLyiLUZe8Vrk1EwPfAHPVJ&index=37">video</a>
<p>This paper evaluates a deep reinforcement learning framework implemented in the LLVM compiler to optimize the order of optimization passes for high-level synthesis, achieving a significant enhancement in circuit performance and markedly faster results compared to state-of-the-art phase-ordering algorithms. </p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/pim-image-processing.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">
<span class="papertitle">Memristor-Based Processing-in-Memory and Its Application On Image Processing</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Ronny Ronen, Rotem Ben-Hur, Nimrod Wald, Shahar Kvatinsky
<br>
<em>Elsevier, 2020</em>.
<br>
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">chapter</a>
<p>This chapter overviews memristor-based logic techniques in in-memory computing (IMC), exemplified through a case study on Memristor Aided loGIC (MAGIC) in a memristive Memory Processing Unit (mMPU), demonstrating enhanced performance and energy efficiency in image processing tasks compared to other advanced memristive logic systems.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/mmpu.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">
<span class="papertitle">mMPU - a Real Processing-in-Memory Architecture to Combat the von Neumann Bottleneck</span>
</a>
<br>
Nishil Talati, Rotem Ben-Hur, Nimrod Wald, <strong>Ameer Haj-Ali</strong>, John Reuben, Shahar Kvatinsky
<br>
<em>Springer, 2020</em>.
<br>
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">chapter</a>
<p>
This chapter introduces the memristive Memory Processing Unit (mMPU), which integrates computation within memory cells using Memristor Aided loGIC (MAGIC) to address the von Neumann bottleneck, detailing the system's architecture and demonstrating how MAGIC can execute arbitrary Boolean functions for processing-in-memory applications.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/simpler-magic.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8781866">
<span class="papertitle">SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput</span>
</a>
<br>
Rotem Ben-Hur, Ronny Ronen, <strong>Ameer Haj-Ali</strong>, Debjyoti Bhattacharjee, Adi Eliahu, Natan Peled, Shahar Kvatinsky
<br>
<em>TCAD, 2019</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8781866">paper</a>
<p>
This article introduces SIMPLER, an automatic framework that optimizes the execution of arbitrary combinational logic functions within a memristive memory using graph theory, logic design, and compiler technology, achieving substantial improvements in throughput, area efficiency, and parallel processing capabilities for in-memory computing.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/memristor-synapse.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8600725">
<span class="papertitle">Supporting the Momentum Training Algorithm Using a Memristor-Based Synapse</span>
</a>
<br>
Tzofnat Greenberg-Toledo, Roee Mazor, <strong>Ameer Haj-Ali</strong>, Shahar Kvatinsky
<br>
<em>TCAS-I, 2019</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8600725">paper</a>
<p>
This paper introduces a memristor-based synapse that enhances deep neural network (DNN) training by supporting the momentum algorithm, proposing two design approaches to improve the convergence and efficiency of training, with simulations showing significant speedups and energy reductions compared to GPU platforms.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/in-memory-processing.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://www.computer.org/csdl/magazine/mi/2018/05/mmi2018050013/13WBGLOPCjC">
<span class="papertitle">Not in Name Alone: a Memristive Memory Processing Unit for Real In-Memory Processing</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Shahar Kvatinsky
<br>
<em>IEEE Micro, 2018</em>.
<br>
<a href="https://www.computer.org/csdl/magazine/mi/2018/05/mmi2018050013/13WBGLOPCjC">paper</a>
<p>
This paper presents the memristive Memory Processing Unit (mMPU), a processing-in-memory system that eliminates data transfer by performing computation directly within memory cells, leveraging its inherent parallelism to provide high throughput and energy efficiency for SIMD-based data-intensive applications.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/imaging.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8398398">
<span class="papertitle">IMAGING: In-Memory AlGorithms for Image processiNG</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Shahar Kvatinsky
<br>
<em>TCAS-I, 2018</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8398398">paper</a>
<p>
This paper proposes four in-memory algorithms for fixed-point multiplication using MAGIC gates, implemented within memristor-based memory cells to enhance latency, throughput, and area efficiency, enabling effective execution of complex operations like image convolution and optimized parallel processing in data-intensive applications.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/fixed-point-multiplication.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8351561">
<span class="papertitle">Efficient Algorithms for In-memory Fixed Point Multiplication Using MAGIC</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Shahar Kvatinsky
<br>
<em>ISCAS, 2018</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8351561">paper</a>
<p>
This paper introduces algorithms for performing fixed-point multiplication within memristive memory cells using Memristor Aided Logic (MAGIC) gates, achieving a 1.8× improvement in latency and enhanced area efficiency that enables simultaneous executions, addressing the computational constraints of previous implementations.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/pim-challenges.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8342275">
<span class="papertitle">Practical Challenges in Delivering the Promises of Real Processing-in-Memory Machines</span>
</a>
<br>
Nishil Talati, <strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Pierre-Emmanuel Gaillardon, Shahar Kvatinsky
<br>
<em>DATE, 2018</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8342275">paper</a>
<p>
This paper evaluates the memristive Memory Processing Unit (mMPU) as a Processing-in-Memory (PiM) machine, analyzing its limitations in parallelism and internal data transfer, and demonstrates that these factors can increase execution times significantly, despite strategies to manage data movement within the device itself.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/memristive-logic.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8106959">
<span class="papertitle">Memristive Logic: A Framework for Evaluation and Comparison</span>
</a>
<br>
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, <strong>Ameer Haj-Ali</strong>, Pierre-Emmanuel Gaillardon, Shahar Kvatinsky
<br>
<em>PATMOS, 2017</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8106959">paper</a>
<p>
This paper introduces a framework for comparing memristive logic families by evaluating their statefulness, proximity to memory arrays, and computational flexibility, providing metrics for performance, energy efficiency, and area, and offering guidelines for a comprehensive assessment to facilitate the development of new logic families.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/memristive-taxonomy.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://link.springer.com/chapter/10.1007/978-3-319-76375-0_37">
<span class="papertitle">A Taxonomy and Evaluation Framework for Memristive Logic</span>
</a>
<br>
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, <strong>Ameer Haj-Ali</strong>, Pierre-Emmanuel Gaillardon, Shahar Kvatinsky
<br>
<em>Springer, 2017</em>.
<br>
<a href="https://link.springer.com/chapter/10.1007/978-3-319-76375-0_37">chapter</a>
<p>
This chapter outlines a framework for evaluating memristive logic families based on their statefulness, proximity to memory, and computational flexibility, using metrics for latency, energy efficiency, and area, and includes a case study on eight-bit addition to demonstrate the methodology and assess the potential for large-scale data computation.
</p>
</td>
</tr>
</tbody></table>
<table width="100%" align="center" border="0" cellspacing="0" cellpadding="20"><tbody>
<tr>
<td>
<h1>Miscellanea</h1>
</td>
</tr>
</tbody></table>
<table width="100%" align="center" border="0" cellpadding="20"><tbody>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/neurips.png", width=100%>
</div>
</td>
<td width="75%" valign="center">
<b>Area Chair</b>: NeurIPS 2024, NeurIPS 2023, NeurIPS 2022
<br>
<b>Conference & Journal Referee</b>: NeurIPS 2019, HPCA 2018, DATE 2018, VLSI-SoC 2018, ISCAS 2017, ISCAS 2016, CNNA 2016, TCAS-I, TCAS-II, TVLSI, Microelectronics Journal
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/gsi.png", width=100%>
</div>
</td>
<td width="75%" valign="center">
<b>Academia and Teaching</b>
<br>
Graduate PhD Admissions Committee.
<br>
DARE (Diversifying Access to Research in Engineering) Admissions Committee.
<br>
Undergraduate Project Committee.
<br>
Graduate Student Instructor (GSI), <a href="https://people.eecs.berkeley.edu/~jrs/189/">Introduction to Machine Learning (CS 189/289A)</a>.
<br>
Head TA, Circuit Theory (700+ students, 044105).
<br>
Head TA, Electronic Switching Circuits (300+ students, 044147).
<br>
Supervisor of B.Sc. projects, VLSI Lab and Parallel Systems Lab (044167).
<br>
TA, MATLAB (044147).
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src="images/advisor.jpeg", width=100%>
</div>
</td>
<td width="75%" valign="center">
<b>Advised Students</b>
<br>
<a href="https://www.linkedin.com/in/ruochen99/">Chloe Liu</a> (First employment: graduate student at Stanford).
<br>
<a href="https://www.linkedin.com/in/ian-galbraith-73b052125/">Ian Galbraith</a> (First employment: software engineer at Twilio).
<br>
<a href="https://dfangshuo.github.io/">Fang Shuo Deng</a> (First employment: software engineer at Abnormal Security).
<br>
<a href="http://linkedin.com/in/stav-belogolovsky-400249134">Stav Belogolovsky</a> (First employment: Test and DFT Engineer at Arbe).
<br>
<a href="https://www.linkedin.com/in/amnon-wahle">Amnon Wahle</a> (First employment: Algorithm Research at BeyondMinds).
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/awards.webp", width=100%, height="100%">
</div>
</td>
<td width="75%" valign="center">
<b>Awards and Fellowships</b>
<br>
Named Person of the Year in my home city, Shefaraam (45,000 residents), 2022.
<br>
Granted the EB-1 Green Card (the Einstein Visa for Extraordinary Ability), USA, 2021.
<br>
Granted the O-1 Visa for Extraordinary Ability, USA, 2020.
<br>
The Valedictorian Honor (M.Sc.), Technion, 2019.
<br>
Open Gateway Fellowship, UC Berkeley, 2018.
<br>
The William Oldham Fellowship, UC Berkeley, 2018.
<br>
The Valedictorian Honor (B.Sc.), Technion, 2017.
<br>
Dean's scholarship for excellent graduate students, Technion, 2016.
<br>
Full tuition scholarship for M.Sc. studies, Technion, 2016-2018.
<br>
The System Architecture Labs Cluster Prize for outstanding undergraduate projects (received twice), Technion, 2016.
<br>
Excellence award from Apple for outstanding scholastic achievements, Technion, 2016.
<br>
Member of the President's List of highest honors for excellent scholastic achievements in all undergraduate semesters (top 3%), Technion, 2013-2016.
<br>
Full tuition scholarship for B.Sc. studies, Technion, 2013-2016.
</td>
</tr>
<tr>
<td align="center" style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/blogs.png", width=100%, height="100%">
</div>
</td>
<td width="75%" valign="middle">
<b>Blog Posts</b>
<br>
<a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications">Cloud Infrastructure for LLM and Generative AI Applications</a>
<br>
<a href="https://www.anyscale.com/blog/anyscale-endpoints-fast-and-scalable-llm-apis">Anyscale Endpoints Preview: Fast, Cost-Efficient, and Scalable LLM APIs</a>
<br>
<a href="https://www.anyscale.com/blog/autoscaling-clusters-with-ray">Autoscaling clusters with Ray</a>
<br>
<a href="https://medium.com/distributed-computing-with-ray/easy-distributed-scikit-learn-training-with-ray-54ff8b643b33">Easy Distributed Scikit-Learn with Ray</a>
<br>
<a href="https://ameerhajali.medium.com/scale-ml-on-your-local-clusters-with-ray-2469c17bb8c9">Scale ML on Your Local Clusters with Ray</a>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:0px">
<br>
<p style="text-align:right;font-size:small;">
<a href="http://jonbarron.info">Website template credits</a>.
</p>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</body>
</html>