<!DOCTYPE HTML>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Ameer Haj-Ali</title>
<meta name="author" content="Ameer Haj-Ali">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="images/favicon/favicon.ico" type="image/x-icon">
<link rel="stylesheet" type="text/css" href="stylesheet.css">
</head>
<body>
<table style="width:100%;max-width:800px;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr style="padding:0px">
<td style="padding:0px">
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr style="padding:0px">
<td style="padding:2.5%;width:63%;vertical-align:middle">
<p class="name" style="text-align: center;">
Ameer Haj-Ali
</p>
<p> <b>I am currently exploring starting a company in the AI space</b>.
<br>I love connecting with smart people. Please feel free to reach out!
</p>
<p>
<a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications"></a>
Previously, I was part of the founding team of <a href="https://www.anyscale.com">Anyscale</a>, where I helped grow the company from 0 to 150, headed the Platform, Infrastructure, and Gen AI organizations, and led major releases such as <a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications">Multi-Cloud Infrastructure</a>, <a href="https://www.youtube.com/watch?v=Q1t9qeDJquI">General Availability</a>, and <a href="https://www.youtube.com/watch?v=r-NYSeAXCko&t=1048s">LLM Endpoints</a>.
</p>
<p>
I completed my CS PhD at UC Berkeley in <a href="https://www.youtube.com/watch?v=6P1ldaiX20g">2 years</a> (the fastest at the university), working on AI and systems with Professors <a href="https://people.eecs.berkeley.edu/~istoica/">Ion Stoica</a> and <a href="https://people.eecs.berkeley.edu/~krste/">Krste Asanovic</a>. I received the valedictorian honor from the Technion twice, for my <a href="https://www.youtube.com/watch?v=r1YwUp-PA9M&t=21s">M.Sc.</a> and <a href="https://www.youtube.com/watch?v=IvArpBPUIhM&t=1s">B.Sc.</a>
</p>
<p style="text-align:center">
<a href="mailto:[email protected]">Email</a> /
<a href="data/AmeerHajAli-CV.pdf">CV</a> /
<a href="data/AmeerHajAli-bio.txt">Bio</a> /
<a href="https://scholar.google.co.il/citations?user=jJBqJxwAAAAJ&hl=en">Scholar</a> /
<a href="https://www.linkedin.com/in/ameer-haj-ali/">LinkedIn</a> /
<a href="https://twitter.com/aha_ml">Twitter</a> /
<a href="https://github.com/AmeerHajAli/">Github</a>
</p>
</td>
<td style="padding:2.5%;width:40%;max-width:40%">
<a href="images/hajali_ameer.jpg"><img style="width:100%;max-width:100%;object-fit: cover; border-radius: 50%;" alt="profile photo" src="images/hajali_ameer.jpg" class="hoverZoomLink"></a>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:100%;vertical-align:middle">
<h1>Professional Experience</h1>
<br>
I thrive on building and leading lean, fast-executing, high-performing teams to achieve ambitious goals.
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src='images/anyscale.png' width=100%>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<b>Founding Engineer. Head of Platform, Infrastructure & Endpoints Engineering, 2019-2024.</b>
<br>
<p> Helped grow the company from 0 to 150. Throughout my career at Anyscale, I built teams responsible for <a href="https://docs.ray.io/en/latest/serve/index.html">Ray Serve/Inference</a>, <a href="https://docs.ray.io/en/latest/cluster/getting-started.html">Ray Autoscaler and Cluster</a> (video <a href="https://www.youtube.com/watch?v=BJ06eJasdu4">here</a>), <a href="https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html">Ray Client</a>, <a href="https://docs.anyscale.com/services/get-started">Anyscale Services</a>, <a href="https://www.youtube.com/watch?v=r-NYSeAXCko&t=1048s">Gen AI</a>, <a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications">Multi-cloud Infrastructure</a>, <a href="https://ray-project.github.io/kuberay/">KubeRay</a>, and proprietary <a href="https://github.com/vllm-project/vllm">vLLM</a>. </p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src='images/intel-labs.jpeg' width=100%>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<p><b>AI Researcher in the Brain Inspired Computing Lab (internship), 2019.</b></p>
<p><a href="https://dl.acm.org/doi/10.1145/3368826.3377928">"NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning"</a>. Published in CGO 2020 (the premier conference in compilers).</p>
<p><a href="https://people.eecs.berkeley.edu/~krste/papers/RLDRM-netsoft2020.pdf">"RLDRM: Closed Loop Dynamic Intel RDT Resource Allocation with Deep Reinforcement Learning"</a>. Published in NetSoft 2020 (received <font color="red"><strong>Best Paper Award</strong></font>).</p>
<p>"A View on Deep Reinforcement Learning in System Optimization". Available on <a href="https://arxiv.org/abs/1908.01275">arXiv</a>.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src='images/nvidia.png' width=100%>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<p> <b>Chip Design Engineer, 2015-2016.</b></p>
<p> While completing my studies, I built design and automation tools that streamlined the formal and dynamic verification process, working primarily with Python, scripting languages, C++, and Verilog.</p>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:100%;vertical-align:middle">
<h1>Publications</h1>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/gemmini.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://github.com/ucb-bar/gemmini">
<span class="papertitle">Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration</span>
</a>
<br>
Hasan Genc, Seah Kim, Alon Amid, <strong>Ameer Haj-Ali</strong>, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
<br>
<em>DAC 2021</em>. Nominated for <font color="red"><strong>Best Paper Award</strong></font>.
<br>
<a href="https://github.com/ucb-bar/gemmini">project page</a>
/
<a href="https://www.youtube.com/watch?v=zhO8iUBpnCc">video</a>
/
<a href="https://www.youtube.com/watch?v=Q6gfthExSts">full tutorial</a>
/
<a href="https://dl.acm.org/doi/10.1109/DAC18074.2021.9586216">paper</a>
/
<a href="https://arxiv.org/abs/1911.09925">arXiv</a>
<p></p>
<p>
We present Gemmini, an open-source, systolic array based full-stack (hardware and software) DNN accelerator generator.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/tenset.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://github.com/tlc-pack/tenset">
<span class="papertitle">TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers</span>
</a>
<br>
Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph E. Gonzalez, Ion Stoica, <strong>Ameer Haj Ali</strong>
<br>
<em>NeurIPS 2021</em>.
<br>
<a href="https://github.com/tlc-pack/tenset">project page</a>
/
<a href="https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/a684eceee76fc522773286a895bc8436-Paper-round1.pdf">paper</a>
<p></p>
<p>
TenSet is a large-scale tensor program performance dataset containing 52 million records collected on six hardware platforms. We provide an in-depth analysis of learned cost models trained on TenSet and show that they can speed up tensor compiler search by up to tenfold.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/protuner.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://arxiv.org/abs/2005.13685">
<span class="papertitle">ProTuner: Tuning Programs with Monte Carlo Tree Search</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica
<br>
<a href="https://arxiv.org/abs/2005.13685">arXiv</a>
<p></p>
<p>
We show that Monte Carlo Tree Search (MCTS), when applied to the challenging task of tuning programs for deep learning and image processing using the Halide framework, outperforms the state-of-the-art beam search by evaluating complete schedules and incorporating real-time execution measurements.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/ansor.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="">
<span class="papertitle">Ansor: Generating High-Performance Tensor Programs for Deep Learning</span>
</a>
<br>
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, <strong>Ameer Haj-Ali</strong>, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph Gonzalez, Ion Stoica
<br>
<em>OSDI 2020</em>.
<br>
<a href="https://www.youtube.com/watch?v=A2hJ_Mj02zk">Video</a>
/
<a href="https://www.usenix.org/system/files/osdi20-zheng.pdf">paper</a>
/
<a href="https://arxiv.org/abs/2006.06762">arXiv</a>
<p></p>
<p>
We introduce Ansor, a tensor program generation framework that surpasses existing methods by exploring a broader range of optimization combinations and fine-tuning them with evolutionary search and a learned cost model, significantly enhancing the execution performance of deep neural networks on various hardware platforms.</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/neurovectorizer.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://dl.acm.org/doi/abs/10.1145/3368826.3377928">
<span class="papertitle">NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Nesreen Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica
<br>
<em>CGO 2020</em>.
<br>
<a href="https://dl.acm.org/doi/abs/10.1145/3368826.3377928">paper</a>
/
<a href="https://arxiv.org/abs/1909.13639">arXiv</a>
/
<a href="https://github.com/intel/neuro-vectorizer">code</a>
/
<a href="https://www.youtube.com/watch?v=GwnFmFh2phI&list=PLTPaZLQlNIHrdv_yu6myVGBWABj4rNY45&index=24&t=0s">video</a>
<p></p>
<p>
NeuroVectorizer is a framework that uses deep reinforcement learning to automate the vectorization process in compilers, significantly improving performance on modern processors. Work done in a summer internship at Intel Labs.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/autophase.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://proceedings.mlsys.org/book/292.pdf">
<span class="papertitle">AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Qijing Huang, William Moses, John Xiang, Krste Asanovic, John Wawrzynek, Ion Stoica
<br>
<em>MLSys 2020</em>.
<br>
<a href="https://proceedings.mlsys.org/book/292.pdf">paper</a>
/
<a href="https://arxiv.org/abs/2003.00671">arXiv</a>
/
<a href="https://github.com/ucb-bar/autophase">code</a>
/
<a href="https://www.youtube.com/watch?v=bl1J1gsGAcw&list=PLTPaZLQlNIHqLyiLUZe8Vrk1EwPfAHPVJ&index=37">video</a>
<p>
AutoPhase leverages deep reinforcement learning to efficiently explore phase ordering in high-level synthesis (HLS), achieving optimal performance for various applications by dynamically learning effective phase sequences.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/autockt.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://github.com/ksettaluri6/AutoCkt">
<span class="papertitle">AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs</span>
</a>
<br>
Keertana Settaluri, <strong>Ameer Haj-Ali</strong>, Qijing Huang, Suhong Moon, Kourosh Hakhamaneshi, Ion Stoica, Krste Asanovic, Borivoje Nikolic
<br>
<em>DATE 2020</em>.
<br>
<a href="https://github.com/ksettaluri6/AutoCkt">code</a>
/
<a href="https://dl.acm.org/doi/10.5555/3408352.3408464">paper</a>
/
<a href="https://arxiv.org/abs/2001.01808">arXiv</a>
<p>
AutoCkt introduces a deep reinforcement learning-based approach to automate and optimize the design of analog circuits, demonstrating substantial improvements in design quality and efficiency.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/rldrm.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://people.eecs.berkeley.edu/~krste/papers/RLDRM-netsoft2020.pdf">
<span class="papertitle">RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization</span>
</a>
<br>
Bin Li, Yipeng Wang, Ren Wang, Charlie Tai, Ravi Iyer, Zhu Zhou, Andrew Herdrich, Tong Zhang, <strong>Ameer Haj-Ali</strong>, Ion Stoica, Krste Asanovic
<br>
<em>NetSoft 2020</em>. <font color="red"><strong>Best Paper Award</strong></font>.
<br>
<a href="https://people.eecs.berkeley.edu/~krste/papers/RLDRM-netsoft2020.pdf">paper</a>
<p></p>
<p>
RLDRM employs deep reinforcement learning to dynamically allocate cache resources in network function virtualization, enhancing system performance and adaptability in real-time network environments. Work done in a summer internship at Intel Labs.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/deep-rl-system-optimization.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://arxiv.org/abs/1908.01275">
<span class="papertitle">A View on Deep Reinforcement Learning in System Optimization</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Nesreen Ahmed, Ted Willke, Joseph Gonzalez, Krste Asanovic, Ion Stoica
<br>
<em>arXiv preprint, 2019</em>.
<br>
<a href="https://arxiv.org/abs/1908.01275">arXiv</a>
<p></p>
<p>
This paper critically reviews and evaluates the application of deep reinforcement learning to system optimization, proposing key metrics for future assessments and discussing the method's relative effectiveness, challenges, and potential directions compared to traditional and heuristic approaches. Work done in a summer internship at Intel Labs.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/autophase-hls.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8735549">
<span class="papertitle">AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek
<br>
<em>FCCM, 2019</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8735549">paper</a>
/
<a href="https://arxiv.org/abs/1901.04615">arXiv</a>
/
<a href="https://github.com/ucb-bar/autophase">code</a>
/
<a href="https://www.youtube.com/watch?v=bl1J1gsGAcw&list=PLTPaZLQlNIHqLyiLUZe8Vrk1EwPfAHPVJ&index=37">video</a>
<p>This paper evaluates a deep reinforcement learning framework implemented in the LLVM compiler to optimize the order of optimization passes for high-level synthesis, achieving a significant enhancement in circuit performance and markedly faster results compared to state-of-the-art phase-ordering algorithms. </p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/pim-image-processing.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">
<span class="papertitle">Memristor-Based Processing-in-Memory and Its Application On Image Processing</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Ronny Ronen, Rotem Ben-Hur, Nimrod Wald, Shahar Kvatinsky
<br>
<em>Elsevier, 2020</em>.
<br>
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">chapter</a>
<p>This chapter overviews memristor-based logic techniques in in-memory computing (IMC), exemplified through a case study on Memristor Aided loGIC (MAGIC) in a memristive Memory Processing Unit (mMPU), demonstrating enhanced performance and energy efficiency in image processing tasks compared to other advanced memristive logic systems.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/mmpu.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">
<span class="papertitle">mMPU - a Real Processing-in-Memory Architecture to Combat the von Neumann Bottleneck</span>
</a>
<br>
Nishil Talati, Rotem Ben-Hur, Nimrod Wald, <strong>Ameer Haj-Ali</strong>, John Reuben, Shahar Kvatinsky
<br>
<em>Springer, 2020</em>.
<br>
<a href="https://link.springer.com/chapter/10.1007/978-981-13-8379-3_8">chapter</a>
<p>
This chapter introduces the memristive Memory Processing Unit (mMPU), which integrates computation within memory cells using Memristor Aided loGIC (MAGIC) to address the von Neumann bottleneck, detailing the system's architecture and demonstrating how MAGIC can execute arbitrary Boolean functions for processing-in-memory applications.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/simpler-magic.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8781866">
<span class="papertitle">SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput</span>
</a>
<br>
Rotem Ben-Hur, Ronny Ronen, <strong>Ameer Haj-Ali</strong>, Debjyoti Bhattacharjee, Adi Eliahu, Natan Peled, Shahar Kvatinsky
<br>
<em>TCAD, 2019</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8781866">paper</a>
<p>
This article introduces SIMPLER, an automatic framework that optimizes the execution of arbitrary combinational logic functions within a memristive memory using graph theory, logic design, and compiler technology, achieving substantial improvements in throughput, area efficiency, and parallel processing capabilities for in-memory computing.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/memristor-synapse.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8600725">
<span class="papertitle">Supporting the Momentum Training Algorithm Using a Memristor-Based Synapse</span>
</a>
<br>
Tzofnat Greenberg-Toledo, Roee Mazor, <strong>Ameer Haj-Ali</strong>, Shahar Kvatinsky
<br>
<em>TCAS-I, 2019</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8600725">paper</a>
<p>
This paper introduces a memristor-based synapse that enhances deep neural network (DNN) training by supporting the momentum algorithm, proposing two design approaches to improve the convergence and efficiency of training, with simulations showing significant speedups and energy reductions compared to GPU platforms.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/in-memory-processing.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://www.computer.org/csdl/magazine/mi/2018/05/mmi2018050013/13WBGLOPCjC">
<span class="papertitle">Not in Name Alone: a Memristive Memory Processing Unit for Real In-Memory Processing</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Shahar Kvatinsky
<br>
<em>IEEE Micro, 2018</em>.
<br>
<a href="https://www.computer.org/csdl/magazine/mi/2018/05/mmi2018050013/13WBGLOPCjC">paper</a>
<p>
This paper presents the memristive Memory Processing Unit (mMPU), a processing-in-memory system that eliminates data transfer by performing computation directly within memory cells, leveraging its inherent parallelism to provide high throughput and energy efficiency for SIMD-based data-intensive applications.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/imaging.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8398398">
<span class="papertitle">IMAGING: In-Memory AlGorithms for Image processiNG</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Shahar Kvatinsky
<br>
<em>TCAS-I, 2018</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8398398">paper</a>
<p>
This paper proposes four in-memory algorithms for fixed-point multiplication using MAGIC gates, implemented within memristor-based memory cells to enhance latency, throughput, and area efficiency, enabling effective execution of complex operations like image convolution and optimized parallel processing in data-intensive applications.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<div>
<img src='images/fixed-point-multiplication.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8351561">
<span class="papertitle">Efficient Algorithms for In-memory Fixed Point Multiplication Using MAGIC</span>
</a>
<br>
<strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Shahar Kvatinsky
<br>
<em>ISCAS, 2018</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8351561">paper</a>
<p>
This paper introduces algorithms for performing fixed-point multiplication within memristive memory cells using Memristor Aided Logic (MAGIC) gates, achieving a 1.8× improvement in latency and enhanced area efficiency that enables simultaneous executions, addressing the computational constraints of previous implementations.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/pim-challenges.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8342275">
<span class="papertitle">Practical Challenges in Delivering the Promises of Real Processing-in-Memory Machines</span>
</a>
<br>
Nishil Talati, <strong>Ameer Haj-Ali</strong>, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Pierre-Emmanuel Gaillardon, Shahar Kvatinsky
<br>
<em>DATE, 2018</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8342275">paper</a>
<p>
This paper evaluates the memristive Memory Processing Unit (mMPU) as a Processing-in-Memory (PiM) machine, analyzing its limitations in parallelism and internal data transfer, and demonstrates that these factors can increase execution times significantly, despite strategies to manage data movement within the device itself.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/memristive-logic.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://ieeexplore.ieee.org/document/8106959">
<span class="papertitle">Memristive Logic: A Framework for Evaluation and Comparison</span>
</a>
<br>
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, <strong>Ameer Haj-Ali</strong>, Pierre-Emmanuel Gaillardon, Shahar Kvatinsky
<br>
<em>PATMOS, 2017</em>.
<br>
<a href="https://ieeexplore.ieee.org/document/8106959">paper</a>
<p>
This paper introduces a framework for comparing memristive logic families by evaluating their statefulness, proximity to memory arrays, and computational flexibility, providing metrics for performance, energy efficiency, and area, and offering guidelines for a comprehensive assessment to facilitate the development of new logic families.
</p>
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<div>
<img src='images/memristive-taxonomy.png' width=100%>
</div>
</div>
</td>
<td style="padding:20px;width:75%;vertical-align:middle">
<a href="https://link.springer.com/chapter/10.1007/978-3-319-76375-0_37">
<span class="papertitle">A Taxonomy and Evaluation Framework for Memristive Logic</span>
</a>
<br>
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, <strong>Ameer Haj-Ali</strong>, Pierre-Emmanuel Gaillardon, Shahar Kvatinsky
<br>
<em>Springer, 2017</em>.
<br>
<a href="https://link.springer.com/chapter/10.1007/978-3-319-76375-0_37">chapter</a>
<p>
This chapter outlines a framework for evaluating memristive logic families based on their statefulness, proximity to memory, and computational flexibility, using metrics for latency, energy efficiency, and area, and includes a case study on eight-bit addition to demonstrate the methodology and assess the potential for large-scale data computation.
</p>
</td>
</tr>
</tbody></table>
<table width="100%" align="center" border="0" cellspacing="0" cellpadding="20"><tbody>
<tr>
<td>
<h1>Miscellanea</h1>
</td>
</tr>
</tbody></table>
<table width="100%" align="center" border="0" cellpadding="20"><tbody>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/neurips.png", width=100%>
</div>
</td>
<td width="75%" valign="center">
<b>Area Chair</b>: NeurIPS 2024, NeurIPS 2023, NeurIPS 2022
<br>
<b>Conference & Journal Referee</b>: NeurIPS 2019, HPCA 2018, DATE 2018, VLSI-SoC 2018, ISCAS 2017, ISCAS 2016, CNNA 2016, TCAS-I, TCAS-II, TVLSI, Microelectronics Journal
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/gsi.png", width=100%>
</div>
</td>
<td width="75%" valign="center">
<b>Academia and Teaching</b>
<br>
Graduate PhD Admissions Committee.
<br>
DARE (Diversifying Access to Research in Engineering) Admissions Committee.
<br>
Undergraduate Project Committee.
<br>
Graduate Student Instructor (GSI), <a href="https://people.eecs.berkeley.edu/~jrs/189/">Introduction to Machine Learning (CS 189/289A)</a>.
<br>
Head TA, Circuit Theory (700+ students, 044105).
<br>
Head TA, Electronic Switching Circuits (300+ students, 044147).
<br>
Supervisor of B.Sc. projects, VLSI Lab and Parallel Systems Lab (044167).
<br>
TA, MATLAB (044147).
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="zero">
<img src="images/advisor.jpeg", width=100%>
</div>
</td>
<td width="75%" valign="center">
<b>Advised Students</b>
<br>
<a href="https://www.linkedin.com/in/ruochen99/">Chloe Liu</a> (First employment: graduate student at Stanford).
<br>
<a href="https://www.linkedin.com/in/ian-galbraith-73b052125/">Ian Galbraith</a> (First employment: software engineer at Twilio).
<br>
<a href="https://dfangshuo.github.io/">Fang Shuo Deng</a> (First employment: software engineer at Abnormal Security).
<br>
<a href="http://linkedin.com/in/stav-belogolovsky-400249134">Stav Belogolovsky</a> (First employment: Test and DFT Engineer at Arbe).
<br>
<a href="https://www.linkedin.com/in/amnon-wahle">Amnon Wahle</a> (First employment: Algorithm Research at BeyondMinds).
</td>
</tr>
<tr>
<td style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/awards.webp", width=100%, height="100%">
</div>
</td>
<td width="75%" valign="center">
<b>Awards and Fellowships</b>
<br>
Named Person of the Year in my home city, Shefaraam (45,000 residents), 2022.
<br>
Granted the EB-1 Green Card (the Einstein Visa for Extraordinary Ability), USA, 2021.
<br>
Granted the O-1 Visa for Extraordinary Ability, USA, 2020.
<br>
The Valedictorian Honor (M.Sc.), Technion, 2019.
<br>
Open Gateway Fellowship, UC Berkeley, 2018.
<br>
The William Oldham Fellowship, UC Berkeley, 2018.
<br>
The Valedictorian Honor (B.Sc.), Technion, 2017.
<br>
Dean's scholarship for excellent graduate students, Technion, 2016.
<br>
Full tuition scholarship for M.Sc. studies, Technion, 2016-2018.
<br>
The System Architecture Labs Cluster Prize for outstanding undergraduate projects (received twice), Technion, 2016.
<br>
Excellence award from Apple for outstanding scholastic achievements, Technion, 2016.
<br>
Member of the President's List of highest honors for excellent scholastic achievements in all undergraduate semesters (top 3%), Technion, 2013-2016.
<br>
Full tuition scholarship for B.Sc. studies, Technion, 2013-2016.
</td>
</tr>
<tr>
<td align="center" style="padding:20px;width:25%;vertical-align:middle">
<div class="one">
<img src="images/blogs.png", width=100%, height="100%">
</div>
</td>
<td width="75%" valign="middle">
<b>Blog Posts</b>
<br>
<a href="https://www.anyscale.com/blog/cloud-infrastructure-for-llm-and-generative-ai-applications">Cloud Infrastructure for LLM and Generative AI Applications</a>
<br>
<a href="https://www.anyscale.com/blog/anyscale-endpoints-fast-and-scalable-llm-apis">Anyscale Endpoints Preview: Fast, Cost-Efficient, and Scalable LLM APIs</a>
<br>
<a href="https://www.anyscale.com/blog/autoscaling-clusters-with-ray">Autoscaling clusters with Ray</a>
<br>
<a href="https://medium.com/distributed-computing-with-ray/easy-distributed-scikit-learn-training-with-ray-54ff8b643b33">Easy Distributed Scikit-Learn with Ray</a>
<br>
<a href="https://ameerhajali.medium.com/scale-ml-on-your-local-clusters-with-ray-2469c17bb8c9">Scale ML on Your Local Clusters with Ray</a>
</td>
</tr>
</tbody></table>
<table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
<tr>
<td style="padding:0px">
<br>
<p style="text-align:right;font-size:small;">
<a href="http://jonbarron.info">Website template credits</a>.
</p>
</td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</body>
</html>