RTL Workshop

This GitHub repository summarises the class work done on Physical Design for ASICs (VL508).

Day 0 : Open Source Labs installation

Day 1 : Introduction to Verilog RTL design and Synthesis

Day 2 : Introduction to timing libs, Efficient flop coding styles, Hierarchical vs. Flat

Day 3 : Combinational and Sequential optimizations

Day 4 : Introduction to GLS, blocking vs. non-blocking, Synthesis-Simulation mismatch

Day 5 : Overview of if, case, for loops, and for-generate loops

Day 0

Following are the steps for the installation of necessary tools:

Yosys

Yosys is a framework for Verilog RTL synthesis. It currently has extensive Verilog-2005 support and provides a basic set of synthesis algorithms for various application domains. Selected features and typical applications:

Process almost any synthesizable Verilog-2005 design
Converting Verilog to BLIF / EDIF / BTOR / SMT-LIB / simple RTL Verilog / etc.
Built-in formal methods for checking properties and equivalence
Mapping to ASIC standard cell libraries (in Liberty File Format)
Mapping to Xilinx 7-Series and Lattice iCE40 and ECP5 FPGAs
Foundation and/or front-end for custom flows

Steps to install Yosys:

$ git clone https://github.com/YosysHQ/yosys.git
$ cd yosys
$ sudo apt install make    # only needed if make is not already installed
$ sudo apt-get install build-essential clang bison flex \
    libreadline-dev gawk tcl-dev libffi-dev git \
    graphviz xdot pkg-config python3 libboost-system-dev \
    libboost-python-dev libboost-filesystem-dev zlib1g-dev
$ make config-gcc
$ make
$ sudo make install

Image after Installation:

Icarus Verilog

Icarus Verilog is an implementation of the Verilog hardware description language compiler that generates netlists in the desired format (EDIF). It supports the 1995, 2001 and 2005 versions of the standard, portions of SystemVerilog, and some extensions.
Icarus Verilog is available for Linux, FreeBSD, OpenSolaris, AIX, Microsoft Windows, and Mac OS X. Released under the GNU General Public License, Icarus Verilog is free software.

Step to install iverilog:

sudo apt-get install iverilog

Image after Installation:

GTKWave

GTKWave is a VCD waveform viewer based on the GTK library. This viewer supports VCD and LXT formats for signal dumps.
Waveform dumps are written by the Icarus Verilog runtime program vvp. The user calls the $dumpfile and $dumpvars system tasks in the testbench to enable waveform dumping, and the vvp runtime takes care of the rest. The output is written to the file named in the $dumpfile call. If the $dumpfile call is absent, the file name defaults to dump.vcd or dump.lxt, depending on runtime flags.

Steps to install GTKWave:

sudo apt update
sudo apt install gtkwave

Image after Installation:

OpenSTA

OpenSTA is a gate-level static timing verifier. As a stand-alone executable, it can be used to verify the timing of a design using standard file formats.

Verilog netlist
Liberty library
SDC timing constraints
SDF delay annotation
SPEF parasitics

OpenSTA uses a TCL command interpreter to read the design, specify timing constraints, and print timing reports.

Steps to install OpenSTA:

Follow the build instructions in the GitHub repo: https://github.com/The-OpenROAD-Project/OpenSTA
(install the prerequisites listed there, then build and install OpenSTA with CMake).

Image after installation:

Ngspice

Ngspice is an open-source electronic circuit simulator software that allows engineers, researchers, and hobbyists to simulate and analyze electronic circuits. It is a part of the Spice (Simulation Program with Integrated Circuit Emphasis) family of circuit simulation tools, which have been widely used since the 1970s.

Ngspice is an evolution of the well-known Spice3 program, incorporating additional features and improvements. It is compatible with various operating systems, including Windows, Linux, and macOS. The software is primarily used for simulating analog, digital, and mixed-signal circuits.

Steps to install Ngspice:

After downloading the tarball from https://sourceforge.net/projects/ngspice/files/ to a local directory, unpack it using:
$ tar -zxvf ngspice-40.tar.gz
$ cd ngspice-40
$ mkdir release
$ cd release
$ ../configure  --with-x --with-readline=yes --disable-debug
$ make
$ sudo make install

Image after installation:

Magic

Magic is a popular open-source layout tool used for ASIC (Application-Specific Integrated Circuit) design. It provides capabilities for creating and editing integrated circuit layouts, and is widely used in the semiconductor industry and academic settings for various ASIC design tasks.

Steps to install magic:

$ sudo apt-get install m4
$ sudo apt-get install tcsh
$ sudo apt-get install csh
$ sudo apt-get install libx11-dev
$ sudo apt-get install tcl-dev tk-dev
$ sudo apt-get install libcairo2-dev
$ sudo apt-get install mesa-common-dev libglu1-mesa-dev
$ sudo apt-get install libncurses-dev
$ git clone https://github.com/RTimothyEdwards/magic
$ cd magic
$ ./configure
$ make
$ sudo make install

Image after installation:

OpenLANE

OpenLANE is an open-source ASIC (Application-Specific Integrated Circuit) design flow that aims to automate and standardize the process of taking a digital design from RTL to a manufacturable layout. It was originally developed by Efabless and is built around open-source tools such as OpenROAD, Yosys, Magic, and Netgen.

Key components and features of OpenLANE include:

RTL Synthesis: The flow starts with RTL synthesis, where the RTL code is converted into a gate-level representation using synthesis tools.
Floorplanning: OpenLANE performs automatic floorplanning, which involves arranging the logical blocks and components on the chip's physical layout.
Placement: It automatically places the gates and cells on the chip, optimizing for area, power, and performance.
Clock Tree Synthesis (CTS): OpenLANE generates a clock tree to efficiently distribute the clock signal across the chip.
Routing: The tool performs automatic routing to connect all the elements on the chip while adhering to design rules and constraints.
Static Timing Analysis (STA): OpenLANE performs static timing analysis to verify that the design meets the required timing specifications.
Design Rule Check (DRC) and Layout versus Schematic (LVS) verification: OpenLANE checks the physical layout against manufacturing rules (DRC) and compares the layout to the original schematic (LVS) to ensure consistency.
Configuration and customization: OpenLANE allows users to configure various aspects of the design flow and customize different steps based on specific design requirements.

Steps to install OpenLANE:

sudo apt-get update
sudo apt-get upgrade
sudo apt install -y build-essential python3 python3-venv python3-pip make git

sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt update

sudo apt install docker-ce docker-ce-cli containerd.io

sudo docker run hello-world

sudo groupadd docker
sudo usermod -aG docker $USER
sudo reboot 

# After reboot
docker run hello-world

Image after installation:

Day 1

Introduction

This section mainly focuses on Iverilog, GTKWave, and Yosys. A basic 2x1 mux is also simulated and synthesized.

A simulator is a software tool that simulates the behavior of a digital design described at the RTL level. It allows designers to test and verify the functionality of their designs before actual hardware is fabricated. Simulators take the RTL description and execute it in a software environment, allowing the designer to observe how the design behaves under different conditions and inputs. The simulator looks for changes in the inputs. Upon a change in an input, the output is re-evaluated; if no input changes, the output is not re-evaluated. Icarus Verilog is an open-source RTL simulator that supports Verilog. It's widely used in academia and smaller projects due to its free and open nature.

A test bench is a set of simulation code and associated data that is used to verify the correctness and functionality of a digital design described at the Register Transfer Level (RTL) or other abstraction levels. It serves as a virtual environment in which the design can be tested before it's physically implemented in hardware. The design may have more than one primary input and output, while the test bench doesn't have a primary input or a primary output.

The Iverilog-based simulation flow is shown below:

After simulation, synthesis is required. For this we use a tool called Yosys, which produces a netlist: a representation of the design in terms of standard cells. Commands such as read_verilog, read_liberty, and write_verilog are used in the synthesis process. After synthesis, the netlist is also verified.

A basic synthesis flow is as shown below:

( .lib is explained in the 'Other Relevant Data' section)

The set of primary inputs and primary outputs remains the same in both the RTL design and the netlist, i.e. the same testbench can be used for simulating the RTL and for verifying the netlist.

Verilog codes

We are simulating a simple 2x1 mux using iverilog and GTKWave. The code has been taken from the GitHub repo:
https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git

The above git has been cloned and saved in the local system as shown below.

Simulation: iverilog and GTKwave

The Linux shell commands below compile and run the mux design file together with its test bench. A VCD (value change dump) file is generated and then opened in GTKWave.

iverilog good_mux.v tb_good_mux.v
./a.out
gtkwave tb_good_mux.vcd

Below is a screenshot of the shell commands used to run both .v files (design and test bench):

Below is the GTKwave output for the same:

Synthesis: Yosys

Here we synthesize the basic 2x1 mux that was simulated with iverilog and GTKWave in the sections above.
In the same directory, run yosys from the shell to start the synthesis tool. The commands used are shown below:

yosys> read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
yosys> read_verilog good_mux.v
yosys> synth -top good_mux
yosys> abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib 
yosys> show

To generate the netlist and view the resulting netlist file, the following commands are used:

yosys> write_verilog -noattr good_mux_netlist.v
yosys> !gvim good_mux_netlist.v

The Screenshot below shows how commands read_liberty and read_verilog are done:

The Screenshot below is of the synth -top <module_name> command:

The Screenshot below shows how the command abc -liberty is done:

The Screenshot below shows how the show command is done:

The Figure below is the generated synthesized design:

The Screenshot below shows the 'write_verilog -noattr <name of netlist>' command and the resulting .v file:

Other Relevant data

RTL Design:
RTL stands for "Register Transfer Level," and in the context of digital hardware design, RTL design refers to the process of describing the behavior of a digital circuit or system using a hardware description language (HDL) at the register transfer level. It's a crucial step in designing complex digital systems such as microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and more.

In RTL design, the designer specifies the functionality and behavior of the digital system using a high-level hardware description language like Verilog or VHDL. This description focuses on the flow of data between registers and the operations that take place on that data.

Synthesis:
Synthesis is the process of transforming an RTL description of a digital system into a gate-level netlist that can be physically implemented on hardware platforms. This process involves mapping the logic to standard cells, optimizing for performance, and ensuring timing requirements are met. The design is converted into gates and the connections between those gates are made; the final output file is called a netlist.

What is .lib ?

.lib is a collection of logical modules (standard cells).
It includes basic gates like AND, OR, etc.
There are different flavours (versions) of the same gate:
- Slow
- Medium
- Fast

We need different flavors of gates because combinational delays in logical paths will determine the maximum speed of operation of digital logic circuits.

Why do we need different flavors of gates?

Different flavors of gates are necessary to provide a diverse toolkit for designing and implementing electronic circuits. They cater to various logical functions, optimization requirements, noise considerations, and implementation constraints, enabling the creation of complex and efficient systems.
Combinational delays in the logic path determine the max speed of operation of a digital logic circuit.

Based on the figure shown above, Tclk, Tcq_a, Tcombi, and Tsetup_b are the clock period, the clock-to-Q propagation delay of flop A, the combinational delay, and the setup time of flop B, respectively.
Tclk > Tcq_a + Tcombi + Tsetup_b
One clock period must be long enough to cover the clock-to-Q delay of D flip-flop A, the combinational delay, and the setup time of D flip-flop B.
Tsetup_b is the time for which the data at the input of D flip-flop B must be stable before the clock edge.
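As a worked example of the setup inequality above, the following minimal Python sketch computes the lower bound on the clock period and the corresponding maximum frequency. The delay values are hypothetical numbers chosen only for illustration; they are not taken from the sky130 library.

```python
def min_clock_period(t_cq_a, t_combi, t_setup_b):
    """Lower bound on Tclk (same unit as the inputs) from Tclk > Tcq_a + Tcombi + Tsetup_b."""
    return t_cq_a + t_combi + t_setup_b


def max_frequency_mhz(t_clk_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / t_clk_ns


# Hypothetical delays in nanoseconds (assumed values, for illustration only).
t_min = min_clock_period(t_cq_a=0.5, t_combi=3.0, t_setup_b=0.5)
f_max = max_frequency_mhz(t_min)
print(t_min, f_max)
```

With these assumed delays the clock period has to exceed 4 ns, so the design can run at just under 250 MHz. Reducing Tcombi by using faster cells is what raises this limit.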

There is also a need for slow cells, which raises the question of why.

To ensure there are no hold violations at D flip-flop B, we need certain cells to work slowly.
We need fast cells to meet the required performance, and slow cells to meet hold.

Faster Cells Vs. Slower Cells:

A load in digital logic is a capacitance.
Faster charging or discharging of that capacitance means less delay.
To charge or discharge faster, the transistors must source more current, which means making them wider.
Wider transistors give lower delay, but need more area and more power.
Narrower transistors give more delay, but need less area and less power.

Day 2

Introduction

In this section, we take a closer look at the .lib file, compare hierarchical and flat synthesis, and go through flop coding styles.

Overview on .lib

First, let's open sky130_fd_sc_hd__tt_025C_1v80.lib in the Vim editor.

gvim ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib

The nomenclature of the above .lib file is :

sky - SkyWater
130 - 130 nanometer (nm)
tt - typical process corner
025C - Temperature (25 degrees C)
1v80 - Voltage (1.80 V)
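The naming scheme above can be sketched as a small Python parser that pulls the process, temperature, and voltage fields out of a library file name. The function name parse_corner and the regular expression are illustrative assumptions, as is the 'n' prefix for negative temperatures; this is not a tool shipped with the PDK.

```python
import re


def parse_corner(lib_name):
    """Split a sky130 timing library name into process, temperature and voltage fields."""
    pattern = r"__(?P<process>[a-z]{2})_(?P<temp>n?\d+)C_(?P<volt>\d+)v(?P<frac>\d+)\.lib$"
    m = re.search(pattern, lib_name)
    if m is None:
        raise ValueError(f"unrecognised library name: {lib_name}")
    temp = m.group("temp")
    # Assumption: a leading 'n' marks a negative temperature (e.g. n40 for -40).
    temp_c = -int(temp[1:]) if temp.startswith("n") else int(temp)
    voltage = float(f"{m.group('volt')}.{m.group('frac')}")
    return {"process": m.group("process"), "temp_c": temp_c, "voltage": voltage}


corner = parse_corner("sky130_fd_sc_hd__tt_025C_1v80.lib")
print(corner)
```

For the library used in this workshop the sketch reports a typical process corner at 25 degrees C and 1.8 V.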

When we look at a library, 'Process, Voltage, Temperature' (PVT) determines whether a design will work.

Process is important because of variations during fabrication.
Voltage is important because circuit behaviour varies with the supply voltage.
Temperature is important because semiconductors are very sensitive to it, and the design has to work across a wide range of geographies with different temperatures.

We need to factor in all these conditions when designing, so our libraries model these variations as well.

The figure below shows the library sky130_fd_sc_hd__tt_025C_1v80.lib in the Vim editor:

The figure below shows both the library sky130_fd_sc_hd__tt_025C_1v80.lib and the file sky130_fd_sc_hd.v, which contains the Verilog model of each cell in the library:

The side-by-side figure below shows the details of different flavours of a 2-input AND gate:
Here it is seen that the areas of all three are different. On Day 1 we discussed how cell area relates to delay and power.

Below are some of the Vim commands used:

:syn off "turn off syntax highlighting
:se hls  "highlight search matches
:se nu   "show line numbers
:g//     "list all lines matching the last search (the highlighted cells)
:sp <path to file>    "open another file alongside the current one
:vsp     "open the same file again side by side

Hierarchical Vs. Flat Synthesis

Hierarchical Synthesis:

In hierarchical synthesis, the design of a complex digital circuit is divided into smaller, more manageable modules or blocks. Each module represents a functional unit or a specific sub-task of the overall design. These modules are designed and optimized separately, and then they are integrated into the larger system. The design hierarchy can have multiple levels, with modules containing sub-modules and so on.

Flat Synthesis:

In flat synthesis, the entire digital circuit is synthesized as a single monolithic unit, without breaking it down into smaller modules. This approach is suitable for smaller designs where the complexity doesn't warrant a hierarchical organization.

In this section, we will synthesize the same design in both Hierarchical and Flat to illustrate the difference in the netlist of both.

Hierarchical illustration
(The Figure below shows the schematic diagram of a design named multiple_module.v:)

It is seen that everything is divided into smaller submodules.

(The figure below is the netlist for the hierarchical design:)

Flat illustration
(The figure below shows the flattened-out netlist for the flattened design:)

We write the 'flatten' command just before the write_verilog command to flatten the netlist.

(The figure below shows the schematic diagram of the design).

Submodule Level Synthesis
Why is this done?

When we have multiple instantiations of the same module, we prefer to synthesize it once at the submodule level and reuse the result.
We might also want to use a divide-and-conquer approach: divide the circuit into parts, synthesize each part well, and then combine them to get the best possible design at the top level.

We need to make a small change in the synth command in yosys:

synth -top <sub_module_name>

(Illustrated below:)

(The design diagram for the same is shown below:)

Various Flops and Flop coding styles

Here we are going to look at some questions such as the ones below:

How do we code a flop?
What flops are available?
What are the coding styles for them?

Why do we need to use flops?

Consider the logic diagram given below, consisting of an AND gate and an OR gate.
Each gate has a propagation delay, and because of this the output glitches. This is a serious issue: as the number of combinational circuits increases, the number of glitches also increases.

(In the figure below the glitch caused in the above logic diagram is illustrated in the blue shaded area:)

As mentioned above, more combinational circuits mean more glitches. To stop glitches from propagating, we need to store the data between combinational blocks, and for that we use flops.
(The figure below illustrates the above problem and solution:)

D flip-flops update their output only at the posedge of clk, so the next combinational block sees only a stable input.

How do I code the Flop?

Below are the three different ways in which we can code the flop.

Synchronous & Asynchronous reset
Synchronous reset
Asynchronous reset
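The difference between these reset styles can be sketched with a small behavioural model. The Python below is only an illustration of the intended behaviour, not the Verilog used in the labs; the function names and the clk_edge flag (true when a rising clock edge occurs in that step) are assumptions made for this sketch.

```python
def dff_async_reset(q, d, clk_edge, reset):
    """Asynchronous reset: reset clears q immediately, no clock edge needed."""
    if reset:
        return 0
    return d if clk_edge else q


def dff_sync_reset(q, d, clk_edge, reset):
    """Synchronous reset: reset is only sampled on a clock edge."""
    if not clk_edge:
        return q
    return 0 if reset else d


def dff_async_sync_reset(q, d, clk_edge, async_reset, sync_reset):
    """Both: the asynchronous reset has priority, the synchronous one waits for the edge."""
    if async_reset:
        return 0
    if not clk_edge:
        return q
    return 0 if sync_reset else d


# Reset asserted between clock edges, starting from q = 1.
async_now = dff_async_reset(q=1, d=1, clk_edge=False, reset=1)
sync_now = dff_sync_reset(q=1, d=1, clk_edge=False, reset=1)
sync_at_edge = dff_sync_reset(q=1, d=1, clk_edge=True, reset=1)
print(async_now, sync_now, sync_at_edge)
```

The asynchronous flop clears at once, while the synchronous flop keeps its old value until the next clock edge arrives.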

Lab flop synthesis simulations

Here we are going to simulate D flip-flops with asynchronous reset, asynchronous set, synchronous reset, and combined synchronous & asynchronous reset using Iverilog and GTKWave.

Asynchronous reset:

Here we use the file 'dff_asyncres.v' and its corresponding testbench. Compile and run it with iverilog and view the waveform in GTKWave as shown below.

Below we can see the output waveform of the design.
In this case, at around 550ns, we see that the output q changes only on the clock edge, i.e. q is synchronous with the clock.

Now consider the 1090ns-1100ns range: when async_reset goes high, the output 'q' immediately goes low without waiting for a clock edge. This is called asynchronous reset, as illustrated below.

Asynchronous set:

Here we do the same. The simulation output waveforms are shown below.
In the waveform below, between 500ns and 600ns async_set is low, so the output picks up changes in 'd' at each clock edge.

In the following waveform, when async_set is high the output is set high and does not follow the 'd' input.

Synchronous reset:

The steps for simulation are the same except here we use the dff_syncres.v file and its corresponding test bench.
In the waveform below, in the 500ns-600ns range, when sync_reset is high the output changes only at the clock edge, i.e. the reset takes effect in step with the clock.
As shown below:

Synthesis of the above three designs:

Synthesis diagram for Asynchronous reset:

Synthesis diagram for Asynchronous set:

Synthesis diagram for Synchronous reset:

During synthesis, after the synth -top command in Yosys, we should use the following command to map the flip-flops to the sequential cells available in the library.

dfflibmap -liberty ..<directory of the .lib file>

Optimizations

This section deals with some special cases, in particular two peculiar .v files.
Let's open them in the Vim editor using the following shell command:

gvim mult_*.v -o

Here we are opening two files mult_2.v and mult_8.v.

Let us consider the first one 'mult_2.v' :
The below figure shows the mult_2.v file.

The block diagram below explains the basic functionality of the design:

Being a special case, there is a twist to it.
No extra hardware is needed. In the figure below, we can see the input 'a' and output 'y'.
(The output y is simply 'a' with a zero appended: {a,1'b0}. It is illustrated below.)

(In the below screenshot, we can see there are no hardware components required.)

(The below diagram shows the schematic diagram for the same:)

Let us consider the second one 'mult_8.v' :
(The below figure shows the mult_8.v file)

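Here we are doing a x 9 = y, which can be rewritten as a x (8 + 1) = y.
a x 8 appends three zeros to a, and when a is itself three bits wide, adding a simply fills those zeros: a x 9 = {a,0,0,0} + a ----> {a,a}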
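Both special cases can be checked exhaustively with a few lines of Python. This is a minimal sketch that assumes 'a' is a 3-bit input (which is what makes {a,a} equal to a x 9); the helper concat is a hypothetical stand-in for Verilog's {..} concatenation.

```python
def concat(*fields):
    """Concatenate (value, width) bit fields, most significant first, like Verilog {..}."""
    result = 0
    for value, width in fields:
        result = (result << width) | (value & ((1 << width) - 1))
    return result


checked = 0
for a in range(8):  # every 3-bit value of a (assumed width)
    assert a * 2 == concat((a, 3), (0, 1))  # {a, 1'b0}
    assert a * 9 == concat((a, 3), (a, 3))  # {a, a}
    checked += 1
print("identities hold for", checked, "input values")
```

Since both products are just rewirings of the input bits, the synthesizer needs no multiplier cells at all.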

(In the below screenshot, we can see there are no hardware components required.)

(The below diagram shows the schematic diagram for the same:)

Day 3

Introduction to optimizations

Here we are going to be looking at logic optimizations.

There are two kinds of logic optimizations:

Combinational Logic optimizations
Sequential Logic optimizations

Let us look into those.

Combinational logic optimization:

It is done to get the most optimized design.
The most optimized design is efficient in both area and power.

Below are the two techniques used for the same:
1. Constant Propagation
2. Boolean logic optimizations

Constant propagation

Let's consider Fig: A, which has an output Y. Implementing that circuit with MOS transistors requires six MOSFETs.
If input a is tied low, the whole circuit reduces to Fig: B, which needs only one inverter, i.e. 2 MOSFETs.

Boolean logic optimizations

Here the synthesizer uses either K-maps or the Quine-McCluskey method to find the most optimized logic.
Let us consider the image below:

Here we are implementing y = a?(b?c:(c?a:0)):(!c), which is not optimized.
Expanding it gives y = a'.c' + a.[ b.c + b'.a.c ]. With a = 1 the bracket reduces to c, so y = a'.c' + a.c = ~( a ^ c ).
The synthesis tool does these kinds of optimizations to get the most optimized logic.
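A simplification like this is easy to verify with an exhaustive truth table. The Python sketch below evaluates the nested ternary y = a?(b?c:(c?a:0)):(!c) for all eight input combinations and compares it with the XNOR of a and c; the function names are illustrative only.

```python
from itertools import product


def original(a, b, c):
    """y = a ? (b ? c : (c ? a : 0)) : !c, on single-bit inputs."""
    return (c if b else (a if c else 0)) if a else (1 - c)


def simplified(a, c):
    """y = ~(a ^ c), restricted to one bit."""
    return 1 - (a ^ c)


mismatches = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
              if original(a, b, c) != simplified(a, c)]
print(mismatches)
```

An empty list means the two expressions agree everywhere, and it also shows that input b has no influence on the output.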

Sequential logic optimizations:

There are two types mostly:

Basic : (Sequential constant)
Advanced : (State optimization, Retiming, Cloning)

Sequential Constant

Consider the above figure Fig: A.
If reset is applied, q = 0; if reset is not applied, 'q' is still 0 since it follows 'd' and d = 0.
So q is a constant, and y = 1 always in this case. Effectively, none of the logic in the figure is needed.

Now in another case in the figure below:

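When set is applied q = 1, and when set is not applied q = 0 (since d = 0), so q is not a constant here and the flop cannot be removed.
This can be seen in the timing diagram Fig: C: after set is released, q waits until the next posedge of the clock to go low, so it stays high for part of a clock period.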

State optimization

State optimization in ASIC design is about finding the best trade-offs among performance, power efficiency, area utilization, and other design objectives to create an effective and efficient custom integrated circuit for a particular application.

Re-timing

It is a technique used to optimize the timing performance of a digital circuit by moving registers (flip-flops) to different locations within the circuit without changing its functionality. The primary goal of retiming is to improve the critical path delay, which is the longest path through the logic circuit that determines the maximum operating frequency.

Sequential logic cloning

Sequential logic cloning, also known as flip-flop cloning or state machine cloning, is a technique used to replicate or duplicate certain portions of sequential logic circuits. This technique is employed to improve performance, reduce critical path delays, or optimize power consumption in a design without altering its functional behavior.

Lab Combination logic optimizations

Here we will be doing the labs that illustrate combinational logic optimizations.
We will also be using a Yosys command to purge all unused cells:

opt_clean -purge

LAB 1:

The code above is effectively a 2x1 mux, which can be simplified to a 2-input AND gate.
After synthesis, the opt_clean -purge command removes the unnecessary cells and wires to leave the optimized design.

The schematic diagram is shown below and, as expected, we have a 2-input AND gate.

LAB 2:

Here we synthesize opt_check2.v, in the same way as LAB 1.
We get an optimized design of a 2-input OR gate.
Relevant screenshots are attached below.

LAB 3:

Here we synthesize opt_check3.v, in the same way as the labs above.
Relevant screenshots are attached below.

LAB 4:

Here we synthesize opt_check4.v, in the same way as the labs above.
Relevant screenshots are attached below.

LAB 5:

Here we synthesize multiple_modules_opt.v. It is done the same way as before, but here we have to flatten the design.
Relevant screenshots are attached below.

Lab Sequential logic optimizations

LAB 1:

Here we are going to simulate and synthesize two .v files, 'dff_const1.v' and 'dff_const2.v'.
Below are the two .v files:

The simulations of the same are shown below:

The optimized synthesized diagram of dff_const1.v is shown below and is as expected.

The optimized synthesized diagram of dff_const2.v is shown below.
Here, as the simulation showed, the output is always high regardless of the input and reset.

LAB 2:

Here we are going to simulate and synthesize dff_const3.v .

LAB 3:

Here we are going to simulate and synthesize dff_const4.v .

LAB 4:

Here we are going to simulate and synthesize dff_const5.v .

Sequential optimization for unused outputs

This is a very important optimization technique which can be illustrated by the example below:

First, we are going to synthesize ' counter_opt.v ' and see the synthesized design diagram.

The two bits count[2] and count[1] are unused.
The synthesizer automatically optimizes them away, as shown below, using only one FF instead of three.

If we were using count[2] and count[1] also in the above code:

The synthesizer would use three FF's as shown below:

This optimization is important because, as illustrated, it removes logic that never affects an output, which saves area and power and improves efficiency in general.
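Why one flop is enough can be shown with a short behavioural comparison. The Python sketch below assumes, as described above, that only count[0] drives an output; it compares the least significant bit of a 3-bit up-counter with a single toggling flop. Both functions are illustrative models, not the synthesized netlist.

```python
def full_counter_lsb(cycles):
    """LSB of a 3-bit up-counter, observed after each clock edge."""
    count, seen = 0, []
    for _ in range(cycles):
        count = (count + 1) & 0b111
        seen.append(count & 1)
    return seen


def single_toggle_flop(cycles):
    """One flop whose input is its own inverted output."""
    q, seen = 0, []
    for _ in range(cycles):
        q ^= 1
        seen.append(q)
    return seen


same = full_counter_lsb(16) == single_toggle_flop(16)
print(same)
```

Because count[0] never depends on the upper bits, the two models produce the same sequence and the upper flops can be dropped.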

Day 4

Gate Level Simulations (GLS) and Synthesis level mismatch

What is GLS?

GLS means running the testbench with the netlist as the Design Under Test (DUT).
The netlist is logically the same as the RTL code, so the same testbench can be used.

Why do we use GLS?

To verify logical correctness after synthesis
To ensure the timing of the design is met: for this, GLS needs to be run with delay annotation.

GLS using iverilog is illustrated in the picture below:

If gate-level models are delay annotated then we can use GLS for timing validation.

Synthesis Simulation mismatch

A synthesis-simulation mismatch is a discrepancy between the behaviour observed when simulating the RTL and the behaviour of the netlist produced by synthesis. The simulator follows the Verilog event semantics literally, while the synthesizer infers hardware from the structure of the code, so code that is legal Verilog can simulate one way and synthesize another.

Such mismatches can lead to unexpected problems or failure of the fabricated design. They are caught by running the same testbench on the RTL and on the netlist (GLS), and avoided by following synthesis-friendly coding styles.

There are mainly three ways mismatches occur:

Missing sensitivity list
Blocking vs Non-Blocking assignments
Non-standard Verilog coding

Missing sensitivity list

Recall that a simulator re-evaluates a block only when a signal it is sensitive to changes, and look at the code shown below.

always @(sel)
begin
  if (sel)
    out = i1;
  else
    out = i0;
end

In the above code the always block is evaluated only when 'sel' changes, so changes on i0 or i1 while sel is stable are missed and the simulated output does not behave like a mux.
To resolve this we should use:

always @(*)

Here the always block is evaluated whenever any signal it reads changes. Hence, we get the expected output.
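The effect of the sensitivity list can be imitated with a tiny event-driven model. This Python sketch is only an approximation of how a simulator schedules an always block; the function simulate_mux, the event list, and the initial value of 0 for every signal are assumptions made for illustration.

```python
def simulate_mux(events, sensitivity):
    """Replay (signal, value) events; re-evaluate out only when a listed signal changes."""
    sig = {"sel": 0, "i0": 0, "i1": 0}
    out = 0
    trace = []
    for name, value in events:
        changed = sig[name] != value
        sig[name] = value
        if changed and name in sensitivity:
            out = sig["i1"] if sig["sel"] else sig["i0"]
        trace.append(out)
    return trace


events = [("sel", 1), ("i1", 1), ("i1", 0), ("sel", 0), ("i0", 1)]
only_sel = simulate_mux(events, {"sel"})             # models always @(sel)
all_inputs = simulate_mux(events, {"sel", "i0", "i1"})  # models always @(*)
print(only_sel)
print(all_inputs)
```

With only sel in the list the output never sees the activity on i0 and i1, whereas the full list tracks the selected input like a real mux.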

Blocking & Non-Blocking assignments

Assignments happen inside the always block.

Blocking:

The '=' sign is used for blocking assignments.
The statements are executed one after another, in the order they are written.

Non-Blocking:

The '<=' sign is used for non-blocking assignments.
All the RHS expressions are evaluated when the always block is entered, and only then are the results assigned to the LHS.
The assignments therefore behave as if they were evaluated in parallel.

Caveats with blocking

Let's consider the below codes:

code 1:
if (reset)
  begin
    q0=1'b0;
    q =1'b0;
  end
else
  begin
    q=q0;
    q0=d;
  end

code 2:
if (reset)
  begin
    q0=1'b0;
    q =1'b0;
  end
else
  begin
    q0=d;
    q=q0;
  end

Here there is not much difference between code 1 and code 2, except that the order of the two assignments in the else branch is swapped.
This gives a drastic difference in the resulting circuit, as illustrated by the figure below.

The mismatch is evident here, and for this reason we must use non-blocking assignments, whose result does not depend on statement order.
The key takeaway is to always use non-blocking assignments when writing sequential circuits.
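The order dependence can be reproduced with a small model of one clock edge. In this Python sketch each function represents the else branch of one coding style; the function names, the starting state (0, 0), and the input sequence are assumptions chosen for illustration.

```python
def clock_blocking_q_first(state, d):
    """code 1: q = q0; q0 = d;  (q sees the OLD q0, so two flops in series)."""
    q0, q = state
    q = q0
    q0 = d
    return q0, q


def clock_blocking_q0_first(state, d):
    """code 2: q0 = d; q = q0;  (q sees the NEW q0, so it collapses to one flop)."""
    q0, q = state
    q0 = d
    q = q0
    return q0, q


def clock_nonblocking(state, d):
    """q0 <= d; q <= q0;  (all right-hand sides are read before any update)."""
    q0, q = state
    return d, q0


def run(step, inputs):
    """Apply one clock edge per input value and record q after each edge."""
    state, qs = (0, 0), []
    for d in inputs:
        state = step(state, d)
        qs.append(state[1])
    return qs


inputs = [1, 0, 1, 1, 0]
q_code1 = run(clock_blocking_q_first, inputs)
q_code2 = run(clock_blocking_q0_first, inputs)
q_nonblocking = run(clock_nonblocking, inputs)
print(q_code1, q_code2, q_nonblocking)
```

Code 1 delays d by two clock edges and code 2 by only one, while the non-blocking version gives the two-stage behaviour regardless of how its two statements are ordered.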

Let us consider another example, a combinational circuit this time.

code 1:
always @(*)
begin
  y  =  q0 & c;
  q0 =  a  | b;
end

code 2:
always @(*)
begin
  q0 =  a  | b;
  y  =  q0 & c;
end

For code 1: the old q0 value is used in the first statement, because q0 is updated only afterwards.
For code 2: the new q0 value is used in the second statement.

The catch here is that both codes synthesize to the same circuit, but in simulation code 1 behaves differently because y is computed from the stale q0.
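The stale-value effect in simulation can be sketched as follows. Each call below models one evaluation of the always block after an input change; the initial q0 of 0 and the stimulus are assumptions for illustration (a real simulator would start q0 as unknown).

```python
def eval_y_first(q0_prev, a, b, c):
    """code 1: y = q0 & c; q0 = a | b;  (y uses q0 from the previous evaluation)."""
    y = q0_prev & c
    q0 = a | b
    return q0, y


def eval_q0_first(a, b, c):
    """code 2: q0 = a | b; y = q0 & c;  (y uses the freshly computed q0)."""
    q0 = a | b
    y = q0 & c
    return q0, y


stimulus = [(0, 0, 1), (1, 0, 1), (0, 0, 1), (0, 1, 1)]
q0 = 0
y_code1, y_code2 = [], []
for a, b, c in stimulus:
    q0, y = eval_y_first(q0, a, b, c)
    y_code1.append(y)
    y_code2.append(eval_q0_first(a, b, c)[1])

print(y_code1)
print(y_code2)
```

In this model code 1 lags the inputs by one evaluation, as if a storage element were present, while code 2 always equals (a | b) & c, which is what the synthesized gates compute.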

Because of all these issues, it is very important to check for synthesis-simulation mismatches, and for that we use GLS.

Lab GLS & mismatch

LAB 1

The below-given file is the .v file that we have to simulate and synthesize:

First, we are going to simulate the file with iverilog and GTKwave using the testbench.

Then we are going to synthesize and create a netlist file for the same.

Then we are going to simulate it again with the newly created netlist file, the Verilog models, and the testbench using iverilog and GTKwave.

We will get the following waveform in the GTKwave which matches our previous waveform.

LAB 2

The below-given file is the .v file that we have to simulate and synthesize:

We get a waveform like this, which does not match a 2x1 mux waveform (it is incorrect):

On synthesizing it, we get a normal mux. We create a netlist for it as well.
We see a stark difference between the pre- and post-synthesis waveforms. This is the synthesis-simulation mismatch.

Labs Blocking and Non-Blocking

LAB 1

We are going to see the synthesis-simulation mismatch caused by a blocking statement.
The file shown below is the .v file that we have to simulate and synthesize:

We get a waveform like this, which does not match the design d = ((a|b)&c).

On synthesizing it, we get the required design diagram. A netlist is also created for the same.

We see a stark difference between the pre- and post-synthesis waveforms. This is the synthesis-simulation mismatch caused by the blocking statement.

Day 5

If case constructs

Here we discuss if-else statements, case statements, and their effect on the synthesized hardware.
if statements have the syntax below.

if <condition 1>
  statements
else if <condition 2>
  statements
else
  statements

The equivalent logic diagram is:

There is an issue with if statements:

They can cause inferred latches.
Inferred latches arise when some conditions are left unspecified, e.g. when the else branch is omitted, so the output has to hold its previous value.
Inferred latches are therefore usually a sign of a bad coding style.
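
A minimal sketch of the problem and its fix is given below. The module and port names are illustrative assumptions, not the lab files.

```verilog
// Missing else: y has no assigned value when i0 is 0,
// so a latch is inferred to hold the previous value.
module incomplete_if (input i0, input i1, output reg y);
  always @(*)
  begin
    if (i0)
      y = i1;
  end
endmodule

// With the else branch, y is assigned on every path
// and the logic is purely combinational (a 2x1 mux).
module complete_if (input i0, input i1, input i2, output reg y);
  always @(*)
  begin
    if (i0)
      y = i1;
    else
      y = i2;
  end
endmodule
```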

Exceptional cases

There are some exceptions to the above.
For example, in the case of counters, the else branch can be omitted.
Consider the code below for a counter.

always @(posedge clk, posedge reset)
begin
  if (reset)
    count <= 3'b000;
  else if (en)
    count <= count + 1;
end

Here the missing else is intentional. Because the block is clocked, count is stored in flip-flops rather than latches.
When en is low, the counter simply holds its previous value, which is exactly what a counter needs.

Note 1: Combinational circuits should not have inferred latches

Note 2: if statements and case statements must always be used inside an always block

Variables assigned inside an always block must be declared as reg.

 reg y;
 always @(*)
 begin
   case(sel)
     2'b00: y = <some value 1>;
     2'b01: y = <some value 2>;
   endcase
 end

Caveats with case:

Incomplete case statements result in inferred latches, as in the code above, where the sel values 2'b10 and 2'b11 are not covered.
To avoid this, add a default item at the end of the case statement.

 reg [1:0] sel;
 always @(*)
 begin
   case(sel)
     2'b00:<some code>;
     2'b01:<some code>;
     default:<some code>;
   endcase
  end

Note also that every output must be assigned in every case item.
In the code below, y is not assigned for sel = 2'b01. This partial assignment infers a latch on y.

reg [1:0] sel;
always @(*)
 begin
   case(sel)
     2'b00: begin
             x=a;
             y=b;
            end
     2'b01: begin
             x=c;
            end 
     default: begin
               x=d;
               y=b;
             end
   endcase
  end

To resolve this, assign every output in every case item and avoid partial assignments.
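
One possible corrected version is sketched below. The module name and port list are assumptions, and the value given to y in the 2'b01 item is chosen only for illustration, since the original code leaves it unspecified.

```verilog
// Sketch: every output is assigned in every case item.
module full_case_assign (input a, input b, input c, input d,
                         input [1:0] sel,
                         output reg x, output reg y);
  always @(*)
  begin
    case (sel)
      2'b00: begin
               x = a;
               y = b;
             end
      2'b01: begin
               x = c;
               y = b;   // assumed value; the original leaves y unassigned here
             end
      default: begin
                 x = d;
                 y = b;
               end
    endcase
  end
endmodule
```

An alternative with the same effect is to assign default values to x and y at the top of the always block, before the case statement.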

Note: It is important not to have overlapping case items, i.e. two items that can match the same value of the case expression
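
To show what an overlap looks like, the hypothetical sketch below uses casez, where ? acts as a wildcard bit. The module and signal names are invented for illustration.

```verilog
// Sketch: overlapping versus non-overlapping case items.
module overlap_demo (input i0, input i1, input i2, input i3,
                     input [1:0] sel,
                     output reg y_overlap, output reg y_clean);
  always @(*)
  begin
    casez (sel)
      2'b00: y_overlap = i0;
      2'b01: y_overlap = i1;
      2'b10: y_overlap = i2;
      2'b1?: y_overlap = i3;   // also matches 2'b10: overlaps the item above
    endcase
  end

  always @(*)
  begin
    case (sel)
      2'b00: y_clean = i0;
      2'b01: y_clean = i1;
      2'b10: y_clean = i2;
      2'b11: y_clean = i3;     // each sel value matches exactly one item
    endcase
  end
endmodule
```

Verilog selects the first matching item, so with overlapping items the result depends on the order in which they are written.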

Lab Incomplete If case

LAB 1

Below is the .v file that we are going to simulate. Ideally we expect a mux from the if statement.

In the waveform below, the design behaves like a D latch: y follows i1 while i0 (acting as the enable) is high and holds its value otherwise.

During synthesis, we can see that a D latch is generated instead of a mux.

The figure below shows the synthesized design.

LAB 2

Below is the .v file that we are going to simulate. Ideally we expect two muxes, but as above, the result is quite different.

Below is the waveform of the same.
The output y follows i1 when i0 is high.
If i0 is low and i2 is high, y follows i3.
For the remaining case a latch is inferred and y holds its value, as shown in the waveform:

The inferred D latch is confirmed by the screenshot below.

Below is the synthesized design.

Lab Incomplete Overlap case

LAB 1

Below is the .v file that we are going to simulate. Ideally we expect a mux, but the result is quite different.

Below is the waveform of the incomp_case.v RTL.
Here the output follows the mux logic for select values 00 and 01, but the cases for 10 and 11 are not defined, so a D latch is inferred.

The diagram below shows the synthesized design and, as expected, a D latch is present.

LAB 2

Here we are going to simulate and synthesize code that fixes the above problem.
A latch is not inferred because we use default.
Below is the simulation of the RTL. In contrast to the previous waveform, this is correct.

As shown below, there is no D latch inferred.

LAB 3

Here we are going to simulate and synthesize 'partial_case_assign.v'.

The figure below shows the simulation of the RTL. Due to the partial assignments in some case items, latches will be inferred.
The synthesized design is shown below. As expected, there is a D latch because of the partial assignments.

LAB 4

In this lab, we look at 'bad_case.v', which has a synthesis-simulation mismatch.
In the code below, the case items are ambiguous for sel values '10' and '11', which causes latch-like behavior in the RTL simulation.

Below is the simulation on GTKWave.

After synthesis, we get this design, which has no D latch, so to investigate we simulate the netlist.

The netlist simulation gives the following waveform, which is correct. Since it differs from the RTL simulation, this is a synthesis-simulation mismatch.

For loop and For generate

In this section, we look at the for loop and the generate for loop.
There are two types of looping constructs:

For loop
generate for loop

For loop

The for loop is used to evaluate expressions.
It must always be used inside an always block.
It is not meant for replicating or instantiating hardware.

example:

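input [31:0] inp;   // data inputs (an input port cannot be declared reg)
input [4:0]  sel;   // select, assumed 5 bits wide to index 32 inputs
output reg   y;     // assigned inside the always block, so declared reg
integer i;
always @(*)
begin
  for (i = 0; i < 32; i = i + 1)
  begin
    if (i == sel)
      y = inp[i];   // picks the input bit whose index equals sel
  end
end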

Generate for loop

The generate for loop is used to replicate hardware.
It must be used outside the always block.

example:

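genvar i;
generate
  for (i = 0; i < 3; i = i + 1)
    begin : and_gen   // named block, required by Verilog-2001
      // and is a gate primitive: ports are positional, output first
      and u1 (y[i], a[i], b[i]);
    end
endgenerate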

Labs "For loop" and "For generate"

LAB 1

Below is the RTL code for mux_generate.v.
It is a 4x1 mux that uses a for loop in its logic.

The simulated output waveform is shown below, and it comes out as expected.

After synthesis, we get the design shown below.

We generate a netlist for the same and simulate it. As expected, the output matches the previous waveform.

LAB 2

Below is the RTL code for demux_generate.v.
It is a 1x8 demultiplexer that uses a for loop in its logic.
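
The lab file is not reproduced here. As a rough sketch of the idea, a 1x8 demultiplexer built with a for loop could look like the code below; the module name demux_for and its ports are assumptions and may differ from demux_generate.v.

```verilog
// Hypothetical 1x8 demux using a for loop; not the lab file.
module demux_for (input i, input [2:0] sel, output reg [7:0] y);
  integer k;
  always @(*)
  begin
    y = 8'b0;                  // default: all outputs low, avoids latches
    for (k = 0; k < 8; k = k + 1)
    begin
      if (k == sel)
        y[k] = i;              // route the input to the selected output
    end
  end
endmodule
```

The default assignment at the top of the block ensures every output bit is assigned on every path, so no latch is inferred.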

The simulated output waveform is shown below, and it comes out as expected.

After synthesis, we get the design shown below.

We generate a netlist for the same and simulate it. As expected, the output matches the previous waveform.

LAB 3

Shown below are the RTL codes for a ripple carry adder.
We use a generate for loop to replicate the full adder hardware.
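
The lab files are not reproduced here. A minimal sketch of the same structure is given below, assuming an 8-bit adder; the module names fa_sketch and rca_sketch, the port names and the width are all illustrative and may differ from fa.v and rca.v.

```verilog
// Hypothetical full adder; the lab's fa.v may differ.
module fa_sketch (input a, input b, input cin,
                  output sum, output cout);
  assign {cout, sum} = a + b + cin;
endmodule

// 8-bit ripple carry adder built by replicating fa_sketch.
module rca_sketch (input [7:0] num1, input [7:0] num2,
                   output [8:0] sum);
  wire [8:0] carry;
  assign carry[0] = 1'b0;      // no carry into the first stage

  genvar i;
  generate
    for (i = 0; i < 8; i = i + 1)
    begin : fa_chain
      // carry[i] feeds stage i, carry[i+1] is its carry out
      fa_sketch u_fa (.a(num1[i]), .b(num2[i]), .cin(carry[i]),
                      .sum(sum[i]), .cout(carry[i+1]));
    end
  endgenerate

  assign sum[8] = carry[8];    // final carry becomes the top sum bit
endmodule
```

Each iteration of the generate loop creates one full adder instance, so the hardware is replicated at elaboration time rather than evaluated at run time.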

Below is the simulation of fa.v and rca.v. It comes out as expected.
Below is the synthesized design for the same.
The yosys commands are a bit different here, as we are using two .v files.

read_liberty -lib <dir>     # same as usual
read_verilog fa.v rca.v     # since there are two .v files
synth                       # plain synth, without any other options
abc -liberty <directory>    # same as usual
show rca                    # instead of just show (use "show fa" to see the fa design)
write_verilog rca_net.v     # same as usual

Simulate the netlist file along with the Verilog models and testbench.
We find the output waveform to be the same as the RTL simulation.

Acknowledgement

Kunal Ghosh, Director, VSD Corp. Pvt. Ltd.
Skywater Foundry
ChatGPT (OpenAI)
Kanish R, Colleague, IIIT-B
Alwin Shaju, Colleague, IIIT-B
Madhav Rao, Professor, IIIT-B
Nanditha Rao, Professor, IIIT-B
Manikandan RR, Professor, IIIT-B
Mariam Rakka
