This repository is made to document the coursework done under Kunal Ghosh for ASICs and covers Advanced Physical Design using OpenLane and Sky130.
Day 1: Inception of open-source EDA, OpenLANE and Sky130 PDK
Day 2: Good floorplan vs. Bad floorplan and introduction to library cells
Day 3: Design Library Cell using magic layout and ngspice Characterizations
Day 4: Pre-layout timing analysis and importance of good clock tree
Day 5: Final steps for RTL2GDS using TritonRoute and OpenSTA
Introduction to RISC-V
RISC-V is an open-source instruction set architecture (ISA) for computer processors.
An instruction set architecture defines the set of instructions that a processor can execute and the organization and behaviour of those instructions. RISC-V is unique in that no single company or organization owns it, and it is freely available for anyone to use, modify, and implement without the need for licensing fees or proprietary restrictions.
The RISC-V project began at the University of California, Berkeley in 2010, and it has since gained significant traction in both academia and industry. Its open nature has led to a growing ecosystem of hardware and software developers collaborating to create a wide range of products, from simple embedded devices to high-performance supercomputers.
Application software (apps) and hardware are linked by 'system software'. There are various layers of system software, including major components like the compiler and the assembler.
The compiler translates high-level code such as C and C++ into instructions (assembly language) that can be read by the assembler.
The assembler converts these instructions into binary code which the machine can understand. The instructions act as an interface between the high-level language and the machine language.
The binary is then given to hardware that understands the instructions. This hardware is described in a Hardware Description Language (HDL); this is called the RTL implementation. A netlist is generated from the RTL, and from the netlist a physical design implementation is produced.
See more info at: https://github.com/mrdunker/RISC-V_based_MYTH_IIITB/
Communicating with computers
A QFN-48 package is a type of integrated circuit (IC) package that follows the Quad Flat No-Lead (QFN) format and contains 48 leads or pins. This package is characterized by its flat, square or rectangular shape
with no leads protruding from the sides. Instead, the electrical connections are made through small exposed pads on the bottom surface of the package, which are soldered directly onto the circuit board or PCB
(Printed Circuit Board).
The different components, in a broad view, are given below.
In a QFN-48 package, the chip is attached to the die attach pad, which is the central exposed pad on the bottom surface of the package. This pad provides a mechanical and thermal connection between the chip and
the package. Electrical connections from the chip to the external world are made through the other exposed pads (leads) on the bottom surface of the package.
The chip within a QFN-48 package can vary widely in terms of its function, complexity, and manufacturer. It might be a microcontroller, a memory chip, a sensor, or any other type of integrated circuit designed to
perform specific tasks within an electronic system. The QFN-48 package serves to protect, house, and provide electrical connections for the chip, making it suitable for surface-mount assembly onto a printed
circuit board (PCB) in various electronic devices and applications.
A QFN-48 (Quad Flat No-Lead 48) package typically includes 48 pads, which are the exposed metal areas on the bottom surface of the package. These pads serve as the electrical connections between the integrated circuit (IC) inside the package and the printed circuit board (PCB) on which the QFN-48 package is mounted.
The core of the QFN-48 package is the central and most essential part of the package. It houses the semiconductor die or microchip, which contains the electronic circuitry, transistors, and other components responsible for the device's intended functionality. The core of the QFN-48 package is attached to the die attach pad, which is the central exposed pad on the bottom surface of the package.
The die is the heart of the integrated circuit (IC) and contains the actual electronic components, transistors, and circuitry responsible for the device's functionality. The die itself is where all the electronic magic happens. It contains the logic, memory, or other functional components that define the IC's purpose. The QFN-48 package serves to protect the die, provide electrical connections, and assist in thermal management, making it suitable for surface-mount assembly onto a PCB in various electronic devices and applications.
SoC Design using OpenLane
Designing a digital Application Specific Integrated Circuit (ASIC) requires several elements. They are as follows:
- RTL IPs
- EDA tools
- PDK tools
RTL IP encompasses pre-constructed and pre-validated units of digital logic or functional modules, which are described at the register-transfer level (RTL). RTL serves as a hardware description level that
characterizes the operation of a digital circuit through data transfers between registers and logic operations. RTL IP cores represent reusable components that can be incorporated into more extensive ASIC or FPGA
designs. These cores encompass a range of functions, including processors, memory controllers, communication interfaces, and others. Designers frequently employ RTL IP to streamline the development of intricate
digital systems, thereby conserving time and resources.
Electronic Design Automation (EDA) tools are software applications that streamline the creation and validation of electronic circuits, encompassing ASICs, FPGAs, and other digital systems. These tools span across
multiple phases of the design process, from the initial conceptualization to the ultimate physical realization.
A Process Design Kit (PDK) encompasses a set of resources, including tools, libraries, and documentation, furnished by semiconductor foundries. These resources are designed to empower creators in fashioning ASICs
and other integrated circuits, leveraging the foundry's unique manufacturing processes. PDK tools form an integral component of the PDK bundle, serving various essential functions.
The below photo illustrates the various open-source tools that can be used in designing ASICs.
The simplified flow from RTL to GDSII is shown below.
The steps shown in the above figure to convert RTL to GDSII are as follows:
- Synthesis: Synthesis translates the RTL (Register-Transfer Level) description of a digital circuit into a gate-level netlist, a lower-level and more hardware-oriented description of the same circuit built from the cells of the standard cell library.
- Floor & Power Planning: Floor planning and power planning are essential steps in the design and layout of an Application-Specific Integrated Circuit (ASIC). They involve the physical organization and allocation of resources within the chip to meet performance, power, and area requirements.
  - Chip floor planning: strategically organizing and allocating the available silicon area on a chip to accommodate the various functional blocks.
  - Macro floor planning: the part of chip floor planning that focuses on the organization and placement of large functional blocks, often referred to as macros or IP (Intellectual Property) blocks, within an integrated circuit (IC) design.
  - Power planning: the strategic distribution and management of power supply and ground connections within the chip to ensure proper power delivery, minimize voltage drop, and control power consumption.
- Placement: the process of determining the physical location of the various functional blocks and components on the silicon die of the chip.
  - Global placement: determines the approximate positions of all the functional blocks and components on the chip's silicon die. Global placement sets the initial arrangement of these blocks.
  - Detailed placement: aims to meet stringent design constraints, optimize chip area utilization, and minimize wirelength to ensure the chip's performance, power efficiency, and manufacturability.
- Clock Tree Synthesis: CTS, or clock tree synthesis, involves creating a clock distribution network to guarantee that clock signals reach all sequential elements, like flip-flops, in a synchronized manner. Adequate CTS is essential to uphold timing requirements.
- Routing: the process of creating the physical interconnections or paths that allow electrical signals to flow between various components, such as gates, flip-flops, and memory elements, on a silicon die.
- Signoff: After placement and routing, detailed design rule checking (DRC) and final verification are done to ensure the layout complies with fabrication constraints and meets the specified requirements for timing, area, and power.
OpenLane is a fully automated process, spanning from RTL (Register-Transfer Level) to GDSII (Graphics Data System II), and relies on various components, including OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor, KLayout, and a set of specialized scripts for design exploration and enhancement. This comprehensive flow covers every step of ASIC implementation.
OpenLANE utilises a variety of open-source tools in the execution of the ASIC flow:
- RTL Synthesis & Technology Mapping: yosys, abc
- Floorplan & PDN: init_fp, ioPlacer, pdn and tapcell
- Placement: RePlAce, Resizer, OpenPhySyn & OpenDP
- Static Timing Analysis: OpenSTA
- Clock Tree Synthesis: TritonCTS
- Routing: FastRoute and TritonRoute
- SPEF Extraction: SPEF-Extractor
- DRC Checks, GDSII Streaming out: Magic, KLayout
- LVS check: Netgen
- Circuit validity checker: CVC
More info can be obtained from here
cd OpenLane
make mount
Inside the openlane container
./flow.tcl -interactive
package require openlane 0.9
prep -design picorv32a
run_synthesis
The netlist generated is shown below:
cd OpenLane/designs/picorv32a/runs/RUN_2023.09.10_07.47.37/results/synthesis/
gvim picorv32.v
To view report:
cd OpenLane/designs/picorv32a/runs/RUN_2023.09.10_07.47.37/reports/synthesis/
gvim 1-synthesis.AREA_0.stat.rpt
Flop ratio = (Number of D flip-flops) / (Total number of cells) = 1596 / 10104 ≈ 0.158 (15.8%)
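As a quick cross-check of the flop ratio, the same division can be sketched in a few lines of Python. The function name `flop_ratio` is only an illustrative helper and not part of OpenLane; the two counts are the ones read from the synthesis report above.

```python
def flop_ratio(num_flops: int, total_cells: int) -> float:
    """Return the fraction of cells in the netlist that are D flip-flops."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return num_flops / total_cells


# Counts taken from the synthesis statistics report quoted above.
ratio = flop_ratio(1596, 10104)
print(f"Flop ratio = {ratio:.4f} ({ratio * 100:.2f}%)")
```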
Chip floorplanning and considerations
There are certain factors that we have to take into consideration when doing floorplanning, such as:
- Utilization factor and aspect ratio
- Locations of preplaced cells
- Decoupling capacitors
- Power planning
- Pin placement
The utilization factor, also known as the area utilization factor or chip utilization factor, is a measure of how efficiently the silicon area on a chip is being used for active components (logic gates, memory cells, etc.) compared to the total available area.
A utilisation factor of 1 signifies 100% utilisation, leaving no space for extra cells such as buffers. In practice, the utilisation factor is kept around 0.5-0.6.
Utilisation Factor = (Area occupied by netlist) / (Total area of core)
The aspect ratio in ASIC design is a measure of the core's physical shape, specifically the ratio of its height to its width. It is often used in the context of standard cell libraries and the dimensions of the chip's core area. An aspect ratio of 1 implies that the chip is square; any other value implies a rectangular chip. The aspect ratio is expressed as:
Aspect Ratio = (Height of the core) / (Width of the core)
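The two ratios above can be tried out on a toy example. The sketch below is only an illustration: the function names and the core dimensions are made-up values chosen for easy arithmetic, and the aspect ratio follows the height-over-width formula given above.

```python
def utilisation_factor(netlist_area: float, core_area: float) -> float:
    """Area occupied by the netlist divided by the total core area."""
    return netlist_area / core_area


def aspect_ratio(core_height: float, core_width: float) -> float:
    """Height of the core divided by its width."""
    return core_height / core_width


# Hypothetical core: 4 units wide, 2 units high, holding cells that cover 4 square units.
core_width, core_height = 4.0, 2.0
netlist_area = 4.0

uf = utilisation_factor(netlist_area, core_width * core_height)
ar = aspect_ratio(core_height, core_width)
print(f"Utilisation factor = {uf}, aspect ratio = {ar}")
```

With these numbers half of the core is left free for buffers and routing, and the core is a rectangle rather than a square.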
Preplaced cells, also known as predefined cells, are a category of components used in Application-Specific Integrated Circuit (ASIC) and digital integrated circuit design. Unlike standard cells, which are typically placed and routed automatically during the design process, preplaced cells are fixed or manually placed at specific locations on the chip's layout by the designer. Preplaced cells are IPs comprising large combinational logic which, once placed, maintain a fixed position.
The preplaced cells mentioned above must be surrounded by decoupling capacitors, since the impedance of long wires can cause the supply voltage to drop significantly before it reaches the logic circuit, leaving the signal outside the noise margin range.
Decoupling capacitors are large capacitors that are charged to the power supply voltage and kept close to the logic circuit. They decouple the logic circuit from the power supply by providing an adequate amount of current to the circuit, and they prevent cross-talk. Unlike preplaced macros, each block on the chip cannot have its own decoupling capacitor. Power planning ensures that each block has its own VDD and VSS pads, with the power and ground lines forming a mesh.
The space between the core and the chip is allocated for the placement of pins. The connectivity data encoded in either VHDL or Verilog is employed to decide the location of I/O pads for different pins. Subsequently, a logical placement is carried out for pre-placed macros to clearly distinguish that region from the pin area.
After synthesis we run the picorv32a floorplan using the command below:
run_floorplan
For viewing the floorplan we are using the tool magic.
We should move into the directory 'results/floorplan' and use the below command.
magic -T /home/emil/.volare/sky130A/libs.tech/magic/sky130A.tech lef read ../../tmp/merged.nom.lef def read picorv32.def &
Here we need to specify the sky130A.tech file directory as well.
Basic Magic shortcuts:
- Press 'Z' on the keyboard to zoom in.
- Press 'V' to center (zoom out fully).
- Hover over an element and press 'S' to select it.
- After selecting, type 'what' in the console window to view its details.
Library binding and Placement
In this step of the OpenLANE ASIC flow, the synthesized netlist is placed on the floorplan. Placement occurs in two stages:
- Global Placement
- Detailed Placement
Global placement finds an approximately optimal position for every cell; at this stage the positions may not be legal and cells may overlap.
Detailed placement then adjusts these positions to make the placement legal. This step is important from a timing point
of view.
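To make the idea of turning an overlapping global placement into a legal one concrete, here is a toy one-row legalizer. It is only a sketch under simplifying assumptions (a single row, fixed-width placement sites, cells pushed to the right when they collide) and is not the algorithm used by OpenDP; the cell names, coordinates and the helper `legalize_row` are hypothetical.

```python
def legalize_row(cells, site_width):
    """Snap cells to placement sites and remove overlaps in a single row.

    cells: list of (name, x_position, width_in_sites) from a global placement.
    Returns a dict mapping each cell name to its legal x position.
    """
    placed = {}
    next_free_site = 0  # index of the first site not yet occupied
    for name, x, width in sorted(cells, key=lambda cell: cell[1]):
        # Snap to the nearest site, but never on top of an already placed cell.
        site = max(round(x / site_width), next_free_site)
        placed[name] = site * site_width
        next_free_site = site + width
    return placed


# Hypothetical global placement result: u2 and u3 overlap u1 and each other.
global_result = [("u1", 0.40, 3), ("u2", 1.10, 2), ("u3", 1.30, 3)]
legal = legalize_row(global_result, site_width=0.46)
for name, x in legal.items():
    print(f"{name}: x = {x:.2f}")
```

After this pass every cell starts on a site boundary and no two cells share a site, which is the property that detailed placement has to guarantee.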
Here we are going to run placement and view the new layout on magic.
We are going to use the below command to run placement, in OpenLANE.
run_placement
After which we change directory to results/placement.
Inside the directory we run the following command for executing magic.
magic -T /home/emil/.volare/sky130A/libs.tech/magic/sky130A.tech lef read ../../tmp/merged.nom.lef def read picorv32.def &
Cell Design and Characteristic Flow
The standard cell design flow in ASIC involves iterative processes, and each step must be carefully executed to ensure a successful design that meets the specified requirements within the constraints of the target technology node.
The standard cell design flow involves the following:
- Inputs: Process Design Kits (PDKs), Design Rule Checking (DRC) and Layout vs. Schematic (LVS) rules, SPICE models, libraries, and user-defined specifications.
- Design steps: circuit design, layout design (art of layout: Euler's path and stick diagram), extraction of parasitics, and characterization (timing, noise, power).
- Outputs: CDL (circuit description language), LEF, GDSII, extracted SPICE netlist (.cir), and timing, noise and power .lib files.
The industry-standard process for characterizing standard cells typically consists of the following stages:
1. Read in the models and tech files
2. Read the extracted SPICE netlist
3. Recognise the behavior of the cells
4. Read the subcircuits
5. Attach power sources
6. Apply stimulus to the characterization setup
7. Provide the necessary output capacitance loads
8. Provide the necessary simulation commands
For characterization, an open-source software called GUNA is used.
All the steps from 1 to 8 are fed into GUNA, which in turn generates timing, noise and power models.
Timing characterization parameters
Timing characterization is the process of assessing and quantifying the timing behavior of digital logic elements, such as standard cells or custom-designed blocks, within an integrated circuit. It is a crucial step to ensure that the ASIC operates correctly and meets the required performance specifications.
Timing definition | Value |
---|---|
slew_low_rise_thr | 20% value |
slew_high_rise_thr | 80% value |
slew_low_fall_thr | 20% value |
slew_high_fall_thr | 80% value |
in_rise_thr | 50% value |
in_fall_thr | 50% value |
out_rise_thr | 50% value |
out_fall_thr | 50% value |
The delay is the time difference between the moment the changing input reaches 50% of its final level and the moment the output reaches 50% of its final level. Selecting inappropriate threshold values can result in negative delay values. Even when appropriate threshold values are chosen, the delay can occasionally be either positive or negative, depending on the quality of the signal transition (slew).
Propagation delay = time(out_fall_thr) - time(in_rise_thr)
The interval required for the signal to transition between its states is referred to as the transition time. This time span is typically measured by observing the signal's shift from 10% to 90% or 20% to 80% of its signal levels.
Rise transition time = time(slew_high_rise_thr) - time(slew_low_rise_thr)
Fall transition time = time(slew_low_fall_thr) - time(slew_high_fall_thr)
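The threshold definitions in the table can be applied to a sampled waveform by interpolating the time at which each threshold is crossed. The sketch below assumes idealized, made-up ramp waveforms for an inverter (input rising, output falling) and a hypothetical helper `crossing_time`; it is not how a characterization tool is implemented, only an illustration of the definitions.

```python
def crossing_time(times, values, level, rising):
    """Return the linearly interpolated time at which the waveform crosses `level`."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        crosses_up = rising and v0 < level <= v1
        crosses_down = (not rising) and v0 > level >= v1
        if crosses_up or crosses_down:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


VDD = 1.0  # hypothetical supply voltage

# Made-up piecewise-linear waveforms (time in ns, voltage in V).
t_in, v_in = [0.0, 1.0, 2.0, 3.0], [0.0, 0.0, VDD, VDD]      # input rises
t_out, v_out = [0.0, 1.5, 2.5, 3.0], [VDD, VDD, 0.0, 0.0]    # output falls

# Propagation delay: 50% of the falling output minus 50% of the rising input.
t_in_50 = crossing_time(t_in, v_in, 0.5 * VDD, rising=True)
t_out_50 = crossing_time(t_out, v_out, 0.5 * VDD, rising=False)
propagation_delay = t_out_50 - t_in_50

# Fall transition: time between the 80% and 20% crossings of the falling output.
t_out_80 = crossing_time(t_out, v_out, 0.8 * VDD, rising=False)
t_out_20 = crossing_time(t_out, v_out, 0.2 * VDD, rising=False)
fall_transition = t_out_20 - t_out_80

print(f"Propagation delay = {propagation_delay:.3f} ns")
print(f"Fall transition   = {fall_transition:.3f} ns")
```

On a falling edge the 80% level is crossed before the 20% level, so the later crossing is subtracted from the earlier one in reverse order to obtain a positive duration.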
CMOS inverter using ngspice simulations
NGSpice is an open-source electronic circuit simulator software used for analog, digital, and mixed-signal electronic circuit simulation. It is part of the larger family of SPICE (Simulation Program with Integrated Circuit Emphasis) simulators.
- PnR is an iterative flow, so we can change the environment variables on the fly to observe the changes in our design.
- To change the pin configuration along the core from random placement to another arrangement, we use the below command in the OpenLane interactive window:
set ::env(FP_IO_MODE) 2
A SPICE deck includes information about the following:
- Model description
- Netlist description
- Component connectivity
- Component values
- Capacitance load
- Nodes
- Simulation type and parameters
- Libraries included
Before doing a SPICE simulation, we need to create a SPICE deck, which provides information such as:
- Component connectivity - connectivity of Vdd, Vss, Vin and the substrate. The substrate tunes the threshold voltage of the MOS.
- Component values - values of PMOS and NMOS, output load, input gate voltage, supply voltage.
- Node identification
- Simulation commands
- Model file - this file has information regarding the NMOS and PMOS parameters of a particular technology.
In the below figures we can see the variation of waveforms when parameters are varied.
The switching threshold (Vm) is the point at which Vin = Vout on the DC transfer characteristic.
Here, both transistors are in the saturation region, meaning both are in the ON condition, and
there is a high chance of leakage current. Leakage current is the current which may flow directly
from VDD to GND.
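The Vin = Vout point described above can be located numerically from a DC sweep by finding where Vout - Vin changes sign and interpolating. The sweep data below is made up for illustration (it is not an ngspice result), and `switching_threshold` is a hypothetical helper name.

```python
def switching_threshold(vin, vout):
    """Return the input voltage at which Vout equals Vin, using linear interpolation."""
    for i in range(len(vin) - 1):
        d0 = vout[i] - vin[i]          # positive while the output is above the input
        d1 = vout[i + 1] - vin[i + 1]
        if d0 == 0:
            return vin[i]
        if d0 > 0 >= d1:
            # The sign change lies in this interval; interpolate to the zero crossing.
            frac = d0 / (d0 - d1)
            return vin[i] + frac * (vin[i + 1] - vin[i])
    raise ValueError("Vout never crosses Vin in the given sweep")


# Hypothetical DC transfer curve of an inverter with a 1.8 V supply.
vin = [0.0, 0.6, 0.8, 1.0, 1.2, 1.8]
vout = [1.8, 1.75, 1.5, 0.3, 0.05, 0.0]

vm = switching_threshold(vin, vout)
print(f"Switching threshold Vm = {vm:.2f} V")
```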
Through transient analysis, we calculate the rise and fall delays of the CMOS inverter by SPICE simulation.
We first clone the mag files and SPICE models of the inverter, PMOS and NMOS for sky130 using the GitHub link below.
Cloning is done inside the openlane folder.
git clone https://github.com/nickson-jose/vsdstdcelldesign.git
After cloning we are required to copy also the tech file into vsdstdcelldesign directory.
Then we run the magic command as shown below to get the layout.
magic -T sky130A.tech sky130_inv.mag &
Inception of Layout and CMOS Fabrication Process
16-Mask CMOS Fabrication encompasses several critical phases for crafting integrated circuits.
- Substrate Selection.
This is the first phase of the process, where the substrate is chosen. Here we choose a p-substrate.
- Active region creation.
To isolate the active regions for the transistors, the process begins with the deposition of SiO2 and Si3N4 layers, followed by photolithography and silicon nitride etching. Oxide is then grown in the exposed regions, a process known as LOCOS (Local Oxidation of Silicon). The Si3N4 layer is removed using hot phosphoric acid (H3PO4).
- N-Well and P-Well Formation.
The N-well and P-well regions are created separately: ion implantation with boron for the P-well and with phosphorus for the N-well. A high-temperature furnace step then drives in the dopants to establish the well depths; this is known as the tub process.
- Gate Formation.
The gate is a very important CMOS transistor terminal that controls transistor switching, and its formation sets the threshold voltage. The NMOS and PMOS gates are formed using photolithography techniques. Important parameters for gate formation include oxide capacitance and doping concentration.
- Lightly doped drain (LDD).
LDD regions are formed to avoid the hot electron effect.
- Source and Drain Formation.
A screen oxide is added to avoid channelling during implants, followed by arsenic implantation and high-temperature annealing.
- Local Interconnect Formation.
The screen oxide is removed by HF etching and Ti is deposited for low-resistance contacts. Heat treatment results in chemical reactions that produce low-resistance titanium silicide (TiSi2) for the interconnect contacts and titanium nitride (TiN) for top-level connections, enabling local communication.
- Higher Level Metal Formation.
Silicon oxide doped with boron or phosphorus is deposited and planarized using Chemical Mechanical Polishing (CMP). This is followed by TiN and tungsten deposition. An aluminum (Al) layer is then added and subjected to photolithography and CMP. This forms the first interconnect level, and additional interconnect layers can be added on top to reach higher metal levels.
At the end, a dielectric layer, usually Si3N4, is added on top to protect the chip.
We can see the layers which are required for CMOS inverter. We also see that the drains of both PMOS and NMOS are connected together.
NMOS source connected to ground(VGND), PMOS source is connected to VDD(VPWR).
In Sky130 the first layer is called the local interconnect layer or Locali.
The below screenshot shows the highlighted part in the layout and the same is shown in the tkcon window.
LEF (Library Exchange Format) is a format that tells us about the boundaries of a cell and its VDD and GND lines. It contains no information about the logic of the circuit.
Tech LEF - has information about the metal layers, DRC rules, etc.
Macro LEF - contains physical information about the cell such as size, pins and direction.
To extract the SPICE netlist, we open the tkcon window.
Type 'pwd' to check the directory we are extracting to.
The command 'extract all' is used to extract the layout into that directory.
To create a SPICE file from the .ext file, the commands are:
ext2spice cthresh 0 rthresh 0 // nothing is created in the directory with this command
This sets the thresholds so that the parasitic capacitances are extracted.
To create the file in the directory, we use the below command.
ext2spice
Sky130 Tech file LABS
Here we open the created SPICE file, make changes to it, and simulate.
The SPICE file defines the NMOS and PMOS model details along with the subcircuit details and the parasitic capacitance information.
We are going to do a transient analysis, so we make the following changes to it.
- Connect VGND to VSS at 0 V.
- Connect the supply voltage from VPWR to GND.
- Apply a pulse at the input.
- Add the library files and change the scale to 0.01u.
- Add a transient analysis with the necessary stop time and precision as shown below.
Now that the SPICE deck is ready, we run the simulation using ngspice.
ngspice sky130_inv.spice
To plot the graph using ngspice we are using the below code after opening ngspice.
plot y vs time a
The below waveform is plotted hence.
The spikes shown in the output (red) are caused by the low load capacitance. We can increase the capacitance value to fix this.
There are four timing parameters used to characterize the inverter standard cell:
- Rise transition: time taken for the output to rise from 20% to 80% of its maximum value
- Fall transition: time taken for the output to fall from 80% to 20% of its maximum value
- Cell rise delay: difference between time(50% output rise) and time(50% input fall)
- Cell fall delay: difference between time(50% output fall) and time(50% input rise)
In the ngspice waveform we can note down the values and calculate the above parameters.
Rise transition: 2.240 - 2.143 = 0.097ns (97ps)
Fall Transition: 4.0921 - 4.049 = 0.0431ns (43.1ps)
Cell Rise Delay : 2.17333 - 2.13 = 0.0433ns (43.33ps)
Cell Fall Delay : 4.076 - 4.0501 = 0.0259ns (25.9ps)
Here the following are done:
- In-depth overview of Magic DRC engine
- Introduction to Google/Skywater DRC rules
- Lab to warm up : Fixing a simple rule error
- Lab of main exercise : Fixing or creating a complex error
To know anything about magic use the following link:
http://opencircuitdesign.com/magic/
In particular, check out the Magic tutorials and the Magic command summary in the Using Magic tab.
Also check out the technology file manual in the Technology Files tab.
To view the documentation of Skywater pdks use the link below:
https://skywater-pdk.readthedocs.io/en/main/
We can view the rules associated with it there.
We are downloading the packaged files to our local pc using the wget command. It stands for Web get . The following command is used.
wget http://opencircuitdesign.com/open_pdks/archive/drc_tests.tgz
After this, extract it using the below command.
tar xfz drc_tests.tgz
Once this is done, a drc_tests folder is created in the directory where the extraction was done.
cd to that folder and run Magic. For better graphics, the command below is used:
magic -d XR
To load a .mag file, we can use File > Open > .mag from the Magic window.
Or we can use the terminal command:
magic -d XR <filename>.mag
Select a particular block and use drc why to see which DRC rule it violates.
We will use the following command in the tkcon window to see the metal cuts.
cif see VIA2
Here we will load the poly.mag file into Magic.
Now we find the error by moving the cursor over it and checking the box area. We find that poly.9 is violated because of the spacing between polyres and poly. We need to fix this.
The spacing between polysilicon and polyres should be 22u, but the layout shows around 17u without reporting an error. So we go to the sky130 tech file and modify it as below.
after line
*******************************************************
spacing npres *nsd 480 touching_illegal \
"poly.resistor spacing to N-tap < %d (poly.9)"
*******************************************************
we add one more line
*******************************************************
spacing npres allpolynonres 480 touching_illegal \
"poly.resistor spacing to N-tap < %d (poly.9)"
after line
*******************************************************
spacing xhrpoly,uhrpoly,xpc alldiff 480 touching_illegal \
"xhrpoly/uhrpoly resistor spacing to diffusion < %d (poly.9)"
*******************************************************
we add one more line
*******************************************************
spacing xhrpoly,uhrpoly,xpc allpolynonres 480 touching_illegal \
"xhrpoly/uhrpoly resistor spacing to diffusion < %d (poly.9)"
We then load the new tech file in the tkcon window and do a DRC check.
Here we use Magic to load nwell.mag and try to describe the DRC error as a geometrical construct.
In the sky130 tech file we see a templayer for the dnwell. This is a supporting layer that helps the output layer produce a proper output.
After loading the nwell.mag we are going to run the following commands and see the output result for the same.
cif ostyle drc
cif see dnwell_shrink
feed clear
cif see nwell_missing
feed clear
It is important to know that the cif generations are very resource intensive, so they may slow down or even crash Magic. It is therefore best to use general DRC rules whenever possible and put the cif outputs in a separate style variant which runs on demand.
DRC fast : intended for back-end metal layers without checking the layers below.
DRC full : checks the full layout, provided it is relatively small.
cif drcs are a set of rules that check layers exactly as they appear. There are several of these, of which cifmaxwidth with a width of 0 is the most convenient one to use.
Timing modelling using delay tables
A track is a path along which metal layers are drawn for routing. Tracks are used to define the height of the standard cell.
Guidelines to be followed while making a standard cell:
- Input and output ports must lie on the intersection of horizontal and vertical tracks.
- The width of the standard cell must be an odd multiple of the horizontal track pitch, and the height an odd multiple of the vertical track pitch.
The information needed to set up the grid is defined in tracks.info.
Change to its directory and open the file.
cd .volare/sky130A/libs.tech/openlane/sky130_fd_sc_hd/
gvim tracks.info
The content of the file are:
li1 X 0.23 0.46 // offset 0.23um, horizontal pitch 0.46um
li1 Y 0.17 0.34 // offset 0.17um, vertical pitch 0.34um
met1 X 0.17 0.34
met1 Y 0.17 0.34
met2 X 0.23 0.46
met2 Y 0.23 0.46
met3 X 0.34 0.68
met3 Y 0.34 0.68
met4 X 0.46 0.92
met4 Y 0.46 0.92
met5 X 1.70 3.40
met5 Y 1.70 3.40
We input the below command in the tkcon window to get the grid in Magic.
grid 0.46um 0.34um 0.23um 0.17um
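The grid command above follows directly from the li1 entries of tracks.info: the two pitches come first, then the two offsets. A small sketch of that mapping is shown below; it assumes each line of the file has the form "layer direction offset pitch", and the helper names `parse_tracks` and `magic_grid_command` are illustrative only.

```python
# First lines of tracks.info, as listed above (layer, direction, offset, pitch).
TRACKS_INFO = """\
li1 X 0.23 0.46
li1 Y 0.17 0.34
met1 X 0.17 0.34
met1 Y 0.17 0.34
"""


def parse_tracks(text):
    """Parse tracks.info text into {(layer, direction): (offset, pitch)}."""
    tracks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        layer, direction, offset, pitch = fields
        tracks[(layer, direction)] = (float(offset), float(pitch))
    return tracks


def magic_grid_command(tracks, layer="li1"):
    """Build the Magic grid command: x pitch, y pitch, x offset, y offset."""
    x_offset, x_pitch = tracks[(layer, "X")]
    y_offset, y_pitch = tracks[(layer, "Y")]
    return f"grid {x_pitch}um {y_pitch}um {x_offset}um {y_offset}um"


command = magic_grid_command(parse_tracks(TRACKS_INFO))
print(command)
```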
After the layout is made, we need to extract the LEF file for the cell. Before that, certain properties and definitions have to be set on the pins of the cell to aid the placer and router tools. For LEF files, a cell that contains ports is written as a macro cell, and the ports are the declared PINs of the macro.
Defining port and setting correct class and use attributes to each port is the first step.
We highlight the port that we want to define in magic, Then Edit > Text and change values as below.
For each layer (to be turned into port), make a box on that particular layer and input a label name along with a sticky label of the layer name with which the port needs to be associated. Ensure the Port enable checkbox is checked and default checkbox is unchecked as shown in the figure:
The same needs to be done for the VPWR and VGND except the Attached to layer must be changed to metal1.
Before the LEF of the CMOS inverter standard cell is extracted, the purpose of each port must be defined.
Port A:
port class input
port use signal
Port Y:
port class output
port use signal
VPWR area:
port class inout
port use power
VGND area:
port class inout
port use ground
The below command is also run on tkcon for extracting LEF file into the same directory.
lef write // if no name is specified it will be the same name as mag file
This creates the file below:
VERSION 5.7 ;
NOWIREEXTENSIONATPIN ON ;
DIVIDERCHAR "/" ;
BUSBITCHARS "[]" ;
MACRO sky130_vsdinv
CLASS CORE ;
FOREIGN sky130_vsdinv ;
ORIGIN 0.000 0.000 ;
SIZE 1.380 BY 2.720 ;
SITE unithd ;
PIN A
DIRECTION INPUT ;
USE SIGNAL ;
ANTENNAGATEAREA 0.165600 ;
PORT
LAYER li1 ;
RECT 0.060 1.180 0.510 1.690 ;
END
END A
PIN Y
DIRECTION OUTPUT ;
USE SIGNAL ;
ANTENNADIFFAREA 0.287800 ;
PORT
LAYER li1 ;
RECT 0.760 1.960 1.100 2.330 ;
RECT 0.880 1.690 1.050 1.960 ;
RECT 0.880 1.180 1.330 1.690 ;
RECT 0.880 0.760 1.050 1.180 ;
RECT 0.780 0.410 1.130 0.760 ;
END
END Y
PIN VPWR
DIRECTION INOUT ;
USE POWER ;
PORT
LAYER nwell ;
RECT -0.200 1.140 1.570 3.040 ;
LAYER li1 ;
RECT -0.200 2.580 1.430 2.900 ;
RECT 0.180 2.330 0.350 2.580 ;
RECT 0.100 1.970 0.440 2.330 ;
LAYER mcon ;
RECT 0.230 2.640 0.400 2.810 ;
RECT 1.000 2.650 1.170 2.820 ;
LAYER met1 ;
RECT -0.200 2.480 1.570 2.960 ;
END
END VPWR
PIN VGND
DIRECTION INOUT ;
USE GROUND ;
PORT
LAYER li1 ;
RECT 0.100 0.410 0.450 0.760 ;
RECT 0.150 0.210 0.380 0.410 ;
RECT 0.000 -0.150 1.460 0.210 ;
LAYER mcon ;
RECT 0.210 -0.090 0.380 0.080 ;
RECT 1.050 -0.090 1.220 0.080 ;
LAYER met1 ;
RECT -0.110 -0.240 1.570 0.240 ;
END
END VGND
END sky130_vsdinv
END LIBRARY
To include the new standard cell in synthesis, copy the generated LEF file to the /designs/picorv32a/src directory. The sky130_fd_sc_hd__typical.lib, sky130_fd_sc_hd__fast.lib and sky130_fd_sc_hd__slow.lib files from the vsdstdcelldesign/libs directory must also be copied to the designs/picorv32a/src directory.
Now modify the config.json file as shown below.
"PL_RANDOM_GLB_PLACEMENT": 1,
"PL_TARGET_DENSITY": 0.5,
"FP_SIZING": "relative",
"LIB_SYNTH":"dir::src/sky130_fd_sc_hd__typical.lib",
"LIB_FASTEST":"dir::src/sky130_fd_sc_hd__fast.lib",
"LIB_SLOWEST":"dir::src/sky130_fd_sc_hd__slow.lib",
"LIB_TYPICAL":"dir::src/sky130_fd_sc_hd__typical.lib",
"TEST_EXTERNAL_GLOB":"dir::../picorv32a/src/*",
"SYNTH_DRIVING_CELL":"sky130_vsdinv"
Now we invoke OpenLANE as usual to integrate the standard cell in the OpenLANE flow.
Use the following commands in openlane.
prep -design picorv32a
set lefs [glob $::env(DESIGN_DIR)/src/*.lef]
add_lefs -src $lefs
run_synthesis
This is followed by floorplan and placement.
run_floorplan
run_placement
To check the layout invoke magic from the directory:
/runs/RUN_2023.09.16_11.41.17/
magic -T /home/emil/.volare/sky130A/libs.tech/magic/sky130A.tech lef read ../../tmp/merged.nom.lef def read picorv32.def &
Below is the output obtained in magic:
Delay is a parameter that has a huge impact on the cells in a design, because it drives every other timing figure. For cells of different sizes and threshold voltages, a delay model table is created, also called a timing table. The delay of a cell depends on its input transition and its output load. Consider two scenarios. In the first, a cell (X1) sits at the end of a long wire: its input transition is degraded by the resistance and capacitance of the wire, so its delay is larger. In the second, the same cell sits at the end of a short wire: the input transition is much sharper, so its delay is smaller. Even though both are the same cell, the delay changes with the input transition. The same holds for the output load.
VLSI engineers have figured out some important rules for adding signal boosters to make sure
the signals stay strong. They've noticed that these boosters need to be a certain size, but
their speed can change depending on how much work they have to do. To deal with this, they came
up with the idea of "delay tables." These tables are like charts that show how fast the
boosters work based on the signal's starting speed and how much work they need to handle. These
charts help plan how the design should work.
In order to avoid large skew between endpoints of a clock tree:
- Buffers on the same level must have same capacitive load to ensure same timing delay or latency on the same level.
- Buffers on the same level must also be the same size (different buffer sizes -> different W/L ratio -> different resistance -> different RC constant -> different delay)
Buffers at various levels may have varying sizes and capacitive loads. However, if buffers at
the same level share the same size and load, the total delay for each path in the clock tree
will remain consistent, keeping the skew at zero. This implies that different levels will
exhibit differences in input transitions and output capacitive loads, leading to varying delays.
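The effect of balancing each level can be checked with a little arithmetic. The sketch below models each root-to-leaf path as a list of per-level buffer delays and computes skew as the difference between the largest and smallest latency. The delay values are made up for illustration; in a real flow they would come from the liberty delay tables.

```python
def path_latency(buffer_delays):
    """Clock latency of one root-to-leaf path: the sum of its buffer delays (ns)."""
    return sum(buffer_delays)


def skew(paths):
    """Skew: largest leaf latency minus smallest leaf latency (ns)."""
    latencies = [path_latency(p) for p in paths]
    return max(latencies) - min(latencies)


# Balanced tree: each level uses one buffer size and one load, so every path
# sees the same (level-1 delay, level-2 delay) pair. Values are hypothetical.
balanced = [[0.25, 0.5], [0.25, 0.5], [0.25, 0.5], [0.25, 0.5]]

# Unbalanced tree: one level-2 buffer drives a heavier load and is slower.
unbalanced = [[0.25, 0.5], [0.25, 0.5], [0.25, 0.75], [0.25, 0.5]]

print(skew(balanced))    # 0.0
print(skew(unbalanced))  # 0.25
```

Note that the two levels have different delays (0.25 ns and 0.5 ns) in the balanced case, yet the skew is still zero because every path crosses the same sequence of levels.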
Delay tables, which are contained in the liberty file, record the timing characteristics of each cell. The delay of a cell is looked up from its input slew and its output capacitive load. The input slew of a buffer is the output slew of the preceding buffer, which in turn depends on that buffer's own input slew and capacitive load and is read from a separate transition table.
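A delay table lookup can be sketched as a two-dimensional interpolation over input slew and output load. The code below is only an illustration of the idea: the axis values and delays are invented (they are not taken from any sky130 liberty file), the names `SLEWS`, `LOADS`, `DELAYS` and `lookup_delay` are hypothetical, and bilinear interpolation is assumed as the interpolation scheme.

```python
from bisect import bisect_right

# Hypothetical 3x3 table: rows = input slew (ns), columns = output load (pF).
SLEWS = [0.01, 0.10, 0.50]
LOADS = [0.001, 0.010, 0.100]
DELAYS = [
    [0.05, 0.10, 0.40],
    [0.08, 0.14, 0.45],
    [0.20, 0.28, 0.60],
]


def _bracket(axis, x):
    """Index i of the table segment [axis[i], axis[i+1]] used for x (clamped)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slew, load):
    """Bilinear interpolation; points outside the table are extrapolated
    linearly from the nearest segment."""
    i = _bracket(SLEWS, slew)
    j = _bracket(LOADS, load)
    ts = (slew - SLEWS[i]) / (SLEWS[i + 1] - SLEWS[i])
    tl = (load - LOADS[j]) / (LOADS[j + 1] - LOADS[j])
    # Interpolate along the load axis on the two bracketing slew rows ...
    low = DELAYS[i][j] + tl * (DELAYS[i][j + 1] - DELAYS[i][j])
    high = DELAYS[i + 1][j] + tl * (DELAYS[i + 1][j + 1] - DELAYS[i + 1][j])
    # ... then along the slew axis between those two results.
    return low + ts * (high - low)


print(lookup_delay(0.10, 0.010))    # a grid point of the table
print(lookup_delay(0.055, 0.0055))  # a point between four table entries
```

The same lookup, applied to a transition table instead of a delay table, would give the output slew that becomes the input slew of the next buffer.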
Timing analysis with ideal clocks using OpenSTA
Timing analysis is carried out outside the OpenLANE flow using the OpenSTA tool. A pre_sta.conf file is required for this analysis. Invoke OpenSTA outside the OpenLANE flow as follows:
sta pre_sta.conf
Since clock tree synthesis has not been performed yet, the analysis is with respect to ideal clocks and only setup time slack is taken into consideration. The slack value is the difference between data required time and data arrival time. The worst slack value must be greater than or equal to zero. If a negative slack is obtained, the following steps may be tried:
- Change synthesis strategy, synthesis buffering and synthesis sizing values
- Review maximum fanout of cells and replace cells with high fanout
- Modify the sdc file for OpenSTA
my_base.sdc is located in the vsdstdcelldesign/extras directory, so I copied it into our design folder using
cp my_base.sdc /home/emil/OpenLane/designs/picorv32a/src/
Since there were no timing violations, I skipped these steps. The clock is propagated only after CTS, so during the placement stage the clock is considered ideal. Therefore only setup slack is taken into consideration before CTS.
The clock is produced by a PLL (Phase-Locked Loop) that contains internal circuits and some logic. The generated clock can vary depending on the specific circuit configuration, and these variations are collectively referred to as "clock uncertainty." One contributing factor is jitter, which means the clock edge may not arrive precisely on time. Skew, jitter, and margin all contribute to this uncertainty in the timing of the clock signal.
Clock jitter: deviation of a clock edge from its ideal position
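The pre-CTS setup check with an ideal clock and a clock uncertainty margin can be written out as simple arithmetic. This is a sketch under stated assumptions: the function name `ideal_clock_setup_slack` is hypothetical, the numbers are illustrative and not taken from the picorv32a run, and uncertainty is modelled as a single value subtracted from the data required time.

```python
def ideal_clock_setup_slack(clock_period, data_arrival, setup_time, uncertainty=0.0):
    """Setup slack with an ideal clock (zero launch and capture latency), in ns.

    data required time = clock period - setup time - clock uncertainty
    slack              = data required time - data arrival time
    """
    data_required = clock_period - setup_time - uncertainty
    return data_required - data_arrival


# Hypothetical path that meets timing.
print(ideal_clock_setup_slack(24.0, 11.5, 0.25, uncertainty=0.25))  # 12.0

# Same path with a much longer data arrival time: negative slack, a violation.
print(ideal_clock_setup_slack(24.0, 24.0, 0.25, uncertainty=0.25))  # -0.5
```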
Clock tree synthesis TritonCTS and signal integrity
Clock Tree Synthesis (CTS) plays a vital role in the creation of integrated circuits (ICs), particularly in the realm of digital electronics, where precise timing is of utmost importance. CTS involves the establishment of an organized network or structure of pathways for distributing the clock signal within the IC. This meticulous process guarantees that the clock signal effectively reaches all the sequential components, such as flip-flops and registers, in a synchronized and punctual fashion.
It can be implemented in various ways, and the choice of technique depends on the design requirements, constraints, and goals.
Some of the approaches to clock tree synthesis are:
- Balanced Tree CTS: The clock signal is spread out evenly, like branches of a tree. This helps ensure that all parts of the chip get the clock at about the same time, reducing timing problems. It's a straightforward method, but it might not save as much power as other methods.
- H-tree CTS: It is like a tree shape with the letter "H." It's great for spreading out clock signals across big chips. This tree structure helps make sure the timing is good and saves power, especially in large areas of the chip.
- Star CTS: In a star CTS, the clock signal is distributed from a single central point (like a star) to all the flip-flops. This approach simplifies clock distribution and minimizes clock skew but may require a higher number of buffers near the source.
- Mesh CTS: In a mesh CTS, clock wires are arranged in a mesh-like grid pattern, and each flip-flop is connected to the nearest available clock wire. It is often used in highly regular and structured designs, such as memory arrays. Mesh CTS can offer a balance between simplicity and skew minimization.
- Adaptive CTS: Adaptive CTS techniques adjust the clock tree structure dynamically based on the timing and congestion constraints of the design. This approach allows for greater flexibility and adaptability in meeting design goals but may be more complex to implement.
Crosstalk in VLSI refers to unwanted interference or coupling between adjacent conductive traces or wires on an integrated circuit (IC) or chip. It occurs when the electrical signals on one wire influence or disrupt the signals on neighboring wires. Uncontrolled crosstalk can lead to data corruption, timing violations, and increased power consumption. Mitigation: VLSI designers employ various techniques to mitigate crosstalk, such as optimizing layout and routing, using appropriate shielding, implementing proper clock distribution strategies, and utilizing clock gating to reduce dynamic power consumption when logic is idle.
Clock net shielding in VLSI refers to a technique used to protect the clock signal from
interference or crosstalk. The clock signal is critical for synchronizing the operations of
various components on a chip, and any interference can lead to timing issues and performance
problems.
VLSI designers may use shielding techniques to isolate the clock network from other signals,
reducing the risk of interference. This can include dedicated clock routing layers, clock tree
synthesis algorithms, and buffer insertion to manage clock distribution more effectively.
VLSI designs often have multiple clock domains. Shielding and proper clock gating help ensure
that clock signals do not propagate between domains, avoiding metastability issues and
maintaining synchronization.
The command below is used to run CTS in OpenLANE:
run_cts
After the CTS run, my slack values were setup: 12.36, hold: 0.38.
Neither value is violating.
Timing analysis with real clocks
Analyzing setup time is a crucial element of designing digital circuits, especially in
synchronous digital systems. It pertains to the duration during which a signal must remain
steady and valid prior to the arrival of the clock edge. Guaranteeing the fulfillment of setup
time prerequisites is vital for averting data errors and securing the correct functioning of
the digital circuit.
To ensure the setup time requirements are met we need to make sure of some things:
- Selecting proper flip-flops or latches
- Optimize combinational logic
- Clock Skew Analysis
- Timing constraints
Meeting setup time requirements is crucial for correct digital circuit operation. Failing to do so can result in data errors and malfunctioning of the circuit.
Analysis of hold time is an equally vital component of digital circuit design, especially in
synchronous systems. It concerns the minimum duration during which a data input (D) needs to
maintain its stability and validity after the clock edge before any changes can occur. Ensuring
that hold time requirements are met is essential to prevent data corruption and ensure the
proper operation of digital circuits.
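Once the clock is propagated, launch and capture flops see different clock latencies, and this skew directly affects the hold check. The sketch below assumes a single launch/capture flop pair and made-up numbers; the function name `hold_slack` and its parameters are hypothetical.

```python
def hold_slack(launch_latency, capture_latency, min_data_delay, hold_time,
               uncertainty=0.0):
    """Hold slack for one flop-to-flop path with propagated clocks, in ns.

    data arrival time  = launch clock latency + shortest data path delay
    data required time = capture clock latency + hold time + uncertainty
    slack              = data arrival time - data required time
    """
    arrival = launch_latency + min_data_delay
    required = capture_latency + hold_time + uncertainty
    return arrival - required


# Small skew (capture clock 0.25 ns later than launch clock): hold is met.
print(hold_slack(1.0, 1.25, 0.75, 0.125))  # 0.375

# Larger skew on the capture side: the new data arrives too early for the
# capturing flop, so hold is violated.
print(hold_slack(1.0, 2.0, 0.75, 0.125))   # -0.375
```

Unlike the setup check, the clock period does not appear here: a hold violation cannot be fixed by slowing the clock down.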
Since the clock is now propagated, timing analysis from this stage onwards is done with real clocks. Post-CTS analysis is performed with OpenROAD within the OpenLANE flow:
openroad
read_lef /home/emil/OpenLane/designs/picorv32a/runs/RUN_2023.09.17_04.44.22/tmp/merged.nom.lef
read_def /home/emil/OpenLane/designs/picorv32a/runs/RUN_2023.09.17_04.44.22/results/cts/picorv32.def
read_verilog /home/emil/OpenLane/designs/picorv32a/runs/RUN_2023.09.17_04.44.22/results/synthesis/picorv32.v
write_db pico_cts.db
read_db pico_cts.db
read_verilog /home/emil/OpenLane/designs/picorv32a/runs/RUN_2023.09.17_04.44.22/results/synthesis/picorv32.v
link_design picorv32
read_liberty $::env(LIB_SYNTH_COMPLETE)
read_sdc /home/emil/OpenLane/designs/picorv32a/src/my_base.sdc
set_propagated_clock [all_clocks]
report_checks -path_delay min_max -format full_clock_expanded -digits 4
Maze routing and Lee's Algorithm
Routing is the process of establishing a physical connection between two pins. Algorithms
designed for routing take source and target pins and aim to find the most efficient path
between them, ensuring a valid connection exists.
The Maze Routing algorithm, such as the Lee algorithm, is one approach for solving routing problems. Here a grid similar to the one created during cell customization is utilized for routing purposes.
The Lee algorithm starts with two designated points, the source and target, and leverages the routing grid to identify the shortest or optimal route between them.
It essentially constructs a maze and then numbers its cells outward from the source until the target is reached. The shortest path is one that follows a steady increment of one. There might be multiple such paths, but the tool prefers the one with fewer bends. The route must not be diagonal and must not overlap an obstruction such as a macro. The Lee algorithm prioritizes selecting the best path, typically favoring L-shaped routes over zigzags. If no L-shaped paths are available, it may resort to zigzag routes. This approach is particularly valuable for global routing tasks.
Lee's algorithm has its limitations. It has a high run time and consumes a lot of memory when dealing with millions of pins, so more optimized routing algorithms are preferred in practice.
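The wave expansion and backtrace of Lee's algorithm can be sketched as a breadth-first search on a small grid. This is a minimal illustration, not how any production router is implemented: the function name `lee_route` and the example grid are hypothetical, and the backtrace simply takes the first neighbour with the next lower number, so it does not try to minimise the number of bends.

```python
from collections import deque


def lee_route(grid, source, target):
    """Shortest rectilinear path on a grid using Lee's algorithm.

    grid[r][c] == 1 marks an obstruction. Returns the list of (row, col)
    cells from source to target, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])

    def neighbours(r, c):
        # No diagonal moves: only up, down, left and right.
        return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))

    # Wave expansion: number each reachable cell with its distance from source.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cell = queue.popleft()
        if cell == target:
            break
        for nr, nc in neighbours(*cell):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[cell] + 1
                queue.append((nr, nc))

    if target not in dist:
        return None

    # Backtrace: walk from the target to the source, decreasing by one each step.
    path = [target]
    while path[-1] != source:
        here = path[-1]
        for nb in neighbours(*here):
            if dist.get(nb) == dist[here] - 1:
                path.append(nb)
                break
    path.reverse()
    return path


GRID = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],  # 1 = obstruction, for example a macro
    [0, 0, 0, 0],
]
print(lee_route(GRID, (0, 0), (2, 3)))
```

The `dist` dictionary holds one entry per explored grid cell, which is where the memory cost mentioned above comes from on large routing grids.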
Design rule checks are physical checks of metal width, pitch and spacing requirements for the different layers, which depend on the technology node. They verify whether a design meets the process technology rules given by the foundry for manufacturing.
The layout of a design must be in accordance with a set of predefined technology rules given
by the foundry for manufacturability. After completion of the layout and its physical
connection, an automatic program will check each and every polygon in the design against these
design rules and report any violations.
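To give a feel for what such a check does, the sketch below tests a handful of rectangles on one layer against a minimum width and a minimum spacing rule. It is a toy example only: the rule values in `RULES` are illustrative placeholders rather than the foundry deck, coordinates are in nanometres as integers, and touching or overlapping rectangles are assumed to belong to the same shape and are not flagged.

```python
from itertools import combinations

# Illustrative rule values in nm; real values come from the foundry rule deck.
RULES = {"met1": {"min_width": 140, "min_spacing": 140}}


def width(rect):
    """Smaller dimension of a rectangle given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    return min(x2 - x1, y2 - y1)


def spacing(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def check_layer(layer, rects):
    """Return a list of ("width", i) and ("spacing", i, j) violations."""
    rule = RULES[layer]
    violations = []
    for i, rect in enumerate(rects):
        if width(rect) < rule["min_width"]:
            violations.append(("width", i))
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        d = spacing(a, b)
        if 0 < d < rule["min_spacing"]:
            violations.append(("spacing", i, j))
    return violations


rects = [
    (0, 0, 1000, 200),      # wide enough
    (0, 300, 1000, 400),    # only 100 nm wide, and 100 nm away from rect 0
    (0, 1000, 1000, 1500),  # far from everything else
]
print(check_layer("met1", rects))  # [('width', 1), ('spacing', 0, 1)]
```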
Power Distribution Network Generation
Unlike the general ASIC flow, Power Distribution Network generation is not a part of floorplan run in OpenLANE. PDN must be generated after CTS and post-CTS STA analyses:
We can check whether the PDN has been created by checking the current DEF environment variable: echo $::env(CURRENT_DEF)
prep -design picorv32a -tag <RUN file name>
gen_pdn
- gen_pdn generates the power distribution network.
- The power distribution network has to take the design_cts.def as the input def file.
- Power rings, straps and rails are created by PDN.
- From VDD and VSS pads, power is drawn to power rings.
- Next, the horizontal and vertical straps connected to the rings draw power from the rings.
- Straps are connected to the rails, and the rails are connected to the standard cells. So, standard cells get power from the rails.
- There are definitions for the straps and the rails. In this design, straps are at metal layers 4 and 5 and the standard cell rails are at metal layer 1. Vias connect across the layers as required.
Routing
In Electronic Design Automation (EDA) tools, both OpenLANE and commercial ones, the routing process is exceptionally intricate due to the vast design space. To manage this complexity, routing is typically divided into two distinct stages: Global Routing and Detailed Routing.
There are two kinds of routing:
- Global Routing: In this stage, the routing region is subdivided into rectangular grid cells and represented as a coarse 3D routing graph. This task is accomplished by the FastRoute engine.
- Detailed Routing: Here, finer grid granularity and routing guides are employed to implement the physical wiring. The TritonRoute engine comes into play at this stage. FastRoute generates initial routing guides, while TritonRoute utilizes the global route information and further refines the routing, employing various strategies and optimizations to determine the best path for connecting the pins.
Key Features:
- Initial Detail Routing: TritonRoute initiates the detailed routing process, providing the foundation for the subsequent routing steps.
- Adherence to Pre-Processed Route Guides: TritonRoute places significant emphasis on following pre-processed route guides. This involves several actions:
- Initial Route Guide Analysis
- Guide Splitting
- Guide Merging
- Guide Bridging
- Assumes that the route guides for each net satisfy inter-guide connectivity: guides on the same metal layer must touch, and guides on neighbouring metal layers must have a nonzero vertically overlapped area.
In summary, TritonRoute is a sophisticated tool that not only performs initial detail routing but also places a strong emphasis on optimizing routing within pre-processed route guides by breaking down, merging, and bridging them as needed to achieve efficient and effective routing results.
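The inter-guide connectivity assumption can be made concrete with a small check over the guides of one net. This is only an illustration of the rule as stated above and does not reflect TritonRoute's internal code: the representation of a guide as `(layer_index, (x1, y1, x2, y2))` and all function names are hypothetical, and guides that touch only at a corner are counted as touching, which is an assumption.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def touches(a, b):
    """True if the rectangles overlap or share a boundary point."""
    return (min(a[2], b[2]) >= max(a[0], b[0])
            and min(a[3], b[3]) >= max(a[1], b[1]))


def guides_connected(g1, g2):
    """Same layer: guides must touch. Neighbouring layers: nonzero overlap."""
    (layer1, rect1), (layer2, rect2) = g1, g2
    if layer1 == layer2:
        return touches(rect1, rect2)
    if abs(layer1 - layer2) == 1:
        return overlap_area(rect1, rect2) > 0
    return False


def net_is_connected(guides):
    """True if all guides of a net form a single connected component."""
    if not guides:
        return True
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j in range(len(guides)):
            if j not in seen and guides_connected(guides[i], guides[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(guides)


# Two touching guides on layer 1, plus a layer-2 guide overlapping the second.
good_net = [(1, (0, 0, 10, 2)), (1, (10, 0, 20, 2)), (2, (18, 0, 20, 10))]
# Same footprint but two layers apart: not connected under the rule above.
bad_net = [(1, (0, 0, 10, 2)), (3, (0, 0, 10, 2))]

print(net_is_connected(good_net))  # True
print(net_is_connected(bad_net))   # False
```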
Make sure the current def is set to pdn.def
Start routing by using
run_routing
The .json file used here is:
{
"DESIGN_NAME": "picorv32",
"VERILOG_FILES": "dir::src/picorv32a.v",
"CLOCK_PORT": "clk",
"CLOCK_NET": "clk",
"FP_SIZING": "relative",
"LIB_SYNTH" : "dir::src/sky130_fd_sc_hd__typical.lib",
"LIB_FASTEST" : "dir::src/sky130_fd_sc_hd__fast.lib",
"LIB_SLOWEST" : "dir::src/sky130_fd_sc_hd__slow.lib",
"LIB_TYPICAL":"dir::src/sky130_fd_sc_hd__typical.lib",
"TEST_EXTERNAL_GLOB":"dir::/src/*",
"SYNTH_DRIVING_CELL":"sky130_vsdinv",
"pdk::sky130*": {
"FP_CORE_UTIL": 35,
"CLOCK_PERIOD": 24,
"scl::sky130_fd_sc_hd": {
"FP_CORE_UTIL": 30
}
}
}
Flip-flop to standard cell ratio = 1596/9819 ≈ 0.16
cd /home/emil/OpenLane/
./flow.tcl -interactive
package require openlane 0.9
prep -design picorv32a
run_synthesis
run_floorplan
detailed_placement
run_cts
gen_pdn
run_routing
cd /home/emil/OpenLane
make mount
./flow.tcl -design picorv32a
- Kunal Ghosh,VSD Corp.Pvt.Ltd.
- ChatGPT
- Alwin Shaju,Colleague,IIIT-B
- N Sai Sampath,Colleague,IIIT-B