Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vexriscv BRAM support #352

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3,044 changes: 3,044 additions & 0 deletions openfpga_flow/benchmarks/processor/picorv32/picorv32.v

Large diffs are not rendered by default.

7,460 changes: 7,460 additions & 0 deletions openfpga_flow/benchmarks/processor/vexriscv/vexriscv_large.v

Large diffs are not rendered by default.

8,639 changes: 8,639 additions & 0 deletions openfpga_flow/benchmarks/processor/vexriscv/vexriscv_linux.v

Large diffs are not rendered by default.

3,127 changes: 3,127 additions & 0 deletions openfpga_flow/benchmarks/processor/vexriscv/vexriscv_small.v

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions openfpga_flow/misc/ys_tmpl_yosys_vpr_bram_dsp_dff_flow.ys
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,11 @@ opt -fast -mux_undef -undriven -fine
memory_map
opt -undriven -fine

#########################
# Map muxes to pmuxes
#########################
techmap -map +/pmux2mux.v

#########################
# Map flip-flops
#########################
Expand Down
5 changes: 5 additions & 0 deletions openfpga_flow/misc/ys_tmpl_yosys_vpr_bram_dsp_flow.ys
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,11 @@ opt -fast -mux_undef -undriven -fine
memory_map
opt -undriven -fine

#########################
# Map muxes to pmuxes
#########################
techmap -map +/pmux2mux.v

#########################
# Map flip-flops
#########################
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Run VPR for the 'and' design
#--write_rr_graph example_rr_graph.xml
vpr ${VPR_ARCH_FILE} ${VPR_TESTBENCH_BLIF} --clock_modeling route ${OPENFPGA_VPR_DEVICE_LAYOUT}
vpr ${VPR_ARCH_FILE} ${VPR_TESTBENCH_BLIF}

# Read OpenFPGA architecture definition
read_openfpga_arch -f ${OPENFPGA_ARCH_FILE}
Expand Down
17 changes: 17 additions & 0 deletions openfpga_flow/openfpga_yosys_techlib/dsp18_map.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
module mult_18x18 (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each file in this directory is binded to a specific architecture, so that for each architecture, we can customize the synthesis options to the most. Please avoid a generic naming. Suggest to rename this file to k4_frac_N8_tileable_adder_chain_dpram1K_dsp18_fracff_40nm_dsp_map.v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouln't that lead to redundancy though? If I have two architectures like this:

k4_frac_N8_tileable_adder_chain_dpram1K_dsp18_fracff_40nm_dsp_map.v
vs
k4_frac_N8_tileable_adder_chain_dsp18_fracff_40nm_dsp_map.v (no BRAM)

Wouldn't that lead to duplicate multiplier map code? Couldn't those two architectures just use a single dsp18 map since they both have the same multiplier design?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I believe it is time to rework this directory and create rules when adding new technology libraries for yosys.
My current plan is

  • Each architecture has an independent set of technology library files, such as <arch_nam>_dsp_map.v, <arch_name>_bram_map.v etc. As such, it is easy for users to pick technology libraries because the rules are simple.
  • To avoid duplicated codes, you may try to use symbolic links. For example, k4_frac_N8_tileable_adder_chain_dpram1K_dsp18_fracff_40nm_dsp_map.v contains the actual codes, while k4_frac_N8_tileable_adder_chain_dsp18_fracff_40nm_dsp_map.v is a symbolic link.
  • We need to create separated directory for each architecture, and classify the files into different directories. As a result, it is easy to maintain. Otherwise, later on, when we 10+ architectures in the openfpga_yosys_techlib directory, it may become a mess.

Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh so you are planning on having a directory (for example, k4_frac_N8_tileable_adder_chain_dpram1K_dsp18_fracff_40nm) with the openfpga arch, vpr arch, dsp_map, bram_map, etc all in that directory?

input [0:17] A,
input [0:17] B,
output [0:35] Y
);
parameter A_SIGNED = 0;
parameter B_SIGNED = 0;
parameter A_WIDTH = 0;
parameter B_WIDTH = 0;
parameter Y_WIDTH = 0;

mult_18 #() _TECHMAP_REPLACE_ (
.A (A),
.B (B),
.Y (Y) );

endmodule
Original file line number Diff line number Diff line change
@@ -0,0 +1,220 @@
//-----------------------------
// Dual-port RAM 128x8 bit (1Kbit)
// Core logic
//-----------------------------
module dpram_128x8_core (
input wclk,
input wen,
input [0:6] waddr,
input [0:7] data_in,
input rclk,
input ren,
input [0:6] raddr,
output [0:7] data_out );

reg [0:7] ram[0:127];
reg [0:7] internal;

assign data_out = internal;

always @(posedge wclk) begin
if(wen) begin
ram[waddr] <= data_in;
end
end

always @(posedge rclk) begin
if(ren) begin
internal <= ram[raddr];
end
end

endmodule

//-----------------------------
// Dual-port RAM 128x8 bit (1Kbit) wrapper
// where the read clock and write clock
// are combined to a unified clock
//-----------------------------
module dpram_128x8 (
input clk,
input wen,
input ren,
input [0:6] waddr,
input [0:6] raddr,
input [0:7] data_in,
output [0:7] data_out );

dpram_128x8_core memory_0 (
.wclk (clk),
.wen (wen),
.waddr (waddr),
.data_in (data_in),
.rclk (clk),
.ren (ren),
.raddr (raddr),
.data_out (data_out) );

endmodule

//-----------------------------
// 18-bit multiplier
//-----------------------------
module mult_18(
input [0:17] A,
input [0:17] B,
output [0:35] Y
);

assign Y = A * B;

endmodule

//-----------------------------
// Native D-type flip-flop
//-----------------------------
(* abc9_flop, lib_whitebox *)
module dff(
output reg Q,
input D,
(* clkbuf_sink *)
(* invertible_pin = "IS_C_INVERTED" *)
input C
);
parameter [0:0] INIT = 1'b0;
parameter [0:0] IS_C_INVERTED = 1'b0;
initial Q = INIT;
case(|IS_C_INVERTED)
1'b0:
always @(posedge C)
Q <= D;
1'b1:
always @(negedge C)
Q <= D;
endcase
endmodule

//-----------------------------
// D-type flip-flop with active-high asynchronous reset
//-----------------------------
(* abc9_flop, lib_whitebox *)
module dffr(
output reg Q,
input D,
input R,
(* clkbuf_sink *)
(* invertible_pin = "IS_C_INVERTED" *)
input C
);
parameter [0:0] INIT = 1'b0;
parameter [0:0] IS_C_INVERTED = 1'b0;
initial Q = INIT;
case(|IS_C_INVERTED)
1'b0:
always @(posedge C or posedge R)
if (R == 1'b1)
Q <= 1'b0;
else
Q <= D;
1'b1:
always @(negedge C or posedge R)
if (R == 1'b1)
Q <= 1'b0;
else
Q <= D;
endcase
endmodule

//-----------------------------
// D-type flip-flop with active-high asynchronous set
//-----------------------------
(* abc9_flop, lib_whitebox *)
module dffs(
output reg Q,
input D,
input S,
(* clkbuf_sink *)
(* invertible_pin = "IS_C_INVERTED" *)
input C
);
parameter [0:0] INIT = 1'b0;
parameter [0:0] IS_C_INVERTED = 1'b0;
initial Q = INIT;
case(|IS_C_INVERTED)
1'b0:
always @(posedge C or posedge S)
if (S == 1'b1)
Q <= 1'b1;
else
Q <= D;
1'b1:
always @(negedge C or posedge S)
if (S == 1'b1)
Q <= 1'b1;
else
Q <= D;
endcase
endmodule

//-----------------------------
// D-type flip-flop with active-low asynchronous reset
//-----------------------------
(* abc9_flop, lib_whitebox *)
module dffrn(
output reg Q,
input D,
input RN,
(* clkbuf_sink *)
(* invertible_pin = "IS_C_INVERTED" *)
input C
);
parameter [0:0] INIT = 1'b0;
parameter [0:0] IS_C_INVERTED = 1'b0;
initial Q = INIT;
case(|IS_C_INVERTED)
1'b0:
always @(posedge C or negedge RN)
if (RN == 1'b0)
Q <= 1'b0;
else
Q <= D;
1'b1:
always @(negedge C or negedge RN)
if (RN == 1'b0)
Q <= 1'b0;
else
Q <= D;
endcase
endmodule

//-----------------------------
// D-type flip-flop with active-low asynchronous set
//-----------------------------
(* abc9_flop, lib_whitebox *)
module dffsn(
output reg Q,
input D,
input SN,
(* clkbuf_sink *)
(* invertible_pin = "IS_C_INVERTED" *)
input C
);
parameter [0:0] INIT = 1'b0;
parameter [0:0] IS_C_INVERTED = 1'b0;
initial Q = INIT;
case(|IS_C_INVERTED)
1'b0:
always @(posedge C or negedge SN)
if (SN == 1'b0)
Q <= 1'b1;
else
Q <= D;
1'b1:
always @(negedge C or negedge SN)
if (SN == 1'b0)
Q <= 1'b1;
else
Q <= D;
endcase
endmodule

Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
// Basic DFF
module \$_DFF_P_ (D, C, Q);
input D;
input C;
output Q;
parameter _TECHMAP_WIREINIT_Q_ = 1'bx;
dff _TECHMAP_REPLACE_ (.Q(Q), .D(D), .C(C));
endmodule

// Async active-high reset
module \$_DFF_PP0_ (D, C, R, Q);
input D;
input C;
input R;
output Q;
parameter _TECHMAP_WIREINIT_Q_ = 1'bx;
dffr _TECHMAP_REPLACE_ (.Q(Q), .D(D), .C(C), .R(R));
endmodule

// Async active-high set
module \$_DFF_PP1_ (D, C, R, Q);
input D;
input C;
input R;
output Q;
parameter _TECHMAP_WIREINIT_Q_ = 1'bx;
dffs _TECHMAP_REPLACE_ (.Q(Q), .D(D), .C(C), .S(R));
endmodule

// Async active-low reset
module \$_DFF_PN0_ (D, C, R, Q);
input D;
input C;
input R;
output Q;
parameter _TECHMAP_WIREINIT_Q_ = 1'bx;
dffrn _TECHMAP_REPLACE_ (.Q(Q), .D(D), .C(C), .RN(R));
endmodule

// Async active-low set
module \$_DFF_PN1_ (D, C, R, Q);
input D;
input C;
input R;
output Q;
parameter _TECHMAP_WIREINIT_Q_ = 1'bx;
dffsn _TECHMAP_REPLACE_ (.Q(Q), .D(D), .C(C), .SN(R));
endmodule
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,5 @@ run-task benchmark_sweep/mcnc_big20 --debug --show_thread_logs
#python3 openfpga_flow/scripts/run_modelsim.py mcnc_big20 --run_sim

run-task benchmark_sweep/signal_gen --debug --show_thread_logs

# run-task benchmark_sweep/processor --debug --show_thread_logs
36 changes: 36 additions & 0 deletions openfpga_flow/tasks/benchmark_sweep/processor/config/task.conf
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
# Configuration file for running experiments
# = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
# timeout_each_job : FPGA Task script splits fpga flow into multiple jobs
# Each job execute fpga_flow script on combination of architecture & benchmark
# timeout_each_job is timeout for each job
# = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

[GENERAL]
run_engine=openfpga_shell
power_tech_file = ${PATH:OPENFPGA_PATH}/openfpga_flow/tech/PTM_45nm/45nm.xml
power_analysis = false
spice_output=false
verilog_output=true
timeout_each_job = 20*60
fpga_flow=yosys_vpr

[OpenFPGA_SHELL]
openfpga_shell_template=${PATH:OPENFPGA_PATH}/openfpga_flow/openfpga_shell_scripts/write_full_testbench_example_script.openfpga
openfpga_arch_file=${PATH:OPENFPGA_PATH}/openfpga_flow/openfpga_arch/k4_frac_N4_adder_chain_40nm_cc_openfpga.xml
openfpga_sim_setting_file=${PATH:OPENFPGA_PATH}/openfpga_flow/openfpga_simulation_settings/auto_sim_openfpga.xml
openfpga_vpr_device_layout=auto
openfpga_fast_configuration=

[ARCHITECTURES]
arch0=${PATH:OPENFPGA_PATH}/openfpga_flow/vpr_arch/k4_frac_N4_tileable_adder_chain_40nm.xml

[BENCHMARKS]
bench0=${PATH:OPENFPGA_PATH}/openfpga_flow/benchmarks/processor/picorv32/picorv32.v

[SYNTHESIS_PARAM]
bench0_top = picorv32
bench0_chan_width = 300

[SCRIPT_PARAM_MIN_ROUTE_CHAN_WIDTH]
end_flow_with_test=
1 change: 1 addition & 0 deletions openfpga_flow/tasks/skywater_openfpga_task