axi/wb bus cycle times #55

Open
jamesbbecker opened this issue Jun 27, 2022 · 7 comments

@jamesbbecker

I have your design running on a Nexys A7 with an EL2 core.

When I do a sequence of writes to an I/O port, such as GPIO, and my code is running in ICCM, the writes are lost, except for the last one. I can solve the problem by inserting some delay in between the I/O writes.

It appears that, with optimized code, the writes through the AXI / AXI mux / WB path complete more slowly than the processor can issue them.

Is this the way it is supposed to work? Is there some sort of mechanism for the AXI writes to delay the processor core while each one is completing?

I guess I can write code to wait on I/O writes to complete before I send another one. Am I required to do that?

@olofk
Collaborator

olofk commented Dec 26, 2022

Ouch! That sounds really bad and should definitely not happen. Do you have any test program to share?

@jamesbbecker
Author

Hi Olof,

The code to make this happen is pretty simple. The hard part (from my experience) is getting the code inside of ICCM to run. I found the following was helpful:

I created a small program, loaded into the boot ROM, which does nothing but jump to the code in ICCM.

/* Put your ICCM address here. */
#define ICCM_BASE_ADDRESS 0xxxxxxxxx
typedef void (*function_ptr)(void);

int main(void)
{
    /* Treat the start of ICCM as a function and call it. */
    function_ptr jump_to_ptr = (function_ptr)ICCM_BASE_ADDRESS;

    jump_to_ptr();

    return 0; /* Not reached if the ICCM code never returns. */
}

This code then had to be compiled and embedded in the boot ROM of the Verilog design, so that once it's loaded into the FPGA, it runs when reset is released.

For the code in ICCM, you just need to write to GPIO over and over with different values. In the example below, 0xaa is never written to GPIO, but 0x55 is written over and over again.

void iccm_main(void)
{
    int counter = 0;

    do {
        /* Two back-to-back writes to the same GPIO register. */
        *((volatile UINT32 *)SYSCON_GPIO) = 0xaa;
        *((volatile UINT32 *)SYSCON_GPIO) = 0x55;

        counter++;  /* Introduce a delay. */
        counter++;
    } while (1);
}

I compiled this code and loaded it into ICCM using a debugger.

@jamesbbecker
Author

Olof,

I'm not that familiar with the internals of RISC-V, but I simulated the EL2 with my code to try to figure out why the writes were being dropped. Attached are two screenshots from my simulator.

In the first, the instruction attempts the write too quickly and the write never happens. In the second, the instruction executes later and the write succeeds.
[Screenshot: failed write]
[Screenshot: successful write]

The signal obuf_wr_timer in the file el2_lsu_bus_buffer seems to be important. If it hasn't reached its maximum value of 7 prior to the write being executed, the write never occurs.

The instruction in question is on the third line from the top. The instruction encoding 00e7a023 (hexadecimal) is the attempt to write to I/O.

Hope this helps.
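
For anyone reading the trace, here is a minimal sketch of how the encoding 00e7a023 breaks down using the standard RISC-V S-type (store) layout. The `stype_t` struct and `decode_stype` helper are names made up for this illustration; they are not part of the EL2 sources.

```c
#include <stdint.h>

/* Fields of a RISC-V S-type (store) instruction. */
typedef struct {
    uint32_t opcode;  /* bits 6:0, 0x23 for stores */
    uint32_t funct3;  /* bits 14:12, 2 means sw (32-bit store) */
    uint32_t rs1;     /* bits 19:15, base address register */
    uint32_t rs2;     /* bits 24:20, register holding the data */
    int32_t  imm;     /* sign-extended 12-bit address offset */
} stype_t;

/* Hypothetical helper: split a 32-bit instruction word into S-type fields. */
static stype_t decode_stype(uint32_t insn)
{
    stype_t d;
    d.opcode = insn & 0x7f;
    d.funct3 = (insn >> 12) & 0x7;
    d.rs1    = (insn >> 15) & 0x1f;
    d.rs2    = (insn >> 20) & 0x1f;

    /* imm[11:5] lives in bits 31:25, imm[4:0] in bits 11:7. */
    uint32_t imm12 = ((insn >> 25) << 5) | ((insn >> 7) & 0x1f);

    /* Sign-extend from 12 bits. */
    d.imm = (int32_t)(imm12 ^ 0x800) - 0x800;
    return d;
}
```

Decoding 0x00e7a023 this way gives opcode 0x23, funct3 2, rs1 = x15, rs2 = x14 and an offset of 0, i.e. `sw a4, 0(a5)`: a 32-bit store of a4 to the address held in a5.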

@olofk
Collaborator

olofk commented Jan 6, 2023

Thank you! This is definitely helpful. I have been busy with other things, but hope to get to this soon.

@olofk
Collaborator

olofk commented May 12, 2023

Ok, this took way longer than I had been hoping for. I do have a theory now at least. Or at least a question. Are you setting MRAC correctly before running your code? I have done some experimenting and can get similar (but not identical) results if I don't write to MRAC before running the code. If I put in fence operations or enough nops (at least eight nops seem to be required between the writes) I can get the writes to work again.
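
A rough sketch of what the fence workaround could look like in C, assuming a GCC or Clang toolchain. `io_fence` and `mmio_write32` are hypothetical helper names, not something from the SweRVolf sources, and this only illustrates the experiment described above rather than a recommended fix.

```c
#include <stdint.h>

/* Order I/O and memory accesses. On RISC-V this emits a `fence`;
 * on other targets it falls back to a compiler-only barrier so the
 * sketch still builds on a host machine. */
static inline void io_fence(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence iorw, iorw" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Hypothetical helper: write a 32-bit value to a memory-mapped
 * register, then fence before the next access is issued. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    io_fence();
}
```

With helpers like these, the two GPIO writes from the test program would become `mmio_write32(gpio, 0xaa); mmio_write32(gpio, 0x55);`, where `gpio` points at the GPIO register.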

@jamesbbecker
Author

jamesbbecker commented May 23, 2023 via email

@olofk
Collaborator

olofk commented May 23, 2023

mrac is a register specific to SweRV (or VeeR, as it is now called) and is described in the EL2 manual: https://github.com/chipsalliance/Cores-VeeR-EL2/blob/main/docs/RISC-V_VeeR_EL2_PRM.pdf

Basically, the 32-bit memory map is divided into 16 regions (MRAC == Memory Region Access Control). For each region, two adjacent bits control whether the region is cacheable and whether it is volatile. By default this register is set to 0x00000000, which means all regions are treated as uncacheable and non-volatile. The non-volatility is the key here, because it allows VeeR to optimize away subsequent writes to the same address. The SweRVolf (soon to be renamed VeeRwolf) bootloader begins by setting the upper half (0x80000000-0xFFFFFFFF) to uncacheable, volatile memory by writing 0xAAAA0000 to mrac: https://github.com/chipsalliance/Cores-SweRVolf/blob/master/sw/boot_main.S#L37
(Technically, this should probably be 0xAAAA5555 to allow the RAM to be cacheable, but at least this way we don't need to worry about stale caches.)

So, to conclude, please see if the problem persists after writing 0xAAAA0000 to mrac before running the code.
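
To make the bit layout concrete, here is a small sketch of how such an mrac value can be built up per region. The helper names (`mrac_region`, `mrac_fill`, `mrac_write`) are invented for this example. The bit assignment (bit 0 of each pair is cacheable, bit 1 is volatile/side-effect) and the CSR number 0x7C0 are assumptions taken from my reading of the EL2 manual, so please check them against the PRM before relying on them.

```c
#include <stdint.h>

/* Assumed layout of each 2-bit field in mrac (verify against the PRM). */
#define MRAC_CACHEABLE   0x1u  /* bit 0 of the pair */
#define MRAC_SIDE_EFFECT 0x2u  /* bit 1 of the pair ("volatile") */

/* Region index (0..15) of a 32-bit address: 16 regions of 256 MiB each. */
static inline unsigned mrac_region(uint32_t addr)
{
    return addr >> 28;
}

/* Hypothetical helper: build an mrac value that applies `attr`
 * to regions first..last (inclusive). */
static uint32_t mrac_fill(unsigned first, unsigned last, uint32_t attr)
{
    uint32_t value = 0;
    for (unsigned r = first; r <= last && r < 16; r++)
        value |= (attr & 0x3u) << (2 * r);
    return value;
}

#if defined(__riscv)
/* Write the value to mrac. CSR number 0x7C0 is an assumption here. */
static inline void mrac_write(uint32_t value)
{
    __asm__ volatile ("csrw 0x7c0, %0" :: "r"(value));
}
#endif
```

For example, `mrac_fill(8, 15, MRAC_SIDE_EFFECT)` gives 0xAAAA0000, and OR-ing in `mrac_fill(0, 7, MRAC_CACHEABLE)` gives 0xAAAA5555. The write must happen before the first access to the I/O region.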
