How to get FLOPS in RISC-V using SW or HW method? - profiling

I am new to RISC-V and would like to know how to measure FLOPS with a software or hardware method. I tried to use CSRs for this, but ran into some problems.
As far as I know, if I redesign an hpmcounter so that it counts every floating-point operation event, I can get FLOPS by reading the counter with a CSR read instruction. There is a similar design in the manual for SiFive's rocket-chip-based U54 core. According to the manual, the SiFive core has sophisticated event-counting capabilities, controlled by the mhpmevent CSRs. If I set the lower eight bits of mhpmevent to 0 and enable bits 19 to 25, I can read the count from mhpmcounter. I want to design this field the way the SiFive core does.
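To make the field layout described above concrete, here is a minimal sketch in C that composes such an mhpmevent value. It assumes only what the paragraph states (an event class in the lower eight bits, event-select bits 19 to 25 above it); the helper names `bit_range` and `make_mhpmevent` are hypothetical and do not come from the SiFive manual.

```c
#include <assert.h>
#include <stdint.h>

/* Return a mask with bits lo..hi (inclusive) set. */
static uint64_t bit_range(unsigned lo, unsigned hi)
{
    assert(lo <= hi && hi < 64);
    unsigned width = hi - lo + 1;
    uint64_t ones = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return ones << lo;
}

/* Hypothetical helper: event class in bits [7:0], event-select mask above.
 * The layout is an assumption taken from the description in the question. */
static uint64_t make_mhpmevent(uint8_t event_class, uint64_t event_mask)
{
    assert((event_mask & 0xFF) == 0); /* mask must not overlap the class field */
    return (uint64_t)event_class | event_mask;
}
```

With class 0 and bits 19 to 25 selected, `make_mhpmevent(0, bit_range(19, 25))` yields 0x3F80000, which is the value that would be written to the event CSR from M-mode.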
I tried to imitate this for FLOPS, but I ran into some problems.
I cannot access mhpmcounter; I get an illegal instruction error.
I wrote a simple test program that compiles successfully, but it fails with an illegal instruction error when I run it on Spike and on a cycle-accurate emulator. Both use the proxy kernel.
// simple test code
unsigned long instret1 = 0;
unsigned long instret2 = 0;
float a,b,c;
a = 5.0;
b = 4.0;
asm volatile ("csrrs %0, mhpmcounter3, x0 " : "=r"(instret1));
c = a + b;
asm volatile ("csrrs %0, mhpmcounter3, x0 " : "=r"(instret2));
printf("instruction count : %lu \n", instret2-instret1);
It is hard to switch from user mode to M-mode in order to access mhpmevent and mhpmcounter. In the RISC-V privileged spec 1.10, I found that the xRET instructions can change the mode. The following text from the spec describes xRET.
The MRET, SRET, or URET instructions are used to return from traps in M-mode, S-mode, or
U-mode respectively. When executing an xRET instruction, supposing xPP holds the value y, xIE
is set to xPIE; the privilege mode is changed to y; xPIE is set to 1; and xPP is set to U (or M if
user-mode is not supported).
If someone knows how to do this, I would like to see the detailed assembly code.
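As a reading aid for the quoted rule, here is a small software model of MRET in C. It is only a sketch: the `struct hart` type and `model_mret` function are hypothetical, the model covers just the fields named in the quote, and it assumes user mode is supported. The bit positions (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11) follow the mstatus layout in the privileged spec.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_MIE        (UINT64_C(1) << 3)
#define MSTATUS_MPIE       (UINT64_C(1) << 7)
#define MSTATUS_MPP_SHIFT  11
#define MSTATUS_MPP_MASK   (UINT64_C(3) << MSTATUS_MPP_SHIFT)

enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Hypothetical model state: only what the quoted rule touches. */
struct hart {
    uint64_t mstatus;
    enum priv mode;
};

/* Apply the quoted xRET rule for x = M. */
static void model_mret(struct hart *h)
{
    assert(h->mode == PRIV_M); /* MRET is only legal in M-mode */

    /* y = MPP */
    enum priv y = (enum priv)((h->mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT);

    /* MIE is set to MPIE */
    if (h->mstatus & MSTATUS_MPIE)
        h->mstatus |= MSTATUS_MIE;
    else
        h->mstatus &= ~MSTATUS_MIE;

    h->mode = y;                     /* privilege mode is changed to y */
    h->mstatus |= MSTATUS_MPIE;      /* MPIE is set to 1 */
    h->mstatus &= ~MSTATUS_MPP_MASK; /* MPP is set to U (encoded as 0) */
}
```

In this model the mode after MRET is whatever MPP held beforehand, so the instruction returns to a mode recorded by an earlier trap rather than selecting an arbitrary one.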
I am trying to modify rocket-chip/src/main/scala/rocket/CSR.scala to redesign the CSR. Is that the only way? First, I want to use Spike to test the counter value. How should I change the code?
If anybody has other ideas or has already done this, please point me in the right direction. Thanks!

Related

How to set Auxiliary Control Register bit on Cortex M4

My application running on a Cortex M4 is crashing with a hard fault. The CFSR register indicates IMPRECISERR.
Reading http://chmorgan.blogspot.nl/2013/06/debugging-imprecise-bus-access-fault-on.html I am advised to set the DISDEFWBUF bit in the Auxiliary Control Register (ACTLR). This will allow me to get PRECISERR which are easier to debug.
By reading the programming manual for our CPU, we can see that the ACTLR is at address 0xE000E008 and that DISDEFWBUF is bit 1.
In main, this bit can be set with the following code:
*(volatile uint32_t *)0xE000E008 |= (1u << i);
where i = 1.
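The same read-modify-write can be wrapped in a small helper, sketched below in C. The address and bit number are the ones quoted above; the macro names and the `reg_set_bit` function are hypothetical. The sketch uses a 32-bit `volatile` access so the compiler cannot drop or narrow the register write.

```c
#include <assert.h>
#include <stdint.h>

#define ACTLR_ADDR            0xE000E008u /* address quoted in the text */
#define ACTLR_DISDEFWBUF_BIT  1u          /* bit position quoted in the text */

/* Set one bit in a 32-bit memory-mapped register (read-modify-write). */
static void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    assert(bit < 32);
    *reg |= (UINT32_C(1) << bit);
}

/* On the target this would be called as:
 *   reg_set_bit((volatile uint32_t *)ACTLR_ADDR, ACTLR_DISDEFWBUF_BIT);
 */
```

Taking the register as a pointer parameter means the helper can be exercised on a host machine against an ordinary variable before it is pointed at real hardware.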
Change the value in the SFRs palette (STM32F429, Register).
Read this article for more information.

LLVM Backend: Patterns for register + offset addressing

My PIC24 processor offers register + offset addressing. To store a value I added
def MOV_reg2offset : InstReg2Offset<0b10011, (outs), (ins GPR:$Wd, GPR:$Ws, Slit10W:$Offset),
"mov\t$Ws, [$Wd + $Offset]",
[(store GPR:$Ws, (add GPR:$Wd, Slit10W:$Offset) )]>;
where GPR is a 16-bit register class and Slit10W is an even, signed 10-bit literal. Works perfectly!
Now I tried the same for a load instruction:
def MOV_offset2reg : InstReg2Offset<0b10010, (outs), (ins GPR:$Wd, GPR:$Ws, Slit10W:$Offset),
"mov\t[$Ws + $Offset], $Wd",
[(set GPR:$Wd, (load (add GPR:$Ws, Slit10W:$Offset) ))]>;
but tablegen crashed with an assertion violation.
Questions:
Is there something wrong with the syntax or semantics?
Or have I exceeded some theoretical limit about what can get matched? Maybe three levels in the pattern is too much?
Or does it look OK and I should try to update tablegen to the very latest version?
SOLVED: Simple copy and paste error - GPR:$Wd needs to be moved from 'ins' to 'outs'.
Now 'register + offset' addressing works for both read and write without any C++ code. Cool!

Are MachineBasicBlocks supposed to implicitly fall through to their successors?

I'm debugging an LLVM target backend, and I am chasing a problem where a certain basic block ends up jumping to "nothing", i.e. just after the end of the function, when compiled with optimizations turned on.
One thing I noticed is that after instruction selection, the machine basic block has a successor but no instruction to actually jump there:
BB#1: derived from LLVM BB %switch.lookup
Predecessors according to CFG: BB#0
%vreg5<def> = SEXT %vreg2, %SREG<imp-def,dead>; DLDREGS:%vreg5 GPR8:%vreg2
%vreg6<def,tied1> = ANDIWRdK %vreg5<tied0>, -2, %SREG<imp-def,dead>; DLDREGS:%vreg6,%vreg5
%vreg7<def> = LDIWRdK 4; DLDREGS:%vreg7
%vreg8<def> = LDIRdK 0; LD8:%vreg8
%vreg9<def> = LDIRdK 1; LD8:%vreg9
CPWRdRr %vreg6<kill>, %vreg7<kill>, %SREG<imp-def>; DLDREGS:%vreg6,%vreg7
%vreg0<def> = Select8 %vreg9<kill>, %vreg8<kill>, 1, %SREG<imp-use>; GPR8:%vreg0 LD8:%vreg9,%vreg8
Successors according to CFG: BB#2(?%)
I see similar ISel results from the x86 LLVM backend and the end result doesn't have a jump-to-nothingness, so I assume this, on its own, is not a problem:
BB#1: derived from LLVM BB %switch.lookup
Predecessors according to CFG: BB#0
%vreg7<def> = MOVSX32rr8 %vreg3; GR32:%vreg7 GR8:%vreg3
%vreg8<def,tied1> = AND32ri %vreg7<tied0>, 65534, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg7
%vreg9<def,tied1> = SUB32ri8 %vreg8<tied0>, 4, %EFLAGS<imp-def>; GR32:%vreg9,%vreg8
%vreg0<def> = SETNEr %EFLAGS<imp-use>; GR8:%vreg0
Successors according to CFG: BB#2(?%)
So my question is: What is the mechanism by which these CFG-specified successors are supposed to be turned into real jumps? Does the x86 backend implement something special for this to work that the backend I'm debugging doesn't?
Should I change my ISelLowering class to lower Select8 into something that ends with an explicit jump, or is that unnecessary (maybe potentially even detrimental for some optimization to kick in) and there's some other magic that I need to do so that these implicit successors are correctly lowered?
It is perfectly valid for a MachineBasicBlock to fall through to the next Block:
That is valid. Passes that want to reorder basic blocks should only do
so if the AnalyzeBranch and related target hooks (Insert/Remove) allow
it.

Checking volatile value of address in C++

I'm trying to implement a mailbox write for the Raspberry Pi. According to the info I found, I can write to address 0x2000B8A0 when the mailbox is ready, meaning the top bit (0x80000000) of the word at 0x2000B898 is not set. I wrote it like this:
uint32_t *mailbox = reinterpret_cast<uint32_t*>(0x2000B880);
while((mailbox[6] & 0x80000000) != 0);
mailbox[8] = value + channel;
But the disassembly shows that the value at mailbox[6] is only loaded once, before the loop, then it just repeats the check with that one value.
I could not find a solution because I don't even know the proper words for this problem. I'm sure it's simple but googling brought nothing for this special case.
The answer lies in the title of your question.
You should use the following:
volatile uint32_t *mailbox = reinterpret_cast<volatile uint32_t*>(0x2000B880);
This makes sure the value is loaded on every iteration of your loop. If you see the application stop responding, consider adding a sleep, delay, or yield inside the while loop.
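A minimal sketch of the polling write in C follows, using a `volatile` pointer so the status word is re-read on every pass. The word indices and the flag value come from the question's code; the `mailbox_write` function, its macro names, and the spin limit are hypothetical additions so the loop cannot hang forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_STATUS  6           /* word index of 0x2000B898 from base 0x2000B880 */
#define MBOX_WRITE   8           /* word index of 0x2000B8A0 from base 0x2000B880 */
#define MBOX_BUSY    0x80000000u /* flag tested in the question's loop */

/* Wait until the busy flag clears, then write value + channel.
 * Returns false if the flag is still set after max_spins extra polls. */
static bool mailbox_write(volatile uint32_t *mbox, uint32_t value,
                          uint32_t channel, unsigned long max_spins)
{
    assert(mbox);
    /* volatile forces a fresh load of the status word on each iteration */
    while (mbox[MBOX_STATUS] & MBOX_BUSY) {
        if (max_spins-- == 0)
            return false;
    }
    mbox[MBOX_WRITE] = value + channel;
    return true;
}
```

On the target, the first argument would be `(volatile uint32_t *)0x2000B880`; on a host machine the function can be tried against a plain array standing in for the registers.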

Using Pow in C++ MinGW. Works hard coded but not with variables

This is hopefully a simple linker issue but I've spent hours searching and haven't moved forward in that time. I'm trying to use
#include <cmath>
double aA = 2;
double result = pow((double)2.0,(double)aA);
I get no error messages and it compiles without issue. But an unrelated grid I'm drawing with OpenGL doesn't display. If I substitute 2 for aA, then the grid displays. Like this:
#include <cmath>
double aA = 2;
double result = pow((double)2.0,(double)2);
This outputs 4 as expected. The previous example outputs nothing. It's as if the program hangs but there are no errors.
This computation isn't used anywhere and in fact just sits in main (or anywhere else) and the variables are unique and are unused.
I'm using Code::Blocks and the MinGW GNU GCC compiler on Windows 7, with the flags -g -Wall -Wextra.
I'm rendering with glew + freeglut, and everything else works until I use a variable with pow.
I've tried every combination of casting I can think of and I've tried powf with the exact same result. I'm using sqrt and other functions so believe that the inclusion is working. I've also tried math.h but get the same problem.
I have never wished to see an error message from a compiler more so than I do right now.
So 1. Why am I not getting an error when it looks like it's stopping the whole program in its tracks?
And 2. What have I missed to get pow() working with variables?
Update: After creating a new project and trying it out I have no issues, so there must be something in my setup that's interfering. I'll keep experimenting. Thanks for the quick responses; things sure move fast around here!
Update 2:
Very strange.
float aAs = 1.0;
float amplitudeA = (float)pow((float)2.,(float)aAs);
char str[50];
int test = (int) (amplitudeA);
sprintf (str, "out - %d", test);
MessageBox(NULL,str,NULL,NULL);
This outputs 2 in the message box. Then my grid draws and the program behaves. If I comment out only the message box like so:
float aAs = 1.0;
float amplitudeA = (float)pow((float)2.,(float)aAs);
char str[50];
int test = (int) (amplitudeA);
sprintf (str, "out - %d", test);
//MessageBox(NULL,str,NULL,NULL);
No drawing of my grid. What could be causing this?
char str[50];
int test = (int) (1);
sprintf (str, "out - %d", test);
MessageBox(NULL,str,NULL,NULL);
float aAs = 1.0;
float amplitudeA = (float)pow((float)2.,(float)aAs);
Swapping the message box over recreates the issue. No grid drawn. It's as if focus needs to be taken away from the program when I'm using a variable in pow. I'm completely baffled.
Another Update : I temporarily got around it by writing my own simple powerOf function. But now I'm having the same issue with the cos() function.
Can anyone tell me if there is something wrong with that image? This issue has to stem from incorrect linking. Is that what you would expect from hovering over cos in Code::Blocks with gcc?
This is an error that occurs only when running through the program with a bad cos call. Interesting that I've been using cos for camera calculations since I started this app with no issue.
Error #667: UNADDRESSABLE ACCESS: reading 0x00000003-0x00000007 4 byte(s)
# 0 ntdll.dll!RtlImageNtHeader +0x124c (0x77ca43d0 <ntdll.dll+0x343d0>)
# 1 ntdll.dll!RtlImageNtHeader +0x422 (0x77ca35a7 <ntdll.dll+0x335a7>)
# 2 ntdll.dll!RtlImageNtHeader +0x30d (0x77ca3492 <ntdll.dll+0x33492>)
# 3 KERNEL32.dll!HeapFree +0x13 (0x775e14dd <KERNEL32.dll+0x114dd>)
# 4 atioglxx.dll!atiPPHSN +0x11afaa (0x66538f3b <atioglxx.dll+0xeb8f3b>)
# 5 atioglxx.dll!DrvSwapBuffers +0x33fb (0x6569b9cc <atioglxx.dll+0x1b9cc>)
# 6 atioglxx.dll!DrvSwapBuffers +0x3cad (0x6569c27e <atioglxx.dll+0x1c27e>)
# 7 atioglxx.dll!DrvSwapBuffers +0x7c57 (0x656a0228 <atioglxx.dll+0x20228>)
# 8 atioglxx.dll!DrvSwapBuffers +0x12c (0x656986fd <atioglxx.dll+0x186fd>)
# 9 atioglxx.dll!DrvValidateVersion +0x28 (0x65697c19 <atioglxx.dll+0x17c19>)
#10 OPENGL32.dll!wglSwapMultipleBuffers +0xc5d (0x66c8af0b <OPENGL32.dll+0x3af0b>)
#11 OPENGL32.dll!wglSwapMultipleBuffers +0xe45 (0x66c8b0f3 <OPENGL32.dll+0x3b0f3>)
Note: #0:00:05.233 in thread 3136
Note: instruction: mov 0x04(%ecx) -> %ecx
Solved. There was an uninitialized variable that was sitting at the bottom of the vertex buffer object I was using to draw the grid. For whatever reason feeding a variable to one of the math functions caused unexpected results in this buffer object.
Thanks to Angew and Kos for pointing me towards memory.
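The original grid code is not shown, so the following is only a sketch in C of one way to avoid uploading an uninitialized tail: the builder writes every element it reports and returns the exact count, and the caller passes that count to the upload. The `build_grid_lines` function and its grid layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the end points of the n + 1 vertical grid
 * lines x = 0..n (two 2-D vertices per line) into out.
 * Returns the number of floats written, or 0 if capacity is too small. */
static size_t build_grid_lines(float *out, size_t capacity, unsigned n)
{
    size_t needed = ((size_t)n + 1) * 4; /* 2 vertices * 2 floats per line */
    assert(out);
    if (needed > capacity)
        return 0;

    size_t k = 0;
    for (unsigned x = 0; x <= n; ++x) {
        out[k++] = (float)x; out[k++] = 0.0f;     /* bottom end point */
        out[k++] = (float)x; out[k++] = (float)n; /* top end point */
    }
    return k;
}
```

The returned count, multiplied by `sizeof(float)`, would then be the size handed to the buffer upload, instead of the size of the whole backing array.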