LLVM Backend: Patterns for register + offset addressing - llvm

My PIC24 processor offers register + offset addressing. To store a value I added
def MOV_reg2offset : InstReg2Offset<0b10011, (outs), (ins GPR:$Wd, GPR:$Ws, Slit10W:$Offset),
"mov\t$Ws, [$Wd + $Offset]",
[(store GPR:$Ws, (add GPR:$Wd, Slit10W:$Offset) )]>;
where GPR is a 16bit register class and Slit10W an even, signed 10 bit literal. Works perfectly!
Now I tried the same for a load instruction:
def MOV_offset2reg : InstReg2Offset<0b10010, (outs), (ins GPR:$Wd, GPR:$Ws, Slit10W:$Offset),
"mov\t[$Ws + $Offset], $Wd",
[(set GPR:$Wd, (load (add GPR:$Ws, Slit10W:$Offset) ))]>;
but tablegen crashed with an assertion violation.
Questions:
Is there something wrong with the syntax or semantics?
Or have I exceeded some theoretical limit about what can get matched? Maybe three levels in the pattern is too much?
Or does it look OK and I should try to update tablegen to the very latest version?

SOLVED: Simple copy and paste error - GPR:$Wd needs to be moved from 'ins' to 'outs'.
Now 'register + offset' addressing works for both read and write without any C++ code. Cool!

Related

I switched my project to x64 and got an error

I have this project that I'm working on for a while now, and it was working just fine before I switched from releasing an x86 version to an x64 one.
This is the line of code where the error appeared:
ReadProcessMemory(a.hProcess, LPCVOID(b->Ebx + 8), LPVOID(&c), 4, 0);
I don't understand assembly but when the Error appeared at b->Ebx I changed it to b->Rbx. and it compiled and run but it didn't do the job it was supposed to do.
Am I using the wrong register?
After a little bit of debugging
pe_dos_h = PIMAGE_DOS_HEADER(pe_image);
pe_nt_h = PIMAGE_NT_HEADERS(DWORD(pe_image) + pe_dos_h->e_lfanew);
//error when trying to assign a value or access the signature in nt-headers
//ERROR: read access violation
IMAGE_NT_HEADERS* pe_nt_h->Signature == IMAGE_NT_SIGNATURE;//0x00004550-PE00
The pe_image is raw data copied from a PE(.exe) file. Is the difference is in handling x86 PE image vs x64 one?
This is a simplified version of the code. If you think the problem is out of the scope of this piece of code let me know.
The problem was in the way I was defining the IMAGE_SECTION_HEADER, In x64 architecture, I had to use the x64-specific datatype IMAGE_SECTION_HEADER64.
and also when casting pe_nt_h = PIMAGE_NT_HEADERS(DWORD_PTR(pe_image) + pe_dos_h->e_lfanew);
I had to use DWORD_PTR instead of just DWORD because it was missing some bits.
here is the solution code for the simplified version I provided in the post:
PIMAGE_DOS_HEADER pe_dos_h = PIMAGE_DOS_HEADER(pe_image);
PIMAGE_NT_HEADERS64 pe_nt_h = PIMAGE_NT_HEADERS(DWORD_PTR(pe_image) + pe_dos_h->e_lfanew);

How to get FLOPS in RISC-V using SW or HW method?

I am a newbie to RISC-V. I wonder how I could get FLOPS using SW or HW method. I try to use CSR to get FLOPS, but there are some problems.
As I know, if I redesign the hpmcounter which counts every floating operation event, I could get FLOPS by using the csr read instruction. I know there is a similar design in the rocket-chip-based SiFive's U54-core manual. In the manual I can see SiFive core has sophisticated feature counting capabilities. This feature is controlled by the mhpmevent CSR. If I set lower eight bits of mhpmevent as 0, and enable the [19-25] bit, I can get counter value from mhpmcounter. I actually want to design this field like SiFive core.
I try to imitate it for FLOPS, but I encounter some problems.
I can't access to the mhpmcounter, and I can see the illegal instruction error like following link.
illegal instruction error message!!
I make a simple test code and compile it successfully, but there is a illegal instruction error when I implement it using spike and cycle accurate emulator. Both use proxy kernel.
// simple test code
unsigned long instret1 = 0;
unsigned long instret2 = 0;
float a,b,c;
a = 5.0;
b = 4.0;
asm volatile ("csrrs %0, mhpmcounter3, x0 " : "=r"(instret1));
c = a + b;
asm volatile ("csrrs %0, mhpmcounter3, x0 " : "=r"(instret2));
printf("instruction count : %ul \n", instret2-instret1);
It is hard to change to M-mode from user mode for access to the mhpmevet and mhpmcounter. In the RISC-V priv-spec 1.10, I find xRET instruction can change mode. Following text is about xRET in the spec.
The MRET, SRET, or URET instructions are used to return from traps in M-mode, S-mode, or
U-mode respectively. When executing an xRET instruction, supposing xPP holds the value y, x IE
is set to x PIE; the privilege mode is changed to y; x PIE is set to 1; and xPP is set to U (or M if
user-mode is not supported).
If someone knows it, I hope to see the detailed assembly code.
I try to modify rocket-chip/src/main/scala/rocket/CSR.scala for redesign CSR. Is it the only way? Firstly, I want to use spike to test the counter value. How should I change the code?
If anybody has some other ideas or has accomplished it, please point to me. Thanks!

Vista and FindFirstFileEx(..., FIND_FIRST_EX_LARGE_FETCH)

i have an application that queries a specific folder for its contents with a quick interval. Up to the moment i was using FindFirstFile but even by applying search pattern i feel there will be performance problems in the future since the folder can get pretty big; in fact it's not in my hand to restrict it at all.
Then i decided to give FindFirstFileEx a chance, in combination with some tips from this question.
My exact call is the following:
const char* search_path = "somepath/*.*";
WIN32_FIND_DATA fd;
HANDLE hFind = ::FindFirstFileEx(search_path, FindExInfoBasic, &fd, FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH);
Now i get pretty good performance but what about compatibility? My application requires Windows Vista+ but given the following, regarding FIND_FIRST_EX_LARGE_FETCH:
This value is not supported until Windows Server 2008 R2 and Windows 7.
I can pretty much compile it on my Windows 7 but what will happens if someone runs this on a Vista machine? Does the function downgrade to a 0
(default) in this case? It's safe to not test against operating system?
UPDATE:
I said above about feeling the performance is not good. In fact my numbers are following on a fixed set of files (about 100 of them):
FindFirstFile -> 22 ms
FindFirstFile -> 4 ms (using specific pattern; however all files may wanted)
FindFirstFileEx -> 1 ms (no matter patterns or full list)
What i feel about is what will happen if folder grows say 50k files? that's about x500 bigger and still not big enough. This is about 11 seconds for an application querying on 25 fps (it's graphical)
Just tested under WinXP (compiled under win7). You'll get 0x57 (The parameter is incorrect) when it calls ::FindFirstFileEx() with FIND_FIRST_EX_LARGE_FETCH. You should check windows version and dynamically choose the value of additional parameter.
Also FindExInfoBasic is not supported before Windows Server 2008 R2 and Windows 7. You'll also get run-time 0x57 error due to this value. It must be changed to another alternative if binary is run under old windows version.
at first periodic queries a specific folder for its contents with a quick interval - not the best solution I think.
you need call ReadDirectoryChangesW - as result you will be not need do periodic queries but got notifies when files changed in directory. the best way bind directory handle with BindIoCompletionCallback or CreateThreadpoolIo and call first time direct call ReadDirectoryChangesW. then when will be changes - you callback will be automatic called and after you process data - call ReadDirectoryChangesW again from callback - until you got STATUS_NOTIFY_CLEANUP (in case BindIoCompletionCallback) or ERROR_NOTIFY_CLEANUP (in case CreateThreadpoolIo) in callback (this mean you close directory handle for stop notify) or some error.
after this (first call to ReadDirectoryChangesW ) you need call FindFirstFileEx/FindNextFile loop but only once - and handle all returned files as FILE_ACTION_ADDED notify
and about performance and compatibility.
all this is only as information. not recommended to use or not use
if you need this look to - ZwQueryDirectoryFile - this give you very big win performance
you only once need open File handle, but not every time like with
FindFirstFileEx
but main - look to ReturnSingleEntry parameter. this is key
point - you need set it to FALSE and pass large enough buffer to
FileInformation. if set ReturnSingleEntry to TRUE function
and return only one file per call. so if folder containing N files -
you will be need call ZwQueryDirectoryFile N times. but with
ReturnSingleEntry == FALSE you can got all files in single call, if buffer will be large enough. in all case you serious reduce
the number of round trips to the kernel, which is very costly
operation . 1 query with N files returned much more faster than N
queries. the FIND_FIRST_EX_LARGE_FETCH and do this - set
ReturnSingleEntry to TRUE - but in current implementation (i check this on latest win 10) system do this only in
FindNextFile calls, but in first call to
FindFirstFileEx it (by unknown reason) still use
ReturnSingleEntry == TRUE - so will be how minimum 2 calls to the ZwQueryDirectoryFile, when possible have single call
(if buffer will be large enough of course) and if direct use
ZwQueryDirectoryFile - you control buffer size. you can
allocate once say 1MB for buffer, and then use it in periodic
queries. (without reallocation). how large buffer use
FindFirstFileEx with FIND_FIRST_EX_LARGE_FETCH - you can
not control (in current implementation this is 64kb - quite
reasonable value)
you have much more reach choice for FileInformationClass - less
informative info class - less data size, faster function worked.
about compatibility? this exist and worked from how minimum win2000 to latest win10 with all functional. (in documentation - "Available starting with Windows XP", however in ntifs.h it declared as #if (NTDDI_VERSION >= NTDDI_WIN2K) and it really was already in win2000 - but no matter- XP support more than enough now)
but this is undocumented, unsupported, only for kernel mode, no lib file.. ?
documented - as you can see, this is both for user and kernel mode - how you think - how is FindFirstFile[Ex] / FindNextFile - working ? it call ZwQueryDirectoryFile - no another way. all calls to kernel only through ntdll.dll - this is fundamental. ( yes still possible that ntdll.dll stop export by name and begin export by ordinal only for show what is unsupported really was). lib file exist, even two ntdll.lib and ntdllp.lib (here more api compare first) in any WDK. headers, where declared ? #include <ntifs.h>. but it conflict with #include <windows.h> - yes conflict, but if include ntifs.h in namespace with some tricks - possible avoid conflicts

How can i track a specific loop in binary instrumentation by using pin tool?

I am fresh in using intel pin tool, and want to track a certain loop in a binary file, but i found in each run the address of the instructions changed in each run, how can i find a specific instruction or a specific loop even it change in each run ? Edit 0: I have the following address, which one of them is the RVA:( the first section of address(small address) are constant for each run, but the last section(big address) changed for each run) Address loop_repeation No._of_Instruction_In_Loop
4195942 1 8
4195972 1 3
....... ... ...
140513052566480 1 2
...... ... ...
the address of the instructions changed in each run, how can i find a specific instruction or a specific loop even it change in each run ?
This is probably because you have ASLR enabled (which is enabled by default on Ubuntu). If you want your analyzed program to load at the same address in each run, you might want to:
1) Disable ASLR:
Disable it system-wide: sysctl -w kernel.randomize_va_space=0 as explained here.
Disable it per process: $> setarch $(uname -m) -R /bin/bash as explained here.
2) Calculate delta (offsets) in your pintool:
For each address that you manipulate, you need to use a RVA (Relative Virtual Address) rather than a full VA (Virtual Address).
Example:
Let's say on your first run your program loads at 0x80000000 (this is the "Base Address"), and a loop starts at 0x80000210
On the second run, the program loads at 0x90000000 ("Base Address") and the loops starts at 0x90000210
Just calculate the offsets of the loops from the Base Address:
Base_Address - Program_Address = offset
0x80000210 - 0x80000000 = 0x210
0x90000210 - 0x90000000 = 0x210
As both resulting offsets are the same, you know you have the exactly the same instruction, independently of the base address of the program.
How to do that in your pintool:
Given an (instruction) address, use IMG_FindByAddress to find the corresponding image (module).
From the image, use IMG_LowAddress to get the base address of the module.
Subtract the module base from the instruction: you have the RVA.
Now you can compare RVA between them and see if they are the same (they also must be in the same module).
Obviously this doesn't work for JITed code as JITed code has no executable module (think mmap() [linux] or VirtualAlloc() [windows])...
Finally there's a good paper (quite old now, but still applicable) on doing a loop detection with pin, if that can help you.

Need an example of Ypsilon usage

I started to mess with Ypsilon, which is a C++ implementation of Scheme.
It conforms R6RS, features fast garbage collector, supports multi-core CPUs and Unicode but has a LACK of documentation, C++ code examples and comments in the code!
Authors provide it as a standalone console application.
My goal is to use it as a scripting engine in an image processing application.
The source code is well structured, but the structure is unfamiliar.
I spent two weeks penetrating it, and here's what I've found out:
All communication with outer world is done via C++ structures called
ports, they correspond to Scheme ports.
Virtual machine has 3 ports: IN, OUT and ERROR.
Ports can be std-ports (via console), socket-ports,
bytevector-ports, named-file-ports and custom-ports.
Each custom port must provide a filled structure called handlers.
Handlers is a vector containing 6 elements: 1st one is a boolean
(whether
port is textual), and other five are function pointers (onRead, onWrite, onSetPos, onGetPos, onClose).
As far as I understand, I need to implement 3 custom ports (IN, OUT and ERROR).
But for now I can't figure out, what are the input parameters of each function (onRead, onWrite, onSetPos, onGetPos, onClose) in handlers.
Unfortunately, there is neither example of implementing a custom port no example of following stuff:
C++ to Scheme function bindings (provided examples are a bunch of
.scm-files, still unclear what to do on the C++ side).
Compiling and
running bytecode (via bytevector-ports? But how to compile text to
bytecode?).
Summarizing, if anyone provides a C++ example of any scenario mentioned above, it would significantly save my time.
Thanks in advance!
Okay, from what I can read of the source code, here's how the various handlers get called (this is all unofficial, based purely on source code inspection):
Read handler: (lambda (bv off len)): takes a bytevector (which your handler will put the read data into), an offset (fixnum), and a length (fixnum). You should read in up to len bytes, placing those bytes into bv starting at off. Return the number of bytes actually read in (as a fixnum).
Write handler: (lambda (bv off len)): takes a bytevector (which contains the data to write), an offset (fixnum), and a length (fixnum). Grab up to len bytes from bv, starting at off, and write them out. Return the number of bytes actually written (as a fixnum).
Get position handler: (lambda (pos)) (called in text mode only): Allows you to store some data for pos so that a future call to the set position handler with the same pos value will reset the position back to the current position. Return value ignored.
Set position handler: (lambda (pos)): Move the current position to the value of pos. Return value ignored.
Close handler: (lambda ()): Close the port. Return value ignored.
To answer another question you had, about compiling and running "bytecode":
To compile an expression, use compile. This returns a code object.
There is no publicly-exported approach to run this code object. Internally, the code uses run-vmi, but you can't access this from outside code.
Internally, the only place where compiled code is loaded and used is in its auto-compile-cache system.
Have a look at heap/boot/eval.scm for details. (Again, this is not an official response, but based purely on personal experimentation and source code inspection.)