Can GCC print out intermediate results? - c++

Check the code below:
#include <avr/io.h>

const uint16_t baudrate = 9600;

void setupUART( void ) {
    uint16_t ubrr = ( ( F_CPU / ( 16 * (float) baudrate ) ) - 1 + .5 );
    UBRRH = ubrr >> 8;
    UBRRL = ubrr & 0xff;
}

int main( void ) {
    setupUART();
}
This is the command used to compile the code:
avr-gcc -g -DF_CPU=4000000 -Wall -Os -Werror -Wextra -mmcu=attiny2313 -Wa,-ahlmns=project.lst -c -o project.o project.cpp
ubrr is calculated by the compiler as 25, so far so good. However, to check what the compiler calculated, I have to peek into the disassembly listing.
000000ae <setupUART()>:
ae: 12 b8 out UBRRH, r1 ; 0x02
b0: 89 e1 ldi r24, 0x19 ; 25
b2: 89 b9 out UBRRL, r24 ; 0x09
b4: 08 95 ret
Is it possible to make avr-gcc print out the intermediate result at compile time (or pull the info from the .o file), so when I compile the code it prints a line like (uint16_t) ubrr = 25 or similar? That way I can do a quick sanity check on the calculation and settings.

GCC has command line options to request that it dump out its intermediate representation after any stage of compilation. The "tree" dumps are in pseudo-C syntax and contain the information you want. For what you're trying to do, the -fdump-tree-original and -fdump-tree-optimized dumps happen at useful points in the optimization pipeline. I don't have an AVR compiler to hand, so I modified your test case to be self-contained and compilable with the compiler I do have:
typedef unsigned short uint16_t;
const int F_CPU = 4000000;
const uint16_t baudrate = 9600;
extern uint16_t UBRRH, UBRRL;

void
setupUART(void)
{
    uint16_t ubrr = ((F_CPU / (16 * (float) baudrate)) - 1 + .5);
    UBRRH = ubrr >> 8;
    UBRRL = ubrr & 0xff;
}
and then
$ gcc -O2 -S -fdump-tree-original -fdump-tree-optimized test.c
$ cat test.c.003t.original
;; Function setupUART (null)
;; enabled by -tree-original
{
  uint16_t ubrr = 25;
    uint16_t ubrr = 25;
  UBRRH = (uint16_t) ((short unsigned int) ubrr >> 8);
  UBRRL = ubrr & 255;
}
$ cat test.c.149t.optimized
;; Function setupUART (setupUART, funcdef_no=0, decl_uid=1728, cgraph_uid=0)
setupUART ()
{
  <bb 2>:
  UBRRH = 0;
  UBRRL = 25;
  return;
}
You can see that constant-expression folding is done so early that it's already happened in the "original" dump (which is the earliest comprehensible dump you can have), and that optimization has further folded the shift and mask operations into the statements writing to UBRRH and UBRRL.
The numbers in the filenames (003t and 149t) will probably be different for you. If you want to see all the "tree" dumps, use -fdump-tree-all. There are also "RTL" dumps, which don't look anything like C and are probably not useful to you. If you're curious, though, -fdump-rtl-all will turn 'em on. In total there are about 100 tree and 60 RTL dumps, so it's a good idea to do this in a scratch directory.
(Psssst: Every time you put spaces on the inside of your parentheses, God kills a kitten.)

There might be a solution for printing intermediate results, but it will take you some time to implement, so it is worthwhile only for a quite large source code base.
You could customize your GCC compiler, either through a plugin (painfully coded in C or C++) or through a MELT extension. MELT is a high-level, Lisp-like, domain-specific language to extend GCC. (It is implemented as a [meta-]plugin for GCC and is translated to C++ code suitable for GCC.)
However, such an approach requires you to understand GCC internals and then add your own "optimization" pass doing the aspect-oriented programming (e.g. using MELT) to print the relevant intermediate results.
You could also look not only at the generated assembly (using -fverbose-asm -S as options to GCC) but also at the generated Gimple representations (perhaps with -fdump-tree-gimple). For an interactive tool, consider the graphical MELT probe.
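For example, on the original test case (the numeric prefix in the dump file name varies between GCC versions):
$ avr-gcc -DF_CPU=4000000 -mmcu=attiny2313 -Os -S -fverbose-asm project.cpp
$ avr-gcc -DF_CPU=4000000 -mmcu=attiny2313 -Os -fdump-tree-gimple -c project.cpp
$ cat project.cpp.004t.gimple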
Perhaps adding your own builtin (with a MELT extension) like __builtin_display_compile_time_constant might be relevant.

I doubt there is an easy way to determine what the compiler does. There may be some tools in gcc specifically to dump the intermediate form of the language, but the output will definitely not be easy to read, and unless you REALLY suspect that the compiler is doing something wrong (and have a VERY small example to show it), it's unlikely you can use it for anything meaningful - simply because it is too much work to follow what is going on.
A better approach is to add temporary variables (and perhaps prints) to your code, if you worry about it being correct:
uint16_t ubrr = ( ( F_CPU / ( 16 * (float) baudrate ) ) - 1 + .5 );
uint8_t ubrr_high = ubrr >> 8;
uint8_t ubrr_low = ubrr & 0xff;

UBRRH = ubrr_high;
UBRRL = ubrr_low;
Now, if you have a non-optimized build and step through it in GDB, you should be able to see what it does. Otherwise, add printouts of some sort to the code to show what the values are.
If you can't print it on the target system because you are in the process of setting up the uart that you will be using to print with, then replicate the code on your local host system and debug it there. Unless the compiler is very buggy, you should get the same values from the same compilation.
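For instance, here is a minimal host-side replica of the calculation (assuming the same F_CPU of 4000000 that the original compile command passes via -DF_CPU) that you can build and run on your PC:

#include <stdio.h>
#include <stdint.h>

#define F_CPU 4000000UL   /* same value the AVR build passes via -DF_CPU */

int main(void)
{
    const uint16_t baudrate = 9600;
    uint16_t ubrr = ((F_CPU / (16 * (float) baudrate)) - 1 + .5);

    printf("ubrr = %u (high = %u, low = %u)\n",
           (unsigned) ubrr, (unsigned) (ubrr >> 8), (unsigned) (ubrr & 0xff));
    return 0;
}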

Here's a hack: simply automate what you are doing by hand now.
In your makefile, ensure that avr-gcc produces a listing (-ahlms=output.lst). Alternatively, use your own disassembly method as a post-compile step in your makefile.
As a post-compilation step, process your listing file using your favorite scripting language to look for the out UBRRH and out UBRRL lines. These are going to be loaded from registers, so your script can pull out the immediately preceding assignments to the registers that will be loaded into UBRRH and UBRRL. The script can then reassemble the UBRR value from the values loaded into the general-purpose registers which are used to set UBRRH and UBRRL.
This sounds easier than Basile Starynkevitch's very useful suggestion of a MELT extension. Now, granted, this solution seems fragile at first blush, so let's consider that issue:
We know that (at least on your processor) the lines out UBRR_, r__ will appear in the disassembly listing: there is simply no other way to set the registers/write data to the port. One thing that might change is the spacing in/around these lines, but this can be easily handled by your script.
We also know that out instructions can only write from general-purpose registers, so there will be a general-purpose register as the second argument of the out instruction line; that should not be a problem.
Finally, we also know that this register will be set prior to the out instruction. Here we must allow for some variability: instead of LDI (load immediate), avr-gcc might produce some other sequence of instructions to set the register value. I think as a first pass the script should be able to parse immediate loads, and otherwise dump whatever last instruction it finds involving the register that will be written to the UBRR_ ports.
The script may have to change if you change platforms (some processors have UBRRH1/2 registers instead of UBRRH; however, in that case your baud-rate code will have to change too). If the script complains that it can't parse the disassembly, then you'll at least know that your check has not been performed.
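A rough sketch of such a post-processing step in C, assuming objdump-style lines like those in the listing above and handling only the immediate-load case (the register matching is deliberately naive; a real script should cross-check register names):

/* Reassemble UBRR from a disassembly listing read on stdin.
 * Assumes lines like "b0: 89 e1 ldi r24, 0x19" and that r1 is
 * avr-gcc's fixed zero register. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    unsigned imm = 0;            /* value of the most recent ldi   */
    unsigned high = 0, low = 0;  /* recovered UBRRH / UBRRL values */
    int have_high = 0, have_low = 0;

    while (fgets(line, sizeof line, stdin)) {
        unsigned v;
        if (sscanf(line, " %*x: %*s %*s ldi r%*u, 0x%x", &v) == 1) {
            imm = v;                               /* remember last immediate load */
        } else if (strstr(line, "out") && strstr(line, "UBRRH")) {
            high = strstr(line, "r1") ? 0 : imm;   /* naive check: r1 reads as 0 */
            have_high = 1;
        } else if (strstr(line, "out") && strstr(line, "UBRRL")) {
            low = imm;
            have_low = 1;
        }
    }
    if (have_high && have_low)
        printf("UBRR = %u\n", (high << 8) | low);
    else
        fprintf(stderr, "could not parse UBRR writes from listing\n");
    return 0;
}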

Related

How to get FLOPS in RISC-V using SW or HW method?

I am a newbie to RISC-V. I wonder how I could get FLOPS using a SW or HW method. I tried to use CSRs to get FLOPS, but there are some problems.
As far as I know, if I redesign the hpmcounter so that it counts every floating-point operation event, I could get FLOPS by using the CSR read instruction. There is a similar design in the manual of SiFive's rocket-chip-based U54 core: that core has sophisticated event-counting capabilities, controlled by the mhpmevent CSR. If I set the lower eight bits of mhpmevent to 0 and enable bits [19:25], I can read the counter value from mhpmcounter. I want to design this field like the SiFive core does.
I tried to imitate it for FLOPS, but I ran into some problems.
I can't access the mhpmcounter; I get an illegal-instruction error, as shown in the following link:
illegal instruction error message!!
I wrote a simple test program and compiled it successfully, but there is an illegal-instruction error when I run it on Spike and on a cycle-accurate emulator. Both use the proxy kernel.
// simple test code
#include <stdio.h>

int main(void)
{
    unsigned long instret1 = 0;
    unsigned long instret2 = 0;
    float a, b, c;

    a = 5.0;
    b = 4.0;
    asm volatile ("csrrs %0, mhpmcounter3, x0" : "=r"(instret1));
    c = a + b;
    asm volatile ("csrrs %0, mhpmcounter3, x0" : "=r"(instret2));
    printf("instruction count : %lu \n", instret2 - instret1);
    return 0;
}
It is hard to switch from user mode to M-mode to access mhpmevent and mhpmcounter. In the RISC-V priv-spec 1.10, I found that the xRET instruction can change the mode. The following text from the spec is about xRET:
The MRET, SRET, or URET instructions are used to return from traps in M-mode, S-mode, or U-mode respectively. When executing an xRET instruction, supposing xPP holds the value y, xIE is set to xPIE; the privilege mode is changed to y; xPIE is set to 1; and xPP is set to U (or M if user-mode is not supported).
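From my reading of the spec, something like the following M-mode sketch is needed before returning to U-mode (untested; the mhpmevent3 selector encoding is implementation-specific, e.g. per the SiFive U54 manual):

/* Untested sketch: run in M-mode before dropping to U-mode.
 * mcounteren bit 3 gates lower-privilege access to hpmcounter3;
 * the event selector for mhpmevent3 is implementation-specific. */
static void enable_hpmcounter3(unsigned long event_sel)
{
    unsigned long en;
    asm volatile ("csrr %0, mcounteren" : "=r"(en));
    en |= 1UL << 3;                           /* bit 3 = hpmcounter3 */
    asm volatile ("csrw mcounteren, %0" : : "r"(en));
    asm volatile ("csrw mhpmevent3, %0" : : "r"(event_sel));
}

/* In U-mode, read the unprivileged alias hpmcounter3 instead of
 * mhpmcounter3 (reading an M-mode CSR below M-mode traps): */
static unsigned long read_hpmcounter3(void)
{
    unsigned long v;
    asm volatile ("csrr %0, hpmcounter3" : "=r"(v));
    return v;
}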
If someone knows it, I hope to see the detailed assembly code.
I tried to modify rocket-chip/src/main/scala/rocket/CSR.scala to redesign the CSRs. Is that the only way? First, I want to use Spike to test the counter value. How should I change the code?
If anybody has other ideas or has accomplished this, please point me to it. Thanks!

Removing OPENSSL_cleanse from OpenSSL-1.0.1r

I found out that OPENSSL_cleanse wastes a lot of time in my project. For example, if the program runs for 25 seconds, 3 of those seconds are spent in OPENSSL_cleanse. I checked the code of this function and decided that it isn't doing anything very useful for me. I know it fills memory with garbage data for security reasons, but I don't really care about that. So I decided to place return; right at the start of the function, before any operations:
void OPENSSL_cleanse(void *ptr, size_t len)
{
    return;
    // original OpenSSL code goes here
}
I'm using Mac OS and Xcode. I've compiled the lib and installed it in /Users/ForceBru/Desktop/openssl via the --openssldir option of the Configure script. I've added it to my project in Build Settings->Link Binary With Libraries and added include dirs in Build Settings->Search Paths->Header Search Paths and Build Settings->Search Paths->Library Search Paths.
The project compiled fine, but the time profiler still shows pretty expensive calls to OPENSSL_cleanse.
Edit: the C tag is because OpenSSL is written in C, and the C++ tag is because my code is in C++. Maybe this information will be helpful.
The question is, what am I doing wrong? How do I remove the calls to OPENSSL_cleanse? I think this has to do with linking, because the command line includes -lcrypto, which means this library can actually be taken from anywhere (right?), not necessarily from /Users/ForceBru/Desktop/openssl.
Edit #2: I've edited the linker options to use the .a file in /Users/ForceBru/Desktop/openssl and removed it from Build Settings->Link Binary With Libraries. Still no effect.
It turns out that OpenSSL has lots of assembly code generated by some Perl scripts that are located in the crypto directory (*cpuid.pl). These scripts generate assembly code for the following architectures: alpha, armv4, ia64, ppc, s390x, sparc, x86 and x86_64.
When make runs, the appropriate script fires, generating a *cpuid.S file (where * is one of the architectures mentioned earlier). These files are compiled into the library and seem to override the OPENSSL_cleanse implemented in crypto/mem_clr.c.
What I had to do was simply change the body of OPENSSL_cleanse to a bare ret in x86_64cpuid.pl:
.globl  OPENSSL_cleanse
.type   OPENSSL_cleanse,\@abi-omnipotent
.align  16
OPENSSL_cleanse:
    ret
    # loads of OPENSSL assembly
.size   OPENSSL_cleanse,.-OPENSSL_cleanse
This isn't quite the answer that you were looking for, but it may help you along...
Removing OPENSSL_cleanse from OpenSSL-1.0.1r...
I checked the code of this function and decided that it isn't doing anything very useful for me...
That's probably a bad idea, but we would need to know more about your threat model. Zeroization allows you to deterministically remove sensitive material from memory.
It's also a Certification and Accreditation (C&A) item. For example, FIPS 140-2 requires zeroization even at Level 1.
Also, you can't remove OPENSSL_cleanse per se because OPENSSL_clear_realloc, OPENSSL_clear_free and friends call it. Also see the OPENSSL_cleanse man page.
For example, if it runs for 25 seconds, 3 seconds are wasted by OPENSSL_cleanse
OK, so this is a different problem. OPENSSL_cleanse is kind of convoluted, and it does waste some cycles in an effort to survive the optimization pass.
If you check Commit 380f18ed5f140e0a, then you will see it has been changed in OpenSSL 1.1.0 to the following. Maybe you could use it instead?
diff --git a/crypto/mem_clr.c b/crypto/mem_clr.c
index e6450a1..3389919 100644
--- a/crypto/mem_clr.c
+++ b/crypto/mem_clr.c
@@ -59,23 +59,16 @@
 #include <string.h>
 #include <openssl/crypto.h>
 
-extern unsigned char cleanse_ctr;
-unsigned char cleanse_ctr = 0;
+/*
+ * Pointer to memset is volatile so that compiler must de-reference
+ * the pointer and can't assume that it points to any function in
+ * particular (such as memset, which it then might further "optimize")
+ */
+typedef void *(*memset_t)(void *,int,size_t);
+
+static volatile memset_t memset_func = memset;
 
 void OPENSSL_cleanse(void *ptr, size_t len)
 {
-    unsigned char *p = ptr;
-    size_t loop = len, ctr = cleanse_ctr;
-
-    if (ptr == NULL)
-        return;
-
-    while (loop--) {
-        *(p++) = (unsigned char)ctr;
-        ctr += (17 + ((size_t)p & 0xF));
-    }
-    p = memchr(ptr, (unsigned char)ctr, len);
-    if (p)
-        ctr += (63 + (size_t)p);
-    cleanse_ctr = (unsigned char)ctr;
+    memset_func(ptr, 0, len);
 }
Also see Issue 455: Reimplement non-asm OPENSSL_cleanse() on OpenSSL's GitHub.
How do I remove the calls to OPENSSL_cleanse?
OK, so this is a different problem. You have to locate all the callers and do something with each. It looks like there are about 185 places you will need to modify:
$ cd openssl
$ grep -IR _cleanse * | wc -l
185
Instead of this:
void OPENSSL_cleanse(void *ptr, size_t len)
{
    return;
    // original OpenSSL code goes here
}
Maybe you can delete the function, and then:
#define OPENSSL_cleanse(x, y)
Then each function call becomes a macro invocation that simply disappears during preprocessing. Be sure to perform a make clean after changing from a function to a macro.
But I would not advise doing so.
The project compiled fine, but the time profiler still shows pretty expensive calls to OPENSSL_cleanse.
My guess here is either (1) you did not perform a make clean after the changes to the OpenSSL library, or (2) you compiled and linked to the wrong version of the OpenSSL library. But I could be wrong on both.
You can see what your executable's runtime dependencies are with otool -L. Make sure it's the expected one. Also keep in mind OpenSSL does not use -install_name.
Before you run your executable, you can set DYLD_LIBRARY_PATH to ensure the dylib you are modifying is loaded. Also see the dyld(1) man pages.
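For example (the program name and library path here are placeholders):
$ otool -L ./myprogram
$ DYLD_LIBRARY_PATH=/Users/ForceBru/Desktop/openssl/lib ./myprogram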

How to define _FILE_OFFSET_BITS & _LARGE_FILES macro for Solaris and HP-AIX

I have a C program as follows.
I don't want to use stat64 instead of stat on Solaris & HP-AIX.
I want to build this program on both Solaris & HP-AIX.
#include "zunx.h"
#include <nls.h>
/*
* NAME: zunx_file_exists
*
* PURPOSE: Checks if a file exists.
*
* INVOCATION: boolean zunx_file_exists(name)
* char *name;
*
* INPUTS: name - file to check
*
* OUTPUTS: TRUE or FALSE
*
* DESCRIPTION: zunx_file_exists does a stat on the specified file,
* and returns TRUE if a stat is found. No check is
* made to determine what type of file it is.
*/
boolean zunx_file_exists
(const char *buf)
{
#if defined(UNIX)
struct stat fstat;
if (buf != NULL && stat(I2E1(buf), &fstat) == 0)
return TRUE;
else
return FALSE;
#endif
#ifdef NT_OS
struct _stat64 fstat;
if (buf != NULL && _stat64((char *) I2E1(buf), &fstat) == 0)
return TRUE;
else
return FALSE;
#endif
}
I came across a macro for Solaris like this:
#ifdef UNIX
#define _FILE_OFFSET_BITS 64
#endif
Is this definition correct for the above program?
For HP-AIX, the _LARGE_FILES macro is used instead.
But I don't know how to define this macro in the above program in order to build successfully on both OSes.
Please suggest some ideas.
Detailed data for accessing large files, including large-file compile, link, and other flags, can be found for Solaris on the lfcompile man page:
lfcompile - large file compilation environment for 32-bit applications

Description

All 64-bit applications can manipulate large files by default. The methods described on this page allow 32-bit applications to manipulate large files.
...
Note that the man page specifically states that defines other than those returned via getconf are necessary:
Set the compile-time flag _FILE_OFFSET_BITS to 64 before including any headers. Applications may combine objects produced in the large file compilation environment with objects produced in the transitional compilation environment, but must be careful with respect to interoperability between those objects. Applications should not declare global variables of types whose sizes change between compilation environments.
along with
Applications wishing to access fseeko() and ftello() as well as the POSIX and X/Open specification-conforming interfaces should define the macro _LARGEFILE_SOURCE to be 1 and set whichever feature test macros are appropriate to obtain the desired environment (see standards(5)).
See the examples section of the man page for details.
The exact names and values of these #define's are implementation dependent. Fortunately, the getconf shell command will tell you what these are when you pass it the LFS_CFLAGS parameter. You can then pass them in on the command line when you compile.
gcc `getconf LFS_CFLAGS` -o program program.c
Linux/Solaris/HP-UX:
    -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE

AIX (effective for AIX >= 5.1):
    -D_LARGE_FILES
See also: https://www.ibm.com/docs/en/aix/7.1?topic=volumes-writing-programs-that-access-large-files
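If you prefer to guard the defines in the source rather than in the makefile, a sketch using the usual compiler-predefined platform macros (__sun, __hpux, _AIX) might look like this; the defines must come before any system header is included:

/* must precede any system header */
#if defined(__sun) || defined(__hpux)
#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE 1
#elif defined(_AIX)
#define _LARGE_FILES 1
#endif

#include <sys/types.h>
#include <sys/stat.h>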

Converting bits from one array to another?

I am building a library for this vacuum fluorescent display. It's a very simple interface and I have all the features working.
The problem I am having now is that I am trying to make the code as compact as possible, but the custom character loading is not intuitive. That is, the bitmap for the font maps to completely different bits and bytes on the display itself. In the IEE VFD datasheet, when you scroll down, you can see that the bits are mapped all over the place.
The code I have so far works like so:
// input: the font bitmap, the bit from that line of the bitmap,
// and the bit it needs to go to
static unsigned char VFD_CONVERT(const unsigned char* font, unsigned char from, unsigned char to) {
    return ((*font >> from) & 0x01) << to;
    //return (*font & (1 << from)) ? (1 << to) : 0;
}

// macros to make it easier to read and see
#define CM_01 font+0, 4
#define CM_02 font+0, 3
#define CM_03 font+0, 2
#define CM_04 font+0, 1
#define CM_05 font+0, 0

// one of the 7 lines I have to send
o = VFD_CONVERT(CM_07,6) | VFD_CONVERT(CM_13,5) | VFD_CONVERT(CM_30,4) | VFD_CONVERT(CM_23,3) | VFD_CONVERT(CM_04,2) | VFD_CONVERT(CM_14,1) | VFD_CONVERT(CM_33,0);
send(o);
This is obviously not all the code; you can see the rest in my Google Code repository, but it should give you some idea of what I am doing.
So the question I have is if there is a better way to optimize this or do the translation?
Changing the return statement in VFD_CONVERT makes GCC go crazy (-O1, -O2, -O3, and -Os all do it) and expands the code to 1400 bytes. If I use the return statement with the inline if, it shrinks to 800 bytes. I have been going through the generated asm, and currently I am tempted to just write it all in asm, as I am starting to think the compiler doesn't know what it is doing. But maybe it's me who doesn't know what I am doing, and I am confusing the compiler.
As a side note, the code there works: both return statements upload the custom character and it gets displayed (with a weird bug where I have to send it twice, but that's a separate issue).
First of all, you should file a bug report against gcc with a minimal example, since -Os should never generate larger code than -O0. Then, I suggest storing the permutation in a table, like this:
const unsigned char perm[][7] = {{ 7, 13, 30, 23, 4, 14, 33 }, ...
with special values indicating a fixed zero or one bit. That'll also make your code more readable.
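A sketch of what that table-driven conversion could look like; the byte*8+bit source numbering and the 0xFF "always zero" sentinel here are illustrative assumptions, not the real IEE mapping:

#include <stdint.h>

/* perm[line][outbit]: index of the source bit in the font bitmap,
 * counted as byte*8 + bit; 0xFF marks a fixed zero bit.
 * Values are illustrative, not the real IEE mapping. */
static const uint8_t perm[7][7] = {
    { 7, 13, 30, 23, 4, 14, 33 },
    /* ...the six remaining output lines... */
};

static uint8_t vfd_line(const uint8_t *font, uint8_t line)
{
    uint8_t out = 0;
    uint8_t b;

    for (b = 0; b < 7; b++) {
        uint8_t src = perm[line][b];
        if (src != 0xFF && ((font[src >> 3] >> (src & 7)) & 1))
            out |= 1 << (6 - b);    /* MSB-first, as in the original */
    }
    return out;
}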

Arduino ports registers in Eclipse are not working

I am building an Arduino application in C with Eclipse and the Arduino plugin. In my old code I used pinMode and digitalWrite, but as you all know, those use more space, so I am now rebuilding my code with port manipulation. If you don't know what that is, you can read about it here: http://www.arduino.cc/en/Reference/PortManipulation
I will explain what I did.
Where pinMode was used, I changed it to something like this: DDRD = 0b11111111;
And where digitalWrite was used, I changed it to PORTD = 0b10000000;
You can see it in my code below.
Eclipse now gives me a "symbol not resolved" error for DDRD and PORTD (highlighting both words with a red line), but the program builds and runs normally. How do I solve this?
#include <avr/io.h>
#include <util/delay.h>

int main()
{
    UCSR0B = 0;            // disconnect pins 0 and 1 from USART (Serial)
    DDRD = 0b11111111;     // all pins of port D as output

    for (;;)
    {
        PORTD = 0b10000000;  // Pin 7 on
        _delay_ms(500);      // Wait
    }
}
These are multi-level macros which encode direct volatile access to the SFR locations.
They are individually defined in one of an assortment of chip-specific header files which avr/io.h will include when it is informed of the specific CPU variant you are using.
Normally this is done with the -mmcu flag to avr-gcc, for example
-mmcu=atmega328p
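You can check which macro that flag defines (and therefore what the Eclipse indexer is missing) by asking the preprocessor; for example:
$ avr-gcc -mmcu=atmega328p -x c -E -dM /dev/null | grep __AVR_ATmega
#define __AVR_ATmega328P__ 1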
However, if the Eclipse plugin does its own pass through the project sources to try to give you advisory errors, it may not be smart enough to turn that flag into a define (you can get Eclipse claiming errors even when gcc is happy). To work around that, you may need to explicitly define the CPU type above the include in your code, or in some configuration for Eclipse. For example:
#ifndef __AVR_ATmega328P__
#define __AVR_ATmega328P__
#endif
#include <avr/io.h>
Note that this may cause problems if you later change the processor type! And even as is, it's a little iffy as there are two potential names for each processor.
I have faced the same problem. This is what solved it for me in Eclipse:
From the Project Explorer, right-click the active project > Index > Rebuild.