How can I place a breakpoint with a logical condition in TRACE32?

I want to set a breakpoint which stops my application when two variables contain certain values, e.g. stop execution when both x==10 and y==11.
How can I achieve that in Lauterbach TRACE32?

The command Var.Break.Set has an option /VarCONDition, which allows you to specify a condition under which the CPU stays stopped once it hits the corresponding breakpoint. (In the dialog for setting breakpoints you'll also find the field "Condition" for that, when you click on "advanced".)
So for your scenario the required two commands would be:
Var.Break.Set x /Write /VarCONDition (x==10 && y==11)
Var.Break.Set y /Write /VarCONDition (x==10 && y==11)
As a result the CPU is stopped on every write to x or y, but gets immediately restarted when the condition "x==10 && y==11" is not met.
Of course x and y must be located in memory. It won't work if a variable is implemented in a CPU register (unless you have one of those rarer CPUs that support read/write breakpoints on core registers).
In case you are using a Cortex-A or Cortex-R CPU, you must also add the option /AfterStep, since these processors have break-before-make behavior also for address write breakpoints.
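For example, on such a Cortex-A/R target the two commands from above would become (same scenario, just with the extra option):
Var.Break.Set x /Write /AfterStep /VarCONDition (x==10 && y==11)
Var.Break.Set y /Write /AfterStep /VarCONDition (x==10 && y==11)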
If your CPU supports data value breakpoints (e.g. Cortex-M4), you can also set the breakpoints like this:
Var.Break.Set x /Write /DATA 10. /VarCONDition y==11
Var.Break.Set y /Write /DATA 11. /VarCONDition x==10
This is much better, since the CPU only stops when the correct value is written to x or y, and it stays stopped only if the other variable also has the correct value.


Second real written to stdout with the P descriptor is wrong by a factor of 10

Here's a minimal working example:
program test_stuff
  implicit none
  real :: b
  b = 10000.0
  write(*,'(A10,1PE12.4,F12.4)') "b, b: ", b, b
end program
which I simply compile with gfortran test_stuff.f90 -o test_stuff
However, running the program gives the following output:
$ ./test_stuff
b, b: 1.0000E+04 100000.0000
The second real written to the screen is wrong by a factor of 10.
This happens with gfortran 9.3.0 as well as 10.2.0, so I definitely must be doing something wrong, but I can't see what it is. Can anybody spot what I'm doing wrong?
The P control edit descriptor "temporarily changes" (Fortran 2018 13.8.5) the scale factor connection mode of the connection.
However, what is meant by "temporarily" is until the mode is changed again or until the end of the data transfer statement (Fortran 2018 12.5.2):
Edit descriptors take effect when they are encountered in format processing. When a data transfer statement terminates, the values for the modes are reset to the values in effect immediately before the data transfer statement was executed.
In the case of the question, both output values are thus processed with the scale factor having value 1.
This scale factor is responsible for the "wrong" second value: there is a difference in interpretation of the scale factor for E and F editing. For E editing the scale factor simply changes the representation, with the external and internal values the same (with the significand scaled up by 10 and the exponent reduced by 1), but for F editing the output value is itself scaled:
On output, with F output editing, the effect is that the externally represented number equals the internally represented number multiplied by 10^k
So while 10000 would be represented by 0.1000E+05 with scale factor 0 and 1.0000E+04 with scale factor 1 under E12.4, under F12.4 the value 10000 is scaled to 100000 with the scale factor in place.
As a style note: although the comma is optional between 1P and E12.4 (and similar), many would regard it much better to include the comma, precisely to avoid this apparent tight coupling of the two descriptors (or looking like one descriptor). As the scale factor has a different effect for each of E and F, has no effect for EN and sometimes but not always has an effect with G, I'm not going to argue with anyone who calls P evil.
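If an unscaled F field was the intent, one fix (a small sketch based on the question's example) is to reset the scale factor with 0P before the F descriptor, commas included per the note above:
write(*,'(A10,1P,E12.4,0P,F12.4)') "b, b: ", b, b
This prints 1.0000E+04 in the E field and 10000.0000 in the F field.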
You are looking for section 12.5.2 of the Fortran 2018 standard.
A connection for formatted input/output has several changeable modes: these are ... and scale factor (13.8.5).
Values for the modes of a connection are established when the connection is initiated. If the connection is initiated by an OPEN statement, the values are as specified, either explicitly or implicitly, by the OPEN statement. If the connection is initiated other than by an OPEN statement (that is, if the file is an internal file or pre-connected file) the values established are those that would be implied by an initial OPEN statement without the corresponding keywords.
The scale factor cannot be explicitly specified in an OPEN statement; it is implicitly 0.
The modes of a connection can be temporarily changed by ... or by an edit descriptor. ... Edit descriptors take effect when they are encountered in format processing. When a data transfer statement terminates, the values for the modes are reset to the values in effect immediately before the data transfer statement was executed.
So when you used 1P in your format, you changed the mode for the connection. This applies to all output items after the 1P has been processed. When the write statement completes the scale factor is reset to 0.
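Equivalently, because the reset happens when the data transfer statement terminates, splitting the output into two write statements also avoids the scaled F value (a small sketch):
write(*,'(A10,1PE12.4)') "b: ", b  ! scale factor 1 applies within this statement
write(*,'(A10,F12.4)') "b: ", b    ! new statement: scale factor is back to 0, prints 10000.0000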

Changing thread number doesn't affect code

I am trying to learn Xeon Phi and, while studying the Intel Xeon Phi Coprocessor HPC book, I tried to run the code here (from the book).
The code uses OpenMP and 2 threads.
But the results I am getting are the same as running with 1 thread
(no use of OpenMP at all).
I even used different combinations on the MIC, but still the same:
export OMP_NUM_THREADS=2
export MIC_OMP_NUM_THREADS=124
export MIC_ENV_PREFIX=MIC
It seems that somehow OpenMP is not enabled? Am I missing something here?
The code using only 1 thread is here
I compiled using:
icc -mmic -openmp -qopt-report -O3 hello.c
Thanks!
I am not sure exactly which book you are talking about, but perhaps this will help.
The code you show does not use the offload programming style and must be run natively on the coprocessor, meaning you copy the executable to the coprocessor and run it there, or you use the micnativeloadex utility to run the code from the host processor. You show that you know the code must be run natively because you compiled it with the -mmic option.
If you use micnativeloadex, then the number of omp threads on the coprocessor is set by executing "export MIC_OMP_NUM_THREADS=124" on the host. If you copy the executable to the coprocessor and then log in to run it there, the number of omp threads on the coprocessor is set by executing "export OMP_NUM_THREADS=124" on the coprocessor. If you use "export OMP_NUM_THREADS=2" on the coprocessor, you get only two threads; the MIC_OMP_NUM_THREADS environment variable is not used if you set it directly on the coprocessor.
I don't see any place in the code where it prints out the number of threads, so I don't know for sure how you determined the number of threads actually being used. I suspect you were using a tool like micsmc. However, micsmc tells you how many cores are in use, not how many threads are in use.
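If you want to check the thread count from the code itself, a minimal sketch (plain OpenMP, nothing coprocessor-specific) is to print the team size from inside a parallel region:
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        // only one thread of the team reports the team size
        #pragma omp single
        printf("running with %d OpenMP threads\n", omp_get_num_threads());
    }
    return 0;
}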
By default, the omp threads are laid out in order, so that the first core would run threads 0,1,2,3, the second core would run threads 4,5,6,7 and so on. If you are using only two threads, both threads would run on the first core.
So, is that what you are seeing - not that you are using only one thread but instead that you are using only one core?
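If you want to spread a small number of threads across cores instead (so that more than one core shows up as busy), the Intel OpenMP runtime's affinity control can do that; set it the same way as OMP_NUM_THREADS above, i.e. on the coprocessor itself:
export KMP_AFFINITY=scatter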
I was looking at the serial version of the code you are using. For the following lines:
for(j=0; j<MAXFLOPS_ITERS; j++)
{
    //
    // scale 1st array and add in the 2nd array
    // example usage - y = mx + b;
    //
    for(k=0; k<LOOP_COUNT; k++)
    {
        fa[k] = a * fa[k] + fb[k];
    }
}
I see that here you do not scan the complete array. Instead you keep on updating the first 128 (LOOP_COUNT) elements of the array fa. If you wish to compare this serial version to the parallel code you are referring to, then you will have to ensure that the program does the same amount of work in both versions.
Thanks
I noticed three things in your first OpenMP program:
1. The total floating point operations should reflect the number of threads doing the work. Therefore:
gflops = (double)( 1.0e-9 * LOOP_COUNT * MAXFLOPS_ITERS * FLOPSPERCALC * numthreads );
2. You hard-coded the number of threads to 2. If you want to use the OMP environment variable, you should comment out the API call "omp_set_num_threads(2);".
3. After transferring the binary to the coprocessor, to set the OMP environment variable on the coprocessor please use OMP_NUM_THREADS, not MIC_OMP_NUM_THREADS. For example, if you want 64 threads to run your program on the coprocessor:
% ssh mic0
% export OMP_NUM_THREADS=64
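and then run the binary you copied over (the name below is just a placeholder):
% ./your_program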

IRQ 8 isn't working... HW or SW?

First, I program for vintage computer groups. What I write is specifically for MS-DOS and not Windows, because that's what people are running. My current program is for later systems and not the 8086 line, so the plan was to use IRQ 8. This allows me to set the interrupt rate in binary values from 2/second to 8192/second (2, 4, 8, 16, etc.).
Only, for some reason, on the newer old systems (OK, that sounds weird) it doesn't seem to be working. In emulation, and on the 386 system I have access to, it works just fine, but on the P3 system I have (GA-6BXC MB w/ P3 800 CPU) it just doesn't work.
The code
Setting up the interrupt:
disable();
oldrtc = getvect(0x70);     //Read the current vector for IRQ 8
setvect(0x70,countdown);    //Point the IRQ 8 vector at our handler
outportb(0x70,0x8a);        //Select RTC register A (with NMI disabled)
y = inportb(0x71) & 0xf0;   //Keep the divider bits, clear the rate bits
outportb(0x70,0x8a);
outportb(0x71,y | _MRATE_); //Adjustable value, set for 64 interrupts per second
outportb(0x70,0x8b);        //Select RTC register B (with NMI disabled)
y = inportb(0x71);
outportb(0x70,0x8b);
outportb(0x71,y | 0x40);    //Set bit 6 to enable the periodic interrupt
enable();
At the end of the interrupt:
outportb(0x70,0x0c);        //Select RTC register C
inportb(0x71);              //Reading register C resets the interrupt
outportb(0xa0,0x20);        //EOI to the slave PIC (turns interrupts back on)
outportb(0x20,0x20);        //EOI to the master PIC; there are 2 PICs on AT machines and later
When closing the program down:
disable();
outportb(0x70,0x8b);        //Select RTC register B
y = inportb(0x71);
outportb(0x70,0x8b);
outportb(0x71,y & 0xbf);    //Clear bit 6 to disable the periodic interrupt
setvect(0x70,oldrtc);       //Restore the original IRQ 8 vector
enable();
I don't see anything in the code that could be causing the problem, but it just doesn't seem to make sense. While I don't completely trust the information, MSD does report IRQ 8 as the RTC counter and says it is present and working just fine. Is it possible that later systems have moved the vector? Everything I find says that IRQ 8 is vector 0x70, but the interrupt never triggers on my Pentium III system. Is there some way to find out if the vectors have been changed?
It's been a LONG time since I've done any MS-DOS code and I don't think I ever worked with this particular interrupt (I'm pretty sure you can just read the memory location to fetch the time too, and IRQ 0 can be used to trigger you at an interval as well, so maybe that's better). Anyway, given my rustiness, forgive me for kinda link dumping.
http://wiki.osdev.org/Real_Time_Clock - the bottom of that page has someone saying they've had problems on some machines too. RBIL suggests it might be a BIOS thing: http://www.ctyme.com/intr/rb-7797.htm
Without DOS, I'd just capture IRQ 0 itself and remap all of them to my own interrupt numbers and change the timing as needed. I've done that somewhat recently! I think that's a bad idea on DOS though; this looks more recommended for that: http://www.ctyme.com/intr/rb-2443.htm
Anyway though, I betcha it has to do with the BIOS thing:
"Notes: Many BIOSes turn off the periodic interrupt in the INT 70h handler unless in an event wait (see INT 15/AH=83h,INT 15/AH=86h).. May be masked by setting bit 0 on I/O port A1h "

Efficient variable watching in C/C++

I'm currently writing a multi-threaded, highly efficient and scalable algorithm. Because I have to guess a parameter for the code and I'm not sure how the calculation performs on a specific data set, I would like to watch a variable. The test only works with a real-world, huge data set. It is possible to analyze the collected data after profiling. Imagine the following simple code example (real code can contain multiple watch points):
// function gets called by loops of multiple threads
void payload(data_t* data, double threshold) {
    double value = calc(data);
    // here I want to watch the value
    if (value < threshold) {
        doSomething(data);
    } else {
        doSomethingElse(data);
    }
}
I thought about the following approaches:
1. Using cout or other system outputs
2. Using a binary output (file, network)
3. Setting a breakpoint via gdb/lldb
4. Using variable watching + logging via gdb/lldb
I'm not happy with the results because: to use 1. and 2. I have to change the code, but this is a debugging/evaluating task. Furthermore, 1. requires locking and 1.+2. require I/O operations, which heavily slow down the entire code and make testing with real data nearly impossible. 3. is also too slow. To use 4., I have to know the variable address because it's not a global variable; but because threads get created by a dynamic scheduler, this would require breaking + stepping for each thread.
So my conclusion is that I need a profiler/debugger that works at machine code level and dumps/logs/watches the variable without double->string conversion and is highly efficient. Or, to sum up in other words: I would like to profile the internal state of my algorithm without heavy slow-down and without deep modification. Does anybody know a tool that is able to do this?
OK, this took some time but now I'm able to present a solution for my problem. It's called tracepoints. Instead of breaking the program every time, this is more lightweight and (ideally) doesn't change performance/timing too much. It does not require code changes. Here is an explanation of how to use them with gdb:
Make sure you compiled your program with debugging symbols (using the -g flag). Now, start the gdb server and provide a network port (e.g. 10000) and the program arguments:
gdbserver :10000 ./program --parameters you --want --to use
Now, switch to a second console and start gdb (program parameters are not required here):
gdb ./program
All following commands are entered in the gdb command line interface. So let's connect to the server:
target remote :10000
After you got the connection confirmation, use trace or ftrace to set a tracepoint to a specific source location (try ftrace first, it should be faster but doesn't work on all platforms):
trace source.c:127
This should create tracepoint #1. Now you can set up an action for this tracepoint. Here I want to collect the data from myVariable:
action 1
collect myVariable
end
If you expect much data or want to use the data later (after a restart), you can set a binary trace file:
tsave trace.bin
Now, start tracing and run the program:
tstart
continue
You can wait for program exit or interrupt your program using CTRL-C (still on gdb console, not on server side). Continue by telling gdb that you want to stop tracing:
tstop
Now we come to the tricky part, and I'm not really happy with the following code because it's really slow:
set pagination off
set logging file trace.txt
tfind start
while ($trace_frame != -1)
  set logging on
  printf "%f\n", myVariable
  set logging off
  tfind
end
This dumps all variable data to a text file. You can add some filtering or preparation here. Now you're done and you can exit gdb. This will also shut down the server:
quit
For detailed documentation, especially the explanation of filtering and more advanced tracepoint positions, see the following document: http://sourceware.org/gdb/onlinedocs/gdb/Tracepoints.html
To isolate trace file writing from your program execution, you can use cgroups or another network-connected computer. When using another computer, you have to add the host to the port information (e.g. 192.168.1.37:10000). To load a binary trace file later, just start gdb as shown above (forget the server) and change the target:
gdb ./program
target tfile trace.bin
You can set a hardware watchpoint using the gdb debugger. For example, if you have a
bool b;
variable and you want to be notified every time its value has changed (by any thread),
you would declare a watchpoint like this:
(gdb) watch *(bool*)0x7fffffffe344
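If you don't know the address, you can query it first while the program is stopped and the variable is in scope (the address here is just an illustration):
(gdb) print &b
$1 = (bool *) 0x7fffffffe344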
example:
root#comp:~# gdb prog
GNU gdb (GDB) 7.5-ubuntu
Copyright ...
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses...done.
(gdb) watch *(bool*)0x7fffffffe344
Hardware watchpoint 1: *(bool*)0x7fffffffe344
(gdb) start
Temporary breakpoint 2 at 0x40079f: file main.cpp, line 26.
Starting program: /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = true
New value = false
main () at main.cpp:50
50 if (strcmp(mask, "255.0.0.0") != 0) {
(gdb) c
Continuing.
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = false
New value = true
main () at main.cpp:41
41 if (ifa ->ifa_addr->sa_family == AF_INET) { // check it is IP4
(gdb) c
Continuing.
mask:255.255.255.0
eth0 IP Address 192.168.1.5
[Inferior 1 (process 18146) exited normally]
(gdb) q

What can be adjusted in this simple code to make the signal change in the FSM?

Well, I have process a in my main component and process b in another subcomponent (instantiated in the main one).
Both processes a and b have only the clock in their sensitivity lists.
Process a controls an enable signal (called ready): if it is 1, process b can work; if it is 0, process b will do nothing.
The problem is in process a: when process a changes the value of the enable signal to 0, the change does not take effect until the next clock cycle, so process b ends up running for an extra clock cycle.
a:process(clk)
begin
  if(rising_edge(clk)) then
    if(output/=old_output) then
      enable<='0';
    end if;
  end if;
end process;

b:process(clk)
begin
  if(rising_edge(clk)) then
    if(enable='1') then
      --do anything
    end if;
  end if;
end process;
The reason is that the value is latched/sampled at the exact rising edge of the clock. At that time, enable is still equal to one. In that simulation delta, enable will get the value zero, but it won't be available until AFTER the first delta.
This is also true for when enable BECOMES one (given that it is also generated on a rising clock edge): the process will latch the value exactly when the clock rises, and in the simulator enable will look high for a whole clock period even though "--do anything" will not happen.
You can think of this as a real electrical circuit instead of a programming language. Consider that the evaluation of "output/=old_output" consumes time, and that you as a designer want that to be DONE before the next rising clock edge.
Hope this helps, but this is how the language works. I could give you a better answer if I could see both the setting and the resetting of the enable signal.
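For instance, one common fix, sketched here under the assumption that enable may be derived combinationally from the comparison, is to move enable out of the clocked process so that process b samples the updated value at the same edge:
-- sketch: drive enable combinationally instead of registering it
enable <= '0' when output /= old_output else '1';

b:process(clk)
begin
  if(rising_edge(clk)) then
    if(enable='1') then
      --do anything
    end if;
  end if;
end process;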