Count consecutive basic blocks with BBL_NumIns < 7 - c++

I'm new to Pin and I want to count the number of consecutive basic blocks with BBL_NumIns < 7 and with a specific tail instruction such as an indirect jump, an indirect call, or a ret.
So I wrote this code:
static UINT32 consecutiveBasicBlockscount = 0;
//------------------------------------------------------------------------------------------
// This function is called before every block
VOID docount()
{
    OutFile << "Inc Consecutive Basic Block Counter From " << consecutiveBasicBlockscount
            << "\tto " << consecutiveBasicBlockscount + 1 << endl;
    OutFile << "----------------------------------------------------------------------------------------" << endl;
    consecutiveBasicBlockscount += 1;
}
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
{
    INS insTail = BBL_InsTail(bbl);
    if (INS_IsIndirectBranchOrCall(insTail))
    {
        if ((!INS_IsCall(insTail) && !INS_HasFallThrough(insTail) && !INS_IsHalt(insTail) && !INS_IsRet(insTail))
            || (INS_IsCall(insTail) && !INS_HasFallThrough(insTail) && !INS_IsHalt(insTail) && !INS_IsRet(insTail))
            || INS_IsRet(insTail))
        {
            if (BBL_NumIns(bbl) < 7)
            {
                OutFile << "*****" << hex << BBL_Address(bbl) << "*****" << endl;
                for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
                {
                    OutFile << INS_Disassemble(ins) << endl;
                }
                OutFile << "********************************" << endl;
                BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
            }
        }
    }
}
The output file:
----------------------------------------------------------------------------------------
Inc Consecutive BasicBlock Counter From 0 to 1
----------------------------------------------------------------------------------------
*****b6709ba0*****
mov eax, 0xc9
call dword ptr gs:[0x10]
********************************
Inc Consecutive BasicBlock Counter From 1 to 2
----------------------------------------------------------------------------------------
Inc Consecutive BasicBlock Counter From 2 to 3
----------------------------------------------------------------------------------------
Inc Consecutive BasicBlock Counter From 3 to 4
----------------------------------------------------------------------------------------
*****b6709bac*****
ret
********************************
Inc Consecutive BasicBlock Counter From 4 to 5
----------------------------------------------------------------------------------------
I tested this pintool against Firefox.
Why does Pin not show a basic block when the counter is 0, 2, 3?

Unless I completely misunderstood your question, you want to find all instances during a concrete execution of a binary where multiple basic blocks with an indirect call/jump or ret instruction as the final instruction (tail) are executed one after another.
When starting to write PIN tools, the difference between analysis code and instrumentation code can be quite confusing. Even though PIN is a dynamic binary instrumentation framework, the code you write exists in either a static context or a dynamic context. Instrumentation code (registered via the TRACE_AddInstrumentFunction hook, for example) executes in a static context, which means it is not executed every time a basic block is encountered, but only when a new basic block needs to be instrumented. Analysis code (registered via the BBL_InsertCall hook, for example) exists in a dynamic context, which means it will be executed every time the basic block is executed. In fact, the binary under analysis is recompiled in memory (into a code cache) together with the analysis code from the PIN tool.
If I've understood your question correctly, your code mixes these contexts in a way that causes the output for 0-1 and 3-4 to be an accident more than anything else. I've written a simple PIN tool to list all the chains of indirect basic blocks of length 2 or more; modifying it to print the asm instructions should be easy for you with a little more research into the BBL_InsertCall documentation.
main
This code instructs PIN to call the function instrument_trace every time a basic block that has not already been instrumented is discovered. For simplicity I also declare a couple of global variables to simplify the structure.
#include "pin.H"
#include <iostream>
#include <vector>

VOID instrument_trace(TRACE trace, VOID* vptr);  // defined below

std::vector<ADDRINT>* consecutive_indirect_bbls = new std::vector<ADDRINT>();
std::ostream& Output = std::cout;

int main(int argc, char *argv[]) {
    if (PIN_Init(argc, argv) == 0) {
        TRACE_AddInstrumentFunction(instrument_trace, NULL);
        PIN_StartProgram();
    }
    return 0;
}
instrument_trace
This is the code that is executed every time a basic block that hasn't already been instrumented is discovered by PIN. It is basically the same code that you provided in your question with a few important changes. The instrumentation code is only meant to set up the analysis code so that the execution flow of the binary under analysis can be monitored. The binary under analysis is actually not executed when this code is executed, but could be seen as "paused".
In order to print the call chains we're interested in, we also need to insert an analysis call into those basic blocks that would immediately follow such a call chain, since we have no other method of displaying the call chain or of knowing whether there was a break in the chain. This logic should be quite obvious once you've played around with it a little bit.
VOID instrument_trace(TRACE trace, VOID* vptr) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        INS tail = BBL_InsTail(bbl);
        if ((INS_IsIndirectBranchOrCall(tail) || INS_IsRet(tail))
                && BBL_NumIns(bbl) < 7) {
            BBL_InsertCall(bbl, IPOINT_BEFORE,
                           (AFUNPTR) analysis_indirect_bbl,
                           IARG_ADDRINT, BBL_Address(bbl),
                           IARG_END);
        } else {
            BBL_InsertCall(bbl, IPOINT_BEFORE,
                           (AFUNPTR) analysis_print_vector,
                           IARG_END);
        }
    }
}
analysis_indirect_bbl
This function is called every time a basic block that ends in an indirect call/jump or ret instruction is executed in the binary that we are monitoring. Whenever this happens we push the starting address of that basic block to the global vector we use to keep track of these chains.
VOID analysis_indirect_bbl(ADDRINT address) {
    consecutive_indirect_bbls->push_back(address);
}
analysis_print_vector
This is just a function to print the call chains we are interested in to Output (std::cout in this example).
VOID analysis_print_vector() {
    if (consecutive_indirect_bbls->size() >= 2) {
        for (unsigned int i = 0;
                i < consecutive_indirect_bbls->size();
                ++i) {
            Output << "0x" << std::hex
                   << consecutive_indirect_bbls->at(i) << " -> ";
        }
        Output << "END" << std::endl;
        consecutive_indirect_bbls->clear();
    } else if (!consecutive_indirect_bbls->empty()) {
        consecutive_indirect_bbls->clear();
    }
}
When testing the PIN tool I would strongly advise against running a program such as Firefox, since it will be impossible to test changes against the exact same execution flow. I usually test against gzip myself, since it is really easy to control the length of the execution.
$ lorem -w 500000 > sample.data
$ cp sample.data sample_exec-001.data
$ pin -injection child -t obj-ia32/cbbl.so -- /bin/gzip -9 sample_exec-001.data
0xb775c7a8 -> 0xb774e5ab -> 0xb7745140 -> END
0xb775c7a8 -> 0xb774e5ab -> 0xb7745140 -> END
0xb775c7a8 -> 0xb774e5ab -> 0xb7745140 -> END
0xb775b9ca -> 0xb7758d7f -> 0xb77474e2 -> END
0xb5eac46b -> 0xb5eb2127 -> 0xb5eb2213 -> END
0xb5eac46b -> 0xb5eb3340 -> 0xb5eb499e -> END
0xb5eac46b -> 0xb5eb3340 -> 0xb5eb499e -> END
...
0xb5eac46b -> 0xb5eb3340 -> 0xb5eb499e -> END

Related

Why doesn't debug execution order match code order in C++?

I am new to C++. When I debugged in CLion, I found that the execution order using Step Over (F8) doesn't match the real code order. So far, I think the most likely reason is compiler optimization, but I have no recollection of enabling it, and I am using LLDB to debug.
It seems related to this answer:
Why don't my breakpoints run in order? GCC and C
Here is my code:
int depthSum(vector<NestedInteger> &nestedList) {
    deque<pair<int, NestedInteger>> q;
    int level = 1;
    for (NestedInteger n : nestedList) {
        q.push_back({ level, n });
    }
    int res = 0;
    while (!q.empty()) {
        auto pair = q.front();
        q.pop_front();
        if (pair.second.isInteger()) {
            level--;
            res = (level * (pair.second.getInteger()));
            cout << res << endl;
        } else {
            level++;
            for (NestedInteger nestedInteger : pair.second.getList()) {
                q.push_front({ level, nestedInteger });
            }
        }
    }
    return res;
}
When the breakpoint stops at res = (level * (pair.second.getInteger())); and I press F8, it jumps back to level--; pressing F8 again jumps to res = (level * (pair.second.getInteger()));, and pressing F8 once more jumps to cout << res << endl;.
marked order:
Is this really because of code reordering? By the way, the code's result is logically correct even though the debugging order doesn't match expectations, but I haven't found any optimization option enabled in my project.
Added the CMake profile.
I tried to avoid optimization, but the problem still exists.
---------------------------- updated
This is working, but I have no clue why -O0 is not working. Logically speaking, -O0 should work because it avoids all optimization:
-DCMAKE_C_FLAGS_DEBUG:STRING="-g -O1" -DCMAKE_CXX_FLAGS_DEBUG:STRING="-g -O1"

Severe & Bizarre performance issue

Update
Ok, I removed the 3 couts and replaced them with *buffer = 'a', and there was a big performance difference. Removing that line made the program 2x as fast. If you go on godbolt and compile it using MSVC, that single line of code changes most of the program. (It adds a whole lot more complexity.)
The following might seem extremely weird, but it's true on my computer:
Alright, so I was doing some benchmarking of some code, and I noticed extremely weird performance anomalies that were 100% consistent. I'm running Windows 10 and Visual Studio 2019. Basically, deleting a line of code that is never called completely changes the performance of the program.
Here is exactly what to do:
Create new VS-2019 Console C++ App project
Set the configuration to Release & x64
Paste the code below:
#include <iostream>
#include <chrono>
#include <cstdlib>  // malloc

class Test {
public:
    size_t length;
    size_t doublingVal;
    char* buffer;

    Test() : length(0), doublingVal(2) {
        buffer = static_cast<char*>(malloc(1));
    }

    ~Test() {
        std::cout << "called" << "\n";
        std::cout << "called" << "\n";
        std::cout << "called" << "\n"; // Remove this line and the time decreases DRASTICALLY (ie remove line 14)
    }

    void append() {
        if (doublingVal == length) {
            doublingVal <<= 1;
        }
        *buffer = 'a';
        ++length;
    }
};

int main()
{
    Test test;
    auto start = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < static_cast<size_t>(1024) * 1024 * 1024 * 4; ++i) {
        test.append();
    }
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - start).count() << "\n";
}
Run the program using CTRL+F5, not in debug mode. Remember how long it takes to run (a few seconds).
Then, in the destructor of Test, remove the third line, which has the comment.
Run the program again, and you should see that the performance increases drastically. I tested this exact same code with 4 different brand-new projects, and on 3 different computers.
The destructor is called at the very end, after the entire program has finished measuring time. The extra cout shouldn't affect anything.
Edit:
You can also see a similar thing if you remove the 3 couts and replace them with a single *buffer = 'a'. Then CTRL+F5 once again, record the time, and then remove the line we just added. Run it again and the time magically decreases by half.
WTF is going on, and how do you solve the weird performance difference?

Does GDB in RISC-V support Program Context aware breakpoint?

I would like to know whether the existing GDB for RISC-V supports program-context-aware breakpoints.
By program-context-aware breakpoints I mean: the PC changes when there is a JAL or JALR instruction, i.e. when there is a function call. In a function call: PC = PC + (current program counter + 4); in a function return: PC = PC - (return address (ra register value)).
I have installed Fedora (RISC-V) on my Ubuntu virtual machine. Since it is a virtual machine, I can't print the PC register value, which is why I couldn't check whether it supports program-context-aware breakpoints.
My second question is: how can I print the PC register value on my QEMU RISC-V virtual machine?
#include <stdio.h>

int check_prime(int a)
{
    int c;
    for (c = 2; c < a; c++)
    {
        if (a % c == 0) return 0;
        if (c == a - 1) return 1;
    }
}

void oddn(int a)
{
    printf("oddn --> %d is an odd number \n", a);
    if (check_prime(a)) printf("oddn --> %d is a prime number\n", a);
}

int main()
{
    int a;
    a = 7;
    if (check_prime(a)) printf("%d is a prime number \n", a);
    if (a % 2 == 1) oddn(a);
}
This is the program I am trying to set a breakpoint in using GDB.
As you can see in the picture, it breaks twice (it should break only once).
It also gives an error:
Error in testing breakpoint condition:
Invalid data type for function to be called
What you're looking for is documented here:
https://sourceware.org/gdb/current/onlinedocs/gdb/Convenience-Funs.html#index-_0024_005fstreq_002c-convenience-function
You should look at $_caller_is, $_caller_matches, $_any_caller_is, and $_any_caller_matches.
As an example, to check if the immediate caller is a particular function we could do this:
break functionD if ($_caller_is ("functionC"))
Then main -> functionD will not trigger the breakpoint, while main -> functionC -> functionD will trigger the breakpoint.
The convenience functions I listed all take a frame offset that can be used to specify which frame GDB will check (for $_caller_is and $_caller_matches) or to limit the range of frames checked (for $_any_caller_is and $_any_caller_matches).

Verifying halting-problem on self-implemented pseudo-assembly

I wrote a very simple implementation of something similar to assembly/machine code.
It is even capable of recursion, as in this example:
9 6
IFEQ R0,0
RET 1
ENDIF
MOV R1,R0
SUB R1,1
CALL R1
MOV R2,R9
MUL R2,R0
RET R2
Output: 720 (factorial of 6)
Description:
9 = Program Lines
6 = Program Input. Will be set to registry R0 value at class construction
CALL = calls the program again with the passed value (recursion)
RET = returns from the program with the specified value. Sets registry R9 to the output value.
R0 to R9 -> general purpose registries.
R0 - program input value
R9 - program output value
-edit: Program commands:
MOV, ADD, SUB, MUL, DIV, MOD, IFEQ, IFNEQ, IFG, IFGE, IFL, IFLE, ENDIF, CALL, RET
However, the program can enter an infinite loop/recursion, e.g.:
2 0
CALL 10
RET 1 //will never be reached
How do I verify whether MY program will enter an infinite loop/recursion?
Here's my implementation; I don't know whether it's necessary, but just in case you need it. (It's the whole code... hope you don't mind.)
#include <iostream>
#include <map>
#include <string> //std::getline
#include <sstream>
#include <vector>

namespace util
{
    template<typename I> I readcin(I& input) {
        std::cin >> input;
        std::cin.clear(); std::cin.ignore();
        return input;
    }
    template<typename I, typename...O> I readcin(I& input, O&... others) {
        readcin(input);
        return readcin(others...);
    }
}

//operations
enum OP
{
    MOV, ADD, SUB, MUL, DIV, MOD,
    IFG, IFL,
    IFEQ, IFGE, IFLE,
    IFNEQ,
    CALL,
    RET,
    ENDIF,
};

std::map<std::string, OP> OPTABLE
{
    {"MOV", MOV}, {"ADD", ADD}, {"SUB", SUB}, {"MUL", MUL}, {"DIV", DIV}, {"MOD", MOD},
    {"RET", RET},
    {"IFG", IFG}, {"IFL", IFL},
    {"IFNEQ", IFNEQ}, {"IFEQ", IFEQ}, {"IFGE", IFGE}, {"IFLE", IFLE},
    {"CALL", CALL},
    {"ENDIF", ENDIF}
};

//registry index
enum RI
{
    R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, RI_MAX
};

std::map<std::string, RI> RITABLE =
{
    {"R0", R0}, {"R1", R1}, {"R2", R2}, {"R3", R3}, {"R4", R4}, {"R5", R5},
    {"R6", R6}, {"R7", R7}, {"R8", R8}, {"R9", R9}
};

struct Instruction
{
    OP op;
    RI r1;
    int r2value;

    Instruction() = delete;
    Instruction(OP operation, RI firstRegister, int _2ndRegValue = -1)
    {
        op = operation;
        r1 = firstRegister;
        r2value = _2ndRegValue;
    }
};

class Assembly
{
private:
    int REG[RI::RI_MAX] {0};
    int GetRegistryValue(RI ri) const { return REG[ri]; }
    void SetRegistryValue(RI ri, int val) { REG[ri] = val; }

    enum CMP_FLAG { CMP_FAIL, CMP_OK };
    CMP_FLAG flag { CMP_OK };
    CMP_FLAG GetFlag() const { return flag; }
    void SetFlag(bool setFlag) { flag = static_cast<CMP_FLAG>(setFlag); }

    std::vector<std::string> programLines;

    OP ExtractOP(const std::string& line);
    RI ExtractRI(const std::string& line, OP op);
    int Extract2ndRIval(const std::string& line, OP op);

public:
    void addCommand(const std::string& line) { programLines.push_back(line); }
    void Execute();

    Assembly() = delete;
    Assembly(int inputValue) { REG[R0] = inputValue; }
    int ReturnValue() const { return REG[R9]; }

private:
    //recursion only
    Assembly(int inputValue, const std::vector<std::string>& progLines) {
        REG[R0] = inputValue;
        programLines = progLines;
        this->Execute();
    }
};

OP Assembly::ExtractOP(const std::string& line)
{
    std::istringstream issline{ line };
    std::string operation;
    issline >> operation;
    return OPTABLE[operation];
}

RI Assembly::ExtractRI(const std::string& line, OP op)
{
    auto space = line.find(' ');
    if (op <= IFNEQ) {
        auto comma = line.find(',');
        return RITABLE[std::string(line.begin() + space + 1, line.begin() + comma)];
    }
    return RI_MAX;
}

int Assembly::Extract2ndRIval(const std::string& line, OP op)
{
    if (op == ENDIF) {
        return -1;
    }
    std::size_t spaceOrComma;
    if (op == CALL || op == RET) {
        spaceOrComma = line.find(' ');
    } else {
        spaceOrComma = line.find(',');
    }
    std::string opval = std::string(line.begin() + spaceOrComma + 1, line.end());
    auto it = RITABLE.find(opval);
    if (it != RITABLE.end()) {
        return this->REG[it->second];
    }
    auto num = std::atoi(opval.c_str());
    return num;
}

void Assembly::Execute()
{
    for (const std::string& line : programLines)
    {
        OP op = ExtractOP(line);
        RI r1 = ExtractRI(line, op);
        int r2value = Extract2ndRIval(line, op);
        Instruction command ( op, r1, r2value );

        if (GetFlag() == CMP_FAIL)
        {
            if (command.op == ENDIF) {
                SetFlag(CMP_OK);
            }
            continue;
        }

        switch (command.op)
        {
            case MOV:  { SetRegistryValue(command.r1, command.r2value); } break;
            case ADD:  { SetRegistryValue(command.r1, REG[command.r1] + command.r2value); } break;
            case SUB:  { SetRegistryValue(command.r1, REG[command.r1] - command.r2value); } break;
            case MUL:  { SetRegistryValue(command.r1, REG[command.r1] * command.r2value); } break;
            case DIV:  { SetRegistryValue(command.r1, REG[command.r1] / command.r2value); } break;
            case MOD:  { SetRegistryValue(command.r1, REG[command.r1] % command.r2value); } break;
            case IFEQ: { SetFlag(GetRegistryValue(command.r1) == command.r2value); } break;
            case IFNEQ:{ SetFlag(GetRegistryValue(command.r1) != command.r2value); } break;
            case IFG:  { SetFlag(GetRegistryValue(command.r1) >  command.r2value); } break;
            case IFL:  { SetFlag(GetRegistryValue(command.r1) <  command.r2value); } break;
            case IFGE: { SetFlag(GetRegistryValue(command.r1) >= command.r2value); } break;
            case IFLE: { SetFlag(GetRegistryValue(command.r1) <= command.r2value); } break;
            case RET:
            {
                SetRegistryValue(R9, command.r2value);
                return;
            } break;
            //oh boy!
            case CALL:
            {
                // std::cout << "value to call:" << command.r2value << std::endl;
                Assembly recursion(command.r2value, this->programLines);
                SetRegistryValue(R9, recursion.ReturnValue());
            } break;
        }
    }
}

int main()
{
    while (true)
    {
        int pl, input;
        util::readcin(pl, input);
        if (pl == 0) {
            break;
        }
        Assembly Asm(input);
        for (auto i = 0; i < pl; ++i)
        {
            std::string line;
            std::getline(std::cin, line);
            Asm.addCommand(line);
        }
        Asm.Execute();
        std::cout << Asm.ReturnValue() << std::endl;
    }
    return 0;
}
The only way to check whether a program is stuck in an infinite loop in the general case is to check whether the program has entered the same state as a previous state. If it has entered exactly the same state previously, then it must continue executing in a loop, returning to the same state over and over, following the same sequence of steps. In real programs this is essentially impossible because of the huge number of possible states the program can be in, but your assembly language allows a much more limited number of possible states.
Since your CALL instruction works just like invoking the program at the start, and this is the only form of looping, checking whether the code enters the same state twice is simple. A CALL instruction with a certain argument has exactly the same effect as invoking the program with that argument as input. If a CALL instruction has the same argument as any previously executed CALL instruction or the program's input value, then it must continue executing in a loop, endlessly returning to the same state in the same sequence of steps.
In other words, the only state that needs to be checked is the R0 value at the start of the program. Since this value is stored in an int, it can only have 2^32 possible values on any common C++ implementation, so it's reasonable and easy to brute-force check whether a given program with a given input gets stuck in an infinite loop.
In fact, it's possible to use memoization of the return values to brute-force check all possible inputs in O(N) time and O(N) space, where N is the number of possible inputs. There are various ways you could do this, but the way I would do it is to create three vectors, all with a number of elements equal to the number of possible inputs. The first vector is a bool (bit) vector that records whether or not a given input has been memoized yet; the second vector is also a bool vector, recording whether a given input has already been used on the call stack; the third vector is an int vector that records the result and is also used as a linked list of input values to form a call stack, to save space. (In the code below these vectors are called is_memoized, input_pending and memoized_value respectively.)
I'd take your interpreter loop and rewrite it to be non-recursive, something like this pseudo-code:
input = reg[R0]
if is_memoized[input]:
    reg[R9] = memoized_value[input]
    return
input_pending[input] = true
memoized_value[input] = input // mark the top of the stack
while true:
    for command in program:
        ...
        if command.op == CALL:
            argument = command.r2value
            if input_pending[argument]:
                // Since this input value has already been used as an input
                // value somewhere on the call stack, the program is about to
                // enter a state identical to a previous state and so is stuck
                // in an infinite loop.
                return false // program doesn't halt
            if is_memoized[argument]:
                REG[R9] = memoized_value[argument]
            else:
                memoized_value[argument] = input // stack the input value
                input = argument
                REG = [input, 0, 0, 0, 0, 0, 0, 0, 0, 0]
                input_pending[input] = true
                break // reevaluate the program from the beginning
        if command.op == RET:
            argument = command.r2value
            stacked_input = memoized_value[input]
            input_pending[input] = false
            if stacked_input == input: // at the top of the stack
                REG[R9] = argument
                return true // program halts for this input
            is_memoized[input] = true
            memoized_value[input] = argument
            input = stacked_input
            REG = [input, 0, 0, 0, 0, 0, 0, 0, 0, 0]
            break // reevaluate the program from the beginning
You'd then call this interpreter loop for each possible input, something like this:
for i in all_possible_inputs:
    if not program.execute(input = i): // the function written above
        print "program doesn't halt for all inputs"
        return
print "program halts for all inputs"
A recursive version should be faster since it doesn't have to reevaluate the program on each unmemoized CALL instruction in the program, but it would require hundreds of gigabytes of stack space in the worst case. This non-recursive version only requires 17 GB of memory. Either way it's still O(N) space and time; you're just making one constant factor smaller and another bigger.
To get this to execute in a reasonable amount of time, you'd also probably want to parse the code once and execute some byte-code representation instead.
I take it you're looking for outside-the-box thinking.
Think of the halting problem this way. Turing proved programs are free from control. But why? Because the language has instructions to control execution. This means feasibly regulating and predicting execution in programs requires removing all control semantics from the language.
Even my collaborative process architecture doesn't accomplish that. It just forbids them because of the mess they make. It is still composed in a language which contains them. For example, you can use IF to break, return or continue, but not for other operations. Function calls are illegal. I created such restrictions to achieve controllability. However, not even a collaborative organization removes control structures from the language to prevent their misuse.
My architecture is online via my profile with a factorial example in my W article.
If the program steps into the same configuration twice then it will loop forever.
This is also true for Turing Machines, it is just that the (infinite) input is part of the machine's configuration.
This is also the intuition behind the pumping lemmas.
What constitutes a configuration in your model?
Since you have no memory and no IO, a configuration is given by the content of the registers and the line number of the current instruction (i.e. the instruction pointer).
When does a configuration change?
After every instruction, sure. But in the case of a non-branching instruction, the configurations before and after it are surely different, because even if the instruction is a NOP the line number changed.
In the case of a branch, the line number might be one that we've seen before (for a backwards branch), so that's where the machine could step into the same configuration.
The only jumping instruction of interest, to me, seems to be CALL. The IF-like ones will always produce different configurations because they are not expressive enough to produce iteration (jump back and repeat).
How does call change a configuration? It sets the line number to 1 and all the registers (except r0) to zero.
So the condition for a call to produce the same configuration reduces to having the same input.
If you check, in the call implementation, if the operand value has already been used before (in the simulation) then you can tell that the program will loop forever.
If a register has size n, then there are O(2^n) possible states, which is generally a lot.
You must be prepared to give up after a (possibly customizable) threshold. Or, in your case where your registers are int: most C++ implementations have 32-bit int, and modern machines can handle a 512 MiB bitmap of 2^32 bits. (std::vector<bool> implements a packed bitmap, for example; index it with unsigned to avoid negative ints.) A hash table is another alternative (std::unordered_set<int>). But if you used a wider type for your registers, the state would be too big to practically record every possible one and you would need some limit. A limit is kind of built in to your implementation, as you will overflow the C++ call stack (C++ recursion depth) before seeing anywhere near 2^32 repeats of the machine being simulated.
If the registers are unbounded in their value, this reduces to the halting problem and is thus undecidable in the general case. (But as @Brendan says, you can still look for early repeats of the state; many programs will terminate or infinitely repeat in a simple way.)
If you change the CALL implementation to not zero out the other registers, you must fall back to checking the whole configuration at the call site (and not just the operand).
To check the termination of the program on every input you must proceed non-deterministically and symbolically.
There are two problems: the branches and the input value.
It is a famous theorem that an NDTM can be simulated by a TM in an exponential number of steps w.r.t. the steps of the NDTM.
The only problematic instructions are the IF ones because they create non-determinism.
You can take several approaches:
Split the simulation into two branches: one that executes the IF and one that does not.
Rewrite the code to be simulated to produce an exponential (in the number of branches) number of branch-free variants of the code. You can generate them lazily.
Keep a tree of configurations, each branch in the program generating two children in the current node in the tree.
They are all equivalent.
The input value is not known, so it's hard to tell if a call ends up in the same configuration.
A possible approach is to record all the changes to the input register, for example you could end up with a description like "sub(add(add(R0, 1), 4), 5);".
This description should be easy to manipulate, as it's easy to see that in the example above R0 didn't change because you get "sub(add(R0, 5), 5);" and then "R0;".
This works by relying on the laws of arithmetic; you must take care of inverse operations, identity elements (1 * a = a), and overflow.
Basically, you need to simplify the expression.
You can then check if the input has changed at a given point in the simulated time.
How do I verify whether a program will enter into an infinite loop/recursion?
In practice, the halting problem is trivial to solve. It's only impossible in theory.
The reason people think the halting problem is impossible to solve is that the question is constructed as a false dilemma ( https://en.wikipedia.org/wiki/False_dilemma ). Specifically, the question asks to determine whether a program will always halt or will never halt; but there's a third possibility: sometimes halting (and sometimes not halting). For an example of this, imagine a program that asks the user if they want to halt forever or exit immediately (and correctly does what the user requested). Note that all sane applications are like this - they're not supposed to exit until/unless the user tells them to.
More correctly, in practice there are 4 possibilities:
runs until something external causes it to halt (power turned off, hardware failure, kill -9, ...)
sometimes halts itself
always halts itself
indeterminate (unable to determine which of the 3 other cases it is)
Of course with these 4 possibilities, you can say you've created a "halting problem solver" just by classifying every program as "indeterminate", but it won't be a good solution. This gives us a kind of rating system - extremely good "halting problem solvers" rarely classify programs as "indeterminate", and extremely bad "halting problem solvers" frequently classify programs as "indeterminate".
So; how do you create a good "halting problem solver"? This involves 2 parts - generating control flow graphs ( https://en.wikipedia.org/wiki/Control-flow_graph ) for each function and a call graph ( https://en.wikipedia.org/wiki/Call_graph ) for the whole program; and value tracking.
Control Flow Graphs and Call Graph
It's not that hard to generate a control flow graph for each function/procedure/routine, just by examining the control flow instructions (call, return, jump, condition branch); and not that hard to generate a call graph while you're doing this (just by checking if a node is already in the call graph when you see a call instruction and adding it if it's not there yet).
While doing this, you want to mark control flow changes (in the control flow graph for each function) as "conditional" or "not conditional", and you want to mark functions (in the call graph for the whole program) as "conditional" or "not conditional".
By analyzing the resulting graphs you can classify trivial programs as "runs until something external causes it to halt" or "always halts itself" (e.g. this is enough to classify OP's original code as "runs until something external causes it to halt"); but the majority of programs will still be "indeterminate".
Value Tracking
Value tracking is (trying) to keep track of the possible values that could be in any register/variable/memory location at any point in time. For example, if a program reads 2 unsigned bytes from disk into 2 separate variables, you know both variables will have a value from 0 to 255. If those variables are multiplied, you know the result will be a value from 0*0 to 255*255; if those variables are added, you know the result will be a value from 0+0 to 255+255; etc. Of course the variable's type gives absolute maximum possible ranges, even for assembly (where there are no types) - e.g. (without knowing whether it's signed or unsigned) you know that a 32-bit read from memory will return a value from -2147483648 to 4294967295.
The point of value tracking is to annotate conditional branches in the control flow graph for each function; so that you can use those annotations to help classify a program as something other than "indeterminate".
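The byte example above can be written as simple interval arithmetic. A minimal sketch for unsigned ranges (real value tracking would also need to handle signedness, wrap-around, and merging ranges where control-flow paths join):

```cpp
#include <cassert>
#include <cstdint>

// Track the possible range of a value as a closed interval [lo, hi].
struct Interval {
    int64_t lo, hi;
};

// Adding two values: extremes are the sums of the endpoints.
Interval add(Interval a, Interval b) {
    return {a.lo + b.lo, a.hi + b.hi};
}

// Multiplying two non-negative ranges: extremes are the
// products of the endpoints (this shortcut only holds for lo >= 0).
Interval mulUnsigned(Interval a, Interval b) {
    return {a.lo * b.lo, a.hi * b.hi};
}
```

With two unsigned bytes, add gives [0, 510] and mulUnsigned gives [0, 65025], matching the 0+0..255+255 and 0*0..255*255 ranges above.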
This is where things get tricky - improving the quality of a "practical halting problem solver" requires increasing the sophistication/complexity of the value tracking. I don't know if it's possible to write a perfect "halting problem solver" (that never returns "indeterminate") but I do know that it's possible to write a "halting problem solver" that is sufficient for almost all practical purposes (that returns "indeterminate" so rarely that nobody cares).

DMA write to SD card (SSP) doesn't write bytes

I'm currently working on replacing a blocking busy-wait implementation of an SD card driver over SSP with a non-blocking DMA implementation. However, there are no bytes actually written, even though everything seems to go according to plan (no error conditions are ever found).
First some code (C++):
(Disclaimer: I'm still a beginner in embedded programming so code is probably subpar)
namespace SD {
    bool initialize() {
        //Setup SSP and detect SD card
        //... (removed since not relevant for question)

        //Setup DMA
        LPC_SC->PCONP |= (1UL << 29);
        LPC_GPDMA->Config = 0x01;

        //Enable DMA interrupts
        NVIC_EnableIRQ(DMA_IRQn);
        NVIC_SetPriority(DMA_IRQn, 4);

        //enable SSP interrupts
        NVIC_EnableIRQ(SSP2_IRQn);
        NVIC_SetPriority(SSP2_IRQn, 4);
    }

    bool write (size_t block, uint8_t const * data, size_t blocks) {
        //TODO: support more than one block
        ASSERT(blocks == 1);
        printf("Request sd semaphore (write)\n");
        sd_semaphore.take();
        printf("Writing to block " ANSI_BLUE "%d" ANSI_RESET "\n", block);

        memcpy(SD::write_buffer, data, BLOCKSIZE);

        //Start the write
        uint8_t argument[4];
        reset_argument(argument);
        pack_argument(argument, block);
        if (!send_command(CMD::WRITE_BLOCK, CMD_RESPONSE_SIZE::WRITE_BLOCK, response, argument)){
            return fail();
        }
        assert_cs();
        //needs 8 clock cycles
        delay8(1);

        //reset pending interrupts
        LPC_GPDMA->IntTCClear = 0x01 << SD_DMACH_NR;
        LPC_GPDMA->IntErrClr = 0x01 << SD_DMACH_NR;
        LPC_GPDMA->SoftSReq = SD_DMA_REQUEST_LINES;

        //Prepare channel
        SD_DMACH->CSrcAddr = (uint32_t)SD::write_buffer;
        SD_DMACH->CDestAddr = (uint32_t)&SD_SSP->DR;
        SD_DMACH->CLLI = 0;
        SD_DMACH->CControl = (uint32_t)BLOCKSIZE
                | 0x01 << 26  //source increment
                | 0x01 << 31; //Terminal count interrupt

        SD_SSP->DMACR = 0x02; //Enable ssp write dma
        SD_DMACH->CConfig = 0x1 //enable
                | SD_DMA_DEST_PERIPHERAL << 6
                | 0x1 << 11  //mem to peripheral
                | 0x1 << 14  //enable error interrupt
                | 0x1 << 15; //enable terminal count interrupt
        return true;
    }
}
extern "C" __attribute__ ((interrupt)) void DMA_IRQHandler(void) {
    printf("dma irq\n");
    uint8_t channelBit = 1 << SD_DMACH_NR;
    if (LPC_GPDMA->IntStat & channelBit) {
        if (LPC_GPDMA->IntTCStat & channelBit) {
            printf(ANSI_GREEN "terminal count interrupt\n" ANSI_RESET);
            LPC_GPDMA->IntTCClear = channelBit;
        }
        if (LPC_GPDMA->IntErrStat & channelBit) {
            printf(ANSI_RED "error interrupt\n" ANSI_RESET);
            LPC_GPDMA->IntErrClr = channelBit;
        }
        SD_DMACH->CConfig = 0;
        SD_SSP->IMSC = (1 << 3);
    }
}

extern "C" __attribute__ ((interrupt)) void SSP2_IRQHandler(void) {
    if (SD_SSP->MIS & (1 << 3)) {
        SD_SSP->IMSC &= ~(1 << 3);
        printf("waiting until idle\n");
        while(SD_SSP->SR & (1UL << 4));

        //Stop transfer token
        //I'm not sure if the part below up until deassert_cs is necessary.
        //Adding or removing it made no difference.
        SPI::send(0xFD);
        {
            uint8_t response;
            unsigned int timeout = 4096;
            do {
                response = SPI::receive();
            } while(response != 0x00 && --timeout);
            if (timeout == 0){
                deassert_cs();
                printf("fail");
                return;
            }
        }

        //Now wait until the device isn't busy anymore
        {
            uint8_t response;
            unsigned int timeout = 4096;
            do {
                response = SPI::receive();
            } while(response != 0xFF && --timeout);
            if (timeout == 0){
                deassert_cs();
                printf("fail");
                return;
            }
        }
        deassert_cs();
        printf("idle\n");
        SD::sd_semaphore.give_from_isr();
    }
}
A few remarks about the code and setup:
Written for the lpc4088 with FreeRTOS
All SD_xxx defines are conditional defines to select the right pins (I need to use SSP2 in my dev setup, SSP0 for the final product)
All external functions that are not defined in this snippet (e.g. pack_argument, send_command, semaphore.take(), etc.) are known to be working correctly (most of them come from the working busy-wait SD implementation; I can't guarantee 100% that they are bug-free, but they appear to work correctly).
Since I'm in the process of debugging this there are a lot of printfs and hardcoded SSP2 variables. These are of course temporary.
I mostly used this as example code.
Now I have already tried the following things:
Write without DMA using busy-wait over SSP. As mentioned before I started with a working implementation of this, so I know the problem has to be in the DMA implementation and not somewhere else.
Write from mem->mem instead of mem->sd to eliminate the SSP peripheral. mem->mem worked fine, so the problem must be in the SSP part of the DMA setup.
Checked if the ISRs are called. They are: first the DMA ISR is called for the terminal count interrupt, and then the SSP2 ISR is called. So the ISRs are (probably) set up correctly.
Made a binary dump of the entire SD content to see if the content might have been written to the wrong location. Result: the content sent over DMA was not present anywhere on the SD card (I did this after every change I made to the code. None of them got the data onto the SD card).
Added a long (~1-2 seconds) timeout in the SSP ISR by repeatedly requesting bytes from the SD card, to make sure that there wasn't a timeout issue (e.g. that I tried to read the bytes before the SD card had a chance to process everything). This didn't change the outcome at all.
Unfortunately, due to a lack of hardware tools I haven't yet been able to verify whether the bytes are actually sent over the data lines.
What is wrong with my code, or where can I look to find the cause of this problem? After spending way more hours on this than I'd like to admit I really have no idea how to get this working, and any help is appreciated!
UPDATE: I did a lot more testing, and thus I got some more results. The results below I got by writing 4 blocks of 512 bytes. Each block contains constantly increasing numbers modulo 256. Thus each block contains 2 sequences going from 0 to 255. Results:
Data is actually written to the SD card. However, it seems that the first block written is lost. I suppose there is some setup done in the write function that needs to be done earlier.
The bytes are put in a very weird (and wrong) order: I basically get all even numbers followed by all odd numbers. That is, I first get the even numbers 0x00 - 0xFE and then all the odd numbers 0x01 - 0xFF (the total number of written bytes seems to be correct, with the exception of the missing first block). However, there's even one exception in this sequence: each block contains 2 of these sequences (a sequence is 256 bytes, a block is 512), but the first sequence in each block has 0xFE and 0xFF "swapped". That is, 0xFF is the end of the even series and 0xFE is the end of the odd series. I have no idea what kind of black magic is going on here. Just in case I've done something dumb, here's the snippet that writes the bytes:
uint8_t block[512];
for (int i = 0; i < 512; i++) {
    block[i] = (uint8_t)(i % 256);
}
if (!SD::write(10240, block, 1)) { //this one isn't actually written
    WARN("noWrite", proc);
}
if (!SD::write(10241, block, 1)) {
    WARN("noWrite", proc);
}
if (!SD::write(10242, block, 1)) {
    WARN("noWrite", proc);
}
if (!SD::write(10243, block, 1)) {
    WARN("noWrite", proc);
}
And here is the raw binary dump. Note that this exact pattern is fully reproducible: so far each time I tried this I got this exact same pattern.
Update2: Not sure if it's relevant, but I use sdram for memory.
When I finally got my hands on a logic analyzer I got a lot more information and was able to solve these problems.
There were a few small bugs in my code, but the bug that caused this behaviour was that I didn't send the "start block" token (0xFE) before the block and didn't send the 16-bit (dummy) CRC after the block. When I added these to the transfer buffer, everything was written successfully!
So the fix was as follows:
bool write (size_t block, uint8_t const * data, size_t blocks) {
    //TODO: support more than one block
    ASSERT(blocks == 1);
    printf("Request sd semaphore (write)\n");
    sd_semaphore.take();
    printf("Writing to block " ANSI_BLUE "%d" ANSI_RESET "\n", block);

    SD::write_buffer[0] = 0xFE; //start block
    memcpy(&SD::write_buffer[1], data, BLOCKSIZE);
    SD::write_buffer[BLOCKSIZE + 1] = 0; //dummy crc
    SD::write_buffer[BLOCKSIZE + 2] = 0;

    //...
}
As a side note, the reason the first block wasn't written was simply that I didn't wait until the device was ready before sending the first block. Doing so fixed the problem.
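That readiness wait can use the same polling pattern as the busy wait already present in the SSP2 ISR: keep clocking the card until it answers 0xFF before issuing the write. A minimal, self-contained sketch (the receive callback stands in for the SPI::receive helper from the code above; the function name is invented for illustration):

```cpp
#include <cstdint>
#include <functional>

// Poll the card until it answers 0xFF (ready) or the timeout expires.
// Returns true if the card became ready, false on timeout.
// `receive` stands in for SPI::receive from the driver code.
bool waitUntilReady(const std::function<uint8_t()>& receive,
                    unsigned int timeout = 4096) {
    uint8_t response;
    do {
        response = receive();
    } while (response != 0xFF && --timeout);
    return timeout != 0;
}
```

Called before send_command in write, this would block (with a bounded number of polls) until the previous write has completed, instead of starting the new transfer while the card is still busy.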