Test environment for an Online Judge - c++

I am planning to build an Online Judge on the lines of CodeChef, TechGig, etc. Initially, I will be accepting solutions only in C/C++.
I have thought through a security model for this, but my concern for now is how to model the execution and testing part.
Method 1
The method that seems to be more popular is to redirect standard input to the executable and redirect standard output to a file, for example:
./submission.exe < input.txt > output.txt
Then compare the output.txt file with some solution.txt file character by character and report the results.
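A minimal sketch of the comparison step in Method 1, assuming a strict byte-for-byte check is wanted. The name files_match is made up for illustration, and a real judge may prefer to ignore trailing whitespace or a missing final newline:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true only when both files can be opened and hold
// exactly the same bytes.
bool files_match(const std::string& produced_path, const std::string& expected_path)
{
    std::ifstream produced(produced_path, std::ios::binary);
    std::ifstream expected(expected_path, std::ios::binary);
    if (!produced || !expected) return false;

    std::istreambuf_iterator<char> p(produced), e(expected), end;
    for (; p != end && e != end; ++p, ++e) {
        if (*p != *e) return false;       // first differing byte
    }
    return p == end && e == end;          // both must end at the same point
}
```

After running the submission with its output redirected, the judge would call something like files_match("output.txt", "solution.txt") and report the verdict.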
Method 2
A second approach that I have seen is not to allow the users to write main(). Instead, write a function that accepts some arguments in the form of strings and set a global variable as the output. For example:
// This variable should be set before returning from submissionAlgorithm()
char * output;
void submissionAlgorithm(char * input1, char * input2)
{
    // Write your code here.
}
For each test case to be executed, the judge calls submissionAlgorithm() again and then checks the output variable for the result.
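A rough sketch of what such an in-process harness could look like. TestCase and run_tests are hypothetical names, and the body of submissionAlgorithm here is only a stand-in for user code (it concatenates its two inputs):

```cpp
#include <string>
#include <vector>

// Set by the submission before it returns, as described above.
char* output = nullptr;

// Stand-in for a user's submission, purely illustrative.
void submissionAlgorithm(char* input1, char* input2)
{
    static std::string result;            // storage must outlive the call
    result = std::string(input1) + input2;
    output = &result[0];
}

// Hypothetical test-case record: two inputs and the expected output.
struct TestCase {
    std::string input1, input2, expected;
};

// Calls the submission once per test case and returns how many passed.
int run_tests(const std::vector<TestCase>& cases)
{
    int passed = 0;
    for (const TestCase& tc : cases) {
        std::string a = tc.input1, b = tc.input2;   // mutable copies for char*
        output = nullptr;                           // detect "output never set"
        submissionAlgorithm(&a[0], &b[0]);
        if (output != nullptr && tc.expected == output) ++passed;
    }
    return passed;
}
```

Note that the submission runs inside the judge's own process here, so a crash or an infinite loop in user code stops the harness as well.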
From an initial analysis I found that Method 2 would not only be secure (I would prevent all read and write access to the filesystem from the submitted code), but also make the execution of test cases faster (maybe?) since the computation of test results would occur in memory.
I would like to know if there is any reason as to why Method 1 would be preferred over Method 2.
P.S: Of course, I would be hosting the online judge engine on a Linux Server.

Don't take this the wrong way, but you will need to look at security from a much higher-level perspective. The problem will not be the input and output being written to a file, and that should not affect performance much either. But you will need to handle submissions that can take down your process (in the second case) or the whole system (with calls to the OS to write to disk, acquire too much memory, and so on).
Disclaimer: I am by no means a security expert.
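As a rough illustration of capping what a submission can consume, here is a sketch that runs a program in a child process with CPU-time and address-space limits via setrlimit(2). The helper name run_limited and the exit codes 126/127 are my own choices, and this is only one layer: it does nothing about filesystem or network access.

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] in a child process under the given limits
// and returns the raw status from waitpid(), or -1 if fork/waitpid failed.
int run_limited(char* const argv[], rlim_t cpu_seconds, rlim_t memory_bytes)
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {                               // child: restrict, then exec
        struct rlimit cpu = { cpu_seconds, cpu_seconds };
        struct rlimit mem = { memory_bytes, memory_bytes };
        if (setrlimit(RLIMIT_CPU, &cpu) != 0) _exit(126);
        if (setrlimit(RLIMIT_AS, &mem) != 0) _exit(126);
        execv(argv[0], argv);
        _exit(127);                               // only reached if execv failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;  // parent: wait for the child
    return status;
}
```

The caller can inspect the result with WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG to tell a normal exit from a submission that was killed for exceeding a limit.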


Is there a way to populate an ellipses parameter programmatically?

I'm going to be getting in data from a file of my own making. This file will contain a printf format string and the parameters passed to it. I've already generated this code.
Now I want to do the reverse: read the format string and the parameters and pass them back to the printf functions. Can I somehow generate the appropriate call stack, or am I going to have to reparse the format string and send it to printf() piecemeal?
Edit
I know the risks with the printf functions. I understand that there are security vulnerabilities. These issues are non-issues as:
This is to be used in a debugging context. Not to be handled outside of that scope.
The executable that reads the file is executed by the person who made the file.
The datafile created will be read by an executable that simply expands the file and is not accessible by a third party.
It has no access to writing anything to memory (%n is not valid).
Use Case
To compress a stream with minimal CPU overhead by tracking constantly repeating strings and replacing them with enumerations. All other data is saved as binary data, thus requiring only minimal processing instead of having to convert it to a large string every time.
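For what it's worth, the "piecemeal" route mentioned in the question might look roughly like the sketch below. It assumes the file yields a list of tagged values (the Arg struct and the expand function are hypothetical, not part of any library), handles only a subset of conversions, rejects %n by simply stopping, and does not support '*' widths:

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical tagged value, standing in for whatever the data file stores.
struct Arg {
    enum Kind { Int, Float, Str } kind;
    long long i;
    double f;
    std::string s;
};

static bool in_set(const char* set, char c)
{
    return c != '\0' && std::strchr(set, c) != nullptr;
}

// Formats a single conversion specification with a single argument.
template <typename T>
static std::string format_one(const std::string& spec, T value)
{
    int n = std::snprintf(nullptr, 0, spec.c_str(), value);   // measure first
    if (n < 0) return std::string();
    std::string out(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&out[0], out.size(), spec.c_str(), value);
    out.resize(static_cast<std::size_t>(n));                  // drop the NUL
    return out;
}

// Expands fmt one conversion at a time; stops at anything it cannot handle.
std::string expand(const std::string& fmt, const std::vector<Arg>& args)
{
    std::string out;
    std::size_t next = 0;
    for (std::size_t p = 0; p < fmt.size(); ++p) {
        if (fmt[p] != '%') { out += fmt[p]; continue; }
        if (p + 1 < fmt.size() && fmt[p + 1] == '%') { out += '%'; ++p; continue; }

        // Keep flags, width and precision; drop the original length modifier.
        std::string spec = "%";
        std::size_t q = p + 1;
        while (q < fmt.size() && in_set("-+ #0123456789.", fmt[q])) spec += fmt[q++];
        while (q < fmt.size() && in_set("hljztL", fmt[q])) ++q;
        if (q >= fmt.size() || next >= args.size()) break;

        const char conv = fmt[q];
        const Arg& a = args[next++];
        if (in_set("di", conv) && a.kind == Arg::Int)
            out += format_one(spec + "ll" + conv, a.i);
        else if (in_set("uxXo", conv) && a.kind == Arg::Int)
            out += format_one(spec + "ll" + conv, static_cast<unsigned long long>(a.i));
        else if (conv == 'c' && a.kind == Arg::Int)
            out += format_one(spec + "c", static_cast<int>(a.i));
        else if (in_set("eEfFgGaA", conv) && a.kind == Arg::Float)
            out += format_one(spec + conv, a.f);
        else if (conv == 's' && a.kind == Arg::Str)
            out += format_one(spec + "s", a.s.c_str());
        else
            break;                                 // unsupported (e.g. %n) or type mismatch
        p = q;
    }
    return out;
}
```

Each conversion is re-issued with a type the code controls (long long, double, const char*), so no variadic call ever has to be built at run time.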

How to print out specific lines of user input to console (C++)

I am using C++ and the terminal. My program takes in user input using read(STD_FILENO,buf,BUFFER), and I am trying to write back only specific lines.
So for example, if the user entered in a total of 10 lines, how would I print out lines 3 through 7 or 6 through 10?
I am trying to use the write() function (write(STD_FILENO,buf,BUFFER)) but it's not printing what I want it to.
I have tried messing around with the BUFFER and tried to make it smaller than the total amount of characters that the user has input, but it is still not working.
My understanding is that whatever I say the BUFFER is to be, it will write UP TO that BUFFER value, so it will start from 0 to BUFFER. But if I wanted to start from line 6, that may start on character #15 and not 0...does this make sense?
please note: I need to use read() and write()
Thank You!
If you are required to only use read(2) and write(2), then you'll also need open(2), close(2), lseek(2) and you need to design and code your own buffered IO library above it. Read carefully the documentation of every system call mentioned here. Use the result of each of them. Handle error cases in your code. See errno(3) & perror(3).
So keep a buffer (or more than one) and several pointers (or offsets) into it (probably at least the currently consumed position, and the last read position, etc).
Perhaps you'll want to use some container. You might start implementing your own equivalent of fgetc on your buffered IO class, and build above that.
Lines do not really exist at the system call level. You need to take care of \n in your code.
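To make the idea concrete, here is a small sketch of handling \n yourself on top of read(2) and write(2). The helper names read_all, line_range and write_all are made up, and it simply slurps all input into memory rather than being a real buffered IO layer:

```cpp
#include <cerrno>
#include <string>
#include <unistd.h>

// Appends everything readable from fd to data; false on a real error.
bool read_all(int fd, std::string& data)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) data.append(buf, static_cast<std::size_t>(n));
        else if (n == 0) return true;              // end of input
        else if (errno != EINTR) return false;     // retry only on EINTR
    }
}

// Returns lines first..last (1-based, inclusive), newlines included.
std::string line_range(const std::string& data, std::size_t first, std::size_t last)
{
    std::size_t begin = 0, line = 1;
    while (line < first && begin < data.size()) {  // skip lines before 'first'
        std::size_t nl = data.find('\n', begin);
        if (nl == std::string::npos) return std::string();
        begin = nl + 1;
        ++line;
    }
    std::size_t end = begin;
    while (line <= last && end < data.size()) {    // extend over wanted lines
        std::size_t nl = data.find('\n', end);
        end = (nl == std::string::npos) ? data.size() : nl + 1;
        ++line;
    }
    return data.substr(begin, end - begin);
}

// write(2) may write fewer bytes than requested, so loop until done.
bool write_all(int fd, const std::string& text)
{
    std::size_t done = 0;
    while (done < text.size()) {
        ssize_t n = write(fd, text.data() + done, text.size() - done);
        if (n > 0) done += static_cast<std::size_t>(n);
        else if (n == 0 || errno != EINTR) return false;
    }
    return true;
}
```

Printing lines 3 through 7 of standard input would then be read_all(STDIN_FILENO, data) followed by write_all(STDOUT_FILENO, line_range(data, 3, 7)), with both return values checked.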
BTW you could study, for inspiration, the source code of several free software C libraries implementing <stdio.h>, such as musl-libc
Of course you should compile with all warnings and debug info (g++ -Wall -Wextra -g with GCC), and you'll need to use the debugger gdb to understand the behavior of your program and find your bugs. Don't be shy about drawing on a board what happens in your virtual address space (with pointers represented by arrows).
NB: SO is not a do-my-homework service.

Can I query LTTng if a given tracepoint with given args is going to be traced, before tracing it?

We need to adapt a huge number of existing printf-like traces to LTTng. One of the issues we foresee is that we will need a catch-all tracepoint with the format of args plus a char* string. We are trying to find a way to avoid having to compose the string before calling the LTTng tracepoint. Is there any way to know beforehand if the tracepoint "will be traced" before passing it to the LTTng library? Any method we can call to know if the trace is a match?
Thanks a lot!
P.S. We know that having this kind of tracepoint is a bad practice, but zillions of trace lines are flying above us.
Use the tracepoint_enabled() and do_tracepoint() macros as follows (code copied from the man page):
if (tracepoint_enabled(ust_tests_hello, tptest)) {
    /* prepare arguments */
    do_tracepoint(ust_tests_hello, tptest, i, netint, values,
                  text, strlen(text), dbl, flt);
}
Note: For this to work you need at least LTTng-UST 2.7.0-rc1.
You could technically query the status of the tracing session through liblttng-ctl. However if your goal is to improve performance, I am not sure doing a lookup through this library every time you hit a tracepoint will be more efficient than a string formatting. You would have to benchmark it.
As a side note, if you are moving existing printf() calls to LTTng tracepoints, you may want to look at tracef(), which is basically a single-format-string tracepoint already defined by the tracer. There is also a slightly more advanced tracelog() function which will be introduced in LTTng 2.7.

GDB backtrace with long function names

I am doing some debugging of an application that uses boost::spirit. This means that backtraces are very deep and that many of the intermediate layers have function names that take several pages to print. The length of the function names makes examining the backtrace difficult. How can I have gdb limit the length of a function name to 1 or 2 lines? I'd still like to see the full path to the file and line number, but I don't need four pages of template parameters!
I don't think it can be done directly right now. I think it would be a reasonable feature.
However, you can write your own implementation of "bt" in Python and then apply whatever transforms you like. This isn't actually very hard.

Safely embedding a string in C code (Secure string, Secure char*)

I have a DLL (ANSI C) that has some string literals defined.
__declspec(dllexport) char* GetSomeString()
{
    return "This is a test string from TestLib.dll";
}
When compiled this string is still visible in "notepad" for example. I'm fairly new to C, so I was wondering, is there a way to safely store string literals?
Should I do it with a resx file (for example), that has some encrypted values, or what would be the best way?
Thanks
EDIT 1:
The scenario is basically the following in pseudo code:
if (hostname)
    return hostname
else
    return "Literal String";
It's this "literal string" that I would like to see "secured" in some way.
Don't put your secrets on anyone else's computer if you want them to stay secret.
See my related answer, The #1 Law of Software Licensing
And Eric Lippert's similar answer
First of all, since your executable [1] needs to decode that literal in memory, any attacker determined enough will be able to do the same; often it's just as easy as freezing the process after startup (or after it has used the string in question), creating a memory dump and running utilities like strings over it. There are methods to mitigate the issue (e.g. zeroing the memory used by a sensitive string immediately after using it), but since your code is on a machine where the potential attacker has all the privileges, you can only put up roadblocks: in the end your executable is completely in the attacker's hands.
That being said, if your concern is just "not leaving important strings en plein air", you may just run an executable packer/encrypter over your whole dll. This is as easy as adding a post-build step to your solution: the packer will compress/encrypt the whole executable image and build an executable that, when launched, will decrypt and run it in memory.
This method has the great advantage of not requiring any change to your code: you just run upx over the compiled dll and you get your compressed dll, no XORs or weird literals spread across your code are needed.
Of course, this is quite weak security (basically it will just protect from snooping around in the executable with notepad or a hex editor), but again, storing critical "secrets" in an executable that is going to be distributed is a bad idea in the first place.
[1] Throughout this answer, "executable" is meant in the broad sense, i.e. dlls are included too.
You probably want to store hardcoded passwords in the library, right? You can XOR the string with some value, and store it, then read it and XOR again. It's the simplest way, but it doesn't protect your string from any kind of disassembling/reverse engineering.
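A minimal sketch of the XOR idea, assuming a repeating key (the function name xor_with_key and the key handling are illustrative only). As said above, this only hides the text from a casual look at the binary; the key sits right next to the data:

```cpp
#include <cstddef>
#include <string>

// XORs every byte of data with a repeating key.
// Applying the same key a second time restores the original bytes.
std::string xor_with_key(const std::string& data, const std::string& key)
{
    if (key.empty()) return data;                 // nothing to XOR with
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

In practice the encoded bytes would be produced at build time and stored in the dll as a byte array (they may contain zero bytes, so not as a C string literal), then decoded with the same call just before use.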