SAFELY get path to running executable in windows API - c++

Hey,
I'm trying to get the path to a dll located in the same folder as my exe file. The way to go seems to be to use one of either QueryFullProcessImageName() or GetModuleFileName() to get the path to the running executable and then use string manipulation to make it a path to the required library instead.
Unfortunately, neither of these two functions provide a way to find out in advance the size buffer required. I've tried passing zero in for the nSize parameter, but this doesn't have the desired effect.
What's the best practice way of doing this?

In practice you can use Windows API MAX_PATH as your buffer size, perhaps add 1 for extra safety.
In theory a Windows path can be much larger. As I recall MAX_PATH is like 270 or thereabouts, while in the NTFS file system a path can be up to around (roughly) 32767 chars. However, for that large size it has to be handled as Unicode, and, importantly, Windows Explorer does not support such large paths, so it's not an issue in practice.
In practice, again, if you should ever encounter such large path, apparently impossible to remove, then you can use the Unicode naming (there's a special prefix to use for long paths), and/or equivalent shortnames (DOS 8.3 names), and/or define logical drives to shorten the path, so that the directory/file can be removed.
Cheers & hth.,

GetModuleFilename returns the number of characters copied to your buffer. If it's less than the size of your buffer, you're fine. If it's equal to the size of your buffer, allocate a larger buffer and try again.

Related

Is there a general way to detect file write errors with c++?

I have an application that writes several files depending on the specific config. Sometimes we use the normal std:ostream, sometimes we write with opencv, or with boost.
For std:ostream rdstate() function.
In my opinion the best solution would be, if I could use some general method (e.g. from the os), that works for each mechanism.
Is there something like this?
If not, can anyone point me to documentation about checks similar to rdstate()?
Thanks
Update:
I write the files with these functions:
OpenCV: cv::VideoWriter << image
boost: boost::iostreams::file_sink->write
I found a way to check the written data for the bosst case. The write function returns the number of written bytes, and I can compare this to the expected bytes.
For the openCV case I used GetFileAttributesEx the determine the filesize and check if that increases.
Is that a good way?
If the underlying operating system call that writes to a file returns an error indication, I would expect std::ostream to end up in a fail() state.
You just have to be diligent about checking the status of the std::ostream, including after an explicit close(), since a write error could be encountered writing out the final bits of buffered data.

Write a C++ struct to a file and read file using another programming language?

I have a challenging situation; we will have programs on Mac, PC, iOS and Android receiving files in a legacy format and parsing data from those files. We cannot change how those files are created.
The files are produced by a C++ program filling a struct with numbers and Strings and then writing it out. Here's a sanitized version.
struct MyObject {
String Kfkj(MAXKYS);
String Oern(MAXKYS);
String Vdflj(MAXKYS, 9);
int Muic;
int Tdfkj;
int VdfkAsdk;
int SsdjsdDsldsk;
int Ndsoief;
String TdflsajPdlj;
String TdckjdfPas;
String AdsfakjIdd;
int IdkfjdKasdkj;
int AsadkjaKadkja(MAXKYS);
int Kasldsdkj;
bool Usadl;
String PsadkjOasdj(9);
String PasdkjOsdkj;
};
Primitives and Strings, as you can see.
Then here is how they write it out to a file:
MyInstance MyObject;
FileName = "C:\MyFile.ab2"
ofstream fout (FileName, ios::binary);
fout.write((char*)& MyInstance, sizeof(MyInstance));
There is no option for us to translate it once and then distribute the file to other platforms; we must translate it on each and every different platform, and this is what we have to work with. I'd appreciate any information on how C++ serializes data, so we know how to parse the file.
EDIT: solution
The feedback I received from multiple answers here was VERY helpful. Using that, I did extensive analysis with hex editors and discovered:
the elements come in the file one after another
a "String," in this case, starts with an int describing how many characters follow the int for that String. If the String does not exist, it will still have that int with a value of 0.
integers, for the files and machines I saw, are two bytes, little-endian, and MOSTLY unsigned (there were a few that were signed, just to keep me on my toes)
the boolean was two bytes, with apparently -1 (FF FF) representing "true"
So far we have not ran into issues with different padding or endianness on different devices, but those are very real concerns. The skilled notes and warnings in these answers provides us with more ammunition to try to convince the client to change to a less fragile alternative, such as XML or JSON, for transferring data online across platforms.
As for those of you asking if the developer was fired... well, let's just say their code is very old, but after multiple conversations we're still having trouble convincing them writing out the C++ struct and trying to read that on different platforms is not a good idea.
You're going to run into many problems.
C++ doesn't have a specific format for serializing data per se. It is highly dependent on the computer architecture/processor that you are running on.
The compiler is allowed to add padding to help alignment on systems. When we say alignment we basically are referring to an architecture/processor's affinity for having data lie on specific byte boundaries. For example, some processors vastly prefer floating point numbers to lie at 4 or 8 byte boundaries - if they don't the processor may work much slower or may not work at all.
So, you can't simply know what padding your system is adding magically.
What you can do is use #pragma pack(1) / #pragma pack(0) to stop your compiler from padding your numbers.
PS: you also have to worry about endianness. What if one computer is running on big-endian and one is little endian? They will interpret bytes differently without a conversion.
Simply put, you either have to fix the application generating the files so it uses a proper serialization scheme OR you need to look at it running on a SPECIFIC computer, look at exactly how it writes the files, and write a translator for every target platform (which is just silly).
Interesting Suggestion
If you're really stuck, write an app that monitors the folder where you write files. Have the app pick up the files (since it's on the same PC it'll be able to read their format without issue). Have it write the files back in XML or some other true serialization format and distribute those instead.
Whoa - that's crazy. So String objects don't contain any pointers? Must not- because you claim this is working code.
Anyway, that code isn't doing any serialization. Its just writing the structure out to file exactly the way it is laid out in memory. The only issue you have is that on some platforms padding and sizeof integral types like int may be different.
You'll have to find the size of the integral types, and use that information in reader/writer for newer platforms to make sure they get laid out the same way on the legacy platform.
You're running a real risk with that code though. As it is, a compiler change could suddenly cause the file layout to change.
The format of your data file is entirely down to the compiler that your C++ program is compiled with, and the definition of your String class. You can rely on the fields being in the order they're declared in, and in this case, I think you can rely on there not being any padding at the start, but that's about all. Some tips that might help you out in this case:-
You don't give the definition of the String class you're using. If it's a typedef for std::string, you're completely screwed, because the contents of the string aren't in the memory. I assume your C++ programmers are using some special local buffer, in which case I'll guess you will find the first bytes of the object are the string, and there is some amount of useless padding afterwards. I hope the struct contains an int at the start telling you how much data in it is useful.
You'll probably find the int fields are four bytes long.
You'll probably find the bool field is one byte long, followed by three bytes of useless padding. Only one bit, most likely the bottom bit, will be set.
That's about all the useful guesswork I can offer you. In your target language, make sure to read the whole file in as the closest thing to a byte array available in the language, and only after that, use the language features to convert it into the right kind of thing in your language. Don't try reading it in as integers, as that won't let you byte-swap if you're on a platform with different endianness to the C++ program. I suggest also looking through the file in a text editor to reverse-engineer it and help you find the offset of each field.
Last piece of advice: consider printing P45s (or pink slips, or whatever you have in your country) for whichever programmers or project managers thought this kind of 'serialization' was a good idea. This kind of sloppy work might have been acceptable in a life-or-death situation, but they have seriously screwed you over in a way you're going to find it very hard to recover from. Writing the code to read in these files will not be that hard, if it's only one struct like this, but keeping it reliable will be a world of pain, and they've effectively made it impossible for themselves to change compilers or compiler version safely.
The way it's done, the struct is written in raw form to a file. So basically what you need to know to parse this file is the binary layout of your struct.
Basically, the fields are just one after the other, so to read an int, you just read 4 bytes and cast that to an int, etc.
Strings are a particular case. It's not clear from your code whether this "String" type is an inline array of characters, or a pointer to such an array. In the first case, you need to know how many characters each string contains and simply read that number of characters sequentially. In the second case, you won't be able to get the string back, since it won't have been written to file. The pointer will be useless to you.
One last concern is whether the struct is packed or not. Since you gave no indication to that, by default struct fields are aligned to 4-bytes boundaries, so there may be space for instance after the boolean field that you need to account for. If the struct is packed, then each field comes directly after the previous.
So, to make a long story short, figure out your struct binary layout using its definition and, if all else fails, inspecting the memory at run-time with the debugger, or use a hex editor to study the output file. Then write that specification down somewhere and this will give you what you need to read from the file. It's impossible to tell exactly what that layout is simply by looking at the pseudo-definition you gave.
Writing in an ofstream does not serialize data. This code write the raw memory content of the struct as it was a string of char. Depending of your compiler, its version, its options and the system it is running on the content will be completely different.
Even the number of bits of a char is allowed to change between c++ implementation.
Data referenced by the object of the struct won't be written (forget the content of std::string).
If you cannot change the writer code. You must know the alignment policy, the size of base type and the data representation. You will have to analyze files produced by hand, for example with an hexadecimal editor like this one
http://www.physics.ohio-state.edu/~prewett/hexedit/
, and probably look at your compiler documentation.
If you can change the writer code. Use proper serialization like json, protocol buffer or simply xml.
No one has pointed out something that sticks out to me as particularly problematic (maybe because I've been bit by it). That problem: the data member bool Usadl;. sizeof(bool) varies across platforms, across compilers, and even across releases of the same compiler. Common values for sizeof(bool) are 4 and 1. This will bite you. It's getting hard to find a big endian machine nowadays, very, very hard to find a computer where CHAR_BIT is not 8 or sizeof(int) is not 4. This is not the case for sizeof(bool).
In agreement with everyone else, Chad's team needs to document the structure of the records in the file, and then make sure the program that produces the file writes this structure explicitly, including element sizes, padding, and endianness. Don't depend on class layout to do this for you. That's just asking for trouble.
The best way would probably be to use JSON or if you want a more robust solution go with something like Avro. Avro has a C++ API and a Java API, so it covers most of the cases you're encountering.

Check if there is sufficient disk space to save a file; reserve it

I'm writing a C++ program which will be printing out large (2-4GB) files.
I'd like to make sure that there's sufficient space on the drive to save the files before I start writing them. If possible, I'd like to reserve this space.
This is taking place on a Linux-based system.
Any thoughts on a good way to do this?
Take a look at posix_fallocate():
NAME
posix_fallocate - allocate file space
SYNOPSIS
int posix_fallocate(int fd, off_t offset, off_t len);
DESCRIPTION
The function posix_fallocate() ensures that disk space is allocated for
the file referred to by the descriptor fd for the bytes in the range
starting at offset and continuing for len bytes. After a successful
call to posix_fallocate(), subsequent writes to bytes in the specified
range are guaranteed not to fail because of lack of disk space.
edit In the comments you indicate that you use C++ streams to write to the file. As far as I know, there's no standard way to get the file descriptor (fd) from a std::fstream.
With this in mind, I would make disk space pre-allocation a separate step in the process. It would:
open() the file;
use posix_fallocate();
close() the file.
This can be turned into a short function to be called before you even open the fstream.
Use aix's answer (posix_fallocate()), but since you're using C++ streams, you'll need a bit of a hack to get the stream's file descriptor.
For that, use the code here: http://www.ginac.de/~kreckel/fileno/.
If you are using C++ 17, you should do it with std::filesystem::resize_file
As shown in this post

Safe counterparts of itoa()?

I am converting some old c program to a more secure version. The following functions are used heavily, could anyone tell me their secure counterparts? Either windows functions or C runtime library functions. Thanks.
itoa()
getchar()
strcat()
memset()
itoa() is safe as long as the destination buffer is big enough to receive the largest possible representation (i.e. of INT_MIN with trailing NUL). So, you can simply check the buffer size. Still, it's not a very good function to use because if you change your data type to a larger integral type, you need to change to atol, atoll, atoq etc.. If you want a dynamic buffer that handles whatever type you throw at it with less maintenance issues, consider an std::ostringstream (from the <sstream> header).
getchar() has no "secure counterpart" - it's not insecure to begin with and has no buffer overrun potential.
Re memset(): it's dangerous in that it accepts the programmers judgement that memory should be overwritten without any confirmation of the content/address/length, but when used properly it leaves no issue, and sometimes it's the best tool for the job even in modern C++ programming. To check security issues with this, you need to inspect the code and ensure it's aimed at a suitable buffer or object to be 0ed, and that the length is computed properly (hint: use sizeof where possible).
strcat() can be dangerous if the strings being concatenated aren't known to fit into the destination buffer. For example: char buf[16]; strcpy(buf, "one,"); strcat(buf, "two"); is all totally safe (but fragile, as further operations or changing either string might require more than 16 chars and the compiler won't warn you), whereas strcat(buf, argv[0]) is not. The best replacement tends to be a std::ostringstream, although that can require significant reworking of the code. You may get away using strncat(), or even - if you have it - asprintf("%s%s", first, second), which will allocate the required amount of memory on the heap (do remember to free() it). You could also consider std::string and use operator+ to concatenate strings.
None of these functions are "insecure" provided you understand the behaviour and limitations. itoa is not standard C and should be replaced with sprintf("%d",...) if that's a concern to you.
The others are all fine to the experienced practitioner. If you have specific cases which you think may be unsafe, you should post them.
I'd change itoa(), because it's not standard, with sprintf or, better, snprintf if your goal is code security. I'd also change strcat() with strncat() but, since you specified C++ language too, a really better idea would be to use std::string class.
As for the other two functions, I can't see how you could make the code more secure without seeing your code.

Seeking and reading large files in a Linux C++ application

I am running into integer overflow using the standard ftell and fseek options inside of G++, but I guess I was mistaken because it seems that ftell64 and fseek64 are not available. I have been searching and many websites seem to reference using lseek with the off64_t datatype, but I have not found any examples referencing something equal to fseek. Right now the files that I am reading in are 16GB+ CSV files with the expectation of at least double that.
Without any external libraries what is the most straightforward method for achieving a similar structure as with the fseek/ftell pair? My application right now works using the standard GCC/G++ libraries for 4.x.
fseek64 is a C function. To make it available you'll have to define _FILE_OFFSET_BITS=64 before including the system headers That will more or less define fseek to be actually fseek64. Or do it in the compiler arguments e.g.
gcc -D_FILE_OFFSET_BITS=64 ....
http://www.suse.de/~aj/linux_lfs.html has a great overviw of large file support on linux:
Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all file access calls to use the 64 bit variants. Several types change also, e.g. off_t becomes off64_t. It's therefore important to always use the correct types and to not use e.g. int instead of off_t. For portability with other platforms you should use getconf LFS_CFLAGS which will return -D_FILE_OFFSET_BITS=64 on Linux platforms but might return something else on e.g. Solaris. For linking, you should use the link flags that are reported via getconf LFS_LDFLAGS. On Linux systems, you do not need special link flags.
Define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE. With these defines you can use the LFS functions like open64 directly.
Use the O_LARGEFILE flag with open to operate on large files.
If you want to stick to ISO C standard interfaces, use fgetpos() and fsetpos(). However, these functions are only useful for saving a file position and going back to the same position later. They represent the position using the type fpos_t, which is not required to be an integer data type. For example, on a record-based system it could be a struct containing a record number and offset within the record. This may be too limiting.
POSIX defines the functions ftello() and fseeko(), which represent the position using the off_t type. This is required to be an integer type, and the value is a byte offset from the beginning of the file. You can perform arithmetic on it, and can use fseeko() to perform relative seeks. This will work on Linux and other POSIX systems.
In addition, compile with -D_FILE_OFFSET_BITS=64 (Linux/Solaris). This will define off_t to be a 64-bit type (i.e. off64_t) instead of long, and will redefine the functions that use file offsets to be the versions that take 64-bit offsets. This is the default when you are compiling for 64-bit, so is not needed in that case.
fseek64() isn't standard, the compiler docs should tell you where to find it.
Have you tried fgetpos and fsetpos? They're designed for large files and the implementation typically uses a 64-bit type as the base for fpos_t.
Have you tried fseeko() with the _FILE_OFFSET_BITS preprocessor symbol set to 64?
This will give you an fseek()-like interface but with an offset parameter of type off_t instead of long. Setting _FILE_OFFSET_BITS=64 will make off_t a 64-bit type.
The same for goes for ftello().
Use fsetpos(3) and fgetpos(3). They use the fpos_t datatype , which I believe is guaranteed to be able to hold at least 64 bits.