file handling routines on Windows

file handling routines on Windows - c++

Is it allowed to mix different file handling functions in a one system e.g.
fopen() from cstdio
open() from fstream
CreateFile from Win API ?
I have a large application with a lot of legacy code and it seems that all three methods are used within this code. What are potential risks and side effects ?

Yes, you can mix all of that together. It all boils down to the CreateFile call in any case.
Of course, you can't pass a file pointer to CloseHandle and expect it to work, nor can you expect a handle opened from CreateFile to work with fclose.
Think of it exactly the same way you think of malloc/free vs new/delete in C++. Perfectly okay to use concurrently so long as you don't mix them.

It is perfectly OK to use all of these file methods, as long as they don't need to interact. The minute you need to pass a file opened with one method into a function that assumes a different method, you'll find that they're incompatible.
As a matter of style I would recommend picking one and sticking to it, but if the code came from multiple sources that may not be possible. It would be a big refactoring effort to change the existing code, without much gain.

Your situation isn't that uncommon.
Code that is designed to be portable is usually written using standard file access routines (fopen, open, etc). Code that is OS-specific is commonly written using that OS's native API. Your large application is most likely a combination of these two types of code. You should have no problem mixing the file access styles in the same program as long as you remember to keep them straight (they are not interchangeable).
The biggest risk involved here is probably portability. If you have legacy code that has been around for a while, it probably uses the standard C/C++ file access methods, especially if it pre-dates the Win32 API. Using the Win32 API is acceptable, but you must realize that you are binding your code to the scope and lifetime of that API. You will have to do extra work to port that code to another platform. You will also have to re-work this code if, say, in the future Microsoft obsoletes the Win32 API in favor of something new. The standard C/C++ methods will always be there, constant and unchanging. If you want to help future-proof your code, stick to standard methods and functions as much as possible. At the same time, there are some things that require the Win32 API and can't be done using standard functions.
If you are working with a mix of C-style, C++-style, and Win32-style code, then I would suggest separating (as best as is reasonably possible) your OS-specific code and your portable code into separate modules with well-defined APIs. If you have to re-write your Win32 code in the future, this can make things easier.

Related

How does the C++ standard library work behind the scenes?

This question has been bothering me so much for the past couple of days. I was wondering how the standard library works, in terms of functionality. I couldn't find an answer anywhere, even by checking the source code provided by the LLVM compiler which is, for a beginner like me, a really complicated piece of code.
What I'm basically trying to understand here is how does the C++ standard library work. For example let's take the fstream header file which consist of a bunch of functions that help to write to and read from files.
How does it work? Does it use the OS specific API (since the library is cross platform), or what? And, if the standard library can do it, aren't I supposed to be able to mess with some files as well without calling the standard fstream file (which to my experience I can't do)?
I apologize if my questions are unclear since I'm not a native English speaker: feel free to modify this text so as to make it clearer.

Does it use the OS specific API (since the library is cross platform), or what?
At some point, the OS specific API is used. The fstream implementation does not necessarily call an OS function directly. It might use other classes, which call functions inherited from C, etc., but eventually the call chain will lead to an OS call. (Yes, the details are often too complicated for an intermediate programmer to follow. So, as a self-described beginner, your findings are not surprising.)
The library is cross-platform in the sense that on your end (the C++ programmer), the interface is the same regardless of platform. It is not, however, the same library on every platform. Each platform has its own library, exposing the same interface on the C++ side, but making use of different OS calls. (In fact, the same platform might have multiple standard libraries, as the library implementation is provided by your toolchain, not by the standards committee.)
And, if the standard library can do it, aren't I supposed to be able to mess with some files as well without calling the standard fstream file (which to my experience I can't do)?
Yes, you are allowed to. Apparently, you have not been able to yet, but with some practice and guidance you should be able to. Everything in the standard library can be recreated in your own code. The point of the standard library (and most libraries, for that matter) is to save you time, not to enable something that was otherwise unavailable. For example, you don't have to implement a file stream for every program you write; it's in the standard library so you can focus on more interesting aspects of your project.

A compiler is just a program which create executable file or library. You can use the compiler default libraries to gain time or write your own. The default libraries communicate with the os for file operation or memory allocation and provide a simple standard classes to allow the developper to write only one code which work on all target platforms supported by the compiler and the libraries. If you want to write your own you have to write each function for all your target os.

The standard library is cross-platform in a sense that its interface does not change between platforms but its implementation does - or in practical terms - if you only use C++ and its standard library, you can write your code the same way for Linux / Windows / MacOS / Android / Whatever and if you find a C++ compiler for one of those platforms that supports the language features you used, you will be able to compile your code for that platform without rewriting anything.
So while you can use std::vector or std::fstream or any other feature in the library independently of the platform you're writing for and expect the function definitions, type names, etc. to look the same, you cannot expect the executable which you compiled for PC with Windows 10 to run on a phone with Android. You cannot even expect the same executable to run on the same PC but with different system - that is what I mean by "the implementation is different"
There are two main reasons for this difference:
Processors with different architectures (x86-64 and ARM for example) use different instruction sets and as such the C++ source would need to be compiled to a completely different machine code to run properly
Computers with processors of the same architecture which have a different operating system have different ways of dynamically allocating memory, creating files, creating streams, writing to console, creating and scheduling threads etc. - which is part of the system functionality that you use via the standard library
If you really wanted to you could use HeapAlloc() instead of operator new() or CreateThread() instead of stdlib's std::thread but that would force you to both rewrite your program every time you wanted to compile it for something else than Windows and recompile it with the target platform's compiler (and by proxy learn its API). Standard library saves you from that trouble by abstracting away those system calls.
As for the fstream in particular, here is what it uses internally on most PCs nowadays.

Basically, fstream, iostream and printf works based on a kernel function write(). When your code call printf (we use printf as an example), it will finally call write() to let the kernel work on the IO stuff. After that, write() returns and printf returns and your code continues.
So if you really want to know how the printf works internally, you have to read the source code of the Kernel.
But you shouldn't do that for now.
For a beginner, do not try to go deeper when you haven't got a basic cognition about computer. A computer is a project, just like a building. So the right way to learn it is to learn it level by level. First, learning how to use brick and cement to build a building, this is what you should do for now. What you shouldn't do is that you are learning how to build a building and this is your first time to try to use brick, then you are interested in how to produce a brick and start to focus on brick, this is a wrong way to learn IT.
If you are learning C/C++, just learn it. Remember, learn it level by level. For now, knowing how to use printf is enough.

When opening a file, C or C++?

Currently I'm using C++ for a program I'm writing and I have an, a query regarding Old v New.
Looking online I see people using the C++ syntax to open files. I'm moving to C++ so I should keep up with the times however it occurs to me is it better using the C way or the latter? I'm taking into consideration:
Security.
Speed.
Memory usage. (Although I think I might have an idea on this one.)
Thank you.

Neither plain C nor plain C++ gives you access to security features of the OS. In Windows and Linux/UNIX there are file system related security features and you have to use them in order to set or query file access rights.
Whether you're writing in C or in C++, security remains your responsibility. Neither of the languages frees you from things like input validation and error checks.
File I/O speed should be about the same on the same platform with the same compiler, unless you use different buffering modes or different sizes of buffers. The same should be true for the amount of memory used implicitly in the file I/O functions.
If you're writing in C++, you should generally use C++ I/O functions, unless there's something you can't do with them (e.g. you can't access OS-specific functionality and therefore are forced to use plain C functions provided by your OS).

Use the functionality that the C++ standard library provides you. If, and only if, you run into problems with speed or memory, start profiling and exploring other options.

Which is more efficient:A Win32 func. or a similar CRT func. in a VC++ app.?

I started win32 programming for fun because I like complex things and I like programming(this is all Charles Petzold's and Jeffrey Richter's fault for writing such beautiful books.) and may be because I have a thing for performance code.
Now, the real question:I'll use the example of GetEnvironmentVariable()[a win32 API func.] and getenv()[a standard CRT func.].
Both of these return the value of an environment variable provided as an argument.
So using which one would be more efficient or in other words which one has a shorter call stack of which one is more direct?Think of some func. being called a million times.
I believe that either of them maps to the other.Am I right or I'm missing something here.
Summary:While programming for win32 api, if there are functions available in both the api and the c/c++ libraries that offer same functionality, which one should I use?
thanks.

For most apps, it's unlikely that use of one or other API will be the primary performance concern.
The CRT and C++ Standard Library is mapped onto Win32 APIs so using Win32 direct will be slightly more efficient. If you need to write portable C code, use the CRT though.
In C++, most often, using the Standard Library allows easier production of idiomatically-correct code, and that outweighs any marginal performance gain from going direct to Win32.
getenv is perhaps not a great example because the mapping to Win32 is trivial. Consider instead reproducing <iostream> using the Win32 APIs, and the benefit of a good library becomes clearer.

Stick with the CRT. It maps to the WinAPI, but not necessarily directly. For example, printf() might map to WriteConsole, but with buffering for performance. If GetEnvironmentVariable() doesn't need any code wrapped around it, then getenv() will be the same performance, and if it does (such as buffering), then the CRT shall provide it. And it's "right", not "write".

Both functions are likely to be similar in performance, probably ending reading the values from the registry. But more importantly, there is no reason they should ever become a critical performance issue: registry is a database, if you need using some value from registry again and again, you cache it in some variable.

How to design a C / C++ library to be usable in many client languages? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm planning to code a library that should be usable by a large number of people in on a wide spectrum of platforms. What do I have to consider to design it right? To make this questions more specific, there are four "subquestions" at the end.
Choice of language
Considering all the known requirements and details, I concluded that a library written in C or C++ was the way to go. I think the primary usage of my library will be in programs written in C, C++ and Java SE, but I can also think of reasons to use it from Java ME, PHP, .NET, Objective C, Python, Ruby, bash scrips, etc... Maybe I cannot target all of them, but if it's possible, I'll do it.
Requirements
It would be to much to describe the full purpose of my library here, but there are some aspects that might be important to this question:
The library itself will start out small, but definitely will grow to enormous complexity, so it is not an option to maintain several versions in parallel.
Most of the complexity will be hidden inside the library, though
The library will construct an object graph that is used heavily inside. Some clients of the library will only be interested in specific attributes of specific objects, while other clients must traverse the object graph in some way
Clients may change the objects, and the library must be notified thereof
The library may change the objects, and the client must be notified thereof, if it already has a handle to that object
The library must be multi-threaded, because it will maintain network connections to several other hosts
While some requests to the library may be handled synchronously, many of them will take too long and must be processed in the background, and notify the client on success (or failure)
Of course, answers are welcome no matter if they address my specific requirements, or if they answer the question in a general way that matters to a wider audience!
My assumptions, so far
So here are some of my assumptions and conclusions, which I gathered in the past months:
Internally I can use whatever I want, e.g. C++ with operator overloading, multiple inheritance, template meta programming... as long as there is a portable compiler which handles it (think of gcc / g++)
But my interface has to be a clean C interface that does not involve name mangling
Also, I think my interface should only consist of functions, with basic/primitive data types (and maybe pointers) passed as parameters and return values
If I use pointers, I think I should only use them to pass them back to the library, not to operate directly on the referenced memory
For usage in a C++ application, I might also offer an object oriented interface (Which is also prone to name mangling, so the App must either use the same compiler, or include the library in source form)
Is this also true for usage in C# ?
For usage in Java SE / Java EE, the Java native interface (JNI) applies. I have some basic knowledge about it, but I should definitely double check it.
Not all client languages handle multithreading well, so there should be a single thread talking to the client
For usage on Java ME, there is no such thing as JNI, but I might go with Nested VM
For usage in Bash scripts, there must be an executable with a command line interface
For the other client languages, I have no idea
For most client languages, it would be nice to have kind of an adapter interface written in that language. I think there are tools to automatically generate this for Java and some others
For object oriented languages, it might be possible to create an object oriented adapter which hides the fact that the interface to the library is function based - but I don't know if its worth the effort
Possible subquestions
is this possible with manageable effort, or is it just too much portability?
are there any good books / websites about this kind of design criteria?
are any of my assumptions wrong?
which open source libraries are worth studying to learn from their design / interface / souce?
meta: This question is rather long, do you see any way to split it into several smaller ones? (If you reply to this, do it as a comment, not as an answer)

Mostly correct. Straight procedural interface is the best. (which is not entirely the same as C btw(**), but close enough)
I interface DLLs a lot(*), both open source and commercial, so here are some points that I remember from daily practice, note that these are more recommended areas to research, and not cardinal truths:
Watch out for decoration and similar "minor" mangling schemes, specially if you use a MS compiler. Most notably the stdcall convention sometimes leads to decoration generation for VB's sake (decoration is stuff like #6 after the function symbol name)
Not all compilers can actually layout all kinds of structures:
so avoid overusing unions.
avoid bitpacking
and preferably pack the records for 32-bit x86. While theoretically slower, at least all compilers can access packed records afaik, and the official alignment requirements have changed over time as the architecture evolved
On Windows use stdcall. This is the default for Windows DLLs. Avoid fastcall, it is not entirely standarized (specially how small records are passed)
Some tips to make automated header translation easier:
macros are hard to autoconvert due to their untypeness. Avoid them, use functions
Define separate types for each pointer types, and don't use composite types (xtype **) in function declarations.
follow the "define before use" mantra as much as possible, this will avoid users that translate headers to rearrange them if their language in general requires defining before use, and makes it easier for one-pass parsers to translate them. Or if they need context info to auto translate.
Don't expose more than necessary. Leave handle types opague if possible. It will only cause versioning troubles later.
Do not return structured types like records/structs or arrays as returntype of functions.
always have a version check function (easier to make a distinction).
be careful with enums and boolean. Other languages might have slightly different assumptions. You can use them, but document well how they behave and how large they are. Also think ahead, and make sure that enums don't become larger if you add a few fields, break the interface. (e.g. on Delphi/pascal by default booleans are 0 or 1, and other values are undefined. There are special types for C-like booleans (byte,16-bit or 32-bit word size, though they were originally introduced for COM, not C interfacing))
I prefer stringtypes that are pointer to char + length as separate field (COM also does this). Preferably not having to rely on zero terminated. This is not just because of security (overflow) reasons, but also because it is easier/cheaper to interface them to Delphi native types that way.
Memory always create the API in a way that encourages a total separation of memory management. IOW don't assume anything about memory management. This means that all structures in your lib are allocated via your own memory manager, and if a function passes a struct to you, copy it instead of storing a pointer made with the "clients" memory management. Because you will sooner or later accidentally call free or realloc on it :-)
(implementation language, not interface), be reluctant to change the coprocessor exception mask. Some languages change this as part of conforming to their standards floating point error(exception-)handling.
Always pair a callbacks with an user configurable context. This can be used by the user to give the the callback state without defining global variables. (like e.g. an object instance)
be careful with the coprocessor status word. It might be changed by others and break your code, and if you change it, other code might stop working. The status word is generally not saved/restored as part of calling conventions. At least not in practice.
don't use C style varargs parameters. Not all languages allow variable number of parameters in an unsafe way
(*) Delphi programmer by day, a job that involves interfacing a lot of hardware and thus translating vendor SDK headers. By night Free Pascal developer, in charge of, among others, the Windows headers.
(**)
This is because what "C" means binary is still dependant on the used C compiler, specially if there is no real universal system ABI. Think of stuff like:
C adding an underscore prefix on some binary formats (a.out, Coff?)
sometimes different C compilers have different opinions on what to do with small structures passed by value. Officially they shouldn't support it at all afaik, but most do.
structure packing sometimes varies, as do details of calling conventions (like skipping
integer registers or not if a parameter is registerable in a FPU register)
===== automated header conversions ====
While I don't know SWIG that well, I know and use some delphi specific header tools( h2pas, Darth/headconv etc).
However I never use them in fully automatic mode, since more often then not the output sucks. Comments change line or are stripped, and formatting is not retained.
I usually make a small script (in Pascal, but you can use anything with decent string support) that splits a header up, and then try a tool on relatively homogeneous parts (e.g. only structures, or only defines etc).
Then I check if I like the automated conversion output, and either use it, or try to make a specific converter myself. Since it is for a subset (like only structures) it is often way easier than making a complete header converter. Of course it depends a bit what my target is. (nice, readable headers or quick and dirty). At each step I might do a few substitutions (with sed or an editor).
The most complicated scheme I did for Winapi commctrl and ActiveX/comctl headers. There I combined IDL and the C header (IDL for the interfaces, which are a bunch of unparsable macros in C, the C header for the rest), and managed to get the macros typed for about 80% (by propogating the typecasts in sendmessage macros back to the macro declaration, with reasonable (wparam,lparam,lresult) defaults)
The semi automated way has the disadvantage that the order of declarations is different (e.g. first constants, then structures then function declarations), which sometimes makes maintenance a pain. I therefore always keep the original headers/sdk to compare with.
The Jedi winapi conversion project might have more info, they translated about half of the windows headers to Delphi, and thus have enormous experience.

I don't know but if it's for Windows then you might try either a straight C-like API (similar to the WINAPI), or packaging your code as a COM component: because I'd guess that programming languages might want to be able to invoke the Windows API, and/or use COM objects.

Regarding automatic wrapper generation, consider using SWIG. For Java, it will do all the JNI work. Also, it is able to translate complex OO-C++-interfaces properly (provided you follow some basic guidelines, i.e. no nested classes, no over-use of templates, plus the ones mentioned by Marco van de Voort).

Think C, nothing else. C is one of the most popular programming languages. It is widely used on many different software platforms, and there are few computer architectures for which a C compiler does not exist. All popular high-level languages provide an interface to C. That makes your library accessible from almost all platforms in existence. Don't worry too much about providing an Object Oriented interface. Once you have the library done in C, OOP, functional or any other style interface can be created in appropriate client languages. No other systems programming language will give you C's flexibility and potability.

NestedVM I think is going to be slower than pure Java because of the array bounds checking on the int[][] that represents the MIPS virtual machine memory. It is such a good concept but might not perform well enough right now (until phone manufacturers add NestedVM support (if they do!), most stuff is going to be SLOW for now, n'est-ce pas)? Whilst it may be able to unpack JPEGs without error, speed is of no small concern! :)
Nothing else in what you've written sticks out, which isn't to say that it's right or wrong! The principles sound (mainly just listening to choice of words and language to be honest) like roughly standard best practice but I haven't thought through the details of everything you've said. As you said yourself, this really ought to be several questions. But of course doing this kind of thing is not automatically easy just because you're fixed on perhaps a slightly different architecture to the last code base you've worked on...! ;)
My thoughts:
All your comments on C interface compatibility sound sensible to me, pretty much best practice except you don't seem to properly address memory management policy - some sentences a bit ambiguous/vague/wrong-sounding. The design of the memory management will be to a large extent determined by the access patterns made in your application, rather than the functionality per se. I suiggest you study others' attempts at making portable interfaces like the standard ANSI C API, Unix API, Win32 API, Cocoa, J2SE, etc carefully.
If it was me, I'd write the library in a carefully chosen subset of the common elements of regular Java and Davlik virtual machine Java and also write my own custom parser that translates the code to C for platforms that support C, which would of course be most of them. I would suggest that if you restrict yourself to data types of various size ints, bools, Strings, Dictionaries and Arrays and make careful use of them that will help in cross-platform issues without affecting performance much most of the time.

your assumptions seem ok, but i see trouble ahead, much of which you have already spotted in your assumptions.
As you said, you can't really export c++ classes and methods, you will need to provide a function based c interface. What ever facade you build around that, it will remain a function based interface at heart.
The basic problem i see with that is that people choose a specific language and its runtime because their way of thinking (functional or object oriented) or the problem they address (web programming, database,...) corresponds to that language in some way or other.
A library implemented in c will probably never feel like the libraries they are used to, unless they program in c themselves.
Personally, I would always prefer a library that "feels like python" when I use python, and one that feels like java when I do Java EE, even though I know c and c++.
So your effort might be of little actual use (other than your gain in experience), because people will probably want to stick with their mindset, and rather re-implement the functionality than use a library that does the job, but does not fit.
I also fear the desired portability will seriously hamper development. Just think of the infinite build settings needed, and tests for that. I have worked on a project that tried to maintain compatibility for 5 operating systems (all posix-like, but still) and about 10 compilers, the builds were a nightmare to test and maintain.

Give it an XML interface, whether passed as a parameter and return value or as files through a command-line invocation. This may not seem as direct as a normal function interface, but is the most practical way to access an executable from, e.g., Java.

C library vs WinApi

Many of the standard c library (fwrite, memset, malloc) functions have direct equivalents in the Windows Api (WriteFile, FillMemory/ ZeroMemory, GlobalAlloc).
Apart from portability issues, what should be used, the CLIB or Windows API functions?
Will the C functions call the Windows Api functions or is it the other way around?

There's nothing magical about the C library. It's just a standardized API for accessing common services from the OS. That means it's implemented on top of the OS, using the API's provided by the OS.
Use whichever makes sense in your situation. The C library is portable, Win32 isn't. On the other hand, Win32 is often more flexible, and exposes more functionality.

The functions aren't really equivalent with the exception of some simple things like ZeroMemory.
GlobalAlloc for example gives you memory, but it was used for shared memory transfer under win16 as well. Parts of this functionality still exist.
WriteFile will not only write to files but to (among others) named pipes as well. Something fwrite or write can't directly do.
I'd say use c library functions if possible and use the windows functions only if you need the extra functionality or if you get a performance improvement.
This will make porting to other platforms easier later on.

It's probably more information than you're looking for (and maybe not exactly what you asked) but Catch22.net has an article entitled "Techniques for reducing Executable size" that may help point out the differences in Win32 api calls and c runtime calls.

Will the C functions call the winapi functions or is it the other way around?
The C functions (which are implemented in a user-mode library) call the WINAPI functions (which are implemented in the O/S kernel).

If you're going to port your application across multiple platforms I would say that you should create your own set of wrappers, even if you use the standard C functions. That will give you the best flexibility when switching platforms as your code will be insulated from the underlying system in a nicer way.
Of course that means if you're only going to program for Windows platforms then just use the Windows functions, and don't use the standard C library functions of the same type.
In the end you just need to stay consistent with your choice and you'll be fine.

C functions will call the system API, the Standard Runtime C Library (or CRT) is an API used to standardize among systems.
Internally, each system designs its own API directly using system calls or drivers. If you have a commercial version of Visual C++, it used to provide the CRT source code, this is interesting reading.

A few additional points on some examples:
FillMemory, ZeroMemory
Neither these nor the C functions are system calls, so either one might be implemented on top of the other, or they could even have different implementations, coming from a common source or not.
GlobalAlloc
Since malloc() is built on top of operating system primitives exposed by its API, it would be interesting to know if malloc() and direct usage of such allocators coexist happily without problems. I might imagine of some reasons why malloc might silently assume that the heap it accesses is contiguous, even if I would call that a design bug, even if it were documented, unless the additional cost for the safety were non insignificant.

Well, I'm currently trying to avoid including fstream, sstream, iostream and many C standard library files and using winAPIs instead because including any of these libraries increases the size of the output executable from about 10 KB to about 500 KB.
But sometimes it's better to use the C standard library to make your code cross-platform.
So I think it depends on your goal.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js