Weird output after sorting - C++

I am getting weird output after sorting.
If I read the input using scanf, that line causes the problem: the output comes out in some weird arrangement. (I have commented the line.)
If I use cin, the output is fine. The problem also does not appear in online compilers, but the same thing happens on different computers.
For example, if I input
5
23 44 32 2 233
the output is
32 23 233 2 44
Code:
#include <iostream>
#include <cstdio>
#include <cstring>
#include <cmath>
#include <algorithm>
#include <iomanip>
using namespace std;

int main()
{
    unsigned long long int n = 0, i = 0;
    // cin >> n;
    scanf("%llu", &n);
    unsigned long long int arr[n];
    for (i = 0; i < n; i++)
    {
        // cin >> arr[i];          // if I use this there is no error,
        scanf("%llu", &arr[i]);    // but this line causes the problem
    }
    sort(arr, arr + n);
    for (i = 0; i < n; i++)
    {
        // cout << arr[i] << " ";
        printf("%llu ", arr[i]);
    }
    return 0;
}

If using cin helps, then %llu is probably the wrong format specifier for the given pointer. Check your compiler's documentation to see what long long means for it, and check your library's printf/scanf documentation.

There are all sorts of possible explanations. The most likely is a compiler (or library or runtime) in which scanf() doesn't properly support the %llu format with unsigned long long types.
long long types were not formally part of C before the 1999 standard, and not part of C++ before (from memory) the 2011 standard. A consequence is that, depending on age of your compiler, support varies from non-existent to partial (what you are seeing) to complete.
Practically, because of how C++ stream functions (operator>>() etc) are organised, it is arguably easier to change the C++ streaming to support a new type (like long long unsigned) than it is to change C I/O. C++ streams, by design, are more modular (operator>>() is a function overloaded for each type, so it is easier to add support for new types without breaking existing code that handles existing types). C I/O functions (like scanf()) are specified in a way that allows more monolithic implementations - which means that, to add support of new types and format specifiers, it is necessary to change/debug/verify/validate more existing code. That means it can reasonably take more effort - and therefore time - to change C I/O than C++ I/O. Naturally, YMMV in this argument - depending on skill of the library developer. But, in practice, it is a fair bet that an implementation of standard C++ streams is more modular - and therefore easier to maintain - by various measures than the C I/O functions.
Also, although you didn't ask: the construct unsigned long long int arr[n], where n is a runtime variable, is not valid standard C++ (variable-length arrays are a C feature that some compilers accept as an extension). You are better off using a standard container, like std::vector.
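For reference, a minimal sketch of the same program built around std::vector and iostreams (this is an illustrative rewrite, not the poster's code; it also sidesteps the %llu question entirely):
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::size_t n = 0;
    std::cin >> n;

    std::vector<unsigned long long> arr(n);   // sized at run time, no VLA needed
    for (auto &x : arr)
        std::cin >> x;                        // operator>> picks the right overload, no format specifier

    std::sort(arr.begin(), arr.end());

    for (auto x : arr)
        std::cout << x << ' ';
    std::cout << '\n';
    return 0;
}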

What probably happens is that %llu is wrong for your compiler.
For example, if you compile your program with g++ in MinGW using the -Wall flag in order to get all the warnings, you get this:
a.cpp: In function 'int main()':
a.cpp:16:20: warning: unknown conversion type character 'l' in format [-Wformat=]
scanf("%llu",&n);
So the extra l gets ignored and the input is scanned as if the format were %lu.
If you are compiling on a 32-bit system, a long integer will most probably be 32 bits long, i.e. 4 bytes.
So the number scanned will occupy 4 bytes in memory, and since arr[n] is a long long array, i.e. each element is 64 bits (8 bytes), only the first 4 bytes of each element will be written by scanf.
Assuming you are on a little-endian system, these 4 bytes will be the least significant part of the long long element. The most significant part will not be written and will probably contain garbage.
The sort algorithm, though, will use the full 8 bytes of each element for the sorting and so will sort the array using the garbage in the most significant part of each element.
Since you are using Code::Blocks on a 32-bit Windows system, try replacing all occurrences of "%llu" with "%I64u".
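If that is the cause, here is a minimal sketch of the suggested change, assuming a MinGW build linked against the old MSVCRT runtime (where %I64u is the 64-bit unsigned conversion):
#include <cstdio>

int main()
{
    // "%I64u" is the MSVCRT spelling of a 64-bit unsigned conversion.
    // It is non-standard, so portable code would rather use cin/cout or
    // the PRIu64 / SCNu64 macros from <cinttypes>.
    unsigned long long n = 0;
    std::scanf("%I64u", &n);
    std::printf("%I64u\n", n);
    return 0;
}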

Related

How do I make the math come out correctly using `size_t` numbers? Or what’s a better way than `max_size()` to get the range of a `vector`/`string`?

I’m trying to get an idea of the exact number of bits (if possible) or bytes, and the range of numbers, that the various data types can have (if manipulated to do so) on the compiler I’m using (onlinegdb). I tried to make my own, but I’m having an issue doing math with size_t numbers in the compiler. So far the includes I’ve added are:
#include <iostream>
#include <string>
#include <climits>
#include <cfloat>
#include <cwchar>
#include <bitset>
#include <vector>
using namespace std;
Getting an output like:
Char: 1 bytes
Range: -128 - 127
Unsigned Char: 1 bytes
Range: 0 - 255
et cetera: wchar_t, string, bitset<1>, bool, short, int, wint_t, long, long long, size_t, float, double, long double, mbstate_t, as well as their unsigned versions (when they exist) and vector versions of all of them. It worked until I got to calculating the range of the string and vectors. In order to calculate the number range of a string or vector, I figured I could simply multiply the range of the type by the number of elements my string/vector could hold (found using its max_size() function). However, the math doesn’t work. (Edit §1)
The massive number produced by max_size() of a vector<char> (18446744073709551615), multiplied by CHAR_MIN (-128), results in 128. It looks like it’s just CHAR_MIN multiplied by -1… so I investigated. (Edit §2.1)
I added #include <typeinfo> to check if the data types were mismatched, and they were: size_t from the function and int from the macro constants. Converting the size_t number to int just gets -1. So I guess that’s the problem. (Edit §2.2)
So I tried converting the int of CHAR_MIN to size_t, but that got me 18446744073709551488, which multiplied by the function output still gets 128… A calculator shows me the correct answer is around -2.3611832414*10^(21). I don’t know for sure what I’m doing wrong (rollover from hitting the maximum?), but if there’s some header I can include that gives me a function that’ll do it for me, that would help too. (Edit §2.3)
Edit:
§1:
§1.1: Since user 463035818_is_not_a_number seems to think this wasn’t included:
This is where I defined “range.” In the second to last sentence of the paragraph following the output. The text there is identical to how it was in the first published draft.
§1.2: Since user Nathan Pierson assumed I’m saving the mathematical result to a variable:
“Output” refers to using an ostream like cout to display something on the console. I didn’t mention anything about storing the number because there is no need to save it to any variable. I’m not sure why a size limitation would be implemented for an output, but if it somehow is then it just circles back to the question I asked in the title.
§2: Since user 463035818_is_not_a_number said it wasn’t clear enough:
(I’m assuming you don’t need the cout part of the code.)
§2.1:
The word “multiplied” means using the * operator between 2 numbers:
vectorCharVariable.max_size() * CHAR_MIN
§2.2:
This is how you use #include <typeinfo> (a standard C++ header I assumed everyone answering would be familiar with) to get the name of the type:
With #include <vector>’s data type function:
typeid(vectorCharVariable.max_size()).name()
With the #include <climits> macro constant:
typeid(CHAR_MIN).name()
§2.3:
This is how you use type casting (a fundamental feature of C++ requiring no included header that I assumed everyone answering would be familiar with) to change the type of a variable:
Using functional syntax on the vector function’s output to make it int instead of size_t, resulting in -1:
int(vectorCharVariable.max_size())
Using functional syntax on the macro constant to make it size_t instead of int, and multiplying it by the max size of the vector, resulting in 128:
c.max_size() * size_t(CHAR_MIN)
Furthermore, the math is also wrong when interpreting the maximum:
c.max_size() * size_t(CHAR_MAX)
The expected result of 18446744073709551615 * 127 is about 2.3427365e+21, not 18446744073709551489. Clearly the issue is the compiler reading the size_t number (which is unsigned 64-bit) as a signed 64-bit two’s-complement value in the binary-to-decimal step, which makes c.max_size() (64 ones in binary) equal to -1, and -1 * 127 is -127 (which is 18446744073709551489 in unsigned 64-bit). Thus the solution to my question would be how to make it stop interpreting the size_t numbers as signed in mathematical calculations.
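For what it is worth, here is a minimal sketch of the arithmetic involved; the wider-type workaround assumes a compiler that provides the non-standard unsigned __int128 (GCC or Clang on a 64-bit target):
#include <climits>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<char> c;
    std::size_t n = c.max_size();   // reported as 2^64 - 1 in the question

    // With n == 2^64 - 1, unsigned arithmetic wraps modulo 2^64, so this prints
    // (2^64 - 1) * 127 mod 2^64 = 2^64 - 127 = 18446744073709551489,
    // not the mathematically exact product.
    std::cout << n * static_cast<std::size_t>(CHAR_MAX) << '\n';

    // Workaround: do the multiplication in a wider type.
    unsigned __int128 wide = static_cast<unsigned __int128>(n) * CHAR_MAX;
    std::cout << static_cast<double>(wide) << '\n';   // about 2.34e+21, printed approximately
    return 0;
}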
§3: Since it wasn’t clear to user Daniel Langr:
Between the range of a data type and the number of combinations (“distinct values” as you call them) of a data type, I am trying to get the former (the range). The range is limited to below the maximum number of combinations by the separation of the variables. It is impossible to store the number 65536 in an array of 2 chars; one can only store at most the number 127 twice (127*2=254).
The numbers stored are at most 127, n times over, where n is the maximum number of char variables it can hold. Maybe you’re thinking about how one could use a function or class to reinterpret the data, but the result of a reinterpretation is not the number stored in the vector. If that’s the case, you’re assuming extra steps that were not listed because they don’t exist.

effect of using sprintf / printf using %ld format string instead of %d with int data type

We have some legacy code in which, at one point in time, long data types were refactored to int data types. During this refactor a number of printf / sprintf format statements were left incorrect as %ld instead of being changed to %d. For example:
int iExample = 32;
char buf[200];
sprintf(buf, "Example: %ld", iExample);
This code is compiled on both GCC and VS2012 compilers. We use Coverity for static code analysis, and code like the example was flagged as a 'Printf arg type mismatch' with a Medium level of severity, CWE-686: Function Call With Incorrect Argument Type. I can see this would definitely be a problem had the format string been a signed one (%d) with an unsigned int type, or something along those lines.
I am aware that the '_s' versions of sprintf etc are more secure, and that the above code can also be refactored to use std::stringstream etc. It is legacy code however...
I agree that the above code really should be using %d at the very least or refactored to use something like std::stringstream instead.
Out of curiosity is there any situation where the above code will generate incorrect results? As this legacy code has been around for quite some time and appears to be working fine.
UPDATED
Removed the usage of the word STL and just changed it to be std::stringstream.
As far as the standard is concerned, the behavior is undefined, meaning that the standard says exactly nothing about what will happen.
In practice, if int and long have the same size and representation, it will very likely "work", i.e., behave as if the correct format string has been used. (It's common for both int and long to be 32 bits on 32-bit systems).
If long is wider than int, it could still work "correctly". For example, the calling convention might be such that both types are passed in the same registers, or that both are pushed onto the stack as machine "words" of the same size.
Or it could fail in arbitrarily bad ways. If int is 32 bits and long is 64 bits, the code in printf that tries to read a long object might get a 64-bit object consisting of the 32 bits of the actual int that was passed combined with 32 bits of garbage. Or the extra 32 bits might consistently be zero, but with the 32 significant bits at the wrong end of the 64-bit object. It's also conceivable that fetching 64 bits when only 32 were passed could cause problems with other arguments; you might get the correct value for iExample, but following arguments might be fetched from the wrong stack offset.
My advice: The code should be fixed to use the correct format strings (and you have the tools to detect the problematic calls), but also do some testing (on all the C implementations you care about) to see whether it causes any visible symptoms in practice. The results of the testing should be used only to determine the priority of fixing the problems, not to decide whether to fix them or not. If the code visibly fails now, you should fix it now. If it doesn't, you can get away with waiting until later (presumably you have other things to work on).
It's undefined and depends on the implementation. On implementations where int and long have the same size, it will likely work as expected. But just try it on any system with 32-bit int and 64-bit long, especially if your integer is not the last format argument, and you're likely to get problems where printf reads 64 bits where only 32 were provided, the rest quite possibly garbage, and possibly, depending on alignment, the following arguments also cannot get accessed correctly.
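As a small illustration of the two safe fixes (match the specifier to the type, or the type to the specifier), sketched here with snprintf simply as the bounded variant:
#include <cstdio>

int main()
{
    int iExample = 32;
    char buf[200];

    // Option 1: use the specifier that matches int.
    std::snprintf(buf, sizeof buf, "Example: %d", iExample);

    // Option 2: keep %ld but pass an actual long via an explicit cast.
    std::snprintf(buf, sizeof buf, "Example: %ld", static_cast<long>(iExample));

    std::puts(buf);
    return 0;
}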

fixed length data types in C/C++

I've heard that the size of data types such as int may vary across platforms.
My first question is: can someone give an example of what goes wrong when a program
assumes an int is 4 bytes, but on a different platform it is, say, 2 bytes?
Another question I had is related. I know people solve this issue with typedefs,
so you have types like u8, u16, u32 - which are guaranteed to be 8 bits, 16 bits and 32 bits regardless of the platform - and my question is: how is this usually achieved? (I am not referring to the types from the stdint library - I am curious how one can manually enforce that some type is always, say, 32 bits regardless of the platform.)
I know people solve this issue with typedefs, so you have types like u8, u16, u32 - which are guaranteed to be 8 bits, 16 bits and 32 bits regardless of the platform
There are some platforms which have no types of a certain size (for example TI's 28xxx, where the size of char is 16 bits). In such cases, it is not possible to have an 8-bit type (unless you really want one, but that may introduce a performance hit).
how is this achieved usually?
Usually with typedefs. C99 (and C++11) provide these typedefs in a header. So, just use them.
can someone give an example of what goes wrong when a program assumes an int is 4 bytes, but on a different platform it is, say, 2 bytes?
The best example is communication between systems with different type sizes. When sending an array of ints from one platform to another where sizeof(int) differs, one has to take extreme care.
The same goes for saving an array of ints in a binary file on a 32-bit platform and reinterpreting it on a 64-bit platform.
In earlier iterations of the C standard, you generally made your own typedef statements to ensure you got a (for example) 16-bit type, based on #define strings passed to the compiler, for example:
gcc -DINT16_IS_LONG ...
Nowadays (C99 and above), there are specific types such as uint16_t, the exactly 16-bit wide unsigned integer.
Provided you include stdint.h, you get exact-width types, at-least-that-width types, fastest types with a given minimum width, and so on, as documented in C99 7.18 Integer types <stdint.h>. If an implementation has compatible types, it is required to provide them.
Also very useful is inttypes.h which adds some other neat features for format conversion of these new types (printf and scanf format strings).
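For instance, a minimal sketch of those format-conversion macros in use (the value read here is arbitrary):
#include <cinttypes>   // PRId32, SCNd32, ... (the C++ form of <inttypes.h>)
#include <cstdio>

int main()
{
    std::int32_t value = 0;

    // The PRI*/SCN* macros expand to the correct printf/scanf specifier
    // for the fixed-width type on the current platform.
    std::scanf("%" SCNd32, &value);
    std::printf("read: %" PRId32 "\n", value);
    return 0;
}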
For the first question: Integer Overflow.
For the second question: for example, to typedef an unsigned 32-bit integer on a platform where int is 4 bytes, use:
typedef unsigned int u32;
On a platform where int is 2 bytes while long is 4 bytes:
typedef unsigned long u32;
In this way, you only need to modify one header file to make the types cross-platform.
If there are platform-specific macros available, this can be achieved without manual modification:
#if defined(PLAT1)
typedef unsigned int u32;
#elif defined(PLAT2)
typedef unsigned long u32;
#endif
If C99 stdint.h is supported, it's preferred.
First of all: Never write programs that rely on the width of types like short, int, unsigned int,....
Basically: "never rely on the width, if it isn't guaranteed by the standard".
If you want to be truly platform independent and store, e.g., the value 33000 as a signed integer, you can't just assume that an int will hold it. An int has at least the range -32767 to 32767 or -32768 to 32767 (depending on ones'/two's complement). That's just not enough, even though int usually is 32 bits and therefore capable of storing 33000. For this value you definitely need a >16-bit type, so you simply choose int32_t or int64_t. If this type doesn't exist, the compiler will tell you with an error, so it won't be a silent mistake.
Second: C++11 provides a standard header for fixed-width integer types. None of these is guaranteed to exist on your platform, but when they exist, they are guaranteed to be of the exact width. See this article on cppreference.com for a reference. The types are named in the format int[n]_t and uint[n]_t, where n is 8, 16, 32 or 64. You'll need to include the header <cstdint>. The C header is, of course, <stdint.h>.
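A minimal sketch of that advice applied to the 33000 example:
#include <cstdint>
#include <iostream>

int main()
{
    // int is only guaranteed to hold -32767..32767, so 33000 is not portable in an int.
    // int32_t is exactly 32 bits wherever it exists; where it doesn't, this line fails to compile.
    std::int32_t value = 33000;
    std::cout << value << '\n';
    return 0;
}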
Usually, the issue shows up when you max out the range of a type or when you're serializing. A less common scenario is when someone makes an explicit size assumption.
In the first scenario:
int x = 32000;
int y = 32000;
int z = x+y; // can cause overflow for 2 bytes, but not 4
In the second scenario,
struct header {
    int magic;
    int w;
    int h;
};
then one goes to fwrite:
header h;
// fill in h
fwrite(&h, sizeof(h), 1, fp);
// this is all fine and good until one freads from an architecture with a different int size
In the third scenario:
int* x = new int[100];
char* buff = (char*)x;
// now try to change the 3rd element of x via buff assuming int size of 2
*((int*)(buff+2*2)) = 100;
// (of course, it's easy to fix this with sizeof(int))
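One common way to harden the second (serialization) scenario above is to use fixed-width types; a sketch, with made-up field values for illustration:
#include <cstdint>
#include <cstdio>

// Every field has the same size on every platform, so the on-disk layout no
// longer depends on sizeof(int). (Endianness and padding still matter, but
// the size assumption is gone.)
struct header {
    std::int32_t magic;
    std::int32_t w;
    std::int32_t h;
};

int main()
{
    header h{0x12344321, 640, 480};   // hypothetical values
    std::FILE *fp = std::fopen("header.bin", "wb");
    if (fp) {
        std::fwrite(&h, sizeof h, 1, fp);
        std::fclose(fp);
    }
    return 0;
}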
If you're using a relatively new compiler, I would use uint8_t, int8_t, etc. in order to be sure of the type sizes.
In older compilers, typedef is usually defined on a per platform basis. For example, one may do:
#ifdef _WIN32
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
// and so on...
#endif
In this way, there would be a header per platform that defines specifics of that platform.
I am curious manually, how can one enforce that some type is always say 32 bits regardless of the platform??
If you want your (modern) C++ program's compilation to fail if a given type is not the width you expect, add a static_assert somewhere. I'd add this around where the assumptions about the type's width are being made.
static_assert(sizeof(int) == 4, "Expected int to be four chars wide but it was not.");
char is 8 bits on most commonly used platforms, but not all platforms work this way.
Well, first example - something like this:
int a = 45000; // both a and b
int b = 40000; // do not fit in 2 bytes
int c = a + b; // overflows on 16 bits, but not on 32 bits
If you look into the cstdint header, you will find how all the fixed-size types (int8_t, uint8_t, etc.) are defined - and the only thing that differs between architectures is this header file. So, on one architecture int16_t could be:
typedef int int16_t;
and on another:
typedef short int16_t;
Also, there are other types which may be useful, like int_least16_t.
If a type is smaller than you think, then it may not be able to store a value you need to store in it.
To create fixed-size types, you read the documentation for the platforms to be supported and then define typedefs based on #ifdef for the specific platforms.
can someone give an example of what goes wrong when a program assumes an int is 4 bytes, but on a different platform it is, say, 2 bytes?
Say you've designed your program to read 100,000 inputs and you're counting them using an unsigned int, assuming a size of 32 bits (a 32-bit unsigned int can count up to 4,294,967,295). If you compile the code on a platform (or compiler) with 16-bit integers (a 16-bit unsigned int can count only up to 65,535), the counter will wrap around past 65,535 and report a wrong count.
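A small sketch of that wrap-around, made explicit with uint16_t:
#include <cstdint>
#include <iostream>

int main()
{
    // Simulate the 16-bit counter explicitly.
    std::uint16_t counter = 0;
    for (int i = 0; i < 100000; ++i)
        ++counter;                    // wraps around every 65536 increments

    std::cout << counter << '\n';     // prints 34464 (100000 mod 65536), not 100000
    return 0;
}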
Compilers are responsible for obeying the standard. When you include <cstdint> or <stdint.h>, they shall provide types of the standard-mandated sizes.
Compilers know what platform they are compiling the code for, so they can use internal macros or magic to build the suitable types. For example, a compiler for a 32-bit machine might define a __32BIT__ macro, and its stdint header could contain lines like these:
#ifdef __32BIT__
typedef __int32_internal__ int32_t;
typedef __int64_internal__ int64_t;
...
#endif
and you can use it.
Bit flags are the trivial example: 0x10000 will cause you problems; you can't mask with it or check whether a bit is set in that 17th position if everything is being truncated or smashed to fit into 16 bits.
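A tiny sketch of that bit-flag truncation, with the widths made explicit via fixed-width types:
#include <cstdint>
#include <iostream>

int main()
{
    const std::uint32_t FLAG_17 = 0x10000;      // the 17th bit

    std::uint16_t flags16 = 0;
    flags16 |= FLAG_17;                         // truncated to 16 bits: the flag is silently lost
    std::cout << (flags16 & FLAG_17) << '\n';   // prints 0, the test can never succeed

    std::uint32_t flags32 = 0;
    flags32 |= FLAG_17;
    std::cout << (flags32 & FLAG_17) << '\n';   // prints 65536, the flag survives
    return 0;
}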

Maximum amount of data that can be sent using MPI::Send

With the syntax for MPI::Isend as
MPI::Request MPI::Comm::Isend(const void *buf, int count,
                              const MPI::Datatype& datatype,
                              int dest, int tag) const;
is the amount of data sent limited by
std::numeric_limits<int>::max()
Many other MPI functions have int parameters. Is this a limitation of MPI?
MPI-2.2 defines data length parameters as int. This could be and usually is a problem on most 64-bit Unix systems since int is still 32-bit. Such systems are referred to as LP64, which means that long and pointers are 64-bit long, while int is 32-bit in length. In contrast, Windows x64 is an LLP64 system, which means that both int and long are 32-bit long while long long and pointers are 64-bit long. Linux for 64-bit x86 CPUs is an example of such a Unix-like system which is LP64.
Given all of the above, MPI_Send in MPI-2.2 implementations has a message size limit of 2^31-1 elements. One can overcome the limit by constructing a user-defined type (e.g. a contiguous type), which reduces the number of data elements. For example, if you register a contiguous type of 2^10 elements of some basic MPI type and then use MPI_Send to send 2^30 elements of this new type, it results in a message of 2^40 elements of the basic type. Some MPI implementations may still fail in such cases if they use int to handle element counts internally. It also breaks MPI_Get_elements and MPI_Get_count, as their output count argument is of type int.
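A rough sketch of that contiguous-type workaround using the C bindings; the chunk size, and the commented-out buffer, count, destination and tag are placeholders for illustration:
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // Pack 1024 doubles into one user-defined element so the int count
    // passed to MPI_Send stays far below INT_MAX.
    MPI_Datatype chunk;
    MPI_Type_contiguous(1024, MPI_DOUBLE, &chunk);
    MPI_Type_commit(&chunk);

    // double *buf = ...;  // would hold 1024 * (size_t)count doubles
    // MPI_Send(buf, count, chunk, dest, tag, MPI_COMM_WORLD);

    MPI_Type_free(&chunk);
    MPI_Finalize();
    return 0;
}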
MPI-3.0 addresses some of these issues. For example, it provides the MPI_Get_elements_x and MPI_Get_count_x operations, which use the MPI_Count typedef for their count argument. MPI_Count is defined so as to be able to hold pointer values, which makes it 64 bits long on most 64-bit systems. There are other extended calls (all ending in _x) that take MPI_Count instead of int. The old MPI_Get_elements / MPI_Get_count operations are retained, but now they return MPI_UNDEFINED if the count is larger than what the int output argument can hold (this clarification is not present in the MPI-2.2 standard, and using very large counts is undefined behaviour there).
As pyCthon has already noted, the C++ bindings are deprecated in MPI-2.2 and were removed from MPI-3.0 as no longer supported by the MPI Forum. You should either use the C bindings or resort to 3rd party C++ bindings, e.g. Boost.MPI.
I haven't done MPI, however, int is the usual limiting size of an array, and I would suspect that is where the limitation comes from.
In practice, this is a fairly high limit. Do you have a need to send more than 4 GB of data? (In a single Isend)
For more information, please see Is there a max array length limit in C++?
Do note that the link makes reference to size_t rather than int (which, for all intents and purposes, allows almost unlimited data, at least as of 2012); however, in the past 'int' was the usual type for such counts, and while size_t should be used, in practice a lot of code still uses 'int'.
The maximum size of an MPI_Send will be limited by the maximum amount of memory you can allocate, and most MPI implementations support sizeof(size_t).
This issue and a number of workarounds (with code) are discussed on https://github.com/jeffhammond/BigMPI. In particular, this project demonstrates how to send more than INT_MAX elements via user-defined datatypes.

c++: working with bytes

My problem is that I need to load a binary file and work with single bits from the file. After that I need to save it out as bytes, of course.
My main question is what data type to work in - char or long int? Can I somehow work with chars?
Unless performance is mission-critical here, use whatever makes your code easiest to understand and maintain.
Before beginning to code anything, make sure you understand endianness, C++ type sizes, and how strange they might be.
unsigned char is the only type that has a fixed size (the natural byte of the machine, normally 8 bits). So if you design for portability, that is a safe bet. But it isn't hard to just use an unsigned int or even a long long to speed up the process and use sizeof to find out how many bits you are getting in each read, although the code gets more complex that way.
You should know that for true portability none of the built-in types of C++ has a fixed size. An unsigned char might have 9 bits, and the int might be as small as the range 0 to 65535, as noted in this and this answer.
Another alternative, as user1200129 suggests, is to use the Boost integer library to reduce all these uncertainties. This is if you have Boost available on your platform. Although, if going for external libraries, there are many serialization libraries to choose from.
But first and foremost, before you even start optimizing, make something simple that works. Then you can start profiling when you run into timing issues.
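A minimal sketch of the unsigned char approach; the file names and the bit-numbering convention are assumptions made for illustration:
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    // Read the whole file into a byte buffer.
    std::ifstream in("input.bin", std::ios::binary);
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());

    // Test an individual bit (bit 0 = least significant bit of the first byte).
    std::size_t bit_index = 42;
    if (bit_index / 8 < bytes.size()) {
        bool set = (bytes[bit_index / 8] >> (bit_index % 8)) & 1u;
        (void)set;   // use the bit as needed
    }

    // Writing back out as bytes is just the reverse.
    std::ofstream out("output.bin", std::ios::binary);
    out.write(reinterpret_cast<const char *>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return 0;
}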
It really just depends on what you want to do, but I would say that in general the best speed will come from sticking with the integer size your program is compiled for. So if you have a 32-bit program, choose 32-bit integers, and if you have a 64-bit program, choose 64-bit integers.
This could be different if there are some bytes in your file, or if there are integers. Without knowing the exact structure of your file, it's difficult to determine what the optimal choice is.
Your sentences are not really correct English, but as far as I can interpret the question, you had better use the unsigned char type (which is a byte) to be able to modify each byte separately.
Edit: changed according to comment.
If you are dealing with bytes then the best way to do this is to use a size specific type.
#include <algorithm>
#include <iterator>
#include <cinttypes>
#include <vector>
#include <fstream>

int main()
{
    std::vector<int8_t> file_data;
    std::ifstream file("file_name", std::ios::binary);
    file >> std::noskipws;   // don't skip whitespace bytes when reading via >>

    // read
    std::copy(std::istream_iterator<int8_t>(file),
              std::istream_iterator<int8_t>(),
              std::back_inserter(file_data));

    // write
    std::ofstream out("outfile", std::ios::binary);
    std::copy(file_data.begin(), file_data.end(),
              std::ostream_iterator<int8_t>(out));
}
EDIT fixed bug
If you need to enforce how many bits are in an integer type, you need to be using the <stdint.h> header. It is present in both C and C++. It defines type such as uint8_t (8-bit unsigned integer), which are guaranteed to resolve to the proper type on the platform. It also tells other programmers who read your code that the number of bits is important.
If you're worrying about performance, you might want to use the larger-than-8-bit types, such as uint32_t. However, when reading and writing files, you will need to pay attention to the endianness of your system. Notably, if you have a little-endian system (e.g. x86, most ARM), then the 32-bit value 0x12345678 will be written to the file as the four bytes 0x78 0x56 0x34 0x12, while if you have a big-endian system (e.g. SPARC, PowerPC, Cell, some ARM, and the Internet), it will be written as 0x12 0x34 0x56 0x78 (the same goes for reading). You can, of course, work with 8-bit types and avoid this issue entirely.
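One common way to sidestep the endianness issue when using wider types is to serialize multi-byte values one byte at a time in a fixed order; a minimal sketch (the little-endian choice and the file name are arbitrary):
#include <cstdint>
#include <cstdio>

// Write a 32-bit value byte by byte in a fixed (little-endian) order,
// so the file layout no longer depends on the host's byte order.
void write_u32_le(std::FILE *fp, std::uint32_t v)
{
    unsigned char bytes[4] = {
        static_cast<unsigned char>(v & 0xFF),
        static_cast<unsigned char>((v >> 8) & 0xFF),
        static_cast<unsigned char>((v >> 16) & 0xFF),
        static_cast<unsigned char>((v >> 24) & 0xFF),
    };
    std::fwrite(bytes, 1, sizeof bytes, fp);
}

int main()
{
    std::FILE *fp = std::fopen("value.bin", "wb");
    if (fp) {
        write_u32_le(fp, 0x12345678);   // always stored as 78 56 34 12
        std::fclose(fp);
    }
    return 0;
}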