I looked everywhere and can't find an answer to this specific question :(
I have a string date, which contains the date with all the special characters stripped away (i.e. yyyymmddhhmm, such as 201212031204).
I'm trying to convert this string into an int so I can sort the dates later. I tried atoi, which did not work because the value is too high for the function. I tried streams, but they always return -858993460, and I suspect this is because the string is too large too. I tried atol and atoll and they still don't give the right answer.
I'd rather not use Boost since this is for homework; I don't think I'd be allowed.
Am I out of options to convert a large string to an int?
Thank you!
What I'd like to be able to do:
int dateToInt(string date)
{
    date = date.substr(6,4) + date.substr(3,2) + date.substr(0,2) + date.substr(11,2) + date.substr(14,2);
    int d;
    d = atoi(date.c_str());
    return d;
}
You get negative numbers because 201212031204 is too large to fit in an int. Consider using long long.
By the way, you could also sort the strings directly: in the fixed-width yyyymmddhhmm layout, lexicographic order matches chronological order.
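For instance, a minimal sketch assuming C++11's std::stoll is available (the function name dateToKey is mine):

#include <iostream>
#include <string>

// Parse the stripped yyyymmddhhmm digits into a sortable 64-bit key.
long long dateToKey(const std::string& date)
{
    return std::stoll(date); // throws std::invalid_argument / std::out_of_range on bad input
}

int main()
{
    std::cout << dateToKey("201212031204") << '\n'; // prints 201212031204
}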
You're on the right track that the value is too large, but it's not just too large for those functions; it's too large for an int in general. An int is commonly 32 bits, giving a maximum value of 2147483647 (4294967295 if unsigned). A long long is guaranteed to be large enough for the numbers you're using. If you happen to be on a 64-bit system, a long may be too (it is 64 bits on most 64-bit Unix systems, though not on 64-bit Windows).
Now, if you use one of these larger integer types, a stream should convert properly. Or, if you want to use a function to do it, have a look at atoll for a long long or atol for a long. (Although for better error checking, you should really consider strtoll or strtol.)
Completely alternatively, you could also use a time_t. They're integer types under the hood, so you can compare and sort them. And there are some nice functions for them in <ctime> (have a look at http://www.cplusplus.com/reference/ctime/).
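For instance, a sketch of strtoll with error checking (the variable names are mine):

#include <cerrno>
#include <cstdlib>
#include <iostream>

int main()
{
    const char* s = "201212031204";
    char* end = nullptr;
    errno = 0;
    long long value = std::strtoll(s, &end, 10); // base 10
    if (end == s || *end != '\0' || errno == ERANGE)
        std::cerr << "conversion failed\n";
    else
        std::cout << value << '\n'; // prints 201212031204
}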
typedef long long S64;

S64 dateToInt(char * s) {
    S64 retval = 0;
    while (*s) {
        retval = retval * 10 + (*s - '0'); // shift in one decimal digit at a time
        ++s;
    }
    return retval;
}
Note that as has been stated, the numbers you're working with will not fit into 32 bits.
Related
I am dealing with data in a vector of std::bitset<16>, which I have to convert both to and from unsigned long (through std::bitset::to_ulong()) and to and from strings using a self-made function (the exact algorithm is irrelevant for this question).
The conversions between the bitset vector and the string seem to work fine at first: if I convert a vector of bitsets to a string and then back to bitsets, the result is identical. I have verified this with a program that includes the following:
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<std::endl;//print bitsets before conversion
bitset_to_string(my_bitset16vector,my_str);
string_to_bitset(my_bitset16vector,my_str);
std::cout<<std::endl;
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<std::endl;//print bitsets after conversion
the output could look somewhat like this (in this case with only 4 bitsets):
1011000011010000
1001010000011011
1110100001101111
1001000011001111
1011000011010000
1001010000011011
1110100001101111
1001000011001111
Judging by this, the bitsets before and after conversion are clearly identical. Despite this, the very same bitsets convert completely differently when I ask them to convert to unsigned long, in a program that could look like this:
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<".to_ulong()="<<B.to_ulong()<<std::endl; //print bitsets before conversion
bitset_to_string(my_bitset16vector,my_str);
string_to_bitset(my_bitset16vector,my_str);
std::cout<<std::endl;
for (std::bitset<16>& B : my_bitset16vector) std::cout<<B<<".to_ulong()="<<B.to_ulong()<<std::endl;//print bitsets after conversion
the output could look somewhat like this:
1011000011010000.to_ulong()=11841744
1001010000011011.to_ulong()=1938459
1110100001101111.to_ulong()=22472815
1001000011001111.to_ulong()=18649295
1011000011010000.to_ulong()=45264
1001010000011011.to_ulong()=37915
1110100001101111.to_ulong()=59503
1001000011001111.to_ulong()=37071
Firstly, it is obvious that the bitsets are still, beyond all reasonable doubt, identical when displayed as binary. But when converted to unsigned long, the identical bitsets return completely different values (completely ruining my program).
Why is this? Can it be that the bitsets are not identical, even though they print as the same? Can the error exist within my bitset-to-string and string-to-bitset converters, despite the bitsets being identical?
Edit: not all programs including my conversions have this problem; it only happens when I have modified the bitset after creating it (from a string), in my case in an attempt to encrypt the bitset. That code cannot be cut down to something truly short, but in my most compressed way of writing it, it looks like this
(and that is even without including the definition of the public-key struct and the modular power function):
int main(int argc, char** argv)
{
    if (argc != 3)
    {
        std::cout<<"only 2 arguments allowed: plaintext user"<<std::endl;
        return 1;
    }
    unsigned long k=123456789; //any huge number loaded from an external file
    unsigned long m=123456789; //any huge number loaded from an external file
    std::vector< std::bitset<16> > data;
    std::string datastring=std::string(argv[1]);
    string_to_bitset(data,datastring); //string_to_bitset and bitset_to_string also empty the string and the bitset vector; this is not the cause of the problem
    for (std::bitset<16>& C : data)
    {
        C = std::bitset<16>(modpow(C.to_ulong(),k,m)); //repeated squaring to compute C.to_ulong()^k % m
    }
    //and now the problem happens
    for (std::bitset<16>& C : data) std::cout<<C<<".to_ulong()="<<C.to_ullong()<<std::endl;
    std::cout<<std::endl;
    bitset_to_string(data,datastring);
    string_to_bitset(data,datastring);
    //bitset_to_string(data,datastring);
    for (std::bitset<16>& C : data) std::cout<<C<<".to_ulong()="<<C.to_ullong()<<std::endl;
    std::cout<<std::endl;
    return 0;
}
I am well aware that you are now all thinking that I am doing the modular power function wrong (which I guarantee I am not), but what I am doing to make this happen doesn't actually matter. My question was not "what is wrong in my program"; my question was: why don't identical bitsets (which print identical binary 1's and 0's) convert to identical unsigned longs?
Other edit: I must also point out that the first printed unsigned long values are "correct", in that they allow me to decrypt the bitset perfectly, whereas the unsigned long values printed afterwards are "wrong", in that they produce a completely wrong result.
The "11841744" value is correct in the lower 16 bits, but has some extra set bits above the 16th. This could be a bug in your STL implementation where to_long accesses bits past the 16 it should be using.
Or (from your comment above) you're adding more bits to the bitset than it can hold and you're experiencing Undefined Behavior.
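If the extra bits are indeed the culprit, one hedged fix is to mask the modular result down to 16 bits before constructing the bitset (modpow, k, m and data are the question's own names; the mask is the only thing added). Note that if m is larger than 65536, the result genuinely may not fit in 16 bits, and masking will then lose information:

for (std::bitset<16>& C : data)
{
    // Keep only the low 16 bits so bitset<16> never sees bits it cannot hold.
    C = std::bitset<16>(modpow(C.to_ulong(), k, m) & 0xFFFFul);
}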
I just came across an extremely strange problem. The function I have is simply:
int strStr(string haystack, string needle) {
    for (int i=0; i<=(haystack.length()-needle.length()); i++){
        cout<<"i "<<i<<endl;
    }
    return 0;
}
Then if I call strStr("", "a"), it does not return 0 immediately, even though haystack.length()-needle.length() should seemingly be -1. You can try it yourself...
This is because .length() (and .size()) return size_t, which is an unsigned integer type. You think you get a negative number, when in fact the subtraction wraps around to a huge positive value (on my machine, the result is 18446744073709551615, the maximum value of size_t). This means your for loop will loop through essentially all possible values of size_t, instead of exiting immediately as you expect.
To get the result you want, you can explicitly convert the sizes to ints rather than unsigned ints (see aslg's answer), although this may fail for strings long enough to overflow a standard int.
Edit:
Two solutions from the comments below (see the sketch after this list):
(Nir Friedman) Instead of using int as in aslg's answer, include the <cstdint> header and use an int64_t, which avoids the overflow problem mentioned above.
(rici) Turn your for loop into for (int i = 0; needle.length() + i <= haystack.length(); i++) {, which avoids the problem altogether by rearranging the comparison so there is no subtraction at all.
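Put together, rici's rearranged loop might look like this (a sketch, keeping the question's function shape):

#include <iostream>
#include <string>
using namespace std;

int strStr(string haystack, string needle) {
    // needle.length() + i never subtracts, so there is no unsigned wrap-around.
    for (size_t i = 0; needle.length() + i <= haystack.length(); i++) {
        cout << "i " << i << endl;
    }
    return 0;
}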
(haystack.length()-needle.length())
length returns a size_t, in other words an unsigned integer type. Given the sizes of your strings, 0 and 1 respectively, when you calculate the difference it wraps around and becomes the maximum possible value of that unsigned type (approximately 4.2 billion for 4 bytes of storage, but it could be a different value).
i<=(haystack.length()-needle.length())
The index i is converted by the compiler to an unsigned type to match. So the loop would only stop once i exceeded the maximum possible unsigned value. It's not going to stop.
Solution:
You have to convert the result of each method to int, like so,
i <= ( (int)haystack.length() - (int)needle.length() )
I am using C++, and I've heard (and experienced) that the maximum value that can be stored in an int and in a long are the same.
But my problem is that I need to store a number that exceeds the maximum value of a long variable. A double is easily big enough. But the problem is that using a double prevents me from using the % operator, which I need to code my function easily, and at times there seems to be no way around it.
So would you kindly tell me a way to achieve my goal?
It depends on the purpose. For a better answer, give us more context.
Have a look at (unsigned) long long, or at GMP.
You can use the type long long int or unsigned long long int.
To find the maximum value that an integral type can hold, you can use the following construction, for example:
std::numeric_limits<long long>::max();
To use it you have to include the header <limits>.
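For example, a minimal sketch:

#include <iostream>
#include <limits>

int main()
{
    std::cout << std::numeric_limits<long long>::max() << '\n';          // 9223372036854775807
    std::cout << std::numeric_limits<unsigned long long>::max() << '\n'; // 18446744073709551615
}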
So, you want to compute the modulo of large integers. It's 99% likely you're doing encryption, which is hard stuff. Your question kind of implies that maybe you should look for some off-the-shelf solution for your top-level problem (the encryption).
Anyway, the standard answer is otherwise to use a library for large-precision integers, such as GNU MP.
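For illustration, a minimal sketch using GMP's C++ interface (assuming libgmpxx is installed; link with -lgmpxx -lgmp; the numbers are placeholders):

#include <gmpxx.h>
#include <iostream>

int main()
{
    // Arbitrary-precision integers: % works exactly, with no overflow.
    mpz_class a("123456789012345678901234567890");
    mpz_class m("987654321987654321");
    std::cout << a % m << '\n';
}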
#include <cmath>

int main()
{
    double max_uint = 4294967295.0;
    double max1 = max_uint + 2.0;
    double max2 = (max1 + 1.0) * (max_uint + 1.0);
    double f = fmod(max2, max1); // floating-point modulo
    return 0;
}
max1 and max2 are both over the unsigned int limit, and fmod returns the correct max2 % max1 result, which is also over the unsigned int limit: f == max_uint + 1.0.
Edit:
Good hint from anatolyg: this method only works for integers up to 2^52. This is because the mantissa of a double has 52 bits, and every larger integer is representable only with precision loss. E.g. 2^80 compares equal to (2^80)+1, to (2^80)+2, and so on. The larger the integers, the greater the imprecision, because the gaps between representable integers get wider.
But if you just need 20 extra bits compared to a 32-bit int, and have no other way to achieve this with a built-in integral type (with which the regular % will be faster, I think), then you can use this...
First, there's a difference between the int and long types, but to fix your problem you can use
unsigned long long int
Here is a list of the sizes you would typically expect in C++ (the standard only guarantees minimum ranges, so check your own platform):
char : 1 byte
short : 2 bytes
int : 4 bytes
long : 4 bytes
long long : 8 bytes
float : 4 bytes
double : 8 bytes
I think this clearly explains why you are experiencing difficulties, and gives you a hint on how to solve them.
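Rather than trusting any table, you can print the sizes on your own platform (a minimal sketch):

#include <iostream>

int main()
{
    std::cout << "int       : " << sizeof(int)       << " bytes\n";
    std::cout << "long      : " << sizeof(long)      << " bytes\n";
    std::cout << "long long : " << sizeof(long long) << " bytes\n";
    std::cout << "double    : " << sizeof(double)    << " bytes\n";
}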
I have an app which creates unique ids in the form of unsigned long ints. The app needs this precision.
However, I have to send these ids over a protocol that only allows for ints. The receiving application (of the protocol) does not need the full precision. So my question is: how can I convert an unsigned long int to an int, especially when the unsigned long int is larger than an int?
Edit:
The protocol only supports int. It would be good to know how to avoid roll-over problems.
The application sending the message needs to know the uniqueness for a long period of time, whereas the receiver needs to know the uniqueness only over a short period of time.
Here's one possible approach:
#include <climits>
unsigned long int uid = ...;
int abbreviated_uid = uid & INT_MAX;
If int is 32 bits, for example, this discards all but the low-order 31 bits of the UID. It will only yield non-negative values.
This loses information from the original uid, but you indicated that that's not a problem.
But your question is vague enough that it's hard to tell whether this will suit your purposes.
Boost has numeric_cast:
unsigned long l = ...;
int i = boost::numeric_cast<int>(l);
This will throw an exception if the conversion would overflow, which may or may not be what you want.
Keith Thompson's "& INT_MAX" is only necessary if you need to ensure that abbreviated_uid is non-negative. If that's not an issue and you can tolerate negative IDs, a simple cast (C-style or static_cast<int>()) should suffice. That has the benefit that if sizeof(unsigned long int) == sizeof(int), the binary representation will be the same on both ends, and casting back to unsigned long int on the receiving end recovers the same value as on the sending end.
Does the receiver send responses back regarding the IDs, and does the original sender (now the receiver of the response) need to match a response up with the original unsigned long int ID? If so, you'll need some additional logic; post an edit indicating that requirement and I (or others) can suggest ways of addressing it. One possible solution is to break the ID up into multiple int pieces and reconstruct the exact same unsigned long int value on the other end, as in the sketch below.
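A sketch of that split/reconstruct idea (the helper names are mine; this assumes a 32-bit int and a 64-bit id):

#include <cstdint>

// Split a 64-bit id into two ints for transport. The int casts rely on
// two's-complement wrapping (universal in practice, guaranteed since C++20).
void splitId(std::uint64_t id, int& hi, int& lo)
{
    hi = static_cast<int>(id >> 32);         // upper 32 bits
    lo = static_cast<int>(id & 0xFFFFFFFFu); // lower 32 bits
}

// Rebuild the exact same 64-bit value on the other end.
std::uint64_t joinId(int hi, int lo)
{
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi)) << 32)
         | static_cast<std::uint32_t>(lo);
}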
As you know, one cannot in theory safely convert an unsigned long int to an int in the general case. However, one can indeed do so in many practical cases of interest, in which the integer is not too large.
I would probably define and use this:
struct Exc_out_of_range {};

int make_int(const unsigned long int a) {
    const int n = static_cast<int>(a);
    const unsigned long int a2 = static_cast<unsigned long int>(n);
    if (a2 != a) throw Exc_out_of_range(); // the round trip changed the value, so it didn't fit
    return n;
}
An equivalent solution using the <limits> header naturally is possible, but I don't know that it is any better than the above. (If the code is in a time-critical loop and portability is not a factor, then you could code it in assembly, testing the bit or bits of interest directly, but except as an exercise in assembly language this would be a bother.)
Regarding performance, it is worth noting that -- unless your compiler is very old -- the throw imposes no runtime burden unless used.
@GManNickG adds the advice to inherit from std::exception. I personally don't have a strong feeling about this, but the advice is well founded and appreciated, and I see little reason not to follow it. You can read more about such inheritance here.
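Usage is straightforward (a sketch; make_int is the function above, and the id value is hypothetical):

unsigned long int uid = 123456789UL; // hypothetical id
try {
    int small = make_int(uid);       // fits in an int here, so no throw
    // ... send small over the protocol ...
} catch (const Exc_out_of_range&) {
    // the id didn't fit in an int; handle it however the protocol allows
}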
I came across this since I had to have a solution for converting larger integer types to smaller ones, even when potentially losing information.
I came up with a pretty neat solution using templates:
#include <limits>

template<typename Tout, typename Tin>
Tout toInt(Tin in)
{
    Tout retVal = 0;
    if (in > 0)
        retVal = static_cast<Tout>(in & std::numeric_limits<Tout>::max()); // keep the low bits that fit
    else if (in < 0)
        retVal = static_cast<Tout>(in | std::numeric_limits<Tout>::min()); // preserve the sign bit
    return retVal;
}
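For example (a sketch; toInt is the template above):

#include <cstdint>
#include <iostream>

int main()
{
    std::int64_t big = 123456789012345LL;          // too large for 16 bits
    std::int16_t small = toInt<std::int16_t>(big); // Tin is deduced; keeps the low bits that fit
    std::cout << small << '\n';
}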
You can try using std::stringstream and atoi():

#include <sstream>
#include <stdlib.h>

unsigned long int a = ...;
std::stringstream ss;
ss << a;                    // format the value as decimal text
std::string str = ss.str();
int i = atoi(str.c_str());  // parse it back; undefined if the value doesn't fit an int
So: a simple procedure to calculate a factorial. The code is as follows.
int calcFactorial(int num)
{
    int total = 1;
    if (num == 0)
    {
        return 0;
    }
    for (num; num > 0; num--)
    {
        total *= num;
    }
    return total;
}
Now, this works fine and dandy (there are certainly quicker and more elegant solutions, but this works for me) for most numbers. However, when inputting larger numbers such as 250 it, to put it bluntly, craps out. Now, the first few factorial "bits" for 250 are { 250, 62250, 15126750, 15438000, 3813186000 } for reference.
My code spits out { 250, 62250, 15126750, 15438000, -481781296 }, which is obviously off. My first suspicion was that perhaps I had breached the limit of a 32-bit integer, but given that 2^32 is 4294967296 I didn't think so. The only thing I can think of is that it breaches the signed 32-bit limit, but shouldn't it be able to handle this sort of thing? If being signed is the problem, I can solve it by making the integer unsigned, but that would only be a temporary solution, as the next iteration yields 938043756000, which is far above the 4294967296 limit.
So, is my problem the signed limit? If so, what can I do to calculate large numbers (though I've a "LargeInteger" class I made a while ago that may be suited!) without coming across this problem again?
2^32 doesn't give you the limit for signed integers.
The signed integer limit is actually 2147483647 (if you're developing on Windows using the MS tools; other tool suites/platforms have their own limits that are probably similar).
You'll need a C++ large-number library like this one.
In addition to the other comments, I'd like to point out two serious bugs in your code.
You have no guard against negative numbers.
The factorial of zero is one, not zero.
Yes, you hit the limit. An int in C++ is, by definition, signed. And, uh, no, C++ does not think, ever. If you tell it to do a thing, it will do it, even if it is obviously wrong.
Consider using a large number library. There are many of them around for C++.
If you don't specify signed or unsigned, the default is signed. (For plain char, some compilers let you change the default with a command-line switch, but plain int is always signed.)
Just remember, C (or C++) is a very low-level language and does precisely what you tell it to do. If you tell it to store this value in a signed int, that's what it will do. You as the programmer have to figure out when that's a problem. It's not the language's job.
My Windows calculator (Start-Run-Calc) tells me that
hex (3813186000) = E34899D0
hex (-481781296) = FFFFFFFFE34899D0
So yes, the cause is the signed limit. Since factorials can by definition only be positive, and can only be calculated for non-negative numbers, both the argument and the return value should be unsigned anyway. (I know that everybody uses int i = 0 in for loops; so do I. But that aside, we should always use unsigned variables if the value cannot be negative; it's good practice IMO.)
The general problem with factorials is, that they can easily generate very large numbers. You could use a float, thus sacrificing precision but avoiding the integer overflow problem.
Oh wait, according to what I wrote above, you should make that an unsigned float ;-)
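For what it's worth, a sketch of both routes (the helper names are mine; unsigned long long is exact up to 20!, double keeps going to about 170! but only approximately):

#include <iostream>

// Exact up to 20! = 2432902008176640000; 21! overflows 64 bits.
unsigned long long factExact(unsigned int n)
{
    unsigned long long total = 1;
    for (; n > 1; n--)
        total *= n;
    return total;
}

// Approximate, but doesn't overflow to infinity until around 171!.
double factApprox(unsigned int n)
{
    double total = 1.0;
    for (; n > 1; n--)
        total *= n;
    return total;
}

int main()
{
    std::cout << factExact(20) << '\n';   // 2432902008176640000
    std::cout << factApprox(100) << '\n'; // about 9.33e157
}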
If I remember correctly (these are the usual values on 32-bit platforms and on Windows; actual sizes vary):
unsigned short int = max 65535
unsigned int = max 4294967295
unsigned long = max 4294967295
unsigned long long (Int64) = max 18446744073709551615
Edited; sources: Int/Long Max values, Modern Compiler Variable