On the DMOJ online judge, used for competitive programming, one of the tips for faster execution time in C++ is to add this macro at the top if the problem only requires unsigned integral data types to be read.
How does this work and what are the advantages and disadvantages of using this?
#define scan(x) do{while((x=getchar())<'0'); for(x-='0'; '0'<=(_=getchar()); x= (x<<3)+(x<<1)+_-'0');}while(0)
char _;
Source: https://dmoj.ca/tips/#cpp-io
First let's reformat this a bit:
#define scan(dest) \
do { \
    while((dest = getchar()) < '0'); \
    for(dest -= '0'; '0' <= (temp = getchar()); dest = (dest<<3) + (dest<<1) + temp - '0'); \
} while(0)
char temp;
First, the outer do{...}while(0) is just to ensure proper parsing of the macro: it makes the whole thing behave as a single statement, so it remains safe inside, for example, an unbraced if/else.
Next, while((dest = getchar()) < '0'); - this might as well just be dest = getchar(), but it does some additional work by discarding any characters below (but not above) the '0' character. This can be useful since the whitespace characters are all "less than" the '0' character in ASCII.
The meat of the macro is the for loop. First, the initialization expression, dest -= '0', sets dest to the actual integer value represented by the character, taking advantage of the fact that the '0'-'9' characters in ASCII encoding are adjacent and sequential. So if the first character were '5' (value 53), subtracting '0' (value 48) results in the integer value 5.
The condition expression, '0' <= (temp = getchar()), does several things: first, it gets the next character and assigns it to temp, then checks whether it is greater than or equal to the '0' character (so it will fail on whitespace).
As long as the character is a numeral (or at least any character not less than '0'), the increment expression is evaluated. In dest = (dest<<3) + (dest<<1) + temp - '0', the temp - '0' expression does the same ASCII-to-numeric adjustment as before, and the shifts and adds are just an obscure way of multiplying by 10. In other words, it is equivalent to temp -= '0'; dest = dest * 10 + temp;. Multiplying by 10 and adding the next digit's value is what builds the final value.
Finally, char temp; declares the temporary character storage for use in subsequent macro invocations in the program.
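To make the mechanics concrete, here is a minimal sketch of the macro in use, pulling the pieces together (the main function is mine; note the macro has no EOF or sign handling, as discussed):

#include <cstdio>

#define scan(x) do{while((x=getchar())<'0'); for(x-='0'; '0'<=(_=getchar()); x=(x<<3)+(x<<1)+_-'0');}while(0)
char _;

int main() {
    unsigned a, b;
    scan(a);                       // for input "123 456": reads 123, stops at the space
    scan(b);                       // the leading while skips the separator, reads 456
    std::printf("%u\n", a + b);    // prints 579
}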
As far as why you'd use it, I'm skeptical that it would provide any measurable benefit compared to something like scanf or atoi.
What this does is read a number character by character, with a bunch of premature optimizations. See MooseBoys' answer for more details.
About its advantages and disadvantages: I don't see any benefit to using this at all. Stuff like (x<<3)+(x<<1), which is equal to x * 10, is an optimization that should be done by the compiler, not by you.
As far as I know, cin and cout are fast enough for all competitive programming purposes, especially if you disable syncing with stdio. I've been using them since I started competitive programming and have never had any problems.
Also, my own testing shows that cin and cout aren't slower than C I/O, despite the popular belief. You can try testing the performance of this yourself. Make sure you have optimizations enabled.
Apparently, some competitive programmers focus way too much on stuff like fast I/O when their algorithm is the thing that matters most.
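For reference, disabling the stdio sync (and untying cin from cout) is a one-time setup at the start of main; a minimal sketch:

#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false); // stop syncing C++ streams with C stdio
    std::cin.tie(nullptr);                 // don't flush cout before every cin read
    long long x, sum = 0;
    while (std::cin >> x)
        sum += x;
    std::cout << sum << '\n';              // '\n' avoids the flush that endl forces
}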
Related
Is there any good way to optimize this function in terms of execution time? My final goal is to parse a long string composed of several integers (thousands of integers per line, and thousands of lines). This was my initial solution.
int64_t get_next_int(char *newLine) {
    char *token = strtok(newLine, " ");
    if (token == NULL) {
        exit(0);
    }
    return atoll(token);
}
More details: I need the "state"-based behavior of strtok, so the padding handling that strtok provides should exist in the final version too. atoll does not need to do any kind of verification.
Target system: Intel x86_64 (Xeon series)
Related topics:
atoi optimization: C++ most efficient way to convert string to int (faster than atoi)
First off: I find optimizing string conversion routines in signal processing chains to be totally in vain most of the time. Your data in string form probably comes from some mass storage, where it was put by something that didn't care about performance (otherwise it wouldn't have chosen a string format in the first place). If you compare the read speed of anything short of clusters of PCIe-attached SSDs with how fast atoll is, you'll notice that you're losing a negligible amount of time to inefficient conversion. And if you pipeline loading parts of that string with conversion, the time spent waiting for storage will not even remotely be filled up by converting, so even without any algorithmic optimization, pipelining/multi-threading will eliminate practically all time spent on conversion.
I'm going to go ahead and assume your integer-containing string is sufficiently large. Like, tens of millions of integers. Otherwise, all optimization might be pretty premature, considering there's little to complain about std::iostream performance.
Now, the trick is that no performance optimization can be done once the performance of your conversion routine hits the memory bandwidth barrier. To push that barrier as far as possible, it's crucial to optimize usage of CPU caches – hence, doing linear access and shuffling memory as little as possible is crucial here. Also, if you care for speed, you don't want to call a function every time you need to convert a few-digit number – the call overhead (saving/restoring stack, jumping back and forth) will be significant. So if you're after performance, you'll do the conversion of the whole string at once, and then just access the resulting integer array.
So you'd have roughly something like this, on a modern, SSE4.2-capable x86 processor:

Outer loop, jumps in steps of 16:
    load 128 bits of the input string into a 128-bit SIMD register
    run something like _mm_cmpestri to find the indices of the delimiters and the \0 terminator in all 16 bytes at once
    inner loop over the found indices:
        use SSE copy/shift/immediate instructions to isolate the substrings; fill the others with 0
        prepend the saved "last characters" from the previous iteration (if any – should only be the case for the first inner-loop iteration per outer-loop iteration)
        subtract '0' from each of the digits, again using SSE instructions to do up to 16 subtractions with a single instruction (_mm_sub_epi8)
        convert the eight 16-bit subwords to eight 128-bit words containing two packed 64-bit integers each (one instruction per 16-bit word, _mm_cvtepi8_epi64, I think)
        initialize a __m128i register with [10^15 10^14], let's call it powers
        loop over the pairs of dual-64-bit words (each step should be one SSE instruction):
            multiply the first with powers
            divide powers by [100 100]
            multiply the second with powers
            add the results to a dual-64-bit accumulator
        sum the two values in the accumulator
        store the result to the integer array
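Whether or not you go the SIMD route, the shape the outline above is aiming for is "one linear pass over the whole buffer, results landing in a contiguous array". A scalar sketch of that shape, which I've added for illustration (hypothetical function name, unsigned values only, no overflow checks):

#include <cstdint>
#include <vector>

// Scalar baseline: one linear pass, no per-token call overhead,
// results stored contiguously for cache-friendly access afterwards.
std::vector<int64_t> parse_all(const char* buf) {
    std::vector<int64_t> out;
    while (*buf) {
        while (*buf && unsigned(*buf - '0') >= 10) ++buf;  // skip delimiters
        if (!*buf) break;
        int64_t v = 0;
        while (unsigned(*buf - '0') < 10)                  // accumulate digits
            v = 10 * v + (*buf++ - '0');
        out.push_back(v);
    }
    return out;
}

Pipelining calls to this against the I/O, as described above, then hides most of the remaining conversion cost.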
I'd rather use something along the lines of a std::istringstream:
#include <cstdint>
#include <cstdlib>
#include <sstream>

int64_t get_next_int(std::istringstream& line) {
    int64_t token;
    if (!(line >> token))
        exit(0);
    return token;
}

std::istringstream line(newLine);
int64_t i = get_next_int(line);
strtok() has well-known drawbacks, and you don't want to use it at all.
What about:
#include <cstdint>
#include <cstdlib>

// newline is passed by reference so the scan position persists across calls
int64_t get_next_int(char *&newline) {
    int64_t n = 0;
    // Find the token
    for ( ; *newline == ' '; newline++)
        ;
    if (*newline == 0)
        // Not found
        exit(0);
    // Scan and convert the token
    for ( ; unsigned(*newline - '0') < 10; newline++)
        n = 10 * n + *newline - '0';
    return n;
}
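A quick usage sketch (hypothetical input; each call advances the pointer past the token it consumed):

char line[] = "12 345 6789";
char *p = line;
int64_t a = get_next_int(p);  // 12, p left at the space after "12"
int64_t b = get_next_int(p);  // 345
int64_t c = get_next_int(p);  // 6789, p left at the terminating '\0'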
As far as I can tell from your code, it returns at the first split. At the first parse (before the space character), it will return 0 for a non-numeric entry, or for a mixed alphanumeric entry where the alphabetic part comes first; if the digits come first, it will return just that leading number. In other words, you just need a string for the conversion, so you don't need tokenizing; just check whether the string is null. You can change the return type as well: if you need a type with _exactly_ 64 bits, use (u)int64_t; if you need _at least_ 64 bits, (unsigned) long long is perfectly fine, as would be (u)int_least64_t. I think your code is a little gobbledygook. Show exactly what you want, without simplification.
/*
* ascii-to-longlong conversion
*
* no error checking; assumes decimal digits
*
* efficient conversion:
* start with value = 0
* then, starting at first character, repeat the following
* until the end of the string:
*
* new value = (10 * (old value)) + decimal value of next character
*
*/
long long my_atoll(char *instr)
{
    if (instr[0] == '\0')
        return -1;

    long long retval = 0;
    for (; *instr; instr++) {
        retval = 10 * retval + (*instr - '0');
    }
    return retval;
}
I have this code which handles Strings like "19485" or "10011010" or "AF294EC"...
long long toDecimalFromString(string value, Format format){
    long long dec = 0;
    for (int i = value.size() - 1; i >= 0; i--) {
        char ch = value.at(i);
        int val = int(ch);
        if (ch >= '0' && ch <= '9') {
            val = val - 48;
        } else {
            val = val - 55;
        }
        dec = dec + val * (long long)(pow((int) format, (value.size() - 1) - i));
    }
    return dec;
}
This code works for all values that are not in 2's complement.
If I pass a hex string that is supposed to be a negative number in decimal, I don't get the right result.
If you don't handle the minus sign, it won't handle itself. Check for it, and memorize the fact that you've seen it. Then, at the end, if you'd seen a '-' as the first character, negate the result.
Other points:
You don't need (nor want) to use pow: it's just result = format * result + digit each time through.
You do need to validate your input, making sure that the digit you obtain is legal in the base (and that you don't have any other odd characters).
You also need to check for overflow.
You should use isdigit and isalpha (or islower and isupper) for your character checking.
You should use e.g. val -= '0' (and not 48) for your conversion from character code to digit value.
You should use [i], and not at(i), to read the individual characters. Compile with the usual development options, and you'll get a crash, rather than an exception, in case of error. But you should probably use iterators, and not an index, to go through the string. It's far more idiomatic.
You should almost certainly accept both upper and lower case for the alphas, and probably skip leading white space as well.
Technically, there's also no guarantee that the alphabetic characters are in order and adjacent. In practice, I think you can count on it for characters in the range 'A'-'F' (or 'a'-'f'), but the surest way of converting a character to a digit is to use a table lookup (see the sketch below).
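Putting those points together, a rough sketch might look like this. This is an illustration, not drop-in code: the int base parameter replaces the question's Format enum, the exception choices are mine, and LLONG_MIN's extra magnitude is not handled.

#include <cctype>
#include <climits>
#include <stdexcept>
#include <string>

// Sketch: iterators, table lookup, sign handling, digit validation,
// and a simple overflow check.
long long toDecimalFromString(const std::string& value, int base) {
    static const std::string digits = "0123456789abcdefghijklmnopqrstuvwxyz";
    auto it = value.begin();
    while (it != value.end() && std::isspace((unsigned char)*it))
        ++it;                                    // skip leading white space
    bool negative = (it != value.end() && *it == '-');
    if (negative)
        ++it;
    long long result = 0;
    for (; it != value.end(); ++it) {
        std::size_t d = digits.find(std::tolower((unsigned char)*it));
        if (d == std::string::npos || (int)d >= base)
            throw std::invalid_argument("bad digit");
        if (result > (LLONG_MAX - (long long)d) / base)
            throw std::overflow_error("out of range");
        result = base * result + (long long)d;   // no pow needed
    }
    return negative ? -result : result;
}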
You need to know whether the specified number is to be interpreted as signed or unsigned (in other words, is "ffffffff" -1 or 4294967295?).
If signed, then to detect a negative number, test the most-significant bit. If the ms bit is set, then after converting the number as you do (generating an unsigned value), take the 2's complement (bitwise negate it, then add 1) to get the magnitude, and negate.
Note: to test the ms bit you can't just test the leading character. If the number is signed, is "ff" supposed to be -1 or 255? You need to know the size of the expected result (if 32 bits and signed, then "ffffffff" is negative, or -1. But if 64 bits and signed, "ffffffff" is positive, or 4294967295). Thus there is more than one right answer for the example "ffffffff".
Instead of testing the ms bit, you could just test whether the unsigned result is greater than the "midway point" of the result range (for example 2^31 - 1 for 32-bit numbers).
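To make the width-dependent part concrete, here's a minimal sketch of reinterpreting an already-parsed unsigned value as signed; toSigned is a hypothetical helper, not from the question:

#include <cstdint>

// Reinterpret an unsigned value as a signed number of the given bit width,
// e.g. toSigned(0xffffffff, 32) == -1, toSigned(0xffffffff, 64) == 4294967295.
int64_t toSigned(uint64_t value, int bits) {
    uint64_t signBit = uint64_t(1) << (bits - 1);
    if (value & signBit)                           // ms bit set -> negative
        return (int64_t)(value - (signBit << 1));  // i.e. subtract 2^bits;
                                                   // unsigned wrap makes bits == 64 work too
    return (int64_t)value;
}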
I was studying hash-based sort and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique) and a prime number like 31 would produce better distribution of keys.
key(s) = s[0]*31^(len-1) + s[1]*31^(len-2) + ... + s[len-1]
Sample code:
public int hashCode()
{
    int h = hash;
    if (h == 0)
    {
        for (int i = 0; i < chars.length; i++)
        {
            h = MULT * h + chars[i];
        }
        hash = h;
    }
    return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider hashing a hundred-character string. How much influence does the string's first character have on the final value of 'h'? The first character's value will have been multiplied by MULT 99 times, so if the arithmetic were done in infinite precision the value would consist of some jumble of bits followed by 99 low-order zero bits -- each time you multiply by MULT you introduce another low-order zero, right? The computer's finite arithmetic just chops away all the excess high-order bits, so the first character's actual contribution to 'h' is ... precisely zero! The 'h' value depends only on the rightmost 32 string characters (assuming a 32-bit int), and even then things are not wonderful: the first of those final 32 bytes influences only the leftmost bit of 'h' and has no effect on the remaining 31. Clearly, an even-valued MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33-character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null, you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]; since c[1] is zero here, h is now 2*c[0]. For the third character, h becomes h*2 + c[2], which works out to 4*c[0]. Repeat this 30 more times, and you can see that c[0]'s contribution has been shifted past the bits available in your destination, meaning c[0] effectively had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will wrap modulo 2^32 every so often during the process. Since 26 is even, it still adds one 0 bit to the low end each iteration.
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
If MULT is even and the string is long enough (strlen(s) > 32), then this is zero: each multiplication by an even MULT contributes at least one factor of 2, so the first character's term is divisible by 2^32.
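To see the collapse concretely, here's a small demo I've added (the hash function mirrors the question's scheme; the names are mine):

#include <cstdio>
#include <string>

// Hash from the question, with the multiplier as a parameter.
unsigned hash(const std::string& s, unsigned mult) {
    unsigned h = 0;
    for (unsigned char c : s)
        h = mult * h + c;
    return h;
}

int main() {
    // Two 33-character strings differing only in the first character.
    std::string a = "A" + std::string(32, 'x');
    std::string b = "B" + std::string(32, 'x');
    // With MULT = 26 (even) the first character's coefficient is 26^32,
    // which is 0 mod 2^32 -- the two hashes collide.
    std::printf("%u %u\n", hash(a, 26), hash(b, 26));
    // With MULT = 31 (odd) the coefficient is odd, so they differ.
    std::printf("%u %u\n", hash(a, 31), hash(b, 31));
}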
Other people have posted the answer -- if you use an even multiplier, then only the last characters in the string matter for computing the hash, as the early characters' influence will have shifted out of the register.
Now let's consider what happens when you use a multiplier like 31. Well, 31 is 32 - 1, or 2^5 - 1. So when you use that, your final hash value will be:
\sum_i c_i \cdot 2^{5(len-i-1)} - \sum_i c_i
Unfortunately Stack Overflow doesn't understand TeX math notation, so the above is hard to read, but it's two summations over the characters in the string, where the first one shifts each character left by another 5 bits for each subsequent character in the string. So on a 32-bit machine, that first summation will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the hash, it's completely independent of their order. If you take two strings that have the same last 7 characters, and whose other characters are also the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" anywhere other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
unsigned hashCode(const std::string& chars)
{
    unsigned h = 0;
    for (std::size_t i = 0; i < chars.length(); i++) {
        h = (h << 5) + (h >> 27); // ROL by 5, assuming 32 bits here
        h += (unsigned char)chars[i];
    }
    return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
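If you can use C++20, the <bit> header provides std::rotl, which expresses the rotate directly instead of relying on the compiler recognizing the shift idiom; a sketch:

#include <bit>
#include <cstdint>
#include <string>

// Same hash as above, with the rotate spelled out via C++20 std::rotl.
unsigned hashCode(const std::string& chars) {
    std::uint32_t h = 0;
    for (unsigned char c : chars)
        h = std::rotl(h, 5) + c;  // guaranteed single-rotate semantics
    return h;
}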
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so it's far from strong, but it's probably adequate for most non-cryptographic purposes.
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).
How do you count unicode characters in a UTF-8 file in C++? Perhaps if someone would be so kind to show me a "stand alone" method, or alternatively, a short example using http://icu-project.org/index.html.
EDIT: An important caveat is that I need to build counts of each character, so it's not like I'm counting the total number of characters, but the number of occurrences of a set of characters.
In UTF-8, a non-leading byte always has the top two bits set to 10, so just ignore all such bytes. If you don't mind extra complexity, you can do more than that (to skip ahead across non-leading bytes based on the bit pattern of a leading byte) but in reality, it's unlikely to make much difference except for short strings (because you'll typically be close to the memory bandwidth anyway).
Edit: I originally mis-read your question as simply asking about how to count the length of a string of characters encoded in UTF-8. If you want to count character frequencies, you probably want to convert those to UTF-32/UCS-4, then you'll need some sort of sparse array to count the frequencies.
The hard part of this deals with counting code points vs. characters. For example, consider the character "À" -- the "Latin capital letter A with grave". There are at least two different ways to produce this character. You can use codepoint U+00C0, which encodes the whole thing in a single code point, or you can use codepoint U+0041 (Latin capital letter A) followed by codepoint U+0300 (Combining grave accent).
Normalizing (with respect to Unicode) means turning all such characters into the same form. You can either combine them all into a single code point, or separate them all into separate code points. For your purposes, it's probably easier to combine them into a single code point whenever possible. Writing this on your own probably isn't very practical -- I'd use the normalizer API from the ICU project.
If you know the UTF-8 sequence is well formed, it's quite easy. Count up each byte that starts with a zero bit or with two one bits. The first condition will catch every code point that is represented by a single byte; the second will catch the first byte of each multi-byte sequence.
while (*p != 0)
{
    if ((*p & 0x80) == 0 || (*p & 0xc0) == 0xc0)
        ++count;
    ++p;
}
Or alternatively as remarked in the comments, you can simply skip every byte that's a continuation:
while (*p != 0)
{
    if ((*p & 0xc0) != 0x80)
        ++count;
    ++p;
}
Or if you want to be super clever and make it a 2-liner:
for (; *p != 0; ++p)
    count += ((*p & 0xc0) != 0x80);
The Wikipedia page for UTF-8 clearly shows the patterns.
A discussion with a full routine written in C++ is at http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
I know it's late for this thread, but it could help.
With ICU, I did it like this:
string theString = "blabla";
UnicodeString uStr = UnicodeString::fromUTF8(theString.c_str());
cout << "length = " << uStr.length() << endl; // note: length() counts UTF-16 code units; countChar32() counts code points
I wouldn't consider this a language-centric question. The UTF-8 format is fairly simple; decoding it from a file should be only a few lines of code in any language.
open file
until eof
    if file.readchar & 0xC0 != 0x80
        increment count
close file
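Since the question asks for counts of each character rather than a total, here's a minimal sketch I've added that decodes each code point (assuming well-formed UTF-8 and no normalization -- see the caveats above) and tallies occurrences:

#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Count occurrences of each code point in a well-formed UTF-8 string.
// No validation, no normalization.
std::unordered_map<std::uint32_t, std::size_t> countCodePoints(const std::string& s) {
    std::unordered_map<std::uint32_t, std::size_t> counts;
    for (std::size_t i = 0; i < s.size(); ) {
        unsigned char b = s[i];
        std::uint32_t cp;
        std::size_t len;
        if      (b < 0x80)           { cp = b;        len = 1; }  // ASCII
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else                         { cp = b & 0x07; len = 4; }
        for (std::size_t j = 1; j < len; ++j)        // fold in continuation bytes
            cp = (cp << 6) | (s[i + j] & 0x3F);
        ++counts[cp];
        i += len;
    }
    return counts;
}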
I was wondering if there is a simpler (single-expression) way to calculate the remaining space in a circular buffer than this:
int remaining = (end > start)
                ? end - start
                : bufferSize - start + end;
If you're worried about poorly-predicted conditionals slowing down your CPU's pipeline, you could use this:
int remaining = (end - start) + (-((int) (end <= start)) & bufferSize);
But that's likely to be premature optimisation (unless you have really identified this as a hotspot). Stick with your current technique, which is much more readable.
Hmmm....
int remaining = (end - start + bufferSize) % bufferSize;
13 tokens, do I win?
If your circular buffer size is a power of two, you can do even better by having start and end represent positions in a virtual stream instead of indices into the circular buffer's storage. Assuming that start and end are unsigned, the above becomes:
int remaining = bufferSize - (end - start);
Actually getting elements out of the buffer is a little more complicated, but the overhead is usually small enough with a power of 2 sized circular buffer (just masking with bufferSize - 1) to make all the other logic of your circular buffer much simpler and cleaner. Plus, you get to use all the elements since you no longer worry about end==start!
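Here's a minimal sketch of that power-of-two scheme (the names are hypothetical; the counters are unsigned, so wraparound is well defined):

#include <cstdint>

// Ring buffer with power-of-two capacity N. start and end are positions
// in a virtual stream; they only get masked when touching the storage.
template <std::uint32_t N>
struct Ring {
    std::uint32_t data[N];
    std::uint32_t start = 0, end = 0;   // unsigned: subtraction is modular

    std::uint32_t used()      const { return end - start; }       // valid across wrap
    std::uint32_t remaining() const { return N - (end - start); }
    bool full()               const { return used() == N; }       // no wasted slot:
    bool empty()              const { return end == start; }      // all N elements usable

    void push(std::uint32_t v)   { data[end++ & (N - 1)] = v; }
    std::uint32_t pop()          { return data[start++ & (N - 1)]; }
};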
According to the C++ Standard, section 5.6, paragraph 4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined.
A footnote suggests that rounding the quotient towards zero is preferred, which would leave the remainder negative.
Therefore, the (end - start) % bufferSize approaches do not work reliably. C++ does not have modular arithmetic (except in the sense offered by unsigned integral types).
The approach recommended by j_random_hacker is different, and looks good, but I don't know that it's any actual improvement in simplicity or speed. The conversion of a boolean to an int is ingenious, but requires mental parsing, and that fiddling could be more expensive than the use of ?:, depending on compiler and machine.
I think you've got the simplest and best version right there, and I wouldn't change it.
Lose the conditional:
int remaining = (end + bufferSize - start - 1) % bufferSize + 1;
Edit: The -1 and +1 are for the case when end == start. In that case, this method will assume the buffer is empty. Depending on the specific implementation of your buffer, you may need to adjust these to avoid an off-by-1 situation.
I know this is an older thread, but I thought this might be helpful.
I'm not sure how fast this is in C++, but in RTL we do this if the size is 2^n:
remaining = (end[n] ^ start[n])
    ? start[n-1:0] - end[n-1:0]
    : end[n-1:0] - start[n-1:0];
or
remaining = if (end[n] ^ start[n]) {
    start[n-1:0] - end[n-1:0]
} else {
    end[n-1:0] - start[n-1:0]
};