I wrote a program that simulates the Game of Life. Basically the world is implemented by a bi-dimensional std::vector of bool. If the bool is true the cell is alive, if is false the cell is dead. The output of the program is the system at each time step, completely in ASCII code:
[ ][0][ ]
[ ][ ][0]
[0][0][0]
The problem is that the program runs obviously fast and each time step is printed too quickly: I can't see how the system evolves. Is there some trick to slow down the output (or directly the program)?
EDIT: I'm on Mac OS X 10.7. My compiler is GCC 4.7.
You can use standard C++ (C++11):
#include <thread>
#include <chrono>
#include <iostream>
int main() {
while (true) {
// draw loop
std::this_thread::sleep_for(std::chrono::milliseconds(20));
}
}
Alternatively, you could use a library that lets you specify an interval at which to call your draw function. OS X has Grand Central Dispatch (a.k.a. libdispatch). Using GCD you could create a dispatch timer source that calls your draw function with a specified frequency.
dispatch_source_t timer = dispatch_source_create(
DISPATCH_SOURCE_TYPE_TIMER, 0, 0, dispatch_get_main_queue());
dispatch_source_set_timer(timer, DISPATCH_TIME_NOW,
duration_cast<nanoseconds>(milliseconds(20)).count(),
duration_cast<nanoseconds>(milliseconds( 5)).count());
// the API is defined to use nanoseconds, but I'd rather work in milliseconds
// so I use std::chrono to do the conversion above
dispatch_source_set_event_handler(timer,
[]{ your_draw_function(); });
// I'm not sure if GCC 4.7 actually supports converting C++11 lambdas to
// Apple's C blocks, or if it even supports blocks. Clang supports this.
dispatch_resume(timer);
dispatch_main();
libdispatch reference
Whatever system you are using, it will have some kind of sleep function that you can call that will suspend your program for a specified period of time. You do not specify what OS you use, so I cant give exact details, but it sounds like the approach you are looking for.
If you call sleep for a certain length of time after drawing each update of the image, your program will sleep for that time before resuming and drawing the next update. This should give you chance to actually see the changes
If you want higher resolution time sleep you can look at nanosleep and usleep
1.You can use
int tmp; std::cin >> tmp;
and program will ask you before to go further.
2.You can use loop over some calculations. Like
double Tmp[1000000];
for( int i = 0; i < 1000000; i++ )
Tmp[i] = i;
for( int i = 0; i < 1000000; i++ )
Tmp[i] = sin(sin(sin(Tmp[i])));
3.You can check which delay-functions you have available for you. Example is "Sleep(nSeconds)" here
4.You can save and verify you system time. Like:
while (time() < time_end){};
Related
I have some extremely simple C++ code that I was certain would run 3x faster with multithreading but somehow only runs 3% faster (or less) on both GCC and MSVC on Windows 10.
There are no mutex locks and no shared resources. And I can't see how false sharing or cache thrashing could be at play since each thread only modifies a distinct segment of the array, which has over a billion int values. I realize there are many questions on SO like this but I haven't found any that seem to solve this particular mystery.
One hint might be that moving the array initialization into the loop of the add() function does make the function 3x faster when multithreaded vs single-threaded (~885ms vs ~2650ms).
Note that only the add() function is being timed and takes ~600ms on my machine. My machine has 4 hyperthreaded cores, so I'm running the code with threadCount set to 8 and then to 1.
Any idea what might be going on? Is there any way to turn off (when appropriate) the features in processors that cause things like false sharing (and possibly like what we're seeing here) to happen?
#include <chrono>
#include <iostream>
#include <thread>
void startTimer();
void stopTimer();
void add(int* x, int* y, int threadIdx);
namespace ch = std::chrono;
auto start = ch::steady_clock::now();
const int threadCount = 8;
int itemCount = 1u << 30u; // ~1B items
int itemsPerThread = itemCount / threadCount;
int main() {
int* x = new int[itemCount];
int* y = new int[itemCount];
// Initialize arrays
for (int i = 0; i < itemCount; i++) {
x[i] = 1;
y[i] = 2;
}
// Call add() on multiple threads
std::thread threads[threadCount];
startTimer();
for (int i = 0; i < threadCount; ++i) {
threads[i] = std::thread(add, x, y, i);
}
for (auto& thread : threads) {
thread.join();
}
stopTimer();
// Verify results
for (int i = 0; i < itemCount; ++i) {
if (y[i] != 3) {
std::cout << "Error!";
}
}
delete[] x;
delete[] y;
}
void add(int* x, int* y, int threadIdx) {
int firstIdx = threadIdx * itemsPerThread;
int lastIdx = firstIdx + itemsPerThread - 1;
for (int i = firstIdx; i <= lastIdx; ++i) {
y[i] = x[i] + y[i];
}
}
void startTimer() {
start = ch::steady_clock::now();
}
void stopTimer() {
auto end = ch::steady_clock::now();
auto duration = ch::duration_cast<ch::milliseconds>(end - start).count();
std::cout << duration << " ms\n";
}
You may be simply hitting the memory transfer rate of your machine, you are doing 8GB of reads and 4GB of writes.
On my machine your test completes in about 500ms which is 24GB/s (which is similar to the results given by a memory bandwidth tester).
As you hit each memory address with a single read and a single write the caches aren't much use as you aren't reusing memory.
Your problem is not the processor. You ran against the RAM read and write latency. As your cache is able to hold some megabytes of data and you exceed this storage by far. Multi-threading is so long useful, as long as you can shovel data into your processor. The cache in your processor is incredibly fast, compared to your RAM. As you exceed your cache storage, this results in a RAM latency test.
If you want to see the advantages of multi-threading, you have to choose data sizes in range of your cache size.
EDIT
Another thing to do, would be to create a higher workload for the cores, so the storage latency goes unrecognized.
sidenote: keep in mind, your core has several execution units. one or more for each type of operation - integer, float, shift and so on. That means, one core can execute more then one command per step. In particular one operation per execution unit. You can keep the data size of the test data and do more stuff with it - be creative =) Filling the queue with integer operations only, will give you an advantage in multi-threading. If you can variate in your code, when and where you do different operations, do it, this also will show impact on the speedup. Or avoid it, if you want to see a nice speedup on multi-threading.
to avoid any kind of optimization, you should use randomized test data. so neither the compiler nor the processor itself can predict what the outcome of your operation is.
Also avoid doing branches like if and while. Each decision the processor has to predict and execute, will slow you down and alter the result. With branch-prediction, you will never get a deterministic result. Later in a "real" program, be my guest and do what you want. But when you want to explore the multi-threading world, this could lead you to wrong conclusions.
BTW
Please use a delete for every new you use, to avoid memory leaks. AND even better, avoid plain pointers, new and delete. You should use RAII. I advice to use std::array or std::vector, simple a STL-container. This will save you tons of debugging time and headaches.
Speedup from parallelization is limited by the portion of the task that remains serial. This is called Amdahl's law. In your case, a decent amount of that serial time is spent initializing the array.
Are you compiling the code with -O3? If so, the compiler might be able to unroll and/or vectorize some of the loops. The loop strides are predictable, so hardware prefetching might help as well.
You might want to also explore if using all 8 hyperthreads are useful or if it's better to run 1 thread per core (I am going to guess that since the problem is memory-bound, you'll likely benefit from all 8 hyperthreads).
Nevertheless, you'll still be limited by memory bandwidth. Take a look at the roofline model. It'll help you reason about the performance and what speedup you can theoretically expect. In your case, you're hitting the memory bandwidth wall that effectively limits the ops/sec achievable by your hardware.
I try to find out the exact execution time for "for loop" with 2e6 iteritions.
The following code is ran within 10ms after compiled from g++ for c++ file.
People told me that is optimization code automatically done by C++ compiler so you
get meaningless execution time. In other words,since there is no any output call
such as printf or cout<< for variable a,b,c so the optimized code will do nothing for
that "for loop" that is why I got really short program execution time in 10ms. Right ? Why they said the time result is meaningless for "for loop".
Please advise
int main(){
int max = 2e6;
int a,b,c;
// CODE YOU WANT TO TIME
int start = getMilliCount();
for (int i = 0; i < max; i++) {
a = 1234 + 5678 + i;
b = 1234 * 5678 + i;
c=1234/2+i;
}
int milliSecondsElapsed = getMilliSpan(start);
printf("\n\nElapsed time = %u milliseconds %d\n", milliSecondsElapsed,max);
return 0;
}
The run-time is absolutely not meaningless. It proves at least one important point: the optimizer is smarter than given credit, and it's able to deduce the loop has no side effects, so it cuts it out.
So even if the profile result only proves this one thing, it does have meaning.
To address what you want:
I try to find out the exact execution time for "for loop" with 2e8 iteritions.
The execution time of a for loop with 2e8 can be 0 if there are no observable effects. Or very large if they are. That's why you usually profile actual code using dedicated tools.
The compiler can change the program in any way that does not change anything observable, i.e. all outputs etc. must be exactly the same as the outputs of the un-optimized code. In your example, the compiler may notice that the values of a, b and c after the loop are never used and the loop does nothing else, so it might as well remove the loop from your program.
It could also observe that the value of the variables depend directly on max and just skip all but the last iteration.
In both cases, the result would not depend on max. It still is not meaningless, it just means that you underestimate your compiler.
Edit:
I tested this scenario with g++ -O2, the loop gets completely removed and does not run at all.
Given a while loop and the function ordering as follows:
int k=0;
int total=100;
while(k<total){
doSomething();
if(approx. t milliseconds elapsed) { measure(); }
++k;
}
I want to perform 'measure' every t-th milliseconds. However, since 'doSomething' can be close to the t-th millisecond from the last execution, it is acceptable to perform the measure after approximately t milliseconds elapsed from the last measure.
My question is: how could this be achieved?
One solution would be to set timer to zero, and measure it after every 'doSomething'. When it is withing the acceptable range, I perform measures, and reset. However, I'm not which c++ function I should use for such a task. As I can see, there are certain functions, but the debate on which one is the most appropriate is outside of my understanding. Note that some of the functions actually take into account the time taken by some other processes, but I want my timer to only measure the time of the execution of my c++ code (I hope that is clear). Another thing is the resolution of the measurements, as pointed out below. Suppose the medium option of those suggested.
High resolution timing is platform specific, and you have not specified in the question. The standard library clock() function returns a count that increments at CLOCKS_PER_SEC per second. On some platforms this may be fast enough to give you the resolution you need but you should check your system's tick rate since it is implementation defined. However if you find it is high enough then:
#define SAMPLE_PERIOD_MS 100
#define SAMPLE_PERIOD_TICKS ((CLOCKS_PER_SEC * SAMPLE_PERIOD_MS) / 1000)
int k=0;
int total=100;
clock_t measure_time = clock() + SAMPLE_PERIOD_TICKS ;
while(k<total)
{
doSomething();
if( clock() - measure_time > 0 )
{
measure();
measure_time += SAMPLE_PERIOD_TICKS ;
++k;
}
}
You might replace clock() with some other high-resolution clock source if necessary.
However note a couple of issues. This method is a "busy-loop"; unless either doSomething() or measure() yield the CPU, the process will take all the cpu cycles it can. If this is the only code running on a target, that may not matter. On the other hand is this is running on a general purpose OS such as Windows or Linux which are not real-time, the process may be pre-empted by other processes, and this may affect the accuracy of the sampling periodicity. If you need accurate timing use of an RTOS and performing doSomething() and measure() in separate threads would be better. Even in a GPOS that would be better. For example a general pattern (using a made-up API in teh absence of any specification) would be:
int main()
{
StartThread( measure_thread, HIGH_PRIORITY ) ;
for(;;)
{
doSomething() ;
}
}
void measure_thread()
{
for(;;)
{
measure() ;
sleep( SAMPLE_PERIOD_MS ) ;
}
}
The code for measure_thread() is only accurate if measure() takes a negligible time to run. If it takes significant time you may need to account for that. If it is non-deterministic, you may even have to measure its execution time in order to subtract it the sleep period.
I don't know if this is true, but when I was reading FAQ on one of the problem providing sites, I found something, that poke my attention:
Check your input/output methods. In C++, using cin and cout is too slow. Use these, and you will guarantee not being able to solve any problem with a decent amount of input or output. Use printf and scanf instead.
Can someone please clarify this? Is really using scanf() in C++ programs faster than using cin >> something ? If yes, that is it a good practice to use it in C++ programs? I thought that it was C specific, though I am just learning C++...
Here's a quick test of a simple case: a program to read a list of numbers from standard input and XOR all of the numbers.
iostream version:
#include <iostream>
int main(int argc, char **argv) {
int parity = 0;
int x;
while (std::cin >> x)
parity ^= x;
std::cout << parity << std::endl;
return 0;
}
scanf version:
#include <stdio.h>
int main(int argc, char **argv) {
int parity = 0;
int x;
while (1 == scanf("%d", &x))
parity ^= x;
printf("%d\n", parity);
return 0;
}
Results
Using a third program, I generated a text file containing 33,280,276 random numbers. The execution times are:
iostream version: 24.3 seconds
scanf version: 6.4 seconds
Changing the compiler's optimization settings didn't seem to change the results much at all.
Thus: there really is a speed difference.
EDIT: User clyfish points out below that the speed difference is largely due to the iostream I/O functions maintaining synchronization with the C I/O functions. We can turn this off with a call to std::ios::sync_with_stdio(false);:
#include <iostream>
int main(int argc, char **argv) {
int parity = 0;
int x;
std::ios::sync_with_stdio(false);
while (std::cin >> x)
parity ^= x;
std::cout << parity << std::endl;
return 0;
}
New results:
iostream version: 21.9 seconds
scanf version: 6.8 seconds
iostream with sync_with_stdio(false): 5.5 seconds
C++ iostream wins! It turns out that this internal syncing / flushing is what normally slows down iostream i/o. If we're not mixing stdio and iostream, we can turn it off, and then iostream is fastest.
The code: https://gist.github.com/3845568
http://www.quora.com/Is-cin-cout-slower-than-scanf-printf/answer/Aditya-Vishwakarma
Performance of cin/cout can be slow because they need to keep themselves in sync with the underlying C library. This is essential if both C IO and C++ IO is going to be used.
However, if you only going to use C++ IO, then simply use the below line before any IO operations.
std::ios::sync_with_stdio(false);
For more info on this, look at the corresponding libstdc++ docs.
Probably scanf is somewhat faster than using streams. Although streams provide a lot of type safety, and do not have to parse format strings at runtime, it usually has an advantage of not requiring excessive memory allocations (this depends on your compiler and runtime). That said, unless performance is your only end goal and you are in the critical path then you should really favour the safer (slower) methods.
There is a very delicious article written here by Herb Sutter "The String Formatters of Manor Farm" who goes into a lot of detail of the performance of string formatters like sscanf and lexical_cast and what kind of things were making them run slowly or quickly. This is kind of analogous, probably to the kind of things that would affect performance between C style IO and C++ style. The main difference with the formatters tended to be the type safety and the number of memory allocations.
I just spent an evening working on a problem on UVa Online (Factovisors, a very interesting problem, check it out):
http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=35&page=show_problem&problem=1080
I was getting TLE (time limit exceeded) on my submissions. On these problem solving online judge sites, you have about a 2-3 second time limit to handle potentially thousands of test cases used to evaluate your solution. For computationally intensive problems like this one, every microsecond counts.
I was using the suggested algorithm (read about in the discussion forums for the site), but was still getting TLEs.
I changed just "cin >> n >> m" to "scanf( "%d %d", &n, &m )" and the few tiny "couts" to "printfs", and my TLE turned into "Accepted"!
So, yes, it can make a big difference, especially when time limits are short.
If you care about both performance and string formatting, do take a look at Matthew Wilson's FastFormat library.
edit -- link to accu publication on that library: http://accu.org/index.php/journals/1539
The statements cin and cout in general use seem to be slower than scanf and printf in C++, but actually they are FASTER!
The thing is: In C++, whenever you use cin and cout, a synchronization process takes place by default that makes sure that if you use both scanf and cin in your program, then they both work in sync with each other. This sync process takes time. Hence cin and cout APPEAR to be slower.
However, if the synchronization process is set to not occur, cin is faster than scanf.
To skip the sync process, include the following code snippet in your program right in the beginning of main():
std::ios::sync_with_stdio(false);
Visit this site for more information.
There are stdio implementations (libio) which implements FILE* as a C++ streambuf, and fprintf as a runtime format parser. IOstreams don't need runtime format parsing, that's all done at compile time. So, with the backends shared, it's reasonable to expect that iostreams is faster at runtime.
Yes iostream is slower than cstdio.
Yes you probably shouldn't use cstdio if you're developing in C++.
Having said that, there are even faster ways to get I/O than scanf if you don't care about formatting, type safety, blah, blah, blah...
For instance this is a custom routine to get a number from STDIN:
inline int get_number()
{
int c;
int n = 0;
while ((c = getchar_unlocked()) >= '0' && c <= '9')
{
// n = 10 * n + (c - '0');
n = (n << 3) + ( n << 1 ) + c - '0';
}
return n;
}
The problem is that cin has a lot of overhead involved because it gives you an abstraction layer above scanf() calls. You shouldn't use scanf() over cin if you are writing C++ software because that is want cin is for. If you want performance, you probably wouldn't be writing I/O in C++ anyway.
Of course it's ridiculous to use cstdio over iostream. At least when you develop software (if you are already using c++ over c, then go all the way and use it's benefits instead of only suffering from it's disadvantages).
But in the online judge you are not developing software, you are creating a program that should be able to do things Microsoft software takes 60 seconds to achieve in 3 seconds!!!
So, in this case, the golden rule goes like (of course if you dont get into even more trouble by using java)
Use c++ and use all of it's power (and heaviness/slowness) to solve the problem
If you get time limited, then change the cins and couts for printfs and scanfs
(if you get screwed up by using the class string, print like this: printf(%s,mystr.c_str());
If you still get time limited, then try to make some obvious optimizations (like avoiding too many embedded for/while/dowhiles or recursive functions). Also make sure to pass by reference objects that are too big...
If you still get time limited, then try changing std::vectors and sets for c-arrays.
If you still get time limited, then go on to the next problem...
#include <stdio.h>
#include <unistd.h>
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
static int scanuint(unsigned int* x)
{
char c;
*x = 0;
do
{
c = getchar_unlocked();
if (unlikely(c==EOF)) return 1;
} while(c<'0' || c>'9');
do
{
//*x = (*x<<3)+(*x<<1) + c - '0';
*x = 10 * (*x) + c - '0';
c = getchar_unlocked();
if (unlikely(c==EOF)) return 1;
} while ((c>='0' && c<='9'));
return 0;
}
int main(int argc, char **argv) {
int parity = 0;
unsigned int x;
while (1 != (scanuint(&x))) {
parity ^= x;
}
parity ^=x;
printf("%d\n", parity);
return 0;
}
There's a bug at the end of the file, but this C code is dramatically faster than the faster C++ version.
paradox#scorpion 3845568-78602a3f95902f3f3ac63b6beecaa9719e28a6d6 ▶ make test
time ./xor-c < rand.txt
360589110
real 0m11,336s
user 0m11,157s
sys 0m0,179s
time ./xor2-c < rand.txt
360589110
real 0m2,104s
user 0m1,959s
sys 0m0,144s
time ./xor-cpp < rand.txt
360589110
real 0m29,948s
user 0m29,809s
sys 0m0,140s
time ./xor-cpp-noflush < rand.txt
360589110
real 0m7,604s
user 0m7,480s
sys 0m0,123s
The original C++ took 30sec the C code took 2sec.
Even if scanf were faster than cin, it wouldn't matter. The vast majority of the time, you will be reading from the hard drive or the keyboard. Getting the raw data into your application takes orders of magnitude more time than it takes scanf or cin to process it.
I am trying to add a timed delay in a C++ program, and was wondering if anyone has any suggestions on what I can try or information I can look at?
I wish I had more details on how I am implementing this timed delay, but until I have more information on how to add a timed delay I am not sure on how I should even attempt to implement this.
An updated answer for C++11:
Use the sleep_for and sleep_until functions:
#include <chrono>
#include <thread>
int main() {
using namespace std::this_thread; // sleep_for, sleep_until
using namespace std::chrono; // nanoseconds, system_clock, seconds
sleep_for(nanoseconds(10));
sleep_until(system_clock::now() + seconds(1));
}
With these functions there's no longer a need to continually add new functions for better resolution: sleep, usleep, nanosleep, etc. sleep_for and sleep_until are template functions that can accept values of any resolution via chrono types; hours, seconds, femtoseconds, etc.
In C++14 you can further simplify the code with the literal suffixes for nanoseconds and seconds:
#include <chrono>
#include <thread>
int main() {
using namespace std::this_thread; // sleep_for, sleep_until
using namespace std::chrono_literals; // ns, us, ms, s, h, etc.
using std::chrono::system_clock;
sleep_for(10ns);
sleep_until(system_clock::now() + 1s);
}
Note that the actual duration of a sleep depends on the implementation: You can ask to sleep for 10 nanoseconds, but an implementation might end up sleeping for a millisecond instead, if that's the shortest it can do.
In Win32:
#include<windows.h>
Sleep(milliseconds);
In Unix:
#include<unistd.h>
unsigned int microsecond = 1000000;
usleep(3 * microsecond);//sleeps for 3 second
sleep() only takes a number of seconds which is often too long.
#include <unistd.h>
usleep(3000000);
This will also sleep for three seconds. You can refine the numbers a little more though.
Do you want something as simple like:
#include <unistd.h>
sleep(3);//sleeps for 3 second
Note that this does not guarantee that the amount of time the thread sleeps will be anywhere close to the sleep period, it only guarantees that the amount of time before the thread continues execution will be at least the desired amount. The actual delay will vary depending on circumstances (especially load on the machine in question) and may be orders of magnitude higher than the desired sleep time.
Also, you don't list why you need to sleep but you should generally avoid using delays as a method of synchronization.
You can try this code snippet:
#include<chrono>
#include<thread>
int main(){
std::this_thread::sleep_for(std::chrono::nanoseconds(10));
std::this_thread::sleep_until(std::chrono::system_clock::now() + std::chrono::seconds(1));
}
You can also use select(2) if you want microsecond precision (this works on platform that don't have usleep(3))
The following code will wait for 1.5 second:
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>`
int main() {
struct timeval t;
t.tv_sec = 1;
t.tv_usec = 500000;
select(0, NULL, NULL, NULL, &t);
}
`
I found that "_sleep(milliseconds);" (without the quotes) works well for Win32 if you include the chrono library
E.g:
#include <chrono>
using namespace std;
main
{
cout << "text" << endl;
_sleep(10000); // pauses for 10 seconds
}
Make sure you include the underscore before sleep.
Yes, sleep is probably the function of choice here. Note that the time passed into the function is the smallest amount of time the calling thread will be inactive. So for example if you call sleep with 5 seconds, you're guaranteed your thread will be sleeping for at least 5 seconds. Could be 6, or 8 or 50, depending on what the OS is doing. (During optimal OS execution, this will be very close to 5.) Another useful feature of the sleep function is to pass in 0. This will force a context switch from your thread.
Some additional information:
http://www.opengroup.org/onlinepubs/000095399/functions/sleep.html
The top answer here seems to be an OS dependent answer; for a more portable solution you can write up a quick sleep function using the ctime header file (although this may be a poor implementation on my part).
#include <iostream>
#include <ctime>
using namespace std;
void sleep(float seconds){
clock_t startClock = clock();
float secondsAhead = seconds * CLOCKS_PER_SEC;
// do nothing until the elapsed time has passed.
while(clock() < startClock+secondsAhead);
return;
}
int main(){
cout << "Next string coming up in one second!" << endl;
sleep(1.0);
cout << "Hey, what did I miss?" << endl;
return 0;
}
to delay output in cpp for fixed time, you can use the Sleep() function by including windows.h header file
syntax for Sleep() function is Sleep(time_in_ms)
as
cout<<"Apple\n";
Sleep(3000);
cout<<"Mango";
OUTPUT. above code will print Apple and wait for 3 seconds before printing Mango.
Syntax:
void sleep(unsigned seconds);
sleep() suspends execution for an interval (seconds).
With a call to sleep, the current program is suspended from execution for the number of seconds specified by the argument seconds. The interval is accurate only to the nearest hundredth of a second or to the accuracy of the operating system clock, whichever is less accurate.
Many others have provided good info for sleeping. I agree with Wedge that a sleep seldom the most appropriate solution.
If you are sleeping as you wait for something, then you are better off actually waiting for that thing/event. Look at Condition Variables for this.
I don't know what OS you are trying to do this on, but for threading and synchronisation you could look to the Boost Threading libraries (Boost Condition Varriable).
Moving now to the other extreme if you are trying to wait for exceptionally short periods then there are a couple of hack style options. If you are working on some sort of embedded platform where a 'sleep' is not implemented then you can try a simple loop (for/while etc) with an empty body (be careful the compiler does not optimise it away). Of course the wait time is dependant on the specific hardware in this case.
For really short 'waits' you can try an assembly "nop". I highly doubt these are what you are after but without knowing why you need to wait it's hard to be more specific.
On Windows you can include the windows library and use "Sleep(0);" to sleep the program. It takes a value that represents milliseconds.