On PC hardware, there are basically three timers that are generally 
available:

1) the "tick" counter. This is a millisecond-type timer, and is 
   read by calls such as timeGetTime() or GetTickCount(). It is 
   driven by the age-old ISA interrupt controller (which by now 
   is probably virtualized into some do-everything-legacy chip 
   three bridges away from the CPU on the motherboard).
   This is sort-of millisecond precise. Reading it requires a 
   bus transaction all the way out to the LPC ISA interface, so 
   it's fairly slow as such things go (say, 1-2 microseconds).
   (boo!) If there's heavy bus traffic, such as running a game 
   which makes good use of AGP and DMA for graphics and audio, 
   this timer will glitch and lose a tick or two every so often, 
   slowly falling behind. (boo!)

2) the "performance" counter. This is a microsecond-type timer, 
   and is read by the QueryPerformanceCounter() call. It 
   typically advances by a little over 1 million ticks per second, 
   or a little over 3 million ticks per second, depending on the bus. 
   It is derived from, I think, some PCI controller counter.
   Reading it requires, at least, a CPU->northbridge bus 
   transaction, and as such is not super snappy. Call it 1 
   microsecond. (boo!)
   If there's heavy bus traffic, on many common chip sets, even 
   very new ones, this timer may suddenly jump between one and 
   four seconds forward in time (boo!). I believe there is some 
   internal race related to this counter being 64 bits but kept in 
   two separate 32-bit registers, though don't hold me to that.

3) the "time stamp" counter. This is a register internal to the 
   CPU, which counts, using a 64 bit timer, the number of CPU 
   clocks executed since system power-on (more or less). On a 
   four GHz machine, it will take over a hundred and thirty years 
   for this counter to wrap (yay!). Reading this counter stays 
   within the CPU, and as such is really fast (call it 40 cycles)
   (yay!). Because it really measures CPU clock ticks on all but 
   the most exotic architectures (read: Transmeta :-), this 
   counter is extremely useful for determining exactly how long a 
   piece of code takes to execute, on a micro-benchmarking scale. 
   You still have to compensate for the fact that you may take an 
   interrupt in the middle of measurement, so run the same piece 
   of code 100 times and pick the BEST value (see the sketch after 
   this list) -- that's how well your code can perform, which is 
   what you want to measure unless you're trying to model starting 
   with cold caches. (yay!)

   Unfortunately, this third model, if you want to turn it into 
   real time, needs the relation between CPU ticks and seconds. 
   This can be had by profiling the CPU on start-up, using some of 
   the other timers available. Such profiling always has a little 
   bit of imprecision (boo!). What's worse, on laptops, the rate 
   of advance on this register will fluctuate wildly, because, you 
   guessed it, the CPU speed fluctuates wildly. (boo!)
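
To make the three counters above concrete, here's a rough sketch of 
how each one is read on Windows, plus the pick-the-best-of-100 loop 
from item 3. It assumes MSVC (so __int64 and the __rdtsc() 
intrinsic) and winmm.lib for timeGetTime(); the helper names are 
mine, not any standard API:

 #include <windows.h>
 #include <mmsystem.h>   // timeGetTime(); link with winmm.lib
 #include <intrin.h>     // __rdtsc()
 #include <stdio.h>

 // Read and print each of the three counters once.
 void show_counters() {
   DWORD tick = timeGetTime();         // 1) the "tick" counter, ~1 ms
   LARGE_INTEGER pc, pcf;
   QueryPerformanceCounter( &pc );     // 2) the "performance" counter
   QueryPerformanceFrequency( &pcf );  //    ...and its ticks-per-second
   unsigned __int64 tsc = __rdtsc();   // 3) the CPU's time stamp counter
   printf( "tick=%lu ms  perf=%lld of %lld Hz  tsc=%llu\n",
           tick, pc.QuadPart, pcf.QuadPart, tsc );
 }

 // Micro-benchmark with the time stamp counter: run the code 100 times
 // and keep the BEST (smallest) tick count, so that an interrupt taken
 // in the middle of a run doesn't pollute the measurement.
 unsigned __int64 best_ticks_of_100( void (*code_under_test)() ) {
   unsigned __int64 best = (unsigned __int64)-1;
   for( int i = 0; i < 100; ++i ) {
     unsigned __int64 a = __rdtsc();
     code_under_test();
     unsigned __int64 b = __rdtsc();
     if( b - a < best ) best = b - a;
   }
   return best;
 }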

Thus, there is no fool-proof way of measuring elapsed time on a PC 
in a really accurate manner. If you're trying to keep a simulation 
in sync over a network, say, this may be of high importance to you 
so you'll probably, in the end, come up with some system which 
uses a combination of all three. I use the cycle counter for most 
measurements, because it's fast and cheap, and then I continually 
re-calibrate by using the other two counters and having them vote. 
If they vote differently, I discard that sample and wait until 
later. This seems to work reasonably well.
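
As a minimal sketch of that voting check (the name and the 
2 millisecond tolerance are just for illustration): measure the 
same interval with both timeGetTime() and QueryPerformanceCounter(), 
and only accept a re-calibration sample when the two agree:

 #include <windows.h>
 #include <math.h>

 // Accept a calibration sample only when the tick counter and the
 // performance counter agree on how much wall-clock time has passed.
 bool counters_agree( DWORD tick_then, LARGE_INTEGER pc_then,
                      DWORD tick_now, LARGE_INTEGER pc_now,
                      LARGE_INTEGER pc_freq ) {
   double tick_elapsed = (tick_now - tick_then) * 0.001;
   double perf_elapsed = double(pc_now.QuadPart - pc_then.QuadPart)
                       / double(pc_freq.QuadPart);
   // If they disagree by more than a couple of milliseconds, one of
   // them glitched; discard this sample and wait for the next one.
   return fabs( tick_elapsed - perf_elapsed ) < 0.002;
 }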

On the CPU-speed-switching machines, you are mostly toast. The best 
you can do is selectively instantiate another class for your clock, 
and hope that the effect of jumping timer inaccuracy won't totally 
kill your application.

I wrote the following answer to a similar question on the
GDAlgorithms mailing list:

We use QueryPerformanceCounter()/QueryPerformanceFrequency()
to get the initial two measurements, putting a Sleep(50) in
the middle, and looping back if QPC says we slept for more
than 200 milliseconds. There's a watch-dog, so after 5
tries, it gives up and accepts whatever numbers we got.

Also, the QPC call itself isn't allowed to take more than
some number of RDTSC cycles (I think we settled on 100,000)
to avoid pre-emption or interrupts during the timing from
throwing us off.

As I've said before on this list, we use rdtsc during
intra-frame measurements, but once a second or so, we
re-calibrate the timers by calling timeGetTime() and
QueryPerformanceCounter() to vote about who slipped the
least, and update the baseline. We also use these
measurements to update the estimate for CPU frequency. We
use the highest CPU frequency ever detected, because of
SpeedStep; this means that during CPU idle, time using RDTSC
will advance more slowly, and then skip to catch up when we
next re-calibrate the clock. For us, this is better than
sometimes returning time values that are ahead, because the
CPU slowed down and we don't know it yet.

No, there are no good real-time timers on current laptops.
The next generation of chip sets will support the "advanced
timer" feature, which will hopefully make this better -- but
that was the idea with the PCI timers (QPC), too, and bugs
in implementation made them much less useful than they
should have been.

Thus, the code looks something like:

 #include <windows.h>
 #include <intrin.h>   // for the __rdtsc() intrinsic

 // Maximum number of RDTSC cycles a QPC read is allowed to take (see above).
 #define READING_THRESHOLD 100000

 static inline __int64 rdtsc() { return (__int64)__rdtsc(); }

 void read_counters( __int64 * otsc, __int64 * opc ) {
   __int64 rdtsc_a, rdtsc_b;
   __int64 pca;
   int cnt = 0;
 again:
   rdtsc_a = rdtsc();
   QueryPerformanceCounter( (LARGE_INTEGER *)&pca );
   rdtsc_b = rdtsc();
   // If the QPC read took too long (pre-emption or an interrupt hit us),
   // try again, at most 5 times.
   if( rdtsc_b - rdtsc_a > READING_THRESHOLD && cnt++ < 5 ) goto again;
   *otsc = (rdtsc_a + rdtsc_b)/2;
   *opc = pca;
 }

 __int64 calibrate_rdtsc() {
   __int64 pca, pcb, pcf, tsca, tscb;
   double interval;

   QueryPerformanceFrequency( (LARGE_INTEGER *)&pcf );

   for( int ix = 0; ix < 5; ++ix ) {
     read_counters( &tsca, &pca );
     Sleep( 50 );
     read_counters( &tscb, &pcb );
     interval = double(pcb-pca)/double(pcf);
     // Accept the measurement only if we actually slept for less than 200 ms.
     if( interval < 0.2 ) {
       return (__int64)( double(tscb-tsca) / interval );
     }
   }
   // Watch-dog: after 5 tries, accept whatever numbers we got.
   return (__int64)( double(tscb-tsca) / interval );
 }
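
The once-a-second re-calibration itself might look roughly like the 
sketch below. This is not our actual code, just an illustration: it 
reuses read_counters() from above, and keeps the highest CPU 
frequency ever detected so that RDTSC-based time can only fall 
behind (and then skip forward), never run ahead:

 // Sketch only: intra-frame time comes from RDTSC, and once a second
 // the caller passes in an elapsed-time value that timeGetTime() and
 // QueryPerformanceCounter() have voted on.
 struct clock_state {
   __int64 base_tsc;        // TSC value at the last re-calibration
   double  base_seconds;    // wall-clock time at the last re-calibration
   double  tsc_per_second;  // highest CPU frequency seen; start it with
                            // the estimate from calibrate_rdtsc()
 };

 double now_seconds( clock_state & cs ) {
   __int64 tsc, pc;
   read_counters( &tsc, &pc );
   return cs.base_seconds + double(tsc - cs.base_tsc) / cs.tsc_per_second;
 }

 void recalibrate( clock_state & cs, double trusted_elapsed_seconds ) {
   __int64 tsc, pc;
   read_counters( &tsc, &pc );
   double measured = double(tsc - cs.base_tsc) / trusted_elapsed_seconds;
   if( measured > cs.tsc_per_second )
     cs.tsc_per_second = measured;   // because of SpeedStep, keep the max
   cs.base_seconds += trusted_elapsed_seconds;
   cs.base_tsc = tsc;
 }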