Flushing denormalised numbers to zero - c++

I've scoured the web to no avail.
Is there a way for Xcode and Visual C++ to treat denormalised numbers as 0? I would have thought there's an option in the IDE preferences to turn on this option but can't seem to find it.
I'm doing some cross-platform audio stuff and need to stop certain processors hogging resources.
Cheers

You're looking for a platform-defined way to set FTZ and/or DAZ in the MXCSR register (on x86 with SSE or x86-64); see https://stackoverflow.com/a/2487733/567292
Usually this is called something like _controlfp; Microsoft documentation is at http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
You can also use the _MM_SET_FLUSH_ZERO_MODE macro: http://msdn.microsoft.com/en-us/library/a8b5ts9s(v=vs.71).aspx - this is probably the most cross-platform portable method.

For disabling denormals globally I use these two macros:
//warning: these macros have to be used in the same scope
#define MXCSR_SET_DAZ_AND_FTZ \
int oldMXCSR__ = _mm_getcsr(); /*read the old MXCSR setting */ \
int newMXCSR__ = oldMXCSR__ | 0x8040; /* set DAZ and FZ bits */ \
_mm_setcsr( newMXCSR__ ); /*write the new MXCSR setting to the MXCSR */
#define MXCSR_RESET_DAZ_AND_FTZ \
/*restore old MXCSR settings to turn denormals back on if they were on*/ \
_mm_setcsr( oldMXCSR__ );
I call the first one at the beginning of the process and the second at the end.
Unfortunately this doesn't seem to work well on Windows.
To flush denormals locally I use this:
const Float32 k_DENORMAL_DC = 1e-25f;
inline void FlushDenormalToZero(Float32& ioFloat)
{
ioFloat += k_DENORMAL_DC;
ioFloat -= k_DENORMAL_DC;
}

See update (4 Aug 2022) at the end of this entry.
To do this, use the Intel Intrinsics macros during program startup. For example:
#include <immintrin.h>
int main() {
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}
In my version of MSVC, this emitted the following assembly code:
stmxcsr DWORD PTR tv805[rsp]
mov eax, DWORD PTR tv805[rsp]
bts eax, 15
mov DWORD PTR tv807[rsp], eax
ldmxcsr DWORD PTR tv807[rsp]
MXCSR is the control and status register, and this code is setting bit 15, which turns flush zero mode on.
One thing to note: this only affects denormals resulting from a computation. If you want to also set denormals to zero if they're used as input, you also need to set the DAZ flag (denormals are zero), using the following command:
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
See https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-setting-the-ftz-and-daz-flags for more information.
Also note that you need to set MXCSR for each thread, as the values contained are local to each thread.
Update 4 Aug 2022
I've now had to deal with ARM processors as well. The following is a cross-platform macro that works on ARM and Intel:
#ifndef __ARM_ARCH
extern "C" {
extern unsigned int _mm_getcsr();
extern void _mm_setcsr(unsigned int);
}
#define MY_FAST_FLOATS _mm_setcsr(_mm_getcsr() | 0x8040U)
#else
#define MY_FPU_GETCW(fpcr) __asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr))
#define MY_FPU_SETCW(fpcr) __asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr))
#define MY_FAST_FLOATS \
{ \
uint64_t eE2Hsb4v {}; /* random name to avoid shadowing warnings */ \
MY_FPU_GETCW(eE2Hsb4v); \
eE2Hsb4v |= (1 << 24) | (1 << 19); /* FZ flag, FZ16 flag; flush denormals to zero */ \
MY_FPU_SETCW(eE2Hsb4v); \
} \
static_assert(true, "require semi-colon after macro with this assert")
#endif

Related

Is there a way to check if virtualisation is enabled on the BIOS using C++?

I'm trying to check if virtualisation (AMD-V or Intel VT) is enabled programmatically. I know the bash commands that give you this information, but I'm trying to achieve this in C++.
On that note, I'm trying to avoid using std::system to execute shell code because of how hacky and suboptimal that solution is. Thanks in advance!
To check whether VMX or SVM (Intel's and AMD's virtualization technologies) are enabled, you need to use the cpuid instruction.
This instruction comes up so often in similar tests that the mainstream compilers all have an intrinsic for it; you don't need inline assembly.
For Intel's CPUs you have to check CPUID.1.ecx[5] (VMX), while for AMD's you have to check CPUID.8000_0001h.ecx[2] (SVM).
Here's an example. I only have gcc on an Intel CPU; you'll need to properly test and fix this code for other compilers and AMD CPUs.
The principles I followed are:
Unsupported compilers and non-x86 architecture should fail to compile the code.
If run on an x86-compatible CPU that is neither Intel nor AMD, the function returns false.
The code assumes cpuid exists; for Intel's CPUs this is true since the 486. The code is designed so that cpuid_ex can return false to denote that cpuid is not present, but I didn't implement a test for it because that would require inline assembly. GCC and clang have a built-in for this, but MSVC doesn't.
Note that if you only build 64-bit binaries you always have cpuid, you can remove the 32-bit architecture check so that if someday somebody tries to use a 32-bit build, the code fails to compile.
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__clang__) || defined(__GNUC__)
#include <cpuid.h>
#endif
#include <string>
#include <iostream>
#include <cstdint>
//Execute cpuid with leaf "leaf" and given the temporary array x holding the
//values of eax, ebx, ecx and edx (after cpuid), copy x[start:end) into regs
//converting each item in a uint32_t value (so this interface is the same even
//for esoteric ILP64 compilers).
//If CPUID is not supported, returns false (NOTE: this check is not done; CPUID
//exists since the 486, it's up to you to implement a test. GCC & clang have a
//nice macro for this. MSVC doesn't. ICC I don't know.)
bool cpuid_ex(unsigned int leaf, uint32_t* regs, size_t start = 0, size_t end = 4)
{
#if ( ! defined(__x86_64__) && ! defined(__i386__) && ! defined(_M_IX86) && ! defined(_M_X64))
//Not an x86
return false;
#elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
//MS & Intel
int x[4];
__cpuid((int*)x, leaf);
#elif defined(__clang__) || defined(__GNUC__)
//GCC & clang
unsigned int x[4];
__cpuid(leaf, x[0], x[1], x[2], x[3]);
#else
//Unknown compiler
static_assert(false, "cpuid_ex: compiler is not supported");
#endif
//Conversion from x[i] to uint32_t is safe since GP registers are 32-bit at least
//if we are using cpuid
for (; start < end; start++)
*regs++ = static_cast<uint32_t>(x[start]);
return true;
}
//Check for Intel and AMD virtualization.
bool support_virtualization()
{
//Get the signature
uint32_t signature_reg[3] = {0};
if ( ! cpuid_ex(0, signature_reg, 1))
//cpuid is not supported, this returns false but you may want to throw
return false;
uint32_t features;
//Is Intel? Check CPUID.1.ecx[5] (VMX)
if (signature_reg[0] == 0x756e6547 && signature_reg[1] == 0x6c65746e && signature_reg[2] == 0x49656e69)
{
cpuid_ex(1, &features, 2, 3);
return features & 0x20;
}
//Is AMD? Check CPUID.8000_0001h.ecx[2] (SVM)
if (signature_reg[0] == 0x68747541 && signature_reg[1] == 0x444d4163 && signature_reg[2] == 0x69746e65)
{
cpuid_ex(0x80000001u, &features, 2, 3);
return features & 0x04;
}
//Not intel or AMD, this returns false but you may want to throw
return false;
}
int main()
{
std::cout << "Virtualization is " << (support_virtualization() ? "": "NOT ") << "supported\n";
return 0;
}
No, there is no way to detect virtualization support using only standard C++ facilities. The C++ standard library does not have anything related to hardware virtualization. (Its exposure of low level details like that is extremely limited.)
You would need to use OS-specific facilities to detect this.

How to get the Process Environment Block (PEB) address using assembler (x64 OS)?

I'm trying to get PEB address of the current process with assembler.
the cpp file:
#include <iostream>
//#include <windows.h>
extern "C" int* __ptr64 Get_Ldr_Addr();
int main(int argc, char **argv)
{
std::cout << "asm " << Get_Ldr_Addr() << "\n";
//std::cout <<"peb "<< GetModuleHandle(0) << "\n";
return 0;
}
the asm file:
.code
Get_Ldr_Addr proc
push rax
mov rax, GS:[30h]
mov rax, [rax + 60h]
pop rax
ret
Get_Ldr_Addr endp
end
But I get different addresses from GetModuleHandle(0) and Get_Ldr_Addr()!
What is the problem? Aren't they supposed to be the same?
Q: If the function is external, will it check the PEB of the process that called it, or of the function's DLL (it's supposed to be a DLL)?
Tnx
If you don't mind skipping the assembler: the following works in Microsoft Visual Studio 2015.
It uses the "__readgsqword()" intrinsic.
#include <winnt.h>
#include <winternl.h>
// Thread Environment Block (TEB)
#if defined(_M_X64) // x64
PTEB tebPtr = reinterpret_cast<PTEB>(__readgsqword(reinterpret_cast<DWORD_PTR>(&static_cast<NT_TIB*>(nullptr)->Self)));
#else // x86
PTEB tebPtr = reinterpret_cast<PTEB>(__readfsdword(reinterpret_cast<DWORD_PTR>(&static_cast<NT_TIB*>(nullptr)->Self)));
#endif
// Process Environment Block (PEB)
PPEB pebPtr = tebPtr->ProcessEnvironmentBlock;
Just two comments.
No need to push/pop rax because it's a scratch or volatile register on Windows, see the caller/callee saved registers. In particular, rax will hold the return value for your function.
It often helps to step through the machine code when you call GetModuleHandle() and compare it with your own assembly code. You'll probably encounter something like this implementation.
I like Sirmabus' answer but I much prefer it with simple C casts and the offsetof macro:
PPEB get_peb()
{
#if defined(_M_X64) // x64
PTEB tebPtr = (PTEB)__readgsqword(offsetof(NT_TIB, Self));
#else // x86
PTEB tebPtr = (PTEB)__readfsdword(offsetof(NT_TIB, Self));
#endif
return tebPtr->ProcessEnvironmentBlock;
}
Get_Ldr_Addr didn't save your result: the pop rax overwrites the value you just loaded.
You should not preserve rax with push/pop, because rax holds the return value.
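Putting both comments together, a corrected version of the .asm file might look like this (a sketch, untested; it relies on the documented x64 layout where gs:[30h] is the TEB and the PEB pointer sits at offset 60h within it):

```asm
.code
Get_Ldr_Addr proc
    mov rax, gs:[30h]       ; rax = TEB
    mov rax, [rax + 60h]    ; rax = TEB->ProcessEnvironmentBlock
    ret                     ; result stays in rax, the return register
Get_Ldr_Addr endp
end
```

Since the TEB self-pointer isn't otherwise needed here, you could equally read the PEB in one step with mov rax, gs:[60h].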

Programmatically get processor details from Mac OS X

My application, running on Mac OS X, needs to retrieve details about the machine it is running on to report system information. One of the items I need is details about the processor(s) installed in the computer.
My code currently works, but is far from an ideal solution, in fact I consider it a bad solution, but I have had no luck in finding a better one.
The information I report currently and after some formatting looks like:
Processor: Intel Core 2 Duo 2.1 GHz, Family 6 Model 23 Stepping 6
All of the info I get is through command-line utilities called from a popen(). The readable part of the processor description is taken from the "system_profiler" command output and the Family, Model, and Stepping values are taken from the "sysctl" command.
These command-line utilities must be getting their information from somewhere. I'm wondering if there is a programmatic interface available to get this same info?
Related:
How can display driver version be obtained on the Mac?
Use sysctlbyname rather than sysctl, e.g.
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
uint64_t get_cpu_freq(void)
{
uint64_t freq = 0;
size_t size = sizeof(freq);
if (sysctlbyname("hw.cpufrequency", &freq, &size, NULL, 0) < 0)
{
perror("sysctl");
}
return freq;
}
You can get a list of the names that can be passed to sysctlbyname by looking at the output of sysctl -a from the command line.
You need to look at the IOKit APIs. The IORegistryExplorer application (part of the standard devtools installation) will help you locate what you're looking for.
For instance, on my MacBook Pro, in IORegistryExplorer I select 'IODeviceTree' from the pull-down at the top-left of the window, and I can see two CPUs in the tree view below. Selecting either one gets me the following information:
IORegistryExplorer screenshot http://blog.alanquatermain.net/images/IORegistryExplorer-CPUs.png
'bus-frequency', 'clock-frequency', and 'timebase-frequency' are all 32-bit integers wrapped in data objects, and must therefore be byte-swapped to be interpreted here (they're little-endian i386 machine words); they work out to the following values:
bus-frequency: 1064000000 Hz => 1.064 GHz
clock-frequency: 2530000000 Hz => 2.53 GHz
timebase-frequency: 1000000000 Hz => 1.0 GHz
If you're reading these via IOKit however, you'll get back a CFDataRef, and can just copy the bytes into your own uint32_t like so:
uint32_t bus_frequency = 0;
CFDataGetBytes( theData, (UInt8 *) &bus_frequency, sizeof(uint32_t) );
Next, you can get processor info using the NXArchInfo() call obtained by including <mach-o/arch.h>. This will return a structure containing cpu type and subtype codes along with C-string names and descriptions. If that doesn't include a stepping ID, the only way I can think of to obtain that (off the top of my head) is via the CPUID instruction. Create a .s and .h file, and put in the following code:
.s file:
#if defined(__i386__) || defined(__x86_64__)
.macro ENTRY
.text
.private_extern $0
.align 4, 0x90
$0:
.endmacro
// __private_extern__ unsigned long GetCPUSteppingID( void )
ENTRY _GetCPUSteppingID
#if __x86_64__
push %rbp // store existing frame pointer
mov %rsp,%rbp // make a new frame pointer from stack pointer
push %rbx // we save %rbx because the cpuid instruction
// will overwrite it, and it's expected
// to be unchanged in the caller after
// calling another function
#else
push %ebp // store existing frame pointer
mov %esp,%ebp // make a new frame pointer from stack pointer
push %ebx // likewise save %ebx on 32-bit
#endif
movl $1,%eax // fetch cpu info
cpuid // stepping-id is in the low 4 bits of %eax now
andl $0x0000000f,%eax // clear out everything we don't want; writing
// %eax zero-extends into %rax, so this is the
// result register on both architectures
#if __x86_64__
pop %rbx // restore saved value of %rbx
#else
pop %ebx // restore saved value of %ebx
#endif
leave // restores prior stack frame from the frame pointer
ret // returns to caller
#endif // __i386__ || __x86_64__
.h file:
#ifndef __GET_STEPPING_ID__
#define __GET_STEPPING_ID__
/* unsigned long is register-sized on both 32-bit and 64-bit OS X */
__private_extern__ unsigned long GetCPUSteppingID( void );
#endif /* __GET_STEPPING_ID__ */
Please note that I'm not certain about the x86_64 bit above; in theory what I've typed there will ensure that the same code compiles for 64-bit, and will return a 64-bit value in that case. It will also save/restore the %rbx register, the 64-bit version of the %ebx register. Theoretically that will cover all bases.
sysctl(3) is probably a good place to start. You probably want the stuff defined by the CTL_HW selectors.
A variant of Paul R's method
#include <iostream>
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>
void show_cpu_info(void)
{
char buffer[1024];
size_t size=sizeof(buffer);
if (sysctlbyname("machdep.cpu.brand_string", &buffer, &size, NULL, 0) < 0) {
perror("sysctl");
}
std::cout << buffer << '\n';
}
will directly show something like Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz.
If you specifically want CPU information then use the cpuid instruction (in C, via inline assembly or a compiler intrinsic). It gives all the available information about a CPU, including its family, model, vendor, number of cores, etc. Primarily, all APIs use this instruction to retrieve CPU information. You can find detailed information on CPUID on the web, including sample code and tutorials.

Inline io wait using MASM

How to convert this to use VC++ and MASM
static __inline__ void io_wait(void)
{
asm volatile("jmp 1f;1:jmp 1f;1:");
}
I know asm changes to __asm and we remove the volatile, but what's next?
I am trying to create the function to place in the code below.
#define PIC1 0x20
#define PIC2 0xA0
#define PIC1_COMMAND PIC1
#define PIC1_DATA (PIC1+1)
#define PIC2_COMMAND PIC2
#define PIC2_DATA (PIC2+1)
#define PIC_EOI 0x20
#define ICW1_ICW4 0x01 /* ICW4 (not) needed */
#define ICW1_SINGLE 0x02 /* Single (cascade) mode */
#define ICW1_INTERVAL4 0x04 /* Call address interval 4 (8) */
#define ICW1_LEVEL 0x08 /* Level triggered (edge) mode */
#define ICW1_INIT 0x10 /* Initialization - required! */
#define ICW4_8086 0x01 /* 8086/88 (MCS-80/85) mode */
#define ICW4_AUTO 0x02 /* Auto (normal) EOI */
#define ICW4_BUF_SLAVE 0x08 /* Buffered mode/slave */
#define ICW4_BUF_MASTER 0x0C /* Buffered mode/master */
#define ICW4_SFNM 0x10 /* Special fully nested (not) */
void remap_pics(int pic1, int pic2)
{
UCHAR a1, a2;
a1=ReadPort8(PIC1_DATA);
a2=ReadPort8(PIC2_DATA);
WritePort8(PIC1_COMMAND, ICW1_INIT+ICW1_ICW4);
io_wait();
WritePort8(PIC2_COMMAND, ICW1_INIT+ICW1_ICW4);
io_wait();
WritePort8(PIC1_DATA, pic1);
io_wait();
WritePort8(PIC2_DATA, pic2);
io_wait();
WritePort8(PIC1_DATA, 4);
io_wait();
WritePort8(PIC2_DATA, 2);
io_wait();
WritePort8(PIC1_DATA, ICW4_8086);
io_wait();
WritePort8(PIC2_DATA, ICW4_8086);
io_wait();
WritePort8(PIC1_DATA, a1);
WritePort8(PIC2_DATA, a2);
}
I think you'll have better luck by telling us what you're trying to do with this code. Neither of the platforms supported by VC++ will wait for IO completion by executing an unconditional jump.
Nevertheless, given your example, there are a few things to clear up first:
"1f" is not a hexadecimal address here: in GNU assembler syntax, digit-only labels may be repeated, and jmp 1f means "jump forward to the next label 1"
VC++ inline assembly doesn't support label names consisting only of digits, and two identically-named labels would collide, so each label needs a unique name in the translation
Barring that, your code translated to VC++ should look like this:
__asm {
jmp a1
a1:
jmp b1
b1:
}
But this will not buy you anything useful on modern hardware, so please state what you're trying to accomplish.
This is GNU gas syntax: jmp 1f means "jump forward to the next label 1".
static __inline__ void io_wait(void)
{
#ifdef __GNUC__
asm volatile("jmp 1f;1:jmp 1f;1:");
#else
/* MSVC x86 supports inline asm */
__asm {
jmp a1
a1:
jmp b1
b1:
}
#endif
}

How to detect what CPU is being used during runtime?

How can I detect which CPU is being used at runtime? The C++ code needs to differentiate between AMD and Intel architectures. Using gcc 4.2.
The cpuid instruction, used with EAX=0 will return a 12-character vendor string in EBX, EDX, ECX, in that order.
For Intel, this string is "GenuineIntel". For AMD, it's "AuthenticAMD". Other companies that have created x86 chips have their own strings. The Wikipedia page for cpuid has many (all?) of the strings listed, as well as an example ASM listing for retrieving the details.
You really only need to check whether ECX matches the last four characters. You can't use the first four, because some Transmeta CPUs also start with "Genuine".
For Intel, this is 0x6c65746e
For AMD, this is 0x444d4163
If you convert each byte in those to a character, they'll appear to be backwards. This is just a result of the little endian design of x86. If you copied the register to memory and looked at it as a string, it would work just fine.
Example Code:
bool IsIntel() // returns true on an Intel processor, false on anything else
{
unsigned int eax = 0; // EAX = 0 selects the vendor-ID cpuid function
unsigned int id_str; // the last four characters of the vendor ID string
__asm__ ("cpuid" // run the cpuid instruction with...
: "+a" (eax), // EAX as both input (0) and output (cpuid overwrites it)...
"=c" (id_str) // and id_str set to the value of ECX afterwards
: // no other inputs
: "ebx", "edx"); // cpuid also overwrites EBX and EDX
if(id_str==0x6c65746e) // "letn": "ntel" read back little-endian from Genuine[Intel]
return true;
else
return false;
}
EDIT: One other thing - this can easily be changed into an IsAMD function, IsVIA function, IsTransmeta function, etc. just by changing the magic number in the if.
If you're on Linux (or on Windows running under Cygwin), you can figure that out by reading the special file /proc/cpuinfo and looking for the line beginning with vendor_id. If the string is GenuineIntel, you're running on an Intel chip. If you get AuthenticAMD, you're running on an AMD chip.
#include <stdio.h>
#include <string.h>
void get_vendor_id(char *vendor_id) // must be at least 13 bytes
{
FILE *cpuinfo = fopen("/proc/cpuinfo", "r");
if(cpuinfo == NULL)
; // handle error
char line[256];
while(fgets(line, 256, cpuinfo))
{
if(strncmp(line, "vendor_id", 9) == 0)
{
char *colon = strchr(line, ':');
if(colon == NULL || colon[1] == 0)
; // handle error
strncpy(vendor_id, colon + 2, 12);
vendor_id[12] = 0;
fclose(cpuinfo);
return;
}
}
// if we got here, handle error
fclose(cpuinfo);
}
If you know you're running on an x86 architecture, a less portable method would be to use the CPUID instruction:
#include <string.h>
void get_vendor_id(char *vendor_id) // must be at least 13 bytes
{
// GCC inline assembler; EAX = 0 selects the vendor-ID cpuid function
unsigned int eax = 0, ebx, ecx, edx;
__asm__ __volatile__
("cpuid"
: "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)); // cpuid writes all four registers
// the vendor string is spread across EBX, EDX, ECX, in that order
memcpy(vendor_id, &ebx, 4);
memcpy(vendor_id + 4, &edx, 4);
memcpy(vendor_id + 8, &ecx, 4);
vendor_id[12] = 0;
}
int main(void)
{
char vendor_id[13];
get_vendor_id(vendor_id);
if(strcmp(vendor_id, "GenuineIntel") == 0)
; // it's Intel
else if(strcmp(vendor_id, "AuthenticAMD") == 0)
; // it's AMD
else
; // other
return 0;
}
On Windows, you can use the GetNativeSystemInfo function
On Linux, try sysinfo
You probably should not check at all. Instead, check whether the CPU supports the features you need, e.g. SSE3. The differences between two Intel chips might be greater than between AMD and Intel chips.
You can define it in your Makefile with arch=`uname -p`, then use #ifdef i386 ... #endif blocks for the different architectures.
I have posted a small project:
http://sourceforge.net/projects/cpp-cpu-monitor/
which uses the libgtop library and exposes data through UDP. You can modify it to suit your needs. GPL open-source. Please ask if you have any questions regarding it.