64-bit Applications and Inline Assembly - C++

I am using Visual C++ 2010 to develop 32-bit Windows applications. There is a feature for which I really want to use inline assembly, but I just realized that Visual C++ does not support inline assembly in 64-bit applications. So porting to 64-bit in the future is a big issue.
I have no idea how 64-bit applications differ from 32-bit applications. Is there a chance that 32-bit applications will ALL have to be upgraded to 64-bit in the future? I heard that 64-bit CPUs have more registers; since performance is not a concern for my applications, using these extra registers is not a concern to me either. Are there any other reasons that a 32-bit application would need to be upgraded to 64-bit? Would a 64-bit application process things differently from a 32-bit application, apart from the fact that 64-bit applications may use registers or instructions that are unique to 64-bit CPUs?
My application needs to interact with other OS components, e.g. drivers, which I know must be 64-bit on 64-bit Windows. Would my 32-bit application be compatible with them?

Visual C++ does not support inline assembly for x64 (or ARM) processors, because generally using inline assembly is a bad idea.
Usually compilers produce better assembly than humans.
Even if you can produce better assembly than the compiler, using inline assembly generally defeats code optimizers of any type. Sure, your bit of hand optimized code might be faster, but the fact that code around it can't be optimized will generally lead to a slower overall program.
Compiler intrinsics are available from pretty much every major compiler that let you access advanced CPU features (e.g. SSE) in a manner that's consistent with the C and C++ languages, and does not defeat the optimizer.
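For example, here is a minimal sketch of an SSE intrinsic doing four float additions at once (the function name add4 is made up); the same source builds as 32-bit or 64-bit, and the optimizer can still inline and schedule it, unlike an __asm block:

#include <xmmintrin.h>  // SSE intrinsics; available in MSVC, GCC, and Clang

void add4(const float* a, const float* b, float* out)
{
    __m128 va = _mm_loadu_ps(a);            // load 4 unaligned floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); // 4 additions in one instruction
}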
I am wondering whether there is a chance that 32-bit applications will ALL have to be upgraded to 64-bit in the future.
That depends on your target audience. If you're targeting servers, then yes, it's reasonable to let users not install the WOW64 subsystem, because a server probably won't be running much 32-bit code. I believe Windows Server 2008 R2 already allows this as an option if you install it as a "server core" instance.
Since performance is not a concern for my application, using the extra 64-bit registers is not a concern to me. Are there any other reasons that a 32-bit application would have to be upgraded to 64-bit in the future?
64-bit, as such, has little to do with registers. It has to do with the size of the addressable virtual memory.
Would a 64-bit application process things differently from a 32-bit application, apart from using registers/instructions that are unique to 64-bit CPUs?
Most likely. 32-bit applications are constrained in that they can't map much more than ~2GB into memory at once. 64-bit applications don't have that problem. Even if they're not using more than 4GB of physical memory, being able to address more than 4GB of virtual memory is helpful for mapping files on disk into memory and similar tasks.
My application needs to interact with other OS components, e.g. drivers, which I know must be 64-bit on 64-bit Windows. Would my 32-bit application be compatible with them?
That depends entirely on how you're communicating with those drivers. If it's through something like a named file interface, then your app could stay 32-bit. If you try to do something like shared memory (yikes! shared memory accessible from user mode with a driver?!) then you're going to have to build your app as 64-bit.

Apart from @Billy's great write-up, if you really feel the need to use 64-bit assembly, you can use an external assembler like MASM to get that done; see this. (It's also possible to speed this up with pre-build scripts.)
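For the curious, a minimal sketch of what that looks like (the file and function names are made up; this assumes the Windows x64 calling convention, where the first two integer arguments arrive in RCX and RDX):

; add64.asm - assemble with ml64.exe and link the .obj into the project
.code
my_add PROC
    mov rax, rcx        ; first argument
    add rax, rdx        ; add the second; the result is returned in RAX
    ret
my_add ENDP
END

On the C++ side, declare it with C linkage and call it like any other function:

extern "C" long long my_add(long long a, long long b);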

The Intel C++ Compiler 15 has inline assembly capability in 64-bit mode too.
And you could integrate ICC into Visual Studio as a toolset: then you'd have VC++ 64-bit with inline assembly.
One catch, though: it's expensive.
Cheers.

While we're at it, MinGW also has 64-bit inline assembly; it's pretty fast, and free. It used to be slow on some math, so I'd start out comparing the performance of MSVC vs. MinGW to see if it's a decent starting place for your application.
Also, as to hand-coded assembly being slower:
Actually, humans very often do write assembly that runs more efficiently than what compilers produce - or at least that was the common wisdom when I was learning programming in the '70s and '80s, and it continued to be the case through roughly 2000.
You can always code it in C or C++, compile that to assembly, and tweak the result to see if you can improve on it. That way, you can learn from the compiler's optimizations and see if you can beat them.
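A minimal sketch of that workflow, assuming a GCC/MinGW or MSVC toolchain (the file name hot.cpp is made up):

// hot.cpp
// g++ -O2 -S -masm=intel hot.cpp   produces hot.s   (GCC/MinGW)
// cl /O2 /FA hot.cpp               produces hot.asm (MSVC listing)
int dot(const int* a, const int* b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];   // read the listing to see what the optimizer did
    return sum;
}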
Assembly very much can have a place in code that needs high optimization, no matter what M$ says. You won't really know if assembly will or won't speed up code until you try it. Everything else is just pontificating.
As above, I favor the approach of compiling C++ code into assembly and then hand-optimizing that. It saves you the trouble of writing much of it, and with a little experimentation you may get something that tests out faster. FWIW, I've never needed to with a modern program. Often, other things can speed it up just as much or more - multi-threading, using look-up tables, moving time-expensive operations out of loops, using static analyzers, using run-time analyzers such as Valgrind (if you're on Linux), etc. However, for performance-critical applications, I see no reason not to try; just use it if it works. M$ is just being lazy by dropping inline assembly.
As to whether 64-bit or 32-bit is faster, the situation is similar to 16-bit vs. 32-bit: the wider data path can sling huge amounts of data faster. If both run on a 64-bit OS, they run at exactly the same clock speed, so the 32-bit program shouldn't be faster. Yet I've observed the CPU clock on 32-bit Win7 to run slightly faster than on 64-bit Win7, so for the same number of threads and for more CPU-intensive operations, a 32-bit app on 32-bit Win7 would be faster. However, the difference isn't much, and 64-bit instructions can really make a difference. A given user will only have one OS installed, so the 64-bit app will either be faster for that OS, or at best the same speed as a 32-bit app running on a 64-bit OS. It will be a larger download, however. You might as well go for the possibly faster speed of 64 bits, unless you are dealing with a dedicated system running code you know won't be moving large amounts of data.
Also, note that I benchmarked a 64-bit and a 32-bit app on OSs of the respective sizes, using the respective versions of MinGW. It did a lot of 64-bit floating-point number crunching, and I was sure the 64-bit version would have the edge. It didn't!! My guess is that the floating-point registers in the built-in math coprocessor take equal numbers of clock cycles on both OSs, perhaps slightly more on 64-bit Win7. My benchmarks were so close in both versions that neither was clearly faster. Perhaps long number-crunching operations were slower on 64-bit, but the 64-bit program code ran a little faster - producing nearly equal results.
Basically, the only times 32 bits make sense, IMHO, are when you think you might have an in-house app that would run faster on a 32-bit OS, when you want a really small executable, when you are delivering to users on 32-bit OS machines (many developers still offer both versions), or when you are targeting a 32-bit embedded system.
Edited to reflect that some of my remarks pertain to my specific experience with Win7 x86 vs. x64.

Related

Is it possible to use 64-bit instructions in a 32-bit application on Intel/64-bit Win7?

My environment is 64-bit Win7 with VC2010.
Of course, the Intel CPU inside is 64-bit.
Can I use 64-bit instructions and the native 64-bit machine word in a 32-bit application? Since most of my code is 32-bit, I do not want to port it all to 64-bit.
For some performance-critical hot code, I want to manually optimize it with compiler intrinsics or inline assembly (also, the 64-bit VC compiler does NOT support inline assembly). Is it possible to run 64-bit code, for example mov rax, rbx, in a 32-bit-mode application?
Actually, you kind of can. The mechanism used to switch from 32-bit to 64-bit mode (which clearly must exist on a 64-bit OS capable of running 32-bit code) is not protected in any way and can be used from user code. On Windows it is called Heaven's Gate, and it's pretty simple: just a far call with a segment selector of 33h.
So, how do you run 64-bit code?
call 33h:your64bitcode   ; far call: selector 33h selects the 64-bit code segment
...
your64bitcode:
; now executing in 64-bit mode - do something
retf                     ; far return back into the 32-bit segment
There are, of course, some limitations on what you can do from 64-bit code entered that way, because you're not truly in a 64-bit process.
If your application uses the 32-bit instruction set, then no, you cannot elect to use 64-bit instructions "sometimes" within the same application. There are some options, however:
Use SIMD instructions, such as SSE, AVX, etc. This is documented here: http://msdn.microsoft.com/en-us/library/t467de55(v=vs.90).aspx
Port your 32-bit code to 64-bit. Maybe it's not so hard?
Split your program in two: a 32-bit process with your legacy code and a 64-bit process with the new code. You could map memory between them, and one could launch the other, so it's a bit like a multithreaded program. Whether it would perform well depends a lot on details we don't know at this time.
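A minimal Win32 sketch of such a shared mapping (the mapping name is made up and error handling is omitted); both a 32-bit and a 64-bit process can open it by name, but only offsets and fixed-size fields should go in it, never raw pointers:

#include <windows.h>

void* open_shared_buffer()
{
    // Creates (or opens) a 4 KB pagefile-backed shared memory block by name.
    HANDLE h = CreateFileMappingW(INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE,
                                  0, 4096, L"Local\\MyAppSharedBuf");
    return MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 4096);
}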

What to watch out for when writing 32-bit software on a 64-bit machine?

I am purchasing a comfortable laptop for development, but the only operating system available to me is 64-bit (Win7). I am basically aware that 64-bit has 8-byte integers and can utilize more RAM, but that is about it.
My programming will vary (C++, sometimes PHP) but would like to know:
Can I build my C++ application to be 32-bit portable (i.e., run on a 32-bit computer) without needing to build under a 32-bit virtual machine?
What are the simple gotchas to watch out for when writing an application (casting, etc.)?
Processors have been 64-bit for some time. I'm perplexed about why people are afraid to make the move to a 64-bit operating system. A 32-bit OS can't address much more than 3GB of RAM, and that's good enough reason to make the upgrade in my book!
When you're coding, the biggest difference I've encountered to look out for is the size of a pointer!
On a 32-bit compiled program, a pointer is normally 4 bytes. On a 64 bit compiled program, a pointer is normally 8 bytes.
Why does this matter?
Let's say your program uses sockets to communicate a data structure from one process to another. Maybe the server process is 32-bit and the client process is 64-bit.
While the struct might be defined identically in both the 32-bit and 64-bit programs, the 64-bit exe will reserve 8 bytes per pointer (and structures normally contain pointers to other structs, as in linked lists etc.).
This can lead to data misalignment when a 32 bit exe communicates a struct to a 64 bit exe.
In (almost?) all cases, communicating pointer values between processes is meaningless anyway - the values don't matter in the receiving process and can be omitted.
So you might think that communicating pointer values would be an uncommon practice - but the easy way to communicate a struct is to memcpy its contents over a socket, pointers and all!
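A sketch of the mismatch (Node is hypothetical; sizes assume typical alignment rules):

struct Node {
    int   value;   // 4 bytes in both builds
    Node* next;    // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
};
// sizeof(Node) is 8 in the 32-bit exe but 16 in the 64-bit exe, so
// memcpy'ing a Node over a socket misaligns every field after 'value'
// on the receiving side.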
This is the most significant snag I've found so far when coding 64-bit clients against our 32-bit server software.
The simplest thing would be to build a 32-bit executable. With Visual Studio, just choose Win32 as the build type. With gcc, use the -m32 switch.
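For example (the file name is hypothetical, and -m32 requires the 32-bit runtime libraries to be installed on the build machine):

g++ -m32 -o myapp32 myapp.cpp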
Specifically for Windows, take a look at the MSDN documentation for WOW64.
64-bit Windows runs 32-bit applications in a compatibility layer, so you can build and run 32-bit apps on your system. This even has some (rather positive) effects on your app: for example, the virtual address space for your process increases, because the OS may use higher 64-bit addresses that your 32-bit process isn't even aware of.
The MSVC++ data type sizes are also documented on MSDN. But if you are worried about casting, you should use types that are guaranteed to fit your needs. The C++ standard doesn't define the exact size of the types (AFAIK), only their relative sizes (short is no longer than int, and so on). So you cannot rely on the exact size of these types unless you use a fixed-width type such as int32_t (or MSVC's __int32), which obviously won't change for 64-bit apps.
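A small illustration with the standard fixed-width types (C++11 <cstdint>, or <stdint.h> in C):

#include <cstdint>

int32_t counter;   // exactly 4 bytes in both 32-bit and 64-bit builds
int64_t big_sum;   // exactly 8 bytes in both
// By contrast, 'long' is 4 bytes with MSVC on both targets but 8 bytes
// with 64-bit GCC on Linux - another reason to be explicit.
static_assert(sizeof(int32_t) == 4, "guaranteed by the standard");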
Cross compiling (for different platforms, pointer sizes, endianess, etc.) has been around for ages and as long as you use the right tools and flags to build your 32-bit executable, the build platform really shouldn't matter.
In the case of the Microsoft compiler, it should be as simple as firing up Visual Studio and compiling your program using the default "Win32" configuration. If you prefer the command line, be sure to select the 32-bit tools by invoking the Visual Studio Command Prompt (as opposed to the x64 version).
BTW, if you are running 64-bit Enterprise Server, you can turn on the Hypervisor role, install 32-bit Win7 inside a virtual machine, and actually test your built program. Finally, it's always a good idea to test the program on the actual target platform on which it will execute :)...

Is there any real point compiling a Windows application as 64-bit?

I'd confidently say 99% of the applications we write don't need to address more than 2GB of memory. Of course, there's a lot of obvious benefit to the OS running 64-bit to address more RAM, but is there any particular reason a typical application would be compiled as 64-bit?
There are performance improvements that you might see with 64-bit. A good example is that some parameters in function calls are passed via registers (fewer things to push on the stack).
Edit
I looked up some of my old notes from when I was studying the differences between running our product as a 64-bit build versus a 32-bit build. I ran the tests on a quad-core 64-bit machine, so there is the question of comparing apples to oranges, since the 32-bit build was obviously running under WOW64. However, many things I have read, such as this, consistently say that the speed hit for WOW64 is not significant. And even if that statement is not true, your application will almost certainly be run on a 64-bit OS, so a comparison of a 32-bit build versus a 64-bit build on a 64-bit machine has value.
In the testing I performed (certainly not comprehensive), I did not find any cases where the 32-bit build was faster. However, many of the SQL-intensive operations I ran (high CPU and high I/O) were 20% to 50% faster with the 64-bit build. These tests involved some fairly "ugly" SQL statements and also some TPC-C tests with high concurrency. Of course, a lot depends on compiler switches, so you need to do your own testing.
Building them as 64-bit now, even if you never release the build, can help you find and repair problems that you will encounter later when you're forced to build and release as 64-bit.
x64 has eight more general purpose registers that aren't available when running 32-bit code. That's three times as many (twice as many if you count ESI, EDI, EBP and ESP as general purpose; I don't). That can save a lot of loads and stores in functions that use more than four variables.
Don't underestimate the marketing value of offering a native 64-bit version of your product.
Also, you might be surprised just how many people work on apps that require as much memory as they can get.
I'd say only do it if you need more than 2GB.
One thing to keep in mind: 64-bit compilation (obviously) means 64-bit pointers. That means the code and data structures get a bit bigger, so the app will benefit a little less from caches and will hit virtual memory a bit more often, etc.
So, if you don't need it, the basic effect is to make your app a bit slower and more bloated for no reason.
That said, as time goes on, you'll care more about 64-bit anyway, just because that's what all the tools and libraries will be written for. Even if your app could live quite happily in 64K, you're unlikely to use 16-bit code today - the gains don't really matter (it's a small, fast app anyway) and are certainly outweighed by the hassle involved. In time, we'll see 32-bit much the same way.
You could consider it as future-proofing. It may be a long way away, but consider some years in to the future, where 64-bit OS and CPUs are ubiquitous (consider how 16-bit faded away when 32-bit took over). If your application is 32-bit and all your competitors have moved on to 64-bit by then, your application could be seen as (or accused by your competitors as) out of date, slower, or incapable of change. Maybe even one day support for 32-bit applications will be dropped or incomplete (can Windows 7 run 16-bit apps properly?). If you're already building a 64-bit version of your application today, you avoid these problems. If you put it off till later, you might write a lot more code between now and when you port, then your port will be even harder.
For a lot of applications there aren't many compelling technical reasons, but if it's easy, porting now might save you effort in future.
If you don't need the extended address space, delivering in 64-bit mode offers nothing and has some disadvantages, like increased memory consumption and cache pressure.
While we offer 64-bit builds, our customers who are at the limit are pushing us to reduce memory consumption so that they get those advantages.
All applications that may need lots of memory: database servers that want to cache lots of data in memory, scientific applications that handle lots of data, ...
I've recently read the article Optimizing software in C++. Chapter 2.3, "Choice of operating system", compares the advantages and disadvantages of 64- and 32-bit systems, with some specific observations regarding Windows.
Mark Wilkins already noted in this thread the extra registers for function calls. Another interesting property of a 64-bit system is this:
The SSE2 instruction set is supported on all 64-bit CPUs and operating systems.
SSE2 instructions can provide excellent optimizations and they are being increasingly used, so in my opinion this is a notable feature.
Fastcall makes calling subroutines faster by keeping the first four parameters in registers.
When you say that 99% of apps won't benefit from 64-bit, that may well be true for you personally, but during the day I use Visual Studio and Xcode to compile C++ on a large codebase and search multi-GB repositories with Google Desktop and Spotlight. Then I come home to write music in a sequencer using several GB of sound libraries, do some photoshopping on my 20GB of photos, and maybe a bit of video editing with my holiday clips.
So for me (and I dare say many other users), having 64-bit versions of many of these apps will be a great advantage. Word processor, web browser, email client: maybe not. But anything involved with large media will really benefit.
More data can be processed per clock cycle, which can deliver performance improvements to, e.g., crypto and video-encoding applications.

Porting 32-bit C++ code to 64-bit - is it worth it? Why?

I am aware of some of the obvious gains of the x64 architecture (higher addressable RAM addresses, etc.), but:
What if my program has no real need to run in native 64-bit mode? Should I port it anyway?
Are there any foreseeable deadlines for the end of 32-bit support?
Would my application run faster / better / be more secure as native x64 code?
x86-64 is a bit of a special case - for many architectures (e.g. SPARC), compiling an application for 64-bit mode doesn't give it any benefit unless it can profitably use more than 4GB of memory. All it does is increase the size of the binary, which can actually make the code slower if it hurts cache behaviour.
However, x86-64 gives you more than just a 64 bit address space and 64 bit integer registers - it also doubles the number of general purpose registers, which on a register-deficient architecture like x86 can result in a significant performance increase, with just a recompile.
It also lets the compiler assume that many extensions, like SSE and SSE2, are present, which can also significantly improve code optimisation.
Another benefit is that x86-64 adds PC-relative addressing, which can significantly simplify position-independent code.
However, if the app isn't performance sensitive, then none of this is really important either.
One possible benefit I haven't seen mentioned yet is that it might uncover latent bugs. Once you port it to 64-bit, a number of changes are made. The sizes of some datatypes change, the calling convention changes, the exception handling mechanism (at least on Windows) changes.
All of this might lead to otherwise hidden bugs surfacing, which means that you can fix them.
Assuming your code is correct and bug-free, porting to 64-bit should in theory be as simple as flicking a compiler switch. If it fails, that's because you're relying on things not guaranteed by the language, which are therefore potential sources of errors.
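For instance, a classic latent bug that a 64-bit port uncovers (hypothetical code): stuffing a pointer into a plain int works by accident on 32-bit and truncates on 64-bit:

#include <stdint.h>

void remember(void* p)
{
    int handle = (int)(intptr_t)p;       // fine on 32-bit; loses the top
                                         // 32 bits of the pointer on 64-bit
    void* q = (void*)(intptr_t)handle;   // q may no longer equal p

    intptr_t safe = (intptr_t)p;         // the fix: an integer type that is
                                         // always wide enough for a pointer
    (void)q; (void)safe;
}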
Here's what 64-bit does for you:
64-bit allows you to use more memory than a 32-bit app.
64-bit makes all pointers 64 bits wide, which makes your memory footprint larger.
64-bit gives you more integer and floating-point registers, which means fewer register spills to memory, which should speed up your app somewhat.
64-bit can make 64-bit ALU operations faster (only helpful if you're using 64-bit data types).
You DO NOT get any extra security (another answer mentioned security, I'm not aware of any benefits like that).
You're limited to only running on 64-bit operating systems.
I've ported a number of C++ apps and seen about a 10% speedup with 64-bit code (same system, same compiler; the only change was 32-bit vs. 64-bit compiler mode), but most of those apps were doing a fair amount of 64-bit math. YMMV.
I wouldn't worry about 32-bit support going away any time soon.
(Edited to include notes from comments - thanks!)
Although it's true that 32-bit will be around for a while in some form or another, Windows Server 2008 R2 ships with a 64-bit SKU only. I would not be surprised to see WOW64 become an optional install as early as Windows 8 as more software migrates to 64-bit; WOW64 is an install, memory, and performance overhead. The 3.5GB RAM limit in 32-bit Windows, along with increasing RAM densities, will encourage this migration. I'd rather have more RAM than CPU...
Embrace 64-bit! Take the time to make your 32-bit code 64-bit compatible; it's a no-brainer and straightforward. For normal applications, the changes are more accurately described as code corrections. For drivers, the choice is: adapt or lose users. When the time comes, you'll be ready to deploy on any platform with a recompile.
IMO the current cache-related issues are moot; silicon improvements in this area and further 64-bit optimisation will be forthcoming.
If your program has no need to run under 64-bit, why would you? If you are not memory bound, and you don't have huge datasets, there is no point. The new Miata doesn't have bigger tires, because it doesn't NEED them.
32-bit support (even if only via emulation) will extend long past when your software ceases to be useful. We still emulate Atari 2600s, right?
No - in all likelihood, your application will be slower in 64-bit mode, simply because less of it will fit in the processor's cache. It might be slightly more secure, but good coders don't need that crutch :)
Rico Mariani's post on why Microsoft isn't porting Visual Studio to 64-bit really sums it up: Visual Studio: Why is there no 64 bit version? (yet)
It depends on whether your code is an application or a reusable library. For a library, keep in mind that the client of that library may have good reasons to run in 64-bit mode, so you have to ensure that your scenario works. This may also apply to applications when they are extensible via plugins.
If you don't have any real need now, and likely never will, for 64-bit mode, you shouldn't do porting.
If you don't have the need now but may have it some day, you should try to estimate how much effort it will be (e.g. by turning on all the relevant compiler warnings and attempting a 64-bit compilation). Expect that some things aren't trivial, so it is useful to know what problems you would likely encounter and how long they would likely take to fix.
Notice that a need may also arise from dependencies: if your program is a library (e.g. a DLL), it may be necessary to port it to 64-bit mode just because some host application gets ported.
For the foreseeable future, 32-bit applications will continue to be supported.
Unless there's a business reason to go to 64 bit, then there's no real "need" to support 64 bit.
However, there are some good reasons for going to 64 bit at some point, aside from all those that others have already mentioned.
It's getting harder to buy PCs that aren't 64-bit. Even though 32-bit apps will run in compatibility mode for years to come, any new PC sold today or in the future is likely to be 64-bit. If I have a shiny 64-bit operating system, I don't really want to run "smelly old 32-bit apps" in compatibility mode!
Some things just don't run properly in compatibility mode - it's not the same thing as running on a 32-bit OS on 32-bit hardware. I've run into a few issues (e.g. registry access across the 32/64-bit registry hives, programs that fail because they're not in the folder they expect to be in, etc.) when running in compatibility mode. I always feel nervous about running my code in compatibility mode - it's simply "not the real thing", and it often shows.
If you have written your code cleanly, then chances are you only have to recompile it as a 64 bit exe and it'll work fine, so there's no real reason not to give it a try.
The earlier you build a native 64-bit version, the easier it will be to keep it working on 64-bit as you add new features. That's a much better plan than continuing to develop in the dark ages for another n years and then trying to jump out into the light.
When you go for your next job interview, you will be able to say that you have 64-bit experience and 32-to-64-bit porting experience.
If you are already aware of the x64 advantages (most importantly the increased RAM size) and are not interested in any of them, then don't port an executable (exe). Usually performance degrades after a port, mainly due to the increase in size of an x64 module over x86: all pointers now require double the length, and this percolates everywhere, including code size (some jumps, function calls, vtables, virtual invokes, global symbols, etc.). It is not a significant degradation, but it is usually measurable (a 3-5% speed decrease, depending on many factors).
DLLs are worth porting, because you gain a new 'audience': x64 apps that are able to consume your DLL.
Some OSs or configurations are unable to run 32-bit programs - a minimal Linux without the 32-bit libc installed, for example. Also, IIRC, I usually compile out 32-bit support from my kernels.
If these OSs or configurations are part of your potential user base then yes, you should port it.
If you need more speed, then you should also port it (as others have said, x86-64 has more registers and cool instructions that speed it up).
Or, of course, if you want to mmap() or otherwise map a large file or lots of memory - then 64-bit helps.
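A minimal POSIX sketch of that (error handling omitted; mapping a file larger than 4GB in one piece requires a 64-bit address space):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void* map_whole_file(const char* path, size_t* len_out)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               // the mapping remains valid after close
    *len_out = st.st_size;
    return p;                // MAP_FAILED on error
}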
For example, suppose you had written 32-bit code (GNU C/C++) like the following for network messaging:
struct packet {
    unsigned long name_id;   /* 32 bits on a 32-bit build, 64 bits on 64-bit Linux */
    unsigned short age;
};
Then you have porting work to do when recompiling on a 64-bit system: sizeof(long) changes from 32 to 64 bits on your side, so the htonl/ntohl conversions and the wire layout no longer match, and communication with a network peer still running the 32-bit build of the same code breaks.
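A sketch of a portable fix, using fixed-width types so both peers agree on the wire format regardless of word size (assumes a POSIX socket environment for the htonl/htons declarations):

#include <stdint.h>
#include <arpa/inet.h>   /* htonl/htons; winsock2.h on Windows */

struct packet {
    uint32_t name_id;    /* always 4 bytes, on every target */
    uint16_t age;        /* always 2 bytes */
};

/* Convert to network byte order before sending, e.g.:
   p.name_id = htonl(p.name_id);
   p.age     = htons(p.age);     */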
See more notes about 32/64 porting at http://code.google.com/p/effocore/downloads/list, document name EffoCoreRef.pdf.
It's pretty unlikely that you'd see any benefit unless you're in need of extreme security measures or obscene amounts of RAM.
Basically, you'd most likely know intuitively if your code was a good candidate for 64-bit porting.
Regarding deadlines: I would not worry; 32-bit will be around for a good while natively, and for the foreseeable future in some other form.
See my answer to this question here. I closed out that post saying that a 64-bit computer can store and retrieve much more information than a 32-bit computer. For most users this really doesn't mean a whole lot because things like browsing the web, checking email and playing Solitaire all work comfortably within the confines of 32-bit addressing. Where the 64-bit benefit will really shine is in areas where you have a lot of data the computer will have to churn through. Digital signal processing, gigapixel photography and advanced 3D gaming are all areas where their massive amounts of data processing would see a big boost in a 64-bit environment.
As for your code running faster/better, it's entirely up to your code and the requirements imposed on it.
As to performance issues, it depends on your program. If your program is pointer-intensive, porting to 64-bit may cause performance degradation, since for a CPU cache of the same size each 64-bit pointer occupies more space, and virtual-to-physical mappings also occupy more TLB space. Otherwise, if your program is not pointer-intensive, its performance will benefit from x64.
Of course performance is not the only reason for porting; other issues, like porting effort and time scheduling, should also be considered.
I would recommend porting it to 64-bit just so you are running "natively". (Also, I use OpenBSD, and its AMD64 port does not provide any 32-bit emulation support; everything must be 64-bit.)
Also, stdint.h is your best friend! By porting your application, you will learn how to code portably - which will make your code work right when we have 128-bit processors too (in a few decades, hopefully).
I've ported two or three things to 64-bit and now develop for both (which is very easy if you use stdint.h). On my first project, porting to 64-bit caused two or three bugs to show up, but that was it; most of it was a simple recompile. Now I don't worry about the differences between 32-bit and 64-bit when writing new code, because I just automatically code portably, using intptr_t, size_t, and such.
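A tiny sketch of that style, using only standard headers:

#include <stdint.h>
#include <stddef.h>

// intptr_t is an integer guaranteed wide enough to hold a pointer on any
// target - 32-bit, 64-bit, or wider.
void* roundtrip(void* p)
{
    intptr_t as_int = (intptr_t)p;   // safe on every platform
    return (void*)as_int;            // compares equal to p
}

size_t buffer_bytes(size_t count, size_t elem_size)
{
    return count * elem_size;        // size_t tracks the address-space width
}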
In the case of a DLL being called from a 64-bit process, the DLL has to be 64-bit as well.
Then it doesn't matter whether it's worth it - you simply have no choice.
One issue to keep in mind is the software libraries available. For instance, my company develops an application that uses multiple OpenGL libraries, and we do so on OpenSuSE. In older versions of the OS, one could download 32-bit versions of these libraries on the x86_64 architecture; newer versions don't have this. That made it easier to just compile in 64-bit mode.
64-bit will run a lot faster once 64-bit compilers become mature, but when that will happen I don't know.

How should an application perform on 64-bit vs. 32-bit Intel architectures?

I want to know the relative performance of a normal C++ application in the following scenarios:
Built as a 32-bit app, run on an Intel 64-bit processor (x86-64)
Built as a 32-bit app, run on an Intel 32-bit processor (x86)
Built as a 64-bit app.
Also, what factors should I consider when modifying or developing the application to make it run faster on 64-bit processors?
Short answer: you probably won't notice much of a difference.
Longer answer: 64-bit x86 has more general-purpose registers, which gives the compiler more opportunity to keep local variables in registers for faster access. The compiler can also assume more modern features, e.g. it doesn't have to generate code for a 386, and it can assume your CPU has SSE instead of the old x87 FPU for floating-point math. But pointers will be twice as wide, which is worse for the cache.
CPU-intensive programs might be noticeably faster on 64-bit. The processor has 16 general-purpose registers available instead of 8, and they are also twice as wide (64 bits instead of 32).
Also, the number of registers for SSE instructions is doubled from 8 to 16, which helps multimedia applications and other applications that do a lot of floating-point computation.
For details see x86-64 on Wikipedia.
One thing that has not been mentioned yet is that 64-bit versions of operating systems such as Windows and Linux use a different calling convention for function calls; instead of passing arguments on the stack, arguments are (preferably) passed in registers, which is in principle faster. So software will be faster because there is less function-call overhead.
The performance will very likely depend on your application and can vary a lot depending on whether or not you use libraries that are optimized for 64-bit environments. If you want a dependable speed-up, you should focus on improving your algorithms rather than on the instruction set architecture.
As for preparing/developing for 64-bit: the key thing is to not make assumptions about types and their respective sizes. If you need a type with a specific size, use the types defined in <stdint.h>. Whenever you see functions that use size_t or ptrdiff_t, you should use those typedefs rather than some other type.
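For instance, a small sketch (the function is hypothetical) of using them where a hard-coded int would quietly stop fitting in a 64-bit build:

#include <stddef.h>

// size_t for sizes and counts, ptrdiff_t for pointer differences; both
// widen automatically in a 64-bit build.
ptrdiff_t find_first_zero(const int* begin, const int* end)
{
    for (const int* p = begin; p != end; ++p)
        if (*p == 0)
            return p - begin;   // pointer subtraction yields ptrdiff_t
    return -1;
}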
In general, you won't find equivalent processors that differ only in their support for 64-bit operation, so it'll be hard to give any concrete comparisons between 1) and 2). On the other hand, the difference between building for 32 and 64 bit mode is entirely dependent on the application. A 64-bit version might be slightly slower or slightly faster than the 32-bit version. If your application uses a lot of temporary variables, then the increased register set of 64-bit mode can make a very large difference in performance.
From experience, I've tended to find that a 64-bit recompile of a 32-bit application generally makes things about 30% faster. It's a rough figure, but it holds for quite a number of applications I've ported to 64-bit. Basically it's for the reasons explained above: you have more registers, which is a godsend and allows for much less swapping in and out of memory (which would probably be cached anyway, making the win quite small), and certain optimisations can be made much more easily. HOWEVER, you do suffer the problem of larger pointers, which wipes out some of the gain, not to mention that a context switch requires more memory due to the larger register set.
Careful hand optimisation in 64-bit can provide HUGE performance wins, however.
Your best plan is to recompile as 64-bit and profile, i.e. see which is better.
Do you have any requirement for > 4G of memory? Exploiting gobs of memory is really the big reason to go 64-bit.
Also consider the memory-subsystem features of newer x86-64 parts: multi-channel memory controllers, integrated memory controllers (IMC), and more cores. At the very least, memcpy can be faster in 64-bit mode because it moves data through 64-bit registers and a 64-bit bus, and newer architectures can prefetch data from multiple memory modules into cache concurrently.