Where is the buffer allocated in opencl?


I was trying to create a memory buffer in OpenCL with the C++ bindings. The statement looks like
cl::Buffer buffer(context,CL_MEM_READ_ONLY,sizeof(float)*(100));
This statement confuses me because it doesn't specify which device the memory is allocated on. In principle the context contains all devices, including CPU and GPU, on the chosen platform. Is it true that the buffer is put in a common region shared by all the devices?

The spec does not define where the memory is. For the API user, it is "in the context".
If you have only one device, the buffer will almost certainly (99.99%) be in that device's memory. (In rare cases it may stay in host memory, if the device temporarily does not have enough memory.)
With several different devices, it will be in one of them at creation, but it may move transparently to another device depending on the kernel launches.
This is the reason why the call clEnqueueMigrateMemObjects (available since OpenCL 1.2) exists.
It allows the user to give the API hints about where the memory will be needed, so the copy can be prepared in advance.
Here is the definition of what it does:
clEnqueueMigrateMemObjects provides a mechanism for assigning which device an OpenCL memory object resides. A user may wish to have more explicit control over the location of their memory objects on creation. This could be used to:
Ensure that an object is allocated on a specific device prior to usage.
Preemptively migrate an object from one device to another.
Typically, memory objects are implicitly migrated to a device for which enqueued commands, using the memory object, are targeted.

Related

C Declaration of a custom Jump-Table

Good Evening,
I would like to make use of a custom jump table in a current design.
As I don't know how to do this properly, I would like to ask for help.
Setup:
My project is going to run on an ARM Cortex-M4F Device.
The Device does have 256KB of Flash. I am going to allocate 128kB for System-Code and 128kB for User-Code. The Device does have 96kB of Ram. I am going to allocate 48kB for System-Code and 48kB for User-Code.
System-Code is code flashed to the device once via SWD. It does handle all the Peripherals and so on.
User-Code is code, which can be flashed via Bootloader (located in System-Code).
User-Code must not use any blocking calls or ISRs or so forth.
The goal of the project is not to provide a tasker or any sort of kernel.
My Goal:
Upon an external Event (Pin-Trigger ISR) the System-Code (declaring the associated ISR) starts a series of function calls. It reads external devices and performs some calculations. Afterwards, the System-Code writes some RAM data into the User-RAM section (fixed address and size via linker script). Then the System-Code should invoke the User-Code.
The User-Code does consist of an array of up to 2048 functions (Static in application, but unknown at compile time of System-Code as User-Code can be reflashed).
These functions must not return any values or take any arguments.
I would like to know how to properly allocate and subsequently call these functions (Non-Blocking, no ISRs, shared Stack).
My requirements:
User-Functions do not block. They run to completion. No preemption and so forth is required.
User-Functions can be interrupted by ISRs.
The User-functions must not have any parameters and must not return any values.
They can use the common stack.
My approach:
Define a Section for the User-Functions-Table in both linker scripts (System and User - separate projects and compilation units) at the same addresses (1x Flash-Page e.g.)
Treat that page as an array of pointers to functions in the System-Code. Call as required (handling max. index and so on separately).
In the User-Code allocate an array of pointers to functions in this section as well. Implement the functions somewhere else and assign them to the array. Declare them as attribute(isr) to make sure registers are preserved.
My questions:
Is this approach possible at all?
Are there simpler approaches?
What issues can you see which I can't? (Novice)
How would it be possible to pass parameters from system-code to these function calls? Is the signature of a pointer to a function like void DoX(char x, int z) the same as to void DoY(void)?
Thank you very much.

How to share HGLOBAL with another application?

I'm trying to understand something about HGLOBALs, because I just found out that what I thought is simply wrong.
In app A I GlobalAlloc() data (with GMEM_SHARE|GMEM_MOVABLE) and place the string "Test" in it. Now, what can I give to another application to get to that data?
I thought (wrongly!) that HGLOBALs are valid in all processes, which is obviously not the case, because an HGLOBAL is a HANDLE to the global data, and not a pointer to the global data (that's where I said "OHHHH!").
So how can I pass the HGLOBAL to another application?
Notice: I want to pass just a "pointer" to the data, not the data itself, like in the clipboard.
Thanks a lot! :-)

(This is just a very long comment as others have already explained that Win32 takes different approach to memory sharing.)
I would say that you are reading books (or tutorials) on Windows programming which are quite old and obsolete, as Win16 has been virtually dead for quite some time.
16-bit Windows (3.x) didn't have the concept of memory isolation (or a virtual /flat/ address space) that 32-bit (and later) Windows versions provide. Memory there used to be divided into local (to the process) and global sections, both living in the same global address space. Descriptors like HGLOBAL were used to allow memory blocks to be moved around in physical memory and still be accessed correctly despite their new location in the address space (after locking them with LocalLock()/GlobalLock()). Win32 uses pointers instead, since physical memory pages can be moved without affecting their location in the virtual address space. It still provides all of the Global* and Local* API functions for compatibility reasons, but they should not be used anymore and the usual heap management should be used instead (e.g. malloc() in C or the new operator in C++). Also, several different kinds of pointers existed on Win16 in order to reflect the several different addressing modes available on x86 - near (same segment), far (segment:offset) and huge (normalised segment:offset). You can still see things like FARPTR in legacy Win16 code that got ported to Win32, but they are defined to be empty strings, as in flat mode only near pointers are used.

Read the documentation. With the introduction of 32-bit processing, GlobalAlloc() does not actually allocate global memory anymore.
To share a memory block with another process, you could allocate the block with GlobalAlloc() and put it on the clipboard, then have the other process retrieve it. Or you can allocate a block of shared memory using CreateFileMapping() and MapViewOfFile() instead.

Each process "thinks" that it owns the full memory space available on the computer. No process can "see" the memory space of another process. As such, normally, nothing a process stores can be seen by another process.
Because it can be necessary to pass information between processes, certain mechanisms exist to provide this functionality.
One approach is message passing; one process issues a message to another, for example over a pipe, or a socket, or by a Windows message.
Another is shared memory, where a given block of memory is made available to two or more processes, such that whatever one process writes can be seen by the others.

Don't be confused by the GMEM_SHARE flag. It does not work the way you may have supposed. From MSDN:
The following values are obsolete, but are provided for compatibility
with 16-bit Windows. They are ignored.
GMEM_SHARE
GMEM_SHARE flag explained by Raymond Chen:
In 16-bit Windows, the GMEM_SHARE flag controlled whether the memory
should outlive the process that allocated it.
To share memory with another process/application you instead should take a look at File Mappings: Memory-mapped files and how they work.

How the Direct3D VertexBuffer Lock() and Unlock() functions are implemented for different D3DPOOL types

IDirect3DVertexBuffer9 has these methods:
STDMETHOD(Lock)(THIS_ UINT OffsetToLock,UINT SizeToLock,void** ppbData,DWORD Flags) PURE
STDMETHOD(Unlock)(THIS) PURE
I don't know the internal implementation of these functions.
I can imagine two possibilities:
1. The 'Lock' method maps the vertex buffer's video memory to ppbData. This gives much faster performance.
2. The 'Lock' method allocates system memory and makes ppbData point to it, and the 'Unlock' method copies that memory to the real video memory. This approach hides hardware differences by placing an abstraction layer over the hardware.
I guess that:
in 'D3DPOOL_SYSTEMMEM' mode, it is implemented by way 2.
in 'D3DPOOL_DEFAULT' mode, it is implemented by way 1.
/* Pool types */
typedef enum _D3DPOOL {
D3DPOOL_DEFAULT = 0,
D3DPOOL_MANAGED = 1,
D3DPOOL_SYSTEMMEM = 2,
D3DPOOL_SCRATCH = 3,
D3DPOOL_FORCE_DWORD = 0x7fffffff
} D3DPOOL;
But I don't know how it is implemented for each D3DPOOL mode.
Please help.

In D3DPOOL_DEFAULT, buffer contents are lost when fullscreen device loses focus and device is "lost" (D3DERR_DEVICENOTRESET or D3DERR_DEVICELOST). In this case, data within buffer is expected to be stored in video memory.
In D3DPOOL_MANAGED, a copy of data (that is stored within video memory) is stored in system memory, and as a result driver will restore it when device is lost.
D3DPOOL_SCRATCH is unsupported for vertex buffers.
D3DPOOL_SYSTEMMEM will not guarantee better performance, because you'll be transferring data - frequently - from system memory to video memory in order to use this vertex buffer. For better performance on buffers that are frequently updated there are dynamic vertex buffers (see D3DUSAGE_DYNAMIC, D3DLOCK_DISCARD, D3DLOCK_NOOVERWRITE), which are located in D3DPOOL_DEFAULT. Also, the Direct3D9 documentation says that resources created in system memory are normally not accessible to the D3D9 device. For rendering from system memory there are DrawIndexedPrimitiveUP and DrawPrimitiveUP, which are bound to cause problems on a pure D3D9 device.
Also, there's absolutely no guarantee that either flag makes device work as you think. If common sense tells you it should work this way, but this is not documented in specification, according to Murphy's law, it probably doesn't work the way it should. For all practical purposes, driver implementation could be written by insane lunatic, as long as it conforms to Direct3D specification.
Another thing is that those functions are documented. DirectX SDK comes with several help files - *.chm that can be read on any windows system, .HxI/.HxS that integrate into visual studio, plus there's online help on MSDN, which includes explanation for D3DPOOL. If you're asking questions like this, you haven't done the homework and did not read documentation. So go ahead and read it. If Direct3D9 documentation is no longer included into latest SDK, then simply get older version of it (summer of 2004).

Shared memory API, where a process can attach shared memory to other process

Can anyone look into this and suggest an API?
We have APIs with which a process can create shared memory and/or attach it to itself. But I can't find an API that lets one process attach shared memory to another process (e.g., process A calls an API like shmat() to attach the shared memory to process B).

Shared memory doesn't belong to any particular process (unless you create it with a private IPC_PRIVATE key). It belongs to the system.
So, when you use shmget with a non-private key (and the IPC_CREAT flag), you will either create a shared memory block or get the identifier of an existing one; shmat then attaches it to the calling process.
You need a way for both processes to use the same IPC key and this is often done by using ftok which uses a file specification and an identifier to give you an IPC key for use in the shmget call (and other IPC type calls, such as msgget or semget).
For example, in the programs pax1 and pax2, you may have a code segment like:
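#include <sys/ipc.h>
#include <sys/shm.h>

int getMyShMem (void) {
    // only one shm block, so a single fixed id; it must be nonzero for ftok.
    key_t mykey = ftok ("/var/pax.cfg", 1);
    if (mykey == (key_t)-1) // no go.
        return -1;
    // permission bits are needed, otherwise a new block is created with mode 0.
    return shmget (mykey, 1024, IPC_CREAT | 0600); // get (or make) a 1K block.
}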
By having both processes use the same file specification and ID, they'll get the same shared memory block.
You can use different IDs to give you distinct shared memory blocks all based on the same file (you may, for example, want one for a configuration shared memory block and another for storing shared state).
And, given that it's your configuration file the IPC key is based on, the chances of other programs using it is minuscule (I think it may be zero but I'm not 100% sure).
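As a small illustration of using different IDs on the same file, here is a sketch that derives two keys from one path. The struct ShmKeys and the helper make_keys are hypothetical names made up for this example, and the ids 1 and 2 are arbitrary nonzero choices; only ftok itself is the real API.

```cpp
#include <cassert>
#include <sys/ipc.h>
#include <sys/types.h>

// Hypothetical pair of keys: one for a configuration block, one for state.
struct ShmKeys {
    key_t config;
    key_t state;
};

// Derives both keys from the same existing file by varying only the id.
// Returns false if ftok fails (for example, the path does not exist).
bool make_keys(const char* path, ShmKeys& out) {
    out.config = ftok(path, 1);  // id 1: configuration block (assumed choice)
    out.state = ftok(path, 2);   // id 2: shared state block (assumed choice)
    return out.config != static_cast<key_t>(-1) &&
           out.state != static_cast<key_t>(-1);
}
```

Each key can then be passed to its own shmget call, and every program that uses the same path and ids ends up with the same two blocks.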
You can't forcefully inject shared memory into a process from outside that process (well, you may be able to but it would be both dangerous and require all sorts of root-level permissions). That would break the protected process model and turn your system into something about as secure as MS-DOS :-)

Let's see, allow one process to force a shared memory segment on to another? What is the receiver going to do with it? How will it know it now has mapped this block in - what is expected of it.
You're thinking about the problem the wrong way - simply foisting a block of memory on to a second process is not going to allow you to do what you want. You also need to notify the second process that it has now mapped this block and so can start doing stuff with it. I suggest you take a step back and really look at your design and what you are doing. My recommended approach would be
A connects to B via some other IPC (say socket)
A informs B that it should attach with the details (name etc.)
B then attaches - and now B is aware of it and can start doing stuff with it. (say for example once the attach completes, B confirms to A, and then they can start talking over the shared memory block).
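The three steps above can be sketched roughly as follows. This is only an illustration under several assumptions: A and B are a parent and a forked child, a pair of pipes stands in for "some other IPC", the segment is created with IPC_PRIVATE and identified by its shmid, and the function names run_b and handshake_demo are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Value shmat returns on failure, i.e. (void*)-1.
static void* const kShmFailed = reinterpret_cast<void*>(static_cast<std::intptr_t>(-1));

// Process B: waits to be told which segment to use, attaches it itself,
// writes a value, then confirms to A over the second pipe.
static void run_b(int from_a, int to_a) {
    int shmid = -1;
    if (read(from_a, &shmid, sizeof shmid) != static_cast<ssize_t>(sizeof shmid)) _exit(1);
    void* p = shmat(shmid, nullptr, 0);          // step 3: B attaches
    if (p == kShmFailed) _exit(1);
    *static_cast<int*>(p) = 42;                  // B uses the block
    shmdt(p);
    char ok = 1;
    if (write(to_a, &ok, 1) != 1) _exit(1);      // B confirms to A
    _exit(0);
}

// Process A: creates the segment, tells B about it, waits for confirmation.
// Returns the value B stored in the segment, or -1 on failure.
int handshake_demo() {
    int a_to_b[2], b_to_a[2];
    if (pipe(a_to_b) != 0) return -1;
    if (pipe(b_to_a) != 0) { close(a_to_b[0]); close(a_to_b[1]); return -1; }

    pid_t pid = fork();  // fork before shmat, so B inherits no attachment
    if (pid < 0) return -1;
    if (pid == 0) {
        close(a_to_b[1]);
        close(b_to_a[0]);
        run_b(a_to_b[0], b_to_a[1]);
    }
    close(a_to_b[0]);
    close(b_to_a[1]);

    int result = -1;
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    void* p = (shmid >= 0) ? shmat(shmid, nullptr, 0) : kShmFailed;
    if (p != kShmFailed) {
        *static_cast<int*>(p) = 0;
        char ok = 0;
        // Steps 1 and 2: A reaches B over the pipe and sends the details.
        if (write(a_to_b[1], &shmid, sizeof shmid) == static_cast<ssize_t>(sizeof shmid) &&
            read(b_to_a[0], &ok, 1) == 1 && ok == 1) {
            result = *static_cast<int*>(p);      // both now share the block
        }
        shmdt(p);
    }
    if (shmid >= 0) shmctl(shmid, IPC_RMID, nullptr);
    close(a_to_b[1]);  // if B is still waiting, it sees end-of-file and exits
    close(b_to_a[0]);
    waitpid(pid, nullptr, 0);
    return result;
}
```

The point of the sketch is that B calls shmat on its own behalf after being told what to attach; A never maps anything into B.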
As for wrapping shared memory in a nice library - consider boost::interprocess.

You are asking to attach the process memory of other process, right?
Just open(2) the file /proc/<pid>/mem and use it. Check /proc/<pid>/maps for the list of usable addresses in the file.

Using shared memory under Windows. How to pass different data

I currently try to implement some interprocess communication using the Windows CreateFileMapping mechanism. I know that I need to create a file mapping object with CreateFileMapping first and then create a pointer to the actual data with MapViewOfFile. The example then puts data into the mapfile by using CopyMemory.
In my application I have an image buffer (1 MB large) which I want to send to another process. So now I inquire a pointer to the image and then copy the whole image buffer into the mapfile. But I wonder if this is really necessary. Isn't it possible to just copy an actual pointer in the shared memory which points to the image buffer data? I tried a bit but didn't succeed.

Different processes have different address spaces. If you pass a valid pointer in one process to another process, it will probably point to random data in the second process. So you will have to copy all the data.

I strongly recommend you use Boost::interprocess. It has lots of goodies to manage this kind of stuff & even includes some special Windows-only functions in case you need to interoperate w/ other processes that use particular Win32 features.
The most important thing is to use offset pointers rather than regular pointers. Offset pointers are basically relative pointers (they store the difference between where the pointer is and where the thing pointed to is). This means that even if the shared region is mapped at different addresses in the two processes, as long as the mappings are identical in structure you are fine.
I've used all kinds of complicated data structures with offset smart pointers and it worked like a charm.
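To make the idea of a relative pointer concrete, here is a minimal sketch. It is not Boost's offset_ptr, which handles copying and many more cases; the names OffsetPtr, Region and read_through_copy are hypothetical, and a byte-for-byte copy to a second address stands in for "the same region mapped at a different address in another process".

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical self-relative pointer: it stores the distance from its own
// address to the target instead of the target's absolute address.
template <typename T>
class OffsetPtr {
public:
    OffsetPtr() : offset_(0) {}

    void set(T* target) {
        offset_ = target
            ? reinterpret_cast<std::intptr_t>(target) -
              reinterpret_cast<std::intptr_t>(this)
            : 0;
    }

    T* get() const {
        if (offset_ == 0) return nullptr;  // 0 is reserved for "null"
        return reinterpret_cast<T*>(
            reinterpret_cast<std::intptr_t>(this) + offset_);
    }

private:
    std::intptr_t offset_;
};

// A tiny "shared region": a pointer and the value it refers to.
struct Region {
    OffsetPtr<int> value_ptr;
    int value;
};

// Builds a region, copies its raw bytes to a different address, changes the
// value in the copy, and reads it back through the copied offset pointer.
int read_through_copy(int v) {
    Region first;
    first.value = v;
    first.value_ptr.set(&first.value);

    Region second;
    std::memcpy(&second, &first, sizeof(Region));  // same bytes, new address
    second.value = v + 1;

    // A raw int* copied this way would still point into 'first' and yield v.
    return *second.value_ptr.get();
}
```

Because only the distance is stored, the copied pointer resolves to the value inside the copy, which is what makes such structures usable from two different mappings.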

Shared memory is not about sending and receiving data. It is a block of memory that several processes can access at the same time, so you have to use a mechanism such as locks to keep the data from being corrupted.
In process 1:
CreateFileMapping(): creates the shared memory block with the name provided in the last parameter, or opens it if it already exists, and returns a handle on success.
MapViewOfFile(): maps this shared block into the process address space and returns a pointer to it.
Only through the pointer returned by MapViewOfFile() can you access the shared block.
In process 2:
OpenFileMapping(): opens the shared memory block created by CreateFileMapping(), using the same name, and returns a handle. Pass that handle to MapViewOfFile() to get a pointer, just as in process 1.
UnmapViewOfFile(): unmaps the shared memory block from the process address space. Call this function when you are done using the shared memory (i.e. access, modification etc.).
CloseHandle(): finally, to release the mapping object, call this with the handle returned by OpenFileMapping() or CreateFileMapping().
Though these functions look simple, the behaviour is tricky if the flags are not selected properly.
To read and write the shared memory, PAGE_READWRITE is sufficient in CreateFileMapping(); PAGE_EXECUTE_READWRITE is only needed if the memory must also be executable.
To access the shared memory after creating it successfully, you can use FILE_MAP_ALL_ACCESS in MapViewOfFile(); FILE_MAP_READ is enough for a process that only reads.
It is better to specify FALSE (do not inherit the handle from the parent process) in OpenFileMapping() as it will avoid confusion.

You CAN get shared memory to use the same address in 2 processes on Windows. It's achievable with several techniques.
Using MapViewOfFileEx; here's the significant excerpt from MSDN.
If a suggested mapping address is supplied, the file is mapped at the specified address (rounded down to the nearest 64K-boundary) if there is enough address space at the specified address. If there is not enough address space, the function fails.
Typically, the suggested address is used to specify that a file should be mapped at the same address in multiple processes. This requires the region of address space to be available in all involved processes. No other memory allocation can take place in the region that is used for mapping, including the use of the VirtualAlloc or VirtualAllocEx function to reserve memory.
If the lpBaseAddress parameter specifies a base offset, the function succeeds if the specified memory region is not already in use by the calling process. The system does not ensure that the same memory region is available for the memory mapped file in other 32-bit processes.
Another related technique is to use a DLL with a section marked Read + Write + Shared. In this case, the OS will pretty much do the MapViewOfFileEx call for you and for any other process which loads the DLL.
Naturally, you may have to give your DLL a FIXED load address, mark it as not relocatable, etc.

You can use Marshalling of pointers.

If it's possible, it would be best to have the image data loaded/generated directly into the shared memory area. This eliminates the memory copy and puts it directly where it needs to be. When it's ready you can signal the other process, giving it the offset into your shared memory where the data begins.
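A rough sketch of that layout follows. It assumes a small header at the start of the shared region that records where the image data begins; the names ImageHeader, produce_image and checksum_image are hypothetical, and the region pointer stands for whatever MapViewOfFile returned in each process (any writable buffer works for trying it out).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header placed at the very start of the shared region.
struct ImageHeader {
    std::uint32_t data_offset;  // bytes from region start to the first pixel
    std::uint32_t data_size;    // number of pixel bytes
};

// Producer: generates the pixels in place inside the region (no extra copy)
// and then publishes offset and size in the header.
bool produce_image(unsigned char* region, std::size_t region_size,
                   std::uint32_t pixel_count) {
    const std::uint32_t offset = sizeof(ImageHeader);
    if (region_size < offset || pixel_count > region_size - offset) return false;

    unsigned char* pixels = region + offset;
    for (std::uint32_t i = 0; i < pixel_count; ++i) {
        pixels[i] = static_cast<unsigned char>(i & 0xFF);  // stand-in image data
    }

    const ImageHeader header = {offset, pixel_count};
    std::memcpy(region, &header, sizeof header);
    return true;
}

// Consumer: uses only its own base address plus the offset from the header,
// never an absolute pointer taken from the producer.
unsigned long checksum_image(const unsigned char* region) {
    ImageHeader header;
    std::memcpy(&header, region, sizeof header);

    const unsigned char* pixels = region + header.data_offset;
    unsigned long sum = 0;
    for (std::uint32_t i = 0; i < header.data_size; ++i) sum += pixels[i];
    return sum;
}
```

In a real setup the producer would still have to signal the consumer (for example with an event) after filling the region; that part is left out here.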
