I have a process where I need to warm my cache, like this:
while (signal != true)
{
    // spin and wait, keeping src in cache
    memcpy(dummy, src, dummy_size);
}
memcpy(dest, src, size);
I want the src buffer to be in cache, but I don't want to use an additional buffer to keep copying the contents into. Is there any way to do this (just the fetch part)?
Edit: Is there a way to do this so that dummy is not needed?
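For example, something along these lines is the kind of thing I mean (a sketch, assuming GCC/Clang, where __builtin_prefetch pulls src's cache lines in without needing a destination buffer):
const char* p = (const char*) src;
while (signal != true)
{
    // spin and wait, touching src's cache lines to keep them warm
    for (size_t i = 0; i < size; i += 64)      // 64 bytes = a typical cache line
        __builtin_prefetch(p + i, 0, 3);       // 0 = prefetch for read, 3 = high temporal locality
}
memcpy(dest, src, size);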
I'm writing a server that will compress files and send them over an http socket.
Unfortunately, they're not really files, they're more like database entries from a remote source.
I want to compress each entry in memory and then send it out over my HTTP server; each entry is potentially large, like 1 GB.
I receive data from the source in chunks, for instance 16 MB (but it could be any chunk size that makes sense).
Conceptually, this is what is happening, although this is a little bit of pseudo-code:
struct archive *_archive = archive_write_new();
//set to zip format
int ok = ARCHIVE_OK;
ok |= archive_write_set_format( _archive, ARCHIVE_FORMAT_ZIP );
ok |= archive_write_add_filter( _archive, ARCHIVE_FILTER_NONE );
char *_archiveBuffer = static_cast<char*>(malloc(8192));
size_t _used = 0;
ok |= archive_write_open_memory( _archive, _archiveBuffer, 8192, &_used );
if (ok != ARCHIVE_OK) return ERROR;
struct archive_entry *_archiveEntry = archive_entry_new();
//fetch metadata about the object by id
QString id = "123456789";
QJsonObject metadata = database.fetchMetadata(id);
int size = metadata["size"].toInt();
//write the http header
httpd.writeHeader(size);
archive_entry_set_pathname( _archiveEntry, ("entries/" + id).toUtf8().constData() );
archive_entry_set_size( _archiveEntry, size );
archive_entry_set_filetype( _archiveEntry, AE_IFREG );
//archive_entry_set_perm( entry, ... );
archive_write_header( _archive, _archiveEntry );
int chunksize = 16777216;
for (int w = 0; w < size; w+=chunksize)
{
QByteArray chunk = database.fetchChunk(id,chunksize);
archive_write_data( _archive, chunk.data(), (size_t) chunk.size() );
//accumulate data, then fetch compressed data from _archiveBuffer and write to httpd
if (_used > 0)
{
httpd.writeData(_archiveBuffer);
//clear archive buffer?
}
}
archive_entry_free(_archiveEntry);
archive_write_close(_archive);
httpd.writeData(_archiveBuffer);
archive_write_free(_archive);
The question is: how do I know when data has been compressed into _archiveBuffer, and once it has, how can I read the buffer and then clear it, resetting the _used counter? I assume that if _used > 0, a compress/flush has happened.
Also, does the _archiveBuffer need to be greater than my chunksize?
It seems like I may need to use a callback, but it's unclear how to use archive_write_open with a callback and a memory buffer.
I can't seem to find examples online.
Any help would be appreciated!
The solution was much easier than I thought... just took a minute to realize it.
I'm sure it's obvious to those familiar with the library and streams.
Using callbacks was the answer. Don't bother opening in memory; that just creates an extra layer which isn't useful, since the library already manages its own buffering.
Depending on how your multithreading is configured, the callback executes when something interesting happens on the archive stream. For instance, you can write single bytes to the archive over and over, but a callback only happens once the stream is saturated. At that moment you can write to the network, or wherever, from inside the callback. The void *client_data argument is key, because it links back to your main classes and API.
In my case I didn't want to write an HTTP header until (archive) data was available, because an error could happen while fetching, which might require a different HTTP header.
When the data is done, the close and free functions also do their work through callbacks, so destructors need to run after those callbacks complete.
Now the task is to multithread these requests... which seems simple now that I get the library.
If anyone is interested, I can post new pseudo-code.
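In the meantime, here is a rough sketch of the callback wiring (not my exact code; fwrite to stdout stands in for the real network write, and client_data would point at whatever object owns your HTTP session):
#include <archive.h>
#include <cstdio>

static int open_cb(struct archive*, void* /*client_data*/)
{
    return ARCHIVE_OK;                       // nothing to set up here
}

static la_ssize_t write_cb(struct archive*, void* client_data, const void* buffer, size_t length)
{
    // Called whenever libarchive has compressed output ready to emit.
    // client_data links back to your own classes; replace fwrite with httpd.writeData(...).
    (void)client_data;
    fwrite(buffer, 1, length, stdout);
    return static_cast<la_ssize_t>(length);  // report how many bytes were consumed
}

static int close_cb(struct archive*, void* /*client_data*/)
{
    return ARCHIVE_OK;                       // e.g. finish the HTTP response here
}

// ...
archive_write_open(_archive, this /* becomes client_data */, open_cb, write_cb, close_cb);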
This might be an XY problem, so here's my issue:
I'm trying to send a command buffer to the GPU that adds values to a shader buffer, e.g.:
#version 450
#define INPUT_ARRAY_SIZE 1024
layout(local_size_x = 64) in; // compute workgroup size (pick to match your dispatch)
layout(std430, binding = 1) buffer InputArray{
    float array[ ];
}input_array;
void main()
{
    uint index = gl_GlobalInvocationID.x;
    if (index >= INPUT_ARRAY_SIZE){
        return;
    }
    input_array.array[index] += 3;
}
I would like to be able to swap out the VkBuffer I use to back the shader buffer with other buffers, i.e.:
void addValue(device, queue, command_buffer, buffer);
or
void addValue(device, queue, command_buffer, descriptor_set);
where I would swap out buffer for other buffers I want to add values to.
Unfortunately I don't see a way to do that without re-recording my command buffer. As far as I can tell, my only options for minimizing the command buffer overhead (which is large when my invocations take nanoseconds) are to use secondary command buffers and to somehow use the pipeline cache. Otherwise I would have to create a command buffer for every single new buffer, which is not feasible when I have more than 100 commands. It doesn't seem to be possible to use vkUpdateDescriptorSets without re-recording either.
Is there a way to use pre-recorded command buffers, and change the VkBuffer used behind the shader buffer at will, without re-recording the commands?
Not without the VK_EXT_descriptor_indexing extension. Descriptor values (the locations of the GPU resources they represent) are supposed to be baked into the CB at write time, not read from some external source.
Even with descriptor indexing, you need to ensure that the CB is not being executed before you can update the descriptor. So that would require a GPU/CPU sync (which may or may not be bad, depending on your semaphore/submission code structure).
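With descriptor indexing, the relevant setup is roughly this (a sketch, not a complete implementation; the binding mirrors binding = 1 from your shader, and the pool you allocate from must also use the update-after-bind flag):
VkDescriptorSetLayoutBinding binding{};
binding.binding         = 1;                                   // matches binding = 1 in the shader
binding.descriptorType  = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
binding.descriptorCount = 1;
binding.stageFlags      = VK_SHADER_STAGE_COMPUTE_BIT;

VkDescriptorBindingFlagsEXT bindingFlags = VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT;

VkDescriptorSetLayoutBindingFlagsCreateInfoEXT flagsInfo{};
flagsInfo.sType         = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO_EXT;
flagsInfo.bindingCount  = 1;
flagsInfo.pBindingFlags = &bindingFlags;

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.pNext        = &flagsInfo;
layoutInfo.flags        = VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT_EXT;
layoutInfo.bindingCount = 1;
layoutInfo.pBindings    = &binding;
// ... vkCreateDescriptorSetLayout, then allocate the set from a pool created with
// VK_DESCRIPTOR_POOL_CREATE_UPDATE_AFTER_BIND_BIT_EXT.
This lets you rewrite the descriptor after recording, but still not while the CB is executing.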
Otherwise I would have to create a command buffer for every single new buffer, which is not feasible when I have more than 100 commands.
You should not put each command in its own buffer. You should bundle as many commands together as possible.
In general, the cost of building command buffers is pretty low. Coupled with threading their construction, they shouldn't be your primary concern here, especially when the number of commands is as low as "more than 100 commands": Vulkan users routinely record thousands of commands into CBs, repeatedly, every frame.
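For example, a single command buffer covering many target buffers might be recorded roughly like this (a sketch; pipeline, pipelineLayout, beginInfo, descriptorSets and bufferCount are assumed to already exist, with descriptorSets[i] each pointing at a different VkBuffer):
vkBeginCommandBuffer(commandBuffer, &beginInfo);
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
for (uint32_t i = 0; i < bufferCount; ++i)
{
    // One dispatch per target buffer; no barriers needed since each dispatch
    // writes a different buffer.
    vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE,
                            pipelineLayout, 0, 1, &descriptorSets[i], 0, nullptr);
    vkCmdDispatch(commandBuffer, (1024 + 63) / 64, 1, 1);   // INPUT_ARRAY_SIZE / local_size_x
}
vkEndCommandBuffer(commandBuffer);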
I'm attempting to play a raw (int16 PCM) audio file in my Android application. I've been following and reading through the Oboe documentation/samples to try to get one of my own audio files to play.
The audio file I need to play is roughly 6 KB, or 1592 frames (stereo).
Either no sound plays, or sound/jitter plays on startup (with varying output - see below).
Troubleshooting
Update
I have switched to floats for buffer queuing, instead of keeping everything to int16_t (and converting back to int16_t when done), although now I'm back to no sound.
The audio seems to be either not playing, or playing on startup (which is wrong). The sound should play after I press 'start'.
When the app was implemented with int16_t only, the premature sound was relative to how big the buffer size was. If the buffer size is smaller than the audio file, the sound is very fast and clipped (more drone-like at lower buffer sizes). If it is bigger than the raw audio size, it seems to play in a loop and gets quieter at higher buffer sizes. The sound would also get "softer" when the start button was pressed. I'm not even entirely sure this means the raw audio was playing; it could just be random nonsense jitter from Android.
When filling the buffers with floats, and converting to int16_t afterwards, no audio is played.
(I have tried running systrace, but I honestly don't know what I'm looking for)
The stream opens fine.
The buffer size fails to be adjusted in createPlaybackStream() (although somehow it is still set to twice the burst size)
The stream starts fine.
The Raw resources are being loaded fine.
Implementation
What I am currently trying in the builder:
Setting the callback to this, or onAudioReady()
Setting the performance mode to LowLatency
Setting the sharing mode to Exclusive
Setting the buffer capacity to (anything bigger than my audio file frame count)
Setting the burst size (frames per callback) to (anything equal to or lower than the buffer capacity / 2)
I am using the Player class and the AAssetManager class from the Rhythm Game sample here: https://github.com/google/oboe/blob/master/samples/RhythmGame. I am using these classes to load my resources and play the sound. Player.renderAudio writes the audio data to the output buffer.
Here are the relevant methods from my audio engine:
void AudioEngine::createPlaybackStream() {
// Load the RAW PCM data files into memory
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
if (soundSource == nullptr) {
LOGE("Could not load source data for sound");
return;
}
sound = std::make_shared<Player>(soundSource);
AudioStreamBuilder builder;
builder.setCallback(this);
builder.setPerformanceMode(PerformanceMode::LowLatency);
builder.setSharingMode(SharingMode::Exclusive);
builder.setChannelCount(mChannelCount);
Result result = builder.openStream(&stream);
if (result == Result::OK && stream != nullptr) {
mSampleRate = stream->getSampleRate();
mFramesPerBurst = stream->getFramesPerBurst();
int channelCount = stream->getChannelCount();
if (channelCount != mChannelCount) {
LOGW("Requested %d channels but received %d", mChannelCount, channelCount);
}
// Set the buffer size to (burst size * 2) - this will give us the minimum possible latency while minimizing underruns
auto setBufferSizeResult = stream->setBufferSizeInFrames(mFramesPerBurst * 2);
if (setBufferSizeResult.error() != Result::OK) {
LOGW("Failed to set buffer size. Error: %s", convertToText(setBufferSizeResult.error()));
}
// Start the stream - the dataCallback function will start being called
result = stream->requestStart();
if (result != Result::OK) {
LOGE("Error starting stream. %s", convertToText(result));
}
} else {
LOGE("Failed to create stream. Error: %s", convertToText(result));
}
}
DataCallbackResult AudioEngine::onAudioReady(AudioStream *audioStream, void *audioData, int32_t numFrames) {
int16_t *outputBuffer = static_cast<int16_t *>(audioData);
sound->renderAudio(outputBuffer, numFrames);
return DataCallbackResult::Continue;
}
// When the 'start' button is pressed, it calls this method with true
// There should be no sound on app start-up until this button is pressed
// Sound stops when 'stop' is pressed
void setPlaying(bool isPlaying) {
sound->setPlaying(isPlaying);
}
Setting the buffer capacity to (anything bigger than my audio file frame count)
You don't need to set the buffer capacity. This will be set automatically at a reasonable level for you. Typically ~3000 frames. Note that buffer capacity is different from buffer size which defaults to 2*framesPerBurst.
Setting the burst size (frames per callback) to (anything equal to or lower than the buffer capacity / 2)
Again, don't do this. onAudioReady will be called every time the stream requires more audio data, and numFrames indicates how many frames you should supply. If you override this value with one which isn't an exact multiple of the audio device's native burst size (typical values are 128, 192 and 240 frames, depending on the underlying hardware) then you may get audio glitches.
I have switched to floats for buffer queuing
The format which you need to supply data in is determined by the audio stream and it is only known after the stream has been opened. You can get it by calling stream->getFormat().
In the RhythmGame sample (at least the version you're referring to) here's how the formats work:
Source file is converted from 16-bit to float inside AAssetDataSource::newFromAssetManager (floats are the preferred format for any kind of signal processing)
If the stream format is 16-bit then convert it back inside onAudioReady
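As a sketch of what that looks like inside onAudioReady (assuming, as in the sample, that renderAudio produces floats, and that floatBuffer is a scratch buffer you own sized at least numFrames * channel count):
int32_t channelCount = audioStream->getChannelCount();
if (audioStream->getFormat() == oboe::AudioFormat::I16) {
    // Render into the float scratch buffer, then convert for the 16-bit device stream.
    sound->renderAudio(floatBuffer, numFrames);
    oboe::convertFloatToPcm16(floatBuffer, static_cast<int16_t*>(audioData), numFrames * channelCount);
} else {
    // Device stream is already float: render straight into it.
    sound->renderAudio(static_cast<float*>(audioData), numFrames);
}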
1592 frames (stereo).
You said that your source was stereo but you're specifying it as mono here:
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
Without doubt that will cause audio problems because the AAssetDataSource will have a value for numFrames which is double the correct value. This will cause audio glitches because half the time you'll be playing random parts of system memory.
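So, presumably, the call should specify stereo instead, e.g.:
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Stereo));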
I am implementing a reader of huge compressed raster files. Decompression is performed partially, on the fly: only requested regions of the raster are decompressed and stored in a memory cache. The reader works similarly to memory-mapping a file, but the data is not mapped to memory 1:1; it is decompressed.
It is implemented using anonymous memory mapping:
char* raster_cache = static_cast<char*>(mmap(0, UNCOMPRESSED_RASTER_SIZE, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
Reading an area which is not cached yet raises a segmentation violation signal, which is caught and handled using libsigsegv (see my previous question):
struct CacheHandlerData
{
std::mutex mutex;
// other data needed for decompression
};
int cache_sigsegv_handler(void* fault_address, void* user_data)
{
void* page_address = reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(fault_address) & ~(PAGE_SIZE - 1));
CacheHandlerData* data = static_cast<CacheHandlerData*>(user_data);
std::lock_guard<std::mutex> lock(data->mutex);
unsigned char cached = 0;
mincore(page_address, 1, &cached);
if (!cached)
{
mprotect(page_address, PAGE_SIZE, PROT_WRITE);
// decompress whole page
mprotect(page_address, PAGE_SIZE, PROT_READ);
}
return 1;
}
The problem is that cached pages stay in memory forever. Because I write to the pages, they are marked as dirty and are never evicted.
QUESTION: Is there some way to mark the pages as not dirty?
That way, if the system were running out of memory, the pages could be dropped just like a normal disk cache. It would also be necessary to call mprotect(page_address, PAGE_SIZE, PROT_NONE) on the removed pages, so that accessing them again raises a segmentation violation.
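For illustration, manually evicting one cached page along these lines would look roughly like this (a sketch assuming Linux; madvise with MADV_DONTNEED discards the contents of anonymous pages so their physical memory is released):
#include <sys/mman.h>

void evict_page(void* page_address)
{
    // Future accesses fault again and go back through the sigsegv handler.
    mprotect(page_address, PAGE_SIZE, PROT_NONE);
    // Discard the dirty anonymous page so its memory can be reclaimed.
    madvise(page_address, PAGE_SIZE, MADV_DONTNEED);
}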
Thank you.
EDIT: I could use a temporary-file-backed mapping instead of an anonymous one. Pages would then be written out to disk if the system ran out of memory. But this solution loses the benefits of using compressed data (smaller disk footprint, probably faster reading).
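A sketch of that alternative (assuming Linux; mkstemp plus unlink keeps the backing file alive only while the descriptor is open):
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

char path[] = "/tmp/raster_cache_XXXXXX";
int fd = mkstemp(path);                        // create a unique temporary file
unlink(path);                                  // it lives only while fd stays open
ftruncate(fd, UNCOMPRESSED_RASTER_SIZE);       // size the backing file
char* raster_cache = static_cast<char*>(
    mmap(0, UNCOMPRESSED_RASTER_SIZE, PROT_NONE, MAP_SHARED, fd, 0));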
Why does my streaming OpenAL source sometimes go to AL_STOPPED state, forcing me to call alSourcePlay? This usually happens when I do not call send fast enough, i.e. in debug mode. Does the OpenAL source automatically stop when it doesn't have enough queued buffers? How do I avoid that?
void send(audio_buffer audio) override
{
ALenum state;
alGetSourcei(source_, AL_SOURCE_STATE,&state);
if(state != AL_PLAYING)
alSourcePlay(source_); // This happens sometimes, usually when "send" is not called fast enough.
ALuint buffer = 0;
alSourceUnqueueBuffers(source_, 1, &buffer);
if(buffer)
{
alBufferData(buffer, AL_FORMAT_STEREO16, audio.data(), static_cast<ALsizei>(audio.size()*sizeof(int16_t)), 48000);
alSourceQueueBuffers(source_, 1, &buffer);
}
else
LOG << "Dropped audio.";
}
It sounds like your basic problem is that your audio stream is starved. There are a few options you can use to mitigate this, but they all have their own side effects:
(1) You can configure it to play from a looping buffer, to which you are supplying the relevant data. The downside to this is that it will audibly repeat itself if you starve the buffer too long, but it will have some better performance characteristics (fragmentation, etc).
(2) You can increase the send buffer size. This will only cover up small problems, and potentially increases the latency in dynamic content.
(3) Finally, you can move the audio send operation onto its own thread; that way, so long as the audio thread isn't starved, it can continue to send data in the background.
A high-quality production solution probably involves all three of these. Sorry for the lack of OpenAL-specific terminology, but every audio system I've seen has these capabilities.
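To make option (3) concrete, here is a rough sketch of a feeder thread that keeps the queue topped up and only restarts the source when it has actually stopped (fetch_audio is a hypothetical stand-in for however you obtain the next block of samples; audio_buffer is the type from your send function):
#include <AL/al.h>
#include <atomic>
#include <chrono>
#include <thread>

void feeder(ALuint source, std::atomic<bool>& running)
{
    while (running)
    {
        ALint processed = 0;
        alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
        while (processed-- > 0)
        {
            // Recycle each finished buffer with fresh data.
            ALuint buffer = 0;
            alSourceUnqueueBuffers(source, 1, &buffer);
            audio_buffer audio = fetch_audio();   // hypothetical
            alBufferData(buffer, AL_FORMAT_STEREO16, audio.data(),
                         static_cast<ALsizei>(audio.size() * sizeof(int16_t)), 48000);
            alSourceQueueBuffers(source, 1, &buffer);
        }
        // Restart only if the source stopped because the queue ran dry.
        ALint state = 0;
        alGetSourcei(source, AL_SOURCE_STATE, &state);
        if (state != AL_PLAYING)
            alSourcePlay(source);
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}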