I am currently trying to decompress targa (RGB24_RLE) image data.
My algorithm looks like this:
static constexpr size_t kPacketHeaderSize = sizeof(char);
//http://paulbourke.net/dataformats/tga/
inline void DecompressRLE(unsigned int a_BytePerPixel, std::vector<CrByte>& a_In, std::vector<CrByte>& a_Out)
{
    for (auto it = a_In.begin(); it != a_In.end();)
    {
        //Read packet header
        int header = *it & 0xFF;
        int count = (header & 0x7F) + 1;
        if ((header & 0x80) != 0) //packet type
        {
            //For the run-length packet, the header is followed by
            //a single color value, which is assumed to be repeated
            //the number of times specified in the header.
            auto paStart = it + kPacketHeaderSize;
            auto paEnd = paStart + a_BytePerPixel;
            //Insert packets into output buffer
            for (int pk = 0; pk < count; ++pk)
            {
                a_Out.insert(a_Out.end(), paStart, paEnd);
            }
            //Jump to next header
            std::advance(it, kPacketHeaderSize + a_BytePerPixel);
        }
        else
        {
            //For the raw packet, the header is followed by
            //the number of color values specified in the header.
            auto paStart = it + kPacketHeaderSize;
            auto paEnd = paStart + count * a_BytePerPixel;
            //Insert packets into output buffer
            a_Out.insert(a_Out.end(), paStart, paEnd);
            //Jump to next header
            std::advance(it, kPacketHeaderSize + count * a_BytePerPixel);
        }
    }
}
It is called here:
//Read compressed data
std::vector<CrByte> compressed(imageSize);
ifs.seekg(sizeof(Header), std::ifstream::beg);
ifs.read(reinterpret_cast<char*>(compressed.data()), imageSize);
//Decompress
std::vector<CrByte> decompressed(imageSize);
DecompressRLE(bytePerPixel, compressed, decompressed);
imageSize is defined like this:
size_t imageSize = hd.width * hd.height * bytePerPixel;
However, after DecompressRLE() finishes (which takes a very long time with 2048x2048 textures), decompressed is still empty/only contains zeros. Maybe I am missing something.
count seems to be unreasonably high sometimes, which I think is abnormal.
compressedSize should be less than imageSize, otherwise it's not compressed. However, using ifstream::tellg() gives me wrong results.
Any help?
If you look carefully at your variables in your debugger, you will see that std::vector<CrByte> decompressed(imageSize); declares a vector with imageSize elements already in it. In DecompressRLE you then insert at the end of that vector, causing it to grow. This is why your decompressed image is full of zeros, and also why it takes so long (the vector is periodically reallocated as it grows).
What you want to do is reserve the space:
std::vector<CrByte> decompressed;
decompressed.reserve(imageSize);
Your compressed buffer looks like it is larger than the file content, so you're still decompressing past the end of the file. The compressed file size should be in Header. Use it.
I'm reading lines from a file, and want to perform some computation on each row by the GPU.
The problem I'm facing is that until now I copied an int array of constant size, but now I have a vector of strings, each of a different size. I'm using:
std::vector<std::string> lines;
Previously I copied a constant-size array, something like:
err = cudaMemcpy(_devArr, tmp, count * sizeof(unsigned int) * 8, cudaMemcpyHostToDevice);
But I'm not sure how this can work with vectors. How can I address and copy a vector of strings? Can I somehow copy it and still access it like an array with a thread+block index?
*Using the latest CUDA 10.2 and an RTX 2060 graphics card
You need to flatten the strings down into a contiguous block of memory containing all the strings. My recommendation is to do it with two (total) blocks, one containing the combined string data, and one containing indexes for each of these strings.
std::string combined; //Works perfectly fine so long as it is contiguously allocated
std::vector<size_t> indexes; //You *might* be able to use int instead of size_t to save space
for (std::string const& line : lines) {
    combined += line;
    indexes.emplace_back(combined.size());
}
/* If 'lines' initially consisted of ["Dog", "Cat", "Tree", "Yard"], 'combined' is now
 * "DogCatTreeYard", and 'indexes' is now [3, 6, 10, 14].
 */
//I'm hoping I am writing these statements correctly; I don't specifically have CUDA experience
err = cudaMemcpy(_devArr, combined.data(), combined.size(), cudaMemcpyHostToDevice);
err = cudaMemcpy(_devArr2, indexes.data(), indexes.size() * sizeof(size_t), cudaMemcpyHostToDevice);
Then, in the device itself, you'll be able to read each string as you need them. I'm unfamiliar with the syntax that CUDA employs, so I'm going to write this in OpenCL syntax instead, but the principles should cleanly and directly translate over to CUDA; someone correct me if I'm mistaken.
kernel void main_func(
    global char * lines,    //combined string data
    global ulong * indexes, //indexes telling us the beginning and end of each string
    ulong indexes_size,     //number of strings being analyzed
    global int * results    //space to return results back to Host
) {
    size_t id = get_global_id(0); //"Which string are we examining?"
    if (id >= indexes_size) //bounds checking
        return;
    global char * string; //beginning of the string
    if (id == 0) //first string
        string = lines;
    else
        string = lines + indexes[id - 1];
    global char * string_end = lines + indexes[id]; //end of the string
    for (; string != string_end; string++) {
        if (*string == 'A') {
            results[id] = 1; //We matched the criterion; put a '1' for this string
            return;
        }
    }
    results[id] = 0; //We did not match; put a '0' for this string
}
The result of this code, executed on the initial list of strings, is that any string containing an 'A' gets a result of 1, and any string that does not gets a 0. The logic should transfer cleanly to CUDA's syntax; let me know if it does not.
I am creating a file with some data objects inside. Data objects have different sizes and look something like this (very simplified):
struct Data{
uint64_t size;
char blob[MAX_SIZE];
// ... methods here:
}
At some later point the file will be mmap()-ed into memory, so I want each data object to start at a memory address aligned to 8 bytes, where the uint64_t size will be stored (let's ignore endianness).
Code looks more or less to this (currently hardcoded 8 bytes):
size_t calcAlign(size_t const value, size_t const align_size){
    return align_size - value % align_size;
}

template<class ITERATOR>
void process(std::ofstream &file_data, ITERATOR begin, ITERATOR end){
    for (auto it = begin; it != end; ++it){
        const auto &data = *it;
        size_t bytesWriten = data.writeToFile(file_data);
        size_t const alignToBeAdded = calcAlign(bytesWriten, 8);
        if (alignToBeAdded != 8){
            uint64_t const placeholder = 0;
            file_data.write((const char *) &placeholder, (std::streamsize) alignToBeAdded);
        }
    }
}
Is this the best way to achieve alignment inside a file?
You don't need to rely on writeToFile to return the size; you can use ofstream::tellp:
const auto beginPos = file_data.tellp();
// write stuff to file
const auto alignSize = (file_data.tellp() - beginPos) % 8;
if (alignSize)
    file_data.write("\0\0\0\0\0\0\0\0", 8 - alignSize);
EDIT post OP comment:
Tested on a minimal example and it works.
#include <iostream>
#include <fstream>

int main(){
    using namespace std;
    ofstream file_data;
    file_data.open("tempfile.dat", ios::out | ios::binary);
    const auto beginPos = file_data.tellp();
    file_data.write("something", 9);
    const auto alignSize = (file_data.tellp() - beginPos) % 8;
    if (alignSize)
        file_data.write("\0\0\0\0\0\0\0\0", 8 - alignSize);
    file_data.close();
    return 0;
}
You can optimize the process by manipulating the input buffer instead of the file handling. Modify your Data struct so the code that fills the buffer takes care of the alignment.
struct Data{
    uint64_t size;
    char blob[MAX_SIZE];
    // ... other methods here

    // Ensure buffer alignment
    static_assert(MAX_SIZE % 8 == 0, "MAX_SIZE must be a multiple of 8 bytes to avoid buffer overflow.");

    uint64_t Fill(const char* data, uint64_t dataLength) {
        // Validations...
        memcpy(this->blob, data, dataLength);
        this->size = dataLength;
        const auto paddingLen = calcAlign(dataLength, 8) % 8;
        if (paddingLen > 0) {
            memset(this->blob + dataLength, 0, paddingLen);
        }
        // Return the aligned size
        return dataLength + paddingLen;
    }
};
Now when you pass the data to the "process" function, simply use the size returned from Fill, which ensures 8-byte alignment.
This way you still take care of the alignment manually, but you don't have to write to the file twice.
Note: this code assumes you use Data itself as the input buffer. Apply the same principles if your code uses another object to hold the buffer before it is written to the file.
If you can use POSIX, see also pwrite
I am trying to read a simple BMP file and, without performing any operations on it, write it back out to a file.
I don't know whether the mistake is in reading the file or in writing it back.
I have added padding while reading as well as while writing.
-- File read --
std::vector<char> tempImageData;
/*tempImageData.resize(m_bmpInfo->imagesize);
file.seekg(m_bmpHeader->dataoffset);
file.read(&tempImageData[0], m_bmpInfo->imagesize);
file.close();*/
tempImageData.resize(m_bmpInfo->imagesize);
int padding = 0;
while (((m_bmpInfo->width * 3 + padding) % 4) != 0)
    padding++;
for (unsigned int i = 0; i < m_bmpInfo->height; i++)
{
    file.seekg(m_bmpHeader->dataoffset + i * (m_bmpInfo->width * 3 + padding));
    file.read(&tempImageData[i * m_bmpInfo->width * 3], i * m_bmpInfo->width * 3);
}
file.close();
//bitmaps are stored as BGR -- let's convert to RGB
assert(m_bmpInfo->imagesize % 3 == 0);
for (auto i = tempImageData.begin(); i != tempImageData.end(); i += 3)
{
    m_data_red.push_back(*(i + 2));
    m_data_green.push_back(*(i + 1));
    m_data_blue.push_back(*(i + 0));
}
-- write code
file.write(reinterpret_cast<const char*>(m_bmpHeader), sizeof(BITMAPFILEHEADER));
file.write(reinterpret_cast<const char*>(m_bmpInfo), sizeof(BITMAPINFOHEADER));
// this is wrong.. format asks for bgr.. we are putting all r, all g, all b
std::vector<char> img;
img.reserve(m_data_red.size() + m_data_green.size() + m_data_blue.size());
for (unsigned int i = 0; i < m_data_red.size(); i++)
{
    img.push_back(m_data_blue[i]);
    img.push_back(m_data_green[i]);
    img.push_back(m_data_red[i]);
}
char bmppad[3] = {0};
for (unsigned int i = 0; i < m_bmpInfo->height; i++)
{
    // maybe something is wrong
    file.write(reinterpret_cast<const char*>(&img[i * m_bmpInfo->width * 3]), m_bmpInfo->width * 3 * sizeof(unsigned char));
    file.write(bmppad, 1 * ((4 - (m_bmpInfo->width * 3) % 4) % 4) * sizeof(char));
}
file.close();
But the results are weird.
Output image / input image (screenshots omitted).
As the padding is added to every row, I think you need to change this line:
file.seekg(m_bmpHeader->dataoffset + i*m_bmpInfo->width*3 + padding);
to this:
file.seekg(m_bmpHeader->dataoffset + i*(m_bmpInfo->width*3 + padding));
It also might be easier to save the calculated padding rather than calculating it in two different ways.
Edit:
Without all the code to debug through, it is a bit hard to pinpoint, but there is an error on this line:
file.read(&tempImageData[i*m_bmpInfo->width*3], i*m_bmpInfo->width*3);
you should not have the i* part in the amount you are reading. It means that at row 200 you are reading 200 rows' worth of data into the array, possibly overwriting the end of the array once you are more than halfway through the image, which is interesting given your output.
I am processing a binary file that is built up of events. Each event can have a variable length. Since my read buffer is a fixed size I handle things as follows:
const int bufferSize = 0x500000;
const int readSize = 0x400000;
const int eventLengthMask = 0x7FFE0000;
const int eventLengthShift = 17;
const int headerLengthMask = 0x1F000;
const int headerLengthShift = 12;
const int slotMask = 0xF0;
const int slotShift = 4;
const int channelMask = 0xF;
...
//allocate the buffer we allocate 5 MB even though we read in 4MB chunks
//to deal with unprocessed data from the end of a read
char* allocBuff = new char[bufferSize]; //inFile reads data into here
unsigned int* buff = reinterpret_cast<unsigned int*>(allocBuff); //data is interpretted from here
inFile.open(fileName.c_str(),ios_base::in | ios_base::binary);
int startPos = 0;
while (!inFile.eof())
{
    int index = 0;
    inFile.read(&(allocBuff[startPos]), readSize);
    int size = ((readSize + startPos) >> 2);
    //loop to process the buffer
    while (index < size)
    {
        unsigned int data = buff[index];
        int eventLength = ((data & eventLengthMask) >> eventLengthShift);
        int headerLength = ((data & headerLengthMask) >> headerLengthShift);
        int slot = ((data & slotMask) >> slotShift);
        int channel = data & channelMask;
        //now check if the full event is in the buffer
        if ((index + eventLength) > size)
        {   //the full event is not in the buffer
            break;
        }
        ++index;
        //further processing of the event
    }
    //move the data at the end of the buffer to the beginning and set the
    //start position for the next read
    for (int i = index; i < size; ++i)
    {
        buff[i - index] = buff[i];
    }
    startPos = ((size - index) << 2);
}
My question is this: Is there a better to handle having unprocessed data at the end of the buffer?
You could improve it by using a circular buffer rather than a simple array. That, or a circular iterator over the array. Then you don't need to do all that copying; the "start" of the array simply moves.
Other than that, no, not really.
When I encountered this problem in the past, I simply copied the unprocessed data down and then read from the end of it. This is a valid solution (and by far the simplest) if the individual elements are fairly small and the buffer is large. (On a modern machine, "fairly small" can easily be anything up to a couple of hundred KB.) Of course, you'll have to keep track of how much you've copied down, to adjust the pointer and the size of the next read.
Beyond that:
You'd be better off using std::vector<char> for the buffer.
You can't convert four bytes read from disk into an unsigned int just by casting its address; you have to insert each of the bytes into the unsigned int where it belongs.
And finally: you don't check that the read has succeeded before processing the data. Using unformatted input with an istream is a bit tricky: your loop should probably be something like
while ( inFile.read( addr, len ) || inFile.gcount() != 0 )....
We want to get the line/column of an XPath query result in pugixml:
pugi::xpath_query query_child(query_str);
std::string value = Convert::toString(query_child.evaluate_string(root_node));
We can retrieve the offset, but not the line/column:
unsigned int offset = query_child.result().offset;
If we re-parse the file we can convert offset => (line, column), but that is not efficient.
Is there an efficient method to achieve this?
result().offset is the last parsed offset in the query string; it will be equal to 0 if the query got parsed successfully; so this is not the offset in XML file.
For XPath queries that return strings the concept of 'offset in XML file' is not defined - i.e. what would you expect for concat("a", "b") query?
For XPath queries that return nodes, you can get the offset of node data in file. Unfortunately, due to parsing performance and memory consumption reasons, this information can't be obtained without reparsing. There is a task in the TODO list to make it easier (i.e. with couple of lines of code), but it's going to take a while.
So, assuming you want to find the offset of node that is a result of XPath query, the only way is to get XPath query result as a node set (query.evaluate_node_set or node.select_single_node/select_nodes), get the offset (node.offset_debug()) and convert it to line/column manually.
You can prepare a data structure for offset -> line/column conversion once, and then use it multiple times; for example, the following code should work:
#include <vector>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <utility>

typedef std::vector<ptrdiff_t> offset_data_t;

bool build_offset_data(offset_data_t& result, const char* file)
{
    FILE* f = fopen(file, "rb");
    if (!f) return false;

    ptrdiff_t offset = 0;
    char buffer[1024];
    size_t size;

    while ((size = fread(buffer, 1, sizeof(buffer), f)) > 0)
    {
        for (size_t i = 0; i < size; ++i)
            if (buffer[i] == '\n')
                result.push_back(offset + i);
        offset += size;
    }

    fclose(f);
    return true;
}

std::pair<int, int> get_location(const offset_data_t& data, ptrdiff_t offset)
{
    offset_data_t::const_iterator it = std::lower_bound(data.begin(), data.end(), offset);
    size_t index = it - data.begin();
    return std::make_pair(1 + index, index == 0 ? offset : offset - data[index - 1]);
}
This does not handle Mac-style linebreaks and does not handle tabs; this can be trivially added, of course.