I've created an RLE encoding function that encodes sequences like "A1A1B7B7B7B7" into strings like "#A12#B74".
void encode(const char *input_path, const char *output_path)
{ // Begin of SBDLib::SBIMask::encode
std::fstream input(input_path, std::ios_base::in | std::ios_base::binary);
std::fstream output(output_path, std::ios_base::out | std::ios_base::binary);
int size = 0; // Set size variable
input.seekg(0, std::ios::end); // Move to EOF
size = input.tellg(); // Tell position
input.seekg(0); // Move to the beginning
int i = 1; // Create encoding counter
int counter = 0; // Create color counter
int cbyte1, cbyte2; // Create current color bytes
int pbyte1 = 0x0; int pbyte2 = 0x0; // Create previous color bytes
while (((cbyte1 = input.get()) != EOF && (cbyte2 = input.get()) != EOF)
|| input.tellg() >= size)
{ // Begin of while
// If current bytes are not equal to previous bytes
// or cursor is at the end of the input file, write
// binary data to file; don't do it if previous bytes
// were not set from 0x0 to any other integer.
if (((cbyte1 != pbyte1 || cbyte2 != pbyte2)
|| (input.tellg() == size))
&& (pbyte1 != 0x0 && pbyte2 != 0x0))
{ // Begin of main if
output << SEPARATOR; // Write separator to file
output.write(reinterpret_cast<const char*>(&pbyte1), 1);
output.write(reinterpret_cast<const char*>(&pbyte2), 1);
output << std::hex << counter; // Write separator, bytes and count
counter = 1; // Reset counter
} // End of main if
else counter++; // Increment counter
pbyte1 = cbyte1; pbyte2 = cbyte2; // Set previous bytes
} // End of main while
} // End of encode
However, the function is not as fast as I need. This is already the second version, which I improved to make it faster, but it is still too slow. Do you have any ideas on how to improve it? I'm out of ideas.
Depending on the size of the data you are reading, it might be a good idea to read a chunk of data from your input file at once rather than single characters. This can be a lot faster than going back to the input stream for each input character.
Pseudo code example:
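char dataArray[100];
// read() is the binary bulk read; get(buf, n) would stop at the first '\n'
while (input.read(dataArray, sizeof dataArray) || input.gcount() > 0)
{
    // process one block; the last block may hold fewer than 100 bytes
    process(dataArray, input.gcount());
}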
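As a rough sketch of that idea applied to the encoder from the question: the version below reads the input with one bulk operation, encodes the buffer in memory, and writes the result with a single call. The names `appendHex`, `rleEncodePairs` and `encodeFile` are made up for illustration, and the `SEPARATOR` value `'#'` and the output layout (separator, two bytes, count in hex) are assumptions taken from the "#A12#B74" example. A trailing odd byte is simply ignored here, which is also an assumption.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Assumed separator, taken from the "#A12#B74" example in the question.
constexpr char SEPARATOR = '#';

// Append n in lowercase hexadecimal, like `output << std::hex << counter`.
void appendHex(std::string& out, std::size_t n)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof n];
    std::size_t len = 0;
    do {
        tmp[len++] = digits[n % 16];
        n /= 16;
    } while (n != 0);
    while (len > 0) out.push_back(tmp[--len]);
}

// Run-length encode a buffer of two-byte units entirely in memory.
// Assumption: a trailing odd byte is ignored.
std::string rleEncodePairs(const std::string& in)
{
    std::string out;
    const std::size_t n = in.size() - in.size() % 2;
    std::size_t i = 0;
    while (i < n) {
        std::size_t run = 1;
        // Extend the run while the next two-byte unit equals the current one.
        while (i + 2 * run < n
               && in[i + 2 * run] == in[i]
               && in[i + 2 * run + 1] == in[i + 1]) {
            ++run;
        }
        out.push_back(SEPARATOR);
        out.push_back(in[i]);
        out.push_back(in[i + 1]);
        appendHex(out, run);
        i += 2 * run;
    }
    return out;
}

// Read the whole input at once and write the encoded result with one call.
bool encodeFile(const char* inputPath, const char* outputPath)
{
    std::ifstream input(inputPath, std::ios::binary);
    if (!input) return false;
    const std::string data((std::istreambuf_iterator<char>(input)),
                           std::istreambuf_iterator<char>());
    std::ofstream output(outputPath, std::ios::binary);
    if (!output) return false;
    const std::string encoded = rleEncodePairs(data);
    output.write(encoded.data(), static_cast<std::streamsize>(encoded.size()));
    return static_cast<bool>(output);
}
```

Whether this is fast enough depends on the file sizes involved; for inputs too large to hold in memory, the same inner loop could be run over fixed-size chunks instead.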
Can somebody tell me if this is correct?
I am trying to read from a binary file line by line and store it in a buffer. Does each new line stored in the buffer overwrite the previously stored line?
ifs.open(filename, std::ios::binary);
for (std::string line; getline(ifs, line,' '); )
{
ifs.read(reinterpret_cast<char *> (buffer), 3*h*w);
}
For some reason you are mixing getline, which is text-based reading, with read(), which is binary reading.
Also, it's completely unclear what buffer is and what its size is. So, here is a simple example for you to start with:
ifs.open(filename, std::ios::binary); // assume, that everything is OK
constexpr size_t bufSize = 256;
char buffer[bufSize];
size_t charsRead{ 0 };
do {
ifs.read(buffer, bufSize);
charsRead = static_cast<size_t>(ifs.gcount()); // read() returns the stream, gcount() the byte count
// check if charsRead == 0, if it's ok
// do something with filled buffer.
// Note, that last read will have less than bufSize characters,
// So, query charsRead each time.
} while (charsRead == bufSize);
I am trying to read a binary file's data. Sadly, opening files in C++ is a lot different than in Python for these things, as Python has a byte mode. It seems C++ does not have that.
for (auto p = directory_iterator(path); p != directory_iterator(); p++) {
if (!is_directory(p->path()))
byte tmpdata;
std::ifstream tmpreader;
tmpreader.open(desfile, std::ios_base::binary);
int currentByte = tmpreader.get();
while (currentByte >= 0)
{
//std::cout << "Does this get Called?" << std::endl;
int currentByte = tmpreader.get();
tmpdata = currentByte;
}
tmpreader.close()
}
else
{
continue;
}
I basically want a clone of Python's way of opening a file in 'rb' mode: to have the actual byte data of all of the contents (which is not readable, as it has nonprintable chars, even for C++). Most of it probably can't be converted to signed chars, because it contains zlib-compressed data that I need to feed into my DLL to decompress it all.
I do know that in Python I can do something like this:
file_object = open('[file here]', 'rb')
It turns out that replacing the C++ code above with this helps. However, the compiler flags fopen as deprecated, but I don't care.
The code above did not work because I was not reading the data into a buffer. I realized later that fopen, fseek, fread, and fclose were the functions I needed for read-bytes mode ('rb').
for (auto p = directory_iterator(path); p != directory_iterator(); p++) {
if (!is_directory(p->path()))
{
std::string desfile = p->path().filename().string();
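unsigned char* data2;
FILE *fp = fopen("data.d", "rb");
if (fp == NULL) continue; // skip files that cannot be opened
fseek(fp, 0, SEEK_END); // GO TO END OF FILE
size_t size = ftell(fp);
fseek(fp, 0, SEEK_SET); // GO BACK TO START
data2 = new unsigned char[size];
size_t bytesRead = fread(data2, 1, size, fp); // fread returns the number of items read, not a byte
fclose(fp);
// use data2[0 .. bytesRead) here, then release it with delete[] data2;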
}
else
{
continue;
}
int currentByte = tmpreader.get();
while (currentByte >= 0)
{
//std::cout << "Does this get Called?" << std::endl;
int currentByte = tmpreader.get();
//^ here!
You are declaring a second variable hiding the outer one. However, this inner one is only valid within the while loop's body, so the while condition checks the outer variable which is not modified any more. Rather do it this way:
int currentByte;
while ((currentByte = tmpreader.get()) >= 0)
{
I am trying to read 64000 bytes at a time from a file in binary mode into a buffer, until the end of the file. My problem is tellg() returns position in hexadecimal value. How do I make it return a decimal value?
My if conditions are not working: it is reading more than 64000, and when I am relocating my pos and size_stream(size_stream = size_stream - 63999; pos = pos + 63999;), it is pointing to wrong positions each time.
How do I read 64000 bytes from file into buffer in binary mode at once till the end of file?
Any help would be appreciated
std::fstream fin(file, std::ios::in | std::ios::binary | std::ios::ate);
if (fin.good())
{
fin.seekg(0, fin.end);
int size_stream = (unsigned int)fin.tellg(); fin.seekg(0, fin.beg);
int pos = (unsigned int)fin.tellg();
//........................<sending the file in blocks
while (true)
{
if (size_stream > 64000)
{
fin.read(buf, 63999);
buf[64000] = '\0';
CString strText(buf);
SendFileContent(userKey,
(LPCTSTR)strText);
size_stream = size_stream - 63999;
pos = pos + 63999;
fin.seekg(pos, std::ios::beg);
}
else
{
fin.read(buf, size_stream);
buf[size_stream] = '\0';
CString strText(buf);
SendFileContent(userKey,
(LPCTSTR)strText); break;
}
}
My problem is tellg() returns position in hexadecimal value
No, it doesn't. It returns an integer value. You can display the value in hex, but it is not returned in hex.
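To illustrate the difference between a value and its textual representation, here is a small sketch that prints the same number both ways. The function name `describePosition` is made up for this example.

```cpp
#include <sstream>
#include <string>

// Format the same integer value in decimal and in hexadecimal.
// Only the text representation differs; the value itself is unchanged.
std::string describePosition(long long pos)
{
    std::ostringstream text;
    text << std::dec << pos << " (0x" << std::hex << pos << ")";
    return text.str();
}
```

For a stream position this could be called as `describePosition(static_cast<std::streamoff>(fin.tellg()))`. Note that `std::hex` is sticky: once it has been set on a stream, later integers are printed in hex until `std::dec` is set again.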
when I am relocating my pos and size_stream(size_stream = size_stream - 63999; pos = pos + 63999;), it is pointing to wrong positions each time.
You shouldn't be seeking in the first place. After performing a read, leave the file position where it is. The next read will pick up where the previous read left off.
How do I read 64000 bytes from file into buffer in binary mode at once till the end of file?
Do something more like this instead:
std::ifstream fin(file, std::ios::binary);
if (fin)
{
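char buf[64000]; // readsome() expects a char buffer
std::streamsize numRead;
do
{
// Note: readsome() only returns data that is already buffered, so on
// some implementations it may return 0 before the end of the file.
numRead = fin.readsome(buf, 64000);
if ((!fin) || (numRead < 1)) break;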
// DO NOT send binary data using `LPTSTR` string conversions.
// Binary data needs to be sent *as-is* instead.
//
SendFileContent(userKey, buf, numRead);
}
while (true);
}
Or this:
std::ifstream fin(file, std::ios::binary);
if (fin)
{
char buf[64000]; // read() expects a char buffer
std::streamsize numRead;
do
{
if (!fin.read(buf, 64000))
{
if (!fin.eof()) break;
}
numRead = fin.gcount();
if (numRead < 1) break;
// DO NOT send binary data using `LPTSTR` string conversions.
// Binary data needs to be sent *as-is* instead.
//
SendFileContent(userKey, buf, numRead);
}
while (true);
}
The following code works, but is only about half as efficient as when I use a (Linux) pipe that feeds unzipped data to the (modified) program. I need a steady stream within the program which I can keep splitting by \n. Is there a way to do this using a (string?) stream or any other trick?
int main(int argc, char *argv[]) {
static const int unzipBufferSize = 8192;
long long int counter = 0;
int i = 0, p = 0, n = 0;
int offset = 0;
char *end = NULL;
char *begin = NULL;
unsigned char unzipBuffer[unzipBufferSize];
unsigned int unzippedBytes;
char * inFileName = argv[1];
char buffer[200];
buffer[0] = '\0';
bool breaker = false;
char pch[4][200];
Read *aRead = new Read;
gzFile inFileZ;
inFileZ = gzopen(inFileName, "rb");
while (true) {
unzippedBytes = gzread(inFileZ, unzipBuffer, unzipBufferSize);
if (unzippedBytes > 0) {
unzipBuffer[unzippedBytes] = '\0'; //put a 0-char after the total buffer
begin = (char*) &unzipBuffer[0]; // point to the address of the first char
do {
end = strchr(begin,(int)'\n'); //find the end of line
if (end != NULL) *(end) = '\0'; // put 0-char to use it as a c-string
pch[p][0] = '\0'; // put a 0-char to be able to strcat
if (strlen(buffer) > 0) { // if buffer from previous iteration contains something
strcat(pch[p], buffer); // cat it to the p-th pch
buffer[0] = '\0'; // set buffer to null-string or ""
}
strcat(pch[p], begin); // put begin (or rest of line in case there was a buffer into p-th pch
if (end != NULL) { // see if it already points to something
begin = end+1; // if so, advance begin to old end+1
p++;
}
if(p>3) { // a 'read' contains 4 lines, so if p>3
strcat(aRead->bases,pch[1]); // we use line 2 and 4 as
strcat(aRead->scores,pch[3]); // bases and scores
//do things with the reads
aRead->bases[0] = '\0'; //put them back to 0-char
aRead->scores[0] = '\0';
p = 0; // start counting next 4 lines
}
}
while (end != NULL );
strcat(buffer,pch[p]); //move the left-over of unzipBuffer to buffer
}
else {
break; // when no unzippedBytes, exit the loop
}
}
Your main problem is probably the standard C string library.
By using the strxxx() functions, you iterate through the complete buffer multiple times per pass: first for strchr(), then for strlen(), then once more for each of the strcat() calls.
Using the standard library is a nice thing, but here, it's just plain inefficient.
Try to come up with something simpler that touches each character only once, like this (code just to show the principle, do not expect it to work as is):
do
{
do
{
*tp++ = *sp++;
} while (sp < buffer_end && *sp != '\n');
/* new line, do whatever it requires */
...
/* reset tp to beginning of buffer */
} while (sp < buffer_end);
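The single-pass idea can be sketched in a more complete form as follows. The class name `LineSplitter` and its interface are assumptions made for this illustration; the point is that each chunk is scanned once and an unfinished line is carried over to the next chunk.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a sequence of chunks into lines, scanning each character once.
// A partial line at the end of a chunk is carried over to the next call.
class LineSplitter
{
public:
    // Feed one chunk; every completed line is appended to `lines`.
    void feed(const char* data, std::size_t size, std::vector<std::string>& lines)
    {
        const char* const end = data + size;
        const char* lineStart = data;
        for (const char* sp = data; sp != end; ++sp) {
            if (*sp == '\n') {
                pending_.append(lineStart, sp); // finish the current line
                lines.push_back(pending_);
                pending_.clear();
                lineStart = sp + 1;
            }
        }
        pending_.append(lineStart, end); // keep the unfinished tail
    }

    // Text received after the last newline seen so far.
    const std::string& pending() const { return pending_; }

private:
    std::string pending_;
};
```

With zlib, each chunk would be passed as `splitter.feed(reinterpret_cast<const char*>(unzipBuffer), unzippedBytes, lines)`. Because the length is passed explicitly, no terminating '\0' has to be written past the data, and every fourth completed line closes one read.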
I am trying to get this to work, but all it does is give a segmentation fault at runtime:
do {
unzippedBytes = gzread(inFileZ, unzipBuffer, unzipBufferSize);
if (unzippedBytes > 0) {
while (*unzipBuffer < unzippedBytes) {
*pch = *unzipBuffer++;
cout << pch;
i++;
}
i=0;
}
else break;
} while (true);
What am I doing wrong here?
I'm making an enciphering/deciphering program using the XTEA algorithm. The encipher/decipher functions work fine, but when I encipher a file and then decipher it, I get some extra characters at the end of the file:
--- Original file ---
QwertY
--- Encrypted file ---
»¦æŸS#±
--- Deciphered from encrypted ---
QwertY ß*tÞÇ
I have no idea why the " ß*tÞÇ" appears at the end.
I will post some of my code, but not all of it since it would be too long. The encipher/decipher functions take 64 bits of data and a 128-bit key, and encipher/decipher the data into a block of the same size, which is again 64 bits (similar functions here). The result can then be written to a new file.
long data[2]; // 64bits
ZeroMemory(data, sizeof(long)*2);
char password[16];
ZeroMemory(password, sizeof(char)*16);
long *key;
if(argc > 1)
{
string originalpath = argv[1];
string finalpath;
string eextension = "XTEA";
string extension = GetFileExtension(originalpath);
bool encipherfile = 1;
if(extension.compare(eextension) == 0) // If extensions are equal, dont encipher file
{
encipherfile = 0;
finalpath = originalpath;
finalpath.erase(finalpath.length()-5, finalpath.length());
}
ifstream in(originalpath, ios::binary);
ofstream out(finalpath, ios::binary);
cout << "Password:" << endl;
cin.get(password,sizeof(password));
key = reinterpret_cast<long *>(password);
while(!in.eof())
{
ZeroMemory(data, sizeof(long)*2);
in.read(reinterpret_cast<char*>(&data), sizeof(long)*2); // Read 64bits from file
if(encipherfile == 1)
{
encipher(data, key);
out.write(reinterpret_cast<char*>(&data), sizeof(data));
continue;
}
if(encipherfile == 0)
{
decipher(data, key);
out.write(reinterpret_cast<char*>(&data), sizeof(data));
}
}
Check for eof immediately after your read, and if you get eof break out of the loop.
If you may have partial reads (i.e. it is possible to read fewer than all of the requested bytes), then you need also to call gcount to find out how many bytes you actually read, thus:
cin.read( ... );
if( cin.eof() )
{
    streamsize bytesRead = cin.gcount();
    if( bytesRead > 0 )
    {
        // process those bytes
    }
    break; // always leave the loop once eof has been reached
}