Writing chars as a byte in C++ - c++

I'm writing a Huffman encoding program in C++, and am using this website as a reference:
http://algs4.cs.princeton.edu/55compression/Huffman.java.html
I'm now at the writeTrie method, and here is my version:
// write bitstring-encoded tree to standard output
void writeTree(struct node *tempnode){
    if(isLeaf(*tempnode)){
        tempfile << "1";
        fprintf(stderr, "writing 1 to file\n");
        tempfile << tempnode->ch;
        //tempfile.write(&tempnode->ch,1);
        return;
    }
    else{
        tempfile << "0";
        fprintf(stderr, "writing 0 to file\n");
        writeTree(tempnode->left);
        writeTree(tempnode->right);
    }
}
Look at the commented-out line. Let's say I'm writing to a text file, but I want to write the bytes that make up the char at tempnode->ch (which is an unsigned char, btw). Any suggestions for how to go about doing this? The commented-out line gives an invalid conversion error from unsigned char* to const char*.
Thanks in advance!
EDIT: To clarify: I'd like my final text file to be in binary -- 1's and 0's only. If you look at the header of the link I provided, they give an example of "ABRACADABRA!" and the resulting compression. I'd like to take the char (such as 'A' in the example above), use its unsigned integer value (A = 65), and write 65 in binary, as a byte.

A char is identical to a byte. The preceding line tempfile << tempnode->ch; already does exactly what you seem to want.
There is no overload of write for unsigned char, but if you want, you can do
tempfile.write(reinterpret_cast< char * >( &tempnode->ch ),1);
This is rather ugly, but it does exactly the same thing as tempfile << tempnode->ch.
EDIT: Oh, you want to write a sequence of 1 and 0 characters for the bits in the byte. C++ has an obscure trick for that:
#include <bitset>
tempfile << std::bitset< 8 >( tempnode->ch );
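As a rough sketch of the std::bitset approach, the helpers below write one byte as eight '0'/'1' characters, most significant bit first. The names writeBits and bitsOf are made up for illustration and are not part of the original program; any std::ostream (such as tempfile) can be passed in.

```cpp
#include <bitset>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write the 8 bits of `byte` to `out` as '0'/'1'
// characters, most significant bit first.
void writeBits(std::ostream& out, unsigned char byte) {
    out << std::bitset<8>(byte);
}

// Hypothetical convenience wrapper that returns the same text as a string.
std::string bitsOf(unsigned char byte) {
    std::ostringstream oss;
    writeBits(oss, byte);
    return oss.str();
}
```

For example, bitsOf('A') gives "01000001" on an ASCII system, since 'A' is 65.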

how to copy binary data of a file

Basically, I am trying to read the binary data of a file using fread() and print it on screen using printf(). The problem is that it doesn't print binary 1s and 0s; it prints symbols and other characters that I can't make sense of.
This is how I am doing it:
#include <stdio.h>
#include <windows.h>
int main(){
    size_t sizeForB, sizeForT;
    char ForBinary[BUFSIZ], ForText[BUFSIZ];
    char RFB [] = "C:\\users\\(Unknown)\\Desktop\\hi.mp4" ; // Step 1
    FILE *ReadBFrom = fopen(RFB , "rb" );
    if(ReadBFrom == NULL){
        printf("Following File were Not found: %s", RFB);
        return -1;
    } else {
        printf("Following File were found: %s\n", RFB); // Step 2
        while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){ // Step 1
            printf("%s", ForBinary);
        }
        fclose(ReadBFrom);
    }
    return 0;
}
I would really appreciate if someone could help me out to read the actual binary data of a file as binary (0,1).
while(sizeForB = fread(ForBinary, 1, BUFSIZ, ReadBFrom)){
printf("%s", ForBinary); }
This is wrong on many levels. First of all you said it is binary file - which means there might not be text in it in the first place, and you are using %s format specifier which is used to print null terminated strings. Again since this is binary file, and there might not be text in it in the first place, %s is the wrong format specifier to use. And even if there was text inside this file, you are not sure that fread would read a "complete" null terminated string that you could pass to printf with format specifier %s.
What you may want to do is read each byte from the file, convert it to a binary representation (search for how to convert an integer to a binary string), and print that representation for each byte.
Basically pseudocode:
foreach (byte b in FileContents)
{
string s = convertToBinary(b);
println(s);
}
How to view files in binary in the terminal?
Either
"hexdump -C yourfile.bin" perhaps, unless you want to edit it of course. Most linux distros have hexdump by default (but obviously not all).
or
xxd -b file
To simply read a file and print it in binary (ones and zeros), read it one char at a time, then print a '0' or '1' for each bit. You can print the most or least significant bit first; I suggest MSb first.
if (ReadBFrom) {
    int ch;
    while ((ch = fgetc(ReadBFrom)) != EOF) {
        // CHAR_BIT comes from <limits.h> and is typically 8
        unsigned mask = 1u << (CHAR_BIT - 1);
        while (mask) {
            putchar(mask & ch ? '1' : '0');
            mask >>= 1;
        }
    }
    fclose(ReadBFrom);
}

Writing/Reading strings in binary file-C++

I searched for a similar post but couldn't find something that could help me.
I'm trying to first write an integer containing the length of a string, and then write the string itself, to a binary file.
However, when I read the data back from the binary file, the integers have the value 0 and my strings contain junk.
For example, when I type 'asdfgh' for the username and 'qwerty100' for the password,
I get 0, 0 for both string lengths and then read junk from the file.
This is how i write data to the file.
std::fstream file;
file.open("filename",std::ios::out | std::ios::binary | std::ios::trunc );
Account x;
x.createAccount();
int usernameLength= x.getusername().size()+1; //+1 for null terminator
int passwordLength=x.getpassword().size()+1;
file.write(reinterpret_cast<const char *>(&usernameLength),sizeof(int));
file.write(x.getusername().c_str(),usernameLength);
file.write(reinterpret_cast<const char *>(&passwordLength),sizeof(int));
file.write(x.getpassword().c_str(),passwordLength);
file.close();
Right below in the same function i read the data
file.open("filename",std::ios::binary | std::ios::in );
char username[51];
char password[51];
char intBuffer[4];
file.read(intBuffer,sizeof(int));
file.read(username,atoi(intBuffer));
std::cout << atoi(intBuffer) << std::endl;
file.read(intBuffer,sizeof(int));
std::cout << atoi(intBuffer) << std::endl;
file.read(password,atoi(intBuffer));
std::cout << username << std::endl;
std::cout << password << std::endl;
file.close();
When reading the data back in you should do something like the following:
int result;
file.read(reinterpret_cast<char*>(&result), sizeof(int));
This will read the bytes straight into the memory of result with no implicit conversion to int. This will restore the exact binary pattern written to the file in the first place and thus your original int value.
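A minimal sketch of the whole round trip, assuming the file is written and read on the same machine (same int size and byte order). The helper names writeString and readString are hypothetical, and the sketch stores a fixed 32-bit length without the null terminator, which differs slightly from the code in the question.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: write a 32-bit length followed by the raw characters.
void writeString(std::ostream& out, const std::string& s) {
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(s.data(), static_cast<std::streamsize>(len));
}

// Hypothetical helper: read the length straight into an integer (no atoi),
// then read exactly that many characters. Returns false if the stream fails.
bool readString(std::istream& in, std::string& s) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) {
        return false;
    }
    std::string tmp(len, '\0');
    if (len > 0 && !in.read(&tmp[0], static_cast<std::streamsize>(len))) {
        return false;
    }
    s.swap(tmp);
    return true;
}
```

Both functions work with any binary stream, so the std::fstream from the question can be passed in directly. For files from an untrusted source, the length should be sanity-checked before allocating.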
file.write(reinterpret_cast<const char *>(&usernameLength),sizeof(int));
This writes sizeof(int) bytes starting at &usernameLength, i.e. the binary representation of the integer, which depends on the computer architecture (little endian vs. big endian).
atoi(intBuffer)
This converts ASCII text to an integer and expects the input to contain a character representation, e.g. intBuffer = { '1', '2' } would return 12.
You can try to read it in the same way you have written -
*(reinterpret_cast<int *>(&intBuffer))
But this can lead to unaligned memory access, and reading a char array through an int pointer also breaks the strict aliasing rules. A serialization format such as JSON would make the file easier to read across platforms.
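If you do want to keep a raw char buffer, one common way around the alignment problem is to copy the bytes into an int with memcpy instead of casting the pointer. A small sketch (intFromBytes is a hypothetical name, and it assumes the bytes were written by the same program on the same architecture):

```cpp
#include <cstring>

// Hypothetical helper: rebuild an int from sizeof(int) raw bytes.
// memcpy has no alignment requirement, unlike dereferencing a cast pointer.
int intFromBytes(const char* bytes) {
    int value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

With the question's code this would be used as intFromBytes(intBuffer) in place of atoi(intBuffer).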

C++ Character Encoding

This is my C++ Code where i'm trying to encode the received file path to utf-8.
#include <cstring>   // memset, memcpy
#include <string>
#include <iostream>
using namespace std;
void latin1_to_utf8(unsigned char *in, unsigned char *out);
string encodeToUTF8(string _strToEncode);
int main(int argc,char* argv[])
{
    // Code to receive fileName from Sockets
    cout << "recvd ::: " << recvdFName << "\n";
    string encStr = encodeToUTF8(recvdFName);
    cout << "encoded :::" << encStr << "\n";
}
void latin1_to_utf8(unsigned char *in, unsigned char *out)
{
    while (*in)
    {
        if (*in<128)
        {
            *out++=*in++;
        }
        else
        {
            *out++=0xc2+(*in>0xbf);
            *out++=(*in++&0x3f)+0x80;
        }
    }
    *out = '\0';
}
string encodeToUTF8(string _strToEncode)
{
    int len= _strToEncode.length();
    unsigned char* inpChar = new unsigned char[len+1];
    unsigned char* outChar = new unsigned char[2*(len+1)];
    memset(inpChar,'\0',len+1);
    memset(outChar,'\0',2*(len+1));
    memcpy(inpChar,_strToEncode.c_str(),len);
    latin1_to_utf8(inpChar,outChar);
    string _toRet = (const char*)(outChar);
    delete[] inpChar;
    delete[] outChar;
    return _toRet;
}
And the OutPut is
recvd ::: /Users/zeus/ÄÈÊÑ.txt
encoded ::: /Users/zeus/AÌEÌEÌNÌ.txt
The function latin1_to_utf8 above comes from an answer to "Convert ISO-8859-1 strings to UTF-8 in C/C++", and it appears to work (that answer is accepted). So I think I must be making a mistake somewhere, but I can't identify it. Can someone help me out, please?
I first posted this question on Code Review, but I'm not getting any answers there, so sorry for the duplication.
Are you working on a particular platform or framework, or building directly on top of the standard library? Many people need conversions like this, so there are libraries for it. I strongly recommend using one, because a library is tested and usually implements the best-known approach.
One library I found that does this is Boost.Locale, and Boost is close to a de facto standard.
If you use Qt, I recommend its conversion facilities for this (they are platform independent).
In case you want to do it yourself (to see how it works, or for any other reason):
1. Make sure that you allocate memory! This is very important in C and C++. In C++, use new to allocate and delete to release it; C++ won't work out when to release it for you, that is the developer's job.
2. Check that you allocate the right amount of memory. The UTF-8 output can be larger than the Latin-1 input, because each character above 127 takes two bytes.
3. As mentioned above, read from somewhere (terminal or file) but write the output to a new file. Then, when you open that file in a text editor, make sure the encoding is set to UTF-8 (the editor has to know how to interpret the data).
I hope that helps.
You are first outputting the original Latin-1 string to a terminal expecting a certain encoding, probably Latin-1. You then transcode to UTF-8 and output it to the same terminal, which interprets it differently. Classic mojibake. Try the following with the output instead:
for(size_t i=0, len=strlen(reinterpret_cast<const char*>(outChar)); i!=len; ++i)
    std::cout << static_cast<unsigned>(static_cast<unsigned char>(outChar[i])) << ' ';
The two casts first get the unsigned byte value and then widen it to unsigned, which keeps the stream from treating it as a char. Your char might already be unsigned, but that's compiler-dependent. strlen needs <cstring>, and the pointer cast is there because outChar is an unsigned char*.
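To check what the conversion really produces, independent of how the terminal renders it, something along these lines may help. It is only a sketch: latin1ToUtf8 and hexDump are hypothetical names, and the conversion assumes the input really is ISO-8859-1, which the output in the question suggests may not be the case.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte >= 0x80 becomes a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: render each byte as two hex digits plus a space,
// so the encoding can be inspected without relying on the terminal.
std::string hexDump(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}
```

A Latin-1 'Ä' (byte 0xC4) should come out as "C3 84 ". If dumping the received file name already shows multi-byte sequences, the input was not Latin-1 to begin with and should not be converted again.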

Need Convert Binary file to Txt file

I have a .dat (binary) file that I want to convert into an ASCII (.txt) file using C++, but I am very new to C++ programming. So far I have just opened my two files, myBinaryfile and myTxtFile, but I don't know how to read data from the .dat file or how to write that data into the new .txt file. I want to write C++ code that takes a binary .dat file as input and converts it to ASCII text in an output file. If this is possible, please help me write this code. Thanks.
Sorry for asking the same question again, but I still haven't solved my problem, so I will explain it more clearly. I have a text file called “A.txt” that I want to convert into a binary file (B.dat), and then do the reverse. Two questions:
1. how to convert “A.txt” into “B.dat” in c++
2. how to convert “B.dat” into “C.txt” in c++ (need convert result of the 1st output again into new ascii file)
my text file is like (no header):
1st line: 1234.123 543.213 67543.210 1234.67 12.000
2nd line: 4234.423 843.200 60543.232 5634.60 72.012
It has more than 1000 lines in the same style (5 columns per line).
Since I don't have experience in C++, I am struggling here and need your help. Many thanks.
All files are just a stream of bytes. You can open files in binary mode or in text mode. The latter simply means that there may be extra newline handling.
If you want your text file to contain only safe human readable characters you could do something like base64 encode your binary data before saving it in the text file.
Very easy:
1. Create the target or destination file (a.k.a. open).
2. Open the source file in binary mode, which prevents the OS from translating the content.
3. Read an octet (byte) from the source file; unsigned char is a good variable type for this.
4. Write the octet to the destination using your favorite conversion: hex, decimal, etc.
5. Repeat at 3 until the read fails.
6. Close all files.
Research these keywords: ifstream, ofstream, hex modifier, dec modifier, istream::read, ostream::write.
There are utilities and applications that already perform this operation. On the *nix and Cygwin side, try od (octal dump) and redirect its output to a file.
There is the debug utility on MS-DOS system.
A popular format is:
AAAAAA bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb cccccccccccccccc
where:
AAAAAA -- Offset from beginning of file in hexadecimal or decimal.
bb -- Hex value of byte using ASCII text.
c -- Character representation of byte, '.' if the value is not printable.
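As an illustration of that layout, here is a sketch of a function that formats one line of up to 16 bytes. The name formatDumpLine and the exact spacing are assumptions made for this example, not a standard.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format up to 16 bytes as
// "AAAAAA bb bb ... bb cccccccccccccccc" (hex offset, hex bytes, characters).
std::string formatDumpLine(std::size_t offset,
                           const unsigned char* data,
                           std::size_t count) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%06zX ", offset);
    std::string line = buf;
    std::string text;
    for (std::size_t i = 0; i < 16; ++i) {
        if (i < count) {
            std::snprintf(buf, sizeof buf, "%02X ",
                          static_cast<unsigned>(data[i]));
            line += buf;
            // Non-printable bytes are shown as '.'.
            text += std::isprint(data[i]) ? static_cast<char>(data[i]) : '.';
        } else {
            line += "   ";  // pad short lines so the text column lines up
        }
    }
    return line + text;
}
```

Calling it once per 16-byte block, with the offset advanced by 16 each time, gives output similar to what hexdump-style tools print.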
Please edit your post to provide more details, including an example layout for the target file.
Edit:
A complex example (not tested):
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
const unsigned int READ_BUFFER_SIZE = 1024 * 1024;
const unsigned int WRITE_BUFFER_SIZE = 2 * READ_BUFFER_SIZE;
// Stream read() and write() take char*, so the buffers are plain char.
char read_buffer[READ_BUFFER_SIZE];
char write_buffer[WRITE_BUFFER_SIZE];
int main(void)
{
    int program_status = EXIT_FAILURE;
    static const char hex_chars[] = "0123456789ABCDEF";
    do
    {
        ifstream srce_file("binary.dat", ios::binary);
        if (!srce_file)
        {
            cerr << "Error opening input file." << endl;
            break;
        }
        ofstream dest_file("binary.txt");
        if (!dest_file)
        {
            cerr << "Error creating output file." << endl;
            break;
        }
        // Keep going while a read delivers any bytes. A short final block
        // sets failbit, so test gcount() rather than the stream state.
        while (srce_file.read(read_buffer, READ_BUFFER_SIZE),
               srce_file.gcount() > 0)
        {
            // Get the number of bytes actually read.
            const unsigned int bytes_read =
                static_cast<unsigned int>(srce_file.gcount());
            // Define the index and byte variables outside
            // of the loop to maybe save some execution time.
            unsigned int i = 0;
            unsigned char byte = 0;
            // For each byte that was read:
            for (i = 0; i < bytes_read; ++i)
            {
                // Get source, binary value.
                byte = static_cast<unsigned char>(read_buffer[i]);
                // Convert the Most Significant nibble to an
                // ASCII character using a lookup table.
                // Write the character into the output buffer.
                write_buffer[i * 2 + 0] = hex_chars[byte >> 4];
                // Convert the Least Significant nibble to an
                // ASCII character and put into output buffer.
                write_buffer[i * 2 + 1] = hex_chars[byte & 0x0f];
            }
            // Write the output buffer to the output, text, file.
            dest_file.write(write_buffer, 2 * bytes_read);
            // Flush the contents of the stream buffer as a precaution.
            dest_file.flush();
        }
        dest_file.flush();
        dest_file.close();
        srce_file.close();
        program_status = EXIT_SUCCESS;
    } while (false);
    return program_status;
}
The above program reads 1MB chunks from the binary file, converts to ASCII hex into an output buffer, then writes the chunk to the text file.
I think you are misunderstanding something: the difference between a binary file and a text file lies in how the contents are interpreted.

C++ Text File Reading

So I need a little help, I've currently got a text file with following data in it:
myfile.txt
-----------
b801000000
What I want to do is read that b801 etc.. data as bits so I could get values for
0xb8 0x01 0x00 0x00 0x00.
Currently I'm reading that line into an unsigned string using the following typedef.
typedef std::basic_string <unsigned char> ustring;
ustring blah = reinterpret_cast<const unsigned char*>(buffer[1].c_str());
Where I keep falling down is trying to now get each char {'b', '8' etc...} to really be { '0xb8', '0x01' etc...}
Any help is appreciated.
Thanks.
I see two ways:
1. Open the file as std::ios::binary and use std::ifstream::operator>> to extract hexadecimal double bytes after using the flag std::ios_base::hex and extracting to a type that is two bytes large (like stdint.h's (C++0x/C99) uint16_t or equivalent). See @neuro's comment to your question for an example using std::stringstreams. std::ifstream would work nearly identically.
2. Access the stream iterators directly and perform the conversion manually. Harder and more error-prone, not necessarily faster either, but still quite possible.
strtol does string (needs a nullterminated C string) to int with a specified base
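A sketch of the strtol route, taking two hex digits at a time. The function name parseHexBytes is hypothetical, and throwing on bad input is just one possible way to report errors.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn "b801000000" into { 0xb8, 0x01, 0x00, 0x00, 0x00 }.
std::vector<unsigned char> parseHexBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd number of hex digits");
    }
    std::vector<unsigned char> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        // Reject anything strtol would otherwise tolerate (spaces, signs).
        if (!std::isxdigit(static_cast<unsigned char>(hex[i])) ||
            !std::isxdigit(static_cast<unsigned char>(hex[i + 1]))) {
            throw std::invalid_argument("non-hex character in input");
        }
        const std::string pair = hex.substr(i, 2);
        const long value = std::strtol(pair.c_str(), nullptr, 16);
        bytes.push_back(static_cast<unsigned char>(value));
    }
    return bytes;
}
```

The resulting vector holds the raw byte values, so bytes[0] is 0xb8 (184) for the sample line in the question.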
Kind of a dirty way to do it:
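#include <stdio.h>
int main ()
{
    unsigned int num;                  /* %x expects an unsigned int */
    const char* value = "b801000000";  /* assumes an even number of hex digits */
    /* Consume two hex digits per iteration; stop if none can be parsed. */
    while (*value && sscanf (value, "%2x", &num) == 1) {
        printf ("New number: %u\n", num);
        value += 2;
    }
    return 0;
}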
Running this, I get:
New number: 184
New number: 1
New number: 0
New number: 0
New number: 0