Per the XSD specification, the supported binary types are hexBinary and base64Binary: http://www.w3schools.com/schema/schema_dtypes_misc.asp
My intention is to read raw byte contents from memory and serialize them to an XML file. Which of the data types above describes raw byte contents, or do I have to convert the raw bytes to hexadecimal (or Base64) to adhere to one of the two data types described above?
You do have to convert the raw binary to a hexadecimal (or Base64) representation. E.g., if the value of a byte is 255 (decimal), its hex representation (as a string) would be "ff".
The conventional type for storing the raw input is unsigned char, which gives you the range 0-255 byte by byte; but for each byte of that array you need two characters (in a char array or std::string) to store the representation, and that is what you put in the XML.
Your framework probably has a method for converting raw bytes to Base64 or hex. If not, here's one method for hex:
#include <iostream>
#include <string>
#include <sstream>
using namespace std;

int main() {
    ostringstream os;
    os << hex;                       // format integers as hexadecimal
    unsigned char data[] = { 0, 123, 11, 255, 66, 99 };
    for (size_t i = 0; i < sizeof(data); i++) {
        if (data[i] < 16) os << '0'; // pad single-digit values to two chars
        os << (int)data[i] << '|';   // '|' is just a visual separator
    }
    string formatted(os.str());
    cout << formatted << endl;
    return 0;
}
Outputs: 00|7b|0b|ff|42|63|
You need to encode the raw data to one of the two data types. This is to keep some random data from messing up the XML format, for example if you had a < embedded in the data somewhere.
You can choose whichever of the two is more convenient for you. The hexadecimal type is easier to write code for but produces a larger file: the ratio of bytes out to bytes in is 2:1, whereas it is 4:3 for the Base64 encoding. You shouldn't need to write your own code, though; Base64 conversion functions are readily available. Here's a question that has some code in the answers: How do I base64 encode (decode) in C?
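If no library routine is at hand, here is a minimal sketch of the standard Base64 mapping (a rough illustration, not a hardened implementation; it does no line wrapping):

#include <cstddef>
#include <string>

std::string to_base64(const unsigned char* data, std::size_t len) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Each group of 3 input bytes becomes 4 output characters (4:3 ratio).
    for (; i + 3 <= len; i += 3) {
        unsigned int n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    // Handle a 1- or 2-byte tail with '=' padding.
    if (i < len) {
        unsigned int n = data[i] << 16;
        if (i + 1 < len) n |= data[i + 1] << 8;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += (i + 1 < len) ? table[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}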
As an example of how the codings differ, here's the phrase "The quick brown fox jumps over the lazy dog." encoded both ways.
Hex:
54686520717569636b2062726f776e20666f78206a756d7073206f76657220746865206c617a7920646f672e
Base64:
VGhlIHF1aWNrIGJyb3duIGZveCBqdW1wcyBvdmVyIHRoZSBsYXp5IGRvZy4=
Related
I have a question about char16_t character handling and SHA-256 generation with OpenSSL.
The thing is, I'm currently writing code that should deal with password hashing. I've generated a 256-bit hash, and I want to put it into a UTF-16-encoded character field in the database. In my C++ code, I use char16_t to store such data. However, there is a problem: char16_t can be more than 16 bits wide, depending on the machine the code ends up on. And if I use memcpy() to copy bytes from my SHA-256 hash, the result may be a mess on some machines.
What should I do in this situation? Read the bytes differently, store the hashes in the database differently, maybe something else?
SHA-256 generates 256 essentially random bits (32 bytes) of data. It will not always be valid UTF-16 data.
You need to somehow encode the 32 bytes into more than 32 bytes of valid UTF-16 to store in your database, or you can convert the database field to a proper 256-bit binary type.
One of the easier-to-implement ways to store it in your DB as a string is to map each byte to a character 1-to-1 (storing the 32 bytes of data with 32 zero bytes interleaved):
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>  // for std::size (C++17)

// Encoding: widen each hash byte to one char16_t code unit.
unsigned char sha256_hash[256 / 8];
get_hash(sha256_hash);
char16_t db_data[256 / 8];
for (std::size_t i = 0; i < std::size(db_data); ++i) {
    db_data[i] = char16_t(sha256_hash[i]);
}
write_to_db(db_data);

// Decoding: narrow each code unit back to a byte.
char16_t db_data[256 / 8];
read_from_db(db_data);
unsigned char sha256_hash[256 / 8];
for (std::size_t i = 0; i < std::size(sha256_hash); ++i) {
    assert((std::uint16_t)db_data[i] <= 0xFF);  // every unit must fit in a byte
    sha256_hash[i] = (unsigned char)db_data[i];
}
Be careful if you are using null-terminated strings, though: you will need an extra character for the null terminator, and you will have to map the 0 byte to something else (0x100 would be a good choice).
But if you have additional requirements (such as the result being readable characters), you might consider Base64 or a hexadecimal encoding instead.
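For the hexadecimal option, a minimal sketch (the 32-byte digest length is taken from the question; the function name is mine):

#include <cstddef>
#include <string>

// Hex-encode a 32-byte SHA-256 digest into a 64-character string,
// which is valid in any text encoding the database uses.
std::string hash_to_hex(const unsigned char* hash, std::size_t len = 32) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);              // two hex characters per byte
    for (std::size_t i = 0; i < len; ++i) {
        out += digits[hash[i] >> 4];   // most significant nibble
        out += digits[hash[i] & 0x0f]; // least significant nibble
    }
    return out;
}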
What is the best way to convert a char array (containing bytes from a file) into a decimal representation, so that it can be converted back later?
E.g. "test" -> 18951210 -> "test".
EDITED
It can't be done without a bignum class, since there are more letter combinations possible than integer combinations in an unsigned long long (an unsigned long long holds only about 8 characters' worth of bytes).
If you have some sort of bignum class:

#include <algorithm>
#include <climits>
#include <string>

biguint string_to_biguint(const std::string& s) {
    biguint result(0);
    for (std::size_t i = 0; i < s.length(); ++i) {
        result *= UCHAR_MAX + 1;       // shift left by one base-256 digit
        result += (unsigned char)s[i];
    }
    return result;
}

std::string biguint_to_string(biguint u) {
    std::string result;
    do {
        result.push_back(char(u % (UCHAR_MAX + 1)));
        u /= UCHAR_MAX + 1;
    } while (u > 0);
    // The digits come out least significant first, so restore the order.
    std::reverse(result.begin(), result.end());
    return result;
}

Note: the round trip loses leading zero bytes, since they contribute nothing to the value.
I'm not sure what exactly you mean, but characters are already stored in memory as their numeric representation, so you don't need to convert anything. If you still want to, you have to be more specific.
EDIT: You can:
1. Read byte by byte, shifting the result 8 bits left and OR-ing it with the next byte (see the sketch below).
2. Use mpz_inp_raw.
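A sketch of the first approach (only usable for inputs of at most 8 bytes, since the result must fit in 64 bits; the function name is mine):

#include <cstddef>
#include <cstdint>

// Read bytes most-significant first: shift the accumulator left 8 bits
// and OR in the next byte.
std::uint64_t bytes_to_u64(const unsigned char* data, std::size_t len) {
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < len; ++i) {
        result = (result << 8) | data[i];
    }
    return result;
}

For anything longer you are back to a bignum, which is what mpz_inp_raw gives you via GMP.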
You can use a tree similar to the one in the Huffman compression algorithm, and then represent the path through the tree as numbers.
You'll have to keep the dictionary somewhere, but you can just create a constant dictionary that covers the whole ASCII table, since compression is not the goal here.
There is no conversion needed. You can just use pointers.
Example:
char array[4 * NUMBER];
int *pointer = (int *) array;
Keep in mind that the "length" of pointer is NUMBER.
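Note that reading ints through a pointer cast like this technically violates strict aliasing and can trap on platforms with alignment requirements. A sketch of a safer variant using memcpy (NUMBER is a placeholder constant here):

#include <cstring>

const int NUMBER = 16;  // placeholder; use your own count

// Copying via memcpy sidesteps the aliasing and alignment problems of
// casting char* to int* directly (this assumes sizeof(int) == 4).
void bytes_to_ints(const char array[4 * NUMBER], int out[NUMBER]) {
    std::memcpy(out, array, sizeof(int) * NUMBER);
}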
As mentioned, character strings are already sequences of bytes (and hence easily rendered as decimal numbers) to start with. Number your bytes from 000 to 255 and string them together and you've got a decimal number, for whatever that is worth. It would help if you explained exactly why you want decimal specifically, as hex would be easier.
If you care about compression of the underlying arrays forming these numbers for Unicode Strings, you might be interested in:
http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode
If you want some benefits of compression but still want fast random-access reads and writes within a "packed" number, you might find my "NSTATE" library to be interesting:
http://hostilefork.com/nstate/
For instance, if you just wanted a representation that only accommodated the 26 English letters, you could store "test" in:
NstateArray<26> myString (4);
You could read and write the letters without going through a compression or decompression process, in a smaller range of numbers than a conventional string. Works with any radix.
Assuming you want to store the integers (I'm reading them as ASCII codes) in a string: this will add the leading zeros you will need to get back the original string. A character is a byte with a max value of 255, so it needs three digits in numeric form. It can be done fairly easily without the STL too, but why not use the tools you have?
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

char array[] = "test";

int main()
{
    stringstream out;
    string s = array;
    out.fill('0');
    for (size_t i = 0; i < s.size(); ++i)
    {
        // width() applies only to the next insertion, so set it each time
        out << setw(3) << (int)(unsigned char)s[i];
    }
    cout << s << " -> " << out.str();
    return 0;
}
output:
test -> 116101115116
Added:
change the output line to
out << (int)(unsigned char)s[i] << ",";
and the output becomes
test -> 116,101,115,116,
I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word "RIFF" in big-endian ASCII coding:
0x52 0x49 0x46 0x46
I read this first element using:
char *fileID = new char[5];
filestream.read(fileID, 4);
fileID[4] = '\0'; // terminate so it can safely be printed as a C string
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input) {
    union flip { int flip_int; char flip_char[4]; } f;
    f.flip_char[0] = input[3];
    f.flip_char[1] = input[2];
    f.flip_char[2] = input[1];
    f.flip_char[3] = input[0];
    return f.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input) {
    union flip { int flip_int; char flip_char[4]; } f;
    f.flip_char[0] = input[0];
    f.flip_char[1] = input[1];
    f.flip_char[2] = input[2];
    f.flip_char[3] = input[3];
    return f.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me?! If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of...
You are simply on a little-endian machine, and the "RIFF" string is just a string, and thus neither little- nor big-endian, but simply a sequence of chars. You don't need to reverse the bytes on a little-endian machine, but you do need to on a big-endian one.
You need to figure out the endianness of your machine; #include <sys/param.h> will help you do that on many systems.
You could also use the fact that network byte order is big-endian (if my memory serves me correctly - you need to check), in which case convert to big-endian and use the ntohl function (ntohs is its 16-bit counterpart). That should work on any machine that you compile the code on.
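Alternatively, you can sidestep the host's endianness entirely by assembling the value byte by byte. A minimal sketch (the function name is mine):

#include <cstdint>

// Assemble a 32-bit value from 4 little-endian bytes. Each byte's weight
// is explicit, so this works on both big- and little-endian hosts.
std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}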
I have a .dat (binary) file and I wish to convert it into an ASCII (.txt) file using C++, but I am very new to C++ programming. I have opened my two files, myBinaryFile and myTxtFile, but I don't know how to read data from the .dat file and then write that data into the new .txt file. So I want to write C++ code that takes a binary .dat file as input and converts it to ASCII text in an output file. If this is possible, please help me write this code. Thanks.
Sorry for asking the same question again, but I still haven't solved my problem. I will explain it more clearly as follows: I have a text file called "A.txt" and I want to convert it into a binary file (B.dat) and vice versa. Two questions:
1. How to convert "A.txt" into "B.dat" in C++.
2. How to convert "B.dat" into "C.txt" in C++ (I need to convert the result of the first step back into a new ASCII file).
My text file looks like this (no header):
1st line: 1234.123 543.213 67543.210 1234.67 12.000
2nd line: 4234.423 843.200 60543.232 5634.60 72.012
It has more than 1000 lines in a similar style (5 columns per line).
Since I don't have experience in C++, I am struggling here and need your help. Many thanks.
All files are just streams of bytes. You can open files in binary mode or text mode; the latter simply means that the runtime may apply extra newline handling.
If you want your text file to contain only safe, human-readable characters, you could do something like Base64-encode your binary data before saving it in the text file.
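For the specific 5-columns-of-numbers layout in the question, here is a minimal sketch (file names taken from the question; note that the .dat file it writes is only portable between machines with the same double representation and endianness):

#include <fstream>

int main() {
    double v;

    // 1. Text to binary: parse each number from A.txt and write its raw bytes.
    std::ifstream txt_in("A.txt");
    std::ofstream bin_out("B.dat", std::ios::binary);
    while (txt_in >> v) {
        bin_out.write(reinterpret_cast<const char*>(&v), sizeof v);
    }
    bin_out.close();

    // 2. Binary to text: read the raw doubles back and print 5 per line.
    std::ifstream bin_in("B.dat", std::ios::binary);
    std::ofstream txt_out("C.txt");
    int column = 0;
    while (bin_in.read(reinterpret_cast<char*>(&v), sizeof v)) {
        txt_out << v << (++column % 5 == 0 ? '\n' : ' ');
    }
    return 0;
}

The default stream precision will round the values when converting back to text; raise txt_out.precision() if you need all the digits preserved.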
Very easy:
1. Create the target or destination file (a.k.a. open it).
2. Open the source file in binary mode, which prevents the OS from translating the content.
3. Read an octet (byte) from the source file; unsigned char is a good variable type for this.
4. Write the octet to the destination using your favorite conversion: hex, decimal, etc.
5. Repeat at step 3 until the read fails.
6. Close all files.
Research these keywords: ifstream, ofstream, hex modifier, dec modifier, istream::read, ostream::write.
There are utilities and applications that already perform this operation. On the *nix and Cygwin side, try od (octal dump) and pipe its output to a file.
There is the debug utility on MS-DOS systems.
A popular format is:
AAAAAA bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb cccccccccccccccc
where:
AAAAAA -- Offset from beginning of file in hexadecimal or decimal.
bb -- Hex value of byte using ASCII text.
c -- Character representation of byte, '.' if the value is not printable.
Please edit your post to provide more details, including an example layout for the target file.
Edit:
A complex example (not tested):
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;

const unsigned int READ_BUFFER_SIZE = 1024 * 1024;
const unsigned int WRITE_BUFFER_SIZE = 2 * READ_BUFFER_SIZE;
unsigned char read_buffer[READ_BUFFER_SIZE];
unsigned char write_buffer[WRITE_BUFFER_SIZE];

int main(void)
{
    int program_status = EXIT_FAILURE;
    static const char hex_chars[] = "0123456789ABCDEF";
    do
    {
        ifstream srce_file("binary.dat", ios::binary);
        if (!srce_file)
        {
            cerr << "Error opening input file." << endl;
            break;
        }
        ofstream dest_file("binary.txt");
        if (!dest_file)
        {
            cerr << "Error creating output file." << endl;
            break;
        }
        // Keep reading blocks until none are left. A short final block
        // sets the stream's failbit, so check gcount() rather than the
        // read() result to decide whether any data was obtained.
        while (srce_file)
        {
            srce_file.read(reinterpret_cast<char*>(&read_buffer[0]),
                           READ_BUFFER_SIZE);
            // Get the number of bytes actually read.
            const streamsize bytes_read = srce_file.gcount();
            if (bytes_read == 0)
            {
                break;
            }
            // For each byte that was read:
            for (streamsize i = 0; i < bytes_read; ++i)
            {
                // Get the source, binary value.
                const unsigned char byte = read_buffer[i];
                // Convert the Most Significant nibble to an ASCII
                // character using a lookup table and write it into
                // the output buffer.
                write_buffer[i * 2 + 0] = hex_chars[byte >> 4];
                // Convert the Least Significant nibble likewise.
                write_buffer[i * 2 + 1] = hex_chars[byte & 0x0f];
            }
            // Write the output buffer to the output (text) file.
            dest_file.write(reinterpret_cast<char*>(&write_buffer[0]),
                            2 * bytes_read);
        }
        // Flush the contents of the stream buffer as a precaution.
        dest_file.flush();
        dest_file.close();
        srce_file.close();
        program_status = EXIT_SUCCESS;
    } while (false);
    return program_status;
}
The above program reads the binary file in 1 MB chunks, converts each chunk to ASCII hex in the output buffer, then writes it to the text file.
I think you are misunderstanding: the difference between a binary file and a text file is in the interpretation of the contents.
I am reading strings of data from an Oracle database, which may or may not contain Unicode (UTF-8) characters, into a C++ program. Is there any way to check whether the string extracted from the database contains Unicode (UTF-8) characters? If any Unicode characters are present, they should be converted into hexadecimal format and displayed.
There are two aspects to this question.
Distinguish UTF-8-encoded characters from ordinary ASCII characters.
UTF-8 encodes any code point higher than 127 as a series of two or more bytes. Values at 127 and lower remain untouched. The resultant bytes from the encoding are also higher than 127, so it is sufficient to check a byte's high bit to see whether it qualifies.
Display the encoded characters in hexadecimal.
C++ has std::hex to tell streams to format numeric values in hexadecimal, and you can use std::showbase to make the output look pretty. A char isn't treated as numeric, though; streams will just print the character. You'll have to force the value to another numeric type, such as int. Beware of sign extension when you do.
Here's some code to demonstrate:
#include <iostream>

void print_characters(char const* s)
{
    std::cout << std::showbase << std::hex;
    for (char const* pc = s; *pc; ++pc) {
        if (*pc & 0x80)
            std::cout << (*pc & 0xff);  // mask off sign-extended bits
        else
            std::cout << *pc;
        std::cout << ' ';
    }
    std::cout << std::endl;
}
You could call it like this:
int main()
{
    char const* test = "ab\xef\xbb\xbfhu";
    print_characters(test);
    return 0;
}
Output on Solaris 10 with Sun C++ 5.8:
$ ./a.out
a b 0xef 0xbb 0xbf h u
The code detects UTF-8-encoded characters, but it makes no effort to decode them; you didn't mention needing to do that.
I used *pc & 0xff to convert the expression to an integral type and to mask out the sign-extended bits. Without that, the output on my computer was 0xffffffbb, for instance.
I would convert the string to UTF-32 (you can use something like UTF CPP for that - it is very easy), then loop through the resulting string, detect the code points (characters) above 0x7F, and print those as hex.
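For illustration, a sketch of that approach, assuming the UTF CPP library's utf8::utf8to32 is available (check the API of the version you have) and that the input is valid UTF-8:

#include <cstdint>
#include <iostream>
#include <iterator>
#include <string>
#include "utf8.h"  // UTF CPP; the header name may vary by version

// Decode UTF-8 to UTF-32, then print any code point above 0x7F in hex.
void print_code_points(const std::string& s) {
    std::u32string cps;
    utf8::utf8to32(s.begin(), s.end(), std::back_inserter(cps));
    std::cout << std::showbase << std::hex;
    for (char32_t cp : cps) {
        if (cp > 0x7F)
            std::cout << static_cast<std::uint32_t>(cp) << ' ';
        else
            std::cout << static_cast<char>(cp) << ' ';
    }
    std::cout << std::endl;
}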