I'd like to hash a file in the same way that git hash-object does, so I can compare it to an existing hash, but using Qt and C++.
The answers to this question show how to get the same hash, but none of the examples use C++.
So far this is what we've tried:
QString fileName = entry.toObject().value( "name" ).toString();
QByteArray shaJson = entry.toObject().value( "sha" ).toString().toUtf8();
QByteArray shaFile;
QFile f( QString( "%1/%2" ).arg( QCoreApplication::applicationDirPath() ).arg( fileName ) );
if( f.open(QFile::ReadOnly ) )
{
QCryptographicHash hash(QCryptographicHash::Sha1);
hash.addData( QString( "blob " ).toUtf8() ); // start with the string "blob "
hash.addData( QString( "%1" ).arg( f.size() ).toUtf8() ); // add size in bytes of the content
hash.addData( QString( "\0" ).toUtf8() ); // null byte
hash.addData( f.readAll() ); // actual file content
shaFile = hash.result().toHex();
if( shaFile != shaJson ){
}
}
How to implement this hashing method with Qt?
Edit:
Here's an example hash output:
ccbf4f0a52fd5ac59e18448ebadf2ef37c62f54f
Computed with git hash-object from this file:
https://raw.githubusercontent.com/ilia3101/MLV-App/master/pixel_maps/80000301_1808x1007.fpm
So that's the hash we'd also like to compute with Qt.
The problem is that QString( "\0" ) is constructed from a C string, which ends at the first \0, so the resulting string is empty and the null byte never reaches the hash. QByteArray, on the other hand, always keeps a \0 after its data. From Qt's docs:
Using QByteArray is much more convenient than using const char *.
Behind the scenes, it always ensures that the data is followed by a
\0 terminator, and uses implicit sharing (copy-on-write) to reduce
memory usage and avoid needless copying of data.
https://doc.qt.io/qt-5/qbytearray.html
So, in your case the addData call that is meant to add the null byte adds nothing. One workaround is the following code, which hashes the header together with the \0 that QByteArray keeps after it:
QFile file(path);
if( file.open(QFile::ReadOnly ) )
{
QCryptographicHash hash(QCryptographicHash::Sha1);
QByteArray header = QString("blob %1").arg(file.size()).toUtf8();
hash.addData(header.data(), header.size() + 1);
hash.addData(file.readAll());
shaFile = hash.result().toHex();
qDebug() << shaFile;
}
QByteArray::data() returns a pointer to the data stored in the byte array. The pointer can be used to access and modify the bytes that compose the array. The data is '\0'-terminated, i.e. the number of bytes in the returned character string is size() + 1 including the '\0' terminator. Therefore, we do not need to add \0 explicitly; QByteArray does that for us. We only need to pass size() + 1, because size() does not count the terminating \0.
The code above generated ccbf4f0a52fd5ac59e18448ebadf2ef37c62f54f for your file, so I guess it is a correct hash.
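The header can also be built with the null byte appended explicitly, which avoids relying on QByteArray's hidden terminator. Below is a minimal sketch in plain C++; gitBlobHeader is a hypothetical helper name (not a Qt or git API), and the bytes it returns would still have to be fed to a SHA-1 implementation before the file content:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds the "blob <size>\0" header that precedes the
// file content when git hashes a blob.
std::string gitBlobHeader(std::size_t contentSize) {
    std::string header = "blob " + std::to_string(contentSize);
    header.push_back('\0'); // explicit null byte; std::string stores it like any other byte
    return header;
}
```

With the Qt 5 code above, such a header could presumably be passed as hash.addData(header.data(), int(header.size())), with no + 1 needed because the null byte is already part of the counted data.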
Related
Using C++ and Qt, I need to store some raw byte data (an unsigned char array) in a QDomElement (XML node), and then recover it later so that I can compare it to the raw data that is written directly to a different binary file. During testing, I noticed my solution works ~85% of the time, but comparing the recovered data and the raw data read from file seems to fail occasionally. The code snippets below illustrate the Qt methods I am currently using. I have very little knowledge of different character encodings and what I need to look out for in that regard, so I am assuming my mistake has something to do with that.
Storing the raw data in XML:
QDomElement myElement;
unsigned char rawData[ DATA_LEN ];
foo( rawData ); // upon return, rawData now contains the data I want to store in XML
QByteArray dataByteArray( reinterpret_cast< char * >( rawData ) );
QString dataStr( dataByteArray.toBase64() );
QByteArray excluded = " /():|+,.=[]_^{}";
myElement.setAttribute( "Data", QUrl::toPercentEncoding( dataStr, excluded ) );
Recovering the data from XML and comparing to raw data read from binary file (the memcmp() occasionally fails):
unsigned char recoveredData[ DATA_LEN ];
QString dataStr = QUrl::fromPercentEncoding( stringFromXmlNode.toUtf8() );
QByteArray dataByteArray = QByteArray::fromBase64( dataStr.toAscii() );
memcpy( recoveredData, reinterpret_cast< unsigned char * >( dataByteArray.data() ), DATA_LEN );
unsigned char dataFromFile[ DATA_LEN ];
fread( dataFromFile, 1, DATA_LEN, filePtr );
if( 0 != memcmp( dataFromFile, recoveredData, DATA_LEN ) )
{
return false;
}
I am restricted to Qt 4.8, so please refrain from any Qt5-specific solutions if possible, thanks!
You state the bytes are random, so they can contain 0 bytes. Byte value 0 is the string terminator in C-style strings, and this line in your code initializes the QByteArray from such a string, so copying stops at the first 0 byte:
QByteArray dataByteArray( reinterpret_cast< char * >( rawData ) );
The solution is to also pass the length of rawData, i.e. to use the QByteArray( const char *data, int size ) constructor.
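The difference between the two kinds of constructor can be reproduced without Qt. The following sketch uses std::string as a stand-in for QByteArray (an assumption made only so the example compiles on its own); fromCString and fromBuffer are hypothetical names:

```cpp
#include <cstddef>
#include <string>

// Copies bytes up to the first 0 byte, like constructing from a bare char pointer.
// Assumes the buffer contains at least one 0 byte.
std::string fromCString(const unsigned char *raw) {
    return std::string(reinterpret_cast<const char *>(raw));
}

// Copies exactly len bytes, embedded 0 bytes included.
std::string fromBuffer(const unsigned char *raw, std::size_t len) {
    return std::string(reinterpret_cast<const char *>(raw), len);
}
```

This would also explain why the comparison only fails some of the time: the data is cut short only when a 0 byte happens to occur inside it.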
You want to use an XML CDATA section.
Look at QDomCDATASection
https://doc.qt.io/archives/qt-4.8/qdomcdatasection.html
Why don't you use QDataStream, which can define the byte order (important if exchanging data across different platforms) and the versioning?
From the Qt 4.8 documentation page:
A data stream cooperates closely with a QIODevice. A QIODevice represents an input/output medium one can read data from and write data to. The QFile class is an example of an I/O device.
Example (write binary data to a stream):
QFile file("file.dat");
file.open(QIODevice::WriteOnly);
QDataStream out(&file); // we will serialize the data into the file
out << QString("the answer is"); // serialize a string
out << (qint32)42; // serialize an integer
Example (read binary data from a stream):
QFile file("file.dat");
file.open(QIODevice::ReadOnly);
QDataStream in(&file); // read the data serialized from the file
QString str;
qint32 a;
in >> str >> a; // extract "the answer is" and 42
Each item written to the stream is written in a predefined binary
format that varies depending on the item's type. Supported Qt types
include QBrush, QColor, QDateTime, QFont, QPixmap, QString, QVariant
and many others. For the complete list of all Qt types supporting data
streaming see Serializing Qt Data Types.
You can read/write your XML Data with QDataStream and import them in the QDomDocument structure with the QDomDocument function setContent() and toByteArray().
Starting from a QByteArray, I'd like to search for the "\n" char and join all the characters from the beginning up to the "\n" into a QString; after that, I'd move on to the following bytes up to the next "\n" and save these into a new QString.
QByteArray MyArray= (all data from my previous process);
quint16 ByteArrayCount = MyArray.count(); // number of bytes composing MyArray
quint16 mycounter;
QString myString;
while (mycounter < ByteArrayCount)
{
if(MyArray[mycounter] != "\n")
myString.append(MyArray[mycounter]);
mycounter++;
}
This is meant to append all bytes preceding a new line; my problem is how to evaluate MyArray[mycounter], since I'm not able to check every byte as the counter increases.
Solution?
You could save yourself the trouble and simply:
QString s(myArray);
QStringList resultStrings = s.split('\n');
This will give you a list of strings split for every new line character, which is what you sound like you want to do.
Also, not to belabor the point, but you don't initialize your counter, and you really should ;)
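For comparison, here is roughly what the loop in the question is trying to do, written as a sketch in plain C++ with std::string standing in for QByteArray and QString (an assumption so the example is self-contained). splitLines is a hypothetical name; like split, it keeps empty parts:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits data at every '\n' and returns the pieces, keeping empty ones.
std::vector<std::string> splitLines(const std::string &data) {
    std::vector<std::string> lines;
    std::size_t start = 0; // initialized, unlike the counter in the question
    while (true) {
        std::size_t pos = data.find('\n', start); // char literal, not "\n"
        if (pos == std::string::npos) {
            lines.push_back(data.substr(start)); // remainder after the last newline
            break;
        }
        lines.push_back(data.substr(start, pos - start));
        start = pos + 1;
    }
    return lines;
}
```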
Here is a simple example using the indexOf function:
QString str = "ooops\nhello mama\n daddy cool";
QByteArray bta;
bta.append(str);
for(int index = bta.indexOf('\n'); // int, so the comparison with -1 can succeed
index != -1;
index = bta.indexOf('\n', index+1)) {
/**
* Do something with index
**/
}
But it is not entirely clear from your question what you mean by "not able to check every byte". If you know the range of valid memory, you can work on the raw data with:
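const char * ptr = MyArray.constData();
const char * end = ptr + MyArray.size(); // one past the last byte
and use custom validators:
while(ptr != end){ // stop at the end of the array; ptr itself never becomes null
if(valid(ptr) && *ptr == '\n') { // compare the byte, not the pointer
/**
* do something ...
**/
}
ptr++;
}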
Oh, and also, in C/C++:
"\n" != '\n'
because "\n" is a constant C string (char[2]) containing \n and the terminating '\0',
and '\n' is just a single char.
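The size difference between the two literals can be checked at compile time. A small sketch in plain C++ (the constant and function names are made up for illustration):

```cpp
#include <cstddef>

// sizeof counts the hidden terminator of a string literal.
constexpr std::size_t stringLiteralBytes = sizeof("\n"); // '\n' plus '\0'
constexpr std::size_t charLiteralBytes = sizeof('\n');   // a single char

// Per-byte test: compare a char with a char literal.
bool isNewline(char c) {
    return c == '\n';
}
```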
In my app I read a string field from a file in local (not Unicode) charset.
The field is 10 bytes long; if the string is shorter than 10 bytes, the remainder is filled with zeros.
char str ="STRING\0\0\0\0"; // that was read from file
QByteArray fieldArr(str,10); // fieldArr now is STRING\000\000\000\000
fieldArr = fieldArr.trimmed(); // for some reason the array still contains zeros
QTextCodec *textCodec = QTextCodec::codecForLocale();
QString field = textCodec->toUnicode(fieldArr).trimmed(); // also does not remove the zeros
So my question - how can I remove trailing zeros from a string?
P.S. I see the zeros in the "Locals and Expressions" window while debugging.
I'm going to assume that str is supposed to be char const * instead of char.
Just don't go through QByteArray: QTextCodec can handle a C string, and a C string ends at the first null byte:
QString field = textCodec->toUnicode(str).trimmed();
Addendum: Since the string might not be zero-terminated, adding storage for a null byte to the end seems to be impossible, and making a copy to prepare for making a copy seems wasteful, I suggest calculating the length ourselves and using the toUnicode overload that accepts a char pointer and a length.
std::find is good for this, since it returns the ending iterator of the given range if an element is not found in it. This makes special-case handling unnecessary:
QString field = textCodec->toUnicode(str, std::find(str, str + 10, '\0') - str).trimmed();
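The length calculation can be isolated and checked on its own. A minimal sketch in plain C++; fieldLength is a hypothetical helper name, and the width of 10 in the usage below is the one from the question:

```cpp
#include <algorithm>
#include <cstddef>

// Length of a fixed-width, zero-padded field that may lack a terminator.
// std::find returns field + width when no '\0' is present, so a completely
// filled field yields width without any special case.
std::size_t fieldLength(const char *field, std::size_t width) {
    return static_cast<std::size_t>(std::find(field, field + width, '\0') - field);
}
```

For the field in the question this would be called as fieldLength(str, 10), and the result passed as the length argument of toUnicode.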
Does this work for you?
#include <QDebug>
#include <QByteArray>
int main()
{
char str[] = "STRING\0\0\0\0";
auto ba = QByteArray::fromRawData(str, 10);
qDebug() << ba.trimmed(); // does not work
qDebug() << ba.simplified(); // does not work
auto index = ba.indexOf('\0');
if (index != -1)
ba.truncate(index);
qDebug() << ba;
return 0;
}
Using fromRawData() saves an extra copy. Make sure that the str
stays around until you delete the ba.
indexOf() is safe even if you have filled the whole str since
QByteArray knows you only have 10 bytes you can safely access. It
won't touch 11th or later. No buffer overrun.
Once you removed extra \0, it's trivial to convert to a QString.
You can truncate the string after the first \0:
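const char * str = "STRING\0\0\0\0"; // Assuming that was read from file
QString field = QString::fromLocal8Bit(str, 10); // explicit length keeps the zeros: field == "STRING\0\0\0\0"
int index = field.indexOf(QChar::Null);
if (index != -1) // truncate(-1) would empty the string
field.truncate(index); // field == "STRING" (without '\0' at the end)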
I would do it like this:
const char* str = "STRING\0\0\0\0";
QByteArray fieldArr;
for(quint32 i = 0; i < 10; i++)
{
if(str[i] != '\0')
{
fieldArr.append(str[i]);
}
}
QString can be constructed from a char array pointer using fromLocal8Bit. The codec is chosen the same way you do manually in your code.
You need to set the length manually to 10, since you say you have no guarantee that a terminating null byte is present.
Then you can use remove() to get rid of all null bytes. Caution: STRI\0\0\0\0NG will also result in STRING but you said that this does not happen.
const char *str = "STRING\0\0\0\0"; // that was read from file
QString field = QString::fromLocal8Bit(str, 10);
field.remove(QChar::Null);
I'm implementing a pluggable MIME filter for IE (this question concerns IInternetProtocol::Read(void*, ULONG, ULONG*)) and I'm intercepting incoming HTML with a view to modify the HTML.
The HTML is generally UTF-8 encoded, except that it contains some \0 (null) characters, and it sits inside a char buffer. I want to load it into a std::string instance so I can perform string operations such as std::string::find, as well as insert content by copying substrings into a destination buffer around my injected string, something like this:
string received( this->buffer );
size_t index = received.find("<p id=\"foo\">");
if( index != string::npos ) {
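memcpy( destination , received.data() , index ); // bytes before the match
memcpy( destination + index , "Injected content" , 16 ); // 16 characters, without the literal's terminator
memcpy( destination + index + 16, received.data() + index, received.size() - index ); // the match and everything after it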
} else {
memcpy( destination , this->buffer , this->bufferSize );
}
The problem is that the buffer might contain null bytes (it's a quirk of the website I'm working with). To what extent would \0 character values interact with string operations such as find? Neither the documentation on MSDN nor CPlusPlus.com says.
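For what it is worth, std::string tracks its own length, so embedded \0 bytes are ordinary characters to find and insert as long as the string is constructed with an explicit size. The sketch below illustrates this; injectBefore is a hypothetical helper, not part of the MIME filter API, and it returns a new string rather than writing into the destination buffer:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of the buffer with `injected` placed
// in front of the first occurrence of `marker`, or an unchanged copy if the
// marker is absent.
std::string injectBefore(const char *buffer, std::size_t bufferSize,
                         const std::string &marker, const std::string &injected) {
    std::string received(buffer, bufferSize);  // counted constructor keeps \0 bytes
    std::size_t index = received.find(marker); // searches the full length, past any \0
    if (index != std::string::npos)
        received.insert(index, injected);
    return received;
}
```

The one-argument constructor string received( this->buffer ) is the part that would stop at the first \0.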
I'm pretty close to losing my head here ;)
I'm developing a service that uses gsoap. I would like to return a mime response.
I have everything working, but when reading binary files, all kinds of files (JPEG, PDF, etc.) contain the \0 char many times in the data (opened with Notepad, they show a lot of NUL).
So any code for reading a raw file fails miserably once it finds the end-of-file char. I have tried to replace the \0, but then the file no longer displays correctly.
I have also tried several methods including the example that comes with gsoap.
So, to summarize:
fstream generic code doesn't work.
for (i = 0; i < MAX_FILE_SIZE; i++)
{ if ((c = fgetc(fd)) == EOF)
break;
image.__ptr[i] = c;
}
doesn't work either.
QFile::readAll works, but when converting the QString to char* the array is cut at the first NUL.
So, which is the best approach to read an entire binary file? It's crazy how hard C++ can sometimes be at the basics.
Thanks in advance.
I have tried this as retnick suggested below
UrlToPdf urlToPdf;
urlToPdf.getUrl(&input, &result);
QByteArray raw = urlToPdf.getPdf(QString(result.data.c_str()));
int size = raw.toBase64().size();
char* arraydata = new char[size];
strcpy(arraydata, raw.toBase64().data());
soap_set_mime(this, "MIME_boundary", NULL);
if(soap_set_mime_attachment(this, arraydata, size, SOAP_MIME_BASE64, "application/pdf", NULL, NULL, NULL))
{
soap_clr_mime(this);
soapMessage = this->error;
}
but no luck... the mime response is bigger than the actual file...
David G Ortega
To read binary files, use fread().
Once you read it treat it as an array of bytes not as a string. No string functions allowed.
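As a rough illustration of that advice, the sketch below reads a whole file with fread into a std::vector<char>, which carries its own length. readBinaryFile is a hypothetical name, and the 4096-byte chunk size is an arbitrary choice:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Reads the whole file as raw bytes; returns an empty vector if it cannot be opened.
std::vector<char> readBinaryFile(const std::string &path) {
    std::vector<char> bytes;
    std::FILE *fd = std::fopen(path.c_str(), "rb"); // "b": no newline translation
    if (!fd)
        return bytes;
    char chunk[4096];
    std::size_t n;
    // fread reports how many bytes it read, so \0 bytes need no special handling
    while ((n = std::fread(chunk, 1, sizeof chunk, fd)) > 0)
        bytes.insert(bytes.end(), chunk, chunk + n);
    std::fclose(fd);
    return bytes;
}
```

The buffer would then be handed on as bytes.data() together with bytes.size(), never measured with strlen.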
EDIT: The gSOAP documentation section 14.1 explains how to send MIME attachments. I only refer to the relevant function (please read it all).
int soap_set_mime_attachment(struct soap *soap, char *buf_ptr, size_t buf_size,
enum soap_mime_encoding encoding,
const char *type, const char *id,
const char *location, const char *description);
char *buf_ptr is your buffer.
size_t buf_size is the length of your buffer.
So just do your QFile::readAll.
This gives you back a QByteArray. The QByteArray has the method
QByteArray QByteArray::toBase64 () const
which you call on the raw data like this:
QByteArray base64image = rawImage.toBase64();
so now just do
soap_set_mime(soap, "MIME_boundary", "<boundary.xml#just-testing.com>");
/* add a base64 encoded image (base64image points to base64 data) */
soap_set_mime_attachment(soap,
base64image.data(), base64image.size(),
SOAP_MIME_BASE64, "image/jpeg",
"<boundary.jpeg#just-testing.com>", NULL, NULL);
I have not tested this, but it should be close to working.
QFile::ReadAll works but when converting QString to char* the array is trimmed in the first NUL.
Are you sure it's actually trimmed or you just can't print/view the array in the debugger [since C-style strings are 0 terminated]?
If the QString itself is not enough for your needs, you may want to convert it to a std::vector or similar using the range constructor or range assign; that way you'll have far less grief over how much data the container holds.
EDIT:
Here's some sample code for fstream reading from a binary file:
std::ifstream image( <image_file_name>, std::ios_base::in | std::ios_base::binary );
// istreambuf_iterator reads raw bytes; istream_iterator would skip whitespace bytes
std::istreambuf_iterator< char > image_begin( image ), image_end;
std::vector< char > vctImage( image_begin, image_end );
The std::ios_base::binary is the most important part of the thing (similar to fopen/fread ["rb"] & probably QFile has something similar)
Also posting some sample code usually helps in getting the right answer.
HIH
I have the solution for this... As renick suggested, I tried his idea, but it failed without my really understanding why... From a logical point of view renick was right... but the truth is that any kind of C-string handling (strcpy, strlen and the like) is going to stop when it finds the first \0 char. Qt can hold the data without problems, but when it is converted to a C string (char*) and treated as one, the data will again be cut at the first \0.
I found that QDataStream::readRawData reads the file into a char* given the size to read. So that's how I got it working...
QFile file("test.pdf");
file.open(QIODevice::ReadOnly);
int size = file.size();
char* buffer = new char[size];
QDataStream stream(&file);
stream.readRawData(buffer, size);
soap_set_mime(this, "MIME_boundary", NULL);
if(soap_set_mime_attachment(this, buffer, size, SOAP_MIME_BINARY, "application/pdf", NULL, NULL, NULL))
{
soap_clr_mime(this);
soapMessage = this->error;
}
Note that in the line
if(soap_set_mime_attachment(this, buffer, size, SOAP_MIME_BINARY, "application/pdf", NULL, NULL, NULL))
I'm still using the size variable instead of sizeof(buffer) or strlen(buffer): sizeof(buffer) only gives the size of the pointer, and strlen would cut the data again at the first \0...
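The three ways of measuring the buffer can be compared side by side. This is a sketch in plain C++ with hypothetical names (BufferLengths, measure); it assumes the buffer contains a \0 somewhere, so that strlen stays in bounds:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical struct, used only to return the three measurements together.
struct BufferLengths {
    std::size_t tracked;     // size recorded when the data was read
    std::size_t cString;     // what strlen reports: bytes before the first \0
    std::size_t pointerSize; // what sizeof reports for a char pointer variable
};

BufferLengths measure(const char *buffer, std::size_t size) {
    return BufferLengths{size, std::strlen(buffer), sizeof(buffer)};
}
```

Only the tracked size describes the binary data, which is why it has to travel alongside the pointer.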
Hope this helps...
David G Ortega