Calculating the info-hash of a torrent file - c++

I'm using C++ to parse the info hash of a torrent file, and I am having trouble getting a "correct" hash value in comparison to this site:
http://i-tools.org/torrent
I have constructed a very simple toy example just to make sure I have the basics right.
I opened a .torrent file in sublime and stripped off everything except for the info dictionary, so I have a file that looks like this:
d6:lengthi729067520e4:name31:ubuntu-12.04.1-desktop-i386.iso12:piece lengthi524288e6:pieces27820:¡´E¶ˆØËš3í ..............(more unreadable stuff.....)..........
I read this file in and parse it with this code:
#include <string>
#include <sstream>
#include <iomanip>
#include <fstream>
#include <iostream>
#include <vector>
#include <openssl/sha.h>

void printHexRep(const unsigned char * test_sha) {
    std::cout << "CALLED HEX REP...PREPPING TO PRINT!\n";
    std::ostringstream os;
    os.fill('0');
    os << std::hex;
    for (const unsigned char * ptr = test_sha; ptr < test_sha + 20; ptr++) {
        os << std::setw(2) << (unsigned int) *ptr;
    }
    std::cout << os.str() << std::endl << std::endl;
}

int main() {
    using namespace std;
    ifstream myFile ("INFO_HASH__ubuntu-12.04.1-desktop-i386.torrent", ifstream::binary);
    //Get file length
    myFile.seekg(0, myFile.end);
    int fileLength = myFile.tellg();
    myFile.seekg(0, myFile.beg);
    vector<char> buffer(fileLength);  // standard C++ has no variable-length arrays
    myFile.read(buffer.data(), fileLength);
    cout << "File length == " << fileLength << endl;
    cout.write(buffer.data(), fileLength) << endl << endl;  // buffer is not NUL-terminated
    unsigned char datSha[20];
    SHA1((unsigned char *) buffer.data(), fileLength, datSha);
    printHexRep(datSha);
    myFile.close();
    return 0;
}
Compile it like so:
g++ -o hashes info_hasher.cpp -lssl -lcrypto
And I am met with this output:
4d0ca7e1599fbb658d886bddf3436e6543f58a8b
When I am expecting this output:
14FFE5DD23188FD5CB53A1D47F1289DB70ABF31E
Does anybody know what I might be doing wrong here? Could the problem lie with the un-readability of the end of the file? Do I need to parse this as hex first or something?

Make sure you don't have a newline at the end of the file; you may also want to make sure it ends with an 'e'.
The info-hash of a torrent file is the SHA-1 hash of the info section (in bencoded form) of the .torrent file. Essentially, you need to decode the file (it's bencoded) and remember the byte offsets where the value associated with the "info" key begins and ends. That's the range of bytes you need to hash.
For example, if this is the torrent file:
d4:infod6:pieces20:....................4:name4:test12:piece lengthi1024ee8:announce27:http://tracker.com/announcee
You want to hash just this section:
d6:pieces20:....................4:name4:test12:piece lengthi1024ee
For more information on bencoding, see BEP3.
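The byte-range idea above can be sketched as a small bencode walker. This is a minimal sketch, not a full bencode parser: skipValue and extractInfoBytes are hypothetical helper names, and the code assumes the input is well formed apart from a few basic checks. The returned string holds exactly the bytes you would pass to SHA1.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the index just past the bencoded value that starts at s[pos].
std::size_t skipValue(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) throw std::runtime_error("unexpected end of data");
    const char c = s[pos];
    if (c == 'i') {                                   // integer: i<digits>e
        std::size_t end = s.find('e', pos);
        if (end == std::string::npos) throw std::runtime_error("unterminated integer");
        return end + 1;
    }
    if (c == 'l' || c == 'd') {                       // list/dict: values until 'e'
        ++pos;                                        // (dict keys are byte strings,
        while (pos < s.size() && s[pos] != 'e')       //  so they are skipped the same way)
            pos = skipValue(s, pos);
        if (pos >= s.size()) throw std::runtime_error("unterminated list/dict");
        return pos + 1;
    }
    if (c >= '0' && c <= '9') {                       // byte string: <len>:<bytes>
        std::size_t colon = s.find(':', pos);
        if (colon == std::string::npos) throw std::runtime_error("bad string length");
        std::size_t len = std::stoul(s.substr(pos, colon - pos));
        if (len > s.size() - colon - 1) throw std::runtime_error("string overruns data");
        return colon + 1 + len;
    }
    throw std::runtime_error("unknown bencode type");
}

// Returns the exact bencoded bytes of the top-level "info" value;
// these are the bytes whose SHA-1 is the info-hash.
std::string extractInfoBytes(const std::string& torrent) {
    if (torrent.empty() || torrent[0] != 'd') throw std::runtime_error("not a dictionary");
    std::size_t pos = 1;
    while (pos < torrent.size() && torrent[pos] != 'e') {
        if (torrent[pos] < '0' || torrent[pos] > '9') throw std::runtime_error("bad key");
        std::size_t colon = torrent.find(':', pos);
        std::size_t valueStart = skipValue(torrent, pos);       // skip the key
        std::string key = torrent.substr(colon + 1, valueStart - colon - 1);
        std::size_t valueEnd = skipValue(torrent, valueStart);  // skip the value
        if (key == "info") return torrent.substr(valueStart, valueEnd - valueStart);
        pos = valueEnd;
    }
    throw std::runtime_error("no info key");
}
```

In practice you would read the whole .torrent file in binary mode into a std::string, pass it to extractInfoBytes and hash the result. Working on the original file this way also avoids hand-editing it and accidentally adding a trailing newline.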

SHA1 calculation is just as simple as what you've written, more or less. If you get the wrong answer from the library function, the error is probably in the data you're feeding it.
I can't speak to the torrent file prep work you've done, but I do see a few problems. If you'll revisit the SHA1 docs, notice the SHA1 function never requires its own digest length as a parameter. Next, you'll want to be quite certain the technique you're using to read the file's contents is faithfully sucking up the exact bytes, no translation.
A less critical style suggestion: make use of the third parameter to SHA1. As a general rule, static storage in the library is best avoided; always prefer to supply your own buffer. Also, the hard-coded 20 in your print function is a marvelous place for the digest length constant (SHA_DIGEST_LENGTH) you've been flirting with.
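A minimal sketch of the "exact bytes, no translation" part, plus a hex formatter that takes the digest length as a parameter instead of a hard-coded 20. readFileBytes and toHex are hypothetical helper names; the OpenSSL call itself is left to the usage note so the sketch does not depend on linking libcrypto.

```cpp
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads the file's bytes exactly as stored: binary mode, no newline
// translation, no whitespace skipping.
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// Formats `len` bytes as lowercase hex, two digits per byte.
std::string toHex(const unsigned char* bytes, std::size_t len) {
    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < len; ++i)
        os << std::setw(2) << static_cast<unsigned>(bytes[i]);
    return os.str();
}
```

With OpenSSL you would then hash into your own buffer, for example unsigned char md[SHA_DIGEST_LENGTH]; SHA1(reinterpret_cast<const unsigned char*>(data.data()), data.size(), md); and print toHex(md, SHA_DIGEST_LENGTH).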

Related

Using std::find to find chars read from binary file and cast to a std::string in a std::vector<string> creates this unpredictable behaviour?

Sorry for the long headline; I couldn't think of a shorter way to describe it.
Could you try to reproduce the problem I'm running into?
You can use any WAV file.
I am trying to query the chunks in a WAV file. This is a simplified version of the code, but I think it should be enough to reproduce the problem.
I use a Mac and compile with g++ -std=c++11.
When I run this code without the line std::cout << query << std::endl;, the expression std::find(chunk_types.begin(), chunk_types.end(), query) != chunk_types.end() evaluates to 0 (false) in every iteration, even though I know the binary file contains some of these chunks. If I include the line, it works properly, but not predictably either; it only works sometimes.
I'm a bit perplexed. Am I doing anything wrong here?
#include <fstream>
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
int main(){
    std::vector<std::string> chunk_types{
        "RIFF","WAVE","JUNK","fmt ","data","bext",
        "cue ","LIST","minf","elm1",
        "slnt","fact","plst","labl","note",
        "adtl","ltxt","file"};
    std::streampos fileSize;
    std::ifstream file(/* file path here */, std::ios::binary);
    file.seekg(0, std::ios::beg);
    char fileData[4];
    for(int i{0};i<100;i+=4){ //100 is an arbitrary number
        file.seekg(i);
        file.read((char*) &fileData[0], 4);
        std::string query(fileData);
        std::cout << query << std::endl;
        /* if I put this std::cout here, it works, or else std::find always returns 0 */
        if( std::find(chunk_types.begin(), chunk_types.end(), query) != chunk_types.end() ){
            std::cout << "found " + query << std::endl;
        }
    }
    return 0;
}
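std::string query(fileData) uses strlen on fileData to find its terminating 0. fileData is not zero-terminated, so strlen keeps reading past the end of the array, up the stack (undefined behaviour), until it happens to find a 0 byte or hits inaccessible memory and causes SIGSEGV. Whatever bytes happen to follow the array end up in query, so it rarely equals any of the four-character chunk names, and seemingly unrelated changes, such as adding the std::cout line, can change what is in that memory.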
Also, file.read can read fewer characters than requested, so use gcount to get the number of characters actually read by the last call:
A fix:
file.read(fileData, sizeof fileData);
auto len = file.gcount();
std::string query(fileData, len);
A slightly more efficient solution is to read directly into a std::string and keep reusing it, which avoids a copy and, when the short string optimisation doesn't apply, a memory allocation:
std::string query;
// ...
constexpr int LENGTH = 4;
query.resize(LENGTH);
file.read(&query[0], LENGTH);
query.resize(file.gcount());

Store file in unsigned char array and print it

I've used the code below to read a binary file (in my case a .docx file) and store it in an unsigned char array instead of a plain char array (taking the question "Reading and writing binary file" as a reference).
#include <fstream>
#include <iterator>
#include <vector>
int main()
{
    std::ifstream input("C:\\test.docx", std::ios::binary);
    // istreambuf_iterator must use the stream's own character type (char);
    // the vector converts each char to unsigned char as it is filled.
    std::vector<unsigned char> buffer((std::istreambuf_iterator<char>(input)),
                                      (std::istreambuf_iterator<char>()));
}
Now I have two questions.
First, is this a correct way to read a .docx file into an unsigned char array, or are there better options available?
Second, I need to print the contents read into the unsigned char array, just to verify that the file has been read correctly. How can that be achieved?
That is an OK way if you're fine with having the whole file in memory. If you want to read the file in parts, read it in fixed-size blocks in a loop instead. One use case is transmitting it over the network: there, you don't need the whole file in memory.
About printing the file, it's possible to print the bytes read, for example, like this:
#include <fstream>
#include <iterator>
#include <vector>
#include <iostream>
#include <iomanip>
int main()
{
    std::ifstream input("C:\\test.docx", std::ios::binary);
    // Iterate over the stream's own char type; each char is converted to unsigned char.
    std::vector<unsigned char> buffer((std::istreambuf_iterator<char>(input)),
                                      (std::istreambuf_iterator<char>()));
    std::cout << std::hex;
    for (unsigned char b : buffer)
        std::cout << "0x" << std::setfill('0') << std::setw(2) << (int)b << " ";
    std::cout << std::dec << std::endl;
}
If you meant printing the contents of the file to see some familiar text, that's not going to work directly. docx files use the Open XML file format, which, first of all, makes them ZIP archives. Inside the ZIP archive you will find XML representations of the document's data, which are readable.
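Since a .docx file is a ZIP archive, one quick way to check that the bytes were read unmodified is to look at the first four bytes, which for a ZIP file normally form the local file header signature 'P' 'K' 0x03 0x04. looksLikeZip is a hypothetical helper, and passing it is only a sanity check, not proof that the whole file is intact.

```cpp
#include <vector>

// True if the buffer starts with the ZIP local file header signature "PK\x03\x04".
bool looksLikeZip(const std::vector<unsigned char>& buffer) {
    return buffer.size() >= 4 &&
           buffer[0] == 'P' && buffer[1] == 'K' &&
           buffer[2] == 0x03 && buffer[3] == 0x04;
}
```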

Why is the file I'm writing to printing gibberish?

I'm not sure what's happening, but I'm using an ofstream object to write to a file in binary mode.
I am writing a byte to a file, but the file contains gibberish like this: ôß
I have a class called ByteOutput with a function called inByte defined as so:
void inByte(int byte)
{
    ostreamObj.write(&buffer, byte & 255);
}
&buffer is a reference to a bit buffer I am using to store a byte of data
In my main, I defined an ofstream obj and opened a file in binary using:
obj.open("tester", std::ios::binary);
I write a byte of data to the file using a ByteOutput object using:
writeObj.inByte(1001011);
However, when I check the file, it is all hieroglyphics. It does not show the letter K, which has the binary representation 1001011.
What am I doing wrong?
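This call
writeObj.inByte(1001011);
passes the decimal integer 1,001,011, not the binary value 1001011.
If you want to write the value in binary, consider hex (0x4B), a C++14 binary literal (0b1001011), or std::bitset.
There is a second problem in inByte: the second argument of ostream::write is a byte count, not the value to write. ostreamObj.write(&buffer, byte & 255) writes byte & 255 characters starting at &buffer (51 of them for 1001011), which is where the gibberish comes from. To write a single byte, use ostreamObj.put(static_cast<char>(byte & 255)).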
#include <bitset>
#include <iostream>
using namespace std;
int main()
{
    int a = 0x4B; // 01001011
    bitset<8> bs(a);
    cout << hex << a << endl;
    cout << bs << endl;
}
If you output a std::bitset to a file, every bit is written as a character, i.e., you will see 01001011 in your file.

Automatically extend file size when seeking / writing to a location on a read/write fstream

I'm working on some legacy code that uses Win32 WriteFile() to write to a random location in a binary file. The offset of the write can be past the end of the file, in which case WriteFile() seems to automatically extend the file size to the offset and then write the data to the file.
I'd like to use std::fstream to do the same, but when I try to seekp() to the appropriate location, past the end of the file, the seekp() fails and the subsequent write() fails as well.
So it seems to me that I have to 'manually' fill in the space between the current EOF and the location I want to write to.
The code looks like this:
void Save(size_t offset, const Element& element)
{
    m_File.seekp(offset, std::ios_base::beg);
    m_File.write(reinterpret_cast<const char*>(&element), sizeof(Element));
    if (m_File.fail()) {
        // ... error handling
    }
}
So is my only option to 'manually' write 0s from the current EOF up to offset?
Here is an example I picked up verbatim from MSDN:
// basic_ostream_seekp.cpp
// compile with: /EHsc
#include <fstream>
#include <iostream>
int main()
{
    using namespace std;
    ofstream x("basic_ostream_seekp.txt");
    streamoff i = x.tellp();
    cout << i << endl;
    x << "testing";
    i = x.tellp();
    cout << i << endl;
    x.seekp(2); // Put char in third char position in file
    x << " ";
    x.seekp(2, ios::end); // Put char two after end of file
    x << "z";
}
The file "basic_ostream_seekp.txt" has te ting\0\0z at the end of the program, i.e., you are allowed to seek past the end of the file.
In any case, if write does fail for you, you could check and see if seekp does too. If it does, you can detect the failure earlier.
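One common reason seekp fails on a read/write std::fstream is that an earlier read hit end of file and left eofbit/failbit set; seekp does nothing while failbit is set. That is an assumption about the asker's code, which the question doesn't show. The sketch below, a hypothetical saveAt modeled on the question's Save(), clears the state first and then relies on the file system to fill the gap with zeros, as in the MSDN example.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>

// Writes `size` bytes at `offset`, which may lie past the current end of file.
// Returns false if the seek or the write fails.
bool saveAt(std::fstream& file, std::streamoff offset, const char* data, std::size_t size) {
    file.clear();  // a read that hit EOF leaves failbit set, which makes seekp a no-op
    if (!file.seekp(offset, std::ios_base::beg)) return false;
    file.write(data, static_cast<std::streamsize>(size));
    file.flush();
    return static_cast<bool>(file);
}
```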

How do I read an entire .txt file of varying length into an array using C++?

I'm making a shift cipher that reads text from a file and decodes it. The decryption works fine; however, I can't figure out how to find the length of the file without hardcoding it into the size of the char array. It also only reads in one line; anything containing a newline gets corrupted.
Any help would be greatly appreciated. I've left out the main block of code, as it deals with the array after it has been read in and seemed a bit long and irrelevant.
string fileName;
cout << "Please enter the location of your encrypted text (e.g. \"encryptedText.txt\"): ";
getline( cin, fileName );
char encryptedMessage[446]; //How do I read in the file length and declare the array size as a variable instead of [446]?
char decryptedMessage[446];
ifstream in(fileName);
if(in.get(encryptedMessage, 446))
{
    [my decrypting code]
}
else
{
    cout << "Couldn't successfully read file.\n";
}
system("pause");
Well, a simple one-liner for reading a whole file into a dynamically sized array (don't use a statically sized array) of chars would be:
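#include <vector>
#include <iterator>
// The extra parentheses around the first argument matter: without them this
// line is parsed as a function declaration (the "most vexing parse").
std::vector<char> encryptedMessage((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());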
Don't mess with dynamic allocation yourself; just let std::vector do its job. Thanks to its geometric growth behaviour, you don't really need to bother with checking the file size. Optimize for speed when necessary, or at least not before your files get larger than a few hundred characters. And of course the istreambuf_iterator (unlike istream_iterator) doesn't treat whitespace specially; it just takes each character raw from the file, one by one.
You may do the same with a std::string instead of a std::vector<char>; common standard library implementations grow it geometrically too. But then again, who cares about speed when the file contains 400 characters?
You can use seekg and tellg to get the size of an entire file:
#include <iostream>
#include <fstream>
using namespace std;
int main () {
    // Binary mode, so the offset reported by tellg equals the number of bytes in the file
    ifstream in("example.txt", ios::binary);
    if (!in) {
        cerr << "Couldn't open file.\n";
        return 1;
    }
    in.seekg(0, ios::end);
    streamoff total_bytes = in.tellg();
    in.seekg(0, ios::beg);   // ios::beg, not ios::begin
    char *message = new char[total_bytes + 1];
    in.read(message, total_bytes);
    message[in.gcount()] = '\0';   // terminate after the bytes actually read
    in.close();
    cout << "message is: " << message << endl;
    delete [] message;
    return 0;
}
You can read more about seekg, tellg and C++ file streams in general in the standard library documentation.
However, a better solution than using char * is to use a std::string and call push_back on it until in reaches the end:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
    ifstream in("example.txt");
    string message;
    char c;
    // get(c) fails at end of file, so no EOF value is appended to message
    while (in.get(c)) {
        message.push_back(c);
    }
    in.close();
    cout << "message is: " << message << endl;
    return 0;
}
Variable Length Arrays (VLAs) are not part of standard C++.
Some compilers provide VLAs as an extension, but using them makes your code non-portable.
The simplest and best solution is to use std::string instead of character arrays.
You may see answers advising you to use dynamically allocated arrays, but std::string is the better choice, so ignore those.
EDIT:
Since somebody downvoted this, I would be very interested to know the reasons (provided they are technical).
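Following that advice, the whole-file read can be done with a std::string that sizes itself. The sketch below uses a common idiom (streaming the file's rdbuf() into a std::ostringstream) rather than anything from the question; readWholeFile is a hypothetical name, and the question's decryption loop would then run over the returned string.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads a whole file, newlines included, into a std::string that sizes itself.
std::string readWholeFile(const std::string& path) {
    std::ifstream in(path);  // text mode: on Windows, "\r\n" is read as "\n"
    if (!in) throw std::runtime_error("Couldn't successfully read file: " + path);
    std::ostringstream contents;
    contents << in.rdbuf();  // copies every character until end of file
    return contents.str();
}
```

In the question's code this would replace the fixed-size arrays: std::string encryptedMessage = readWholeFile(fileName); and the decrypted text can be built in a second std::string of the same size.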
You need dynamically allocated memory, and the best way to manage that is with std::vector.
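std::vector<char> encryptedMessage;
encryptedMessage.resize(size_of_file);  // size_of_file: e.g. obtained with seekg/tellg
// read() copies newlines too, whereas get(buf, n) stops at the first '\n'
in.read(encryptedMessage.data(), encryptedMessage.size());
encryptedMessage.resize(in.gcount());   // keep only what was actually read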