Reading in an Intel Hex file for sorting in C++

I need to read in an Intel Hex file which looks something like this:
:0300000002F8D42F
:07000300020096000000005E
:07000B000200B50000000037
:030013000200D414
:03001B000200F3ED
(Yes, some lines are missing, and sometimes a line contains only 1 data byte.)
The : is the start code
The first 2 hex characters are the byte count
The next 4 are the address in memory
The next 2 are the record type
The rest is the data (except the last 2 characters)
The last 2 characters are the checksum
More info here (wikipedia)
I need to end up with something like this (no periods, they're only there for readability):
:10.addr.RT.16bytesofdata.CK
If there is no data from the file for an address, I am filling it with 'FF'.
So what is the best way to read in and store a file like this, given that I will need to divide up and sort the information by address, byte by byte?
I was hoping to read it byte by byte(?), storing the appropriate values into a 2D integer array ordered by address.
[BC][ADDR][RT][b1][b2][b3][b4][b5][b6][b...16][ck]
[BC][ADDR][RT][b1][b2][b3][b4][b5][b6][b...16][ck]
...
I would like to stay away from using strings so I can more easily calculate checksums.
Also I am using Visual Studio.
Thanks for the help; I can post more info if this was not clear enough.
Update: So right now I think I'm reading it in with something like this:
fscanf_s(in_file,"%2X", &BC);
fscanf_s(in_file,"%4X", &ADDR);
fscanf_s(in_file,"%2X", &RT);
Then I'll print it out to a file like this:
fprintf_s(out_file,"%02X", BC);   //%02X rather than %2X, so single-digit values get a leading zero
fprintf_s(out_file,"%04X", ADDR); //this pads with zeros if needed and forces 4 "digits"
fprintf_s(out_file,"%02X", RT);
Now I'm working on a routine for the data. Let me know if anyone has any good ideas. Thanks
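Not an official routine, but a minimal sketch of one way to finish the record in the same fscanf_s style, assuming BC, ADDR and RT were just read as above (the ':' start code and line ends still have to be consumed by the surrounding loop):

unsigned int byte_val, CK;
unsigned int sum = BC + (ADDR >> 8) + (ADDR & 0xFF) + RT;  // running sum for the checksum
int data[16];                         // assumes records carry at most 16 data bytes
for (unsigned int i = 0; i < BC; i++) {
    fscanf_s(in_file, "%2X", &byte_val);
    data[i] = byte_val;
    sum += byte_val;
}
fscanf_s(in_file, "%2X", &CK);        // the record's checksum byte
if (((sum + CK) & 0xFF) != 0) {
    // bad record: the byte sum of a whole record must be 0 mod 256
}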

I would suggest a Dictionary<RT, byte[]> (in C++ terms, a map from record type to a flat byte array), with a single flat array per record type. Then stride through that array calculating checksums and building the output lines; if all bytes in a line were 0xFF you can skip appending that line to your output.
Maybe Dictionary<RT, List<byte>> if you can't predict the size of each memory space in advance, but since 4 nibbles of address only allow 64k, I'd just allocate each array at that size immediately.

I'm not sure about a 2D array -- I'd just start with a big 1D array representing (in essence) the target address space. Pre-fill it with FF. Walk through the records from the hex file, and:
fill the values into your array, and
keep track of the highest and lowest addresses encountered.
When you're done, start from the lowest address encountered, and encode data 16 (0x10) bytes at a time until you reach the highest address you wrote to.
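A minimal sketch of the output side of that idea, assuming mem is the 64k array pre-filled with 0xFF, lo/hi were tracked while filling it, and lo is 16-byte aligned; the checksum is the Intel HEX two's complement of the byte sum:

#include <cstdio>

void emit_records(FILE* out_file, const unsigned char* mem,
                  unsigned int lo, unsigned int hi)
{
    for (unsigned int addr = lo; addr <= hi; addr += 16) {
        // sum starts with byte count (0x10), the two address bytes, and record type (00)
        unsigned int sum = 0x10 + (addr >> 8) + (addr & 0xFF);
        fprintf_s(out_file, ":10%04X00", addr);
        for (int i = 0; i < 16; i++) {
            fprintf_s(out_file, "%02X", mem[addr + i]);
            sum += mem[addr + i];
        }
        fprintf_s(out_file, "%02X\n", (0x100 - (sum & 0xFF)) & 0xFF);
    }
    // a final ":00000001FF" EOF record still needs to be written afterwards
}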

Related

C++ substring - a pointer to a range of a string (loading a big file)

I have a file like this:
ACCCTCGGCTACTACGACTAC
GCTAGTCAGACTGAGCATGTCAGTC
TAGCTAGCTGACTGACTACATCGAC
GCTAGATGCTAGCGTATAGTCTGCTGAGTCTGAGT
GTCAGTCATGTGACTGACGTATGCTATTA
The file above is kinda big: it has 9000 lines, each 100-200 chars long.
I need to insert substrings of these lines into the map, taking a sliding window of length 5 (the whole file has to go into the same map).
First line is ACCCTCGGCTACTACGACTAC, so I need to load into the map:
ACCCT next
CCCTC next
CCTCG next
...
ACTAC
After this we load the second line, then the third, till the EOF.
So, my first idea was:
map<string, set<string>> sequences;
int SEQLEN = 74; // cause we load 74 long substrings
while (getline(in, name) && getline(in, chain)) {
    for (size_t i = 0; i + SEQLEN <= chain.size(); i++) {  // <= so the final window is included
        string subchain = chain.substr(i, SEQLEN);
        sequences[subchain].insert(name);
    }
}
but after this we have a map which consumes 4.5 GB of RAM, which is unacceptable, since the PC it has to run on has only 2 GB :C
I heard about some kind of 'pointers to a string's chars'. If something like this exists, I could just load all of the lines, save pointers to the 'start char' and 'end char' of each substring, and then load the substring by providing this 'range'.
What do you think, is there something like a 'pointer to a string's certain char'?
If someone has ANY idea, I would be grateful :)
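For what it's worth, something like that does exist: a plain pointer pair, or std::string_view since C++17, refers to a range of characters inside an already-loaded string without copying them. A minimal sketch (the backing string must outlive every view into it):

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <string_view>

int main() {
    std::string chain = "ACCCTCGGCTACTACGACTAC";  // one loaded line
    std::map<std::string_view, std::set<std::string>> sequences;

    const std::size_t SEQLEN = 5;
    for (std::size_t i = 0; i + SEQLEN <= chain.size(); i++) {
        // a view into chain: no characters are copied
        sequences[std::string_view(chain).substr(i, SEQLEN)].insert("line1");
    }
    std::cout << sequences.size() << " distinct substrings\n";
}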
Since your strings encode nucleobases, and you are concerned about saving memory, the best approach is to get rid of strings completely.
With four nucleobase characters in your alphabet, there are only 4^5 = 1024 possible sub-strings of length 5. You can encode each one of them as a short integer by doing a lookup, and then decode it for output by doing a reverse lookup.
This approach will save you a lot of memory: an array of 1024 strings and an std::map<std::string,short> needed for lookups will take about 50K of memory. Storing each individual 5-character substring will cost you two bytes, instead of 14 on a 32-bit system or 22 on a 64-bit system. Your entire file could be stored in under one megabyte of memory.
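A minimal sketch of such an encoding; here the code is computed arithmetically at 2 bits per base rather than through the lookup table described above:

#include <cstdint>
#include <string>

// Pack a 5-character nucleobase string into a 10-bit code (A=0, C=1, G=2, T=3).
uint16_t encode5(const std::string& s) {
    uint16_t code = 0;
    for (char c : s) {
        code = (uint16_t)(code << 2);
        switch (c) {
            case 'C': code |= 1; break;
            case 'G': code |= 2; break;
            case 'T': code |= 3; break;
            default:  break;               // 'A' contributes 0
        }
    }
    return code;
}

// Reverse lookup: turn a 10-bit code back into its 5-character string.
std::string decode5(uint16_t code) {
    const char bases[] = "ACGT";
    std::string s(5, 'A');
    for (int i = 4; i >= 0; --i) {
        s[i] = bases[code & 3];
        code = (uint16_t)(code >> 2);
    }
    return s;
}

Each encoded substring then fits in two bytes, as described above.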

C++ How can I retrieve a file BOM to get its type of encoding?

I don't know if it's possible, but is there a way to retrieve the first 4 bytes of a file (most likely the BOM) in order to get its type of encoding (UTF-8, UTF-16LE, CP1252, etc.)? And then, if the selected file was encoded in UTF-8, the values found in an array "tabBytes[]" would be something like:
tabBytes[0] = 0xEF
tabBytes[1] = 0xBB
tabBytes[2] = 0xBF
tabBytes[3] = XXXX
Thanks for taking the time to help me! I'll be looking forward to reading your comments and answers on this.
EDIT: I'm new to C++, so the code I wrote before is probably wrong, thus I removed it.
FINAL EDIT: Finally I found a solution to my problem, thanks to those who helped me!
Array indices start at 0, so you're writing past the end of the buffer with buffer[fourBytes] = '\0';. You need to allocate fourBytes + 1 bytes if you want to do that. This should stop the crash you're getting when you delete the buffer.
However the only reason for null-terminating the buffer like that is if you want to treat it as a C-style string (e.g. to print it out), which you don't seem to be doing. You're copying it into tabBytes, but you're not copying the null-terminator. So it's unclear exactly what it is you're trying to achieve.
Your overall logic for reading the first few bytes from the file is fine. Although based on the code above, you could just read the data straight into tabBytes and do away with the allocation/copy/free of buffer.
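A minimal sketch along those lines, reading straight into tabBytes and testing a few well-known BOMs (the file name is a placeholder):

#include <cstdio>
#include <fstream>

int main() {
    unsigned char tabBytes[4] = {0, 0, 0, 0};
    std::ifstream file("example.txt", std::ios::binary);
    file.read(reinterpret_cast<char*>(tabBytes), 4);   // no heap buffer needed

    if (tabBytes[0] == 0xEF && tabBytes[1] == 0xBB && tabBytes[2] == 0xBF)
        std::printf("UTF-8 BOM\n");
    else if (tabBytes[0] == 0xFF && tabBytes[1] == 0xFE)
        std::printf("UTF-16LE BOM\n");
    else if (tabBytes[0] == 0xFE && tabBytes[1] == 0xFF)
        std::printf("UTF-16BE BOM\n");
    else
        std::printf("no recognized BOM\n");            // e.g. CP1252 has no BOM at all
}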

Byte to ASCII turns into square characters

I have a USB device. It's a pedometer/activity tracker.
I'm able to send bytes to the device and I'm also able to retrieve bytes back.
But for some reason numbers are turned into those square characters and not into numbers... The characters that are actually letters come through just fine.
Any ideas on how to retrieve the bytes as numbers?
The expected format is something like this: (format listing not included)
The square characters are actually binary data, likely bytes below 0x20 or above 0x7f.
The first 15 bytes are binary; you would need to interpret them using something approximately like the following pseudocode:
if (isascii(byte) && isprint(byte)) {
    textToAppendToEditBox(byte);
} else {
    textToAppendToEditBox(someKindOfSprintF("{%02x}", byte));
}
There are plenty of googleable examples of hex-dumping code snippets that can make the output look pretty.
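For instance, a small helper in the spirit of that pseudocode, appending printable bytes as-is and everything else as {hex}:

#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

std::string renderBytes(const std::vector<unsigned char>& buf) {
    std::ostringstream out;
    for (unsigned char b : buf) {
        if (std::isprint(b)) {
            out << static_cast<char>(b);               // readable character
        } else {
            out << '{' << std::hex << std::setw(2) << std::setfill('0')
                << static_cast<int>(b) << '}';         // binary byte as hex
        }
    }
    return out.str();
}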
The expected format that you showed is binary data. You first have to convert the received bytes into internal information; then you can pass that information to an std::ostringstream to display it in a GUI.
When reading the binary data, make sure to respect the endianness used.

Adding specific record to binary file c++

Suppose I have a binary file and a text file of all worker records.
The default total month hours are all set to 0.
How do I actually access the particular month in the binary file and change it to the desired value?
This is in text file format
ID Name J F M
1 Jane 0 0 0
2 Mark 0 0 0
3 Kelvin 0 0 0
to
ID Name J F M
1 Jane 0 0 25
2 Mark 0 0 30
3 Kelvin 0 0 40
The 25 is actually the number of hours worked in March.
I think the first question here is what you mean by "binary". Are you showing the format of the file literally? In other words, at input, is the character going to be '0' or '\0'? When you're done, do you want the file to contain the two digits '3' and '0', or a single byte with the value 25, 30 or 40?
If you're dealing with a single character at a known offset in each record for input, and want to replace it by a single character for the result, things are pretty easy: seek to the right offset in the file, write a byte, seek to the next offset, and continue 'til you've updated all the records.
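A minimal sketch of that fixed-offset case; the file name, record size, and field offset here are hypothetical and must match the real layout:

#include <fstream>

int main() {
    const std::streamoff RECORD_SIZE  = 16;  // hypothetical bytes per record
    const std::streamoff MARCH_OFFSET = 15;  // hypothetical offset of the March byte

    std::fstream file("workers.dat",
                      std::ios::in | std::ios::out | std::ios::binary);

    const unsigned char hours[] = {25, 30, 40};   // new March values per record
    for (int rec = 0; rec < 3; ++rec) {
        file.seekp(rec * RECORD_SIZE + MARCH_OFFSET);  // jump to the field
        file.write(reinterpret_cast<const char*>(&hours[rec]), 1);
    }
}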
If the input file contains character strings, so when you update the value its length will (probably) change, then you're pretty much stuck with reading data in, modifying it in memory, and writing the new data back out (usually to a new file). This is pretty easy too, but can be slow if your file is large.
If you're doing this in a real program, I'd think twice about doing it on your own at all. I'd consider using something like SQLite to handle the data instead. This not only allows you to simplify your code, but also makes life quite a bit nicer for your clients. It uses a known/documented file format, so other tools can work with the data, do backups, etc. It supports transactions, logging, roll-backs, etc. In short, they get a robust solution instead of yet another fragile problem.
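For comparison, a minimal SQLite sketch of the same update (the table, column, and file names are made up):

#include <sqlite3.h>

int main() {
    sqlite3* db;
    sqlite3_open("workers.db", &db);
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS hours(id INTEGER PRIMARY KEY,"
        " name TEXT, jan INTEGER, feb INTEGER, mar INTEGER);",
        nullptr, nullptr, nullptr);
    // set March hours for worker 1 -- no byte offsets to compute
    sqlite3_exec(db, "UPDATE hours SET mar = 25 WHERE id = 1;",
                 nullptr, nullptr, nullptr);
    sqlite3_close(db);
}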
A file is a stream of bytes. You can access a file using the C family of functions (fopen, fread, fwrite) or through C++ iostream operations. In either case you will need to find the record, usually by knowing its position, and then read and write that record. If the records are not of fixed size, you will have to handle moving all subsequent records.

Writing the huffman tree to file after compression

I'm trying to write a Huffman tree to the compressed file after all the actual compressed data has been written. But I just realized a bit of a problem: suppose I decide that once all my actual data has been written to the file, I will put in 2 linefeed characters and then write the tree.
That means, when I read it back, those two linefeeds (or any characters, really) are my delimiter. The problem is that it's entirely possible that the actual data also contains 2 consecutive linefeeds; in such a scenario, my delimiter check would fail.
I've taken the example of two linefeeds here, but the same is true for any character string. I could work around the problem by taking a longer string as the delimiter, but that would have two undesirable effects:
1. There is still a remote chance that the long string is, by some coincidence, present in the compressed data.
2. It unnecessarily bloats a file which needs to be compressed.
Does anyone have any suggestions on how to separate the compressed data from the tree data ?
First, write the size of the tree in bytes. Then, write the tree itself, and then the contents itself.
When reading, first read the size, then the tree (now you know how many characters to read), and then the contents.
The size can be written as a string ending with a line feed - this way, you know that the first number and line feed belong to the size of the tree.
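A minimal sketch of that layout, assuming the serialized tree and the compressed contents are already in memory as byte strings:

#include <fstream>
#include <iterator>
#include <string>

// Write: tree size as a decimal line, then the tree bytes, then the contents.
void writeArchive(std::ofstream& out, const std::string& tree,
                  const std::string& contents)
{
    out << tree.size() << '\n';
    out.write(tree.data(), static_cast<std::streamsize>(tree.size()));
    out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
}

// Read: parse the size line, read exactly that many tree bytes,
// and treat everything that remains as the contents.
void readArchive(std::ifstream& in, std::string& tree, std::string& contents)
{
    std::string sizeLine;
    std::getline(in, sizeLine);
    tree.resize(std::stoul(sizeLine));
    in.read(&tree[0], static_cast<std::streamsize>(tree.size()));
    contents.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
}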
Why not write the size and length in the first 8 bytes (4 each) and then the data?
Then something like:
uint32_t compressed_size;
uint32_t data_len;
char* data;
file.read((char*)&compressed_size, 4);  // note the &: read into the variable itself
file.read((char*)&data_len, 4);
data = new char[data_len];
file.read(data, data_len);
Should work.
You could deflate the data for better compression.