decipher a file format called .EWB - compression

I have a file which I know contains a bunch of compressed files inside, with some kind of header.
Can anyone tell me how to unpack it?
The file format is .EWB, which stands for EasyWorshipBible.
I know it's possible, as I've seen it done. But they didn't tell me how.
I tried using hex editors and WinRAR, but none of them seem to extract the files correctly.

In an example I found, each entry begins with hex 51 4b 03 04, followed by six more bytes of information, followed by a zlib stream. When the zlib streams are decompressed, they have the format "1:1 text line ...", blank line, "1:2 text line ...", etc. However, the text does not seem to match the extraction that came with the example, so I suspect the text is encoded or encrypted somehow.
That should be enough to get you started.
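Based only on the observations above, a scan for those entry markers might look like the following sketch. The function name is mine, and the assumption that exactly six header bytes sit between the marker and the zlib stream comes straight from the description above; each returned offset would then be handed to zlib's inflate routines.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scan a raw .EWB buffer for the 51 4B 03 04 entry marker described above.
// Each hit is followed by six more header bytes and then a zlib stream, so
// the returned offsets point at where each zlib stream should begin.
// (The meaning of the 6 header bytes is unknown; callers should still
// bounds-check each offset before using it.)
std::vector<std::size_t> findZlibStreamOffsets(const std::vector<std::uint8_t>& buf) {
    static const std::uint8_t magic[4] = {0x51, 0x4B, 0x03, 0x04};
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 4 <= buf.size(); ++i) {
        if (buf[i] == magic[0] && buf[i + 1] == magic[1] &&
            buf[i + 2] == magic[2] && buf[i + 3] == magic[3]) {
            offsets.push_back(i + 4 + 6);  // skip marker + 6 unknown header bytes
        }
    }
    return offsets;
}
```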

Related

C++: Problem of Korean alphabet encoding in text file write process with std::ofstream

I have code that saves a log as a text file.
It usually works well, but I found a case where it doesn't:
{Id": "testman", "ip": "192.168.1.1", "target": "?뚯뒪??exe", "desc": "?덈뀞諛⑷??뚯슂"}
My code is simple logic that saves the log string as a text file.
It works well when the log is in English, but there is a problem when the log is in Korean.
After various experiments, I confirmed that Korean is not a problem if the file can be saved in UTF-8 format.
I think that if the log string contains Korean, C++ saves the file in ANSI format by default.
This is my c++ code:
string logFilePath = {path};
string log = "{\"Id\": \"testman\", \"ip\": \"192.168.1.1\", \"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";
ofstream output(logFilePath, ios::app);
output << log << endl;
output.close();
Is there a way to save log files as UTF-8, or any other good way?
Please give me some advice.
You could set UTF-8 in File->Advanced Save Options.
If you do not find it, you could add Advanced Save Options in Tools->Customize->Commands->Add Command..->File.
TL;DR: write 0xEF 0xBB 0xBF (the 3-byte UTF-8 BOM) at the beginning of the file before writing out your string.
One of the hints that text viewer software uses to determine whether a file should be shown as Unicode is something called the byte order mark (or BOM for short). It is basically a series of bytes at the beginning of a stream of text that specifies the encoding and endianness of the text. For UTF-8 it is these three bytes: 0xEF 0xBB 0xBF.
You can experiment with this by opening Notepad, writing a single character, and saving the file in the ANSI format. Then look at the size of the file in bytes: it will be 1 byte. Now open the file, save it as UTF-8, and look at the size again. It will be 4 bytes: three bytes for the BOM and one byte for the single character you put in there. You can confirm this by viewing both files in a hex editor.
That being said, you need to insert these bytes into your files before writing your string to them. Why UTF-8, you may ask? Well, it depends on the encoding of the original string (your std::string log), which in this case is a string literal written in a source file whose encoding is (most likely) UTF-8. Therefore the bytes that make up the string are produced according to this encoding and placed into your executable.
Note that std::string can contain a Unicode string; it just can't make sense of it. For example, it reports its length in bytes rather than in characters. But it can be used to carry a Unicode string around just fine.
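A minimal sketch of the suggestion, assuming the string is already UTF-8-encoded bytes (the function name is mine; if you open with ios::app as in the question, write the BOM only when the file is first created):

```cpp
#include <fstream>
#include <string>

// Prepend the 3-byte UTF-8 BOM so text viewers recognize the file as UTF-8.
// Assumes utf8Text already holds UTF-8 bytes (e.g. a string literal from a
// source file compiled with a UTF-8 execution charset).
void writeUtf8Log(const std::string& path, const std::string& utf8Text) {
    std::ofstream out(path, std::ios::binary);  // binary: no newline translation
    out.write("\xEF\xBB\xBF", 3);               // UTF-8 BOM
    out << utf8Text << '\n';
}
```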

Mergecom Tags not in order (MC_OUT_OF_ORDER_TAG) issue

While using the MC_Open_File API of MERGECOM,
MC_Open_File( applID, msgID, &cbInfo, MediaToFileObj );
The following error occurred. How can I resolve or work around this issue?
(5124) 03-09 15:01:10.39 MC3 E: Tags not in ascending order: (0010,0010) found after (696c,6e6f)
(5124) 03-09 15:01:10.39 MC3 W: Error with tag (0010,0010) at byte offset 704 when parsing file
The same file works fine with MC_Stream_To_Message_With_Offset and MC_Stream_To_Message. Since I don't know the MC_ATT_TRANSFER_SYNTAX_UID, I am not able to use those two APIs.
Kindly help me to overcome this.
MC_Open_File expects that the file you're reading is a DICOM file with a 128 byte preamble, 'DICM' prefix, then the group 0x0002 elements, followed by the dataset itself.
The error you are seeing looks suspiciously like a parsing error when reading the file. The tag number (696c,6e6f) is obviously ASCII characters that the parser attempted to interpret as a DICOM tag.
So it looks like you either have an invalidly formatted file, or you're trying to read a file that is not in the DICOM File Format. Note that the MergeCOM-3 APIs do not attempt to detect the format of the file (whether it is a DICOM file or a bare stream); they just assume the format for the function being used. I'd suggest looking a little deeper at the binary content of the file to determine its format and whether you're using the right function to read it.
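One quick way to decide between MC_Open_File and the stream functions is to check for the preamble and prefix yourself. The DICOM File Format starts with a 128-byte preamble followed by the 4-byte 'DICM' prefix, so a small check (function name mine) might look like:

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Check whether a file looks like the DICOM File Format expected by
// MC_Open_File: a 128-byte preamble followed by the 4-byte 'DICM' prefix.
// A bare dataset/stream without the preamble would fail this check and
// would need the MC_Stream_* functions instead.
bool hasDicomPreamble(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char header[132];
    if (!in.read(header, 132)) return false;  // file too short to be DICOM
    return std::memcmp(header + 128, "DICM", 4) == 0;
}
```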

Text extraction from PDF in UNICODE

I need to extract text from PDF files, and I found this article, which takes every text stream from a PDF file and decompresses it. But I also need to extract the text as Unicode, so I tried to adapt my code to use wchar_t characters. The only problem is that zlib decompresses one byte at a time, and my wchar_t isn't 1 byte per character.
So, is there a way I can work things out here? :)
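One detail worth spelling out: zlib only ever sees raw bytes, so the usual approach is to inflate the stream into a byte buffer first and decode it into wide characters afterwards, as a separate step. As an illustration, PDF Unicode strings are UTF-16BE and start with the BOM FE FF; a sketch of decoding such a buffer (function name mine; surrogate pairs are passed through unpaired, so characters outside the BMP need extra handling):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a buffer of UTF-16BE bytes (as produced after zlib inflation of a
// PDF text stream containing Unicode strings) into a std::wstring.
// Assumes wchar_t can hold a 16-bit code unit; surrogate pairs are not
// combined here.
std::wstring utf16beToWstring(const std::vector<std::uint8_t>& bytes) {
    std::wstring out;
    std::size_t i = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        i = 2;  // skip the UTF-16BE byte order mark
    for (; i + 1 < bytes.size(); i += 2)
        out.push_back(static_cast<wchar_t>((bytes[i] << 8) | bytes[i + 1]));
    return out;
}
```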

Converting WAV file audio input into plain ASCII characters

I am working on a project where we need to convert WAV file audio input into plain ASCII characters. The input WAV file will contain a single short alphanumeric code, e.g. asdrty543, and each character will be pronounced one by one when you play the WAV file. Our requirement is that when a single character of the code is pronounced, we need to convert it into its equivalent ASCII code. The implementation will be done in C/C++ as an unmanaged Win32 DLL. We are open to using third-party libraries. I am already googling for directions. However, I would really appreciate directions/pointers from an experienced programmer who has already worked on a similar requirement. Thank you in advance for your help.
ASCII characters like A-Z and 0-9 are only a portion of the ASCII table. WAV files, like any other file, are stored and accessed as bytes.
One byte has 256 different values, so you can't simply map bytes onto alphanumeric characters; there aren't enough of them.
You'll have to find a library which opens WAV files and exposes the waveform for you. From the wave's intensity and length, a chain of alphanumeric characters can be produced.
I believe you're trying to convert the wave to a series of notes. That's possible too, using the same approach.
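Recognizing spoken characters itself needs a speech library, but whichever one you use, the first step is getting at the raw samples, which means reading the WAV (RIFF) header. A minimal sketch (function name mine) that validates the header and pulls the sample rate out of the canonical 44-byte PCM layout; a real reader should walk the chunks instead of assuming fixed offsets:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Validate a canonical PCM WAV header ("RIFF....WAVE") held in memory and
// extract the sample rate, which sits as a little-endian uint32 at byte
// offset 24 in the standard 44-byte header.
bool readWavSampleRate(const std::vector<std::uint8_t>& file, std::uint32_t& sampleRate) {
    if (file.size() < 44) return false;                       // too short for a header
    if (std::memcmp(file.data(), "RIFF", 4) != 0) return false;
    if (std::memcmp(file.data() + 8, "WAVE", 4) != 0) return false;
    sampleRate = file[24] | (file[25] << 8) | (file[26] << 16)
               | (static_cast<std::uint32_t>(file[27]) << 24);
    return true;
}
```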

C++ - Detect whether a file is PNG or JPEG

Is there any fast way to determine if some arbitrary image file is a png file or a jpeg file or none of them?
I'm pretty sure there is some way and these files probably have some sort of their own signatures and there should be some way to distinguish them.
If possible, could you also provide the names of the corresponding routines in libpng / libjpeg / boost::gil::io.
Look at the magic number at the beginning of the file. From the Wikipedia page:
JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the ASCII code for "JFIF" (4A 46 49 46) as a null-terminated string. JPEG/Exif files contain the ASCII code for "Exif" (45 78 69 66), also as a null-terminated string, followed by more metadata about the file.
PNG image files begin with an 8-byte signature which identifies the file as a PNG file and allows detection of common file transfer problems: \211 P N G \r \n \032 \n
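Checking those signatures by hand takes only a few bytes of I/O; a minimal sketch (names are mine):

```cpp
#include <cstring>
#include <fstream>
#include <string>

enum class ImageType { Png, Jpeg, Unknown };

// Inspect the file's magic number: PNG's 8-byte signature
// 89 50 4E 47 0D 0A 1A 0A, or JPEG's leading FF D8.
ImageType detectImageType(const std::string& path) {
    static const unsigned char pngSig[8] =
        {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    std::ifstream in(path, std::ios::binary);
    unsigned char header[8] = {0};
    in.read(reinterpret_cast<char*>(header), 8);
    if (in.gcount() >= 8 && std::memcmp(header, pngSig, 8) == 0)
        return ImageType::Png;
    if (in.gcount() >= 2 && header[0] == 0xFF && header[1] == 0xD8)
        return ImageType::Jpeg;
    return ImageType::Unknown;
}
```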
Apart from Tim Yates' suggestion of reading the magic number "by hand", the Boost GIL documentation says:
png_read_image throws std::ios_base::failure if the file is not a valid PNG file.
jpeg_read_image throws std::ios_base::failure if the file is not a valid JPEG file.
Similarly for other Boost GIL routines. If you only want the type, you might want to try reading only the dimensions, rather than loading the entire file.
The question is essentially answered by the replies above, but I thought I'd add the following: if you ever need to determine file types beyond just "JPEG, PNG, other", there's always libmagic. It is what powers the Unix utility file, which is pretty magical indeed, on many modern operating systems.
Image file types like PNG and JPG have well-defined file formats that include signatures identifying them. All you have to do is read enough of the file to read that signature.
The signatures you need are well documented:
http://en.wikipedia.org/wiki/Portable_Network_Graphics#File_header
http://en.wikipedia.org/wiki/JPEG#Syntax_and_structure