how to convert a file from data to ASCII format - file-conversion

I have the file /etc/mydata, which the file command reports as data:
$ file /etc/mydata
/etc/mydata: data
Is there a fast way to convert it from data format to ASCII format, so that file reports this instead?
/etc/mydata: ASCII text
Thank you!

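The strings command will extract any printable ASCII text from the file, but that is probably not what you actually need. file reports data when it cannot recognize the content, so there is no general conversion from data to ASCII.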
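To make clear what that kind of filtering does, here is a minimal C++ sketch of the idea. The function name extract_ascii_runs is hypothetical, and the minimum run length of 4 is chosen to match the usual strings default; this is an illustration, not the actual strings implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect runs of printable ASCII (0x20..0x7E, plus tab) that are at least
// min_len bytes long. Everything else is treated as a separator and dropped.
// Hypothetical helper, similar in spirit to strings(1).
std::vector<std::string> extract_ascii_runs(const std::string& bytes,
                                            std::size_t min_len = 4) {
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char c : bytes) {
        const bool printable = (c >= 0x20 && c <= 0x7E) || c == '\t';
        if (printable) {
            current.push_back(static_cast<char>(c));
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
        }
    }
    // Flush the last run if the data ends with printable bytes.
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}
```

Note that the non-printable bytes are simply thrown away, which is why the result is usually not a meaningful "conversion" of the original file.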

Related

C++: Problem of Korean alphabet encoding in text file write process with std::ofstream

I have code that saves a log as a text file.
It usually works well, but I found a case where it doesn't work:
{Id": "testman", "ip": "192.168.1.1", "target": "?뚯뒪??exe", "desc": "?덈뀞諛⑷??뚯슂"}
My code is simple logic that saves the log string to a text file.
It works well when the log is in English, but there is a problem when the log is in Korean.
After various experiments, I confirmed that Korean text is not a problem if the file is saved in UTF-8 format.
I think that if the log string includes Korean, C++ saves it in ANSI format by default.
This is my C++ code:
string logFilePath = {path};
string log = "{\"Id\": \"testman\", \"ip\": \"192.168.1.1\", \"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";
ofstream output(logFilePath, ios::app);
output << log << endl;
output.close();
Is there a way to save log files as UTF-8, or any other good way?
Please give me some advice.
You could set UTF-8 in File->Advanced Save Options.
If you do not find it, you could add Advanced Save Options in Tools->Customize->Commands->Add Command..->File.
TL;DR: write 0xEF 0xBB 0xBF (the 3-byte UTF-8 BOM) at the beginning of the file before writing out your string.
One of the hints that text viewer software uses to decide whether a file should be shown as Unicode is the byte order mark (BOM for short). It is a short series of bytes at the beginning of a stream of text that specifies the encoding and endianness of the text. For UTF-8 it is the three bytes 0xEF 0xBB 0xBF.
You can experiment with this by opening Notepad, writing a single character and saving the file in ANSI format. Then look at the size of the file in bytes. It will be 1 byte. Now open the file, save it as UTF-8 and look at the size again. It will be 4 bytes: three bytes for the BOM and one byte for the single character you put in there. You can confirm this by viewing both files in a hex editor.
That being said, you may need to write these bytes to your file before writing your string. Why UTF-8, you may ask? It depends on the encoding of the original string (your std::string log), which in this case is a string literal written in a source file whose encoding is (most likely) UTF-8. Therefore the bytes that make up the string follow this encoding and are put into your executable.
Note that std::string can contain a Unicode string; it just can't make sense of it. For example, it reports its length in bytes rather than in characters. But it can be used to carry a Unicode string around just fine.
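As a rough sketch of the BOM approach described above: the helper name append_utf8_line is hypothetical, and the code assumes the string already holds UTF-8 bytes (it does not convert anything). Because the question appends to the log with ios::app, the BOM is written only when the file is still empty.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Append `line` to the file at `path`. If the file does not exist yet or is
// empty, the 3-byte UTF-8 BOM (EF BB BF) is written first.
// Assumption: `line` is already encoded as UTF-8.
bool append_utf8_line(const std::string& path, const std::string& line) {
    bool is_empty = true;
    {
        // Open at the end to find out the current size of the file.
        std::ifstream probe(path, std::ios::binary | std::ios::ate);
        if (probe && static_cast<std::streamoff>(probe.tellg()) > 0) {
            is_empty = false;
        }
    }
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out) return false;
    if (is_empty) out << "\xEF\xBB\xBF";
    out << line << '\n';
    return static_cast<bool>(out);
}
```

Calling append_utf8_line(logFilePath, log) in place of the ofstream lines from the question would be one way to use it. Whether the source file itself is saved as UTF-8 still matters, since that decides which bytes end up in the string literal.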

Converting binary data to readable text using c++

I have a file with a .dat extension that contains some binary data. I want to convert it to a readable text format in C++, reading the data line by line.
Have you tried it this way?
Read the binary data into a buffer (see Read / Write Binary Data).
You can now iterate over the data in the buffer and cast each byte to a char.
Now you can write the chars.
Depending on your encoding, the binary data will be "some char".
Line breaks are mostly 10 (LF) or 13 (CR).
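The steps above could look roughly like this. The function names read_all_bytes and split_lines are made up for the example, and the sketch assumes the file really does contain text with LF or CRLF line endings; for arbitrary binary data the resulting "lines" will not be meaningful.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read the whole file into memory as raw bytes.
// Returns an empty buffer if the file cannot be opened.
std::vector<char> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Split a byte buffer into lines on LF (10), dropping a preceding CR (13).
std::vector<std::string> split_lines(const std::vector<char>& buf) {
    std::vector<std::string> lines;
    std::string current;
    for (char c : buf) {
        if (c == '\n') {
            if (!current.empty() && current.back() == '\r') current.pop_back();
            lines.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    // Keep a final line that has no trailing line break.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Opening the stream with std::ios::binary matters here, since text mode may translate line endings on some platforms before your code sees them.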

How to save a text file to a .mat file?

How do I save a '.txt' file as a '.mat' file, using either MATLAB or Python?
I tried using textscan() (in MATLAB), and scipy.io.savemat() (in Python). Both didn't help.
My text file has the format value1,value2,value3,value4 in each row, and has over 1000 rows.
Any help is appreciated. Thanks in advance.
You can use textscan to read the file, and save to write the variables into a .mat file:
fid = fopen('yourTextFile.txt');
% the values are comma-separated, so tell textscan about the delimiter
C = textscan(fid,'%f %f %f %f','Delimiter',',');
fclose(fid);
% maybe change the cells from `C` to a single matrix
M = cell2mat(C);
save('myMatFile.mat','M');
This works because your file seems to have a fixed format.
I was able to get it to work using csvread() as follows:
file = csvread('yourTextFile.txt');
save('myMatFile.mat','file');
If all you need is to change the file extension (this renames the file but does not convert its contents):
mv example.mat example.txt

c++ read from binary file and convert to utf-8

I would like to read data from an application/octet-stream charset=binary file with fread on Linux and convert it to UTF-8 encoding. I tried iconv, but it doesn't support a binary charset. I haven't found a solution yet. Can anyone help me with it?
Thanks.
According to the MIME type that you've given, you're reading data that's in a non-textual binary format. You cannot convert it with iconv or similar, because those tools are meant for converting text from one (textual) encoding to another. If your data is not textual, then a conversion to any character encoding is meaningless: it will just corrupt the data without making it any more readable.
The typical way to present binary data as readable text for inspection is a hex dump. There's an existing answer on implementing one in C++: https://stackoverflow.com/a/16804835/2079303
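For reference, a minimal hex dump can be sketched as follows. This is not the code from the linked answer; the function name hex_dump and the output layout (offset, hex bytes, then printable characters between bars) are choices made for this example.

```cpp
#include <cstddef>
#include <string>

// Render `bytes` as text: an 8-digit hex offset, the bytes in hex, and the
// printable ASCII characters ('.' for everything else) on each line.
std::string hex_dump(const std::string& bytes, std::size_t per_line = 16) {
    static const char digits[] = "0123456789abcdef";
    if (per_line == 0) per_line = 16;  // avoid an endless loop
    std::string out;
    for (std::size_t start = 0; start < bytes.size(); start += per_line) {
        // Offset of the first byte on this line.
        for (int shift = 28; shift >= 0; shift -= 4) {
            out.push_back(digits[(start >> shift) & 0xF]);
        }
        out += "  ";
        std::string ascii;
        for (std::size_t i = 0; i < per_line; ++i) {
            if (start + i < bytes.size()) {
                const unsigned char c =
                    static_cast<unsigned char>(bytes[start + i]);
                out.push_back(digits[c >> 4]);
                out.push_back(digits[c & 0xF]);
                out.push_back(' ');
                ascii.push_back(c >= 0x20 && c <= 0x7E ? static_cast<char>(c)
                                                       : '.');
            } else {
                out += "   ";  // pad a short final line
            }
        }
        out += " |" + ascii + "|\n";
    }
    return out;
}
```

The input would be whatever fread put into your buffer, wrapped as std::string(buffer, count) so that embedded zero bytes are preserved.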

Read Chinese Characters in Dicom Files

I have just started to get a feel for the DICOM standard. I am trying to write a small program that reads a DICOM file and dumps the information to a text file. I have a dataset that has the patient names in Chinese. How can I read and store these names?
Currently, I am reading the names as char* from the DICOM file, converting this char* to wchar_t* using code page 950 for Chinese, and writing to a text file. Instead of seeing Chinese characters I see * ? and % in my text file. What am I missing?
I am working in C++ on Windows.
If the text file contains UTF-16, have you included a BOM?
There may be multiple issues at hand.
First, do you know the character encoding of the Chinese name, e.g. Big5 or GB*? See http://en.wikipedia.org/wiki/Chinese_character_encoding
Second, do you know the encoding of your output text file? If it is ASCII, then you will never be able to view the Chinese characters. In that case, I would suggest changing it to a Unicode encoding (e.g. UTF-8).
Then, when you read the Chinese name, convert the raw bytes and write out the result. For example, if the DICOM stores it in Big5, and your text file is UTF-8, you will need a Big5->UTF-8 converter.
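The Big5 (or GB) decoding step needs a conversion table, so it is best left to the platform (the question already does this with code page 950). The second half, turning the resulting UTF-16 into UTF-8 bytes for the text file, can be sketched in portable C++. The function name utf16_to_utf8 is hypothetical, and the sketch assumes the name has already been decoded to UTF-16 and is held in a std::u16string.

```cpp
#include <cstddef>
#include <string>

// Encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (the replacement character).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The returned bytes can be written to a file opened in binary mode. On Windows, wchar_t is 16 bits wide, so a wchar_t buffer can be copied into a std::u16string unit by unit before calling this.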