NUSOAP is generating in the response some rare characters, and when the JSON is bigger it responds an error, like if code was encrypted, in the begining of response shows <SOAP-ENV:Envelope... , i encoded the results as UTF-8, and changed the codification of file to UTF-8 and nothing...
thanks to all.. yes as Alex K said.. it's a BOM character, the problem was not in my file but in an external file, a class file had the character BOM, i noticed using a tool called bomremover.exe which lists the files and batch removes the character. After that all works fine.. the tool was made by Maurice Wohlkönig
Related
I have a code for save the log as a text file.
It usually works well, but I found a case where doesn't work:
{Id": "testman", "ip": "192.168.1.1", "target": "?뚯뒪??exe", "desc": "?덈뀞諛⑷??뚯슂"}
My code is a simple logic that saves the log string as a text file.
My code was works well when log is English, but there is a problem when log is Korean language.
After checking through various experiments, it was confirmed that Korean language would not problem if the file could be saved as utf-8 format.
I think, if Korean language is included in log string, c++ is basically saved as ANSI format.
This is my c++ code:
string logfilePath = {path};
log = "{\Id\": \"testman\", \"ip\": \"192.168.1.1\", \"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";
ofstream output(logFilePath, ios::app);
output << log << endl;
output.close();
Is there a way to save log files as uft-8 or any other good way?
Please give me some advice.
You could set UTF-8 in File->Advanced Save Options.
If you do not find it, you could add Advanced Save Options in Tools->Customize->Commands->Add Command..->File.
TDLR: write 0xefbbbf (3-bytes UTF-8 BOM) in the beginning of the file before writing out your string.
One of the hints that text viewer software use to determine if the file should be shown in the Unicode format is something called the Byte Order Marker (or BOM for short). It is basically a series of bytes in the beginning of a stream of text that specifies the encoding and endianness of the text string. For UTF-8 it is these three bytes 0xEF 0xBB 0xBF.
You can experiment with this by opening notepad, writing a single character and saving file in the ANSI format. Then look at the size of file in bytes. It will be 1 byte. Now open the file and save it in UTF-8 and look at the size of file again. It will 4 bytes that is three bytes for the BOM and one byte for the single character you put in there. You can confirm this by viewing both files in some hex editor.
That being said, you may need to insert these bytes to your files before writing your string to them. So why UTF-8? you may ask, well, it depends on the encoding the original string is encoded in (your std::string log) which in this case it is an string literal written in a source file whose encoding is (most likely) UTF-8. Therefor the bytes that build up the string are made according to this encoding and are put into your executable.
note that std::string can contain Unicode string, it just can't make sense of it. For example it reports its length wrong. But it can be used to carry Unicode string around fine.
I am getting non-latin content as base64 encoded from mainframes.
I am decoding this content and inserting it into the Oracle DB which is configured for UTF-8 charset.
But all the non latin characters are getting displayed as junk.
Even Umalut charecters are getting displayed as junk.
6 months earlier, this code was working fine. Bug appeared only recently when I was testing.
What might be the cause for this error?
Were there any updates to Oracle or Unix box which might have caused this?
Thanks
You are getting content from the mainframes, so encoding of the DB is kind of irrelevant.
What you really need to do is figure out encoding of those non-latin characters in the incoming base64 encoded data and after decoding from base64 also convert from whatever charset you have to UTF-8.
It was working fine when you tested because you provided input in same format (UTF-8) from your computer, not in the format that mainframe is giving you.
I downloaded a source code .rar file from internet to my linux server. Then, I extract all source files into local directory. When I use "cat" command to see the content of each file, the wrong text encoding is shown on my terminal (There are some chinese characters in the source file).
I use
file -bi testapi.cpp
then shows:
text/plain; charset=iso-8859-1
I tried to convert that file to uft-8 encoding with following command:
iconv -f ISO88591 -t UTF8 testapi.cpp > new.cpp
But it doesn't work.
I set my .vimrc file with following two lines:
set encoding=utf-8
set fileencoding=utf-8
After this, when I vim testapi.cpp, the chinese characters will be normally displayed in the vim. But cat testapi.cpp doesn't work.
When I compile and run the program, the printf statement with chinese characters will print wrong characters like ????
What should I do to display correct chinese characters when I run the program?
TLDR Quickest Solution: Copy/Paste the Visible Text to a Brand-New, Confirmed UTF-8 File
Your file is marked as latin1, but the data is stored as utf8.
When you set set-enc=utf8 or set fileencoding=utf-8 in VIM, you're not changing the data or converting it. You're looking at the same exact data, but interpreting as if it is the utf8 charset. So, good news: Your data is good. No conversion or changing necessary.
You just need to put the same exact data into a file already marked as UTF-8 encoding. That can be done easily by simply making a brand new file in vim, using set enc=utf8, and then copy-pasting your old data into the new file. You can test this out by making a testfile with the only text "汉语" ("chinese language"), set enc, save, close, reopen, and see that the text didn't get corrupted. And you can test with file -bi $pathtofile, though that is not super reliable.
Anyway, TLDR: Make a brand new UTF-8 file, confirm that it's utf-8, make your data visible, and then copy/paste and/or transfer it to the new UTF-8 file, without doing any conversion.
Also, theoretically, I considered that iconv -f utf8 -t utf8 would work, since all I wanted to do was make utf-8-encoded data be marked as utf-8-encoded, without changing it. But this gave me an error that indicated it was still trying to do a data conversion.
I have a string I need to write to create an XML file. The string has Russian characters in it, which I can cfoutput to the page no problem, but when I write the file with cffile, those characters return with a ?. I tried changing the charset to the following with no success:
windows-1252
iso-8859-1
cp1251
cp866
I'm sure the charset is the problem here. Any suggestions?
Here is one of the strings in question: Другие
I'm running ColdFusion 10 on a Windows Server 2008 R2 System.
Untested, but I have had pageencoding problems in the past. Try <cfprocessingDirective pageencoding="utf-8"> Make sure you put it every template in the operation. Putting it in application.cfm or application.cfc may not be sufficient.
Set the charset = "utf-8" when writing to the file
Can't figure out, how to remove this � symbol from string.
String is in utf-8 format.
What to do? :(
This removes whole string:
preg_replace('/\W/','',utf8_decode(substr(utf8_encode($ad['description']),0,125)))
Thanks ;)
Update:
Using:
header('Content-Type: text/html; charset=utf-8');
After replacement using exit() right away.
U+FFFD REPLACEMENT CHARACTER is used when the character does not have a representation in the current charset encoding. Declare your encodings properly as UTF-8 and use UTF-8 strings and it will not show upon most platforms.
The problem here is that your string is not in utf-8 format. You pretend it is, and handle the data accordingly, but the string probably contains Ansi characters. You don't just need to pass the Content-Encoding = utf-8 header, but your contents needs to be converted to utf-8 before it is sent as well.
you could try utf8_decode('string'); or utf8_encode('string');
but you should really try to find the actuall problem make sure the headers are correct set, document type and that the text is encoded in the right format when saved or what not