I'm using Windows and trying to write an ACE_TString containing a sentence in multiple languages (Unicode) to a file using ACE_OS::write().
But what I get in the file is unpredictable characters (gibberish).
This is my code:
ACE_TString *str = new ACE_TString(L"مرحبا привет świecie Hello");
ACE_HANDLE hFile = ACE_OS::open(L"myfile", _O_WRONLY);
ACE_OS::write(hFile, str, 1048);
wprintf(L"%ls", str->c_str());
As you can see, I also print the string to the screen, and on screen every character except the English ones appears as "?".
Text written:
مرحبا привет świecie Hello
Result on screen:
?????? ????? ??????? Hello
What am I missing and what is wrong with my code?
ACE_TString is a typedef for ACE_CString when ACE_USES_WCHAR is not set. Try using ACE_WString if you need to force wide characters.
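There is a second problem in the question's code. ACE_OS::write(hFile, str, 1048) is given the ACE_TString* object pointer and a fixed 1048-byte count, so it writes the object's in-memory representation, not the characters. With ACE you would presumably pass str->c_str() and str->length() * sizeof(ACE_TCHAR) instead. The sketch below uses only standard C++, not the ACE API. It shows one portable way to get a readable file: encode the text as UTF-8 and write exactly the encoded bytes. The function names to_utf8 and write_utf8_file are hypothetical.

```cpp
#include <fstream>
#include <string>

// Encode Unicode code points (UTF-32) as UTF-8 bytes.
// Assumes every code point is valid (<= 0x10FFFF and not a surrogate).
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: e.g. Arabic, Cyrillic, Polish
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: supplementary planes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Write the encoded characters (not the string object) to a file,
// using the real encoded length rather than a fixed byte count.
bool write_utf8_file(const std::string& path, const std::u32string& text)
{
    const std::string bytes = to_utf8(text);
    std::ofstream file(path, std::ios::binary);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

The byte count passed to write() always comes from the encoded buffer itself. Hard-coding a size like 1048 reads past the end of the data.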
1.
How do I get Polish characters from a PDF file? Can I somehow tell PdfVariant::GetString() that it should process Polish characters?
I ask because I get \200 instead of ł, for example, and the funny thing is that this happens only when ł occurs as the first "non-base" character. If the PDF file begins with aaaałęąaaaa, ł is encoded as \200, ę as \201 and ą as \202; but if it begins with aaaaąęłaaaa, ł is encoded as \202, ę as \201 and ą as \200.
How can I get these characters correctly on any system?
2.
When I try to extract text from a PDF file, I do something like this:
string input_name = "example.pdf";
PdfMemDocument pdf(input_name.c_str());
for (int pn = 0; pn < pdf.GetPageCount(); ++pn) {
    PdfPage* page = pdf.GetPage(pn);
    PdfContentsTokenizer tok(page);
    const char* token = nullptr;
    PdfVariant var;
    EPdfContentsType type;
    while (tok.ReadNext(type, token, var)) {
        // etc.
    }
}
But I have a problem with PdfContentsTokenizer tok(page): it doesn't work reliably. For some PDF files it runs smoothly, but for others it throws an "Access violation reading location" error in the inffas32.asm file, line 669:
L_get_length_code_mmx:
pand mm4,mm0
movd eax,mm4
movq mm4,mm3
mov eax, [ebx+eax*4]//this is the error line
By the way, I noticed that not every PDF file is encoded the same way. For example, using podofobrowser I couldn't see the "Hello World!" text from the official PoDoFo helloworld example, and for other PDF files podofobrowser showed the text in different ways or didn't show it at all.
Ad 1. The link to patch files which allow extracting Polish text from a PDF using TextExtractor.
This is the most important line when it comes to extracting non-Unicode text from a PDF:
PdfString unicode = pCurFont->GetEncoding()->ConvertToUnicode( rString, pCurFont );
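Why does ConvertToUnicode need the font? In a PDF, the byte codes inside a text string are interpreted through the current font's encoding. So \200 can mean ł in one font and ą in another. The shifting codes described in the question fit a subset font whose encoding assigns codes in order of first appearance, but that is an inference, not something the thread confirms. The sketch below is a plain C++ illustration of per-font, table-driven decoding. It is not PoDoFo's implementation, and CodeMap and decode_with_font are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical per-font table mapping single-byte character codes to
// Unicode code points, similar in spirit to a font's encoding / ToUnicode map.
using CodeMap = std::map<unsigned char, char32_t>;

// Decode raw string bytes using the current font's table. Bytes with no
// entry fall back to their own value (treated as ASCII/Latin-1).
std::u32string decode_with_font(const std::string& raw, const CodeMap& fontMap)
{
    std::u32string out;
    for (char c : raw) {
        const unsigned char code = static_cast<unsigned char>(c);
        const auto it = fontMap.find(code);
        out += (it != fontMap.end()) ? it->second : static_cast<char32_t>(code);
    }
    return out;
}
```

The same input byte decodes differently depending on which table is active. That is why text extraction has to track the current font, as the TextExtractor line above does with pCurFont.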
Ad 2. The problem was the zlib library, which had been built incorrectly. I rebuilt it, rebuilt PoDoFo, and the problem is gone.
I am working on a CGI program. I receive an email address like someone@site.com and store it in a file.
But something strange happens. When I use IE, the '@' character is unchanged and appears the same in the file, but when I use Chrome, '@' changes to %40, and the only way to get the '@' back is to find %40 and replace it with '@'. Am I coding it wrong, or does Chrome have a problem?
To illustrate:
IE: someone@site.com
Chrome: someone%40site.com
And when I send the information back to the browser, %40 doesn't change back to @.
In actual fact, it's IE that's doing it wrong. You need to change each %xx code back to a character: in the case of %40 this is ASCII 0x40 == 64 == '@'. You can't rely on it being ASCII, however, as Unicode characters (such as accented letters) are encoded the same way, one %xx per byte.
Most languages like PHP and Python have helper functions to encode and decode these (PHP's are called urlencode() and urldecode()). I've not used CGI with C++ for a long while, so I'm not sure whether there's a helper readily available or whether you'll have to code your own. Either way, you should be prepared to decode URL-encoded strings, as all browsers do this for some characters if not all (e.g. %20 instead of a space is very common).
Hope this helps!
The answer above is correct. Just to make the topic richer, I think this piece of code could be the function you use:
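#include <iostream>
#include <sstream>
#include <string>

int main()
{
    int number;
    std::string dataString = "hi %40 c++ programmer %40 !";
    std::string transform;
    std::istringstream input;
    std::string::size_type location = dataString.find('%');
    // Stop if there are not two characters left after the '%'
    while (location != std::string::npos && location + 2 < dataString.size())
    {
        transform = dataString.substr(location + 1, 2); // the two hex digits, e.g. "40"
        input.clear();                                   // reset the eof flag left by the previous read
        input.str(transform);
        input >> std::hex >> number;                     // "40" -> 0x40 == '@'
        // Replace the 3-character escape "%XX" with the single decoded character
        dataString.replace(location, 3, 1, static_cast<char>(number));
        location = dataString.find('%', location + 1);
    }
    std::cout << dataString << std::endl;
}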
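For the opposite direction, for example when building a URL or query string that contains an address, you need an encoder. Here is a minimal sketch. It percent-encodes every byte outside the RFC 3986 unreserved set (letters, digits, '-', '_', '.', '~'). The name url_encode is hypothetical and not taken from any CGI library. Unlike HTML form encoding, it writes spaces as %20 rather than '+'.

```cpp
#include <string>

// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
// Multi-byte (e.g. UTF-8) characters are encoded one %XX per byte.
std::string url_encode(const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                                (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                                c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

Encoding "someone@site.com" with this function gives "someone%40site.com", the same form Chrome sends. Running that through the decoding loop above gives the original address back.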
I am trying to find a way to edit and write to a resource (.rc) file. I attempted to use the sample code listed at How to increment values in resourse file by using vbscript, but the last line in both samples (fso.OpenTextFile(rcfile, 2).Write rctext) returned the same error:
Error: Invalid procedure call or argument
Code: 800A0005
Source: Microsoft VBScript runtime error
I modified the script to write out to a .txt file and that worked fine, but I'm baffled as to what may be causing the problem writing out to a .rc file.
From the linked sample (simplified):
rctext = fso.OpenTextFile(rcfile).ReadAll
rctext = ....
fso.OpenTextFile(rcfile, 2).Write rctext
The idea is to read the whole file (since no variable holds a reference to the opened file, it is then closed), change what needs to be changed, then open the file again, this time for writing, and write the changed content to it.
And, usually, it works. But sometimes the file opened for reading is not closed fast enough to later open it for writing.
To ensure the file is closed and then can be opened for writing, change the reading code to
set f = fso.OpenTextFile(rcfile)
rctext = f.ReadAll
f.Close
As your line
fso.OpenTextFile(rcfile, 2).Write rctext
does three things (access fso, open the file, write to it), so there are several things that could go wrong. Please see this answer for ideas regarding problems with the first two actions; another answer concerns the write.
In your case, the evidence - works with a.txt, but not with b.rc - makes it highly improbable that the file's opening is to blame (so .Close won't save you). I suspect that the .rc contains Unicode (UTF-8/UTF-16) data that the textstream can't encode.
So either pass the Unicode format argument (TristateTrue) to OpenTextFile to read and write the file as UTF-16, or use an ADODB.Stream for UTF-8.
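One way to test the Unicode hypothesis before changing the script is to check whether the .rc file starts with a byte-order mark (BOM). The sketch below does this in C++ rather than VBScript. The names TextEncoding, sniff_bom and sniff_file are hypothetical. It only detects a BOM: a file without one may still be ANSI, plain ASCII, or BOM-less UTF-8.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

enum class TextEncoding { Utf8Bom, Utf16LE, Utf16BE, Unknown };

// Guess the encoding from the first bytes by looking for a BOM.
// (A UTF-32LE BOM, FF FE 00 00, would also match the UTF-16LE test; not handled here.)
TextEncoding sniff_bom(const std::string& head)
{
    auto at = [&](std::size_t i) { return static_cast<unsigned char>(head[i]); };
    if (head.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
        return TextEncoding::Utf8Bom;
    if (head.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE)
        return TextEncoding::Utf16LE;
    if (head.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF)
        return TextEncoding::Utf16BE;
    return TextEncoding::Unknown;
}

// Read up to the first three bytes of a file and classify them.
TextEncoding sniff_file(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    char buf[3] = {};
    file.read(buf, sizeof buf);
    return sniff_bom(std::string(buf, static_cast<std::size_t>(file.gcount())));
}
```

If the .rc file reports UTF-16LE, opening it with TristateTrue (FileSystemObject's Unicode mode) is the matching choice. A UTF-8 BOM points toward ADODB.Stream instead.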
It seems that the answer to my question required both of your answers (@MC ND and @Ekkehard.Horner). Once I changed the VBScript to open and write the .rc file as Unicode (though I'm not sure why that is necessary), the script ran without error.
Here is the VBScript in its final form:
Const ForReading = 1, ForWriting = 2
Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
Const DoNotCreate = false
rcFile = "C:\Path\To\RC\File.rc"
major = 4
minor = 3
maint = 2
build = 1
version = major & "," & minor & "," & maint & "," & build
Set fso = CreateObject("Scripting.FileSystemObject")
Set fileObj = fso.OpenTextFile(rcFile, ForReading, DoNotCreate, TristateTrue)
rcText = fileObj.ReadAll
fileObj.Close
Set regex = New RegExp
regex.Global = True
regex.Pattern = "(PRODUCTVERSION|FILEVERSION) \d+,\d+,\d+,\d+"
rcText = regex.Replace(rcText, "$1 " & version)
regex.Pattern = "(""(ProductVersion|FileVersion)"",) ""\d+, \d+, \d+, \d+"""
rcText = regex.Replace(rcText, "$1 """ & Replace(version, ",", ", ") & """")
Set fileObj = fso.GetFile(rcFile)
Set textStream = fileObj.OpenAsTextStream(ForWriting, TristateTrue)
textStream.Write rcText
textStream.Close
The only thing that does not seem to work is the regex for replacing the ProductVersion|FileVersion values, but hopefully I can hammer that out within a reasonable time.
I'm working on a dictionary server accessed via telnet, and I'd like it to return entries in this format:
**word** (wordType): wordDef wordDef wordDef wordDef
wordDef wordDef wordDef.
Right now I'm outputting the text using:
write( my_socket, ("%s", word.data() ), word.length() ); // Bold this
write( my_socket, ("%s", theRest.data() ), theRest.length() );
So I'd like that first line to be bold.
Edit
Sorry, I forgot to mention that this is for a command line.
Consider using something like VT100 escape sequences. Since your server is telnet-based, the user is likely to have a client that supports various terminal modes.
For instance, if you wanted to turn on bold for a VT100 terminal, you would output
ESC[1m
where "ESC" is the character value 0x1b. To switch back to normal formatting, output
ESC[0m
To use this in your application you can change the example lines from your question to the following.
std::string str = "Hello!";
write( my_socket, "\x1b[1m", 4); // Turn on bold formatting
write( my_socket, str.c_str(), str.size()); // output string
write( my_socket, "\x1b[0m", 4); // Turn all formatting off
There are other terminal modes such as VT52, VT220, etc. You might want to look into using ncurses, although it might be a bit heavy if all you need is simple bold on/off.
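To produce the exact format from the question, with only the word in bold, you could build the whole entry as one string and send it with a single write() call. This is a minimal sketch. The helper name format_entry and its parameters are hypothetical, and it assumes the client honours the VT100/ANSI codes described above.

```cpp
#include <string>

// VT100/ANSI escape sequences: ESC[1m turns bold on, ESC[0m resets all attributes.
const std::string kBoldOn = "\x1b[1m";
const std::string kReset  = "\x1b[0m";

// Build one entry as "word (wordType): definition", with only the word in bold.
// Telnet clients expect CRLF line endings, so the line ends with "\r\n".
std::string format_entry(const std::string& word,
                         const std::string& wordType,
                         const std::string& definition)
{
    return kBoldOn + word + kReset + " (" + wordType + "): " + definition + "\r\n";
}
```

Usage would look like write(my_socket, line.data(), line.size()), where line is the result of format_entry(word, wordType, theRest). Resetting with ESC[0m right after the word ensures that the bold attribute does not leak into the definition text.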
I'm trying to append text to a rich edit control by appending to the original string and resending the EM_SETTEXTEX message.
char outputText[4096] = "{\\rtf1\\ansi\\ansicpg0\\deff0{\\colortbl;\\red0\\green0\\blue0;\\red255\\green0\\blue0;\\red50\\green205\\blue50;\\red255\\green140\\blue0;}TEST";
SETTEXTEX s;
s.flags = ST_DEFAULT;
s.codepage = CP_ACP;
SendMessage(hOutputWndText,EM_SETTEXTMODE,(WPARAM)TM_RICHTEXT,NULL);
SendMessage(hOutputWndText,EM_SETTEXTEX,(WPARAM)&s,(LPARAM)outputText);
I know the string has no closing brace, but it shows what I want:
TEST
Now I append to the string and "re-set" the text inside the rich edit control. Notice that I add a closing brace just in case.
strcat_s(outputText,"NEWSTUFF}");
SendMessage(hOutputWndText,EM_SETTEXTEX,(WPARAM)&s,(LPARAM)outputText);
And the output this time:
NEWSTUFF}
What gives? I printed the variable outputText to the console, and I get the complete string:
{\rtf1\ansi\ansicpg0\deff0{\colortbl;\red0\green0\blue0;\red255\green0\blue0;\red50\green205\blue50;\red255\green140\blue0;}TESTNEWSTUFF}