UTF-8 string breaking on the way to a web service function - web-services

When I pass UTF-8 data to a web service, the data breaks: it reaches the function with weird symbols after every chunk of text, always at the same location.
The problem does not always happen.
When I add a character that changes the length of the string, the problem disappears.
The problem happens only when sending non-ASCII UTF-8 characters. When sending only English letters, it does not happen.
Example:
This is the string I send (as a regular string, not as XML):
string string_to_send="<API><METADATA id="METADATA"><SITE>sapir</SITE>
<SESSION_ID>52CA5BF6-472B</SESSION_ID><READER_ID></READER_ID>
<LANG_ID>HEB</LANG_ID> <LANG_UI_ID>HEB</LANG_UI_ID><ITEM>
<ITEM_ID>1234</ITEM_ID><update><FIELD lif ="SH2" collector_lif="0"
collector_val="0" old_val="N" new_val="N"/><FIELD lif="TA" collector_lif="0"
collector_val="0" old_val=""
new_val="טטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטט
טטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטט
טטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטטט"/>
</update></ITEM><IDEASTATE></IDEASTATE></METADATA></API>";
This is how I send the data to the function in the web service:
ideaCatApi.idea_edit_item(string_to_send);
I have a service reference to the web service.
This is the configuration in web.config:
<system.serviceModel>
  <bindings>
    <basicHttpBinding>
      <binding name="idea_api_cat" />
    </basicHttpBinding>
  </bindings>
  <client>
    <endpoint
        address="http://my_ip/ws/services/idea_api_cat_service"
        binding="basicHttpBinding" bindingConfiguration="idea_api_cat"
        contract="API_cat_idea.idea_api_cat_service"
        name="idea_api_cat_service" />
  </client>
</system.serviceModel>
I run this code on my local machine (Windows 10) against the server (Windows Server 2012 R2).
I asked whether the web service function itself breaks the text, but its developer said the string already arrives with the weird symbols.
If this were an encoding problem, I would expect the string to break either always or never, not only at certain lengths.
I want the string to reach the web service function with no weird symbols.
Am I missing headers?
I tried encoding the C# string, but it did not help.
Any other ideas about what could cause the problem?
I tried adding a Content-Type header, but it did not help. My code now looks like this:
using (new OperationContextScope(ideaCatApi.InnerChannel))
{
    HttpRequestMessageProperty requestMessage = new HttpRequestMessageProperty();
    requestMessage.Headers["Content-Type"] = "text/html;charset=utf-8";
    OperationContext.Current.OutgoingMessageProperties[HttpRequestMessageProperty.Name] = requestMessage;
    xmlString = ideaCatApi.idea_edit_item(sb.ToString());
    response = parseResult(xmlString);
}
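Corruption that always appears at the same offset and vanishes when the string length changes would be consistent with some layer between the client and the service cutting the UTF-8 byte stream into fixed-size chunks and decoding each chunk separately, so that a multi-byte character is split in half. That is only a guess; nothing in the question confirms it. The C++ sketch below illustrates the idea with a hypothetical helper, split_utf8, that backs up to a character boundary before cutting. The function names and the chunk size are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Split UTF-8 text into chunks of at most max_bytes bytes without cutting
// a multi-byte character. Assumes valid UTF-8 input and max_bytes >= 4.
std::vector<std::string> split_utf8(const std::string& text, std::size_t max_bytes) {
    std::vector<std::string> chunks;
    if (max_bytes == 0) {
        return chunks;
    }
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t end = pos + max_bytes;
        if (end >= text.size()) {
            end = text.size();
        } else {
            // Back up until 'end' points at the first byte of a character.
            while (end > pos && is_continuation(static_cast<unsigned char>(text[end]))) {
                --end;
            }
            // Malformed input could leave no boundary; force progress anyway.
            if (end == pos) {
                end = std::min(pos + max_bytes, text.size());
            }
        }
        chunks.push_back(text.substr(pos, end - pos));
        pos = end;
    }
    return chunks;
}
```

The Hebrew letter ט is two bytes in UTF-8 (0xD7 0x98), so a naive cut at an odd byte offset inside a run of them would leave one dangling byte on each side, which a decoder typically shows as a replacement symbol.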

Related

European characters switch to strange characters in response when posting to server using C++

I am struggling to get the response from the server in the correct format under Windows. I have tried two C++ libraries, Beast (based on Boost Asio) and Cpr (based on libcurl), and I get the exact same issue with both.
The strange thing is that I also tried this in C# (HttpClient) and everything works just fine. Also, in Postman and other REST tools it looks good.
When I post to the server and should get back the name René, I get Ren� instead. Other European characters such as æ, ø, å and ö give the same strange output. To me it looks like a UTF-8 / ISO-8859-1 issue, but I cannot figure it out. The server is based on Node.js, and the response is set to UTF-8. We have also tried simply echoing the request back so that it does not hit a database or anything like that. So the problem seems to be on the C++ side. Any suggestions for what I can try would be greatly appreciated.
Example code:
nlohmann::json test_json = nlohmann::json
{
    { "text", "Hi, my name is René" },
    { "language", "en" }
};
auto r = cpr::Post(cpr::Url{ "http://www.exampleserver.com" },
    cpr::Body{ test_json.dump() },
    cpr::Header{ { "content-type", "application/json; charset=utf-8" } });
std::cout << r.text << std::endl;
It looks like you've got some ISO-8859-1 content being sent through but it's labelled as UTF-8. This causes a whole rash of conversion errors which can mangle non-ASCII characters beyond recognition.
The way to fix this is to either identify the non-UTF-8 data and properly convert it, or identify the payload with the correct MIME type and encoding.
Your issue is with the encoded string. The string is most likely coming back UTF-8 encoded but you are not converting it properly.
There are various libraries that help you convert. It all depends on the version of C++ you're using. Hard to tell you what to use without more details.
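As a rough illustration of the conversion mentioned above, here is a minimal C++ sketch that needs no external library. It assumes the mislabelled bytes really are ISO-8859-1; that is one plausible case (for example a string literal in a source file saved in a legacy code page), not something the question confirms. The function names are made up for this example, and the validity check is structural only: it does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: lead bytes must be followed by the right number
// of continuation bytes. Overlong forms and surrogates are not rejected.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                       // stray continuation or invalid lead byte
        if (i + extra >= s.size() + (extra == 0 ? 1 : 0)) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte equals its Unicode
// code point, so bytes >= 0x80 become a two-byte sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

A possible use is to run is_valid_utf8 on the request body before posting and on r.text after receiving, which shows on which side the bytes stop being UTF-8. If the bytes are valid in both places, the garbling may come only from how the console displays them.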

WinSock Manual HTTP File Upload

I am playing around writing some HTTP communication in C++ using the Winsock APIs. I have no trouble performing GET requests and receiving the response, however I am having a problem when trying to perform a file upload via a POST request.
So first of all, I will share the code of my PHP file which receives the upload request (upload.php):
<?php
if (isset($_FILES['file'])) {
    $errors = array();
    $file_name = $_FILES['file']['name'];
    $file_size = $_FILES['file']['size'];
    $file_tmp = $_FILES['file']['tmp_name'];
    $file_type = $_FILES['file']['type'];
    // end() takes its argument by reference, so store the exploded parts first.
    $name_parts = explode('.', $file_name);
    $file_ext = strtolower(end($name_parts));
    if (empty($errors) == true) {
        move_uploaded_file($file_tmp, "uploads/" . $file_name);
        echo "Success";
    }
} else {
    echo "Was no file";
}
?>
<form action="" method="POST" enctype="multipart/form-data">
    <input type="file" name="file"/>
    <input type="submit"/>
</form>
Now I know that there is nothing wrong with this code, because I have successfully uploaded a file to it using the WinInet APIs (HttpSendRequest). My working WinInet code consists of the following main steps:
HttpOpenRequest(..., "POST", "uploader/upload.php", "HTTP/1.0", NULL, NULL, NULL, NULL)
HttpSendRequest with headers I set to: "Content-Type: multipart/form-data; boundary=---------FILE_BOUNDARY----------". I printed out the request body that was built by my program as you can see here, sorry I didn't print it out in a way which I could copy and paste here: http://i.imgur.com/mMmo7Xd.png
This works beautifully, and the file is uploaded properly. However, my problem arises when I try to "port" this code to use the Winsock APIs instead of WinInet. With Winsock, as you may know, I must construct the whole request (headers and body) manually. I assume this must be where my problem is, because the main body of the request itself is the same as when I am using the WinInet APIs. Here is a printout of my Winsock request that is being sent: http://i.imgur.com/TkLNGrq.png
PS: I have no idea why MessageBox put the boundary part of the header on another line; there is no "\r\n" there in my code. Could this have something to do with my issue? Here is my code for building the entire request string. Please don't give me pointers on security and buffer overruns, this is not production code:
wsprintfA(FullReqStr,
    // Headers
    "POST %s HTTP/1.0\r\n"
    "Host: %s\r\n"
    "Content-Type: multipart/form-data; boundary=---------FILE_BOUNDARY----------\r\n\r\n"
    // Body
    "-----------FILE_BOUNDARY----------\r\n"
    "Content-Disposition: form-data; name=\"file\"; filename=\"file.log\"\r\nContent-Type: application/octet-stream\r\n\r\n"
    "%s\r\n"
    "-----------FILE_BOUNDARY----------\r\n"
    "Content-Disposition: form-data; name=\"submit\"\r\n\r\n"
    "Submit\r\n"
    "-----------FILE_BOUNDARY------------\r\n",
    GatePath, Server, FileBody
);
And yes, the file bytes in the Winsock request are different; I used different file bytes for testing. That is beside the point. Basically, when I send that request and receive the response, it hits the "Was no file" else branch in the PHP file. I feel like I must be missing some required field in the headers of my Winsock request, one that is otherwise added automatically when using the WinInet APIs. I cannot think of anything else, and I'm not sure what I'm missing.
Could anybody point me to what is wrong in my Winsock code? I would greatly appreciate it. Thank you. By the way, please note that I am not looking for any security in my upload script. I'm just playing around trying to get this to work. It is not being applied to any system that needs to be secure.
Thanks.
Looks like I didn't read the RFC closely enough. http://www.ietf.org/rfc/rfc1867.txt
I needed to add the "Content-Length" field to my HTTP header. Now I construct the body of the request first so I can take the length, and then construct the entire request using that. Working fine.
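The fix described above (build the body first, then write its length into the header) might look roughly like the following C++ sketch. The function name and parameters are hypothetical, and the "submit" form field from the question is left out for brevity. Using std::string instead of a "%s" format also keeps file contents that contain zero bytes intact, since "%s" stops at the first zero byte.

```cpp
#include <string>

// Build a complete HTTP/1.0 multipart upload request. The body is assembled
// first so that its exact byte length can go into the Content-Length header.
std::string build_upload_request(const std::string& path,
                                 const std::string& host,
                                 const std::string& boundary,
                                 const std::string& file_name,
                                 const std::string& file_bytes) {
    // Each part starts with "--" + boundary; the last delimiter ends with "--".
    std::string body;
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"file\"; filename=\"" + file_name + "\"\r\n";
    body += "Content-Type: application/octet-stream\r\n\r\n";
    body += file_bytes;
    body += "\r\n--" + boundary + "--\r\n";

    std::string request;
    request += "POST " + path + " HTTP/1.0\r\n";
    request += "Host: " + host + "\r\n";
    request += "Content-Type: multipart/form-data; boundary=" + boundary + "\r\n";
    request += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    request += "\r\n";  // blank line separates headers from body
    request += body;
    return request;
}
```

When sending the result over a socket, pass request.data() and request.size() rather than relying on a terminating zero, so that binary file content is transmitted completely.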

SAXParseException while sending binary to web service using hexBinary datatype

I am trying to send TIFF images to a web service which accepts the image in the following way (this is just a fragment of a larger WSDL of course):
<complexType name="ArrayOfImage">
<sequence>
<element maxOccurs="unbounded" name="image" type="xsd:hexBinary"/>
</sequence>
</complexType>
The data is loaded in the following way, where the enclosingType is generated by JAX-WS RI (JAX-WS RI 2.1.7-b01-):
final List<byte[]> imgData = new LinkedList<byte[]>();
for (final Iterator<File> iterator = files.iterator(); iterator.hasNext(); ) {
    imgData.add(Files.toByteArray(iterator.next())); // Files class from Guava release 13
}
enclosingType.setArrayOfImage(imgData);
When the image is sent to the remote webservice, I get errors of the following style:
javax.xml.ws.soap.SOAPFaultException: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x0) was found in the element content of the document. Message being parsed: HEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXHEXH</ns4:image></ns4:arrayOfImage><ns4:otherField></ns4:otherField></ns4:enclosingType></ns5:enclosingTypes></ns5:outerEnclosingType></S:Body></S:Envelope>
I suppose it is entirely possible for a TIFF to include a NUL byte (0x0), which I assume is what is being sent based upon this answer.
As I understand it, I am using the generated API correctly as hexBinary doesn't expect Base64 encoded data. Is there something else that needs to be done to make the images send correctly?
This was a fine case of the error message throwing the developer off the scent.
There was never a problem with null bytes in the binary file causing this exception - the null bytes were in normal String-typed fields, but the Exception didn't show this for some reason.
We were able to test this by generating byte[]s in test cases, and serialising the SOAP objects to XML. No error was thrown. However, placing the null byte into Strings in 'normal' fields did cause the error.
The Strings come from a database over which I have no control - I have therefore added some String-cleaning code which removes the null byte from all Strings before they are 'set' in the SOAP object.

C++ - CGI - Audio not working properly

I have a website with an HTML5 audio element whose audio data shall be served via a cgi script.
The markup is rather simple:
<audio controls>
    <source type="audio/mpeg" src="audio.cgi?test.mp3">
    <em>Me, your browser does not support HTML5 audio</em>
</audio>
The cgi is written in C++ and is pretty simple too, I know there is need of optimizing, e.g. reading the whole file in a buffer is really bad, but that's not the point.
This basic version kinda works, meaning the audio is played, but the player does not display the full length and one can only seek through the track in parts that have already been played.
If the audio file is placed in a location accessible via the web-server everything works fine.
The difference between these two methods seems to be, that the client issues a partial-content request if the latter method is chosen and an ordinary 200 if I try to serve the audio data via the cgi at once.
I wanted to implement partial-content serving into the cgi but I failed to read out the environment variable Request-Range, which is needed to serve the requested part of data.
This leads me to my questions:
Why does the HTML5 player not display the full length of the track if I'm serving the audio data via the cgi script?
Would implementing a partial-content handling solve this issue?
If the partial-content handling is the right approach, how would I access the required environment variables in apache, since I have not found anything about them? Do I need to send a complete HTTP header indicating partial-content is coming, so the client knows he needs to send the required fields?
This is the source of the .cgi:
void serveAudio()
{
    //tried these, were not the right ones
    //getenv("HTTP_RANGE");
    //getenv("HTTP_CONTENT_RANGE");
    ifstream in(audioFile, ios::binary | ios::ate);
    size_t size = in.tellg();
    char *buffer = new char[size];
    in.seekg(0, ios::beg);
    in.read(buffer, size);
    cout<<"Content-Type: audio/mpeg\n\n";
    cout.write(buffer, size);
}
Any suggestions and helpful comments are appreciated!
Thanks in advance!
P.S.:
Forgot to mention that this behaviour applies to FF 31 and IE 11.
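For anyone experimenting with the partial-content idea, here is a hedged C++ sketch of the two pieces a CGI script would probably need: parsing a single byte range and producing the matching response headers. It rests on several assumptions. CGI normally exposes request headers as HTTP_-prefixed variables, so the range would be expected in HTTP_RANGE if the client sends one and the server passes it on. The "Status:" line is the usual CGI way to set the response code. The names ByteRange, parse_range and partial_headers are invented for this example. The code in the question also sends no Content-Length, which may be part of why the player cannot show the track length, but that is a guess.

```cpp
#include <cstddef>
#include <exception>
#include <string>

struct ByteRange {
    bool valid;
    std::size_t first;
    std::size_t last;  // inclusive
};

// Parse a single-range value such as "bytes=0-99", "bytes=500-" or
// "bytes=-200" against a resource of total_size bytes.
ByteRange parse_range(const std::string& value, std::size_t total_size) {
    const std::string prefix = "bytes=";
    ByteRange r{false, 0, 0};
    if (total_size == 0 || value.compare(0, prefix.size(), prefix) != 0) return r;
    const std::string spec = value.substr(prefix.size());
    const std::size_t dash = spec.find('-');
    // Multi-range requests ("0-9,20-29") are not handled in this sketch.
    if (dash == std::string::npos || spec.find(',') != std::string::npos) return r;
    const std::string a = spec.substr(0, dash);
    const std::string b = spec.substr(dash + 1);
    try {
        if (a.empty()) {
            if (b.empty()) return r;
            std::size_t n = std::stoull(b);  // suffix form: the last n bytes
            if (n == 0) return r;
            if (n > total_size) n = total_size;
            r.first = total_size - n;
            r.last = total_size - 1;
        } else {
            r.first = std::stoull(a);
            r.last = b.empty() ? total_size - 1 : std::stoull(b);
            if (r.last >= total_size) r.last = total_size - 1;
            if (r.first > r.last) return r;
        }
    } catch (const std::exception&) {
        return r;  // non-numeric or out-of-range numbers
    }
    r.valid = true;
    return r;
}

// CGI response headers for a 206 reply covering the given range.
std::string partial_headers(const ByteRange& r, std::size_t total_size) {
    std::string h;
    h += "Status: 206 Partial Content\r\n";
    h += "Content-Type: audio/mpeg\r\n";
    h += "Accept-Ranges: bytes\r\n";
    h += "Content-Range: bytes " + std::to_string(r.first) + "-" +
         std::to_string(r.last) + "/" + std::to_string(total_size) + "\r\n";
    h += "Content-Length: " + std::to_string(r.last - r.first + 1) + "\r\n";
    h += "\r\n";
    return h;
}
```

In serveAudio this would mean reading getenv("HTTP_RANGE"), falling back to a plain response with Content-Length and Accept-Ranges: bytes when the variable is absent or invalid, and otherwise writing partial_headers followed by only the bytes from first to last.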

What's the most efficient way to parse incomplete XML messages over a stream?

I have a TCP connection that sends me XML messages over a stream.
The first message I receive is the <?xml version="1.0" encoding="utf-8"?> message.
The second is an authentication request message, which provides a seed to use when hashing my credentials to send back to the server - <session seed="VJAWKBJXJO">.
At this point I should send a <session user="admin" password_hash="123456789"> message back to authenticate myself.
Once authenticated I will receive the desired data in the form of <Msg>data</Msg>.
If I do not authenticate in time with the server, I receive a </session> message, to indicate the session has been closed.
The problem is that I can't use a DOM parser because attempting to parse the <session> tag with no end tag always throws an error, so I'm attempting to use the Xerces-c SAX parser, to perform progressive parsing of the XML.
When I receive each message I want to ideally append it to a MemBufInputSource which contains all XML which has currently been received, then perform a parseNext on the buffer to parse the new XML that has been received, but I can't figure out how to get it working correctly.
Is there a better way around this problem? Perhaps just using a special case for the <session></session> messages?
Thanks
Have you tried using a different parser? If not, I'm using libxml2 (http://xmlsoft.org/), it's incredibly simple and it allows you to handle errors at your leisure.
You can create an xmlTextReaderPtr from a stream (your connection):
xmlTextReaderPtr reader = xmlReaderForMemory(...)
Then iterate through the nodes until you find your data:
int result;
while ((result = xmlTextReaderRead(reader)) == 1)
{
    int nodetype = xmlTextReaderNodeType(reader);
    if (nodetype == XML_READER_TYPE_ELEMENT)
    {
        const xmlChar* name = xmlTextReaderConstName(reader);
        /* now name is the name of the element, like "session" */
        /* xmlChar is unsigned char, so compare with xmlStrEqual rather than strcmp */
        if (xmlStrEqual(name, BAD_CAST "session"))
        {
            /* now look for the XML_READER_TYPE_ATTRIBUTE named "seed" and read the
             * value with xmlTextReaderConstValue to get the seed value */
        }
    }
}
They have a simple example, as well, for parsing out values:
http://xmlsoft.org/examples/reader1.c
It does have a bunch of features in there, though I can only speak for the basic reading, writing, and xinclude features.
Hope that helps!