I am reading web service output from classic ASP.
The web service output is as follows:
<boolean xmlns="http://somewebsite.com/">true</boolean>
This is the proper output, as expected.
I have written the code below to read this output in classic ASP:
Set obj1 = Server.createobject("MSXML2.ServerXMLHTTP.3.0")
URL1 = "http://webserive.asmx/method?para=2"
obj1.open "GET", URL1, False
obj1.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=utf-8"
obj1.setRequestHeader "SOAPAction", URL1
obj1.send
if obj1.responseText <> "" Then
response.write "ok." & obj1.responseText
end if
But this code prints the following output:
"ok. true"
There is a space in the output, which is not expected.
This is the problem.
Please advise.
Your output from ResponseText is, for all intents and purposes, valid markup as far as a web browser is concerned, and the browser will treat it accordingly. When you use Response.Write() to send content to a browser, it is sent "as is", so in this case the <boolean> element is treated as an (unknown) HTML tag and only the contained text, true, is displayed.
To fix this, you first need to HTML-encode the ResponseText before you send it to the browser, so the browser knows to treat what is sent as plain text. You do this by calling Server.HTMLEncode():
Response.Write "ok." & Server.HTMLEncode(obj1.ResponseText)
According to MSDN:
The HTMLEncode method applies HTML encoding to a specified string. This is useful as a quick method of encoding form data and other client request data before using it in your Web application. Encoding data converts potentially unsafe characters to their HTML-encoded equivalent.
If the string to be encoded is not Double-Byte Character Set (DBCS), HTMLEncode converts characters as follows:
The less-than character (<) is converted to &lt;.
The greater-than character (>) is converted to &gt;.
The ampersand character (&) is converted to &amp;.
The double-quote character (") is converted to &quot;.
Any ASCII code character whose code is greater-than or equal to 0x80 is converted to &#<number>, where <number> is the ASCII character value.
If the string to be encoded is DBCS, HTMLEncode converts characters as follows:
All extended characters are converted.
Any ASCII code character whose code is greater-than or equal to 0x80 is converted to &#<number>, where <number> is the ASCII character value.
Half-width Katakana characters in the Japanese code page are not converted.
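The non-DBCS rules above can be sketched in Python to see what the browser would receive. This is an illustration, not IIS's implementation: the name html_encode is hypothetical, it works on Unicode code points rather than the server's code page, and it assumes each numeric reference ends with a semicolon (&#233;), which is the standard form.

```python
def html_encode(text: str) -> str:
    """Approximate the non-DBCS Server.HTMLEncode rules (hypothetical helper)."""
    named = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])          # the four named replacements
        elif ord(ch) >= 0x80:
            out.append(f"&#{ord(ch)};")    # assumption: code point, trailing ';'
        else:
            out.append(ch)                 # plain ASCII passes through unchanged
    return "".join(out)


print(html_encode('<boolean xmlns="http://somewebsite.com/">true</boolean>'))
# &lt;boolean xmlns=&quot;http://somewebsite.com/&quot;&gt;true&lt;/boolean&gt;
```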
At the moment, when you send the ResponseText, this happens:
ResponseText
<boolean xmlns="http://somewebsite.com/">true</boolean>
Output at client
ok.true
If you use Server.HTMLEncode(), it will be:
ResponseText (HTML Encoded)
&lt;boolean xmlns=&quot;http://somewebsite.com/&quot;&gt;true&lt;/boolean&gt;
Output at client
ok.<boolean xmlns="http://somewebsite.com/">true</boolean>
So I think I may have misunderstood what you require. If you want to parse the XML so that only the content of
<boolean xmlns="http://somewebsite.com/">true</boolean>
(in this case true) is returned to the browser, then you need to parse the XML, using something like XPath, to get the underlying value.
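'After the initial Send()
Dim xml, node
If obj1.Status = 200 Then
    Set xml = obj1.ResponseXML
    Call xml.SetProperty("SelectionLanguage", "XPath")
    ' <boolean> is the document element and lives in the
    ' http://somewebsite.com/ namespace, so bind a prefix to that namespace.
    Call xml.SetProperty("SelectionNamespaces", "xmlns:ws='http://somewebsite.com/'")
    Set node = xml.SelectSingleNode("/ws:boolean")
    If Not node Is Nothing Then Response.Write node.Text
End If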
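The same extraction can be sketched in Python with xml.etree.ElementTree, used here only as a stand-in for MSXML. It highlights two details of this response: the root element is itself <boolean>, so the value is read from the root rather than from a child, and that element is in the http://somewebsite.com/ namespace. The name read_boolean is hypothetical.

```python
import xml.etree.ElementTree as ET

WS_NS = "http://somewebsite.com/"


def read_boolean(response_text: str) -> bool:
    """Return the value of a <boolean> response document (sketch)."""
    root = ET.fromstring(response_text)
    # ElementTree reports namespaced tags in {namespace}local-name form.
    if root.tag != f"{{{WS_NS}}}boolean":
        raise ValueError(f"unexpected root element: {root.tag}")
    # The root element *is* <boolean>, so its text is the value.
    return (root.text or "").strip() == "true"


doc = '<boolean xmlns="http://somewebsite.com/">true</boolean>'
print(read_boolean(doc))  # True
# Looking for a *child* named boolean finds nothing, even with the namespace bound.
print(ET.fromstring(doc).find("ws:boolean", {"ws": WS_NS}))  # None
```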
In JMeter, I need to extract the digits that come after the 36th character.
Example
Response: {"data":{"paymentId":"DOM1234567890111243"}}
I need to extract: 11243 (sometimes it will be only 1, 2, 3 or 4 digits).
The left boundary (here DOM12345678901) keeps changing too, but everything before the digits is always 36 characters long.
Any help will be highly appreciated.
Your response data seems to be JSON, therefore I wouldn't rely on this "36 characters", as its format might be different.
I would suggest extracting this paymentId value first and then apply a regular expression onto this DOMxxx bit.
Add a JSR223 PostProcessor as a child of the request that returns the above data.
Put the following code into "Script" area:
def dom = new groovy.json.JsonSlurper().parse(prev.getResponseData()).data.paymentId
log.info("DOM: " + dom)
def myValue = ((dom =~ ".{14}(\\d+)")[0][1]) as String
log.info("myValue: " + myValue)
vars.put("myValue", myValue)
That's it, you should be able to access the extracted data as ${myValue} where required.
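Outside JMeter, the same two-step idea (parse the JSON, then apply the regex to paymentId only) can be sketched in Python. The function name is hypothetical, and like the Groovy code it assumes the prefix inside paymentId is always 14 characters (DOM plus 11 digits, as in the example). Because the value is parsed first, extra whitespace or reordered keys in the JSON do not shift the offset.

```python
import json
import re


def extract_payment_suffix(response_body: str) -> str:
    """Return the digits after the first 14 characters of data.paymentId."""
    payment_id = json.loads(response_body)["data"]["paymentId"]
    match = re.match(r".{14}(\d+)", payment_id)
    if match is None:
        raise ValueError(f"unexpected paymentId: {payment_id!r}")
    return match.group(1)


print(extract_payment_suffix('{"data":{"paymentId":"DOM1234567890111243"}}'))  # 11243
```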
More information:
Groovy: Parsing and producing JSON
Groovy: Match Operator
Apache Groovy - Why and How You Should Use It
If there isn't anything else in the string you're checking, you could use something like:
.{36}(\d+)
The first group of this regex will be the number you're looking for.
Test and explanation: https://regex101.com/r/iDOO8T/2
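A quick Python check of the .{36}(\d+) pattern shows the trade-off. It works on the compact response from the question, but if the server ever added whitespace to the JSON (the spaced string below is a hypothetical variant), the fixed 36-character offset would land in the wrong place.

```python
import re

FIXED_OFFSET = re.compile(r".{36}(\d+)")

compact = '{"data":{"paymentId":"DOM1234567890111243"}}'
spaced = '{"data": {"paymentId": "DOM1234567890111243"}}'  # hypothetical variant

for body in (compact, spaced):
    match = FIXED_OFFSET.search(body)
    print(match.group(1) if match else None)
# 11243
# 0111243
```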
Suppose I want to scrape a site that contains the following HTML:
<a id="mylink" href="http://www.sainsburys.co.uk/shop/gb/groceries/chablis/chablis-premi%C3%A8r-cru-brocard-75cl">
This href is the percent encoding of the utf8 byte string representation of u'https://www.sainsburys.co.uk/shop/gb/groceries/chablis/chablis-premièr-cru-brocard-75cl'
I get the href with Scrapy like this:
u = response.xpath('//a[@id="mylink"]/@href').extract_first()
Scrapy sets the variable u as
u'http://www.sainsburys.co.uk/shop/gb/groceries/chablis/chablis-premi%C3%A8r-cru-brocard-75cl'
Notice that it has incorrectly interpreted the page's byte string (that represented a unicode string) as a unicode string itself and as such it is the wrong unicode object with different unicode chars:
In [67]: print urllib.unquote(u)
http://www.sainsburys.co.uk/shop/gb/groceries/chablis/chablis-premiÃ¨r-cru-brocard-75cl
What is actually desired is that Scrapy interprets the href as a byte string:
bs = 'http://www.sainsburys.co.uk/shop/gb/groceries/chablis/chablis-premi%C3%A8r-cru-brocard-75cl'
so that this represents the correct unicode object, i.e.
In [70]: print urllib.unquote(bs).decode('utf8')
http://www.sainsburys.co.uk/shop/gb/groceries/chablis/chablis-premièr-cru-brocard-75cl
The only way I've managed to get around this is with a small cleaning function that corrects the "mistake" as follows:
import urllib

def _deal_with_encoding(url):
    # should give no encoding errors since url is ascii
    pbs = url.encode('ascii')
    # Get a regular (not percent enc) utf8 enc byte str
    bs = urllib.unquote(pbs)
    # Finally we can decode the utf8 to get correct unicode string
    return bs.decode('utf8')
It works but doesn't seem ideal. Is this really the only way?
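The question uses Python 2. Assuming Python 3 is an option, the cleaning helper collapses into urllib.parse.unquote, whose encoding parameter defaults to 'utf-8'. The sketch below also reproduces the mis-decoding by passing encoding="latin-1" explicitly, which yields the same wrong characters described above.

```python
from urllib.parse import unquote

href = ("http://www.sainsburys.co.uk/shop/gb/groceries/chablis/"
        "chablis-premi%C3%A8r-cru-brocard-75cl")

# unquote() turns %XX escapes into bytes and decodes them as UTF-8 by default.
correct = unquote(href)
# Decoding the same bytes as Latin-1 gives the "wrong unicode object" (Ã¨ instead of è).
wrong = unquote(href, encoding="latin-1")

print(correct)
print(wrong)
```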
I'm trying to parse Cyrillic text from a site page, and I get this error if I try to print soup.text of a string that includes closing quotation marks around a word ("word"):
error 'charmap' codec can't encode character u'\xab' in position 6: character maps to <undefined>
The original page is UTF-8. The raw string from urllib2.urlopen is:
bbb = '\xab\x80\xd1\x8c\xc2\xbb'
\xab and \xbb are the quotation marks (« and »).
I tried to convert it to unicode by hand (BeautifulSoup does this too):
unicode(bbb, 'utf8', errors='ignore')
But in spite of errors='ignore', the unknown elements still exist in the result.
I get:
u'\xab\u0446\u0435\u0437\u0430\u0440\u044c\xbb'
I tried to delete all unknown elements starting with \x using a regular expression, but it doesn't work:
bbb = re.sub(r'[\x00-\x7f]', r' ', bbb)
As for "in spite of errors='ignore', the unknown elements still exist":
u'\xbb' is not an unknown element, there is no problem there. It represents the character U+00BB Right-Pointing Double Angle Quotation Mark. The Unicode string literals u'\xbb' and u'\u00bb' represent the same string.
\x has a different meaning depending on what kind of string literal it is used in. In a byte string, it introduces a hex-encoded byte from 0x00 to 0xFF. In a Unicode string, it introduces a hex-encoded character from U+0000 to U+00FF. When producing the repr() representation of a string, Python prefers to output characters in the range up to U+00FF using \x escapes rather than the arguably-clearer \u escapes, because they're shorter.
The \u and \x are merely alternative ways to refer to a character in the string literal representation; they are not literally part of the value of the string. There is no actual backslash in the value, so you can't use re to try to remove characters that might appear in the repr() form as backslash escapes.
The actual error:
error 'charmap' codec can't encode character u'\xab' in position 6: character maps to <undefined>
is just the usual PrintFails problem (see the PrintFails page on the Python wiki). Apparently your console is using an encoding that doesn't include the character U+00AB.
If you are using the Windows Command Prompt, you could try win-unicode-console as a workaround for that console's broken Unicode support.
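The points about \x, \u and repr() can be checked in Python 3 (an assumption; the question itself is Python 2, where the same literals need a u prefix). The bytes below are simply the UTF-8 encoding of the three characters «ц», chosen for illustration.

```python
raw = b"\xc2\xab\xd1\x86\xc2\xbb"   # UTF-8 bytes for the three characters «ц»
text = raw.decode("utf-8")

# \x and \u are two spellings of the same characters in a literal.
assert text == "\xab\u0446\xbb" == "\u00ab\u0446\u00bb"
assert len(text) == 3            # three characters...
assert "\\" not in text          # ...and no backslash stored in the value

print(ascii(text))   # escapes appear only in the representation
# One way to get printable output on a console that lacks these characters:
print(text.encode("ascii", errors="backslashreplace").decode("ascii"))
```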
In ColdFusion, URLDecode() decodes the URL-encoded string (URL encoding formats some characters with a percent sign and the two-character hexadecimal representation of the character).
Example: %3A is the hex equivalent of ":" (colon). When URLDecode() is applied to different strings, as below:
URLDecode("%3A%") = :% -- valid
URLDecode("%EE") = � -- this is because EE has no equivalent character.
But when I try to decode the string "%ara%", which is invalid, the result I get is "%ar%". The second occurrence of the character "a" is missing. Can anyone explain why this is happening?
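ColdFusion's URLDecode() source isn't shown here, so this does not explain its result. For comparison, here is a minimal lenient percent-decoder in Python (lenient_url_decode is a hypothetical name) that decodes % only when it is followed by two hex digits and leaves anything else as literal text; Python's urllib.parse.unquote behaves the same way on these inputs. Under this rule the first two examples give the results listed above, and "%ara%" comes back unchanged, with both a characters intact.

```python
import re
from urllib.parse import unquote

# A percent sign followed by exactly two hex digits is an escape; nothing else is.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def lenient_url_decode(s: str) -> str:
    raw = s.encode("utf-8")
    decoded = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    # Bytes that aren't valid UTF-8 (such as a lone 0xEE) become U+FFFD.
    return decoded.decode("utf-8", errors="replace")


for sample in ("%3A%", "%EE", "%ara%"):
    print(repr(lenient_url_decode(sample)), repr(unquote(sample)))
```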
I'm fairly new to Ember, but I'm on v1.12 and struggling with the following problem.
I'm making a template helper.
The helper takes the bodies of tweets and wraps HTML anchors around the hashtags and usernames.
The paradigm I'm following is:
1. Use Ember.Handlebars.Utils.escapeExpression(value) to escape the input text.
2. Do the logic.
3. Use new Ember.Handlebars.SafeString(value) to mark the result as safe.
However, step 1 seems to escape apostrophes, which means that any sentences I pass to it get escaped characters. How can I avoid this while making sure that I'm not introducing potential vulnerabilities?
Edit: Example code
export default Ember.Handlebars.makeBoundHelper(function(value){
// Make sure we're safe kids.
value = Ember.Handlebars.Utils.escapeExpression(value);
value = addUrls(value);
return new Ember.Handlebars.SafeString(value);
});
Here, addUrls is a function that uses a RegEx to find and replace hashtags or usernames. For example, if it were given #emberjs foo, it would return #emberjs foo with #emberjs wrapped in an anchor (<a>) element.
The result of the above helper function would be displayed in an Ember (HTMLBars) template.
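The escape-then-linkify pattern from the question can be sketched in Python. Assumptions: html.escape stands in for escapeExpression (it escapes & < > " ' the same way, but not backticks), add_urls and the example.com link target are hypothetical, and only hashtags are handled. One detail worth checking in any real addUrls: after escaping, an apostrophe becomes &#x27;, which contains #x27, so a naive #(\w+) pattern would link that too. The sketch therefore only matches a # at the start of the text or after whitespace.

```python
import html
import re

# Hypothetical rule: a hashtag starts at the beginning of the text or after
# whitespace, so the "#x27" inside an escaped apostrophe (&#x27;) is skipped.
_HASHTAG = re.compile(r"(?<!\S)#(\w+)")


def add_urls(escaped: str) -> str:
    """Wrap hashtags in anchors (hypothetical link target)."""
    return _HASHTAG.sub(
        lambda m: f'<a href="https://example.com/hashtag/{m.group(1)}">#{m.group(1)}</a>',
        escaped,
    )


def tweet_html(value: str) -> str:
    escaped = html.escape(value, quote=True)  # escape untrusted text first
    return add_urls(escaped)                  # then insert our own markup


print(tweet_html("It's #emberjs & <b>more</b>"))
```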
escapeExpression is designed to convert a string into the representation which, when inserted in the DOM, with escape sequences translated by the browser, will result in the original string. So
"1 < 2"
is converted into
"1 &lt; 2"
which when inserted into the DOM is displayed as
1 < 2
If "1 < 2" were inserted directly into the DOM (e.g. with innerHTML), it would cause quite a bit of trouble, because the browser would interpret < as the beginning of a tag.
So escapeExpression converts ampersands, less-than signs, greater-than signs, straight single quotes, straight double quotes, and backticks. Converting quotes is not necessary for text nodes, but could be for attribute values, since they may be enclosed in either single or double quotes while also containing such quotes.
Here's the list used:
var escape = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "`": "&#x60;"
};
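The claim that escaped quotes still display correctly can be checked with a Python version of this six-entry table. escape_expression is a hypothetical port, not Ember's code, and html.unescape stands in for the browser's entity decoding.

```python
import html
import re

ESCAPE = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
    "`": "&#x60;",
}
_UNSAFE = re.compile("[&<>\"'`]")


def escape_expression(text: str) -> str:
    """Replace each unsafe character with its entity in a single pass."""
    return _UNSAFE.sub(lambda m: ESCAPE[m.group(0)], text)


original = "Don't panic: 1 < 2 & \"quotes\" are `fine`"
escaped = escape_expression(original)
print(escaped)
# Decoding the entities (what the browser does) gives back the original text.
assert html.unescape(escaped) == original
```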
I don't understand why the escaping of the quotes should be causing you a problem. Presumably you're calling escapeExpression because you want characters such as < to be displayed properly when output into a template using normal double-stashes {{}}. Precisely the same thing applies to the quotes: they may be escaped, but when the string is displayed, it should display correctly.
Perhaps you can provide some more information about input and desired output, and how you are "printing" the strings and in what contexts you are seeing the escaped quote marks when you don't want to.