Issue with evaluating regexp for '\c+', '\i\c*' and '[\i-[:]][\c-[:]]*'

I am working on a Tcl GUI. I obtain the data tree structure for the GUI from an XML Schema, and I have to validate the entry fields against the restrictions in the XML Schema. In the XML Schema I am working with, I have the simple types NMTOKEN, Name and NCName with pattern restrictions '\c+', '\i\c*' and '[\i-[:]][\c-[:]]*' respectively.
The code I use to check is
method validatePatternValue { value } {
    set patternCheck 1
    set pattern "^($patternValue)\$"
    set patternCheck [regexp $pattern $value]
    if {$patternCheck == 0} {
        tk_messageBox -message "Only Characters within range $patternValue for $patternValueType is\
            accepted "
        return 0
    }
    return 1
}
Whenever $pattern is one of '\c+', '\i\c*' and '[\i-[:]][\c-[:]]*', my text field does not accept any input and keeps showing the error dialogue.
Just to add some more info, I came across a website with some good info regarding my question about processing combinations of '\i' and '\c'. But is there no other way apart from the one suggested in the following link: XML Schema Character Classes

The \c escape sequence does not do in Tcl regexp what it does in XML-Schema regexp.
In XML Schema
\c matches any character that may occur after the first character in
an XML name, i.e. [-._:A-Za-z0-9]
In Tcl
\cX (where X is any character) the character whose low-order 5 bits
are the same as those of X, and whose other bits are all zero
It's also clearly stated in the link you sent
Note that the \c shorthand syntax conflicts with the control character
syntax used in many other regex flavors.
You should try using [-.:\w] instead of \c.
The same is true for \i: it does not mean the same thing in Tcl as it does in XML Schema.
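The substitution idea above can be sketched in Python (not Tcl) as a plain string replacement that rewrites the XML Schema classes before the pattern reaches the regex engine. The names translate_xsd_pattern and matches_xsd_pattern are hypothetical, and the replacement classes are ASCII-only approximations: the real \i and \c classes also cover many non-ASCII name characters. It is a sketch, not a full XML Schema regex translator.

```python
import re

# ASCII-only approximations of the XML Schema name classes (an assumption:
# the real \i and \c also include many non-ASCII characters).
# The subtraction forms come first so they are replaced before bare \i and \c.
XSD_CLASSES = {
    r"[\i-[:]]": "[_A-Za-z]",        # \i without the colon
    r"[\c-[:]]": "[-._A-Za-z0-9]",   # \c without the colon
    r"\i": "[_:A-Za-z]",             # first character of an XML name
    r"\c": "[-._:A-Za-z0-9]",        # any later character of an XML name
}


def translate_xsd_pattern(xsd_pattern):
    """Rewrite the XSD-only classes into ordinary character classes."""
    translated = xsd_pattern
    for xsd_class, plain_class in XSD_CLASSES.items():
        translated = translated.replace(xsd_class, plain_class)
    return translated


def matches_xsd_pattern(xsd_pattern, value):
    """XML Schema patterns must match the whole value, hence fullmatch."""
    return re.fullmatch(translate_xsd_pattern(xsd_pattern), value) is not None


if __name__ == "__main__":
    for pattern in (r"\c+", r"\i\c*", r"[\i-[:]][\c-[:]]*"):
        print(pattern, "->", translate_xsd_pattern(pattern))
```

The same table-driven replacement could be done in Tcl with string map before calling regexp; only the replacement table matters, not the language.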

Regular expression to extract the digits that come after the 36th character in a string

In JMeter, I need to extract the digits which come after the 36th character.
Example
Response: {"data":{"paymentId":"DOM1234567890111243"}}
I need to extract: 11243 (sometimes it will be only 1, 2, 3 or 4 digits)
Left boundary: DOM12345678901 keeps changing too, but the left boundary length will always be 36 characters.
Any help will be highly appreciated.
Your response data seems to be JSON, therefore I wouldn't rely on this "36 characters" rule, as its format might be different.
I would suggest extracting the paymentId value first and then applying a regular expression to this DOMxxx bit.
Add JSR223 PostProcessor as a child of the request which returns the above data
Put the following code into "Script" area:
def dom = new groovy.json.JsonSlurper().parse(prev.getResponseData()).data.paymentId
log.info("DOM: " + dom)
def myValue = ((dom =~ ".{14}(\\d+)")[0][1]) as String
log.info("myValue: " + myValue)
vars.put("myValue", myValue)
That's it, you should be able to access the extracted data as ${myValue} where required.
More information:
Groovy: Parsing and producing JSON
Groovy: Match Operator
Apache Groovy - Why and How You Should Use It
If there isn't anything else in the string you're checking, you could use something like:
.{36}(\d+)
The first group of this regex will be the number you're looking for.
Test and explanation: https://regex101.com/r/iDOO8T/2
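For comparison, both approaches from the answers above can be sketched in Python using the sample response from the question. This only shows how the two regular expressions behave; it is not JMeter code, and the variable names are illustrative.

```python
import json
import re

response = '{"data":{"paymentId":"DOM1234567890111243"}}'

# Approach 1: parse the JSON first, then skip the 14-character prefix
# of the paymentId value and capture the digits that follow.
payment_id = json.loads(response)["data"]["paymentId"]
from_json = re.match(r".{14}(\d+)", payment_id).group(1)

# Approach 2: skip the first 36 characters of the raw response.
# This breaks as soon as the JSON layout changes (extra whitespace, new keys).
from_raw = re.match(r".{36}(\d+)", response).group(1)

print(from_json, from_raw)
```

The first approach depends only on the length of the DOM prefix, not on how the surrounding JSON happens to be serialized.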

Wrong regexp query for elasticsearch

I have some problems with the regexp query for Elasticsearch. In my index there's a text field with comma-separated numeric values (IDs), e.g.
2,140,3,2495
And I have the following query term:
"regexp" : {
"myIds" : {
"value" : "^2495,|,2495,|,2495$|^2495$",
"boost" : 1
}
}
But my result list is empty.
Let me say that I know that regexp queries are kind of slow, but the index already exists and is filled with millions of documents, so unfortunately it's not an option to restructure it. So I need a regex solution.
In Elasticsearch regex, patterns are anchored by default, and ^ and $ are treated as literal chars.
What you mean to use is "2495,.*|.*,2495,.*|.*,2495|2495" - 2495, at the start of string, ,2495, in the middle, ,2495 at the end or a whole string equal to 2495.
Or, you may use a simpler
"(.*,)?2495(,.*)?"
That means
(.*,)? - an optional text (not including line breaks) ending with ,
2495 - your value
(,.*)? - an optional text (not including line breaks) starting with ,
Here is an online demo showing how this expression works (not a proof though).
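The anchored behaviour can be approximated in Python with re.fullmatch, which also requires the pattern to cover the whole value. This is only an illustration of the pattern logic, on the assumption that the field is matched as a single term; it does not call Elasticsearch, and matches_whole_value is a hypothetical helper name.

```python
import re

pattern = r"(.*,)?2495(,.*)?"


def matches_whole_value(pattern, value):
    """Mimic an implicitly anchored regex: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for value in ("2,140,3,2495", "2495", "2495,7", "12495,7", "1,24950"):
        print(value, matches_whole_value(pattern, value))
```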
OK, I got it to work, but I have run into another problem now. I built the string as follows:
(.*,)?2495(,.*)?|(.*,)?10(,.*)?|(.*,)?898(,.*)?
It works well for a few IDs, but if I have, let's say, 50 IDs, then ES throws an exception which says that the regexp is too complex to process.
Is there a way to simplify the regexp or restructure the query itself?
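One possible simplification is to factor out the shared prefix and suffix so that only the IDs are alternated: (.*,)?(2495|10|898)(,.*)?. The sketch below builds that pattern in Python and checks it with re.fullmatch as a stand-in for the anchored matching; build_id_pattern is a hypothetical helper. Whether the shorter pattern stays under Elasticsearch's complexity limit for 50 IDs is an assumption that has not been verified here.

```python
import re


def build_id_pattern(ids):
    """Build one pattern with a single optional prefix and suffix around all IDs."""
    alternation = "|".join(re.escape(str(i)) for i in ids)
    return "(.*,)?(" + alternation + ")(,.*)?"


pattern = build_id_pattern([2495, 10, 898])

if __name__ == "__main__":
    print(pattern)
    for value in ("5,10,7", "5,100,7", "898", "2,140,3,2495"):
        print(value, re.fullmatch(pattern, value) is not None)
```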

How to stop Ember.Handlebars.Utils.escapeExpression escaping apostrophes

I'm fairly new to Ember, but I'm on v1.12 and struggling with the following problem.
I'm making a template helper.
The helper takes the bodies of tweets and adds HTML anchors around the hashtags and usernames.
The paradigm I'm following is:
1. use Ember.Handlebars.Utils.escapeExpression(value); to escape the input text
2. do logic
3. use Ember.Handlebars.SafeString(value);
However, step 1 seems to escape apostrophes, which means that any sentences I pass to it come back with escaped characters. How can I avoid this whilst making sure that I'm not introducing potential vulnerabilities?
Edit: Example code
export default Ember.Handlebars.makeBoundHelper(function(value){
// Make sure we're safe kids.
value = Ember.Handlebars.Utils.escapeExpression(value);
value = addUrls(value);
return new Ember.Handlebars.SafeString(value);
});
Where addUrls is a function that uses a RegEx to find and replace hashtags or usernames. For example, if it were given #emberjs foo, it would return #emberjs foo with #emberjs wrapped in an HTML anchor.
The result of the above helper function would be displayed in an Ember (HTMLBars) template.
escapeExpression is designed to convert a string into the representation which, when inserted in the DOM, with escape sequences translated by the browser, will result in the original string. So
"1 < 2"
is converted into
"1 &lt; 2"
which when inserted into the DOM is displayed as
1 < 2
If "1 < 2" were inserted directly into the DOM (eg with innerHTML), it would cause quite a bit of trouble, because the browser would interpret < as the beginning of a tag.
So escapeExpression converts ampersands, less-than signs, greater-than signs, straight single quotes, straight double quotes, and backticks. The conversion of quotes is not necessary for text nodes, but could be for attribute values, since they may be enclosed in either single or double quotes while also containing such quotes.
Here's the list used:
var escape = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "`": "&#x60;"
};
I don't understand why the escaping of the quotes should be causing you a problem. Presumably you're doing the escapeExpression because you want characters such as < to be displayed properly when output into a template using normal double-stashes {{}}. Precisely the same thing applies to the quotes. They may be escaped, but when the string is displayed, it should display fine.
Perhaps you can provide some more information about input and desired output, and how you are "printing" the strings and in what contexts you are seeing the escaped quote marks when you don't want to.
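The round trip described above can be illustrated with Python's standard html module, which uses a similar escape table (it does not escape backticks). This is an analogy for what the browser does with the entities, not Ember code.

```python
import html

original = "1 < 2 & it's \"fine\""

# Escape the way a template helper would before building markup.
escaped = html.escape(original, quote=True)

# A browser decodes the entities again when it renders the text;
# html.unescape stands in for that step here.
rendered = html.unescape(escaped)

print(escaped)
print(rendered)
```

The apostrophe becomes an entity in the escaped string, yet the rendered text is identical to the input. Visible entities in the page would suggest that the string is being escaped a second time somewhere.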

Parsing log files

I'm trying to write a script to simplify the process of searching through a particular application's log files for specific information. So I thought maybe there's a way to convert them into an XML tree, and I'm off to a decent start... but the problem is that the application log files are an absolute mess, if you ask me.
Some entries are simple
2014/04/09 11:27:03 INFO Some.code.function - Doing stuff
Ideally I'd like to turn the above into something like this
<Message>
<Date>2014/04/09</Date>
<Time>11:48:38</Time>
<Type>INFO</Type>
<Source>Some.code.function</Source>
<Sub>Doing stuff</Sub>
</Message>
Other entries are something like this where there's additional information and line breaks
2014/04/09 11:27:04 INFO Some.code.function - Something happens
changes:
this stuff happened
I'd like to turn this last chunk into something like the above, but add the additional info into a section
<Message>
<Date>2014/04/09</Date>
<Time>11:48:38</Time>
<Type>INFO</Type>
<Source>Some.code.function</Source>
<Sub>Doing stuff</Sub>
<details>changes:
this stuff happened</details>
</Message>
and then other messages, errors will be in the form of
2014/04/09 11:27:03 ERROR Some.code.function - Something didn't work right
Log Entry: LONGARSEDGUID
Error Code: E3145
Application: Name
Details:
message information etc etc and more line breaks, this part of the message may add up to an unknown number of lines before the next entry
This last chunk I'd like to convert as in the last two examples above, but adding XML nodes for log entry, error code, application, and again, details, like so
<Message>
<Date>2014/04/09</Date>
<Time>11:48:38</Time>
<Type>ERROR </Type>
<Source>Some.code.function</Source>
<Sub>Something didn't work right</Sub>
<Entry>LONGARSEDGUID</Entry>
<Code>E3145</Code>
<Application>Name</Application>
<details>message information etc etc and more line breaks, this part of the message may add up to an unknown number of lines before the next entry</details>
</Message>
Now I know that Select-String has a context option which would let me select a number of lines after the line I've filtered; the problem is that this isn't a constant number.
I'm thinking a regular expression would also allow me to select the paragraph chunk before the date string, but regular expressions are not a strong point of mine, and I thought there might be a better way, because the one constant is that new entries start with a date string.
The idea, though, is to either break these up into XML or tables of sorts, and from there I'm hoping it might make the task of filtering non-relevant or recurring messages a little easier.
I have a sample I just tossed on pastebin after removing/replacing a few bits of information for privacy reasons:
http://pastebin.com/raw.php?i=M9iShyT2
Sorry this is kind of late, I got tied up with work for a bit there (darn work expecting me to be productive while on their dime). I ended up with something similar to Ansgar Wiechers' solution, but formatted things into objects and collected those into an array. It doesn't manage the XML that you added later, but this gives you a nice array of objects to work with for the other records. I'll explain the main RegEx line here, and I'll comment in-line where it's practical.
'(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+?] (\w+?) {1,2}(.+?) - (.+)$' is the RegEx that detects the start of a new record. I started to explain it, but there are probably better resources for you to learn RegEx than me explaining it to you. See this RegEx101.com link for a full breakdown and examples.
$Records=@() #Create empty array that we will populate with custom objects later
$Event = $Null #make sure nothing in $Event to give script a clean start
Get-Content 'C:\temp\test1.txt' | #Load file, and start looping through it line-by-line.
?{![string]::IsNullOrEmpty($_)}|% { #Filter out blank lines, and then perform the following on each line
if ($_ -match '(^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+?] (\w+?) {1,2}(.+?) - (.+)$') { #New Record Detector line! If it finds this RegEx match, it means we're starting a new record.
if ($Event) { #If there's already a record in progress, add it to the Array
$Records+=$Event
}
$Event = New-Object PSObject -Property @{ #Create a custom PSObject object with these properties that we just got from that RegEx match
DateStamp = [datetime](get-date $Matches[1]) #We convert the date/time stamp into an actual DateTime object. That way sorting works better, and you can compare it to real dates if needed.
Type = $Matches[2]
Source = $Matches[3]
Message = $Matches[4]}
Ok, little pause for the cause here. $Matches isn't defined by me, so why am I referencing it? When PowerShell gets matches from a RegEx expression, it automagically stores the resulting matches in $Matches. So all the groups that we just matched in parentheses become $Matches[1], $Matches[2], and so on. Yes, it's indexed like an array (it is actually a hashtable), and there is a $Matches[0], but that is the entire text that was matched, not just the groups. We now return you to your regularly scheduled script...
} else { #End of the 'New Record' section. If it's not a new record it does the following
if($_ -match "^((?:[^ ^\[])(?:\w| |\.)+?):(.*)$"){
RegEx match again. It starts off by stating that this has to be the beginning of the string with the caret character (^). Then, in a non-capturing group (noted by the (?:<stuff>) format, which for my purposes just means it won't show up in $Matches), comes [^ ^\[]; that means that the next character cannot be a space or an opening bracket (escaped with a \), just to speed things up and skip those lines for this check. If you have things in brackets [] and the first character is a caret, it means 'don't match any of the characters in these brackets' (the second caret inside the brackets is just a literal caret).
I actually just changed this next part to include periods, and used \w instead of [a-zA-Z0-9] because it's essentially the same thing but shorter. \w is a "word character" in RegEx, and includes letters, numbers, and the underscore. I'm not sure why the underscore is considered part of a word, but I don't make the rules, I just play the game. I was using [a-zA-Z0-9], which matches anything between 'a' and 'z' (lowercase), anything between 'A' and 'Z' (uppercase), and anything between '0' and '9'. At the risk of including the underscore character, \w is a lot shorter and simpler.
Then the actual capturing part of this RegEx. This has 2 groups: the first is letters, numbers, underscores, spaces, and periods (escaped with a \ because '.' on its own matches any character). Then a colon. Then a second group that is everything else until the end of the line.
$Field = $Matches[1] #Everything before the colon is the name of the field
$Value = $Matches[2].trim() #everything after the colon is the data in that field
$Event | Add-Member $Field $Value #Add the Field to $Event as a NoteProperty, with a value of $Value. Those two are actually positional parameters for Add-Member, so we don't have to go and specify what kind of member, specify what the name is, and what the value is. Just Add-Member <[string]name> <value can be a string, array, yeti, whatever... it's not picky>
} #End of New Field for current record
else { #If the line is not a new field, it is more data for the current field, so don't change the field, just append the line to it.
$Event.$Field += if(![string]::isNullOrEmpty($Event.$Field)){"`r`n$_"}else{$_}}
This is a long explanation for a fairly short bit of code. Really all it does is add data to the field! It has an inverted (prefixed with !) If check to see whether the current field is Null or Empty. If the field already has data, it adds a new line and then the current line. If the field is empty, it skips the new line bit and just adds the data.
}
}
$Records+=$Event #Adds the last event to the array of records.
Sorry, I'm not very good with XML. But at least this gets you clean records.
Edit: Ok, code is notated now, hopefully everything is explained well enough. If something is still confusing perhaps I can refer you to a site that explains better than I can. I ran the above against your sample input in PasteBin.
One possible way to deal with such files is to process them line by line. Each log entry starts with a timestamp and ends when the next line starting with a timestamp appears, so you could do something like this:
Get-Content 'C:\path\to\your.log' | % {
  if ($_ -match '^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}') {
    if ($logRecord) {
      # If a current log record exists, it is complete now, so it can be added
      # to your XML or whatever, e.g.:
      $logRecord -match '^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+) ...'
      $message = $xml.CreateElement('Message')
      $date = $xml.CreateElement('Date')
      $date.InnerText = $matches[1]
      $message.AppendChild($date)
      $time = $xml.CreateElement('Time')
      $time.InnerText = $matches[2]
      $message.AppendChild($time)
      $type = $xml.CreateElement('Type')
      $type.InnerText = $matches[3]
      $message.AppendChild($type)
      ...
      $xml.SelectSingleNode('...').AppendChild($message)
    }
    $logRecord = $_ # start new record
  } else {
    $logRecord += "`r`n$_" # append to current record
  }
}
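The line-by-line idea (a record starts at a timestamp line and runs until the next one) can also be sketched end to end in Python, including the XML output the question asks for. The header layout is assumed from the short samples in the question (date, time, type, source, " - ", message); the real files on pastebin may contain extra columns, so HEADER would need adjusting. parse_log, to_xml and the Log root element are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed header layout: "2014/04/09 11:27:03 INFO Some.code.function - text"
HEADER = re.compile(
    r"^(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (\S+)\s+(\S+) - (.*)$"
)


def parse_log(lines):
    """Group lines into records; a record starts at each timestamp line."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        match = HEADER.match(line)
        if match:
            records.append({"fields": match.groups(), "details": []})
        elif records and line.strip():
            # Continuation line: belongs to the most recent record.
            records[-1]["details"].append(line)
    return records


def to_xml(records):
    """Turn the parsed records into a <Log> element with <Message> children."""
    root = ET.Element("Log")
    tags = ("Date", "Time", "Type", "Source", "Sub")
    for record in records:
        message = ET.SubElement(root, "Message")
        for tag, text in zip(tags, record["fields"]):
            ET.SubElement(message, tag).text = text
        if record["details"]:
            ET.SubElement(message, "details").text = "\n".join(record["details"])
    return root


sample = [
    "2014/04/09 11:27:03 INFO Some.code.function - Doing stuff",
    "2014/04/09 11:27:04 INFO Some.code.function - Something happens",
    "changes:",
    "this stuff happened",
]

if __name__ == "__main__":
    print(ET.tostring(to_xml(parse_log(sample)), encoding="unicode"))
```

Splitting "Key: value" continuation lines (Log Entry, Error Code, Application) into their own elements would be a further step inside the continuation branch, along the lines of the field regex in the first answer.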

Regular Expression to find string in Expect buffer

I'm trying to find a regex that works to match a string of escape characters (an Expect response, see this question) and a six digit number (with alpha-numeric first character).
Here's the whole string I need to identify:
\r\n\u001b[1;14HX76196
Ultimately I need to extract the string:
X76196
Here's what I have already:
interact {
    #...
    #...
    #this expression does not identify the screen location
    #I need to find "\r\n\u001b[1;14H" AND "([a-zA-Z0-9]{1})[0-9]{5}$"
    #This regex was what I was using before.
    -nobuffer -re {^([a-zA-Z0-9]{1})?[0-9]{5}$} {
        set number $interact_out(0,string)
    }
}
I need to identify the escape characters to verify that it is a field in that screen region. So I need a regex that includes that first portion, but the backslashes are confusing me...
Also, once I have the full string in the $number variable, how do I isolate just the number in another variable in Tcl?
If you just want the number at the end, then this should be enough...
[0-9]{6}
Update with new information
Assuming \n is a newline character, rather than a literal \ followed by a literal n, you can do this...
\r\n\u001B\[1;14H(X[0-9]{5})
I found out a few things with some more digging. First of all, I wasn't looking at the output of the program but at the input of the user. I needed to add the "-o" flag to look at the program output. I also shortened the regex to just the necessary part.
The regex example from @rikh led me to look at why both his regex and my own were failing, and that was because I wasn't looking at the output but at the input. So the original regex that I tried wasn't at fault; the data being looked at was (missing the "-o" flag).
Here's the complete answer to my problem.
interact {
    #...
    -o -nobuffer -re {(\[1;14H[a-zA-Z0-9]{1})[0-9]{5}} {
        #get number in place
        set numraw $interact_out(0,string)
        #get just number out
        set num [string range $numraw 6 11]
        #switch to lowercase
        set num [string tolower $num]
        send_user " stored number: $num"
    }
}
I'm a noob with Expect and Tcl so if any of this doesn't make sense or if you have any more insights into the interact flags, please set me straight.
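As an alternative to slicing by position, a capture group can pull out the six-character field directly. The sketch below shows the idea in Python (not Tcl or Expect) against the sample string from the question; only the opening bracket needs a backslash, because [ otherwise starts a character class.

```python
import re

# The sample buffer from the question: CR, LF, ESC, then "[1;14H" and the field.
buffer = "\r\n\x1b[1;14HX76196"

# \x1b is the escape character (\u001b); \[ is a literal bracket.
match = re.search(r"\r\n\x1b\[1;14H([A-Za-z0-9][0-9]{5})", buffer)

number = match.group(1).lower() if match else None
print(number)
```

In Tcl, the equivalent would be to capture the same group with regexp and a match variable instead of calling string range on fixed indices; that variant is not shown in the thread above.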