Regex that stops on concatenated strings? - regex

So, I have a for each that goes through all of our pages' Regex.Matches for manipulation... The problem is that I can't seem to correctly create a regex for certain strings. The goal is essentially to grab:
[ControlName].Text = "Static ""Text"" Here!"
As shown below, I'm able to do so. However, I'm running into an issue with concatenated strings: I'd like to stop matching after the final closing quotation mark before the concatenation (see Goal below).
Current Regex: \w+\.Text = (".*?"+?.*"?)
Currently Hits:
lblError.Text = "Error on: ""Navigation Admin"" page."
lblError.Text = "Error on:" & "Navigation Admin"
lblError.Text = "Error on:" & ""Navigation Admin"" page."
Goal:
lblError.Text = "Error on: ""Navigation Admin"" page."
lblError.Text = "Error on:" & "Navigation Admin"
lblError.Text = "Error on:" & ""Navigation Admin"" page."
I'm quite possibly underthinking this, overthinking this and/or horrible at regexes in general. Any advice/tips would be appreciated!

Here is a way to match a quoted string, including escaped quotes (two consecutive quotes):
\w+\.Text = ("[^"]*(?:""[^"]*)*")
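To see how that pattern behaves on the lines from the question, here is a minimal sketch in Python. The helper name first_literal is made up for illustration, and the original code was using .NET's Regex.Matches, so treat this only as a check of the pattern itself.

```python
import re

# Pattern from the answer above: a control name, ".Text = ", then one
# double-quoted literal in which "" stands for an escaped quote.
TEXT_ASSIGNMENT = re.compile(r'\w+\.Text = ("[^"]*(?:""[^"]*)*")')

def first_literal(line):
    """Return the first string literal assigned to .Text, or None (hypothetical helper)."""
    match = TEXT_ASSIGNMENT.search(line)
    return match.group(1) if match else None

samples = [
    'lblError.Text = "Error on: ""Navigation Admin"" page."',
    'lblError.Text = "Error on:" & "Navigation Admin"',
    'lblError.Text = "Error on:" & ""Navigation Admin"" page."',
]
for line in samples:
    print(first_literal(line))
```

The first sample yields the whole literal with its doubled quotes, while the two concatenated samples stop at the quote that closes "Error on:".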

Minify JSON with regex

Problem Description
I want to minify a JSON. Meaning:
Desired Result
Before
{
"keyOne": "First Value",
"keyTwo": "Second Value"
}
After
{"keyOne": "First Value", "keyTwo": "Second Value"}
I want to achieve this using RegEx.
What I tried was replacing \s with an empty string. But this leads to the unwanted result that whitespace also gets removed from the values:
Result of Solution attempt
Before
{
"keyOne": "First Value",
"keyTwo": "Second Value"
}
After
{"keyOne": "FirstValue", "keyTwo": "SecondValue"}
Research done / Solution attempts
Searching Google and Stack Overflow, without success since all found questions target other use cases
Honestly just fooling around with basic RegEx knowledge
To clarify the question: I do not want to do this in JavaScript. I know I can go to the console and run something like copy(JSON.stringify(<the-json>)).
I want to quickly do this in an editor, in this case Webstorm using the Replace Tool – without installing any plugins or switching tools.
Final solution
Two steps are needed:
Replace \n with an empty string. This removes line breaks.
Replace \s+" with " to remove whitespace.
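The two replacements above can be tried outside the editor with a short Python sketch. The function name minify_with_two_replacements and the two-space indentation of the sample are assumptions for illustration; WebStorm's Replace tool is not involved here.

```python
import json
import re

def minify_with_two_replacements(text):
    # Step 1: drop line breaks.
    text = re.sub(r'\n', '', text)
    # Step 2: drop whitespace that directly precedes a double quote.
    return re.sub(r'\s+"', '"', text)

# Sample input, assumed to be indented with two spaces per level.
pretty = '{\n  "keyOne": "First Value",\n  "keyTwo": "Second Value"\n}'
compact = minify_with_two_replacements(pretty)
print(compact)

# The result still parses to the same data as the input.
print(json.loads(compact) == json.loads(pretty))
```

Note that the second replacement also removes the blank after each colon, so the output is slightly tighter than the "After" shown in the question. It would also strip whitespace at the very end of a string value, since that whitespace sits directly before the closing quote.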
You need two steps to achieve that in WebStorm:
first, replace \n with nothing to remove the line breaks;
then replace \s{2}" with " to remove the two whitespace characters before each key.
The way an object is displayed in JS isn't related to the way you can handle it:
{\n
"keyOne": "First Value",\n
"keyTwo": "Second Value"\n
}
The \n characters here are shown only to make it more human-readable; they don't actually exist in the object itself:
const data = {
"keyOne": "First Value",
"keyTwo": "Second Value"
};
console.log(JSON.stringify(data))
Thus you can't apply a regex to a regular JavaScript object.
However, if you have a string representation of an object (unlikely, but possible),
you can apply a regex to get the result you want, as shown in the snippet below:
const strObj = `{
"keyOne": "First Value",
"keyTwo": "Second Value"
}`;
// since it is a string, we can't access it like a normal JS object (prints undefined)
console.log(strObj["keyOne"]);
console.log(typeof strObj, strObj);
// replace the newlines with nothing to put everything on one line
let result = strObj.replace(/\n/g,"")
console.log(typeof result, result);
// parse the (valid JSON) string into an actual JS object
let castedResult = JSON.parse(result);
// it is displayed in human-readable form since it's a normal object now
console.log(typeof castedResult,castedResult);
// access one of its properties, since it's a normal object now
console.log(castedResult.keyOne);

Why does Selenium insert a backslash before the hyphen character?

The following Code fails in IE and Firefox. Never had a problem with Chrome.
foundElement = driver.FindElement(By.Id("btn-GDR"));
It says couldn't find the element #btn\-GDR
Why is Selenium inserting a \ before the -?
Firefox 65.0.2 Version
IE 11.0.9600.19301
EDIT: More info: I've tried using
"btn\x2dGDR", where \x2d is the "-" character (ASCII, in hex), but it does not solve the problem. It always inserts a "\" before it.
As Selenium converts the different locator strategies into their effective CSS selectors as per the switch cases, the values of class name, id, name, tag name, etc. are converted through:
cssEscape(value);
The cssEscape(value) is defined as:
private String cssEscape(String using) {
using = using.replaceAll("([\\s'\"\\\\#.:;,!?+<>=~*^$|%&#`{}\\-\\/\\[\\]\\(\\)])", "\\\\$1");
if (using.length() > 0 && Character.isDigit(using.charAt(0))) {
using = "\\" + Integer.toString(30 + Integer.parseInt(using.substring(0,1))) + " " + using.substring(1);
}
return using;
}
Hence you see the - character being escaped by the \ character.
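To make the effect of that replacement concrete, here is a rough Python transcription of the Java method above. It is only a sketch for experimenting with inputs, not Selenium's own code; the name css_escape is made up, and the character class is copied from the snippet as quoted.

```python
import re

# Same character class as in the Java snippet above, written for Python's re.
_SPECIAL = re.compile(r"""([\s'"\\#.:;,!?+<>=~*^$|%&#`{}\-/\[\]()])""")

def css_escape(value):
    # Prefix every special character with a backslash.
    escaped = _SPECIAL.sub(r"\\\1", value)
    # A leading ASCII digit is rewritten as a hex escape, as in the Java code.
    if escaped and escaped[0] in "0123456789":
        escaped = "\\" + str(30 + int(escaped[0])) + " " + escaped[1:]
    return escaped

print(css_escape("btn-GDR"))  # the hyphen comes back with a backslash in front
print("#" + css_escape("btn-GDR"))
```

The second line printed has the same shape as the selector in the error message, #btn\-GDR, which suggests the backslash is just part of how the locator is reported rather than the cause of the failure.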
I will answer my own question since I've found the solution.
I added a wait before finding the element.
WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
wait.Until(SeleniumExtras.WaitHelpers.ExpectedConditions.
PresenceOfAllElementsLocatedBy(By.Id("btn-GDR")));
It turns out that sometimes the element is not present, for some strange reason. I can see it on screen, but it takes 2-3 seconds before Selenium is able to interact with it properly. Yes, the element is always visible and enabled, and it does exist. Also, when reporting the locator, Selenium adds the backslash before the hyphen in the output message.
FYI I've found the same case here. It was unanswered.
Similar Problem

Parse Or Regex Query Gives Geoquery Error

I am using a Parse OR query to find user matches based on regex and emails within an array. The regex can be quite long as I am generating it from names in the user's address book. My code is below:
PFQuery *emailQuery = [WPUser query];
[emailQuery whereKey:@"email" containedIn:emails];
PFQuery *nameQuery = [WPUser query];
[nameQuery whereKey:@"name" matchesRegex:regex modifiers:@"i"];
PFQuery *query = [PFQuery orQueryWithSubqueries:@[emailQuery, nameQuery]];
[query whereKey:@"objectId" notEqualTo:[WPUser currentUser].objectId];
[query whereKeyExists:@"signedUp"];
[query findObjectsInBackgroundWithBlock:^(NSArray * _Nullable objects, NSError * _Nullable error) {
self.registeredContacts = objects;
BLOCK_ON_MAINTHREAD()
}];
The regex format is (^First.*Last$)|(^First.*Last$) etc. for each name in the address book. I use the i modifier to make it case insensitive.
However, I get a weird error with this query, and it seems to have begun only recently: [Error]: geo query within or is not supported (Code: 102, Version: 1.14.2). I am not adding any geo constraints to this query, as you can see. Is my regex somehow causing Parse to add a geoquery? If I comment out the matchesRegex:modifiers: line, the query returns as normal. However, I then obviously lose the functionality I need.
I do not have symbols or anything as I am also validating names with an NSCharacterSet.
NSMutableCharacterSet *validCharacters = [NSMutableCharacterSet letterCharacterSet];
[validCharacters formUnionWithCharacterSet:[NSCharacterSet whitespaceCharacterSet]];
Why is Parse giving me an error that has no relation to my actual query? If it is related to my regex, any ideas on avoiding it?
I have some Chinese names in my address book, and for some reason these names were causing the error. I added a check when building my regex to skip these names, and the error is gone.
For reference I added a BOOL check:
BOOL isLatin = [regexName canBeConvertedToEncoding:NSISOLatin1StringEncoding];
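The same idea can be sketched in Python, assuming the names arrive as (first, last) pairs. The helper names is_latin1 and build_name_regex are made up for illustration, and encoding to "latin-1" is used here as a stand-in for canBeConvertedToEncoding:NSISOLatin1StringEncoding. This only mirrors the workaround; it does not explain why Parse reported a geo query error.

```python
import re

def is_latin1(text):
    """True if every character fits in ISO Latin-1 (hypothetical helper)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

def build_name_regex(names):
    """Build (^First.*Last$)|(^First.*Last$)... and skip non-Latin-1 names."""
    parts = []
    for first, last in names:
        if not (is_latin1(first) and is_latin1(last)):
            continue  # skipped, as in the BOOL check above
        parts.append("(^" + re.escape(first) + ".*" + re.escape(last) + "$)")
    return "|".join(parts)

pattern = build_name_regex([("Ann", "Lee"), ("王", "伟"), ("José", "Díaz")])
print(pattern)
# The "i" modifier from the question corresponds to re.IGNORECASE here.
print(bool(re.search(pattern, "ann marie lee", re.IGNORECASE)))
```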

awk/regex: parsing error logs not always returned error description

I recently asked for help to parse out Java error stacks from a group of log files and got a very nice solution at the link below (using awk).
Pull out Java error stacks from log files
I marked the question answered and after some debugging and studying I found a few potential issues and since they are unrelated to my initial question but rather due to my limited understanding of awk and regular expressions, I thought it might be better to ask a new question.
Here is the solution:
BEGIN{ OFS="," }
/[[:space:]]+*<Error / {
split("",n2v)
while ( match($0,/[^[:space:]]+="[^"]+/) ) {
name = value = substr($0,RSTART,RLENGTH)
sub(/=.*/,"",name)
sub(/^[^=]+="/,"",value)
$0 = substr($0,RSTART+RLENGTH)
n2v[name] = value
print name value
}
code = n2v["ErrorCode"]
desc[code] = n2v["ErrorDescription"]
count[code]++
if (!seen[code,FILENAME]++) {
fnames[code] = (code in fnames ? fnames[code] ", " : "") FILENAME
}
}
END {
print "Count", "ErrorCode", "ErrorDescription", "Files"
for (code in desc) {
print count[code], code, desc[code], fnames[code]
}
}
One issue I am having with it is that not all ErrorDescriptions are being captured. For example, this error description appears in the output of this script:
ErrorDescription="Database Error."
But this error description does not appear in the results (description copied from actual log file):
ErrorDescription="Operation not allowed for reason code "7" on table "SCHEMA.TABLE".. SQLCODE=-668, SQLSTATE=57016, DRIVER=4.13.127"
Nor does this one:
ErrorDescription="Cannot Find Person For Given Order."
It seems that most error descriptions are not being returned by this script but do exist in the log file. I don't see why some error descriptions would appear and some not. Does anyone have any ideas?
EDIT 1:
Here is a sample of the XML I am parsing:
<Errors>
<Error ErrorCode="ERR_0139"
ErrorDescription="Cannot Find Person For Given Order." ErrorMoreInfo="">
...
...
</Error>
</Errors>
The pattern in the script will not match your data:
/[[:space:]]+*<Error / {
Details:
The "+" tells it to match at least one space.
The space after "Error" tells it to match another space - but your data has no space before the "=".
The "<" is unnecessary (but not part of the problem).
This would be a better pattern:
/^[[:space:]]*ErrorDescription[[:space:]]*=[[:space:]]*".*"/
This regex would only match the error description.
ErrorDescription="(.+?)"
It uses a capturing group to remember your error description.
Demo here. (Tested against a combination of your edit and your previous question error log.)
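As a quick check of the capturing-group pattern, here is a small Python sketch run against the XML sample from the question (with the elided "..." lines left out). It is an illustration of the regex only, not a replacement for the awk script.

```python
import re

# Pattern from the answer above: lazily capture everything up to the next quote.
ERROR_DESCRIPTION = re.compile(r'ErrorDescription="(.+?)"')

sample = '''<Errors>
<Error ErrorCode="ERR_0139"
ErrorDescription="Cannot Find Person For Given Order." ErrorMoreInfo="">
</Error>
</Errors>'''

descriptions = ERROR_DESCRIPTION.findall(sample)
print(descriptions)

# A description with embedded quotes, shortened from the log line in the question.
tricky = 'ErrorDescription="Operation not allowed for reason code "7" on table'
print(ERROR_DESCRIPTION.findall(tricky))
```

One caveat: because the group is lazy, it stops at the first inner quote, so a description with embedded quotes like the SQLCODE example comes back truncated.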

How can I make a regex to find instances of the word Project not in square brackets?

For example:
$lang['Select Project'] = 'Select Project OK';
$lang['Project'] = 'Project';
I want to find only the instances of the word 'Project' not contained within the square brackets.
I'm using ColdFusion studio's extended replace utility to do a global replace.
Any suggestions?
Code Sample Follows:
<?php
$lang['Project Message Board'] = 'Project Message Board';
$lang['Project'] = 'Project';
$lang['Post Message'] = 'Post Message';
$lang['To'] = 'To';
$lang['Everyone'] = 'Everyone';
$lang['From'] = 'From';
$lang['Private Messsage'] = 'Private Messsage';
$lang['Note: Only private message to programmer'] = '[ Note: Please enter programmers id for private message with comma separate operator ]';
$lang['Select Project'] = 'Select Project';
$lang['message_validation'] = 'Message';
$lang['You must be logged in as a programmer to post messages on the Project Message Board'] = 'You must be logged in as a programmer to post messages on the Project Message Board';
$lang['Your Message Has Been Posted Successfully'] = 'Your message has been posted successfully';
$lang['You must be logged to post messages on the Project Message Board'] = 'You must be logged to post messages on the Project Message Board';
$lang['You must be post project to invite programmers'] = 'You must be post project to invite programmers';
$lang['You must be logged to invite programmers'] = 'You must be logged to invite programmers';
$lang['There is no open project to Post Mail'] = 'There is no open project to Post Mail';
$lang['You are currently logged in as']='You are currently logged in as';
$lang['Tip']='Tip: You can post programming code by placing it within [code] and [/code] tags.';
$lang['Submit']='Submit';
$lang['Preview']='Preview';
$lang['Hide']='Hide';
$lang['Show']='Show';
$lang['You are currently logged in as']='You are currently logged in as';
A regexp for 'Project' to the right of an equals sign would be:
/=.*Project/
a regexp that also does what you ask for, 'Project' that has no equals sign to its right would be:
/Project[^=]*$/
or a match of your example lines comes to:
/^\$lang\['[^']+'\]\s*=\s*'Project';$/
By placing 'Project' in parentheses () you can use that match in a replacement; adding the /g flag finds all occurrences in the line.
Edit: Below didn't work because look-behind assertions have to be fixed-length. I am guessing that you want to do this because you want to do a global replace of "Project" with something else. In that case, borrowing rsp's idea of matching a 'Project' that is not followed by an equals sign, this should work:
/Project(?![^=]*\=)/
Here is some example code:
<?php
$str1 = "\$lang['Select Project'] = 'Select Project OK';";
$str2 = "\$lang['Project'] = 'Project';";
$str3 = "\$lang['No Project'] = 'Not Found';";
$str4 = "\$lang['Many Project'] = 'Select Project owner or Project name';";
$regex = '/Project(?![^=]*\=)/';
echo "<pre>\n";
//prints: $lang['Select Project'] = 'Select Assignment OK';
echo preg_replace($regex, 'Assignment', $str1) . "\n";
//prints: $lang['Project'] = 'Assignment';
echo preg_replace($regex, 'Assignment', $str2) . "\n";
//prints: $lang['No Project'] = 'Not Found';
echo preg_replace($regex, 'Assignment', $str3) . "\n";
//prints: $lang['Many Project'] = 'Select Assignment owner or Assignment name';
echo preg_replace($regex, 'Assignment', $str4) . "\n";
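For comparison, the same negative lookahead works unchanged in Python's re module. This is a minimal sketch using two of the test strings from the PHP example; it assumes, as that example does, that the value on the right-hand side contains no equals sign of its own.

```python
import re

# "Project" that is not followed, later on the same line, by an equals sign.
PROJECT_ON_RHS = re.compile(r'Project(?![^=]*=)')

lines = [
    "$lang['Select Project'] = 'Select Project OK';",
    "$lang['Many Project'] = 'Select Project owner or Project name';",
]
replaced = [PROJECT_ON_RHS.sub('Assignment', line) for line in lines]
for line in replaced:
    print(line)
```

The keys inside the square brackets are left alone because an equals sign still follows them on the line.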
This should work:
/(?<=\=.*)Project/
That will match only the word "Project" if it appears after an equals sign. This means you could use it in a substitution too, if you want to replace "Project" on the right-hand-side with something else.
Thx for help. Not sure what is unclear? I just want to find all instances of the word 'Project' but only instances to the right of the equals sign (i.e. not included in square brackets). Hope that helps.
This actually looks like a tricky problem. Consider
[blah blah [yakkity] Project blah] Project [blah blah] [ Project
This is a parsing problem, and I don't know of any way to do it with one regex (but would be glad to learn one!). I'd probably do it procedurally, eliminating the pairs of brackets that did not contain other pairs until there were none left, then matching "Project".
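The procedural approach described above could look roughly like this in Python. The function names are made up for illustration, and how to treat an unmatched opening bracket is an assumption: here it is simply left in place, so a "Project" after it still counts.

```python
import re

# A bracket pair that contains no other brackets, i.e. an innermost pair.
INNERMOST_PAIR = re.compile(r'\[[^\[\]]*\]')

def strip_bracket_pairs(text):
    """Repeatedly delete innermost [...] pairs until none are left."""
    while True:
        text, removed = INNERMOST_PAIR.subn('', text)
        if removed == 0:
            return text

def count_outside_brackets(text, word="Project"):
    """Count occurrences of word that survive the bracket stripping."""
    return len(re.findall(re.escape(word), strip_bracket_pairs(text)))

tricky = "[blah blah [yakkity] Project blah] Project [blah blah] [ Project"
print(strip_bracket_pairs(tricky))
print(count_outside_brackets(tricky))
```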
While it's not clear what instances you want to find exactly, this will do:
^.+? = (.+?);
But you might consider using simple string manipulation of your language of choice.
edit
^.+?=.+?(Project).+?;$
will only match lines that have string Project after the equal sign.
[^\[]'[^'\[\]]+'[^\]] seems to accomplish what you want!
This one: [^\[]'[^'\[\]]*Project[^'\[\]]*' will find all quoted strings in the file that are not inside square brackets and that contain the word Project.
Another edit: [^\[]'(?<ProjectString>[^'\[\]]*Project[^'\[\]]*)'[^\]]
This one matches the string, and returns it as the group "ProjectString". Any regex library should be able to pull that out sufficiently.