Regex to uppercase text not surrounded by single quotes

hello 'this' is my'str'ing
If I have a string like this, I'd like to make it all uppercase except for the parts surrounded by single quotes:
hello 'this' is my'str'ing=>HELLO 'this' IS MY'str'ING
Is there an easy way to achieve this in Node, perhaps using a regex?

You can use the following regular expression:
'[^']+'|(\w)
Here is a live example:
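var subject = "hello 'this' is my'str'ing";
// A quoted section is matched whole (group 1 stays undefined); any other word character is captured in group 1
var regex = /'[^']+'|(\w)/g;
var replaced = subject.replace(regex, function(m, group1) {
  if (!group1) {
    // Quoted section: return it unchanged
    return m;
  }
  else {
    // Single word character outside quotes: uppercase it
    return m.toUpperCase();
  }
});
console.log(replaced);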
Credit for this answer goes to zx81. For more information, see zx81's original answer.
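If you'd rather avoid a regex, the same result can be sketched by splitting on the quote character. Assuming every quote has a matching partner, the even-indexed pieces lie outside quotes and the odd-indexed pieces lie inside. The helper name upperOutsideQuotes is only illustrative.

```javascript
// Hypothetical helper: uppercase everything outside single-quoted sections.
// Assumes quotes are balanced; after an unmatched quote, the rest of the
// string is treated as "inside" and left unchanged.
function upperOutsideQuotes(str) {
  return str
    .split("'")
    // Even indexes lie outside quotes, odd indexes lie between a pair of quotes
    .map(function (part, i) {
      return i % 2 === 0 ? part.toUpperCase() : part;
    })
    .join("'");
}

console.log(upperOutsideQuotes("hello 'this' is my'str'ing"));
// HELLO 'this' IS MY'str'ING
```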

Since JavaScript didn't support lookbehinds when this was written (they were added in ES2018), we have to use \B, which matches at every position where a word boundary \b does not.
In this case, \B' makes sure that the ' is not preceded by a \w character ([a-zA-Z0-9_]). Likewise, '\B makes sure that the ' is not followed by one.
(?:(.*?)(?=\B'.*?'\B)(?:(\B'.*?'\B))|(.*?)$) (regex demo)
Use a callback function that checks whether capture 1 or capture 3 has a length > 0 and, if so, returns it uppercased.
Note: the sample uses \U and \L only to uppercase and lowercase the related matches. Your callback never needs to affect $2's case, so "Adam" can stay "Adam", etc.
Unrelated, but a note for anyone trying to do this in reverse: it's much easier to do the REVERSE of this:
(\B'.+?'\B) regex demo
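A minimal JavaScript sketch of the callback described above, assuming the three capture groups of that regex: groups 1 and 3 (text outside the quotes) are uppercased and group 2 (the quoted part) is returned unchanged. Because \B' only accepts a quote that does not touch a word character, the apostrophe-like quotes inside my'str'ing are not treated as a quoted section. The function name is hypothetical.

```javascript
// Regex from the answer above; the g flag makes replace() visit every match.
var quoted = /(?:(.*?)(?=\B'.*?'\B)(?:(\B'.*?'\B))|(.*?)$)/g;

function upperOutsideStandaloneQuotes(str) {
  return str.replace(quoted, function (m, before, inQuotes, rest) {
    // Groups that did not take part in this match are undefined
    return (before || "").toUpperCase() + (inQuotes || "") + (rest || "").toUpperCase();
  });
}

console.log(upperOutsideStandaloneQuotes("hello 'Adam' is here"));
// HELLO 'Adam' IS HERE
console.log(upperOutsideStandaloneQuotes("hello 'this' is my'str'ing"));
// HELLO 'this' IS MY'STR'ING
```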

Related

Regex remove previous character if followed by a character

I'm working on a regex. I have two words, 1234a and 1234. If ‘a’ is present, I want it to return just 123. If ‘a’ is not present, I want it to return 1234.
Since the regex engine works from left to right, I can't backtrack to remove the 4 if ‘a’ is present. Is it possible to do this with a regex? Any help or suggestions are appreciated.
UPDATE: Toto's answer works well. But as an extension of the above problem, if the word is test1234asample I need it to return test123 if 'a' is present, or test1234 if 'a' is not present. I tried to modify Toto's regex, but it highlights everything.
If your tool supports lookahead, use:
\b([^\Wa]+(?=[^\Wa]a.*)|\w+$)
Demo & explanation
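For example, in JavaScript (which supports lookahead), the regex above can be applied with match(). The function name trimBeforeA is only illustrative, and it returns null when the input has no match at all.

```javascript
// Toto's pattern: either a run of word characters (other than "a") that is
// followed by one more such character and then "a" (so that character is dropped),
// or the whole word when no such "a" exists.
var pattern = /\b([^\Wa]+(?=[^\Wa]a.*)|\w+$)/;

function trimBeforeA(word) {
  var m = word.match(pattern);
  return m ? m[1] : null;
}

console.log(trimBeforeA("1234a"));           // 123
console.log(trimBeforeA("1234"));            // 1234
console.log(trimBeforeA("test1234asample")); // test123
console.log(trimBeforeA("test1234"));        // test1234
```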
This is a non-regex approach. For a regex, see Toto's comment/answer.
function getString(str) {
return str.includes('a') ? str.substring(0, str.indexOf('a') - 1) : str;
}
let str1 = '1234a';
let str2 = '1234';
let str3 = '1234a12312';
console.log(getString(str1));
console.log(getString(str2));
console.log(getString(str3));
One way to do the exact replacement asked for in the question is a simple s/.a// substitution, which replaces the first two-character substring ending in 'a' with the empty string. No lookahead or backtracking is required.
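A quick JavaScript check of the s/.a// idea, using String.prototype.replace with the same pattern (the helper name is illustrative):

```javascript
// Remove the first character that is immediately followed by "a", together with the "a".
// Without the g flag, only the first such pair is removed.
function dropCharBeforeA(str) {
  return str.replace(/.a/, "");
}

console.log(dropCharBeforeA("1234a"));           // 123
console.log(dropCharBeforeA("1234"));            // 1234
console.log(dropCharBeforeA("test1234asample")); // test123sample
```

Note that for the UPDATE example this keeps the text after the "a", so it covers the original question rather than the extended one.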

Regex: require more than 3 English letters in a text

I want to validate text that must contain more than 3 [aA-zZ] characters, which need not be contiguous.
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("aaa123") => return true;
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("a1b2c3") => return false;
Can anybody help me?
How about replacing and counting?
var hasFourPlusChars = function(str) {
return str.replace(/[^a-zA-Z]+/g, '').length > 3;
};
console.log(hasFourPlusChars('testing1234'));
console.log(hasFourPlusChars('a1b2c3d4e5'));
You need to group .* and [a-zA-Z] in order to allow optional arbitrary characters between English letters:
^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$
(The change is the (?: ... ) group around .*[a-zA-Z], so that {3,} repeats the whole group.)
Demo:
var re = /^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/;
console.log(re.test("aaa123"));
console.log(re.test("a1b2c3"));
By the way, [aA-zZ] is not a correct range definition: A-z also covers the characters between Z and a ([, \, ], ^, _ and `). Use [a-zA-Z] instead.
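To see the difference concretely: inside a character class, A-z is a range over character codes, and Z (0x5A) and a (0x61) have six punctuation characters between them. A quick JavaScript check:

```javascript
var sloppy = /^[aA-zZ]$/;   // 'a', the whole range A..z, and 'Z'
var correct = /^[a-zA-Z]$/; // ASCII letters only

["b", "_", "[", "^"].forEach(function (ch) {
  // Prints the character, then whether each class accepts it
  console.log(JSON.stringify(ch), sloppy.test(ch), correct.test(ch));
});
```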
Correction of the regex
Your repeat condition should include the ".*". I did not check if your regex is correct for what you want to achieve, but this correction works for the following strings:
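<?php
$testStrings = ["aaa123", "a1b2c3", "a1b23d"];
foreach ($testStrings as $s) {
    // {3,} must quantify a group that contains .*; quantifying the zero-width lookahead itself only repeats the same check
    var_dump(preg_match('/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/', $s));
}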
Other implementations
As the language seems to be JavaScript, here is an optimised implementation for what you want to achieve:
("a24be4Z".match(/[a-zA-Z]/g) || []).length >= 3
We get the list of all matches and check if there are at least 3.
That is not the fastest way, since the array of matches has to be built.
/(?:.*?[a-zA-Z]){3}/.test("a24be4Z")
is faster. The lazy .*? keeps the test method from first consuming every character up to the end of the string and then backtracking through the other combinations.
As expected, the first suggestion (counting the number of matches) is the slowest.
Check https://jsperf.com/check-if-there-are-3-ascii-characters .
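Putting both suggestions together as a null-safe JavaScript sketch (the function names are illustrative). match() returns null when there is no letter at all, so the counting version needs a fallback, while the lazy-group version only asks whether three letters exist.

```javascript
// Counting approach: builds an array of all letters, so guard against null.
function countLetters(str) {
  return (str.match(/[a-zA-Z]/g) || []).length;
}

// Lazy-group approach: succeeds as soon as three letters have been found.
// Note: "." does not cross line breaks, so the three letters must be on one line.
function hasThreeLetters(str) {
  return /(?:.*?[a-zA-Z]){3}/.test(str);
}

console.log(countLetters("a24be4Z") >= 3); // true
console.log(hasThreeLetters("a24be4Z"));   // true
console.log(countLetters("123") >= 3);     // false
console.log(hasThreeLetters("a1b2"));      // false
```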

Parsing of a string with the length specified within the string

Example data:
029Extract this specific string. Do not capture anything else.
In the example above, I would like to capture the n characters immediately after the 3-digit entry that defines the value of n, i.e. the 29 characters "Extract this specific string."
I can do this within a loop, but it is slow. I would like (if it is possible) to achieve this with a single regex statement instead, using some kind of backreference. Something like:
(\d{3})(.{\1})
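With Perl, you can do:
use feature 'say';
my $str = '029Extract this specific string. Do not capture anything else.';
# /e evaluates the replacement as code: keep the first $1 characters after the 3-digit prefix
$str =~ s/^(\d{3})(.*)$/substr($2,0,$1)/e;
say $str;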
output:
Extract this specific string.
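You cannot do it with a single regex, but you can use the position where the regex stopped matching together with substring extraction. For example, in JavaScript you can do something like this: http://jsfiddle.net/75Tm5/
var input = "blahblah 011I want this, and 029Extract this specific string. Do not capture anything else.";
var regex = /(\d{3})/g;
var matches;
while ((matches = regex.exec(input)) !== null) {
  // The three digits give the length of the text that follows them
  var length = parseInt(matches[1], 10);
  console.log(input.slice(regex.lastIndex, regex.lastIndex + length));
  // Skip the extracted text so digits inside it are not read as a new length prefix
  regex.lastIndex += length;
}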
This returns both strings:
I want this
Extract this specific string.
Depending on what you actually need, you can modify the regex to match only numbers at the beginning of a line, only the first match, etc.
Are you sure you need a regex?
From https://stackoverflow.com/tags/regex/info:
Fools Rush in Where Angels Fear to Tread
The tremendous power and expressivity of modern regular expressions
can seduce the gullible — or the foolhardy — into trying to use
regular expressions on every string-related task they come across.
This is a bad idea in general, ...
Here's a Python three-liner:
foo = "029Extract this specific string. Do not capture anything else."
substr_len = int(foo[:3])
print(foo[3:substr_len+3])
And here's a PHP three-liner:
$foo = "029Extract this specific string. Do not capture anything else.";
$substr_len = (int) substr($foo,0,3);
echo substr($foo, 3, $substr_len);
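The same no-regex idea in JavaScript, as a sketch that assumes the prefix is always exactly three digits (the function name is illustrative). slice() takes an end index, so the end is 3 + n:

```javascript
// Read a 3-digit length prefix, then return exactly that many characters after it.
function readLengthPrefixed(str) {
  var n = parseInt(str.slice(0, 3), 10);
  if (Number.isNaN(n)) {
    throw new Error("expected a 3-digit length prefix");
  }
  return str.slice(3, 3 + n);
}

console.log(readLengthPrefixed("029Extract this specific string. Do not capture anything else."));
// Extract this specific string.
```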

Capture multiple texts.

I have a problem with Regular Expressions.
Consider we have a string
S= "[sometext1],[sometext],[sometext]....,[sometext]"
The number of "sometexts" is unknown; it's user input and can vary from one to, for example, 1000.
[sometext] is some sequence of characters, none of which is ",", so we can say [^,].
I want to capture the texts with a regular expression and then iterate through them in a loop.
QRegExp p("???");
p.exactMatch(S);
for (int i = 1; i <= p.captureCount(); i++)
{
    SomeFunction(p.cap(i));
}
For example, if the number of sometexts is 3, we can use something like this:
([^,]*),([^,]*),([^,]*).
So I don't know what to write instead of "???" for an arbitrary n.
I'm using Qt 4.7, and I didn't find how to do this on the class reference page.
I know we can do it with loops and no regexps, or by generating the regex itself in a loop, but these solutions don't suit me because the actual problem is a bit more complex than this.
A possible regular expression to match what you want is:
([^,]+?)(,|$)
This will match strings that end with a comma "," or the end of the line. I was not sure whether the last element would have a comma or not.
An example using this regex in C#:
// Requires: using System.Text.RegularExpressions;
String textFromFile = "[sometext1],[sometext2],[sometext3],[sometext4]";
foreach (Match match in Regex.Matches(textFromFile, "([^,]+?)(,|$)"))
{
    String placeHolder = match.Groups[1].Value;
    System.Console.WriteLine(placeHolder);
}
This code prints the following to screen:
[sometext1]
[sometext2]
[sometext3]
[sometext4]
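The same pattern in JavaScript, for comparison, using matchAll (ES2020, Node 12+), which requires the g flag and yields one match object per item. For input that is really just comma-separated with no empty items, str.split(",") gives the same list without a regex. The function name is illustrative.

```javascript
function splitItems(text) {
  var items = [];
  // Each match is one run of non-comma characters followed by a comma or the end
  for (var m of text.matchAll(/([^,]+?)(,|$)/g)) {
    items.push(m[1]);
  }
  return items;
}

console.log(splitItems("[sometext1],[sometext2],[sometext3],[sometext4]"));
```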
Using a QRegExp example I found online, here is an attempt at a solution closer to what you are looking for:
(example I found was at: http://doc.qt.nokia.com/qq/qq01-seriously-weird-qregexp.html)
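QRegExp rx("([^,]+)(,|$)");
// QRegExp has no per-quantifier non-greedy syntax such as +?; setMinimal(true) would make every
// quantifier non-greedy, but [^,]+ cannot run past a comma anyway, so it is not needed here.
int pos = 0;
while ((pos = rx.indexIn(text, pos)) != -1)
{
    someFunction(rx.cap(1));
    pos += rx.matchedLength(); // advance past this match, otherwise the loop never ends
}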
I hope this helps.
We can do that: use a non-capturing group to take in the comma, and then ask for many of those blocks:
Try:
QRegExp p("([^,]*)(?:,([^,]*))*[.]");
Non-capturing is explained in the docs: http://doc.qt.nokia.com/latest/qregexp.html
Note that I also bracketed the . since it has meaning in RegExp and you seemed to want it to be a literal period.
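One caveat worth checking before relying on this: in most engines other than .NET, a capture group inside a repeated group keeps only the text from its last repetition. A quick check in JavaScript shows this; I assume QRegExp behaves the same way, but that is not verified here.

```javascript
var re = /([^,]*)(?:,([^,]*))*[.]/;
var m = "a,b,c.".match(re);

// Only two capture groups exist, however many items the input has
console.log(m.length - 1); // 2
console.log(m[1]);         // a  (first item)
console.log(m[2]);         // c  (only the last repetition is kept; "b" is lost)
```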
I only know of .NET letting you get a variable number of captures from a single expression. Example: (capture.*me)+
It creates a capture collection that can be iterated over. Even then, it only simulates what every other regex engine provides.
Most engines provide incremental matching inside a loop until no matches are left. The global flag tells the engine to keep matching from where the last successful match left off.
Example (in Perl):
while ( $string =~ /([^,]+)/g ) { print $1,"\n" }
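The JavaScript counterpart of that Perl loop, as a sketch: with the g flag, each call to exec() starts at re.lastIndex, which the engine sets to the end of the previous successful match, and exec() returns null (resetting lastIndex to 0) when nothing is left. The function name is illustrative.

```javascript
function captureAll(str) {
  var re = /([^,]+)/g;
  var found = [];
  var m;
  // exec() resumes from re.lastIndex, i.e. where the previous match ended
  while ((m = re.exec(str)) !== null) {
    found.push(m[1]);
  }
  return found;
}

console.log(captureAll("[sometext1],[sometext2],[sometext3]"));
```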

What is wrong with the regular expression below? (C# 3.0)

Consider the following:
Case 1: [Success]
Input : X(P)~AK,X(MV)~AK
Replace with: AP
Output: X(P)~AP,X(MV)~AP
Case 2: [Failure]
Input: X(P)~$B,X(MV)~$B
Replace with: C$
Expected output: X(P)~C$,X(MV)~C$
Actual Output: X(P)~C$B,X(MV)~C$B
I am using the below REGEXP
@"~(\w*[A-Z$%])"
This works fine for case 1 but fails for the second.
Need help
I am using C#3.0
Thanks
It's unclear what exactly your matching requirements are, but changing the regex to @"~(\w*[A-Z$%]+)" should do the trick. (For the examples given, plain @"~([A-Z$%]+)" should work too.)
It looks like you want something like this:
public static String replaceWith(String input, String repl) {
return Regex.Replace(
input,
@"(?<=~)[A-Z$%]+",
repl
);
}
The (?<=…) is what is called a lookbehind. It's used to assert that to the left there's a tilde, but that tilde is not part of the match.
Now we can test it as follows (as seen on ideone.com):
Console.WriteLine(replaceWith(
"X(P)~AK,X(MV)~AK", "AP"
));
// X(P)~AP,X(MV)~AP
Console.WriteLine(replaceWith(
"X(P)~$B,X(MV)~$B", "C$"
));
// X(P)~C$,X(MV)~C$
Console.WriteLine(replaceWith(
"X(P)~THIS,X(MV)~THAT", "$$$$"
));
// X(P)~$$,X(MV)~$$
Note the last example: $ is a special symbol in substitutions and can have special meanings. $$ actually gets you one dollar sign.
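For readers on JavaScript, the same lookbehind replacement can be sketched as follows; it needs an engine with lookbehind support (ES2018, Node 10+). Passing a function instead of a string as the replacement sidesteps the $ issue, because a function's return value is inserted literally. replaceAfterTilde is a hypothetical name.

```javascript
function replaceAfterTilde(input, repl) {
  // (?<=~) asserts a tilde on the left without including it in the match;
  // the function replacement inserts repl literally, so "$" needs no escaping.
  return input.replace(/(?<=~)[A-Z$%]+/g, function () {
    return repl;
  });
}

console.log(replaceAfterTilde("X(P)~AK,X(MV)~AK", "AP"));     // X(P)~AP,X(MV)~AP
console.log(replaceAfterTilde("X(P)~$B,X(MV)~$B", "C$"));     // X(P)~C$,X(MV)~C$
console.log(replaceAfterTilde("X(P)~THIS,X(MV)~THAT", "$$")); // X(P)~$$,X(MV)~$$
```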
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
Your expression (being greedy) matches a '~' followed by zero or more word characters and then exactly one character from [A-Z$%], and replaces that with your substitution.
In the first case you have ~AK, so \w*[A-Z$%] matches 'AK': \w* -> A, and [A-Z$%] -> K.
In the second case you have ~$B, so \w*[A-Z$%] matches only '$': \w* -> nothing, and [A-Z$%] -> $.
I think the important thing is that \w* is optional (zero or more), but [A-Z$%] is mandatory. This is why the second case gives '$', not '$B', as the matched part.
Since I don't know what you're trying to achieve I cannot tell you how to fix your expression.
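The diagnosis above can be checked by looking at what the original pattern actually captures. This is a JavaScript sketch; as far as I can tell, the .NET engine treats this pattern the same way for these ASCII inputs.

```javascript
var original = /~(\w*[A-Z$%])/;

// "~AK": \w* first takes "AK", then backtracks to "A" so that [A-Z$%] can take "K"
console.log("X(P)~AK,X(MV)~AK".match(original)[1]); // AK

// "~$B": "$" is not a word character, so \w* matches nothing,
// [A-Z$%] consumes "$", and the "B" is left outside the match
console.log("X(P)~$B,X(MV)~$B".match(original)[1]); // $
```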