Get second part of a string using RegEx - regex

I have a string like "first#second", and I wonder how to get the "second" part, without the "#" symbol, as the result of the regex itself rather than as a capture group in brackets.
Update: I forgot to mention one more special character at the end of the string; the real string is "first#second*".

Simple regex:
/#(.*)$/
If you really don't want it to be a match capture, and you know there's a # in the string but none in the part you want, you can do
/[^#]*$/
and the whole regex is what you want.

If you must use regex, and you insist on not using capturing groups, you can use lookbehind in flavors that support them like this:
(?<=#).*
Or you can match anything but #, up to the end of the string, with something like this:
[^#]*$
The capturing group option, of course, is:
#(.*)
 \__/
   1
This matches the # too, but group 1 captures the part that you want.
Lastly, a non-regex alternative may look something like this:
secondPart = wholeString.substring( wholeString.indexOf("#") + 1 )
There may be issues with some of these solutions if # can also appear (perhaps escaped) anywhere else in the string.
References
regular-expressions.info
Lookarounds, Brackets for Capturing, Anchors
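
The options above can be compared side by side in a short Java sketch. It assumes the updated input "first#second*" from the question, so the character classes also exclude the trailing *. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SecondPart {
    // Lookbehind: the match starts right after '#', so no capture group is needed.
    // The class [^#*] also stops at the trailing '*' from the updated question.
    static String viaLookbehind(String s) {
        Matcher m = Pattern.compile("(?<=#)[^#*]+").matcher(s);
        return m.find() ? m.group() : null;
    }

    // Capturing group: the whole match includes '#', group 1 holds the wanted part.
    static String viaGroup(String s) {
        Matcher m = Pattern.compile("#([^*]*)").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    // Non-regex alternative: cut between the first '#' and the next '*', if any.
    static String viaIndexOf(String s) {
        int hash = s.indexOf('#');
        int star = s.indexOf('*', hash + 1);
        return s.substring(hash + 1, star < 0 ? s.length() : star);
    }

    public static void main(String[] args) {
        String input = "first#second*";
        System.out.println(viaLookbehind(input));
        System.out.println(viaGroup(input));
        System.out.println(viaIndexOf(input));
    }
}
```

All three return the same text for this input. They can diverge if # or * appears elsewhere in the string, as noted above.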

/[a-z]+#([a-z]+)/

You can use lookaround to exclude parts of an expression.
http://www.regular-expressions.info/lookaround.html

If you're using Java, you can consider the Pattern and Matcher classes. Pattern gives you a compiled, optimized version of the regular expression; Matcher exposes the details of each match.
Both Matcher.find and String.split give the same result here; the first can be comparatively faster because the pattern is compiled once.
For example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "first#second#third";
String re = "#";
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(s); // the input string must be passed to matcher()
int ms = 0; // start of the current segment
int me = 0; // end of the current segment
while (m.find()) {
    System.out.println("start " + m.start() + " end " + m.end() + " group " + m.group());
    me = m.start();
    System.out.println(s.substring(ms, me));
    ms = m.end();
}
System.out.println(s.substring(ms)); // the last segment, after the final '#'
In other languages you can also consider using back-references and groups, if you find any repetitions.

Related

Surrounding one group with special characters in using substitute in vim

Given string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change here is the element ns( ... ) that surrounds the string available in the "" after the inputId
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when deployed, it produces an error:
E488: Trailing characters
A simpler version of that regex works, the syntax:
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
would correctly insert ns( after finding inputId = and create the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, ex. "select_something") and return it as:
"select_something")).
You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+ and you'll also need to match underscores too. I would recommend \w\+. Unless more than [a-zA-Z_] might be in the string, in which case I would do .\{-}.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the start match atom: (that is, \zs)
:%s/inputId = \zs\".\{-}"/ns(&)/cgI
You can use a negated character class "[^"]*" to match a quoted string:
%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g
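
To check the negated-character-class pattern outside vim, the same substitution can be sketched in Java. This is only a test harness for the pattern, not a vim feature. Java uses unescaped ( ) for groups and $1, $2 in the replacement, and the class and method names are hypothetical.

```java
final class WrapInNs {
    // Group 1: "inputId = " with flexible whitespace.
    // Group 2: the quoted string, matched as a quote, any non-quote chars, a quote.
    static String wrap(String line) {
        return line.replaceAll("(inputId\\s*=\\s*)(\"[^\"]*\")", "$1ns($2)");
    }

    public static void main(String[] args) {
        System.out.println(wrap("some_function(inputId = \"select_something\"),"));
    }
}
```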

How to create "blocks" with Regex

For a project of mine, I want to create 'blocks' with Regex.
\xyz\yzx //wrong format
x\12 //wrong format
12\x //wrong format
\x12\x13\x14\x00\xff\xff //correct format
When using Regex101 to test my regular expressions, I came to this result:
([\\x(0-9A-Fa-f)])/gm
This leads to an incorrect output, because
12\x
Still gets detected as a correct string, though the order is wrong, it needs to be in the order specified below, and in no other order.
backslash x 0-9A-Fa-f 0-9A-Fa-f
Can anyone explain how that works and why it works in that way? Thanks in advance!
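Your pattern is a single character class: everything between [ and ] is a set, so it matches any one character that is a backslash, x, a parenthesis, or a hex digit, and order is never checked. To match the \, followed by x, followed by 2 hex chars, anywhere in the string, you need to use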
\\x[0-9A-Fa-f]{2}
See the regex demo
To force it match all non-overlapping occurrences, use the specific modifiers (like /g in JavaScript/Perl) or specific functions in your programming language (Regex.Matches in .NET, or preg_match_all in PHP, etc.).
The ^(?:\\x[0-9A-Fa-f]{2})+$ regex validates a whole string that consists of the patterns like above. It happens due to the ^ (start of string) and $ (end of string) anchors. Note the (?:...)+ is a non-capturing group that can repeat in the string 1 or more times (due to + quantifier).
Some Java demo:
String s = "\\x12\\x13\\x14\\x00\\xff\\xff";
// Extract valid blocks
Pattern pattern = Pattern.compile("\\\\x[0-9A-Fa-f]{2}");
Matcher matcher = pattern.matcher(s);
List<String> res = new ArrayList<>();
while (matcher.find()){
res.add(matcher.group(0));
}
System.out.println(res); // => [\x12, \x13, \x14, \x00, \xff, \xff]
// Check if a string consists of valid "blocks" only
boolean isValid = s.matches("(?i)(?:\\\\x[a-f0-9]{2})+");
System.out.println(isValid); // => true
Note that we may shorten [a-fA-F] to [a-f] if we add the case-insensitive modifier (?i) to the start of the pattern, or just use \p{XDigit}, which matches any hexadecimal digit in a Java regex.
The String#matches method always anchors the regex, so we do not need the leading ^ and trailing $ anchors when using the pattern with it.

Parse string using regex

I need to come up with a regular expression to parse my input string. My input string is of the format:
[alphanumeric].[alpha][numeric].[alpha][alpha][alpha].[julian date: yyyyddd]
eg:
A.A2.ABC.2014071
3.M1.MMB.2014071
I need to substring it from the 3rd position and was wondering what would be the easiest way to do it.
Desired result:
A2.ABC.2014071
M1.MMB.2014071
(?i) makes the match case-insensitive.
(?i)^[a-z\d]\.[a-z]\d\.[a-z]{3}\.\d{7}$
Here a-z means any letter from a to z, and \d means any digit from 0 to 9.
Now, if you want to remove the first section before the dot, then use this regex and replace the match with $1 (or possibly \1, depending on the language):
(?i)^[a-z\d]\.([a-z]\d\.[a-z]{3}\.\d{7})$
Another option is to replace a match of the pattern below with the empty string:
(?i)^[a-z\d]\.
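
As a rough Java sketch of both options: the first method validates the whole format and returns group 1, the second only strips the leading field. The class and method names are illustrative, and returning null for input that does not match is an assumption made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StripFirstField {
    // Full-format pattern from above; group 1 is everything after the first field.
    private static final Pattern FULL =
        Pattern.compile("(?i)^[a-z\\d]\\.([a-z]\\d\\.[a-z]{3}\\.\\d{7})$");

    // Validates the whole string and returns group 1, or null on a format mismatch.
    static String validateAndStrip(String input) {
        Matcher m = FULL.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Removes only the leading "<alphanumeric>." prefix, without validating the rest.
    static String stripPrefix(String input) {
        return input.replaceFirst("(?i)^[a-z\\d]\\.", "");
    }

    public static void main(String[] args) {
        System.out.println(validateAndStrip("A.A2.ABC.2014071"));
        System.out.println(stripPrefix("3.M1.MMB.2014071"));
    }
}
```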
If the input string is just the long form, then you want everything except the first two characters. You could arrange to substitute them with nothing:
s/^..//
Or you could arrange to capture everything except the first two characters:
/^..(.*)/
If the expression is part of a larger string, then the breakdown of the alphanumeric components becomes more important.
The details vary depending on the language that is hosting the regex. The notations written above could be Perl or PCRE (Perl Compatible Regular Expressions). Many other languages would accept these regexes too, but other languages would require tweaks.
Use this regex (the dots are escaped so that they match a literal period rather than any character):
\w\.[A-Z]\d\.[A-Z]{3}\.\d{7}
Use the above regex like this:
String[] in = {
"A.A2.ABC.2014071", "3.M1.MMB.2014071"
};
Pattern p = Pattern.compile("\\w\\.[A-Z]\\d\\.[A-Z]{3}\\.\\d{7}");
for (String s: in ) {
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println("Result: " + m.group().substring(2));
}
}
Live demo: http://ideone.com/tns9iY

Capture multiple texts.

I have a problem with Regular Expressions.
Consider we have a string
S= "[sometext1],[sometext],[sometext]....,[sometext]"
The number of "sometext" items is unknown; it is user input and can vary from one to, for example, 1000.
[sometext] is some sequence of characters, none of which is ",", so we can describe each character as [^,].
I want to capture the texts with a regular expression and then iterate through them in a loop.
QRegExp p=new QRegExp("???");
p.exactMatch(S);
for(int i=1;i<=p.captureCount;i++)
{
SomeFunction(p.cap(i));
}
For example,if the number of sometexts is 3,we can use something like this:
([^,]*),([^,]*),([^,]*).
So, I don't know what to write instead of "???" for an arbitrary n.
I'm using Qt 4.7, and I didn't find how to do this on the class reference page.
I know we can do it with loops without regexps, or generate the regex itself in a loop, but these solutions don't suit me because the actual problem is a bit more complex than this.
A possible regular expression to match what you want is:
([^,]+?)(,|$)
This will match a string that ends with a comma "," or at the end of the line. I was not sure whether the last element would be followed by a comma or not.
An example using this regex in C#:
String textFromFile = "[sometext1],[sometext2],[sometext3],[sometext4]";
foreach (Match match in Regex.Matches(textFromFile, "([^,]+?)(,|$)"))
{
String placeHolder = match.Groups[1].Value;
System.Console.WriteLine(placeHolder);
}
This code prints the following to screen:
[sometext1]
[sometext2]
[sometext3]
[sometext4]
Using a QRegExp example I found online, here is an attempt at a solution closer to what you are looking for
(the example I found was at: http://doc.qt.nokia.com/qq/qq01-seriously-weird-qregexp.html):
QRegExp rx("([^,]+)(,|$)");
rx.setMinimal(true); // QRegExp has no per-quantifier +? notation; this makes every quantifier non-greedy
int pos = 0;
while ((pos = rx.indexIn(text, pos)) != -1)
{
    someFunction(rx.cap(1));
    pos += rx.matchedLength(); // advance past this match, otherwise the loop never ends
}
I hope this helps.
You can do that: use a non-capturing group to absorb the comma, and then ask for many repetitions of that block.
Try:
QRegExp p=new QRegExp("([^,]*)(?:,([^,]*))*[.]");
Non-capturing groups are explained in the docs: http://doc.qt.nokia.com/latest/qregexp.html
Note that I also bracketed the . since it has a special meaning in regular expressions and you seemed to want a literal period.
I only know of .NET that lets you specify a variable number of captures with a single expression. Example - (capture.*me)+
It creates a capture object that can be iterated over. Even then it only simulates what every other regex engine provides.
Most engines provide an incremental match, run from within a loop until no matches are left. The global flag tells the engine to keep matching from where the last successful match left off.
Example (in Perl):
while ( $string =~ /([^,]+)/g ) { print $1,"\n" }
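
The same incremental-match loop can be sketched in Java, where Matcher.find() resumes from the end of the previous match. The class and method names are illustrative, and the pattern is the simple [^,]+ from the Perl example rather than anything specific to the asker's real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommaItems {
    // Collects every maximal run of non-comma characters, one per find() call.
    static List<String> items(String s) {
        Matcher m = Pattern.compile("[^,]+").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String item : items("[sometext1],[sometext2],[sometext3]")) {
            System.out.println(item);
        }
    }
}
```

Because each find() call yields one item, the number of items never has to be encoded in the pattern itself.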

Regular Expressions: about Greediness, Laziness and Substrings

I have the following string:
123322
In theory, the regex 1.*2 should match the following:
12 (because * can be zero characters)
12332
123322
If I use the regex 1.*2 it matches 123322.
Using 1.*?2, it will match 12.
Is there a way to match 12332 too?
Ideally, I would get all possible matches in the string (regardless of whether one match is a substring of another).
No, unless there is something else added to the regex to clarify what it should do it will either be greedy or non-greedy. There is no in-betweeny ;)
1(.*?2)*$
you will have multiple captures which you can concatenate to form all possible matches
see here:regex tester
click on 'table' and expand the captures tree
You would need a separate expression for each case, depending on the number of twos you want to match:
1(.*?2){1} #same as 1.*?2
1(.*?2){2}
1(.*?2){3}
...
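
A small Java sketch of this idea generates the pattern for n = 1, 2, 3, ... and stops at the first n that no longer matches. It assumes the specific 1...2 shape from the question, uses a non-capturing group, and only reports the leftmost match for each n. The names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchByCount {
    // Tries 1(?:.*?2){n} for increasing n until the pattern stops matching.
    static List<String> matches(String text) {
        List<String> out = new ArrayList<>();
        for (int n = 1; ; n++) {
            Matcher m = Pattern.compile("1(?:.*?2){" + n + "}").matcher(text);
            if (!m.find()) {
                break; // fewer than n twos are available after the leading 1
            }
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matches("123322")); // prints [12, 12332, 123322]
    }
}
```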
Generally, this isn't possible. A regex matching engine isn't really designed to find overlapping matches. A quick solution is simply to check the pattern on all substrings manually:
string text = "1123322";
for (int start = 0; start < text.Length - 1; start++)
{
for (int length = 0; length <= text.Length - start; length++)
{
string subString = text.Substring(start, length);
if (Regex.IsMatch(subString, "^1.*2$"))
Console.WriteLine("{0}-{1}: {2}", start, start + length, subString);
}
}
Working example: http://ideone.com/aNKnJ
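
For comparison, the same brute-force idea can be sketched in Java. Matcher.matches() already requires the whole candidate to match, so the ^ and $ anchors are not needed. The class and method names are illustrative, and this version returns the matching substrings without their positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class AllSubstringMatches {
    // Tests every non-empty substring against the pattern; quadratic in the
    // number of substrings, so only suitable for short inputs.
    static List<String> find(String text, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> found = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                String candidate = text.substring(start, end);
                if (p.matcher(candidate).matches()) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("123322", "1.*2")); // prints [12, 12332, 123322]
    }
}
```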
Now, is it possible to get a whole-regex solution? Mostly, the answer is no. However, .NET does have a few tricks up its sleeve to help us: it allows variable-length lookbehind, and it lets each capturing group remember all of its captures (most engines only return the last match of each group). Abusing these, we can simulate the same for loop inside the regex engine:
string text = "1123322!";
string allMatchesPattern = @"
(?<=^ # Starting at the local end position, look all the way to the back
(
(?=(?<Here>1.*2\G))? # on each position from the start until here (\G),
. # *try* to match our pattern and capture it,
)* # but advance even if you fail to match it.
)
";
MatchCollection matches = Regex.Matches(text, allMatchesPattern,
RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);
foreach (Match endPosition in matches)
{
foreach (Capture startPosition in endPosition.Groups["Here"].Captures)
{
Console.WriteLine("{0}-{1}: {2}", startPosition.Index,
endPosition.Index - 1, startPosition.Value);
}
}
Note that there is currently a small bug: the engine doesn't try to match at the last ending position ($), so you lose a few matches. For now, adding a ! at the end of the string works around that issue.
working example: http://ideone.com/eB8Hb