Get part of a string with regex

I have the strings below and I only want to get the value that follows the string AttributeReferenceID. What do I need to do?
I tried [A]ttributeReferenceID (?<referenceID>\d+) but had no success. The string I want can be anywhere in the log: on the first line, the second, or the last.
String to get:
AttributeReferenceID 123
AttributeReferenceID 456
AttributeReferenceID 789
String to discard:
ISCCAttributeReferenceID 091281 [09123na0]
ISCCAttributeReferenceID 123012 [i1208221]
ISCCAttributeReferenceID 091221 [0oas9019]

If your regex flavor supports positive lookbehinds, you can use the following:
(?<=^AttributeReferenceID\s{4})(\d+)
Demo
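The lookbehind checks that the digits are immediately preceded by AttributeReferenceID at the start of a line, followed by exactly 4 whitespace characters (enable multiline mode so that ^ matches at the start of each line). If the amount of whitespace can vary, this will not work in flavors that require fixed-width lookbehinds. In that case, the following should normally work: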
(?:^AttributeReferenceID\s+)(\d+)
Demo
And take the first group.
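As a rough illustration of the second pattern, here is a minimal Python sketch. The sample log text, the `reference_ids` helper and the use of `re.MULTILINE` are assumptions made for the example, not part of the original question.

```python
import re

# Hypothetical log excerpt; the spacing between name and value is assumed.
LOG = """\
AttributeReferenceID    123
ISCCAttributeReferenceID 091281 [09123na0]
AttributeReferenceID    456
ISCCAttributeReferenceID 123012 [i1208221]
AttributeReferenceID    789
"""

# ^ with re.MULTILINE anchors at the start of every line, so lines that
# begin with "ISCC" are rejected. \s+ tolerates any amount of whitespace.
PATTERN = re.compile(r"^AttributeReferenceID\s+(\d+)", re.MULTILINE)


def reference_ids(text):
    """Return the captured digits (group 1) of every matching line."""
    return PATTERN.findall(text)


print(reference_ids(LOG))  # ['123', '456', '789']
```

Because the pattern has exactly one capturing group, `findall` returns the group contents directly, which corresponds to "take the first group".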

Substitute every character after a certain position with a regex

I'm trying to use a regex to replace each character after a given position (say, 3) with a placeholder character, for an arbitrary-length string (the output length should be the same as that of the input). I think a lookahead (lookbehind?) can do it, but I can't get it to work.
What I have right now is:
regex: /.(?=.{0,2}$)/
input string: 'hello there'
replace string: '_'
current output: 'hello th___' (last 3 substituted)
The output I'm looking for would be 'hel________' (everything but the first 3 substituted).
I'm doing this in TypeScript, to replace some old JavaScript that uses ugly split/concatenate logic. However, I know how to make the regex calls, so the answer can be pretty language agnostic.
If you know the string is longer than the given position n, the first n characters can be optionally captured
(^.{3})?.
and replaced with e.g. $1_ (the capture of the first group followed by _). This won't work if the string length is <= n.
See this demo at regex101
Another option is to use a lookbehind, where supported, to check that the character is preceded by n characters.
(?<=.{3}).
See other demo at regex101 (replace just with underscore) - String length does not matter here.
Worth mentioning: in PHP/PCRE the first n characters can simply be skipped like this: ^.{1,3}(*SKIP)(*F)|.
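The lookbehind variant also works in Python, because `.{3}` has a fixed width. The following is only a sketch; the helper name `mask_after` and its parameters are made up for illustration.

```python
import re


def mask_after(text, keep=3, placeholder="_"):
    """Replace every character after the first `keep` characters."""
    # (?<=.{keep}) succeeds only when at least `keep` characters precede
    # the current one, so the first `keep` characters are never matched.
    pattern = re.compile(r"(?<=.{%d})." % keep, re.DOTALL)
    # A function replacement avoids backslash handling in `placeholder`.
    return pattern.sub(lambda m: placeholder, text)


print(mask_after("hello there"))  # hel________
print(mask_after("hi"))           # hi (shorter than `keep`, left unchanged)
```

Strings that are no longer than `keep` come back unchanged, which is the case the capture-group variant does not handle.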

Regex Lookahead/Lookbehind if more than one occurrence

I have string formulas like this:
?{a,b,c,d}
It can be nested like this:
?{a,b,c,?{x,y,z}}
or this is the same:
?{a,b,c,
?{x,y,z}
}
So I have to find the commas that are inside the second-level and deeper brackets.
In the example below I marked the "levels" where I have to find all commas:
?{a,b,c,
?{x,y, <--Those
?{1,2,3} <--Those
}
}
I've tried with lookahead and lookbehind, but I'm totally confused now :/
Here is my latest working try, but it is not good at all:
OnlineRegex
Update:
To avoid misunderstanding, I don't want to count the commas.
I'd like to get groups of commas to replace them.
The condition is: find the commas that have more than one "open tag" like this before them: ?{
.. without a closing tag like this: }
Example:
In this case I must not replace any commas:
?{1,2,3} ?{a,b,c}
But in this case I have to replace the commas between a, b and c:
?{1,2,3,?{a,b,c}}
For the examples you have provided, the following regex works (it gives the output you described):
(?<!^\?{[^{}]*),(?=[\s\S]*(?:\s*}){2,})
For String ?{a,b,c,d}, see Demo1 No Match
For String, ?{a,b,c,?{x,y,z}}, see Demo2 Match successful
For String,
?{a,b,c,
?{x,y,z}
}
see Demo3 Match Successful
For String,
?{a,b,c,
?{x,y,
?{1,2,3}
}
}
see Demo4 Match Successful
For String ?{1,2,3} ?{a,b,c} ?{1,2,3} ?{a,b,c}, see Demo5 No Match
Explanation:
(?<!^\?{[^{}]*), - a negative lookbehind that discards the first-level commas. It rejects any comma that is preceded by the start of the string, then ?{, then zero or more characters other than { or }. The lookbehind is variable-length, so it needs a flavor that allows that, such as .NET.
(?=[\s\S]*(?:\s*}){2,}) - the comma matched above must be followed, somewhere later in the string, by at least 2 occurrences of } (consecutive or separated only by whitespace).
Your question is rather unclear, @norbre, but I presume you'd like to extract (i.e. "count") the commas.
You can't do this with a single regex: regexes can't count the number of occurrences. However, you can use this one to extract the "internal part" and then use a spreadsheet formula to count the commas:
^(?:\?{[a-zA-Z0-9,]+?,\n??\s*?\?{)([a-zA-Z0-9,?{}\n\s]+?(?:\n*?\s*?|})+)(?:[a-zA-Z0-9,\n\s]*})$
Try: https://regex101.com/r/Rr0eFo/5
Examples
1.
Input:
?{a,b,c,?{e,f},1,2,3}
Output:
e,f}
2.
Input:
?{a,b,c,
?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
,d,e,f}
Output:
x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
3.
Input:
?{a,b,c,?{e},1,2,3}
Output:
e}
(note that there are no commas here!)
One caveat, however. As I have said, regexes can't count the number of occurrences.
Hence, the following sample (I don't know whether it's valid for your case) would return the wrong match:
?{a,b,c,?{e,f}
,1,2,3,?{a,b}
}
Output:
e,f}
,1,2,3,?{a,b}
OK replacing commas is another story so I'll add another answer.
Your regexp engine would need to support recursion.
Still I don't see a way to do it with one regex - one match would either contain the first comma or contain everything between the braces!
What I suggest is to use one regexp to get "what is inside the inner braces", run a replace (, => "") and assemble the whole line again using submatches from the regexp.
Here it is: (\?{[^?{}]*)((?>[^?{}]|(?R))+?)([^?{}]*?\})
Try: https://regex101.com/r/IzTeY0/3
Example 1:
Input:
?{a,b,c,
?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
,d,e,f}
Submatches:
1. ?{a,b,c,
2. ?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
3.
,d,e,f}
Replace all commas in submatch 2 with anything you want, then reassemble the whole string using submatches 1 and 3.
Again, this would break the regexp:
?{a,b,c,?{e,f}
,1,2,3,?{a,b}
}
Submatch 2 would look like this:
?{e,f}
,1,2,3,?{a,b}
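For engines without recursion, one alternative is to leave regex out and track the nesting depth while scanning the string. This is a different technique from the answers above, sketched here under the assumption that every level opens with ?{ and closes with }. The function name `replace_nested_commas` and the `;` replacement are hypothetical.

```python
def replace_nested_commas(formula, replacement=";", min_depth=2):
    """Replace commas that sit inside at least `min_depth` open ?{ ... } levels."""
    depth = 0
    out = []
    i = 0
    while i < len(formula):
        if formula.startswith("?{", i):
            # An opening tag raises the nesting level.
            depth += 1
            out.append("?{")
            i += 2
            continue
        ch = formula[i]
        if ch == "}":
            # A closing tag lowers the level (never below zero).
            depth = max(depth - 1, 0)
        if ch == "," and depth >= min_depth:
            out.append(replacement)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(replace_nested_commas("?{a,b,c,?{x,y,z}}"))  # ?{a,b,c,?{x;y;z}}
print(replace_nested_commas("?{1,2,3} ?{a,b,c}"))  # unchanged
```

Because the depth is counted explicitly, the sample with two sibling inner blocks is handled as well: only the commas inside ?{e,f} and ?{a,b} are replaced.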

1 to 5 of the same groups in REGEX

For a string such as:
abzyxcabkmqfcmkcde
Notice the string patterns between ab and c (zyx and kmqf). To capture the first string pattern:
ab([a-z]{3,5})c
Is it possible to match both of the groups from the sample string? Actually, there should be 1 to 5 groups.
Note: python style regex.
You can verify that a given string conforms to the 1-5 repetitions of ab([a-z]{3,5})c using this regex
(?:ab([a-z]{3,5})c){1,5}
or this one if there are characters expected between the groups
(?:ab([a-z]{3,5})c.*?){1,5}
You will only be able to extract the last matching group from that string, however, not any of the previous ones. To get the earlier ones you need to use hsz's approach.
Just match all results - i.e. with g flag:
/ab([a-z]{3,5})c/g
or some method like in Python:
re.findall(pattern, string, flags=0)
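To make the difference between the two approaches concrete, here is a short Python sketch using the sample string from the question. The helper names are illustrative only.

```python
import re

SAMPLE = "abzyxcabkmqfcmkcde"

SINGLE = re.compile(r"ab([a-z]{3,5})c")
REPEATED = re.compile(r"(?:ab([a-z]{3,5})c){1,5}")


def all_groups(text):
    """Collect group 1 of every non-overlapping match."""
    return SINGLE.findall(text)


def last_group(text):
    """With a repeated group, only the final iteration's capture is kept."""
    match = REPEATED.match(text)
    return match.group(1) if match else None


print(all_groups(SAMPLE))  # ['zyx', 'kmqf']
print(last_group(SAMPLE))  # kmqf
```

If the 1-to-5 limit matters, one option is to validate with the repeated pattern first and then extract the pieces with `findall`.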

Regex for searching strings matching the following one

I am searching strings matching the following one in my source code:
<CONSTANT_STRING_1> <CONSTANT_STRING_2> <VARIABLE_DIGITS> <CONSTANT_STRING_3>
where
<CONSTANT_STRING_1>, <CONSTANT_STRING_2> and <CONSTANT_STRING_3> are constant strings like "ABC", "DEF" and "GHI".
<VARIABLE_DIGITS> is a random number of 14 digits like "12345678901234"
Note: there are white spaces between words.
What I am looking for is to search <CONSTANT_STRING_1> <CONSTANT_STRING_2> <WHATEVER> <CONSTANT_STRING_3>. How can I build the Regex?
If I read you correctly, by "constant string" you mean strings of letters? If so, the pattern below should find the full string you are looking for. By the way, the website linked below is really great for visualizing this type of problem... give it a try :)
(([a-zA-Z]+\s){2})[0-9]{14}\s([a-zA-Z]+)$
Debuggex Demo
To break it down...
(([a-zA-Z]+\s){2}) matches one or more lowercase or uppercase letters followed by a whitespace character, with that whole unit (letters + whitespace) repeated twice
[0-9]{14}\s matches 14 digits followed by a whitespace character. As @Avinash said, \d{14}\s is another way of writing this portion
([a-zA-Z]+)$ matches another string of one or more letters. The $ indicates that this ends the string you are searching for
You could try the below regex.
<CONSTANT_STRING_1> <CONSTANT_STRING_2> \d{14} <CONSTANT_STRING_3>
where \d{14} matches exactly 14 digits.
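A possible way to assemble that pattern from the actual constants is sketched below in Python. The function name `build_pattern` and the sample constants "ABC", "DEF" and "GHI" are placeholders. `re.escape` is used on the assumption that the real constants might contain regex metacharacters.

```python
import re


def build_pattern(first, second, third):
    """Compile: <first> <second> <14 digits> <third>, separated by whitespace."""
    return re.compile(
        r"%s\s+%s\s+\d{14}\s+%s"
        % (re.escape(first), re.escape(second), re.escape(third))
    )


pattern = build_pattern("ABC", "DEF", "GHI")

line = 'message = "ABC DEF 12345678901234 GHI";'
match = pattern.search(line)
print(match.group(0) if match else None)  # ABC DEF 12345678901234 GHI
```

Using `\s+` between the parts tolerates more than one space. Replace it with a literal space if the spacing is known to be exact.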

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in pseudo-code and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?
Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which lacked a single-line mode before ES2018 added the s flag):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, @"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);
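The same replace-with-captures idea can be sketched in Python, where `re.DOTALL` plays the role of single-line mode. The helper name `keep_outer` and its default arguments are assumptions for illustration.

```python
import re


def keep_outer(text, left="ABC", right="DEF"):
    """Return left + right if both occur in order; otherwise return text unchanged."""
    pattern = re.compile(
        r"^.*(%s).*(%s).*$" % (re.escape(left), re.escape(right)),
        re.DOTALL,  # let . match line breaks as well
    )
    # The engine still matches everything in between; the replacement
    # simply keeps only the two captured parts.
    return pattern.sub(lambda m: m.group(1) + m.group(2), text)


print(keep_outer("ABC 123 DEF"))  # ABCDEF
```

As with the other implementations, this assumes the pattern occurs only once in the input.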
A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
See it online: rubular
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
// ...
}