Skipping first character regex - regex

I'm currently using the following code to remove certain characters from a string in an array:
myArray[x] = myArray[x].replaceAll("[aeiou]","");
This works fine, but I need to ignore the first character of the string. For example, if an array element was Alan, it would be stripped to Aln.
I'm not sure whether replaceAll is the best way to do it, but the only other way I can think of is removing the first character, applying the above regex to the rest of the string, prepending the character again, and then storing the result back into the array, which seems long-winded.

You can use a negative lookbehind to assert that the pattern is not preceded by the line start marker (^):
public static void main(String[] args) throws Exception {
    final String[] input = {"abe", "bae"};
    for (final String s : input) {
        // (?<!^) rejects a vowel that sits directly at the start of the string
        System.out.println(s.replaceAll("(?<!^)[aeiou]", ""));
    }
}
Output:
ab
b
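Applied to the array from the question, the lookbehind approach might look like the sketch below. The class and method names (SkipFirstChar, stripVowelsAfterFirst) are made up for illustration; only the pattern comes from the answer above.

```java
class SkipFirstChar {
    // Removes lowercase vowels everywhere except at index 0.
    static String stripVowelsAfterFirst(String s) {
        return s.replaceAll("(?<!^)[aeiou]", "");
    }

    public static void main(String[] args) {
        String[] myArray = {"Alan", "abe", "bae"};
        for (int x = 0; x < myArray.length; x++) {
            myArray[x] = stripVowelsAfterFirst(myArray[x]);
        }
        System.out.println(String.join(",", myArray)); // Aln,ab,b
    }
}
```

Note that [aeiou] is case-sensitive, so an uppercase vowel later in the string would be kept unless the class is widened to [aeiouAEIOU].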

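What about something like:
myArray[x] = myArray[x].replaceAll("(^.)[aeiou]", "$1");
Update: this only removes a vowel in the second position, because the pattern is anchored to the start of the string. Negative lookbehind is your solution, like Boris answered.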

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions:
Search for any pattern myvalue=123& and extract myvalue=123.
If "myvalue" is present at the end of the line without "&", extract myvalue=123.
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
The first scenario works for me with the following regex: myvalue=(.*?(?=[&,""]))
I am looking for a way to modify this regex to cover my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried:
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as few chars as possible until it can assert either & , or " to the right, so the positive lookahead requires a single char to be present to the right, which is why it fails when the value is at the end of the line
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &:
myvalue=[^&\s]*
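As a quick check of the negated character class against both inputs from the question, here is a small sketch in Java. MyValueExtractor and extract are hypothetical names; only the pattern is taken from the answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MyValueExtractor {
    // [^&\s]* stops at the first '&' or whitespace, or at the end of the input.
    private static final Pattern MY_VALUE = Pattern.compile("myvalue=[^&\\s]*");

    static Optional<String> extract(String input) {
        Matcher m = MY_VALUE.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("abcdmyvalue=123&xyz").orElse("no match")); // myvalue=123
        System.out.println(extract("abcdmyvalue=123").orElse("no match"));     // myvalue=123
    }
}
```

Because the character class needs no lookahead, the end-of-line case requires no special handling.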
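function regex(data) {
    // capture everything between the first = and the last &
    var test = data.match(/=(.*)&/);
    if (test === null) {
        // no & present: take whatever follows the first =
        return data.split('=')[1]
    } else {
        return test[1]
    }
}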
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & after the value, match returns null and the if branch runs, where we simply split the string on = and take the value. If & is present, the regex extracts the value between = and &.
If you want to do it with a single regex, you can do it like this:
var test = data.match(/=(.*)&|=(.*)/);
const result = test[1] ? test[1] : test[2];
console.log(result);

Nicer way to access match results?

My requirement is to transform some textual message ids. Input is
a.messageid=X0001E
b.messageid=Y0001E
The task is to turn that into
a.messageid=Z00001E
b.messageid=Z00002E
In other words: fetch the first part of each line (like: a.), and append a slightly different id.
My current solution:
val matcherForIds = Regex("(.*)\\.messageid=(X|Y)\\d{4,6}E")
var messageCounter = 5
fun transformIds(line: String): String {
    val result = matcherForIds.matchEntire(line) ?: return line
    return "${result.groupValues.get(1)}.messageid=Z%05dE".format(messageCounter++)
}
This works, but I find the way I get to the first group, "${result.groupValues.get(1)}, not very elegant.
Is there a nicer-to-read or more concise way to access that first group?
You may get the result without a separate function:
val line = s.replace("""^(.*\.messageid=)[XY]\d{4,6}E$""".toRegex()) {
    "${it.groupValues[1]}Z%05dE".format(messageCounter++)
}
However, as you need to format the messageCounter into the result, you cannot just use a string replacement pattern and you cannot get rid of ${it.groupValues[1]}.
Also, note:
You may get rid of the double backslashes by using a triple-quoted (raw) string literal.
There is no need to add .messageid= to the replacement if you capture that part into Group 1 (see (.*\.messageid=)).
There is no need capturing X or Y since you are not using them later, thus, (X|Y) can be replaced with a more efficient character class [XY].
The ^ and $ make sure the pattern should match the entire string, else, there will be no match and the string will be returned as is, without any modification.
Maybe not really what you are looking for, but maybe it is. What if you first ensure (filter) the lines of interest and just replace what needs to be replaced instead, e.g. use the following transformation function:
val matcherForIds = Regex("(.*)\\.messageid=(X|Y)\\d{4,6}E")
val idRegex = Regex("[XY]\\d{4,6}E")
var idCounter = 5
fun transformIds(line: String) = idRegex.replace(line) {
    "Z%05dE".format(idCounter++)
}
with the following filter:
"a.messageid=X0001E\nb.messageid=Y0001E"
    .lineSequence()
    .filter(matcherForIds::matches)
    .map(::transformIds)
    .forEach(::println)
In case there are also other relevant strings that you want to keep, the following is also possible, but not as nice as the solution at the end:
"a.messageid=X0001E\nnot interested line, but required in the output!\nb.messageid=Y0001E"
    .lineSequence()
    .map {
        when {
            matcherForIds.matches(it) -> transformIds(it)
            else -> it
        }
    }
    .forEach(::println)
Alternatively (now just copying Wiktor's regex, as it already contains all we need: a complete match from the beginning of the line ^ up to the end of the line $, etc.):
val matcherForIds = Regex("""^(.*\.messageid=)[XY]\d{4,6}E$""")
fun transformIds(line: String) = matcherForIds.replace(line) {
    "${it.groupValues[1]}Z%05dE".format(idCounter++)
}
This way you ensure that lines that completely match the desired input are replaced and the others are kept but not replaced.
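For comparison, a rough Java rendering of the same idea (capture the prefix, replace only the id, pass non-matching lines through) could look like the sketch below. MessageIdRewriter and its start value are hypothetical; the pattern is the one from the answers above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MessageIdRewriter {
    // Group 1 keeps everything up to and including ".messageid=".
    private static final Pattern ID_LINE =
            Pattern.compile("^(.*\\.messageid=)[XY]\\d{4,6}E$");

    private int counter;

    MessageIdRewriter(int start) {
        this.counter = start;
    }

    // Returns the rewritten line, or the line unchanged when it does not match.
    String transform(String line) {
        Matcher m = ID_LINE.matcher(line);
        if (!m.matches()) {
            return line;
        }
        return m.group(1) + String.format(Locale.ROOT, "Z%05dE", counter++);
    }

    public static void main(String[] args) {
        MessageIdRewriter rewriter = new MessageIdRewriter(1);
        System.out.println(rewriter.transform("a.messageid=X0001E")); // a.messageid=Z00001E
        System.out.println(rewriter.transform("b.messageid=Y0001E")); // b.messageid=Z00002E
    }
}
```

Keeping the counter inside an object instead of a top-level variable is a design choice of this sketch, not something the answers require.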

Match longest substring with regex [duplicate]

I tried looking for an answer to this question but couldn't find anything, and I hope there's an easy solution. I am using the following code in C#:
String pattern = ("(hello|hello world)");
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var matches = regex.Matches("hello world");
The question is: is there a way for the Matches method to return the longest pattern first? In this case, I want to get "hello world" as my match as opposed to just "hello". This is just an example; my pattern list consists of a decent number of words.
If you already know the lengths of the words beforehand, then put the longest first. For example:
String pattern = ("(hello world|hello)");
The longest alternative will then be tried first. If you don't know the lengths beforehand, a hand-written pattern cannot guarantee this.
An alternative approach would be to store all the matches in an array/hash/list and pick the longest one manually, using the language's built-in functions.
A regular expression engine tries the alternatives from left to right. If you want to make sure you get the longest possible match first, you'll need to change the order of your patterns. The leftmost pattern is tried first. If a match is found against that pattern, the engine will attempt to match the rest of the pattern against the rest of the string; the next pattern will be tried only if no match can be found.
String pattern = ("(hello world|hello wor|hello)");
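When the word list is only known at run time, the longest-first ordering can be produced by sorting before the alternation is built. The sketch below shows this in Java, whose regex engine also tries alternatives left to right; LongestFirst, compile and firstMatch are hypothetical names, and Pattern.quote is used on the assumption that the words are literal text rather than sub-patterns.

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LongestFirst {
    // Builds (w1|w2|...) with the longest words first, so that the ordered
    // alternation prefers "hello world" over its prefix "hello".
    static Pattern compile(List<String> words) {
        String alternation = words.stream()
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote) // treat each word as literal text
                .collect(Collectors.joining("|"));
        return Pattern.compile("(" + alternation + ")", Pattern.CASE_INSENSITIVE);
    }

    // Returns the first match in the input, or null when nothing matches.
    static String firstMatch(List<String> words, String input) {
        Matcher m = compile(words).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hello", "hello world");
        System.out.println(firstMatch(words, "hello world")); // hello world
    }
}
```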
Make two different regex matches. The first will match your longer option, and if that does not work, the second will match your shorter option.
string input = "hello world";
string patternFull = "hello world";
Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);
var matches = regexFull.Matches(input);
if (matches.Count == 0)
{
    // fall back to the shorter pattern only when the longer one found nothing
    string patternShort = "hello";
    Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
    matches = regexShort.Matches(input);
}
At the end, matches will hold the result of either the "full" or the "short" pattern; "full" is checked first, and "short" is only tried if "full" finds nothing.
You can wrap the logic in a function if you plan on calling it many times. This is something I came up with (but there are plenty of other ways you can do this).
public bool HasRegexMatchInOrder(string input, params string[] patterns)
{
    // patterns are tried in the order given; the first one that matches wins
    foreach (var pattern in patterns)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        if (regex.IsMatch(input))
        {
            return true;
        }
    }
    return false;
}
string input = "hello world";
bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello", ...);

Using RegEx split the string

I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]'. I'd like a pattern that produces the list results shown below, but I can't figure it out. Basically the comma is the separator, but [6,7,8] itself contains commas as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=stuff) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach (String s in Regex.Split(inputstring, @",(?=\[)"))
    System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for (String s : p.split(inputstring))
    System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
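The two steps above can be sketched in Java as follows. BracketSplitter is a hypothetical name, and the sketch assumes that "#" never occurs in the input, since it serves as the temporary separator.

```java
import java.util.Arrays;
import java.util.List;

class BracketSplitter {
    // Assumption: '#' never occurs in the input, so it is safe as a temporary separator.
    static List<String> split(String input) {
        // Only a comma between "]" and "[" separates groups; commas inside brackets stay.
        return Arrays.asList(input.replace("],[", "]#[").split("#"));
    }

    public static void main(String[] args) {
        for (String part : split("[1]-[2]-[3],[4]-[5],[6,7,8],[9]")) {
            System.out.println(part);
        }
    }
}
```

If "#" can appear in real data, any other character or string that is guaranteed to be absent would serve the same purpose.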

boost regex to extract a number from string

I have a string
resource = "/Music/1"
The string can take multiple numeric values after "/Music/". I am new to regular expressions. I tried the following code:
#include <iostream>
#include <string>
#include <boost/regex.hpp>
int main()
{
    std::string resource = "/Music/123";
    const char* pattern = "\\d+";  // matches any run of digits anywhere in the string
    boost::regex re(pattern);
    boost::sregex_iterator it(resource.begin(), resource.end(), re);
    boost::sregex_iterator end;
    for ( ; it != end; ++it)
    {
        std::cout << it->str() << "\n";
    }
    return 0;
}
vickey@tb:~/trash/boost$ g++ idExtraction.cpp -lboost_regex
vickey@tb:~/trash/boost$ ./a.out
123
This works fine. But when the string happens to be something like "/Music23/123", it gives me the value 23 before 123. When I use the pattern "/\d+", it gives results even when the string is /23/Music/123. What I want to do is extract only the number after "/Music/".
I think part of the problem is that you haven't defined very well (at least to us) what it is you are trying to match. I'm going to take some guesses. Perhaps one will meet your needs.
The number at the end of your input string. For example "/a/b/34". Use regex "\\d+$".
A path element that is entirely numeric. For example "/a/b/12/c" or "/a/b/34" but not "/a/b56/d". Use regex "(?:^|/)(\\d+)(?:/|$)" and get captured group [1]. You might do the same thing with lookahead and lookbehind, perhaps with "(?<=^|/)\\d+(?=/|$)".
If there will never be anything after the last slash, you could just use a regex or string.split() to get everything after the last slash. I'd give you code, but I'm on my phone right now.
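For the specific case in the question (only the digits directly after a leading "/Music/"), one option not spelled out above is to put the literal prefix into the pattern and capture the digits. The sketch below uses Java's java.util.regex rather than Boost, purely to illustrate the pattern; MusicId and extractId are hypothetical names, and the expression ^/Music/(\d+)$ should carry over to other Perl-style engines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MusicId {
    // The literal prefix and the anchors rule out "/Music23/123" and "/23/Music/123".
    private static final Pattern MUSIC_ID = Pattern.compile("^/Music/(\\d+)$");

    static Optional<String> extractId(String resource) {
        Matcher m = MUSIC_ID.matcher(resource);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("/Music/123").orElse("no match"));    // 123
        System.out.println(extractId("/Music23/123").orElse("no match"));  // no match
        System.out.println(extractId("/23/Music/123").orElse("no match")); // no match
    }
}
```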