Regular Expression Split - regex

I have the string shown below. I have been trying to split it with a regular expression, and while going through the forums I found ([^|]+), which matches everything except the pipe character. However, I want to break the string into two parts using two separate regular expressions, and I have not been able to do this. The first expression (xyz) would extract everything from GA up to the pipe character; the second (abc) would extract everything after the first pipe.
GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298

The first is ^[^|]+ and the second is [^|]+$.
The idea is to use your negated character class with anchors: ^ matches the start of the string and $ matches the end of the string.
These two patterns have no lookarounds and will work with almost any regex flavor.
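As a rough illustration, the two anchored patterns could be applied like this in Python. The language is an assumption, since the question does not name one, and the variable names are made up for this sketch.

```python
import re

s = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'

# ^[^|]+ : one or more non-pipe characters, anchored at the start
first = re.search(r'^[^|]+', s).group()

# [^|]+$ : one or more non-pipe characters, anchored at the end
second = re.search(r'[^|]+$', s).group()

print(first)
print(second)
```

Note that if the string contained more than one pipe, [^|]+$ would return only the text after the last pipe.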

Guessing at popular languages. :-)
Python:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
JavaScript:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')
PHP:
explode('|', 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298')
Go:
strings.Split("GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298", "|")
Ruby:
'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'.split('|')

EDIT
After clarification, I get what you're asking. Fiddling with regex101.com, I found that those two expressions should give you what you want:
^.*(?=\|) gets the first part, and
(?<=\|).* gets the second.
When you click on the link, you can see it in action.
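A minimal sketch of the two lookaround expressions, written in Python purely for illustration (the variable names are hypothetical):

```python
import re

s = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'

# ^.*(?=\|) : everything from the start up to a position followed by a pipe
before = re.search(r'^.*(?=\|)', s).group()

# (?<=\|).* : everything after a position preceded by a pipe
after = re.search(r'(?<=\|).*', s).group()

print(before)
print(after)
```

With several pipes in the input, the greedy .* in the first pattern would run up to the last pipe, while the second pattern would start right after the first one, so the two halves would overlap.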
PREVIOUS ANSWER
There are many alternatives to regular expressions, as @smarx's answer shows.
But something along those lines should do it:
R
myString <- 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'
part1 <- sub(pattern = "(.*)\\|(.*)", x = myString, replacement = "\\1")
part2 <- sub(pattern = "(.*)\\|(.*)", x = myString, replacement = "\\2")
R requires doubling all backslashes in string literals; some other languages don't.
Python
import re
myString = 'GA1.2.1127630839.1468526914|3847EFF358ABEC90-01A39B0290BAC298'
# Raw strings pass the backslashes through to the regex engine unchanged
part1 = re.sub(pattern=r"(.*)\|(.*)", repl=r"\1", string=myString)
part2 = re.sub(pattern=r"(.*)\|(.*)", repl=r"\2", string=myString)

Related

Replace pattern with pattern in vb.net string

I want to replace the patterns "0A ", "0B ", ..., "1A ", "1B ", ... with "0A|", "0B|", ..., "1A|", "1B|", ... in a VB.NET string.
I can write individual replace lines like
string = string.Replace("0A ", "0A|")
string = string.Replace("0B ", "0B|")
.
.
.
string = string.Replace("0Z ", "0Z|")
But I would have to write too many lines (26*10*2, times two because this scenario occurs twice), and that just doesn't seem like a good solution. Can someone give me a good regex solution for this?
Use Regex.Replace:
result = Regex.Replace(string, "(\d+[A-Z]+) ", "$1|")
I used the pattern \d+[A-Z]+ to represent the text under the assumption that your series of data might see more than one digit/letter. This seems to be working in the demo below.
Demo
Regex: \s Substitution: |
Details:
\s Matches any whitespace character
Regex demo
VB.NET code:
Regex.Replace("0A ", "\s", "|") Output: 0A|
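To compare the two suggestions above, here is a small sketch in Python rather than VB.NET. The sample text is invented for illustration, and Python writes the group reference as \1 where .NET uses $1.

```python
import re

text = '0A first 1B second 0Z third'

# Capture digits followed by uppercase letters, then swap the trailing space for a pipe
tagged = re.sub(r'(\d+[A-Z]+) ', r'\1|', text)

# Replacing every whitespace character touches all the spaces, not only those after a code
all_spaces = re.sub(r'\s', '|', text)

print(tagged)
print(all_spaces)
```

The capture-group version leaves ordinary spaces alone, which matters if the string contains anything besides the codes.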

Escaping dollars groovy

I'm having trouble escaping double dollars from a string to be used with regex functions pattern/matcher.
This is part of the String:
WHERE oid_2 = $$test$$ || oid_2 = $$test2$$
and this is the closest code I've tried to get near the solution:
List<String> strList = new ArrayList<String>();
Pattern pattern = Pattern.compile("\$\$.*?\$\$");
log.debug("PATTERN: "+pattern)
Matcher matcher = pattern.matcher(queryText);
while (matcher.find()) {
strList.add(matcher.group());
}
log.debug(strList)
This is the debug output I get:
- PATTERN: $$.*?$$
- []
So the pattern is actually right, but the placeholders are not found in the string.
As a test I've tried to replace "$$test$$" with "XXtestXX" and everything works perfectly. What am I missing? I've tried "/$" strings, "\\" but still have no solution.
Note that a $ in regex matches the end of the string. To use it as a literal $ symbol, you need to escape it with a literal backslash.
You used "\$\$.*?\$\$", which Groovy translates into the literal string $$.*?$$. As a regex, that matches two end-of-string positions, then any 0+ chars as few as possible, and then two end-of-string positions again, which makes little sense. You actually need one backslash to escape the $ that Groovy uses to inject variables into a double-quoted string literal, plus two backslashes to define the literal backslash that the regex engine needs: "\\\$\\\$.*?\\\$\\\$".
However, when you work with regex, slashy strings are quite helpful since all you need to escape a special char is a single backslash.
Here is sample code that extracts all matches from the string you have in Groovy:
def regex = /\$\$.*?\$\$/;
def s = 'WHERE oid_2 = $$test$$ || oid_2 = $$test2$$'
def m = s =~ regex
(0..<m.count).each { print m[it] + '\n' }
See the online demo.
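For comparison, the regex-level escaping can be sketched in Python, which has no $ interpolation in ordinary string literals, so only the regex engine needs the escape. This is an illustration in a different language, not a replacement for the Groovy code above.

```python
import re

query = 'WHERE oid_2 = $$test$$ || oid_2 = $$test2$$'

# \$ matches a literal dollar sign; an unescaped $ is an end-of-string anchor
escaped = re.findall(r'\$\$.*?\$\$', query)

# A character class is another way to make $ literal
bracketed = re.findall(r'[$][$].*?[$][$]', query)

print(escaped)
print(bracketed)
```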
Anyone who gets here might like to know another answer to this, if you want to use Groovy slashy strings:
myComparisonString ==~ /.*something costs [$]stuff.*/
I couldn't find another way of putting a $ in a slashy string, at least if the $ is to be followed by text. If, conversely, it is followed by a number (or presumably any non-letter), this will work:
myComparisonString ==~ /.*something costs \$100.*/
... the trouble being, of course, that the GString "compiler" (if that's its name) would recognise "$stuff" as an interpolated variable.

Add space before each Capitalized word (minus the 1st one) using Scala code

I'm new to Scala... so far I'm really liking it. :)
Right now I'm playing with Play Framework and I'm amazed by how straightforward it is to get going.
Well... the problem at hand is that I'd like to make a string as the following one more readable:
UsersGroupedByRegistrationMonthYear.csv
The output should be:
Users Grouped By Registration Month Year.csv
Can you lend a hand?
Not regex, but a pretty straightforward approach:
val str = "UsersGroupedByRegistrationMonthYear.csv"
// Mapping each Char to a String keeps the result a String, so trim is available
str.flatMap(c => if (c.isUpper) s" $c" else c.toString).trim
You can search using this regex with 2 capturing groups:
([a-z0-9])([A-Z])
and replace using this pattern:
$1 $2
RegEx Demo
Code:
repl = input.replaceAll("([a-z0-9])([A-Z])", "$1 $2");
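The same search-and-replace can be sketched in Python, where group references are written \1 and \2 instead of $1 and $2. The lookaround variant is an extra illustration and the variable names are hypothetical.

```python
import re

name = 'UsersGroupedByRegistrationMonthYear.csv'

# Capturing groups: re-emit both neighbours with a space between them
spaced = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', name)

# Lookarounds: match the empty position between them and insert a space there
spaced_lookaround = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', ' ', name)

print(spaced)
print(spaced_lookaround)
```

Because both patterns require a lowercase letter or digit before the capital, the first word is left alone and no trim is needed.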
One alternative is to use String.split with regex lookarounds to tokenize your string by caps without throwing them away and then to combine tokens back into a string with spaces between tokens:
val in = "UsersGroupedByRegistrationMonthYear.csv"
val out = in.split("(?=[A-Z])").mkString(" ")
println("\"%s\"\nbecomes\n\"%s\"".format(in, out))
This yields:
"UsersGroupedByRegistrationMonthYear.csv"
becomes
"Users Grouped By Registration Month Year.csv"

Trying to match a string in the format of domain\username using Lua and then mask the pattern with '#'

I am trying to match a string in the format of domain\username using Lua and then mask the pattern with #.
So if the input is sample.com\admin; the output should be ######.###\#####;. The string can end with either a ;, ,, . or whitespace.
More examples:
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
I tried ([a-zA-Z][a-zA-Z0-9.-]+)\.?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b which works perfectly with http://regexr.com/. But with Lua demo it doesn't. What is wrong with the pattern?
Below is the code I used to check in Lua:
test_text="I have the 123 name as domain.com\admin as 172.19.202.52 the credentials"
pattern="([a-zA-Z][a-zA-Z0-9.-]+).?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b"
res=string.match(test_text,pattern)
print (res)
It is printing nil.
Lua patterns aren't regular expressions, which is why your regex doesn't work.
\b isn't supported; you can use the more powerful %f frontier pattern if needed.
In the string test_text, the backslash isn't escaped, so \a is interpreted as an escape sequence.
. is a magic character in patterns, so it needs to be escaped.
This code isn't exactly equivalent to your pattern; you can tweak it if needed:
test_text = "I have the 123 name as domain.com\\admin as 172.19.202.52 the credentials"
pattern = "(%a%w+)%.?(%w+)\\([%w]+)"
print(string.match(test_text,pattern))
Output: domain com admin
After fixing the pattern, replacing the matches with # is easy; you might need string.sub or string.gsub.
As already mentioned, pure Lua does not have regex, only patterns.
However, your regex can be reproduced with the following code and pattern:
--[[
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
]]
s1 = [[sample.net\user1,hello]]
s2 = [[test.org\testuser. Next]]
s3 = [[abc.domain.org\user1]]
function mask_domain(s)
  s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
    function(a,b,c,d)
      return ('#'):rep(#a)..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
    end)
  return s
end
print(s1,'=>',mask_domain(s1))
print(s2,'=>',mask_domain(s2))
print(s3,'=>',mask_domain(s3))
The last example does not end with ;, ,, . or whitespace. If a match must be followed by one of these, simply remove the final ? from the pattern.
UPDATE: If the domain contains several dots (e.g. abc.domain.org) and you also need to keep the dots before the last one visible, you can replace the above function with this one:
function mask_domain(s)
  s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
    function(a,b,c,d)
      a = a:gsub('[^%.]','#')
      return a..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
    end)
  return s
end
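For readers comparing Lua patterns with a full regex engine, here is a rough Python sketch of the same masking. It is an assumption-laden illustration: the pattern is simplified (it does not check the trailing delimiter), and the names PATTERN and mask_domain are reused here only for readability.

```python
import re

# Domain part (letters, digits, dots, hyphens), a literal backslash, then the user name
PATTERN = re.compile(r'([A-Za-z][A-Za-z0-9.-]*)\\([A-Za-z0-9]+)')

def mask_domain(text):
    def mask(m):
        # Keep the dots in the domain visible, hide every other character
        domain = re.sub(r'[^.]', '#', m.group(1))
        user = '#' * len(m.group(2))
        return domain + '\\' + user
    return PATTERN.sub(mask, text)

print(mask_domain(r'sample.net\user1,hello'))
print(mask_domain(r'test.org\testuser. Next'))
```

Raw strings (r'...') are used for the inputs so that the backslash reaches the function unchanged, which mirrors the escaping issue with test_text in the question.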

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in pseudo-code and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?
Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which had no single-line mode before the s flag was added in ES2018):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, @"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);
A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
See it online: rubular
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
// ...
}
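Both approaches can be sketched side by side in Python. This is only an illustration; the variable names are hypothetical and re.DOTALL plays the role of the single-line option.

```python
import re

text = 'ABC 123 DEF'

# Approach 1: match the whole input and keep only the two captured parts
replaced = re.sub(r'^.*(ABC).*(DEF).*$', r'\1\2', text, flags=re.DOTALL)

# Approach 2: search once and concatenate the groups by hand
m = re.search(r'(ABC).*(DEF)', text, flags=re.DOTALL)
joined = m.group(1) + m.group(2) if m else None

print(replaced)
print(joined)
```

In both cases the engine still matches the text between ABC and DEF; it is simply left out of the result.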