Match ${variable} occurrences in a text using regex - regex

I am currently working on an application where I need to find all occurrences of strings like ${[0-9-a-zA-Z]} in a larger string. Here is my method:
def countVariables(str) {
def pattern = ~'${sss}'
def matcher = str =~ pattern
print matcher.count
}
Now the problem.
When I pass a string like "asidb ${sss} asodniasndin", I get:
groovy.lang.MissingPropertyException: No such property: sss for class: ConsoleScript83
I think that, given that in Groovy ${} are properties, I'm having these conflicts.
In this case, would I have to scan the whole text for the dollar sign and replace it with something else? Or is there a simpler way to do this?
Regards!

Are you passing the string in single quotes, so that Groovy doesn't perform the expansion and just gives you a plain string?
That is:
countVariables( 'asidb ${sss} asodniasndin' )
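For comparison, the matching step itself can be sketched in Python, where `${...}` has no special meaning inside a string literal. This is only an illustration, assuming variable names consist of letters, digits, and hyphens as in the question; `count_variables` is a hypothetical name. Note that `$`, `{`, and `}` are escaped in the pattern because they are regex metacharacters.

```python
import re

# Assumed shape of a variable: ${name}, where name is letters, digits, hyphens.
VARIABLE = re.compile(r"\$\{[0-9a-zA-Z-]+\}")

def count_variables(text):
    """Return how many ${name} placeholders occur in text."""
    return len(VARIABLE.findall(text))

print(count_variables("asidb ${sss} asodniasndin"))
```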

Related

RegEx. Get the value from the quotes and check for the attribute name [duplicate]

What would be a quick way to extract the value of the title attributes for an HTML table:
...
<li>Proclo</li>
<li>Proclus</li>
<li>Ptolemy</li>
<li>Pythagoras</li></ul><h3>S</h3>
...
so it would return Proclo, Proclus, Ptolemy, Pythagoras,.... in strings for each line. I'm reading the file using a StreamReader. I'm using C#.
Thank you.
This C# regex will find all title values:
(?<=\btitle=")[^"]*
The C# code is like this:
Regex regex = new Regex(@"(?<=\btitle="")[^""]*");
Match match = regex.Match(input);
string title = match.Value;
The regex uses positive lookbehind to find the position where the title value starts. It then matches everything up to the ending double quote.
Use the regexp below
title="([^"]+)"
and then use Groups to browse through matched elements.
EDIT: I have modified the regexp to cover the examples provided in the comment by @Staffan Nöteberg
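The capture-group approach can be sketched in Python as follows. The input markup here is hypothetical, since the question's original HTML (with its title attributes) was not preserved; the pattern is the one from the answer, relaxed to `[^"]*` so that empty titles also match.

```python
import re

# Hypothetical input: the real markup from the question is not available.
html = (
    '<li><a href="/wiki/Proclus" title="Proclus">Proclus</a></li>\n'
    '<li><a href="/wiki/Ptolemy" title="Ptolemy">Ptolemy</a></li>'
)

# With exactly one capture group, findall returns the group contents.
TITLE = re.compile(r'\btitle="([^"]*)"')
titles = TITLE.findall(html)
print(titles)
```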

Regex to replace text between slash and colon in xpath

I have xpaths such as "/name:ABC/dep:HR/eid:123". The input string is in this format, and I expect the output to be "/ABC/HR/123".
Please share your thoughts on how to do this with a regex pattern in Scala or Java.
See regex in use here
(?<=/)[^:]*:
See code in use here
object Main extends App {
val xpath = "/name:ABC/dep:HR/eid:123"
val regex = "(?<=/)[^:]*:".r
println(regex.replaceAllIn(xpath, ""))
}
Results in /ABC/HR/123
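The same lookbehind idea, sketched in Python for comparison. `strip_prefixes` is a hypothetical name, and the character class is `[^:/]*` rather than `[^:]*`, a small variation assumed here so that a segment without a prefix is left alone.

```python
import re

def strip_prefixes(xpath):
    # Remove each "label:" that directly follows a slash.
    # The lookbehind (?<=/) checks for the slash without consuming it.
    return re.sub(r"(?<=/)[^:/]*:", "", xpath)

print(strip_prefixes("/name:ABC/dep:HR/eid:123"))
```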

Perform Regex on value returned by Regex

This is probably straightforward but I'm not even sure which phrase I should google to find the answer. Forgive my noobiness.
I've got strings (filenames) that look like this:
site12345678_date20160912_23001_to_23100_of_25871.txt
What this naming convention means is "Records 23001 through 23100 out of 25871 for site 12345678 for September 12th 2016 (20160912)"
What I want to do is extract the date part (those digits between _date and the following _)
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912. But what I'm actually looking for is just 20160912. Obviously, [0-9]{8} on its own doesn't give me what I want in this case, because that could be confused with the site, or potentially record counts.
How can I responsibly accomplish this sort of 'substringing' with a single regular expression?
You just need to shift your parentheses so that the capture group no longer includes '_date'. Then you would look at capture group #1:
If done in python, for example, it would look something like:
import re
regex = r'.*_date([0-9]{8}).*'
filename = 'site12345678_date20160912_23001_to_23100_of_25871.txt'
m = re.match(regex, filename)
print(m.group(0))  # the whole string
print(m.group(1))  # the string you are looking for: '20160912'
See it in action here: https://eval.in/641446
The Regex: .*(_date[0-9]{8}).* will return the string _date20160912.
That means you are using the regex in a method that requires a full string match, and you can access Group 1 value. The only thing you need to change in the regex is the capturing group placement:
.*_date([0-9]{8}).*
       ^^^^^^^^^^
See the regex demo.
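When the regex API allows partial matches, the surrounding `.*` can be dropped. A minimal Python sketch, assuming the date is always followed by an underscore as in the sample filename; `extract_date` is a hypothetical helper name.

```python
import re

def extract_date(filename):
    # Search anywhere in the name for "_date" followed by 8 digits
    # and an underscore; only the digits are captured.
    m = re.search(r"_date(\d{8})(?=_)", filename)
    return m.group(1) if m else None

print(extract_date("site12345678_date20160912_23001_to_23100_of_25871.txt"))
```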

Using backslashes in Groovy

I'm using groovy to write a script that replaces UNC server names and a part of the directory structure. I have the following:
def patternToFind = /\\\\([a-zA-Z0-9-]+)\\share\\([a-zA-Z]+)/
def patternToReplace = '\\\\\\\\SHARESERVER\\\\share\\\\OPS'
This works, but all those \'s are pretty ugly. I understand in the regex why \\\\ is used to find \\, but what is confusing me is why in the replacement I'm doing I have to use four \'s to equal one \.
If anyone has a nicer way to do this I would greatly appreciate it. The goal is to replace
\\<server>\share\<env>
with the correct value for <server> and <env>
Thanks!
EDIT: I guess I should clarify. SHARESERVER and OPS are actually variables. So truly the end result would be something like:
def serverName = //some passed in server
def env = //some passed in env
def patternToFind = /\\\\([a-zA-Z0-9-]+)\\NAS\\([a-zA-Z]+)/
def patternToReplace = '\\\\\\\\' + serverName + '\\\\share\\\\' + env
So the only way I think of doing it is building a string literal to replace the section I'm looking for with.
And I'll be the first to admit that I suck at reg ex, so if you can use them to capture a value in a string and replace just that value with another, I'm all ears.
Doesn't it work with a dollar-slashy string?
def patternToReplace = $/\\SHARESERVER\share\OPS/$
If you want to use a literal replacement string (as opposed to one that involves $n backreferences) with a regular expression in Java then the safest thing to do is use Matcher.quoteReplacement:
import java.util.regex.Matcher
def patternToReplace = Matcher.quoteReplacement(/\\SHARESERVER\share\OPS/)
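Python's `re.sub` has the same double-escaping issue, because a replacement string is also scanned for backslash escapes. A rough analogue of the quoting approach, sketched here under the assumption that the server and environment arrive as plain strings, is to pass a function as the replacement: its return value is inserted literally. `retarget` and its parameters are hypothetical names.

```python
import re

# Matches \\<server>\share\<env>; each literal backslash is written as \\.
UNC = re.compile(r"\\\\([a-zA-Z0-9-]+)\\share\\([a-zA-Z]+)")

def retarget(path, server_name, env):
    # Build the literal replacement: \\<server_name>\share\<env>
    replacement = "\\\\" + server_name + "\\share\\" + env
    # A callable replacement is not scanned for escapes or backreferences.
    return UNC.sub(lambda m: replacement, path)

print(retarget(r"\\oldhost\share\DEV\reports", "SHARESERVER", "OPS"))
```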

Article spinner with 2 tiers

I made an article spinner that used regex to find words in this syntax:
{word1|word2}
And then split them up at the "|", but I need a way to make it support tier 2 brackets, such as:
{{word1|word2}|{word3|word4}}
What my code does when presented with such a line, is take "{{word1|word2}" and "{word3|word4}", and this is not as intended.
What I want is when presented with such a line, my code breaks it up as "{word1|word2}|{word3|word4}", so that I can use this with the original function and break it into the actual words.
I am using c#.
Here is pseudocode for how it might look:
Check string for regex match to "{{word1|word2}|{word3|word4}}" pattern
If found, store each one as "{word1|word2}|{word3|word4}" in MatchCollection (mc1)
Split the word at the "|" but not the one inside the brackets, and select a random one (aka, "{word1|word2}" or "{word3|word4}")
Store the new results aka "{word1|word2}" and "{word3|word4}" in a new MatchCollection (mc2)
Now search the string again, this time looking for "{word1|word2}" only and ignore the double "{{" "}}"
Store these in mc2.
I can not split these up normally
Here is the regex I use to search for "{word1|word2}":
Regex regexObj = new Regex(@"\{.*?\}", RegexOptions.Singleline);
MatchCollection m = regexObj.Matches(originalText); //How I store them
Hopefully someone can help, thanks!
Edit: I solved this using a recursive method. I was building an article spinner btw.
That is not parsable using a regular expression; instead, you have to use a recursive descent parser. Map it to JSON by replacing:
{ with [ and } with ]
| with ,
wordX with "wordX" (regex \w+)
Then your input
{{word1|word2}|{word3|word4}}
becomes valid JSON
[["word1","word2"],["word3","word4"]]
and will map directly to PHP arrays when you call json_decode.
In C#, the same should be possible with JavaScriptSerializer.
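The mapping can be sketched in Python with the standard `json` module. This assumes, as the answer does, that every alternative is a single `\w+` word (no spaces or punctuation); `spin_to_nested` is a hypothetical name.

```python
import json
import re

def spin_to_nested(text):
    # Quote each word first, then swap the delimiters for JSON syntax.
    quoted = re.sub(r"\w+", lambda m: '"' + m.group(0) + '"', text)
    as_json = quoted.replace("{", "[").replace("}", "]").replace("|", ",")
    return json.loads(as_json)

print(spin_to_nested("{{word1|word2}|{word3|word4}}"))
```

The result is a nested list, so choosing an alternative at each level becomes ordinary list handling.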
I'm really not completely sure WHAT you're asking for, but I'll give it a go:
If you want to get {word1|word2}|{word3|word4} out of any occurrence of {{word1|word2}|{word3|word4}} but not {word1|word2} or {word3|word4}, then use this:
@"\{(\{[^}]*\}\|\{[^}]*\})\}"
...which will match {{word1|word2}|{word3|word4}}, but with {word1|word2}|{word3|word4} in the first matching group.
I'm not sure if this will be helpful or even if it's along the right track, but I'll try to check back every once in a while for more questions or clarifications.
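To see what that pattern captures, here is the same expression tried in Python. The sample text is made up for illustration, and the pattern only covers exactly two inner groups, as in the question's example.

```python
import re

# Outer braces wrap two brace groups separated by "|"; group 1 is the inside.
TWO_TIER = re.compile(r"\{(\{[^}]*\}\|\{[^}]*\})\}")

m = TWO_TIER.search("x {{word1|word2}|{word3|word4}} y")
inner = m.group(1)
print(inner)

# The inner groups can then be pulled apart without splitting on the
# "|" characters that sit inside the braces.
parts = re.findall(r"\{[^}]*\}", inner)
print(parts)
```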
s = "{Spinning|Re-writing|Rotating|Content spinning|Rewriting|SEO Content Machine} is {fun|enjoyable|entertaining|exciting|enjoyment}! try it {for yourself|on your own|yourself|by yourself|for you} and {see how|observe how|observe} it {works|functions|operates|performs|is effective}."
print spin(s)
If you want to use the [square|brackets|syntax] use this line in the process function:
'/\[(((?>[^\[\]]+)|(?R))*)\]/x',
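The `spin` function that the fragment above calls is not included in this text. As a stand-in, here is a minimal Python sketch that handles nested groups by resolving the innermost `{...}` groups repeatedly until none remain, which is an iterative alternative to a recursive pattern. The `rng` parameter is an assumption added so the choice can be controlled.

```python
import random
import re

# An innermost group contains no further braces.
INNERMOST = re.compile(r"\{([^{}]*)\}")

def spin(text, rng=random):
    while True:
        # Replace every innermost group with one of its alternatives.
        text, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), text
        )
        if count == 0:
            return text

print(spin("{{word1|word2}|{word3|word4}} is {fun|enjoyable}!"))
```

Each pass strips one level of nesting, so `{{word1|word2}|{word3|word4}}` first becomes something like `{word1|word4}` and then a single word.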