Find guids between strings - regex

Problem:
I'm trying to get all matches of the form guid = guid. I expect to receive a collection of matches where one match looks like:
{9659BAE5-632F-4195-BD5D-414C1F2C1066} = {6E298F2A-129A-4491-B053-F12D67561572}
Specifically, I want to match only the guid = guid pairs between GlobalSection(NestedProjects) = preSolution and EndGlobalSection. There are other places in the file where guid = guid occurs.
Here is a data snippet:
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Client", "Applications\", "{297BE1A3-A6A3-4835-BB87-63B4B4E2AE0D}"
ProjectSection(ProjectDependencies) = postProject
{A459406A-94FF-4CA9-8183-C7472419CC7D} = {A459406A-94FF-4CA9-8183-C7472419CC7D}
EndProjectSection
EndProject
GlobalSection(NestedProjects) = preSolution
{3D84A2B1-536D-4953-B331-D86E421905E7} = {AF46FD2E-710D-49CD-A203-CB0F8B7EF415}
{02CB05EC-6902-417E-AD50-B3910B245B22} = {2F54A6F1-5D32-4673-8AEE-B845CC622D64}
{DE303EF0-E3B1-4BA9-8CB3-544D37D29576} = {2F54A6F1-5D32-4673-8AEE-B845CC622D64}
{5A095236-0EE1-4480-B7A6-833ECCFE4257} = {AF070137-227F-42F7-9487-00CB26C46E04}
{6CCA189C-0D45-4E80-8486-38AB3E625E69} = {AF070137-227F-42F7-9487-00CB26C46E04}
{EAE3152A-C003-4E39-BFB7-B4F7CACE1606} = {AF070137-227F-42F7-9487-00CB26C46E04}
{9659BAE5-632F-4195-BD5D-414C1F2C1066} = {6E298F2A-129A-4491-B053-F12D67561572}
EndGlobalSection
EndGlobal
What I've Tried:
Here's what I'm using to match guid = guid
{[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}} = {[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}}
This works fine except it doesn't discriminate against the location of the match, obviously. So I receive other matches from other parts of the file.
I've been trying to use a positive lookbehind like so (with many variations):
(?<=GlobalSection\(NestedProjects\) = preSolution(\r\n|.)+?)
Am I misusing the lookbehind or something else?

I tried the following regex and got correct results using your example.
(?<=GlobalSection\(NestedProjects\) = preSolution(\r\n|.)+?){[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}} = {[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}}
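Since this is C#, you can pass RegexOptions.Multiline, but note that it only changes how ^ and $ behave, so it has no effect on this pattern. The (\r\n|.) alternation is what lets the lookbehind cross line breaks; RegexOptions.Singleline with a plain . would do the same job.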
EDIT
I wrote a little test program with your data snippet. I doubled the snippet to make sure the pattern did not just match the first group after GlobalSection(NestedProjects) = preSolution.
The matches only included the GUIDs between GlobalSection(NestedProjects) = preSolution and EndGlobalSection, for both sections.
As I would expect, the line {A459406A-94FF-4CA9-8183-C7472419CC7D} = {A459406A-94FF-4CA9-8183-C7472419CC7D} was not in the matched results. I hope something in this code helps you out.
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Read the whole file into one string so the lookbehind can span lines.
        string input = System.IO.File.ReadAllText(@"c:\test\directory\test.txt");
        string pattern =
            @"(?<=GlobalSection\(NestedProjects\) = preSolution(\r\n|.)+?){[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}} = {[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}}";
        Regex re = new Regex(pattern, RegexOptions.Multiline);
        MatchCollection matches = re.Matches(input);
        // Each match is one "guid = guid" pair.
        foreach (Match match in matches)
        {
            Console.Write(match);
        }
    }
}
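For comparison, here is a minimal Python sketch of a two-pass alternative: first isolate each GlobalSection(NestedProjects) block, then collect the guid = guid pairs inside it. This is not the approach used above. Python's re module does not accept variable-length lookbehinds like the one in the C# pattern, so the sketch swaps in a section-then-pairs strategy. The function name nested_project_pairs is made up for illustration, and narrowing the character class to hex digits is an assumption about the GUID format.

```python
import re

# One GUID in braces; hex digits only (an assumption, the original uses A-Za-z0-9).
GUID = r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"

# Pass 1: capture the body of every NestedProjects section (DOTALL lets . cross lines).
SECTION = re.compile(
    r"GlobalSection\(NestedProjects\) = preSolution(.*?)EndGlobalSection",
    re.DOTALL,
)

# Pass 2: capture both sides of each "guid = guid" line.
PAIR = re.compile(rf"({GUID}) = ({GUID})")


def nested_project_pairs(text):
    """Return (child, parent) GUID tuples found inside NestedProjects sections only."""
    pairs = []
    for section in SECTION.finditer(text):
        pairs.extend(PAIR.findall(section.group(1)))
    return pairs
```

Because the section body ends at the first EndGlobalSection, pairs that appear elsewhere in the file (for example under ProjectSection) are never seen by the second pattern.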

Related

How to use TokenSequencePattern

I'm just getting started with CoreNLP's TokenSequencePattern and I can't get simple matches to work. All I'm trying to do is match a token from the input text. The code below executes without errors but doesn't match anything. However, if you change the match expression to [] then it matches the two sentences.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, parse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation("This is sent 1. And here is sent 2");
pipeline.annotate(document);
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);
TokenSequencePattern pattern = TokenSequencePattern.compile(env,"[ { word:\"sent\" } ]");
TokenSequenceMatcher matcher = pattern.getMatcher(sentences);
while ( matcher.find() ) {
System.out.println( matcher.group() );
}
Thank you!
// Match against the document's tokens rather than its sentences.
List<CoreLabel> tokens = document.get(CoreAnnotations.TokensAnnotation.class);
TokenSequencePattern pattern = TokenSequencePattern.compile("[ { word:\"sent\" } ]");
TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
while (matcher.find()) {
    String matchedString = matcher.group();
    List<? extends CoreMap> matchedTokens = matcher.groupNodes();
    System.out.println(matchedString + " " + matchedTokens);
}

Using Matcher to extract URL domain name

static String AdrPattern="http://www.([^&]+)\\.com\\.*";
static Pattern WebUrlPattern = Pattern.compile (AdrPattern);
static Matcher WebUrlMatcher;
WebUrlMatcher = WebUrlPattern.matcher ("keyword");
if(WebUrlMatcher.matches())
String extractedPath = WebUrlMatcher.group (1);
Given the code above, my aim is to extract the domain name from the URL and discard the rest. The trouble is that, first, if the URL has a deeper path, the path is not ignored, and second, it does not work for every URL with a .com extension.
For example, if the URL is http://www.lego.com/en-us/technic/?domainredir=technic.lego, the result will not be lego but lego.com/en-us/technic/?domainredir=technic.lego.
Use
static String AdrPattern="http://www\\.([^&]+)\\.com.*";
Two things changed: the dot after www is now escaped, and the final dot is no longer escaped. In your pattern the final dot was escaped (\\.*), so it was treated as a literal dot, and matches(), which must match the entire string, failed on any URL with something after .com.
Also, to make the regex a bit stricter, you can replace [^&]+ with [^/&]+.
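The difference between the two patterns can be checked with a short Python sketch. Here re.fullmatch plays the role of Java's Matcher.matches(), since both require the pattern to consume the whole input. The constant and function names are illustrative only, and the corrected pattern uses the stricter character class suggested above.

```python
import re

URL = "http://www.lego.com/en-us/technic/?domainredir=technic.lego"

# Pattern from the question: the trailing \.* can only match literal dots.
ORIGINAL = r"http://www.([^&]+)\.com\.*"
# Corrected pattern: escaped dot after www, unescaped .* to consume the rest.
FIXED = r"http://www\.([^/&]+)\.com.*"


def extract_domain(pattern, url):
    """Return group 1 if the pattern matches the entire URL, else None."""
    m = re.fullmatch(pattern, url)
    return m.group(1) if m else None
```

With these definitions, extract_domain(ORIGINAL, URL) finds nothing because the path after .com is not a run of dots, while extract_domain(FIXED, URL) yields the bare domain name.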
UPDATE:
static String AdrPattern="http://www\\.([^/&]+)\\.com/([^/]+)/([^/]+)/([^/]+).*";
static Pattern WebUrlPattern = Pattern.compile (AdrPattern);
static Matcher WebUrlMatcher = WebUrlPattern.matcher("http://www.lego.com/en-us/technic/?domainredir=technic.lego");
if(WebUrlMatcher.matches()) {
String extractedPath = WebUrlMatcher.group(1);
String extractedPart1 = WebUrlMatcher.group(2);
String extractedPart2 = WebUrlMatcher.group(3);
String extractedPart3 = WebUrlMatcher.group(4);
}
Or, with \G:
static String AdrPattern="(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)";
static Pattern WebUrlPattern = Pattern.compile (AdrPattern);
static Matcher WebUrlMatcher = WebUrlPattern.matcher("http://www.lego.com/en-us/technic/?domainredir=technic.lego");
int cnt = 0;
while(WebUrlMatcher.find()) {
if (cnt == 0) {
String extractedPath = WebUrlMatcher.group(1);
String extractedPart = WebUrlMatcher.group(2);
cnt = cnt + 1;
}
else {
String extractedPart = WebUrlMatcher.group(2);
}
}
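Python's standard re module has no \G anchor, so the last pattern cannot be ported there directly. As a rough alternative sketch (a different technique, not the one shown above), the standard urllib.parse module can split the URL first, leaving the regex to deal with the host name only. The function name split_url is hypothetical.

```python
import re
from urllib.parse import urlsplit


def split_url(url):
    """Return (domain, path_segments) for URLs shaped like http://www.NAME.com/..."""
    parts = urlsplit(url)
    # Only the host is matched by regex; hostname is None when the URL has no netloc.
    host = re.fullmatch(r"www\.([^.]+)\.com", parts.hostname or "")
    domain = host.group(1) if host else None
    # Drop empty strings produced by leading, trailing, or doubled slashes.
    segments = [s for s in parts.path.split("/") if s]
    return domain, segments
```

Unlike the regex versions, this keeps the query string (?domainredir=...) out of the path segments; it remains available separately as urlsplit(url).query.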

How to extract youtube video id with Regex.Match

I'm trying to extract the video ID from a YouTube URL using Regex.Match. For example, I have www.youtube.com/watch?v=3lqexxxCoDo and I want to extract only 3lqexxxCoDo.
Dim link_vids As Match = Regex.Match(url_comments.Text, "https://www.youtube.com/watch?v=(.*?)$")
url_v = link_vids.Value.ToString
MessageBox.Show(url_v)
How can I extract the video ID? Thanks!
Finally got the solution
Dim Str() As String
Str = url_comments.Text.Split("=")
url_v = Str(1)
Private Function getID(url As String) As String
    Try
        Dim myMatches As System.Text.RegularExpressions.Match 'Variable to hold the match
        'This is where the magic happens; it should work on all normal YouTube links, including youtu.be
        Dim MyRegEx As New System.Text.RegularExpressions.Regex("youtu(?:\.be|be\.com)/(?:.*v(?:/|=)|(?:.*/)?)([a-zA-Z0-9-_]+)", System.Text.RegularExpressions.RegexOptions.IgnoreCase)
        myMatches = MyRegEx.Match(url)
        If myMatches.Success Then
            Return myMatches.Groups(1).Value
        Else
            Return "" 'No match; something went wrong
        End If
    Catch ex As Exception
        Return ex.ToString
    End Try
End Function
This function will return just the video ID.
You can simply replace "www.youtube.com/watch?v=" with "" using String.Replace (documented on MSDN):
url.Replace("www.youtube.com/watch?v=","")
You can use this expression; I am using it in PHP.
function parseYtId($vid)
{
if (preg_match('%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $vid, $match)) {
$vid = $match[1];
}
return $vid;
}
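To see why the pattern in the question never matches, it helps to test it directly. In the Python sketch below (Python's re treats ? and . the same way .NET does here), the unescaped ? makes the preceding h optional instead of matching a literal question mark. The names are illustrative, and the escaped pattern only covers watch?v= links.

```python
import re

URL = "https://www.youtube.com/watch?v=3lqexxxCoDo"

# As written in the question: "watch?v=" means "watc", an optional "h", then "v=".
UNESCAPED = r"https://www.youtube.com/watch?v=(.*?)$"
# With the metacharacters escaped, "?" and "." are matched literally.
ESCAPED = r"https://www\.youtube\.com/watch\?v=(.*?)$"


def video_id(pattern, url):
    """Return capture group 1 (the ID) or None when the pattern does not match."""
    m = re.search(pattern, url)
    return m.group(1) if m else None
```

Note also that the question's code reads link_vids.Value, which is the whole match; the ID itself is in capture group 1 (Groups(1).Value in VB.NET).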

Regex- Get file name after last '\'

I have a file name like
C:\fakepath\CI_Logo.jpg.
I need a regex to get CI_Logo.jpg. I tried \\[^\\]+$, but it didn't work.
Below is my JavaScript code:
var regex="\\[^\\]+$";
var fileGet=$('input.clsFile').val();
var fileName=fileGet.match(regex);
alert(fileName);
Minimalist approach:
([\w\d_\.]+\.[\w\d]+)[^\\]
Use this (in Java):
String oldFileName = "slashed file name";
String[] fileNameWithPath = oldFileName.split("\\\\");
int pathLength = fileNameWithPath.length;
oldFileName = fileNameWithPath[pathLength-1];
I guess you can adapt this to other languages.
Edit:
Make sure you split on "\\\\" (four backslashes in the source, which the regex engine sees as one literal backslash).
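The same idea in a short Python sketch, assuming a Windows-style path: a regex anchored at the end of the string that grabs everything after the last backslash, next to the standard-library PureWindowsPath, which does this without any regex. Both function names are hypothetical.

```python
import re
from pathlib import PureWindowsPath


def file_name_regex(path):
    """Everything after the last backslash, via an end-anchored regex."""
    m = re.search(r"[^\\]+$", path)
    return m.group(0) if m else ""


def file_name_pathlib(path):
    """Same result using the standard library's Windows path parser."""
    return PureWindowsPath(path).name
```

In JavaScript, a regex literal such as /[^\\]+$/ avoids the extra layer of string escaping that the quoted pattern in the question runs into.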

Need the Groovy way to do partial file substitutions

I have a file that I need to modify. The part I need to modify (not the entire file) is similar to the properties shown below. The problem is that I only need to replace part of the "value", the "ConfigurablePart" if you will. I receive this file from elsewhere, so I cannot control its format.
alpha.beta.gamma.1 = constantPart1ConfigurablePart1
alpha.beta.gamma.2 = constantPart2ConfigurablePart2
alpha.beta.gamma.3 = constantPart3ConfigurablePart3
I made this work this way, though I know it is really bad!
def updateFile(String pattern, String updatedValue) {
def myFile = new File(".", "inputs/fileInherited.txt")
StringBuffer updatedFileText = new StringBuffer()
def ls = System.getProperty('line.separator')
myFile.eachLine{ line ->
def regex = Pattern.compile(/$pattern/)
def m = (line =~ regex)
if (m.matches()) {
def buf = new StringBuffer(line)
buf.replace(m.start(1), m.end(1), updatedValue)
line = buf.toString()
}
println line
updatedFileText.append(line).append(ls)
}
myFile.write(updatedFileText.toString())
}
The passed in pattern is required to contain a group that is substituted in the StringBuffer. Does anyone know how this should really be done in Groovy?
EDIT -- to define the expected output
The file that contains the example lines needs to be updated such that the "ConfigurablePart" of each line is replaced with the updated text provided. For my ugly solution, I would need to call the method 3 times, once to replace ConfigurablePart1, once for ConfigurablePart2, and finally for ConfigurablePart3. There is likely a better approach to this too!!!
UPDATED -- Answer that did what I really needed
In case others ever hit a similar issue, the groovy code improvements I asked about are best reflected in the accepted answer. However, for my problem that did not quite solve my issues. As I needed to substitute only a portion of the matched lines, I needed to use back-references and groups. The only way I could make this work was to define a three-part regEx like:
(.*)(matchThisPart)(.*)
Once that was done, I was able to use:
it.replaceAll(~/$pattern/, "\$1$replacement\$3")
Thanks to both replies - each helped me out a lot!
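The three-group idea can be sketched in Python as follows. This only illustrates the technique and is not the Groovy code from the update. The name replace_middle is made up, and a function is used as the replacement so that any special characters in the new text are inserted literally.

```python
import re


def replace_middle(line, pattern, replacement):
    """Replace group 2 of a three-group pattern, keeping groups 1 and 3 as they are.

    The pattern is assumed to have the shape (prefix)(partToReplace)(suffix).
    """
    return re.sub(
        pattern,
        lambda m: m.group(1) + replacement + m.group(3),
        line,
    )
```

For example, calling it with the pattern (.*)(ConfigurablePart)(.*) on one of the sample lines swaps only the middle part and leaves lines without a match unchanged.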
It can be made more flexible by passing a closure as an argument. Here is how this can be done:
//abc.txt
abc.item.1 = someDummyItem1
abc.item.2 = someDummyItem2
abc.item.3 = someDummyItem3
alpha.beta.gamma.1 = constantPart1ConfigurablePart1
alpha.beta.gamma.2 = constantPart2ConfigurablePart2
alpha.beta.gamma.3 = constantPart3ConfigurablePart3
abc.item.4 = someDummyItem4
abc.item.5 = someDummyItem5
abc.item.6 = someDummyItem6
Groovy code:
//Replace the pattern in file and write to file sequentially.
def replacePatternInFile(file, Closure replaceText) {
file.write(replaceText(file.text))
}
def file = new File('abc.txt')
def patternToFind = ~/ConfigurablePart/
def patternToReplace = 'NewItem'
//Call the method
replacePatternInFile(file){
it.replaceAll(patternToFind, patternToReplace)
}
println file.getText()
//Prints:
abc.item.1 = someDummyItem1
abc.item.2 = someDummyItem2
abc.item.3 = someDummyItem3
alpha.beta.gamma.1 = constantPart1NewItem1
alpha.beta.gamma.2 = constantPart2NewItem2
alpha.beta.gamma.3 = constantPart3NewItem3
abc.item.4 = someDummyItem4
abc.item.5 = someDummyItem5
abc.item.6 = someDummyItem6
Check the file abc.txt to confirm. I have not used an updateFile() method as you did, but you can easily parameterize it as below:
def updateFile(file, patternToFind, patternToReplace){
replacePatternInFile(file){
it.replaceAll(patternToFind, patternToReplace)
}
}
For a quick answer I'd just go this route:
// Map each pattern to its replacement text (the entries below are placeholders).
def patterns = [pattern1 : 'constantPart1ConfigurablePart1',
                pattern2 : 'constantPart2ConfigurablePart2',
                pattern3 : 'constantPart3ConfigurablePart3']
def myFile = new File(".", "inputs/fileInherited.txt")
StringBuffer updatedFileText = new StringBuffer()
def ls = System.getProperty('line.separator')
myFile.eachLine { line ->
    // Apply every pattern/replacement pair to the current line.
    patterns.each { pattern, replacement ->
        line = line.replaceAll(pattern, replacement)
    }
    println line
    updatedFileText.append(line).append(ls)
}
myFile.write(updatedFileText.toString())