RegExp multipart phrase - regex

I have this text (not a RegEx, just plain text in my text file): \abc{xyz}{uvw}, and I'd like to change it to: xyz : uvw. xyz and uvw can vary, but \abc{}{} is constant.
Is there any way to replace the first phrase with the second? It occurs 380 times in my file, and changing it manually would take me an hour. I also have other similar phrases with names other than abc.
A RegEx for the replacement would be a big help. Could you help me write it?
PS. In case you're wondering, these phrases are user-defined LaTeX commands.
The RegEx flavor doesn't matter. I just need to correct the text (it's a lecture by my math professor), but I don't have much time.

You should be able to use the following:
match \\abc{([^}]*)}{([^}]*)}
replace by \1 : \2
You can try it here.
[^}]* matches any run of characters other than }, so it matches the content of a pair of braces without running past the closing brace.
Aside from that, we escape the \ to match it literally, and put the content of each pair of braces in its own capturing group, which we refer to in the replacement pattern.
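For anyone who wants to try this outside an editor, here is a minimal sketch in Python's re module. Python is my choice here, not something the question asked for, and the name expand_abc and the sample sentence are made up for illustration. The braces are escaped to be on the safe side.

```python
import re

# \abc followed by two brace groups; each group's content is captured.
ABC_PATTERN = re.compile(r"\\abc\{([^}]*)\}\{([^}]*)\}")


def expand_abc(text: str) -> str:
    # Replace every \abc{X}{Y} in text with "X : Y".
    return ABC_PATTERN.sub(r"\1 : \2", text)


sample = r"Let \abc{xyz}{uvw} hold, and also \abc{a}{b}."
print(expand_abc(sample))  # Let xyz : uvw hold, and also a : b.
```

For the other commands mentioned in the question, replacing abc in the pattern with the other command name should be enough, as long as the arguments never contain nested braces.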

Related

Regex to match word that contains exact word

I'm trying to filter specific files in Java by their names using Regex.
Idea being a lot of files are called SomethingSupport.java, AnotherSupport.java, MoreThingsSupport.java, so as they all have the "Support.java" I was trying to do:
[Support.java]
But of course that's meant for characters so it will filter S,u,p,o, etc... Looking through RegExr I've tried:
(Support.java)
But that matches every occurrence of "Support.java", whereas I'm trying to match ThingsSupport.java, SomethingSupport.java, etc., not Support.java on its own.
Parentheses just group things. There is no difference between what regex "Support.java" and regex "(Support.java)" would match.
Note that . is regexpese for any character, so, e.g. Supportxjava, unlikely as it is that you have a file with that name in your source base, would match too. \. is regexpese for "an actual dot, please".
I think you're looking for the regex .*Support.*\.java. Which means: Absolutely anything (0 or more any character), followed by the string Support, followed by absolutely anything again, followed by .java.
That would find e.g. FooSupportBar.java, Support.java, HelloSupport.java, and SupportHello.java. It wouldn't find anything that doesn't end in .java.
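A quick way to check these claims is the sketch below. It uses Python rather than Java purely for brevity; re.fullmatch requires the whole name to match, which is how Java's String.matches behaves as well. The file names and the STRICT variant are assumptions added for illustration.

```python
import re

SUPPORT_FILE = re.compile(r".*Support.*\.java")
# Hypothetical variant: requires at least one character before "Support"
# and nothing between "Support" and ".java".
STRICT = re.compile(r".+Support\.java")

names = [
    "FooSupportBar.java",
    "Support.java",
    "HelloSupport.java",
    "SupportHello.java",
    "Support.txt",
    "Helper.java",
]

matching = [n for n in names if SUPPORT_FILE.fullmatch(n)]
strict_matching = [n for n in names if STRICT.fullmatch(n)]
print(matching)         # the first four names
print(strict_matching)  # ['HelloSupport.java']

# An unescaped dot matches any character; an escaped dot does not.
print(bool(re.fullmatch(r"Support.java", "Supportxjava")))   # True
print(bool(re.fullmatch(r"Support\.java", "Supportxjava")))  # False
```

If a bare Support.java should be excluded, as the question seems to want, something like the STRICT variant may be closer to the goal.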

Select word with regex when previous words are specific (and sometimes variable)

I am trying to highlight (or find) any word that is preceded by another word, in this case define, and then another specific word (as) only when define is present, and so on. Basically, I need to find words whose match depends on other regex matches, while still targeting each word independently.
For example, having the following string:
define MyFile as File
In that case, define is searched using the regex statement \b-?define\b. I also need to find MyFile if it is preceded directly by define. Plus, as needs to be found as well only if it is preceded directly by a word, in this case MyFile, which is preceded by define, and this goes on and on.
How can this be done? I have messed around quite a bit to find how to highlight MyFile correctly, without any success. As for the specific recursive search of as and File, I am clueless.
Keep in mind that all the regex expressions must be separate, since I will use this as a Sublime Text custom syntax highlight match finder.
define\s([\w]+)\sas\s([\w]+)$
This regex captures the word after define and the word after as, each separated from its keyword by a single whitespace character.
Check this regex: https://regex101.com/r/aQ0yO0/2
Without knowing what the data looks like, this is a naive way of doing it, but it's pretty intuitive. It doesn't use regex, though; the other answers show good ways to do it with regex.
seq = "word1 defined as blah blahh blahhh word2 defined as hello helloo"
words_of_interest = []
list_of_words = seq.split(" ")
for i, word in enumerate(list_of_words):
    # Collect the word that immediately precedes each "defined".
    if word == "defined" and i > 0:
        words_of_interest.append(list_of_words[i - 1])
print(words_of_interest)
# ['word1', 'word2']
The regular expression is always going to encompass the "define" as well. The trick is to use capture groups and refer to them afterwards. The specific way how to do this depends on the "flavor" of your regex.
As I'm not familiar with Sublime's regex, I'm just going to present an example in sed:
$ sed -e 's/define \([A-Za-z]*\)/include \1/g' <<< "define MyFile as File"
include MyFile as File
This example replaces all "define"s with "include"s - and adds whatever was captured by what's inside the group (the regex [A-Za-z]* in this case). Not too useful, but hopefully explanatory :)
The capture group is denoted by the escaped parentheses, and (in sed) referenced by the escaped number (representing the index) of the group.
I believe it's capture groups as a concept that you're looking for, rather than any specific regex.
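To make the idea concrete outside sed, here is a rough Python sketch. It is not Sublime Text syntax; it only shows how one pattern can expose each word separately through its groups. The variable names are made up for illustration.

```python
import re

line = "define MyFile as File"

# Same substitution as the sed example: keep the captured name, swap the keyword.
replaced = re.sub(r"define ([A-Za-z]*)", r"include \1", line)
print(replaced)  # include MyFile as File

# One pattern, two groups: each group reports its own text and position.
m = re.search(r"define\s(\w+)\sas\s(\w+)$", line)
if m:
    print(m.group(1), m.span(1))  # MyFile (7, 13)
    print(m.group(2), m.span(2))  # File (17, 21)
```

The spans are what a highlighter would need: they locate MyFile and File individually even though the regex matched the whole line.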

RegEx "replace all but" for Notepad++ v6.3

First-timer here, relatively inexperienced with RegEx and Notepad++. What I am trying to do is replace everything but the policy numbers in these two firewall sessions. Mind you, I have multiple lists, each 700+ lines long, so I want to replace everything in one pass, leaving just the policy number on each line.
id 1978781/s23,vsys 0,flag 00200440/4000/0003,policy 4332,time 5972, dip 0 module 0
id 1997645/s23,vsys 0,flag 00200440/4000/0003,policy 30562,time 6283, dip 0 module 0
There are thousands of different policy numbers, so a simple search won't do.
I would like my lines to look like this after a replace.
4332
30562
After two hours of trying to learn RegEx for this one problem, I realized it's more involved than I expected, and I need to spend time learning it since it's a very powerful tool. This could really save a lot of time, which unfortunately I don't have at the moment. I'm looking forward to learning more about RegEx and appreciate any help or direction you could give me.
Given the fact the lines always look the same you can use the following
^.+policy (\d+).+$
Replace by : $1
The dot is a wildcard, so .+ means match everything before the word "policy ". Then match a group of digits (\d+ matches one or more digits) and save them (that's what the parentheses are for in many regex engines). Then match all the characters up to the end of the line.
The ^ character means start of line. The $ means end of line.
You can try the following:
Find:
^.*policy ([0-9]+).*$
Replace with:
\1
Why does this work?
The dot matches any character, and the star means "zero or more of" whatever precedes it. This means that .* matches any run of characters on a line.
What you want is to match everything before and after the policy and erase it, and keep just the policy number, so between your "everything" matchers you look for the string "policy xxxxx", where xxxxx is a number.
Each term surrounded by parentheses in your regex is saved to be used in the replacement. I put parentheses around the number matcher, [0-9]+, and then used what was matched in the replace part with \1. If your regex contains several parenthesized parts, you can get them with \1, \2, \3...
Regexes are really powerful, you should read a tutorial about them to learn what they can offer.
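The same find/replace can be sketched in Python for anyone who would rather script it than use Notepad++. This is only an illustration: the variable names are made up, and the re.MULTILINE flag is needed here so that ^ and $ apply to every line rather than to the whole text.

```python
import re

log = (
    "id 1978781/s23,vsys 0,flag 00200440/4000/0003,policy 4332,time 5972, dip 0 module 0\n"
    "id 1997645/s23,vsys 0,flag 00200440/4000/0003,policy 30562,time 6283, dip 0 module 0"
)

# Whole line in, captured policy number out.
POLICY_LINE = re.compile(r"^.*policy ([0-9]+).*$", re.MULTILINE)

policies_only = POLICY_LINE.sub(r"\1", log)
print(policies_only)
# 4332
# 30562
```

Lines that do not contain "policy " followed by digits are left untouched by this substitution, which is worth keeping in mind for 700+ line files.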

Struggling with regex in yahoo pipe

I'm using Yahoo Pipes to build a scraper that would scrape our company micro-site via xPath and generate an RSS feed that I can then embed on the main site.
So far I got as far as scraping the Job title and location from the page but I can't get the items to link out to the micro-site.
Here's my pipe so far: http://pipes.yahoo.com/pipes/pipe.info?_id=2bb5b8fedd0064b64d0e8861e3fc8fd5
I think I need to extract the href link from each node and then apply regex but I really can't get my head around it.
The link looks like this in the code: www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment('a960c93a-11fe-4751-bc27-83a48429c3ba',%20'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');
But I'm struggling to generate a regex that would basically do this:
www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment('a960c93a-11fe-4751-bc27-83a48429c3ba',%20'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');
So I'm stuck on how to extract a link and then how to build that on to the pipe. Any help or nudge in the right direction would be really appreciated.
Here you go:
http://pipes.yahoo.com/pipes/pipe.info?_id=d564b802185d5777d757ed4189470941
I used slightly less complicated code in the Regex module. It is often easier to erase the text you do not want than to extract it and assign it to a variable.
in plx.link.href find this-> JavaScript(.+)Jobs replace with->jobs
in plx.link.href find this-> \'\); replace with->leave blank
The trailing bit of code '); needs a backslash before the ) because ) is a regex metacharacter; the backslash makes the regex read it literally as a text character. Escaping the ' as well is harmless but usually not required.
This bit of regex, a(.+?)b, means match or grab everything between a and b, and it comes in handy for this sort of thing a lot.
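The two replacements can be checked outside Yahoo Pipes with a short Python sketch. This only reproduces the find/replace pairs on the sample link from the question; it says nothing about how the Pipes Regex module itself is wired up.

```python
import re

href = (
    "www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment("
    "'a960c93a-11fe-4751-bc27-83a48429c3ba',"
    "%20'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');"
)

# Rule 1: erase everything from "JavaScript" through "Jobs", leaving "jobs".
step1 = re.sub(r"JavaScript(.+)Jobs", "jobs", href)
# Rule 2: erase the trailing '); left over from the JavaScript call.
link = re.sub(r"\'\);", "", step1)

print(link)
# www2.jobs.badenochandclark.ch/jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba
```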
Full-fledged URL-parsing isn't simple, but given enough constraints it becomes manageable.
For example, if you know
that JavaScript:OpenAssignment( always follows a /,
that the first argument is always a hexadecimal+dashes string in quotes,
that the second argument (at least the portion you need) is also in quotes,
and that you can discard the remainder of URL after the "function,"
then something like this might be a starting point:
\/JavaScript:OpenAssignment\([^'"]*['"][0-9a-fA-F\-]+['"][^,)]*,[^'")]*['"][^'"]*?([0-9a-fA-F\-]+)['"].*
Then, $1 would contain the match you desire to keep. The explanation follows.
\/                          Slashes need to be escaped (usually).
JavaScript:OpenAssignment   Our function of interest.
\(                          Parentheses need to be escaped too.
[^'"]*                      We're looking for a quote next, so ignore any
                            string of non-quotes, e.g. %20.
['"]                        A quote character.
[0-9a-fA-F\-]+              A hexadecimal-and-dashes string.
['"]                        A quote character.
[^,)]*                      We're looking for a comma next, so ignore any
                            string of non-commas, e.g., again, %20.
,                           A comma character.
[^'")]*                     We're looking for a quote again, so ignore any
                            string of non-quotes, e.g. %20.
['"]                        A quote character.
[^'"]*?                     Lazily skip any path prefix inside the quotes,
                            e.g. /Jobs/Details/.
([0-9a-fA-F\-]+)            A hexadecimal-and-dashes string, this time captured.
['"]                        A quote character.
.*                          The rest of the string that we don't care about.
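As a rough check of this approach against the sample link from the question, here is a Python sketch in the same spirit. It includes a lazy [^'"]*? so that a path prefix such as /Jobs/Details/ in front of the ID is skipped, and the way the final link is rebuilt (host part plus /Jobs/Details/ plus the ID) is my assumption, not something the question confirms.

```python
import re

OPEN_ASSIGNMENT = re.compile(
    r"""
    /JavaScript:OpenAssignment\(      # the function of interest, after a slash
    [^'"]* ['"] [0-9a-fA-F-]+ ['"]    # first argument: quoted hex-and-dashes ID
    [^,)]* ,                          # anything up to the comma, then the comma
    [^'")]* ['"]                      # skip to the opening quote of argument two
    [^'"]*?                           # lazily skip a path prefix (assumption)
    ([0-9a-fA-F-]+)                   # capture the trailing ID
    ['"]                              # closing quote
    """,
    re.VERBOSE,
)

href = (
    "www2.jobs.badenochandclark.ch/JavaScript:OpenAssignment("
    "'a960c93a-11fe-4751-bc27-83a48429c3ba',"
    "%20'/Jobs/Details/a960c93a-11fe-4751-bc27-83a48429c3ba');"
)

m = OPEN_ASSIGNMENT.search(href)
job_id = m.group(1)
# Hypothetical reconstruction: text before the match + details path + ID.
link = href[: m.start()] + "/Jobs/Details/" + job_id
print(job_id)
print(link)
```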

Remove stuff, retrieve numbers, retrieve text with spaces in place of dots, remove the rest

This is my first question, so I hope I didn't mess too much with the title and the formatting.
I have a bunch of files a client of mine sent me in this form:
Name.Of.Chapter.021x212.The.Actual.Title.Of.the.Chapter.DOC.NAME-Some.stuff.Here.ext
What I need is a regex to output just:
212 The Actual Title Of the Chapter
I'm not gonna use it with any script language in particular; it's a batch renaming of files through an app supporting regex (which already "preserves" the extension).
So far, all I was able to do was this:
/.*x(\d+)\.(.*?)\.[A-Z]{3}.*/ -->REPLACE: $1 $2
(Capture everything before a number preceded by an "x", group numbers after the "x", group everything following until a 3-letter uppercase word is met, then capture everything that follows)
which gives me back:
212 The.Actual.Title.Of.the.Chapter
Having seen the result I thought that something like:
/.*x(\d+)\.([^.]*?)\.[A-Z]{3}.*/ -->REPLACE: $1 $2
(Changed second group to "Capture everything which is not a dot...") would have worked as expected.
Instead, the whole regex fails to match completely.
What am I missing?
TIA
ciao
ale
.*x(\d+)\. matches Name.Of.Chapter.021x212.
\.[A-Z]{3}.* matches .DOC.NAME-Some.stuff.Here.ext
But ([^.]*?) does not match The.Actual.Title.Of.the.Chapter because this regex does not allow for any periods at all.
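The difference between the two attempts can be seen directly with a small Python sketch. Python is used only for demonstration here; the renaming app's regex flavor may differ in details.

```python
import re

name = "Name.Of.Chapter.021x212.The.Actual.Title.Of.the.Chapter.DOC.NAME-Some.stuff.Here.ext"

# First attempt: (.*?) may contain dots, so it can span the whole title.
lazy_any = re.search(r".*x(\d+)\.(.*?)\.[A-Z]{3}.*", name)
# Second attempt: ([^.]*?) cannot cross a dot, so it can only hold "The",
# and what follows "The" is ".Act", not a dot plus three uppercase letters.
no_dots = re.search(r".*x(\d+)\.([^.]*?)\.[A-Z]{3}.*", name)

print(lazy_any.groups())  # ('212', 'The.Actual.Title.Of.the.Chapter')
print(no_dots)            # None
```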
Since you are on a Mac, you could use the shell:
$ s="Name.Of.Chapter.021x212.The.Actual.Title.Of.the.Chapter.DOC.NAME-Some.stuff.Here.ext"
$ echo ${s#*x}
212.The.Actual.Title.Of.the.Chapter.DOC.NAME-Some.stuff.Here.ext
$ t=${s#*x}
$ echo ${t%.[A-Z][A-Z][A-Z].*}
212.The.Actual.Title.Of.the.Chapter
Or, if you prefer sed, e.g.:
echo "$filename" | sed 's|.[^x]*x||;s/\.[A-Z][A-Z][A-Z].*//'
For processing multiple files:
for file in *.ext
do
    newfile=${file#*x}
    newfile=${newfile%.[A-Z][A-Z][A-Z].*}
    # or
    # newfile=$(echo "$file" | sed 's|.[^x]*x||;s/\.[A-Z][A-Z][A-Z].*//')
    mv "$file" "$newfile"
done
To your question "How can I remove the dots in the process of matching?" the answer is "You can't." The only way to do that is by processing the result of the match in a second step, as others have said. But I think there's a more basic question that needs to be addressed, which is "What does it mean for a regex to match a given input?"
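The two-step idea (match first, then post-process the captured text) might look like this in Python. This is only a sketch under the assumption that a scripting language is available; the function name chapter_title is made up for illustration.

```python
import re


def chapter_title(name: str):
    # Step 1: capture the number after "x" and the dotted title that follows.
    m = re.search(r"x(\d+)\.(.*?)\.[A-Z]{3}", name)
    if m is None:
        return None
    number, dotted_title = m.groups()
    # Step 2: turn the dots inside the captured title into spaces.
    return f"{number} {dotted_title.replace('.', ' ')}"


name = "Name.Of.Chapter.021x212.The.Actual.Title.Of.the.Chapter.DOC.NAME-Some.stuff.Here.ext"
print(chapter_title(name))  # 212 The Actual Title Of the Chapter
```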
A regex is usually said to match a string when it describes any substring of that string. If you want to be sure the regex describes the whole string, you need to add the start (^) and end ($) anchors:
/^.*x(\d+)\.(.*?)\.[A-Z]{3}.*$/
But in your case, you don't need to describe the whole string; if you get rid of the .* at either end, it will serve you just as well:
/x(\d+)\.(.*?)\.[A-Z]{3}/
I recommend you not get in the habit of "padding" regexes with .* at beginning and end. The leading .* in particular can change the behavior of the regex in unexpected ways. For example, if there were two places in the input string where x(\d+)\. could match, your "real" match would have started at the second one. Also, if it's not anchored with ^ or \A, a leading .* can make the whole regex much less efficient.
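The effect of the leading .* on which occurrence wins can be seen with a made-up input that contains two candidate matches. This is a Python sketch, and the input string is hypothetical.

```python
import re

text = "ax1.b.ABC.x2.c.DEF"  # hypothetical input with two possible matches

bare = re.search(r"x(\d+)\.(.*?)\.[A-Z]{3}", text)
padded = re.search(r".*x(\d+)\.(.*?)\.[A-Z]{3}.*", text)

print(bare.groups())    # ('1', 'b')  leftmost occurrence
print(padded.groups())  # ('2', 'c')  the greedy .* pushes the match to the last one
```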
I said "usually" above because some tools do automatically "anchor" the match at the beginning (Python's match()) or at both ends (Java's matches()), but that's pretty rare. Most of the shells and command-line tools available on *nix systems define a regex match in the traditional way, but it's a good idea to say what tool(s) you're using, just in case.
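Python happens to expose all three behaviors as separate methods, which makes the distinction easy to see. In this sketch, search is the traditional "match anywhere" definition, match anchors at the start only, and fullmatch anchors at both ends, much as Java's matches() does.

```python
import re

name = "Name.Of.Chapter.021x212.The.Actual.Title.Of.the.Chapter.DOC.NAME-Some.stuff.Here.ext"
bare = re.compile(r"x(\d+)\.(.*?)\.[A-Z]{3}")
padded = re.compile(r".*x(\d+)\.(.*?)\.[A-Z]{3}.*")

found_anywhere = bare.search(name) is not None       # True
found_at_start = bare.match(name) is not None        # False: name starts with "N"
found_whole = bare.fullmatch(name) is not None       # False
padded_whole = padded.fullmatch(name) is not None    # True: padding covers the rest

print(found_anywhere, found_at_start, found_whole, padded_whole)
```

So the padding only becomes necessary when the tool insists on matching the whole string.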
Finally, a word or two about vocabulary. The parentheses in (\d+) cause the matched characters to be captured, not grouped. Many regex flavors also support non-capturing parentheses in the form (?:\d+), which are used for grouping only. Any text that is included in the overall match, whether it's captured or not, is said to have been consumed (not captured). The way you used the words "capture" and "group" in your question is guaranteed to cause maximum confusion in anyone who assumes you know what you're talking about. :D
If you haven't read it yet, check out this excellent tutorial.