regex find and replace multiple chars of different kinds in one expression - regex

I need to replace all left { and right } curly braces as well as all percentage signs % with their respective HTML entities in a document.
I'm using Sublime Text 2's nice little star icon/button in Find and Replace. I came up with (\{)|(\})|(\%) to match the chars I need. There might be better ways, but hey... it seems to work.
What would the replacement string look like for this? I mean one expression, not with a programming language.
Basically it's replacing what I find with group $1 with something and then the same for group $2 with something else. Here in pseudo-code:
(\{)|(\})|(\%) ==> replace $1 with the entity for { AND replace $2 with the entity for } AND... etc.
Is this possible? I can provide some target sample data if needed.
Back story
These three characters can't be placed as is inside an attribute's value in HAML, like
:text => "blabliblu {20% lalla...}"
etc., without being escaped.
The percentage sign could theoretically be escaped with \% but the curly braces cannot be escaped with \{ and \}, at least not when I'm preprocessing the HAML with Livereload (Win7). Maybe it's a Ruby thing? Anyhow, I'm going for the HTML entity approach.

Related

How to do regular Expression in AutoIt Script

In an AutoIt script I am unable to write a regular expression for the string below. The numbers in it change every time.
Actual String = _WinWaitActivate("RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000","")
Here the PID, NPID and SID values keep changing, and everything else stays constant.
What I have tried is:
_WinWaitActivate("RX_IST2_AM [PID:'([0-9]{1,6})' NPID:'([0-9]{1,5})' SID:'([0-9]{1,9})' sbivvrwm060.dev.ib.tor.Test.com:30000","")
Can someone please help me?
As stated in the documentation, you should write the prefix REGEXPTITLE: and surround everything with square brackets. "Escape" every special character the title contains, such as the dots (.) and spaces ( ), with a backslash (\). Instead of [0-9] you might use \d, like "[REGEXPTITLE:RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\] sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000]" as your parameter for the Win...(...) functions.
You can even omit the round brackets ((...)) but keep their content if you don't want to capture the content for further processing, as you would with StringRegExp(...) or StringRegExpReplace(...). With the _WinWaitActivate(...) function, capturing makes no sense anyway, as it only matches and does not replace or return anything from your regular expression.
According to regex101 both work, with the round brackets and without - you should always use a tool like this site to confirm that your expression is actually working for your input string.
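As a rough cross-check outside AutoIt, the regular-expression part of that title can also be tried in Python's re module. This is only a sketch: the variable names are made up for illustration, the AutoIt-specific REGEXPTITLE: wrapper is left out, and it assumes Python's regex flavor behaves like AutoIt's PCRE for a pattern this simple.

```python
import re

# Window title from the question; only the three numbers vary between runs.
title = "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000"

# Literal brackets and dots are escaped; \d stands in for [0-9].
with_groups = re.compile(
    r"RX_IST2_AM \[PID:(\d{1,6}) NPID:(\d{1,5}) SID:(\d{1,9})\] "
    r"sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000"
)
# Same pattern without capture groups: it accepts the same titles.
without_groups = re.compile(
    r"RX_IST2_AM \[PID:\d{1,6} NPID:\d{1,5} SID:\d{1,9}\] "
    r"sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000"
)

match = with_groups.fullmatch(title)
captured = match.groups() if match else ()
matches_without_groups = without_groups.fullmatch(title) is not None
```

Both variants accept the sample title; the only difference is that the first one also hands back the three numbers.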
I'm not familiar with AutoIt, but remember that the whole pattern has to match before any group captures a result. For example, (goat)s will NOT capture the word goat if your string is goat or goater.
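The (goat)s point can be checked in a few lines. The sketch below uses Python's re module rather than AutoIt, and the variable names are purely illustrative.

```python
import re

pattern = re.compile(r"(goat)s")

# Neither string contains "goats", so the pattern as a whole fails
# and nothing is captured.
no_match_goat = pattern.search("goat")
no_match_goater = pattern.search("goater")

# Once the trailing "s" is present, group 1 holds "goat".
hit = pattern.search("two goats")
captured = hit.group(1) if hit else None
```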
You have forgotten to add a ] in your regex, so your pattern doesn't match the string and the capture groups will not be extracted. Also, I'm not completely sold on the usage of '. Based on this page, you can do something like StringRegExp(yourstring, 'RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]', $STR_REGEXPARRAYGLOBALMATCH), with the literal square brackets escaped so that they are not read as a character class, and $1, $2 and $3 would be your results respectively. But maybe your approach works too.

How can I strip double quotes and braces from my strings before insert in Rails4?

I am parsing values from xml and saving them to variables. I was able to strip all but the braces and double quotes from the string. The value displays like this on the page: ["MPEG Video"].
Here is an example of the parse saving it to a variable:
@video_format = REXML::XPath.each(media_parse_doc, "//track[@type='Video']/Format/text()") { |element| element }
I tried using .ts like this:
@video_format = (REXML::XPath.each(media_parse_doc, "//track[@type='Video']/Format/text()") { |element| element } ).ts('[]"','')
but it did not work. I saw some examples telling you to use gsub, and I looked at the API dock for gsub, but I do not understand the logic in the examples well enough to apply it correctly to my own case. Here is one of the examples:
"foobar".gsub(/^./, "") # => "oobar"
I understand it is removing the first character, but I don't know how to set it up to remove " and [.
Why the /^? Is that ASCII for something? Can someone please show me the correct syntax to remove the double quotes and braces from my variables, and explain the logic so I can better understand how to use it on my own in the future?
Thank you for the help!
If you want to understand regular expressions, check out http://rubular.com/.
That particular example, "foobar".gsub(/^./, "") # => "oobar", substitutes the first character of the string with "" (i.e., nothing). The reason is that the ^ says "pin the match to the beginning of the string", and the . says "match any character" - so it'll match any character at the beginning of the string. The enclosing / characters are just the standard delimiters for a regular expression - so it's only the ^. that you need to figure out.
To replace double quotes: 'fo"o"bar'.gsub(/"/, "") # => "foobar"
To replace a left square bracket: 'fo[o[bar'.gsub(/\[/, "") # => "foobar" (because square brackets are special characters in regex, you have to prefix them with a \ when you want to use them as 'normal' characters).
To replace all quotes and square brackets in one go: 'fo[o"[b]"ar'.gsub(/("|\[|\])/, "") # => "foobar"
(The parentheses indicate a group, and the pipes | indicate 'or'. So, ("|\[|\]) means "match any of the things in this group: a quote, or a left square bracket, or a right square bracket".)
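For comparison, here is the same idea as a small sketch in Python, where re.sub plays the role of Ruby's gsub. The variable names are illustrative, and the character-class variant is an alternative spelling that the answer itself does not use.

```python
import re

raw = '["MPEG Video"]'  # the value as it shows up on the page in the question

# Alternation, as in the answer: a quote, or a left bracket, or a right bracket.
cleaned_alternation = re.sub(r'("|\[|\])', "", raw)

# Character class: the same set of characters, written more compactly.
cleaned_class = re.sub(r'["\[\]]', "", raw)
```

Both calls leave just MPEG Video; the character class tends to be easier to extend when more single characters need stripping.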
But really what you should do is do a good intro tutorial to regular expressions and start from the basics. Once you understand that, it shouldn't be too hard to start composing simple regular expressions of your own.
If you're on a mac, this app is very useful for writing your own regex's: http://krillapps.com/patterns/

Complex regex single quote replace

I have a set of strings in which I would like to replace single quotes with double quotes. But sometimes the single quote to replace is at the end of the line, and sometimes the single quote should not be replaced because it follows an s that marks a possessive.
Example :
The song 'Miss you' is featured in The Rolling Stones' album 'Voodoo Lounge'
should be
The song "Miss you" is featured in The Rolling Stones' album "Voodoo Lounge"
Thanks for your help :)
Regular expressions can only deal with raw text. They can't tell context or grammar. So it is pretty much impossible to build a regular expression that will correctly tell a possessive apostrophe after an s from a closing quote after an s.
However, if you'd like to ignore such cases, and match rest of them, you can use the following regex with lookaround assertions:
(?<!s)'(?!s\b)
Note that this will not match the closing quote in valid cases such as 'Blurred Lines' or 'Dangerous', where the quoted text itself ends in s.
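A minimal sketch of that lookaround pattern applied to the sentence from the question, using Python's re module. The helper name to_double_quotes is hypothetical; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: a single quote that is neither preceded by "s"
# nor followed by an "s" that ends a word.
quote = re.compile(r"(?<!s)'(?!s\b)")

def to_double_quotes(text: str) -> str:
    """Replace every quote the pattern accepts with a double quote."""
    return quote.sub('"', text)

sentence = "The song 'Miss you' is featured in The Rolling Stones' album 'Voodoo Lounge'"
converted = to_double_quotes(sentence)

# Known limitation: a quoted title ending in "s" keeps its closing single quote.
limitation = to_double_quotes("'Blurred Lines'")
```

The possessive in Stones' survives, but so does the closing quote of any title that happens to end in s, which is the trade-off described above.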

RegEx to match string between delimiters or at the beginning or end

I am processing a CSV file and want to search and replace strings as long as it is an exact match in the column. For example:
xxx,Apple,Green Apple,xxx,xxx
Apple,xxx,xxx,Apple,xxx
xxx,xxx,Fruit/Apple,xxx,Apple
I want to replace 'Apple' if it is the EXACT value in the column (if it is contained in text within another column, I do not want to replace). I cannot see how to do this with a single expression (maybe not possible?).
The desired output is:
xxx,GRAPE,Green Apple,xxx,xxx
GRAPE,xxx,xxx,GRAPE,xxx
xxx,xxx,Fruit/Apple,xxx,GRAPE
So the expression I want is: match the beginning of input OR a comma, followed by desired string, followed by a comma OR the end of input.
You cannot put ^ or $ in character classes, so I tried \A and \Z but that didn't work.
([\A,])Apple([\Z,])
This didn't work, sadly. Can I do this with one regular expression? Seems like this would be a common enough problem.
It will depend on your language, but if the one you use supports lookarounds, then you would use something like this:
(?<=,|^)Apple(?=,|$)
Replace with GRAPE.
Otherwise, you will have to put back the commas:
(^|,)Apple(,|$)
Or
(\A|,)Apple(,|\Z)
And replace with:
\1GRAPE\2
Or
$1GRAPE$2
Depending on what's supported.
The above are raw regex (and replacement) strings. Escape as necessary.
Note: The disadvantage of the latter solution is that it will not work on strings like:
xxx,Apple,Apple,xxx,xxx
Since the comma after the first Apple got consumed. You'd have to call the regex replacement at most twice if you have such cases.
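The difference between the two approaches can be sketched in Python. Two assumptions here: Python's re module requires every alternative inside a lookbehind to have the same width, so the ^ is moved outside the lookbehind; and re.MULTILINE is used so that ^ and $ apply to each row. The variable names are illustrative.

```python
import re

rows = (
    "xxx,Apple,Green Apple,xxx,xxx\n"
    "Apple,xxx,xxx,Apple,xxx\n"
    "xxx,xxx,Fruit/Apple,xxx,Apple"
)

# Lookaround form: the delimiters are checked but not consumed.
lookaround = re.compile(r"(?:^|(?<=,))Apple(?=,|$)", re.MULTILINE)
replaced = lookaround.sub("GRAPE", rows)

# Capturing form: the commas are consumed and put back via \1 and \2.
capturing = re.compile(r"(^|,)Apple(,|$)", re.MULTILINE)
adjacent = "xxx,Apple,Apple,xxx,xxx"
one_pass = capturing.sub(r"\1GRAPE\2", adjacent)    # second Apple is missed
two_passes = capturing.sub(r"\1GRAPE\2", one_pass)  # second pass catches it
lookaround_adjacent = lookaround.sub("GRAPE", adjacent)
```

With the capturing form, one_pass still contains the second Apple because its leading comma was already consumed; the lookaround form handles adjacent columns in a single pass.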
Oh, and I forgot to mention, you can have some 'hybrids', since some languages have different levels of support for lookbehinds (in all of the below, ^ and \A, $ and \Z, \1 and $1 are interchangeable, just so I don't make it longer than it already is):
(?:(?<=,)|(?<=^))Apple(?=,|$)
For those where lookbehinds cannot be of variable width, replace with GRAPE.
(^|,)Apple(?=,|$)
And the above one is for where lookaheads are supported but lookbehinds are not. Replace with \1GRAPE.
This does as you wish:
Find what: (^|,)(?:Apple)(,|$)
Replace with: $1GRAPE$2
This works on regex101, in all flavors.
http://regex101.com/r/iP6dZ8
I wanted to share my original work-around (before the other answers), though it feels like more of a hack.
I simply prepend and append a comma on the string before doing the simpler:
/,Apple,/,GRAPE,/g
then cut off the first and last character.
PHP looks like:
$line = substr(preg_replace($search, $replace, ','.$line.','), 1, -1);
This still suffers from the problem of consecutive columns (e.g. ",Apple,Apple,").
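The same padding trick, sketched in Python instead of PHP. The function name replace_exact_column is hypothetical, and the search and replacement strings are hard-coded for the Apple/GRAPE example.

```python
import re

def replace_exact_column(line: str) -> str:
    """Pad with commas so edge columns look like inner ones, replace, unpad."""
    padded = "," + line + ","
    return re.sub(",Apple,", ",GRAPE,", padded)[1:-1]

edge_columns = replace_exact_column("Apple,xxx,xxx,Apple,xxx")
# Adjacent matches share a comma, so only the first one is replaced.
consecutive = replace_exact_column("xxx,Apple,Apple,xxx,xxx")
```

As in the PHP version, consecutive columns are the weak spot: the second Apple in the last call is left untouched.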

JSP Tag Spacing Regex

We are supposed to migrate all our apps from one type of server to another. The new servers do not accept invalid JSP tags where a space is missing between the attributes. For example, the following.
<input type="text"name="myField" />
The following regex was given to us to use, but it seems to not be perfect.
[\w.-]+[\s]*=[\s]*"[^"]+"[^\s/%>]
For example, it returns string assignments like the following.
span.style.fontWeight = "bold";
Can anyone suggest a better regex for locating just the invalid JSP code?
UPDATE
I want this regex to work using the Eclipse Search > File functionality.
Try simply this RegEx: (<.+?[^" ]+?="[^"]+?")([^ ]+?)(.+?>). It will locate all "tags" with a " not followed by a space. Then you can replace the captured groups like this: $1 $2$3 to add a space.
Tenub's answer is nearly correct, but as Rachel G. mentioned, it will return false positives when the closing bracket immediately follows the closing quotation mark.
(<[^?%].+?[^" ]+?="[^"]+?")([^/ >]+?)([^>]*(?:/|\?|%)?>)
Should give you the results you're after.
Disclaimer: This is not a strict checker. You could have a tag such as <..." asdf/> go undetected, but as the tags are presumably well formed enough to work under the old system, this should be sufficient.
Simple version:
Find: (=\s*"[^"]*")(\w)
Replace with: $1 $2
Explanation
The find regex looks for = followed by optional whitespace followed by "...", immediately followed by a single alphanumeric character or underscore.
It's separated out into two capturing groups, which are represented by $1 and $2 in the replace expression - with a space inserted between them.
[Minor Issue: This won't work for attribute values that include escaped double quotation marks. I haven't addressed this, as I am assuming it is pretty unlikely. However, it justifies doing a manual find/replace rather than "replace all", just in case.]
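The simple version can be tried outside Eclipse with a short Python sketch. Python's re.sub writes the replacement groups as \1 and \2 instead of $1 and $2, and the function name add_attribute_spacing is made up for illustration.

```python
import re

# Find/replace pair from the "simple version", with $1 $2 written as \1 \2.
missing_space = re.compile(r'(=\s*"[^"]*")(\w)')

def add_attribute_spacing(markup: str) -> str:
    """Insert a space wherever an attribute value runs into the next name."""
    return missing_space.sub(r"\1 \2", markup)

fixed = add_attribute_spacing('<input type="text"name="myField" />')
# A JavaScript assignment ends in ";" after the quote, so it is left alone.
untouched = add_attribute_spacing('span.style.fontWeight = "bold";')
```

The false positive from the question is left unchanged because the character after the closing quote is a semicolon rather than a word character.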