JSP Tag Spacing Regex - regex

We are supposed to migrate all our apps from one type of server to another. The new servers do not accept invalid JSP tags where the space between attributes is missing. For example, the following.
<input type="text"name="myField" />
The following regex was given to us to use, but it seems to not be perfect.
[\w.-]+[\s]*=[\s]*"[^"]+"[^\s/%>]
For example, it returns string assignments like the following.
span.style.fontWeight = "bold";
Can anyone suggest a better regex for locating just the invalid JSP code?
UPDATE
I want this regex to work using the Eclipse Search > File functionality.

Try this regex: (<.+?[^" ]+?="[^"]+?")([^ ]+?)(.+?>). It locates all "tags" in which a closing " is not followed by a space. You can then replace using the captured groups, like this: $1 $2$3, to add the space.

Tenub's answer is nearly correct, but as Rachel G. mentioned, it will return false positives when the closing bracket immediately follows the closing quotation mark.
(<[^?%].+?[^" ]+?="[^"]+?")([^/ >]+?)([^>]*(?:/|\?|%)?>)
Should give you the results you're after.
Disclaimer: This is not a strict checker. You could have a tag such as <..." asdf/> go undetected, but as the tags are presumably well formed enough to work under the old system, this should be sufficient.
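One way to sanity-check the pattern above before running it across a whole workspace is to try it on a few sample lines. The Python sketch below is only an illustration: Python's re module is not the Eclipse search engine, and the sample lines and the find_invalid helper are made up for this check. The pattern itself is copied unchanged from the answer.

```python
import re

# Pattern from the answer above, copied unchanged.
INVALID_TAG = re.compile(
    r'(<[^?%].+?[^" ]+?="[^"]+?")([^/ >]+?)([^>]*(?:/|\?|%)?>)'
)

def find_invalid(lines):
    """Return the lines containing a tag with no space after an attribute value.

    find_invalid is a hypothetical helper name used only for this sketch.
    """
    return [line for line in lines if INVALID_TAG.search(line)]

samples = [
    '<input type="text"name="myField" />',   # invalid: no space before name
    '<input type="text" name="myField" />',  # valid
    '<input type="text"><br>',               # valid: quote followed by >
    'span.style.fontWeight = "bold";',       # JavaScript, not a tag
]

flagged = find_invalid(samples)

# Python writes the groups as \1, \2, \3 where Eclipse uses $1, $2, $3.
fixed = INVALID_TAG.sub(r'\1 \2\3', samples[0])

print(flagged)
print(fixed)
```

Only the first sample is flagged here, and the substitution inserts the missing space. Real JSP files will contain cases these four lines do not cover, so a manual review of the hits is still worthwhile.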

Simple version:
Find: (=\s*"[^"]*")(\w)
Replace with: $1 $2
Explanation
The find regex looks for = followed by optional whitespace followed by "...", immediately followed by a single alphanumeric character or underscore.
It's separated out into two capturing groups, which are represented by $1 and $2 in the replace expression - with a space inserted between them.
[Minor issue: this won't work for attribute values that include escaped double quotation marks. I haven't addressed this because I assume it is pretty unlikely. However, it justifies doing a manual find/replace rather than "replace all", just in case.]
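The same find/replace can be tried outside the editor. This is a minimal Python sketch, assuming the pattern behaves the same way in Python's re module as in the Eclipse dialog; add_missing_space is a hypothetical helper name, and the sample strings are taken from the question or invented for the check.

```python
import re

def add_missing_space(line):
    """Insert a space between a quoted attribute value and the next attribute.

    Uses the find/replace pair from the answer above; Python writes the
    groups as \\1 and \\2 where the editor uses $1 and $2.
    """
    return re.sub(r'(=\s*"[^"]*")(\w)', r'\1 \2', line)

print(add_missing_space('<input type="text"name="myField" />'))
print(add_missing_space('span.style.fontWeight = "bold";'))
```

The JavaScript assignment is left alone because the closing quote is followed by a semicolon rather than a word character.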

Related

Notepad++ replace text with RegEx search result

I would like replace a standard string in a file, with another that is a result of a regular expression. The standard text looks like:
<xsl:variable name="ServiceCode" select="###"/>
I would like to replace ### with a servicecode, that I can find later in the same file, from this URL:
<a href="/Services/xyz" target="_self">
The regular expression (?<=\/Services\/)(.*)(?=\" )
returns the required service code "xyz".
So, I opened Notepad++, added "###" to the "Find what" and this RegEx to the "Replace with" section, and expected that the ### text will be replaced by xyz.
But I got this result:
<xsl:variable name="ServiceCode" select="?<=/Services/.*?=" "/>
I am new to RegEx, do I need to use different syntax in the replace section than I use to find a string? Can someone give me a hint how to achieve the required result? The goal is to standardize tons of files with similar structure as now all servicecodes are hardcoded in several places in the file. Thanks.
You could use a lookahead for capturing the part ahead.
Search for: (?s)###(?=.*/Services/([^"]+)") and replace with: $1
(?s) makes the dot also match newlines (there is also a checkbox available in np++)
[^"] matches a character that is not "
The replacement $1 corresponds to capture of first parenthesized subpattern.
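The lookahead-with-capture idea can be checked in Python as well. This is a sketch under the assumption that the two snippets from the question appear in the same file in this order; the text variable is a made-up two-line sample, and Python uses \1 where Notepad++ uses $1.

```python
import re

# Hypothetical two-line sample built from the snippets in the question.
text = (
    '<xsl:variable name="ServiceCode" select="###"/>\n'
    '<a href="/Services/xyz" target="_self">\n'
)

# (?s) lets . cross line breaks. The group inside the lookahead is still
# captured, even though the lookahead itself consumes no characters.
pattern = re.compile(r'(?s)###(?=.*/Services/([^"]+)")')

result = pattern.sub(r'\1', text)
print(result)
```

Because .* is greedy, a file with several /Services/ links would get the code from the last one, so this fits files that contain a single such link.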
I am no expert at RegEx but I think I may be able to help. It looks like you might be going at this the wrong way. The regex search that you are using would normally work like this:
Parentheses () in RegEx let you capture part of your search and reuse it in the replace section.
You place (?<=\/Services\/)(.*)(?=\" ) into the "Find what" section in Notepad++.
Then in the "Replace with" section you can use \1 to insert whatever (.*) matched. Note that the lookbehind (?<=\/Services\/) and the lookahead (?=\" ) are assertions, not capturing groups, so they do not get numbers of their own; only plain (...) groups are numbered \1, \2, \3 and so on.
Depending on the structure of your files, you would need to use a RegEx search that selects both lines of code (and the specific parts you need), then use a combination of \1\2\3 etc. to replace everything exactly how it was, except for the ### which you could replace with the \number associated with xyz.
See http://docs.notepad-plus-plus.org/index.php/Regular_Expressions for more info.

How to do regular Expression in AutoIt Script

In an AutoIt script I am unable to write a regular expression for the string below. The numbers in it change every time.
Actual String = _WinWaitActivate("RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000","")
Here the PID, NPID and SID values keep changing; everything else is constant.
What I have tried is:
_WinWaitActivate("RX_IST2_AM [PID:'([0-9]{1,6})' NPID:'([0-9]{1,5})' SID:'([0-9]{1,9})' sbivvrwm060.dev.ib.tor.Test.com:30000","")
Can someone please help me?
As stated in the documentation, you should add the prefix REGEXPTITLE: and surround everything with square brackets, but escape every literal character that has a special meaning, such as the dots (.) and spaces ( ), with a backslash (\). Instead of [0-9] you can use \d. For example, pass "[REGEXPTITLE:RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\] sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000]" as the parameter to the Win...(...) functions.
You can even omit the round brackets ((...)) but keep their content if you don't want to capture it for further processing with StringRegExp(...) or StringRegExpReplace(...). With the _WinWaitActivate(...) function capturing makes no sense anyway, as it only matches and does not replace or return anything from your regular expression.
According to regex101 both work, with the round brackets and without - you should always use a tool like this site to confirm that your expression is actually working for your input string.
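Besides an online tester, the escaping can be confirmed with a few lines of script. The sketch below uses Python rather than AutoIt, so it only checks the regular expression itself, not the REGEXPTITLE: wrapper or any AutoIt function; the title string is the one from the question.

```python
import re

title = (
    "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] "
    "sbivvrwm060.dev.ib.tor.Test.com:30000"
)

# Literal square brackets and dots are escaped; \d stands in for [0-9].
pattern = re.compile(
    r"RX_IST2_AM \[PID:(\d{1,6}) NPID:(\d{1,5}) SID:(\d{1,9})\] "
    r"sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000"
)

match = pattern.fullmatch(title)
ids = match.groups() if match else None
print(ids)
```

The three groups hold the PID, NPID and SID values; dropping the round brackets would still match the title but leave nothing to read back.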
Not familiar with autoit, but remember that regex has to completely match your string to capture results. For example, (goat)s will NOT capture the word goat if your string is goat or goater.
You have forgotten to add a ] in your regex, so your pattern doesn't match the string and capture groups will not be extracted. Also, I'm not completely sold on the usage of '. Based on this page, you can do something like StringRegExp(yourstring, 'RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]', $STR_REGEXPARRAYGLOBALMATCH), with the literal square brackets escaped, and $1, $2 and $3 would be your results respectively. But maybe your approach works too.

Regex match between two regex expressions

This has been driving me crazy, I can't find a solution that works!
I'm trying to do a regex between a couple of tags, bad idea I've heard but necessary this time :P
What I have at the start is a <body class="foo"> where foo can vary between files - <body.*?> search works fine to locate the only copy in each file.
At the end I have a <div id="bar">, bar doesn't change between files.
eg.
<body class="foo">
sometext
some more text
<maybe even some tags>
<div id="bar">
What I need to do is select everything between the two tags but not including them - everything between the closing > on body and the opening < on div - sometext to maybe even some tags.
I've tried a bunch of things, mostly variations on (?<=<body.*>)(.*?)(?=<div id="bar">) but I'm actually getting invalid expressions at worst on notepad++, http://regexpal.com/ and no matches at best.
Any help appreciated!
You are attempting to use a variable-length lookbehind, which most regular expression engines, including Notepad++'s, do not support. I assume you are using Notepad++, so you can use the \K escape sequence instead.
<body[^>]*>\K.*?(?=<div id="bar">)
The \K escape sequence resets the starting point of the reported match and any previously consumed characters are no longer included. Make sure you have the . matches newline checkbox checked as well.
Alternatively, you can use a capturing group and avoid using lookaround assertions.
<body[^>]*>(.*?)<div id="bar">
Note: Using a capturing group, you can refer to group index "1" to get your match result.
Use the following pattern:
/<body[^>]*>(.*?)<div id="bar">/
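The capturing-group variant is easy to try in a script. This is a minimal Python sketch; the standard re module does not support \K, so only the group-based pattern is shown, and the html string is the example from the question with line breaks assumed between the lines.

```python
import re

html = (
    '<body class="foo">\n'
    'sometext\n'
    'some more text\n'
    '<maybe even some tags>\n'
    '<div id="bar">\n'
)

# re.DOTALL plays the role of the ". matches newline" checkbox.
match = re.search(r'<body[^>]*>(.*?)<div id="bar">', html, flags=re.DOTALL)
between = match.group(1) if match else None
print(repr(between))
```

Group 1 holds everything between the closing > of the body tag and the opening < of the div, including the surrounding line breaks.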

regex find and replace multiple chars of different kinds in one expression

I need to replace all left { and right } curly braces as well as all percentage signs % to their respective HTML entities in a document.
I'm using Sublime Text 2's nice little star icon/button in Find and Replace. I came up with (\{)|(\})|(\%) to match the chars I need. There might be better ways, but hey... it seems to work.
What would the replacement string look like for this? I mean one expression, not with a programming language.
Basically it's replacing what I find with group $1 with something and then the same for group $2 with something else. Here in pseudo-code:
(\{)|(\})|(\%) ==> $1 replace with { AND $2 replace with } AND... etc.
Is this possible? I can provide some target sample data if needed.
Back story
These three characters can't be placed as is inside an attribute's value in HAML, like
:text => "blabliblu {20% lalla...}"
etc., without being escaped.
The percentage sign could theoretically be escaped with \% but the curly braces cannot be escaped with \{ and \}, at least not when I'm preprocessing the HAML with Livereload (Win7). Maybe it's a Ruby thing? Anyhow, I'm going for the HTML entity approach.
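The question asks for an editor-only replacement, so the following is just a point of comparison: in a script, the "different replacement per matched character" idea can be written with a lookup table. This Python sketch assumes numeric character references are acceptable as the HTML entities; escape_for_haml and ENTITIES are hypothetical names.

```python
import re

# Numeric character references for the three characters in question.
ENTITIES = {"{": "&#123;", "}": "&#125;", "%": "&#37;"}

def escape_for_haml(text):
    """Replace {, } and % with their entities in a single pass.

    The callback looks up each matched character, which is the scripted
    equivalent of "group 1 becomes X, group 2 becomes Y".
    """
    return re.sub(r"[{}%]", lambda m: ENTITIES[m.group(0)], text)

print(escape_for_haml("blabliblu {20% lalla...}"))
```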

Matching all occurrences of a html element attribute in notepad++ regex

I have a file which has hundreds of links like this:
<h3>aspnet</h3>
Ex 1
Ex 2
Ex 3
So I want to remove all the elements
icon="..."
from all the lines. I went through the official Notepad++ regex wiki and have come up with this after several trials:
icon=\"[^\.]+\"
The problem with this is, it is selecting past the second double quote and stopping at the next occurring double quote. To illustrate, this will select the following content:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">EX 1</a> <a href="
If I modify the above regex to,
icon=\"[^\.]+\">
Then it is almost perfect, but it is also selecting the >:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt...">
The regex I am looking for would select like this:
icon="data:image/png;base64,...jbvebich4sec9zgth1sfue1cdt..."
I also tried the following, but it doesn't match anything at all
icon=\"[^\.]+\"$
Just match anything but a quote, followed by a quote:
icon="[^"]+"
Just tested with notepad++ 6.2.2 and confirmed that this matches correctly as written.
Broken down:
icon="
This is fairly obvious, match the literal text icon=".
[^"]+
This means to match any character that is not a ". Adding the + after it means "one or more times."
Finally we match another literal ".
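The difference between the two character classes can be seen on a single line. The Python sketch below uses a made-up line, since the real href values and base64 data are not shown in the question, and it assumes the pattern behaves the same in Python's re module as in Notepad++.

```python
import re

# Hypothetical line; href values and base64 payloads are placeholders.
line = (
    '<a href="ex1.html" icon="data:image/png;base64,AAAA">Ex 1</a> '
    '<a href="ex2.html" icon="data:image/png;base64,BBBB">Ex 2</a>'
)

# Original attempt: [^\.] also matches quotes, so the match runs on
# until the last quote before the next dot.
greedy = re.search(r'icon=\"[^\.]+\"', line).group(0)

# Suggested pattern: [^"] stops at the first closing quote.
precise = re.findall(r'icon="[^"]+"', line)

# Variant that also removes the whitespace in front of the attribute.
cleaned = re.sub(r'\s*icon="[^"]+"', '', line)

print(greedy)
print(precise)
print(cleaned)
```

The \s* prefix in the last pattern is an addition for tidiness, not part of the answer above; without it the removal leaves a stray space before each >.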
I am not a Notepad++ user, so I don't know how Notepad++ handles regex, but can you try replacing
icon=\"[^>]* with an empty string?
Try this solution:
I just checked this and it worked as you wanted.
One way to achieve your goal:
Find what: (icon.*")|.*?
Replace with: $1