(simple) AHK: RegexMatch "\n[^\n]$" doesn't work - regex

What am I doing wrong here?
Shells := "`nAlpha`nBetta`nOmega"
RegexMatch(Shells, "\n[^\n]$", LastLetter)
MsgBox % "The last letter is: " . LastLetter
The last letter should be Omega, but that is not what I get.
EDIT:
1) "`n" is a single LineFeed character.
2) LastLetter is (the name of) a variable that must contain string "`nOmega".

You have to add a quantifier, since [^\n] on its own matches exactly one character, and anchor the match with the \z token (I'm not sure how multi-line mode is handled in the AutoHotkey regex engine, but you can leave $ intact if multi-line mode is off by default):
RegexMatch(Shells, "\n[^\n]*\z", LastLetter)
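For comparison, here is a minimal sketch of the same fix in Python rather than AutoHotkey. It assumes Python's re module, where \Z plays the role of \z (end of string); the variable names are only illustrative.

```python
import re

# Same sample text as in the question: three lines, each preceded by a line feed.
shells = "\nAlpha\nBetta\nOmega"

# Original pattern: a line feed, exactly ONE non-newline character, then the end.
no_quantifier = re.search(r"\n[^\n]$", shells)

# Fixed pattern: the * lets the character class consume the whole last line.
with_quantifier = re.search(r"\n[^\n]*\Z", shells)

print(no_quantifier)                  # None
print(repr(with_quantifier.group()))  # '\nOmega'
```

The first pattern fails because no line feed in the text is followed by a single character and then the end of the string.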

Related

regexp for renaming multiple files using 'rename.pl'

I'm using rename.pl to rename multiple files. I am having trouble coming up with the right regexp. My file names are of the form:
nn.some.title.string.ext
I want to change just the first '.' to ' - '. I thought this would work but it does not.
s/\.?/ - /
Can someone help me out with this? TIA.
\.? can match a sequence of zero characters, so s/\.?/ - / replaces the dot or the empty string at the start of the input.
"abc.def.ghi" ⇒ " - abc.def.ghi"
".abc" ⇒ " - abc"
To replace the first ., you can use the following:
s/\./ - /
"abc.def.ghi" ⇒ "abc - def.ghi"
To substitute all . but a leading one or the one in the extension, you can use the following:
s/(?!^)\.(?!\w+\z)/ - /g
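The difference between the three substitutions above can be checked with a short sketch. It uses Python's re.sub instead of Perl's s///, with count=1 standing in for a substitution without the g flag and \Z standing in for \z; the sample file name is the one from the question.

```python
import re

name = "nn.some.title.string.ext"

# \.? can match the empty string, so the first match is the empty
# string at position 0 and " - " is simply prepended.
optional_dot = re.sub(r"\.?", " - ", name, count=1)

# \. must consume a real dot, so the first dot is replaced.
first_dot = re.sub(r"\.", " - ", name, count=1)

# Every dot except a leading one and the one before the extension.
inner_dots = re.sub(r"(?!^)\.(?!\w+\Z)", " - ", name)

print(repr(optional_dot))  # ' - nn.some.title.string.ext'
print(repr(first_dot))     # 'nn - some.title.string.ext'
print(repr(inner_dots))    # 'nn - some - title - string.ext'
```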
You will probably want to make sure that the first dot is not also the last one. If by any chance you have a file name like nn_some_title_string.ext, the script below will leave that last dot alone.
$fileName = "nn.some.title.string.ext";
$fileName =~s/\.(?=\w+\.\w+)/-/;
print "FileName: " . $fileName ."\n";
Try one of these simple regex patterns.
my $str = "nn.some.title.string.ext";
$str=~s/^([^\.]*)\./$1-/i;
or
$str=~s/\./-/;
#^([^\.]*) Starting point upto the first dot
print $str;

Using preg_match in php to verify username ( or others )

I read through the similar questions that have already been asked, but I still couldn't get it right.
http://regexr.com/39b64
\S should match every character on the keyboard except space, tab and enter.
^ and $ should force a whole match, since the pattern starts at ^ and ends at $.
There was a link that also used something similar to the above, with the addition of {0,}, which should allow any number of characters, but it didn't work on regexr.com when I tested it.
Another link suggested removing the $ and replacing it with \z, but that doesn't work on regexr.com either.
I'm planning to use preg_match to check whether the username entered consists only of keyboard characters other than space, tab and enter.
Username = "abcCD0123_" valid
Username = "abcCD0123_!##$%^&)_[]-=\',;/`~) valid
Username = " abcd123~!#$##%[];,.;'" invalid
Username = "abcd123~!#$##%[];,.;' " invalid
Username = " abcd123~ !#$##%[];,.;' " invalid
Something like that, because I read a question where someone suggested doing the validation on the PHP side instead of the HTML side for security reasons.
edit: I tried ...
/^[\S]+$/
/^[\S]*$/
/^[\S]{0,}$/
/^[^\s\S]+$/
/^[^\s\S]*$/
/^[^\s\S]{0,}$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]+$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]*$/
/^[A-Za-z0-9~!##$%^&*()_+{}|:"<>?`-=[]\;',./]{0,}$/
(something like this for the last one; I can't remember exactly because I modified it many times)
You can just check that the line is composed only of non-whitespace characters (demo):
/^\S+$/
Strings with multiple lines
The regex assumes that you are checking a single username at a time (which is probably what you want to do in your code). But as shown in the demo and as described by user3218114 in his answer, if you have a multi-line string, you need to use the m flag so that ^ and $ also match at the beginning and end of each line (otherwise they only match at the beginning and end of the whole string). This is probably why your tests weren't working.
/^\S+$/m
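The effect of the m flag can be seen in a small sketch. It is written in Python rather than PHP, with re.MULTILINE as the counterpart of PCRE_MULTILINE; the sample names are made up for illustration.

```python
import re

# Four candidate usernames, one per line (hypothetical test data).
text = "alice\nbob smith\n carol\ndave_99"

# With the multi-line flag, ^ and $ match at every line boundary,
# so each line is checked on its own.
per_line = re.findall(r"^\S+$", text, flags=re.MULTILINE)

# Without it, ^ and $ only anchor the whole string, which contains
# whitespace (the newlines), so nothing matches.
whole_string = re.search(r"^\S+$", text)

print(per_line)      # ['alice', 'dave_99']
print(whole_string)  # None
```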
You need to use m (PCRE_MULTILINE) modifier if you want to use ^ and $
When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end.
Here is a demo that checks for any string/line that contains no whitespace and whose length is between 8 and 50:
^[^\s]{8,50}$
Online demo
OR
^\S{8,50}$
Online demo
Sample code: (Focus on m modifier)
$re = "/^[^\\s]{8,50}$/m";
$str = "...";
preg_match_all($re, $str, $matches);
Based on your examples I suppose you want to have something like that:
/^[a-z0-9_!##$%&^()_\[\]=\\',;\/~`-]+$/i
It's one character group [] which contains all allowed characters. Note, however, that one cannot just put all characters in there: some characters have special meanings in a regexp and must be escaped with \ (here [, ], / and \ itself). You also have to be careful where to put -. In the case of a-z it means all characters between a and z (including a and z). That's why I put the - char itself at the end.
To match really everything except white-space use /^\S+$/
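A minimal sketch of the single-username check, in Python rather than PHP. The function name is_valid_username is hypothetical; re.fullmatch stands in for wrapping the pattern in ^ and $.

```python
import re

def is_valid_username(name: str) -> bool:
    """Return True if name is non-empty and contains no whitespace."""
    # fullmatch requires the whole string to match, like ^\S+$ would.
    return re.fullmatch(r"\S+", name) is not None

for candidate in ["abcCD0123_", " abcd123~", "abcd123~ ", "abcd 123"]:
    print(repr(candidate), is_valid_username(candidate))
```

Only the first candidate is accepted; a leading, trailing or embedded space makes the others invalid.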

Help With Particular Regular Expression - Not Containing Some String

How do I say, in regular expressions:
Any portion of a string beginning with a capital letter, containing at least one space character, not containing the string
" _ " (space underscore space), and ending with the string "!!!" (without the quotes)?
I am having trouble with the "not containing" part.
Here is what I have so far:
[A-Z].* .*!!!
How do I modify this to also specify "Not containing ' _ '"?
It does not need to be the specific string " _ ". How can I say "not containing" ANY string? For instance not containing "dog"?
Edit: I'd like the solution to be compatible with Php's "preg_replace"
Edit: Examples:
Examples for " _ ":
Abc xyz!!! <---Matches
Hello World!!! <---Matches
Has _ Space Underscore Space!!! <--- Does Not Match
Examples for "dog":
What a dog!!! <--- Does not match, (contains "dog")
Hello World!!! <--- Matches
The x(?!y) expression matches x only if it is not immediately followed by y. So, this seems to be the thing you want:
[A-Z](?!%s)(.(?!%s))* (.(?!%s))*!!!
Where %s is your forbidden string.
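One way to try out the negative lookahead idea is the sketch below. It is in Python rather than PHP and makes two assumptions that are not in the answer above: each candidate string is tested as a whole (as in the question's examples), and a single lookahead at the start, (?!.*forbidden), is used instead of one lookahead per character. The helper names are hypothetical.

```python
import re

def make_matcher(forbidden: str) -> re.Pattern:
    """Capital letter first, at least one space, ends in !!!, no `forbidden`."""
    # re.escape keeps the forbidden text literal, whatever it contains.
    return re.compile(rf"(?!.*{re.escape(forbidden)})[A-Z].* .*!!!")

def is_match(text: str, forbidden: str) -> bool:
    # fullmatch: the whole string must satisfy the pattern.
    return make_matcher(forbidden).fullmatch(text) is not None

print(is_match("Abc xyz!!!", " _ "))                       # True
print(is_match("Has _ Space Underscore Space!!!", " _ "))  # False
print(is_match("What a dog!!!", "dog"))                    # False
print(is_match("Hello World!!!", "dog"))                   # True
```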
Any possible regex for this would be probably much more complicated than two regexes. One like yours: [A-Z].* .*!!! and the second applied on matched strings and checking whether _ is contained.
Start with any capital letter; then an optional run of anything except an underscore, a space, another optional run of anything except an underscore, followed by three exclamation marks.
[A-Z][^_]*[ ][^_]*!!!
First test your string for any occurrence of " _ ", since that is a no match. Then check for what you want.
That's what I would do instead of spending a lot of time trying to figure out one regular expression for it.
Here's a nice site for testing your expressions: Nregex
Edit: I read some more on your question and see that my answer wasn't really good, so here's another attempt. A modification of one of the expressions above:
[A-Z](?! _ )(\w(?! _ ))* (\w(?! _ ))*!!!
[A-Z]([^ ]*(?! _ ) ?)*!!!
Edit: I missed your requirement for at least one space. The below regex includes that requirement:
[A-Z][^ ]* (?!_ )((?! _ ).)*!!!
I used to resolve it via grep's -v option (if I'm on Linux and/or can use grep).
So search for something, then skip the uninteresting parts.
grep something <INPUT> | grep -v uninteresting
Without grep (damn windows, without admin rights) but with Vim:
vim -c "v/i'm searching for this/d" -c "g/and don't need this/d" -c "w checkoutput" <INPUT>
(This opens <INPUT>, then deletes every line that does not match what you need, then deletes every line that you do not need, then saves the result as checkoutput, which you should check.)
HTH
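The same "search, then skip the uninteresting parts" idea can be sketched in Python for cases where neither grep nor Vim is at hand. The function name filter_lines and the sample lines are hypothetical.

```python
import re

def filter_lines(lines, wanted, unwanted):
    """Keep lines that match `wanted` and do not match `unwanted`."""
    want = re.compile(wanted)
    skip = re.compile(unwanted)
    # Step 1 (grep): the line must match. Step 2 (grep -v): it must not.
    return [line for line in lines if want.search(line) and not skip.search(line)]

sample = ["keep something", "something uninteresting", "nothing here"]
print(filter_lines(sample, r"something", r"uninteresting"))  # ['keep something']
```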
There is a nice little program in which you can build your regex while testing it.
http://software.marioschneider-online.de/?C%23%3A_RegEx_Test

Regex for quoted string with escaping quotes

How do I get the substring " It's big \"problem " using a regular expression?
s = ' function(){ return " It\'s big \"problem "; }';
/"(?:[^"\\]|\\.)*"/
Works in The Regex Coach and PCRE Workbench.
Example of test in JavaScript:
var s = ' function(){ return " Is big \\"problem\\", \\no? "; }';
var m = s.match(/"(?:[^"\\]|\\.)*"/);
if (m != null)
    alert(m);
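The same pattern can be tried in Python as a rough cross-check. In this sketch the sample string is written so that it really contains a backslash before the inner quote; the name QUOTED is illustrative.

```python
import re

# "(?:[^"\\]|\\.)*" : an opening quote, then either an ordinary character
# or a backslash followed by any character, then the closing quote.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')

# The text contains:  function(){ return " It's big \"problem "; }
s = ' function(){ return " It\'s big \\"problem "; }'

m = QUOTED.search(s)
print(m.group())  # " It's big \"problem "
```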
This one comes from nanorc.sample, available in many Linux distros. It is used for syntax highlighting of C-style strings.
\"(\\.|[^\"])*\"
As provided by ePharaoh, the answer is
/"([^"\\]*(\\.[^"\\]*)*)"/
To have the above apply to either single quoted or double quoted strings, use
/"([^"\\]*(\\.[^"\\]*)*)"|\'([^\'\\]*(\\.[^\'\\]*)*)\'/
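A short Python sketch of the combined pattern follows. As an assumption for readability, the groups are made non-capturing so that re.findall returns whole matches; the sample text is made up.

```python
import re

# Quote, run of ordinary characters, then any number of
# "escape pair + run of ordinary characters", then the closing quote.
DOUBLE = r'"[^"\\]*(?:\\.[^"\\]*)*"'
SINGLE = r"'[^'\\]*(?:\\.[^'\\]*)*'"
EITHER = re.compile(f"{DOUBLE}|{SINGLE}")

# The text contains:  a = "x \" y"; b = 'it\'s'; c = "p'q"
text = 'a = "x \\" y"; b = \'it\\\'s\'; c = "p\'q"'

found = EITHER.findall(text)
for item in found:
    print(item)
```

This prints the three quoted strings, each with its escaped quote or embedded apostrophe intact.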
Most of the solutions provided here use alternative repetition paths, i.e. (A|B)*.
You may encounter stack overflows on large inputs, since some pattern compilers implement this using recursion.
Java for instance: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6337993
Something like this: "(?:[^"\\]*(?:\\.)?)*", or the one provided by Guy Bedford, will reduce the number of parsing steps and avoid most stack overflows.
/(["\']).*?(?<!\\)(\\\\)*\1/is
should work with any quoted string
"(?:\\"|.)*?"
Alternating the \" and the . passes over escaped quotes while the lazy quantifier *? ensures that you don't go past the end of the quoted string. Works with .NET Framework RE classes
/"(?:[^"\\]++|\\.)*+"/
Taken straight from man perlre on a Linux system with Perl 5.22.0 installed.
As an optimization, this regex uses the 'possessive' form of both + and * to prevent backtracking, for it is known beforehand that a string without a closing quote wouldn't match in any case.
This one works perfectly on PCRE and does not fail with a stack overflow.
"(.*?[^\\])??((\\\\)+)?+"
Explanation:
1. Every quoted string starts with the character " ;
2. It may contain any number of any characters: .*? {lazy match}, ending with a non-escape character [^\\];
3. Statement (2) is lazy(!) optional because the string can be empty (""). So: (.*?[^\\])??
4. Finally, every quoted string ends with the character ", but it can be preceded by an even number of escape signs, i.e. pairs (\\\\)+; and that part is greedy(!) optional: ((\\\\)+)?+ {greedy matching}, because the string can be empty or have no trailing pairs!
An option that has not been touched on before is:
Reverse the string.
Perform the matching on the reversed string.
Re-reverse the matched strings.
This has the added bonus of being able to correctly match escaped open tags.
Let's say you had the following string: String \"this "should" NOT match\" and "this \"should\" match"
Here, \"this "should" NOT match\" should not be matched and "should" should be.
On top of that this \"should\" match should be matched and \"should\" should not.
First an example.
// The input string.
const myString = 'String \\"this "should" NOT match\\" and "this \\"should\\" match"';
// The RegExp.
const regExp = new RegExp(
// Match close
'([\'"])(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))' +
'((?:' +
// Match escaped close quote
'(?:\\1(?=(?:[\\\\]{2})*[\\\\](?![\\\\])))|' +
// Match everything thats not the close quote
'(?:(?!\\1).)' +
'){0,})' +
// Match open
'(\\1)(?!(?:[\\\\]{2})*[\\\\](?![\\\\]))',
'g'
);
// Reverse the matched strings.
const matches = myString
  // Reverse the string.
  .split('').reverse().join('')
  // '"hctam "\dluohs"\ siht" dna "\hctam TON "dluohs" siht"\ gnirtS'
  // Match the quoted strings
  .match(regExp)
  // ['"hctam "\dluohs"\ siht"', '"dluohs"']
  // Reverse the matches
  .map(x => x.split('').reverse().join(''))
  // ['"this \"should\" match"', '"should"']
  // Re-order the matches
  .reverse();
  // ['"should"', '"this \"should\" match"']
Okay, now to explain the RegExp.
The regexp can easily be broken into three pieces, as follows:
# Part 1
(['"]) # Match a closing quotation mark " or '
(?! # As long as it's not followed by
(?:[\\]{2})* # A pair of escape characters
[\\] # and a single escape
(?![\\]) # As long as that's not followed by an escape
)
# Part 2
((?: # Match inside the quotes
(?: # Match option 1:
\1 # Match the closing quote
(?= # As long as it's followed by
(?:\\\\)* # A pair of escape characters
\\ # and a single escape
(?![\\]) # As long as that's not followed by an escape
)
)| # OR
(?: # Match option 2:
(?!\1). # Any character that isn't the closing quote
)
)*) # Match the group 0 or more times
# Part 3
(\1) # Match an open quotation mark that is the same as the closing one
(?! # As long as it's not followed by
(?:[\\]{2})* # A pair of escape characters
[\\] # and a single escape
(?![\\]) # As long as that's not followed by an escape
)
This is probably a lot clearer in image form: generated using Jex's Regulex
Image on github (JavaScript Regular Expression Visualizer.)
Sorry, I don't have a high enough reputation to include images, so, it's just a link for now.
Here is a gist of an example function using this concept that's a little more advanced: https://gist.github.com/scagood/bd99371c072d49a4fee29d193252f5fc#file-matchquotes-js
Here is one that works with both " and ', and you can easily add others at the start.
("|')(?:\\\1|(?!\1).)*?\1
It uses the backreference (\1) to match exactly what is in the first group (" or ').
http://www.regular-expressions.info/backref.html
One has to remember that regexps aren't a silver bullet for everything string-y. Some things are simpler to do with a cursor and a linear, manual seek. A CFL would do the trick pretty trivially, but there aren't many CFL implementations (afaik).
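A minimal sketch of such a cursor-based scan, in Python. The function name find_quoted and the choice to treat a backslash as escaping the next character both inside and outside quotes are assumptions made for this illustration.

```python
def find_quoted(text, quotes="\"'"):
    """Return all quoted substrings, honouring backslash escapes."""
    results = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            i += 2                     # escaped character outside quotes: skip both
        elif ch in quotes:
            start = i
            i += 1
            while i < n and text[i] != ch:
                # inside quotes, a backslash swallows the next character
                i += 2 if text[i] == "\\" else 1
            if i < n:                  # closing quote found
                results.append(text[start:i + 1])
            i += 1
        else:
            i += 1
    return results

sample = 'String \\"this "should" NOT match\\" and "this \\"should\\" match"'
for item in find_quoted(sample):
    print(item)
```

On this sample it yields "should" and "this \"should\" match", and an unterminated quote is simply ignored.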
A more extensive version of https://stackoverflow.com/a/10786066/1794894
/"([^"\\]{50,}(\\.[^"\\]*)*)"|\'[^\'\\]{50,}(\\.[^\'\\]*)*\'|“[^”\\]{50,}(\\.[^“\\]*)*”/
This version also contains
Minimum quote length of 50
Extra type of quotes (open “ and close ”)
If it is searched from the beginning, maybe this can work?
\"((\\\")|[^\\])*\"
I faced a similar problem trying to remove quoted strings that may interfere with parsing of some files.
I ended up with a two-step solution that beats any convoluted regex you can come up with:
line = line.replace("\\\"","\'"); // Replace escaped quotes with something easier to handle
line = line.replaceAll("\"([^\"]*)\"","\"x\""); // Simple is beautiful
Easier to read and probably more efficient.
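For readers who want to try the two-step idea outside Java, here is a rough Python equivalent. The function name mask_quoted and the sample line are hypothetical.

```python
import re

def mask_quoted(line: str) -> str:
    """Replace every double-quoted string in `line` with "x"."""
    # Step 1: turn escaped quotes (\") into something easier to handle.
    line = line.replace('\\"', "'")
    # Step 2: with no escaped quotes left, a simple pattern is enough.
    return re.sub(r'"[^"]*"', '"x"', line)

# The text contains:  say("a \"b\" c") + "d"
print(mask_quoted('say("a \\"b\\" c") + "d"'))  # say("x") + "x"
```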
If your IDE is IntelliJ IDEA, you can forget all these headaches: store your regex in a String variable, and when you paste it inside the double quotes it will automatically be escaped into an acceptable format.
example in Java:
String s = "\"en_usa\":[^\\,\\}]+";
now you can use this variable in your regexp or anywhere.
(?<="|')(?:[^"\\]|\\.)*(?="|')
" It\'s big \"problem "
match result:
It\'s big \"problem
("|')(?:[^"\\]|\\.)*("|')
" It\'s big \"problem "
match result:
" It\'s big \"problem "
Messed around at regexpal and ended up with this regex: (Don't ask me how it works, I barely understand even tho I wrote it lol)
"(([^"\\]?(\\\\)?)|(\\")+)+"

Replace patterns that are inside delimiters using a regular expression call

I need to clip out all the occurrences of the pattern '--' that are inside single quotes in a long string (leaving intact the ones that are outside single quotes).
Is there a RegEx way of doing this?
(using it with an iterator from the language is OK).
For example, starting with
"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
I should end up with:
"xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"
So I am looking for a regex that could be run from the following languages as shown:
+-------------+------------------------------------------+
| Language | RegEx |
+-------------+------------------------------------------+
| JavaScript | input.replace(/someregex/g, "") |
| PHP | preg_replace('/someregex/', "", input) |
| Python | re.sub(r'someregex', "", input) |
| Ruby | input.gsub(/someregex/, "") |
+-------------+------------------------------------------+
I found another way to do this from an answer by Greg Hewgill at Qn138522
It is based on using this regex (adapted to contain the pattern I was looking for):
--(?=[^\']*'([^']|'[^']*')*$)
Greg explains:
"What this does is use the non-capturing match (?=...) to check that the character x is within a quoted string. It looks for some nonquote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters, until the end of the string. This relies on your assumption that the quotes are always balanced. This is also not very efficient."
The usage examples would be :
JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace("/--(?=[^']*'([^']|'[^']*')*$)/", "", $input)
Python: re.sub(r"--(?=[^']*'([^']|'[^']*')*$)", "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")
I have tested this for Ruby and it provides the desired result.
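The lookahead approach can also be checked with a short Python sketch, using the example string from the question. The inner group is made non-capturing here, which does not change what is matched; like the original, it assumes the quotes are balanced.

```python
import re

# "--" followed by: non-quotes, one quote, then only non-quote characters
# or complete '...' groups up to the end. That is, an odd number of quotes
# remains, so this "--" sits inside a quoted region.
INSIDE_QUOTES = re.compile(r"--(?=[^']*'(?:[^']|'[^']*')*$)")

text = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
result = INSIDE_QUOTES.sub("", text)
print(result)
# xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb
```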
This cannot be done with regular expressions, because you need to maintain state on whether you're inside single quotes or outside, and regex is inherently stateless. (Also, as far as I understand, single quotes can be escaped without terminating the "inside" region).
Your best bet is to iterate through the string character by character, keeping a boolean flag on whether or not you're inside a quoted region - and remove the --'s that way.
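A minimal sketch of that character-by-character scan, in Python. The function name is hypothetical, and it assumes single quotes are never escaped, so every quote toggles the flag.

```python
def strip_dashes_in_quotes(text, quote="'", target="--"):
    """Remove `target` wherever it occurs between a pair of `quote` chars."""
    out = []
    inside = False          # are we currently inside a quoted region?
    i = 0
    while i < len(text):
        if text[i] == quote:
            inside = not inside
            out.append(quote)
            i += 1
        elif inside and text.startswith(target, i):
            i += len(target)            # drop the dashes, keep scanning
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

text = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(strip_dashes_in_quotes(text))
# xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb
```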
If bending the rules a little is allowed, this could work:
import re
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print(re.sub(p, r'\1-', txt))
Output:
xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb
The regex:
( # Group 1
(?:^[^']*')? # Start of string, up till the first single quote
[^']*? # Inside the single quotes, as few characters as possible
(?:
'[^']*' # No double dashes inside these single quotes, jump to the next.
[^']*?
)*? # as few as possible
)
(-{2,}) # The dashes themselves (Group 2)
If there were different delimiters for start and end, you could use something like this:
-{2,}(?=[^'`]*`)
Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change
(?:^[^']*')?
in the beginning to
(?:^[^']*'|(?!^))
Updated regex:
((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.
Does this help?
def remove_double_dashes_in_apostrophes(text):
    return "'".join(
        part.replace("--", "") if (ix&1) else part
        for ix, part in enumerate(text.split("'")))
Seems to work for me. What it does, is split the input text to parts on apostrophes, and replace the "--" only when the part is odd-numbered (i.e. there has been an odd number of apostrophes before the part). Note about "odd numbered": part numbering starts from zero!
You can use the following sed script, I believe:
:again
s/'\(.*\)--\(.*\)'/'\1\2'/g
t again
Store that in a file (rmdashdash.sed) and do whatever exec magic in your scripting language allows you to do the following shell equivalent:
sed -f rmdashdash.sed < file containing your input data
What the script does is:
:again <-- just a label
s/'\(.*\)--\(.*\)'/'\1\2'/g
substitute, for the pattern ' followed by anything followed by -- followed by anything followed by ', just the two anythings within quotes.
t again <-- if a substitution was made, jump back to the label and run it on the resulting string again.
Note that this script will convert '----' into '', since it is a sequence of two --'s within quotes. However, '---' will be converted into '-'.
Ain't no school like old school.