Autohotkey variable assignment with dynamic hotstrings module - regex

I have been pointed toward this AutoHotKey module which allows for dynamic hotstrings.
One of the examples is the calculation of percentages while typing (e.g. 5/40% will become 8%). To do this, the following code is necessary:
hotstrings("(\d+)\/(\d+)%", "percent")
percent:
p := Round($1 / $2 * 100)
Send, %p%`%
Return
I want to use this module to replace dots . with middle dots ∙ within words. I have figured out how to "find" the text, but not how to replace it correctly. I need to reference the initial text in order to put it in the replacement text. In the above code, using p := Round($1 / $2 * 100) uses the numbers input to calculate the percentage, but I can't figure out how to do the same with letters.
My code is the following:
hotstrings("[a-z]\.[a-z](\.[a-z])*", "word")
word:
a := $1
b := $2
Send, a{U+22C5}b
Return
But this just replaces the whole thing with a single middle dot and doesn't replace the surrounding letters. Also, I don't know how to consider the possibility of multiple dots (a.b.c.d for example). In python, I'd do a for loop but I don't really know AutoHotKey.
How can I do this?
Thanks

Few problems here.
First one is the regex.
Firstly, you don't really want to think of the regex as if it were matching an infinitely long string such as a.b.c.d.e.f.g.h.i.j.k.l.... Instead, think of a single case x.y; those cases can sit right next to each other.
So get rid of (\.[a-z])*.
Secondly, you don't have capture groups. Or well, you do have one in there, but I'm assuming you accidentally did it. If you're not yet familiar with Regex capture groups, I'd recommend learning them, they're quite useful in certain cases (like here!).
But anyway, to create capture groups, you just put ( ) around the part of the Regex you want to capture.
So you want to capture the characters before and after the . (or well, actually only the latter one, this approach will have a problem, more on that later). So your Regex would now look like this:
([a-z])\.([a-z])
Upon a match, the hotstrings() function would output two variables, $1 and $2 (that's all they are, names of variables).
When you refer to the variables, $1 gives you the character before the ., and $2 gives you the character after the ..
So now we get onto the second problem, referring to the capture group variables.
a := $1
b := $2
Send, a{U+22C5}b
Here you create the variables a and b for no reason, though that's not an issue of course, but how you try to refer to the variables a and b is a problem.
You're using a Send command, so you're in legacy AHK syntax. In legacy AHK syntax you refer to a variable by wrapping its name in percent signs (%var%).
So your send command would look like this:
Send, %a%{U+22C5}%b%
But let's not write legacy AHK (even though the hotstrings() function totally is legacy AHK).
To switch over to modern AHK (expression syntax), we specify a single % followed by a space. Then we can do this:
SendInput, % $1 "{U+22C5}" $2
I also skipped defining the useless variables a and b, and switched over to SendInput because it is the recommended, faster, and more reliable send mode.
And now we would have an almost working script like so:
hotstrings("([a-z])\.([a-z])", "word")
return
word:
SendInput, % $1 "{U+22C5}" $2
Return
It just would have the problem of chaining multiple a.b.c.d.e.f.g... not working very well. But that's fine, since the Regex could do with more improvements.
We want to use a positive lookbehind and capture only the character after the . like so:
(?<=[a-z])\.([a-z])
Also, I'd say it would be fitting to replace [a-z] with \w (match any word character). So the Regex and the whole script would be:
hotstrings("(?<=\w)\.(\w)", "word")
return
word:
SendInput, % "{U+22C5}" $1
Return
And now it should work just as requested.
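To check offline why the lookbehind version handles chains, here is a minimal Python sketch. It is not AutoHotkey: it runs Python's re.sub over a finished string instead of hotstrings() over live typing, and the names NAIVE, LOOKBEHIND and DOT are made up for illustration. With two capture groups, each match consumes the letter after the dot, so that letter is no longer available as the "before" letter of the next dot.

```python
import re

DOT = "\u22c5"  # same code point as {U+22C5} in the script above

# Two capture groups: the letter after the dot is consumed by the match.
NAIVE = re.compile(r"(\w)\.(\w)")
# Lookbehind: the letter before the dot is checked but not consumed.
LOOKBEHIND = re.compile(r"(?<=\w)\.(\w)")


def replace_naive(text: str) -> str:
    return NAIVE.sub(lambda m: m.group(1) + DOT + m.group(2), text)


def replace_lookbehind(text: str) -> str:
    return LOOKBEHIND.sub(lambda m: DOT + m.group(1), text)


print(replace_naive("a.b.c.d"))       # every second dot is left alone
print(replace_lookbehind("a.b.c.d"))  # every dot is replaced
```

Keep in mind that \w also matches digits and underscores, so a number like 3.14 would be rewritten as well.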
And if my talks about legacy vs modern AHK confuse you (that's to be expected if you don't know the difference), I'd recommend giving e.g. this a read:
https://www.autohotkey.com/docs/Language.htm

Related

perl regex, remove what is captured

I've successfully captured data with this:
/^.{144}(.{15}).{34}(.{1})/
which results in this:
TTGGCCCCCACTCTC T
I want to remove the same characters from the same locations. I tried a simple substitution:
s/^.{144}(.{15}).{34}(.{1})//
That removes everything described. How do I remove only (...)?
Substitution works like
s/match/replace/
So it will replace your complete "match" with "replace". If you want to keep part of your match, you must reference the corresponding groups in the replacement string.
s/^.{144}(.{15}).{34}(.{1})// # replace all with nothing
s/^.{144}(.{15}).{34}(.{1})/$1/ # replace all with group 1 (.{15}) -> not what you want
s/^(.{144}).{15}(.{34}).{1}/$1$2/ # keeps group 1 and 2 and removes ".{15}" between them and all at the end.
The last one you need.
Try regex101. There you can give a pattern and it shows you the groups. There is a debugger, too.
The replacement side in the regex is substituted instead of everything that was matched (while there are ways to alter this to some extent), so you need to capture things intended to be kept as well, and put them back in the replacement side. Like
$var =~ s/^(.{144})(.{15})(.{34})(.)(.*)/$1$3$5/;
(the last capture was added in a comment)   or
$var =~ s/^(.{144})\K(.{15})(.{34})(.)(.*)/$3$5/;
Now the 15 chars and the single char are removed from $var, while you still have all of $N (1--5) available to work with as needed. (In the second version the \K keeps all matches previous to it so that they are not getting replaced, and thus we don't need $1 in the replacement side.) Please see perlretut for details.
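For comparison, the same capture-everything-and-put-back idea can be sketched in Python. This is only an illustration under assumptions: the field widths are scaled down (3, 2, 4, 1, rest) so the example is readable, the names PATTERN and remove_fields are hypothetical, and Python's re module has no \K, so only the first form is shown.

```python
import re

# Hypothetical, scaled-down layout: keep 3, remove 2, keep 4, remove 1, keep the rest.
PATTERN = re.compile(r"^(.{3})(.{2})(.{4})(.)(.*)")


def remove_fields(record: str):
    """Return (rewritten record, removed fields), or (record, None) if no match."""
    m = PATTERN.match(record)
    if m is None:
        return record, None
    removed = (m.group(2), m.group(4))
    # Same idea as s/.../$1$3$5/: write back only the groups to keep,
    # then append whatever the pattern did not consume (e.g. a newline).
    return m.expand(r"\1\3\5") + record[m.end():], removed


print(remove_fields("AAABBCCCCDEEE"))
```

All five captures stay available on the match object, so the removed fields can be recorded before they are dropped.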
However, as a comment enlightens us, there is a problem with this: It is not known before runtime which groups need be kept! So it could be 1,3,5 or perhaps 2 and 4 (or 7 and 11?).
What need be kept becomes known, and need be set, before the regex runs.
One way to do that: once the list of capture groups to keep is known store their indices in an array, then capture all matches into an array† and form the replacement and rewrite the string by hand
my @keep_idx = qw(0 2 4); # indices of capture groups to keep
my @captures = $var =~ /^(.{144})(.{15})(.{34})(.)(.*)/;
# Rewrite the variable using only @keep_idx -indexed captures
$var = join '', grep { defined } @captures[@keep_idx];
# Use @captures as needed...
The code above simply filters by grep any possibly non-existent "captures" -- a pattern may allow for a variable number of capture groups (so there may not exist group #5 for example). But I'd rather check those @captures explicitly (were there as many as expected? were they all of the expected form? etc).
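A rough Python counterpart of the index-list approach, for readers more at home there. It is a sketch under assumptions: keep_captures is a made-up helper name, the pattern in the usage line is a scaled-down stand-in for the real field widths, and raising on a failed match is my choice rather than something the Perl code does.

```python
import re


def keep_captures(text, pattern, keep_idx):
    """Rebuild text from the captures whose 0-based indices are in keep_idx.

    Returns (rewritten text, all captures) so the captures stay usable.
    """
    m = re.match(pattern, text)
    if m is None:
        raise ValueError("pattern did not match")
    captures = m.groups()  # optional groups that did not match are None
    kept = [captures[i] for i in keep_idx if i < len(captures)]
    return "".join(c for c in kept if c is not None), captures


rewritten, captures = keep_captures(
    "AAABBCCCCDEEE", r"^(.{3})(.{2})(.{4})(.)(.*)", [0, 2, 4]
)
print(rewritten, captures)
```

As in the Perl version, silently skipping missing captures is convenient but can hide a malformed record, so an explicit check on the number and shape of the captures is worth adding where it matters.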
There are other ways to do this.‡
† In newer perls (from version 5.25.7) there is the @{^CAPTURE} predefined variable with all captures, so one can run the match $var =~ /.../; and then use it. No need to assign captures.
‡ I'd like to mention one way that may be tempting, and can be seen around, but is best avoided.
One can form a string for the replacement side and double-evaluate it, like so
my $keep = q($1.$3.$5); # perl *code*, concatenating variables
$var =~ s/.../$keep/ee; # DANGEROUS. Runs any code in $keep
Here the modifiers /ee evaluate the right-hand side, and in a way that exposes the program to evaluating code (in $keep) that may have been slipped to it. Search for this for more information but I'd say best don't use it where it matters.
Thanks for everyone's help. I don't get how the comments work and kept fouling those up. I've decided that the cleanest (if not most elegant) way is to create two patterns. I'm keeping other solutions for future study. This is a different example,
The list of data I want to note, then delete:
/.{41}.{24}(\D{4}).{63}.{16}(\D{2}).{22}.{228}/
Data I want to keep:
/(.{41})(.{24})\D{4}(.{63})(.{16})\D{2}(.{22})(.{228})/
It's genetic data I'm working with. I need to note insertions then delete them to re-establish the original positions for alignment purposes.
If I understand correctly, I need to upvote this to close. An idiot as myself can only do what he can do. I'll try. :)

vs code replace two different sides of something

suppose I have code that looks something like
myFunc(someInput);
and suppose I run this function in many different places, on many different inputs (someInput could be various things, I need to preserve whatever it is).
All of a sudden I realize I need to perform another function on the input. So I would like to replace every instance with
myFunc(nutherFunc(someInput));
I could run a replace of myFunc( with myFunc(nutherFunc( but would have to manually close the nutherFunc call everywhere. Is there a way, using regex or otherwise, to replace myFunc( with myFunc(nutherFunc( AND ) with )) while preserving the input?
Said another way, can I say "replace these two character sets but keep what is in between them"?
You can use a regex with a capture group to accomplish this. I'd recommend
myFunc\( # match the literal characters "myFunc(" (We have to escape the paren)
(\w+) # capture group so we can refer to the argument of `myFunc` in the replacement
\) # a literal close paren
with a replacement of
myFunc(nutherFunc($1))
Where the $1 represents the group that was captured between parens.
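The same capture-and-wrap replacement can be tried outside the editor. A minimal Python sketch, assuming the function names from the question; CALL and wrap_argument are made-up names, and Python writes the backreference as \1 where VS Code uses $1:

```python
import re

# Matches myFunc(<identifier>) only; \w+ will not match arguments that
# contain spaces, dots, commas or nested calls.
CALL = re.compile(r"myFunc\((\w+)\)")


def wrap_argument(source: str) -> str:
    # \1 is whatever was captured between the parentheses.
    return CALL.sub(r"myFunc(nutherFunc(\1))", source)


print(wrap_argument("myFunc(someInput); myFunc(other);"))
```

Calls whose argument is more than a plain identifier are left untouched by this pattern, so they would still need a manual pass or a looser capture group.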
Here's a video: https://clip.brianschiller.com/wjuWYgc-2019-12-17-replace.mp4

REGEX in R: extracting words from a string

i guess this is a common problem, and i found quite a lot of webpages, including some from SO, but i failed to understand how to implement it.
I am new to REGEX, and I'd like to use it in R to extract the first few words from a sentence.
for example, if my sentence is
z = "I love stack overflow it is such a cool site"
id like to have my output as being (if i need the first four words)
[1] "I love stack overflow"
or (if i need the last four words)
[1] "such a cool site"
of course, the following works
paste(strsplit(z," ")[[1]][1:4],collapse=" ")
paste(strsplit(z," ")[[1]][7:10],collapse=" ")
but i'd like to try a regex solution for performance issues as i need to deal with very huge files (and also for the sake of knowing about it)
I looked at several links, including
Regex to extract first 3 words from a string and
http://osherove.com/blog/2005/1/7/using-regex-to-return-the-first-n-words-in-a-string.html
so i tried things like
gsub("^((?:\S+\s+){2}\S+).*",z,perl=TRUE)
Error: '\S' is an unrecognized escape in character string starting ""^((?:\S"
i tried other stuff but it usually returned me either the whole string, or the empty string.
another problem with strsplit is that it returns a list. maybe the [[]] operator is slowing things a bit (??) when dealing with large files and doing apply stuff.
it looks like the Syntax used in R is somewhat different ?
thanks !
You've already accepted an answer, but I'm going to share this as a means of helping you understand a little more about regex in R, since you were actually very close to getting the answer on your own.
There are two problems with your gsub approach:
You used single backslashes (\). R requires you to escape those since they are special characters in string literals. You escape them by adding another backslash (\\). If you do nchar("\\"), you'll see that it returns 1, because the two typed characters stand for a single backslash.
You didn't specify what the replacement should be. Here, we don't want to replace anything, but we want to capture a specific part of the string. You capture groups in parentheses (...), and then you can refer to them by the number of the group. Here, we have just one group, so we refer to it as "\\1".
You should have tried something like:
sub("^((?:\\S+\\s+){2}\\S+).*", "\\1", z, perl = TRUE)
# [1] "I love stack"
This is essentially saying:
Work from the start of the contents of "z".
Start creating group 1.
Find non-whitespace (like a word) followed by whitespace (\S+\s+) two times {2} and then the next set of non-whitespaces (\S+). This will get us 3 words, without also getting the whitespace after the third word. Thus, if you wanted a different number of words, change the {2} to be one less than the number you are actually after.
End group 1 there.
Then, just return the contents of group 1 (\1) from "z".
To get the last three words, just switch the position of the capturing group and put it at the end of the pattern to match.
sub("^.*\\s+((?:\\S+\\s+){2}\\S+)$", "\\1", z, perl = TRUE)
# [1] "a cool site"
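The two patterns generalise to any word count by changing the repetition number. Here is a small sketch in Python rather than R (the regex syntax is the same, but a Python raw string needs no doubled backslashes); first_words and last_words are hypothetical helper names:

```python
import re


def first_words(text: str, n: int) -> str:
    # (n - 1) "word + whitespace" pairs, then one final word.
    pattern = r"^((?:\S+\s+){%d}\S+).*" % (n - 1)
    return re.sub(pattern, r"\1", text)


def last_words(text: str, n: int) -> str:
    # Same group, moved to the end of the pattern and anchored with $.
    pattern = r"^.*\s+((?:\S+\s+){%d}\S+)$" % (n - 1)
    return re.sub(pattern, r"\1", text)


z = "I love stack overflow it is such a cool site"
print(first_words(z, 4))
print(last_words(z, 4))
```

If the string has fewer than n words the pattern does not match and the input comes back unchanged, which is worth checking for before relying on the result.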
For getting the first four words.
library(stringr)
str_extract(x, "^\\s*(?:\\S+\\s+){3}\\S+")
For getting the last four.
str_extract(x, "(?:\\S+\\s+){3}\\S+(?=\\s*$)")

Regex for SublimeText Snippet

I've been stuck for a while on this Sublime Snippet now.
I would like to display the correct package name when creating a new class, using TM_FILEPATH and TM_FILENAME.
When printing TM_FILEPATH variable, I get something like this:
/Users/caubry/d/[...]/src/com/[...]/folder/MyClass.as
I would like to transform this output, so I could get something like:
com.[...].folder
This includes:
Removing anything before /com/[...]/folder/MyClass.as;
Removing the TM_FILENAME, with its extension; in this example MyClass.as;
And finally finding all the slashes and replacing them by dots.
So far, this is what I've got:
${1:${TM_FILEPATH/.+(?:src\/)(.+)\.\w+/\l$1/}}
and this displays:
com/[...]/folder/MyClass
I do understand how to replace slashes with dots, such as:
${1:${TM_FILEPATH/\//./g/}}
However, I'm having difficulties to add this logic to the previous one, as well as removing the TM_FILENAME at the end of the logic.
I'm really inexperienced with Regex, thanks in advance.
:]
EDIT: [...] indicates variable number of folders.
We can do this in a single replacement with some trickery. What we'll do is, we put a few different cases into our pattern and do a different replacement for each of them. The trick to accomplish this is that the replacement string must contain no literal characters, but consist entirely of "backreferences". In that case, those groups that didn't participate in the match (because they were part of a different case) will simply be written back as an empty string and not contribute to the replacement. Let's get started.
First, we want to remove everything up until the last src/ (to mimic the behaviour of your snippet - use an ungreedy quantifier if you want to remove everything until the first src/):
^.+/src/
We just want to drop this, so there's no need to capture anything - nor to write anything back.
Now we want to match subsequent folders until the last one. We'll capture the folder name, also match the trailing /, but write back the folder name and a .. But I said no literal text in the replacement string! So the . has to come from a capture as well. Here comes the assumption into play, that your file always has an extension. We can grab the period from the file name with a lookahead. We'll also use that lookahead to make sure that there's at least one more folder ahead:
^.+/src/|\G([^/]+)/(?=[^/]+/.*([.]))
And we'll replace this with $1$2. Now if the first alternative catches, groups $1 and $2 will be empty, and the leading bit is still removed. If the second alternative catches, $1 will be the folder name, and $2 will have captured a period. Sweet. The \G is an anchor that ensures that all matches are adjacent to one another.
Finally, we'll match the last folder and everything that follows it, and only write back the folder name:
^.+/src/|\G([^/]+)/(?=[^/]+/.*([.]))|\G([^/]+)/[^/]+$
And now we'll replace this with $1$2$3 for the final solution. Demo.
A conceptually similar variant would be:
^.+/src/|\G([^/]+)/(?:(?=[^/]+/.*([.]))|[^/]+$)
replaced with $1$2. I've really only factored out the beginning of the second and third alternative. Demo.
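The "replacement made only of backreferences" trick can be checked outside Sublime with a short Python sketch. Several things here are assumptions: Python's re module does not support \G, so the anchors are dropped and the sketch relies on re.sub scanning left to right over adjacent matches; the example path and the names PACKAGE and package_name are made up; and empty expansion of unmatched groups requires Python 3.5 or later.

```python
import re

# The three alternatives from the answer, minus the \G anchors.
PACKAGE = re.compile(
    r"^.+/src/"                    # case 1: everything up to the last src/
    r"|([^/]+)/(?=[^/]+/.*([.]))"  # case 2: a folder that is not the last one
    r"|([^/]+)/[^/]+$"             # case 3: the last folder plus the file name
)


def package_name(path: str) -> str:
    # Groups that took no part in a given match expand to "".
    return PACKAGE.sub(r"\1\2\3", path)


print(package_name("/Users/me/project/src/com/example/folder/MyClass.as"))
```

Because the period is borrowed from the file extension, a file without an extension would make case 2 fail, which is the assumption the answer already calls out.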
Finally, if Sublime is using Boost's extended format string syntax, it is actually possible to get characters into the replacement conditionally (without magically conjuring them from the file extension):
^.+/src/|\G(/)?([^/]+)|\G/[^/]+$
Now we have the first alternative for everything up to src (which is to be removed), the third alternative for the last slash and file name (which is to be removed), and the middle alternative for all folders you want to keep. This time I put the slash to be replaced optionally at the beginning. With a conditional replacement we can write a . there if and only if that slash was matched:
(?1.:)$2
Unfortunately, I can't test this right now and I don't know an online tester that uses Boost's regex engine. But this should do the trick just fine.

Notepad++ masschange using regular expressions

I have issues to perform a mass change in a huge logfile.
Apart from the file size, which causes issues in Notepad++, I have a problem using 10 or more parameters in the replacement; up to 9 it works fine.
I need to change numerical values in a file where these values are located within quotation marks and with leading and ending comma: ."123,456,789,012.999",
I used this exp to find and replace the format to:
,123456789012.999, (so that there are no quotation marks and no comma within the num.value)
The exp used to find is:
([,])(["])([0-9]+)([,])([0-9]+)([,])([0-9]+)([,])([0-9]+)([\.])([0-9]+)(["])([,])
and the exp to replace is:
\1\3\5\7\9\10\11\13
The problem is parameters \11 \13 are not working (the chars eg .999 as in the example will not appear in the changed values).
So now the question is - is there any limit for parameters?
It seems to me that it's not working above 10. For shorter num.values where I need to use only up to 9 parameters the strings for search and replacement work fine; for the example above the search works but not the replacement, and the end of the changed value gets corrupted.
Also, it came to my mind that instead of using Notepad++ I could maybe change the logfile on the unix server directly, however I had issues building the correct perl syntax. Anyone who could help with that maybe?
After having a little play myself, it looks like back-references \11-\99 are invalid in notepad++ (which is not that surprising, since this is commonly omitted from regex languages.) However, there are several things you can do to improve that regular expression, in order to make this work.
Firstly, you should consider using less groups, or alternatively non-capture groups. Did you really need to store 13 variables in that regex, in order to do the replacement? Clearly not, since you're not even using half of them!
To put it simply, you could just remove some brackets from the regex:
[,]["]([0-9]+)[,]([0-9]+)[,]([0-9]+)[,]([0-9]+)[.]([0-9]+)["][,]
And replace with:
,\1\2\3\4.\5,
...But that's not all! Why are you using square brackets to say "match anything inside", if there's only one thing inside?? We can get rid of these, too:
,"([0-9]+),([0-9]+),([0-9]+),([0-9]+)\.([0-9]+)",
(Note I added a "\" before the ".", so that it matches a literal "." rather than "anything".)
Also, although this isn't a big deal, you can use "\d" instead of "[0-9]".
This makes your final, optimised regex:
,"(\d+),(\d+),(\d+),(\d+)\.(\d+)",
And replace with:
,\1\2\3\4.\5,
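If the file is too large for Notepad++, the simplified five-group pattern can also be applied line by line in a script. Below is a minimal Python sketch of that replacement (strip_fixed), plus a variant that is my own assumption rather than part of this answer: strip_any accepts any number of comma-separated digit groups and uses lookarounds so that adjacent quoted values can share a comma. All names are hypothetical.

```python
import re

# The optimised pattern from above: exactly four comma-separated digit groups.
FIXED = re.compile(r',"(\d+),(\d+),(\d+),(\d+)\.(\d+)",')
# Assumed variant: any number of digit groups, delimiters checked but not consumed.
QUOTED = re.compile(r'(?<=,)"([\d,]+\.\d+)"(?=,)')


def strip_fixed(line: str) -> str:
    return FIXED.sub(r",\1\2\3\4.\5,", line)


def strip_any(line: str) -> str:
    # Drop the quotes, then remove the commas inside the number.
    return QUOTED.sub(lambda m: m.group(1).replace(",", ""), line)


print(strip_fixed('x,"123,456,789,012.999",y'))
print(strip_any('a,"1,234.5","12.75",b'))
```

Reading the log with a plain `for line in file:` loop and writing each converted line to a new file keeps memory use flat regardless of file size.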
Not sure if the regex groups have limitations, but you could use lookarounds to save 2 groups, and you could also merge some groups in your example. But first, let's get rid of some useless character classes:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
We could merge those groups:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
                                        ^^^^^^^^^^^^^^^^^^^^
We get:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(,)
Let's add lookarounds:
(?<=\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(?=,)
The replacement would be \2\4\6\8.
If you have a fixed length of digits at all times, it's fairly simple to do what you have done. Even though your expression is poorly written, it does the job. If this is the case, look at Tom Lord's answer.
I played around with it a little bit myself, and I would probably use two expressions - makes it much easier. If you have to do it in one, this would work, but be pretty unsafe:
(?:"|(\d+),)|(\.\d+)"(?=,) replace by \1\2
Live demo: http://regex101.com/r/zL3fY5