Use AutoHotkey to implement sed like replacement on clipboard - regex

I want to use AutoHotkey to implement sed-like replacement on the clipboard. I have tried several different ways to implement it, but I would like to make something that can be easily extended and is as functional as sed. Ideally it would accept the same commands as sed and replace the current clipboard contents with the output. Since I use Ditto, I will then have both the original and the output saved.
The solutions I have thought of and tested are to either make a hotstring which performs one specific sed replacement, e.g. using RegExreplace:
; Make text in clipboard "Camel Case" (retaining all spaces):
:*:xcamelsc::
haystack := Clipboard
needle := "\h*([a-zA-Zåäö])([a-zåäö]*)\h*" ; this retains new lines
replacement := "$U1$2 "
result := RegExReplace(haystack, needle, replacement)
Clipboard =
Clipboard := result
ClipWait
sleep 100
send ^v
return
Another example is
;replace multiple underscore with one space
:*:xr_sc::
haystack := Clipboard
needle := "[\h]*[\_]+"
replacement := " "
result := RegExReplace(haystack, needle, replacement)
Clipboard =
Clipboard := result
ClipWait
return
The flaw with this system is that I would have to make ~500 hotstrings, one for each combination I would like to have (e.g. a separate hotstring to turn every space into an underscore). I am not sure how to easily extend this.
Another way to do this is to use a GUI which previews the output and makes it possible to do more things, as implemented in clipboard replace. For this to be satisfactory I have made a hotstring which opens the GUI with the initial replacement filled in, and hotkeys which automatically perform the replacement and paste the output, etc. This system only requires that I specify the thing to replace, but I would rather have a system similar to the above which uses variables for all possible replacements so that I can refer to e.g. /^[\t]// to directly perform replacement.
A solution to do this would be to have a hotstring activate if I type
"xr[a string of text to indicate what to replace][a string of text to indicate what to replace with]xx"
i.e. "xx" would take the word just typed, parse it into the command, and perform it.
This would mean that if I type "xr_sxx", the "_s" part would be interpreted as two separate variables: the "_" would be assigned to the needle, and the "s" would be looked up in a table and then inserted in the replacement variable of the RegExReplace.
Does anyone know of a way to do this?

This system only requires that I specify the thing to replace, but I
would rather have a system similar to the above which uses variables
for all possible replacements so that I can refer to e.g. /^[\t]// to
directly perform replacement.
Does anyone know of an easy way to do this?
Rather than specifying a hotstring for each scenario to perform a similar function, I've used a method with a hotkey, the Input command, and values stored in an associative array.
Here's an Example:
data := {"xcamelsc": ["\h*([a-zA-Zåäö])([a-zåäö]*)\h*", "$U1$2 "]
, "xr_sc": ["[\h]*[\_]+", " "]}
f1::
word := ""
key := ""
Loop {
input, key, V L1 M T1, {Space}{Enter}{Tab}
if (errorlevel == "EndKey:Space") {
if (data.HasKey(word)) {
sendInput % "{BackSpace " StrLen(word)+1 "}"
haystack := Clipboard
needle := data[word].1
replacement := data[word].2
result := RegExReplace(haystack, needle, replacement)
Clipboard =
Clipboard := result
ClipWait
sleep 100
send ^v
}
word := ""
Break
}
else {
word .= Format("{1:L}", key)
}
}
return
; Necessary for typing mistakes when using Input
$BackSpace::
word := SubStr(word, 1, -1)
sendInput, {BackSpace}
return
esc::exitapp
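The table-driven idea above is not specific to AutoHotkey. As a minimal sketch in Python (a different language from the answer, used only for illustration), the rules live in one dictionary and a single function applies whichever rule is named. The names RULES and apply_rule are hypothetical, and because Python's re module has no \h, the class [ \t] is assumed to be an acceptable stand-in for horizontal whitespace.

```python
import re

# Hypothetical rule table: rule name -> (pattern, replacement).
# Python's re has no \h, so [ \t] stands in for horizontal whitespace.
# Python also has no $U1 replacement syntax, so a function uppercases group 1.
RULES = {
    "xcamelsc": (
        r"[ \t]*([a-zA-Zåäö])([a-zåäö]*)[ \t]*",
        lambda m: m.group(1).upper() + m.group(2) + " ",
    ),
    "xr_sc": (r"[ \t]*_+", " "),
}


def apply_rule(name, text):
    """Return text transformed by the named rule, or unchanged if the name is unknown."""
    rule = RULES.get(name)
    if rule is None:
        return text
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)
```

Adding a new replacement then means adding one dictionary entry rather than a new handler, which is the same design choice the associative array makes in the AutoHotkey version.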

RegEx to format Wikipedia's infoboxes code [SOLVED]

I am a contributor to Wikipedia and I would like to make a script with AutoHotKey that could format the wikicode of infoboxes and other similar templates.
Infoboxes are templates that display a box on the side of articles and show the values of the parameters entered (they are numerous and they differ in number, length and type of characters used depending on the infobox).
Parameters are always preceded by a pipe (|) and end with an equal sign (=). On rare occasions, multiple parameters can be put on the same line, but I can sort this manually before running the script.
A typical infobox will be like this:
{{Infobox XYZ
| first parameter = foo
| second_parameter =
| 3rd parameter = bar
| 4th = bazzzzz
| 5th =
| etc. =
}}
But sometimes (lazy) contributors put them like this:
{{Infobox XYZ
|first parameter=foo
|second_parameter=
|3rd parameter=bar
|4th=bazzzzz
|5th=
|etc.=
}}
Which isn't very easy to read and modify.
I would like to know if it is possible to make a regex (or a series of regexes) that would transform the second example into the first.
The lines should start with a space, then a pipe, then another space, then the parameter name, then any number of spaces (to match the other lines' length), then an equal sign, then another space, and if present, the parameter value.
I tried some things using multiple capturing groups, but I'm getting nowhere... (I'm even ashamed to show my attempts as they really don't work).
Would someone have an idea on how to make it work?
Thank you for your time.
The lines should start with a space, then a pipe, then another space, then the parameter name, then a space, then an equal sign, then another space, and if present, the parameter value.
First the selection, it's relatively trivial:
^\s*\|\s*([^=]*?)\s*=(.*)$
Then the replacement, literally your description of what you want (note the space at the beginning):
| $1 = $2
See it in action here.
@Blindy:
The best code I have found so far is the following : https://regex101.com/r/GunrUg/1
The problem is it doesn't align the equal signs vertically...
I got an answer on AutoHotKey forums:
^i::
out := ""
Send, ^x
regex := "O)\s*\|\s*(.*?)\s*=\s*(.*)", width := 1
Loop, Parse, Clipboard, `n, `r
If RegExMatch(A_LoopField, regex, _)
width := Max(width, StrLen(_[1]))
Loop, Parse, Clipboard, `n, `r
If RegExMatch(A_LoopField, regex, _)
out .= Format(" | {:-" width "} = {2}", _[1],_[2]) "`n"
else
out .= A_LoopField "`n"
Clipboard := out
Send, ^v
Return
With this script, pressing Ctrl+i formats the infobox code just right (I guess a simple regex isn't enough to do the job).
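The reason one regex is not enough is that the padding depends on the longest parameter name, which is only known after scanning every line. A minimal Python sketch of the same two-pass approach follows (Python rather than AutoHotkey, for illustration only). The function name format_infobox is hypothetical, and unlike the script above this sketch also trims the trailing space left after an empty value.

```python
import re

# Same shape as the pattern in the script above: pipe, name, '=', optional value.
PARAM = re.compile(r"^\s*\|\s*(.*?)\s*=\s*(.*)$")


def format_infobox(text):
    """Align the '=' signs of all '| name = value' lines in text."""
    lines = text.splitlines()
    matches = [PARAM.match(line) for line in lines]
    # Pass 1: find the longest parameter name.
    width = max((len(m.group(1)) for m in matches if m), default=1)
    # Pass 2: rebuild each parameter line, padding the name to that width.
    out = []
    for line, m in zip(lines, matches):
        if m:
            out.append(f" | {m.group(1):<{width}} = {m.group(2)}".rstrip())
        else:
            out.append(line)  # non-parameter lines pass through untouched
    return "\n".join(out)
```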

using Autohotkey to replace diacritics accents in clipboard

I'm trying to write a script in Autohotkey that will take the currently highlighted word, copy it into the clipboard, and then replace accented characters with their non-accented versions. For example, if the word honorábilem is in the clipboard, I want to change it to honorabilem.
This is what I have tried:
F1::
SetTitleMatchMode RegEx
clipboard =
Send, ^c
wordToParse := %clipboard%
wordToParse = RegExReplace(wordToParse,"á","a") ; also tried this: StringReplace, clipboard, clipboard, á, a, All
MsgBox, % clipboard
But the contents of the clipboard don't change. The á never gets replaced with a. Appreciate any help.
The contents of the clipboard don't change (after the change from sending CTRL+C) because you're simply not changing the contents of the clipboard after that.
And another mistake you have is assigning values to variables wrong.
I'd assume you don't know the difference between = and :=.
The difference is that using = to assign values is deprecated legacy AHK and should never be used. You're assigning literal text to a variable. As opposed to assigning the result of evaluating some expression, which is what := does.
This line wordToParse = RegExReplace(wordToParse,"á","a") assigns literal text to that variable instead of calling the RegExReplace() function and assigning its result to the variable.
Also, no reason to regex replace if you're not using regex.
The StrReplace() function is what you want.
And then there's also the usage of legacy syntax in an expression:
wordToParse := %clipboard%
Referring to a variable by wrapping it in % is what you'd do in a legacy syntax.
But since you're not doing that, you're using :=, as you should, just ditch the %s.
Revised script:
F1::
;This does nothing for us, removed
;SetTitleMatchMode RegEx
;Empty clipboard
Clipboard := ""
;Switched to SendInput, it's documented as faster and more reliable
SendInput, ^c
;Wait for the clipboard to contain something
ClipWait
wordToParse := Clipboard
wordToParse := StrReplace(wordToParse, "á", "a")
;Since you want to display the contents of the clipboard in
;a message box, first we need to set what we want into it
Clipboard := wordToParse
MsgBox, % Clipboard
return
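The revised script handles a single character, á. If many accented letters need the same treatment, one general technique (not the one used in the answer above) is Unicode decomposition: split each letter into a base character plus combining marks, then drop the marks. A minimal sketch in Python, with the hypothetical name strip_accents; letters that have no decomposition, such as ø or ł, are left unchanged by this approach.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after NFD decomposition (á becomes a)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as the acute accent.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```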

Very slow RegEx in AHK yet fast in Notepad++

I'd like to find a certain string in a webpage. I decided to use RegEx. (I know my RegExes are quite terrible, however, they work). My two expressions are very fast when used in Notepad++ (probably < 1s) and on Regex101, but they are horribly slow when used in AutoHotKey – about 2-5 minutes. How do I fix this?
sWindowInfo2 = http://www.archiwum.wyborcza.pl/Archiwum/1,0,4583161,20060208LU-DLO,Dzis_bedzie_Piast,.html
whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", sWindowInfo2, false ), whr.Send()
whr.ResponseText
sPage := ""
sPage := whr.ResponseText
; get city name (if exists) – the following is very slooooow
if RegExMatch(sPage, "[\s\S]+<dzial>Gazeta\s(.+)<\/dzial>[\s\S]+")
{
sCity := RegExReplace(sPage, "[\s\S]+<dzial>Gazeta\s(.+)<\/dzial>[\s\S]+", "$1")
;MsgBox, % sCity
city := 1
}
if RegExMatch(sPage, "[\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+")
{
sCity := RegExReplace(sPage, "[\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+", "$1")
city := 1
}
EDIT:
In the page I provided the match is Lublin. Have a look at: https://regex101.com/r/qJ2pF8/1
You do not need to use RegExReplace to get the captured value. As per reference, you can pass the 3rd var into RegExMatch:
OutputVar
OutputVar is the unquoted name of a variable in which to store a match object, which can be used to retrieve the position, length and value of the overall match and of each captured subpattern, if any are present.
So, use a much simpler pattern:
FoundPos := RegExMatch(sPage, "<metryczka>GW\s(.+)\snr", SubPat) ;
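It will return the position of the match and store "Lublin" in the first subpattern: SubPat1 in AutoHotkey v1's default mode, or SubPat[1] when a match object is used (v2, or v1 with the O) option).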
With this pattern, you avoid heavy backtracking you had with [\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+ as the first [\s\S]+ matched up to the end of the string, and then backtracked to accommodate for the subsequent subpatterns. The longer the string, the slower the operation is.
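The same principle applies in any backtracking regex engine: search for the short pattern instead of matching the whole document around it. A minimal Python sketch, where extract_city is a hypothetical name and the lazy (.+?) is a small assumption on top of the answer's pattern to stop at the first " nr":

```python
import re


def extract_city(page):
    """Return the text between '<metryczka>GW ' and ' nr', or None if absent."""
    # re.search scans for the pattern anywhere in the string, so no leading
    # or trailing [\s\S]+ is needed.
    m = re.search(r"<metryczka>GW\s(.+?)\snr", page)
    return m.group(1) if m else None
```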

How can I use a PL/SQL regular expression to replace an HTML tag and its contents with a like number of '?'s?

In PL/SQL I need to replace something like:
'MOUSE RAT <FONT COLOR="#FF0000">DOG</FONT> CAT ELEPHANT'
with
'MOUSE RAT ????????????????????????????????? CAT ELEPHANT'
Basically I need to replace an HTML tag and everything in between with a placeholder of '?' equal to the same length as the string I am replacing. The good news is the tag will always be a font tag.
Will a REGEXP_REPLACE do this?
IF so what does the pattern look like?
REGEXP_REPLACE() replaces a pattern, so whilst it's useful for finding what you want to replace you can't replace the removed string with something of the same length.
The following will replace the HTML:
regexp_replace(str, '</?FONT.*>')
You then need to add in question marks to the length of the removed string, i.e. the length of the string prior to removal minus the length of the string now.
I'm not really certain there's a good way of going about this unfortunately. You'll have to use a character to notify you that this is where the question marks need to be once the string has been replaced. Something like the following would work:
replace( regexp_replace(str, '</?FONT.*>', '?')
, '?'
, lpad( '?'
, length(str) - length(regexp_replace(str, '</?FONT.*>', '?')) + 1
, '?'
)
)
I really don't like it though... If the entire thing is HTML it would be easier and better to use a proper parser and then you could just replace all the data in the one node.
Although I like PL/SQL, I would not recommend that. PL/SQL is a mighty tool for data manipulation, but it is not too handy for parsing. This is an example where a Java stored procedure can be more efficient, especially when you have to refactor your code several times.
Also, REGEXP_REPLACE works with VARCHARs (max size 32KB), while you may need to work with CLOBs.
I wrote a function that is easier to understand than Ben's code, but probably less efficient, and certainly less elegant. I haven't decided which to use, what do you think?
FUNCTION REPLACE_WITH_PLACEHOLDER(IN_STRING IN VARCHAR2, START_STRING IN VARCHAR2, END_STRING IN VARCHAR2, PLACEHOLDER IN VARCHAR2) RETURN VARCHAR2
IS
OUT_STRING VARCHAR2(32767);
START_POSITION BINARY_INTEGER := 0;
END_POSITION BINARY_INTEGER;
SEARCH_LENGTH BINARY_INTEGER;
SEARCH_STRING VARCHAR2(500);
REPLACE_STRING VARCHAR2(500);
BEGIN
OUT_STRING := IN_STRING;
START_POSITION := INSTR(OUT_STRING,START_STRING);
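WHILE START_POSITION > 0
LOOP
-- Look for the end marker first; add its length only once it has been found
END_POSITION := INSTR(OUT_STRING,END_STRING,START_POSITION);
IF END_POSITION > 0
THEN
END_POSITION := END_POSITION + LENGTH(END_STRING);
SEARCH_LENGTH := (END_POSITION - START_POSITION);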
SEARCH_STRING := SUBSTR(OUT_STRING,START_POSITION,SEARCH_LENGTH);
REPLACE_STRING := LPAD(PLACEHOLDER,SEARCH_LENGTH,PLACEHOLDER);
OUT_STRING := REPLACE(OUT_STRING,SEARCH_STRING,REPLACE_STRING);
ELSE
EXIT;
END IF;
START_POSITION := INSTR(OUT_STRING,START_STRING);
END LOOP;
RETURN OUT_STRING;
END REPLACE_WITH_PLACEHOLDER;
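For comparison, regex engines that accept a callback as the replacement can compute the placeholder from the length of each match directly. The sketch below is in Python rather than PL/SQL, purely to illustrate that idea; mask_font_tags and FONT_TAG are hypothetical names, and the pattern assumes font tags are not nested.

```python
import re

# One opening FONT tag, its contents, and the matching closing tag (no nesting assumed).
FONT_TAG = re.compile(r"<FONT\b[^>]*>.*?</FONT>", re.IGNORECASE | re.DOTALL)


def mask_font_tags(text, placeholder="?"):
    """Replace each FONT element with placeholders of the same total length."""
    return FONT_TAG.sub(lambda m: placeholder * len(m.group(0)), text)
```

Because each match is replaced by exactly as many characters as it contained, the output always has the same length as the input.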

How can I find the count of semicolon separated values?

I have a list of all email ids which I have copied from the 'To' field, from an email I received in MS Outlook. These values (email ids) are separated by a semicolon. I have copied this big list of email ids into Excel. Now I want to find the number of email ids in this list; basically by counting the number of semi colons.
One way I can do this is by writing C code. i.e. store the big list as string buffer, and keep comparing the chars to ";" in a while(char == ';') loop.
But I want to do it quickly.
Is there any quick way to find that out using either:
1.) Regular expression (I use powergrep for processing the regexps)
2.) In excel itself (any excel macro/plugin for that?)
3.) DOS script method
4.) Any other quick way of getting it done?
I believe the following should work in Excel:
= Len(A1) - Len(Substitute(A1, ";", "")) + 1
/EDIT: if you've pasted the email addresses over several cells, you can count the cells with the following function:
= CountA(A1:BY1)
CountA counts non-empty cells in a given range. You can specify the range by typing =CountA( into a cell and then selecting your cell range with the mouse cursor.
Bash/Cygwin One-Liner
$ echo "user@domain.tld;user@domain.tld;user@domain.tld" | sed -e 's/;/\n/g' | wc -l
3
If you already have Cygwin installed it's effectively instant. If not, cygwin is worth installing IMHO. It basically provides a Linux bash prompt overlaid over your Windows system.
As an aside, stuff like this is why I prefer *nix over Windows for work stuff, I can't live on a windows box without Cygwin since bash scripts are so much more powerful than batch scripts.
If counting the number of semicolons is good enough for you, you can do it in Perl using this solution: Perl FAQ 4.24: How can I count the number of occurrences of a substring within a string
PowerShell:
> $a = 'blah;blah;blah'
> $a.Split(';').Count
3
3) If you have neither Cygwin nor PowerShell installed, try this .cmd script:
@echo off
set /a i = 0
for %%i in (name1@mail.com;name2@mail.com;name3@mail.com) do set /a i = i + 1
@echo %i%
If you are using Excel you can use this code and expose it.
Public Function CountSubString(ByVal RHS As String, ByVal Delimiter As String) As Integer
Dim V As Variant
V = Split(RHS, Delimiter)
CountSubString = UBound(V) + 1
End Function
If you have .NET you can make a little command line utility
Module CountSubString
Public Sub Main(ByVal Args() As String)
If Args.Length <> 2 Then
Console.WriteLine("wrong arguments passed->")
Else
Dim I As Integer = 0
Dim Items() = Split(Args(0), Args(1))
Console.WriteLine("There are " & CStr(UBound(Items) + 1) & "
End If
End Sub
End Module
Load the list in your favorite (not Notepad!) editor, replace ; by \n, see in the status bar how many lines you have, remove the last line if needed.
C# 3.0 with LINQ would make this easy if it is an option for you over C
myString.Count(c => c == ';')
If awk and echo are available (and they are, even on Windows):
echo "addr1;addr2;addr3...." | awk -F ";" "{print NF}"
Looping over it with a while loop and counting the ';' characters is probably going to be the fastest and the most readable.
Consider Konrad's suggestion: that too will loop through the string and check every character to see whether it is a semicolon; it then builds a modified copy of the string (strings may or may not be mutable in Excel, I don't know), and finally it takes the difference in length between that copy and the original string.
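Most of the answers above count semicolons and add one, which over-counts when the list ends with a trailing semicolon. A minimal Python sketch (Python is not among the options the question lists, and count_addresses is a hypothetical name) that counts non-empty entries instead:

```python
def count_addresses(raw):
    """Count non-empty entries in a semicolon-separated list."""
    # Splitting and skipping blanks tolerates a trailing ';' and stray spaces.
    return sum(1 for part in raw.split(";") if part.strip())
```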