AHK - Get text after character (space) - regex

I'm new to AutoHotkey. I'm trying to remove all the text up to the first space on each line, keeping everything else.
Example:
txt1=something
txt2=other thing
var.="-1" " " txt1 " " txt2 "`n"
var.="2" " " txt1 " " txt2 "`n"
var.="4" " " txt1 " " txt2 "`n"
;; more add ...
FinalVar:=var
;...
msgbox % FinalVar
RETURN
Current output:
-1 something other thing
2 something other thing
4 something other thing
What I want (for all lines of FinalVar, without needing a Loop):
something other thing
something other thing
something other thing
In Bash I could use something like sed.
Is there a fast way to do the same thing in AHK?
Thanks for your attention, and sorry for my English!

You can use a combination of the InStr() function:
InStr()
Searches for a given occurrence of a string, from the left or the right.
FoundPos := InStr(Haystack, Needle [, CaseSensitive := false, StartingPos := 1, Occurrence := 1])
and the SubStr() function:
SubStr()
Retrieves one or more characters from the specified position in a string.
NewStr := SubStr(String, StartingPos [, Length])
With InStr you find the position of the first space in var.
With SubStr you extract everything after that position to the end of the string like this:
StartingPos := InStr(var, " ")
var := SubStr(var, StartingPos + 1)
Note the + 1: extraction has to start one position after the space, otherwise the space would be the first character of the extracted text. Also note that this only looks at the first space in the whole of var, so on multi-line text it only affects the first line.
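For comparison, here is the same find-then-slice idea sketched in Python (used only for illustration; it is not part of the original answer). str.find plays the role of InStr and slicing plays the role of SubStr. One difference to keep in mind: Python indexes from 0 and find returns -1 when the needle is missing, whereas AutoHotkey's InStr is 1-based and returns 0. In both languages, adding 1 to "not found" happens to yield the whole string.

```python
def after_first_space(text: str) -> str:
    """Return everything after the first space (the whole string if there is none)."""
    pos = text.find(" ")   # -1 when there is no space, like InStr returning 0
    return text[pos + 1:]  # +1 skips the space itself


print(after_first_space("-1 something other thing"))  # something other thing
```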
To replace the leading text on every line at once, you can use RegExReplace():
RegExReplace()
Replaces occurrences of a pattern (regular expression) inside a string.
NewStr := RegExReplace(Haystack, NeedleRegEx [, Replacement := "", OutputVarCount := "", Limit := -1, StartingPosition := 1])
FinalVar := RegExReplace(var, "m`a)^(.*? )?(.*)$", "$2")
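m`a) are the RegEx options and ^(.*? )?(.*)$ is the actual search pattern. (.*? )? lazily matches everything up to and including the first space on a line (the group is optional, so a line without a space is kept whole), and (.*) captures the rest of the line, which the replacement "$2" keeps.
m: Multiline. Views Haystack as a collection of individual lines (if it contains newlines) rather than as a single continuous line.
`a: Recognizes any type of newline, namely `r, `n, `r`n, `v/VT/vertical tab/chr(0xB), `f/FF/formfeed/chr(0xC), and NEL/next-line/chr(0x85).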
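The same multiline replacement can be sketched in Python for comparison (an illustration, not AutoHotkey code). re.MULTILINE corresponds to the m) option; Python's re treats only `n as a line break for ^ and $, so there is no exact counterpart to the `a option.

```python
import re

var = (
    "-1 something other thing\n"
    "2 something other thing\n"
    "4 something other thing\n"
)

# ^ and $ match at every line because of re.MULTILINE; \2 keeps the text
# after the first space (or the whole line when it has no space).
final_var = re.sub(r"^(.*? )?(.*)$", r"\2", var, flags=re.MULTILINE)
print(final_var)
```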

Related

How to only replace the vowels of words that match the words in a given array with a "*"?

I need to create a Ruby method that accepts a string and an array; if any of the words in the string match words in the given array, all the vowels of the matched words in the string should be replaced with a "*". I have tried to do this using a regex and an if condition, but I don't know why it does not work. I'd really appreciate it if somebody could explain where I have gone wrong and how I can get this code right.
def censor(sentence, arr)
if arr.include? sentence.downcase
sentence.downcase.gsub(/[aeiou]/, "*")
end
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#expected_output = "G*sh, it's s* h*t"
arr.include? sentence.downcase reads, "If one of the elements of arr equals sentence.downcase ...", which is not what you want.
baddies = ["gosh", "it's", "hot", "shoot", "so"]
sentence = "Gosh, it's so very hot"
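r = /\b(?:#{baddies.join('|')})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "G*sh, *t's s* very h*t"
In the regular expression, \b is a word boundary and (?:#{baddies.join('|')}) requires a match of one of the baddies. The non-capturing group makes both \b anchors apply to every alternative; without it, the leading \b would belong only to "gosh" and the trailing \b only to "so". The word boundaries are there to avoid, for example, "so" matching "solo" or "possible". One could alternatively write:
/\b(?:#{Regexp.union(baddies).source})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
See Regexp::union and Regexp#source. source is needed because a Regexp interpolated as a whole carries its own flags (Regexp.union(baddies) interpolates as (?-mix:...)), so it would be unaffected by the case-insensitive modifier (i).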
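For readers coming from other languages, the same word-boundary alternation can be sketched in Python (an illustration only; the answer itself is Ruby). re.escape is an added safeguard, assumed useful in case a word contains regex metacharacters.

```python
import re


def censor(sentence, words):
    # (?:...) groups the alternatives so both \b anchors apply to every word;
    # re.escape keeps characters such as "." or "+" in a word literal.
    alternation = "|".join(map(re.escape, words))
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    # Replace the vowels only inside each matched word.
    return pattern.sub(
        lambda m: re.sub(r"[aeiou]", "*", m.group(0), flags=re.IGNORECASE),
        sentence,
    )


print(censor("Gosh, it's so very hot", ["gosh", "it's", "hot", "shoot", "so"]))
# G*sh, *t's s* very h*t
```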
Another approach is to split the sentence into words, manipulate each word, then rejoin all the pieces to form a new sentence. One difficulty with this approach concerns the character "'", which serves double duty as a single quote and an apostrophe. Consider
sentence = "She liked the song, 'don't box me in'"
baddies = ["don't"]
the approach I've given here yields the correct result:
r = /\b(?:#{baddies.join('|')})\b/i
#=> /\b(?:don't)\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "She liked the song, 'd*n't box me in'"
If we instead divide up the sentence into parts we might try the following:
sentence.split(/([\p{Punct}' ])/)
#=> ["She", " ", "liked", " ", "the", " ", "song", ",", "",
# " ", "", "'", "don", "'", "t", " ", "box", " ", "me", " ", "in", "'"]
As seen, the regex splits "don't" into "don", "'" and "t", which is not what we want. Clearly, distinguishing between single quotes and apostrophes is a non-trivial task. It is made more difficult by the fact that words can begin or end with apostrophes ("'twas") and that most nouns ending in "s" take a bare apostrophe in the possessive form ("Chris' car").
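Your code does not return any value (it returns nil) when the condition is false, and here it is always false: arr.include? sentence.downcase checks whether the whole sentence is an element of arr.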
One option is to split words by spaces and punctuation, manipulate, then rejoin:
def censor(sentence, arr)
words = sentence.scan(/[\w'-]+|[.,!?]+/) # this splits the sentence into an array of words and punctuation
res = []
words.each do |word|
word = word.gsub(/[aeiou]/, "*") if arr.include? word.downcase
res << word
end
res.join(' ') # note: this also adds spaces before punctuation
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#=> G*sh , it's s* h*t
Note that res.join(' ') also adds spaces before punctuation. I'm not so good with regexps, but this should fix it:
res.join(' ').gsub(/ [.,!?]/) { |punct| punct.strip }
#=> G*sh, it's s* h*t
This part words = sentence.scan(/[\w'-]+|[.,!?]+/) returns ["Gosh", ",", "it's", "so", "hot"]
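A variation on the scan approach, sketched in Python for illustration (not the answer's Ruby): instead of splitting and rejoining, substitute each word token in place with a callback. Because only the word tokens are visited, spaces and punctuation never move, so the "space before punctuation" fix-up is not needed. The token pattern is the one used by sentence.scan above.

```python
import re


def censor(sentence, words):
    targets = {w.lower() for w in words}

    def mask(match):
        token = match.group(0)
        if token.lower() in targets:
            return re.sub(r"[aeiou]", "*", token, flags=re.IGNORECASE)
        return token  # non-matching tokens are returned unchanged

    # Visit word tokens only; everything between them is left as-is.
    return re.sub(r"[\w'-]+", mask, sentence)


print(censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"]))  # G*sh, it's s* h*t
```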

VBA: Regular expression to count the number of words in a string delimited by special characters

I need some help writing a regular expression to count the number of words in a string (note that the data is an HTML string that needs to be placed into a spreadsheet) when the words are separated by special characters such as ., -, +, /, Tab, etc. The count should exclude the special characters.
Examples (original string -> end result):
One -> 1
One. -> 1
One Two -> 2
One.Two -> 2
One Two. -> 2
One.Two. -> 2
One.Tw.o -> 3
Updated
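Function WCount(ByVal strWrd As String) As Long
    'Variable declaration
    Dim Delimiters() As Variant
    Dim Delimiter As Variant
    'Initialization: define your delimiter characters here.
    'Chr(13) = CR, Chr(10) = LF (the line break Excel uses inside a cell), Chr(9) = Tab
    Delimiters = Array("+", "-", ".", "/", Chr(13), Chr(10), Chr(9))
    'Core: turn every delimiter into a space
    For Each Delimiter In Delimiters
        strWrd = Replace(strWrd, Delimiter, " ")
    Next Delimiter
    strWrd = Trim(strWrd)
    'Collapse runs of spaces so that Split sees exactly one space between words
    Do While InStr(1, strWrd, "  ") > 0
        strWrd = Replace(strWrd, "  ", " ")
    Loop
    'Split("") returns an empty array (UBound = -1), so an empty string counts as 0
    WCount = UBound(Split(strWrd, " ")) + 1
End Function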
You can use this function as a UDF in Excel formulas or call it from other VBA code.
Using in formula
=WCOUNT("One.Two.Three.") or =WCOUNT($A$1), assuming your string is in cell A1.
Using in VBA
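(Assuming your string is passed in a String variable, here called s; Str itself is a built-in VBA function, so it should not be used as the argument.)
Sub test()
    Dim s As String
    s = "One.Tw.o"
    Debug.Print WCount(s)   ' prints 3
End Sub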
Update
I have tested your text by copying it into an Excel cell.
The code has been updated for line-break and Tab characters and now counts the words in your string correctly.
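The logic of WCount (replace each delimiter with a space, trim, collapse repeated spaces, split) can be sketched in Python to check it against the examples in the question. This is an illustration, not VBA; the delimiter set mirrors the Array(...) above.

```python
def wcount(text, delimiters="+-./\r\n\t"):
    """Count words after turning every delimiter character into a space."""
    for d in delimiters:
        text = text.replace(d, " ")
    # str.split() with no argument splits on runs of whitespace and drops
    # empty pieces, which covers both Trim and the double-space loop.
    return len(text.split())


for s in ["One", "One.", "One Two", "One.Two", "One Two.", "One.Two.", "One.Tw.o"]:
    print(s, "->", wcount(s))
```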
Try this code; all the necessary comments are in the code:
Sub SpecialSplit()
    Dim i As Long
    Dim str As String
    Dim arr() As String
    Dim delimiters As Variant
    Dim delimiter As Variant
    'here you define all special delimiters you want to use
    delimiters = Array(".", "+", "-", "/")
    For i = 1 To 9
        str = Cells(i, 1).Value
        'here we replace all special delimiters with a space to simplify
        For Each delimiter In delimiters
            str = Replace(str, delimiter, " ")
        Next
        'Trim protects us from a leading or trailing delimiter producing an additional empty string
        str = Trim(str)
        arr = Split(str)
        Cells(i, 2).Value = UBound(arr) - LBound(arr) + 1
    Next
End Sub
With your posted data, the following RegExp works correctly. Put it in a standard (general) module in the Visual Basic Editor.
Public Function CountWords(strInput As String) As Long
Dim objMatches
With CreateObject("VBScript.RegExp")
.Global = True
.MultiLine = True
.Pattern = "\w+"
Set objMatches = .Execute(strInput)
CountWords = objMatches.Count
End With
End Function
Use it like a normal worksheet formula; e.g., assuming the data is in cell A1, the formula would be:
=CountWords(A1)
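The same \w+ counting idea, sketched in Python against the examples from the question (an illustration, not VBA). One assumption worth checking: Python 3's \w is Unicode-aware, while VBScript's RegExp \w may only cover ASCII letters, digits and underscore, so results can differ for accented text.

```python
import re


def count_words(text):
    # Each maximal run of word characters counts as one word;
    # delimiters such as . - + / and Tab are never part of a match.
    return len(re.findall(r"\w+", text))


print(count_words("One.Tw.o"))  # 3
```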
For your information, this can also be achieved with a plain formula if the set of delimiter characters is fixed, like so:
=LEN(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," ")))-LEN(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A1),"."," "),","," "),"-"," "),"+"," "),"/"," "),"\"," "))," ",""))+1
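The formula works by turning every delimiter into a space, trimming, and then counting spaces: LEN(text) minus LEN(text without spaces) is the number of single spaces left between words, and adding 1 gives the word count. A Python sketch of that arithmetic (an illustration only; the regex stands in for Excel's TRIM, which removes leading and trailing spaces and collapses inner runs of spaces to one):

```python
import re


def formula_count(text):
    # SUBSTITUTE(...): each listed delimiter becomes a space
    for d in ".,-+/\\":
        text = text.replace(d, " ")
    # TRIM(...): drop leading/trailing spaces, collapse inner runs to one space
    text = re.sub(" +", " ", text).strip(" ")
    # LEN(t) - LEN(SUBSTITUTE(t, " ", "")) + 1
    return len(text) - len(text.replace(" ", "")) + 1


print(formula_count("One.Tw.o"))  # 3
```

Like the formula, this returns 1 for an empty cell, because there are zero spaces plus one.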

VB.NET - Regex.Replace error with [ character

I want to remove some characters from a textbox. It works, but when I try to replace the "[" character it gives an error. Why?
Return Regex.Replace(html, "[", "").Replace(",", " ").Replace("]", "").Replace(Chr(34), " ")
When I delete the "[", "").Replace( part, it works fine:
Return Regex.Replace(html, ",", " ").Replace("]", "").Replace(Chr(34), " ")
The problem is that the [ character has a special meaning in regex (it opens a character class), so it must be escaped to be matched literally. To escape it, add a \ before the character. (VB string literals do not treat \ specially, so "\[" reaches the regex engine as \[.)
So the corrected code is:
Return Regex.Replace(html, "\[", "").Replace(",", " ").Replace("]", "").Replace(Chr(34), " ")
This happens because [ is a reserved character in regex patterns. When you want to match literal text, escape the search pattern with Regex.Escape(), which finds all reserved characters and escapes them with a backslash.
Dim searchPattern = Regex.Escape("[")
Return Regex.Replace(html, searchPattern, "") ' ...then chain the other .Replace() calls as before
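The same pitfall and fix can be reproduced in Python (an illustration; re.escape is Python's counterpart to Regex.Escape). An unescaped "[" is rejected as an unterminated character set, while the escaped pattern matches the bracket literally. The sample html value is made up for the example.

```python
import re

html = '["a","b"]'  # hypothetical sample input

try:
    re.compile("[")  # unescaped: the regex engine sees an unterminated character set
except re.error as exc:
    print("error:", exc)

pattern = re.escape("[")          # "\\[": backslash followed by the bracket
print(re.sub(pattern, "", html))  # "a","b"]
```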
But why do you need to use regex anyway? Here's a better way of doing it, I think, using StringBuilder:
Dim sb = New StringBuilder(html) _
.Replace("[", "") _
.Replace(",", " ") _
.Replace("]", "") _
.Replace(Chr(34), " ")
Return sb.ToString()
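The no-regex idea carries over to other languages too. A Python sketch (an illustration, not VB.NET) uses one translation table instead of four chained replacements: "[" and "]" are deleted, while "," and the double quote (Chr(34)) become spaces. Because none of these mappings produces a character that another mapping touches, the single pass gives the same result as the chained calls.

```python
# One table: None deletes a character, a string replaces it.
TABLE = str.maketrans({"[": None, "]": None, ",": " ", '"': " "})


def clean(html):
    return html.translate(TABLE)


print(repr(clean('["a","b"]')))  # ' a   b '
```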

Largest "separation" of patterns for Delphi regex?

Update
As Graymatter has observed, regex fails to match when there are at least 2 extra line breaks before the second target. That is to say, changing the concatenation loop to "for I := 0 to 1" will make the regex-match fail.
As shown in the code below, without the concatenation, the program can get the two values using regex. However, with the concatenation, the program cannot get the two values.
Could you explain the reason and suggest a workaround?
program Project1;
{$APPTYPE CONSOLE}
uses
// www.regular-expressions.info/delphi.html
// http://www.regular-expressions.info/download/TPerlRegEx.zip
PerlRegEx,
SysUtils;
procedure Test;
var
Content: UTF8String;
Regex: TPerlRegEx;
GroupIndex: Integer;
I: Integer;
begin
Regex := TPerlRegEx.Create;
Regex.Regex := 'Value1 =\s*(?P<Value1>\d+)\s*.*\s*Value2 =\s*(?P<Value2>\d*\.\d*)';
Content := '';
for I := 0 to 10000000 do
begin
// Uncomment here to see effect
// Content := Content + 'junkjunkjunkjunkjunk' + sLineBreak;
end;
Regex.Subject := 'junkjunkjunkjunkjunk' +
sLineBreak + ' Value1 = 1' +
sLineBreak + 'junkjunkjunkjunkjunk' + Content +
sLineBreak + ' Value2 = 1.23456789' +
sLineBreak + 'junkjunkjunkjunkjunk';
if Regex.Match then
begin
GroupIndex := Regex.NamedGroup('Value1');
Writeln(Regex.Groups[GroupIndex]);
GroupIndex := Regex.NamedGroup('Value2');
Writeln(Regex.Groups[GroupIndex]);
end
else
begin
Writeln('No match');
end;
Regex.Free;
end;
begin
Test;
Readln;
end.
Adding this line works.
Regex.Options := [preSingleLine];
From the documentation:
preSingleLine
Normally, dot (.) matches anything but a newline (\n). With preSingleLine, dot (.) will match anything, including newlines. This allows a multiline string to be regarded as a single entity. Equivalent to Perl's /s modifier. Note that preMultiLine and preSingleLine can be used together.
When there is only one line break before the second target, the regex can match even without preSingleLine, because \s can match a line break.
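The effect can be reproduced outside Delphi. The Python sketch below (an illustration; it assumes re.DOTALL as the counterpart of preSingleLine, since both correspond to Perl's /s modifier) uses the question's pattern unchanged and "\n" in place of sLineBreak. With one extra line, \s* absorbs the line breaks and the match succeeds; with two, the plain dot cannot cross the second line and the match fails unless DOTALL is set.

```python
import re

PATTERN = r"Value1 =\s*(?P<Value1>\d+)\s*.*\s*Value2 =\s*(?P<Value2>\d*\.\d*)"
JUNK = "junkjunkjunkjunkjunk"


def build_subject(extra_lines):
    # Mirrors the question's Subject; Content is appended directly after JUNK.
    content = (JUNK + "\n") * extra_lines
    return JUNK + "\n Value1 = 1\n" + JUNK + content + "\n Value2 = 1.23456789\n" + JUNK


def extract(subject, dotall):
    m = re.search(PATTERN, subject, re.DOTALL if dotall else 0)
    return (m.group("Value1"), m.group("Value2")) if m else None


print(extract(build_subject(1), dotall=False))  # ('1', '1.23456789')
print(extract(build_subject(2), dotall=False))  # None
print(extract(build_subject(2), dotall=True))   # ('1', '1.23456789')
```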

Why is this regexp slow when the input line is long and has many spaces?

VBScript's Trim function only trims spaces. Sometimes I want to trim TABs as well. For this I've been using this custom trimSpTab function that is based on a regular expression.
Today I ran into a performance problem. The input consisted of rather long lines (several thousand characters).
As it turns out:
- the function is slow only if the string is long AND contains many spaces
- the right-hand part of the regular expression is responsible for the poor performance
- the run time seems quadratic in the line length (O(n^2))
So why is this line trimmed fast
" aaa xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx bbb " '10000 x's
and this one trimmed slowly
" aaa bbb " '10000 spaces
Both contain only 6 characters to be trimmed.
Can you propose a modification to my trimSpTab function?
Dim regex
Set regex = new regexp
' TEST 1 - executes in no time
' " aaa XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX bbb "
t1 = Timer
character = "X"
trimTest character
MsgBox Timer-t1 & " sec",, "with '" & character & "' in the center of the string"
' TEST 2 - executes in 1 second on my machine
' " aaa bbb "
t1 = Timer
character = " "
trimTest character
MsgBox Timer-t1 & " sec",, "with '" & character & "' in the center of the string"
Sub trimTest (character)
sInput = " aaa " & String (10000, character) & " bbb "
trimmed = trimSpTab (sInput)
End Sub
Function trimSpTab (byval s)
'trims spaces & tabs
regex.Global = True
regex.Pattern = "^[ \t]+|[ \t]+$" 'trim left+right
trimSpTab = regex.Replace (s, "")
End Function
I have tried this (with regex.Global = false) but to no avail
regex.Pattern = "^[ \t]+" 'trim left
s = regex.Replace (s, "")
regex.Pattern = "[ \t]+$" 'trim right
trimSpTab = regex.Replace (s, "")
UPDATE
In the meantime I've come up with this alternative. It processes a 100-million-character string in less than a second.
Function trimSpTab (byval s)
'trims spaces & tabs
regex.Pattern = "^[ \t]+"
s = strReverse (s)
s = regex.Replace (s, "")
s = strReverse (s)
s = regex.Replace (s, "")
trimSpTab = s
End Function
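The reverse trick can be sketched in Python for comparison (an illustration, not VBScript). Reversing turns the trailing blanks into leading ones, so only the anchored pattern ^[ \t]+ is ever needed and the costly unanchored [ \t]+$ search is avoided. In plain Python, s.strip(" \t") does the same job directly.

```python
import re

LEADING_BLANKS = re.compile(r"^[ \t]+")  # anchored at the start of the string only


def trim_sp_tab(s):
    # Strip the trailing blanks by reversing, stripping the now-leading
    # blanks, and reversing back; then strip the real leading blanks.
    s = LEADING_BLANKS.sub("", s[::-1], count=1)[::-1]
    return LEADING_BLANKS.sub("", s, count=1)


print(repr(trim_sp_tab(" \t aaa   bbb \t ")))  # 'aaa   bbb'
```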
Solution
As mentioned in the question, your current solution is to reverse the string. In .NET this is not necessary, because .NET regex supports the RightToLeft matching option: for the same regex, the engine starts matching from right to left instead of the default left to right. (VBScript's own RegExp object has no such option, so this only helps if you can use the .NET classes.)
Below is sample code in C#, which I hope you can adapt to a VB solution (I don't know VB well enough to write sample code):
input = new Regex("^[ \t]+").Replace(input, "", 1);
input = new Regex("[ \t]+$", RegexOptions.RightToLeft).Replace(input, "", 1);
Explanation
The long run time is due to the engine trying to match [ \t]+ indiscriminately in the middle of the string and failing whenever the blank sequence is not at the end.
The observation that the complexity is quadratic is correct.
We know that the regex engine starts matching from index 0. If there is a match, then the next attempt starts at the end of the last match. Otherwise, the next attempt starts at the (current index + 1). (Well, to simplify things, I don't mention the case where a zero-length match is found).
The diagram below illustrates how far each attempt gets (some attempts match, most do not) when the engine matches the regex ^[ \t]+|[ \t]+$. _ denotes a space (or tab character) for clarity; ^ marks where an attempt starts and the dashes show how far it extends.
_____ab_______________g________
^----
     ^
      ^
       ^--------------
        ^-------------
         ^------------
...
                     ^
                      ^
                       ^-------
When there is a long sequence of spaces and tabs in the middle of the string (which will not produce a match), the engine attempts a match at every index in that sequence. As a result, the engine ends up going through O(k^2) characters on a non-matching sequence of spaces and tabs of length k.
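The O(k^2) figure can be checked with a deliberately simplified cost model, sketched in Python. This is an assumption for illustration, not how VBScript's engine is actually implemented: it counts only the characters that the greedy [ \t]+ consumes from each start position and ignores backtracking. For a middle run of k blanks the starts inside the run consume k, k-1, ..., 1 characters, i.e. k(k+1)/2 in total.

```python
def naive_trailing_cost(s, blanks=" \t"):
    """Characters consumed by greedy [ \\t]+ across all start positions
    (simplified model of an engine that does not optimize the $ anchor)."""
    cost = 0
    for start in range(len(s)):
        i = start
        while i < len(s) and s[i] in blanks:  # greedy [ \t]+
            i += 1
        cost += i - start
    return cost


for k in (10, 100, 1000):
    print(k, naive_trailing_cost("a" + " " * k + "b"))  # k*(k+1)/2
```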
Your evidence proves that VBScript's RegExp implementation does not optimize for the $ anchor: it spends time (backtracking?) on each of the spaces in the middle of your test string. That is definitely good to know.
If this causes you real-world problems, you'll have to find or write a better (R)Trim function. I came up with:
Function trimString(s, p)
Dim l : l = Len(s)
If 0 = l Then
trimString = s
Exit Function
End If
Dim ps, pe
For ps = 1 To l
If 0 = Instr(p, Mid(s, ps, 1)) Then
Exit For
End If
Next
For pe = l To ps Step -1
If 0 = Instr(p, Mid(s, pe, 1)) Then
Exit For
End If
Next
trimString = Mid(s, ps, pe - ps + 1)
End Function
It surely needs testing and benchmarks for long heads or tails of white space, but I hope it gets you started.
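As a starting point for such tests, here is a Python port of trimString's two scans (an illustration, not VBScript; 0-based indices replace Mid's 1-based ones). Each character is examined at most once from each end, so the work is linear in the string length. For checking results, Python's built-in s.strip(p) computes the same thing.

```python
def trim_string(s, p):
    """Strip characters contained in p from both ends of s."""
    ps = 0
    while ps < len(s) and s[ps] in p:  # first character not in p
        ps += 1
    pe = len(s) - 1
    while pe >= ps and s[pe] in p:     # last character not in p
        pe -= 1
    return s[ps:pe + 1]                # empty when s consists only of p-characters


print(repr(trim_string(" \t aaa bbb \t ", " \t")))  # 'aaa bbb'
```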