How to remove an email address from a string? - regex

So, I have a string and I want to remove the email address from it, if there is one.
For example:
This is some text and it continues like this
until sometimes an email
address shows up asd#asd.com
also some more text here and here.
I want this as the result:
This is some text and it continues like this
until sometimes an email
address shows up [email_removed]
also some more text here and here.
cleanFromEmail(string)
{
    newWordString =
    space := a_space
    Needle = #
    wordArray := StrSplit(string, [" ", "`n"])
    Loop % wordArray.MaxIndex()
    {
        thisWord := wordArray[A_Index]
        IfInString, thisWord, %Needle%
        {
            newWordString = %newWordString%%space%(email_removed)%space%
        }
        else
        {
            newWordString = %newWordString%%space%%thisWord%%space%
            ;msgbox asd
        }
    }
    return newWordString
}
The problem with this is that I end up losing all the line breaks and get only spaces. How can I rebuild the string so that it looks just like it did before, minus the email address?

That looks rather complicated; why not use RegExReplace instead?
string =
(
This is some text and it continues like this
until sometimes an email address shows up asd#asd.com
also some more text here and here.
)
newWordString := RegExReplace(string, "\S+#\S+(?:\.\S+)+", "[email_removed]")
MsgBox, % newWordString
Feel free to make the pattern as simple or as complicated as you want, depending on your needs, but RegExReplace should do it.
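For comparison, here is a minimal Go sketch of the same idea using the standard `regexp` package (RE2 syntax, which supports the `(?:...)` group). It keeps `#` as the separator only because that is what the example text uses; a real address would need `@`. Because only the matched text is replaced, line breaks survive, which is exactly what the word-splitting approach in the question loses. The names `emailRe` and `removeEmails` are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailRe mirrors the RegExReplace pattern above, with "#" as the separator
// because the example text uses it (swap in "@" for real addresses).
var emailRe = regexp.MustCompile(`\S+#\S+(?:\.\S+)+`)

// removeEmails replaces every match with a placeholder. Text outside the
// matches, including newlines, is copied through unchanged.
func removeEmails(s string) string {
	return emailRe.ReplaceAllLiteralString(s, "[email_removed]")
}

func main() {
	text := "This is some text and it continues like this\n" +
		"until sometimes an email address shows up asd#asd.com\n" +
		"also some more text here and here."
	fmt.Println(removeEmails(text))
}
```

Note that `\S+` is greedy, so punctuation glued to the end of an address (for example a sentence-final `.`) is removed along with it.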

If for some reason RegExReplace doesn't always work for you, you can try this:
text =
(
This is some text and it continues like this
until sometimes an email address shows up asd#asd.com.
also some more text here and here.
)
MsgBox, % cleanFromEmail(text)

cleanFromEmail(string){
    newString := ""
    lineArray := StrSplit(string, "`n")
    Loop % lineArray.MaxIndex()
    {
        newLine := ""
        newWord := ""
        thisLine := lineArray[A_Index]
        If InStr(thisLine, "#")
        {
            wordArray := StrSplit(thisLine, " ")
            Loop % wordArray.MaxIndex()
            {
                thisWord := wordArray[A_Index]
                If InStr(thisWord, "#")
                {
                    ; keep one trailing punctuation mark (the leading ",," in the list is a literal comma)
                    end := SubStr(thisWord, 0)
                    If end in ,,,.,;,?,!
                        newWord := "[email_removed]" end
                    else
                        newWord := "[email_removed]"
                }
                else
                    newWord := thisWord
                newLine .= newWord . " " ; concatenate the outputs by adding a space to each one
            }
            newLine := Trim(newLine) ; remove the last space from this variable
        }
        else
            newLine := thisLine
        newString .= newLine . "`n"
    }
    newString := Trim(newString)
    return newString
}
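The same line-by-line strategy can be sketched in Go. This is a translation of the idea, not of the AHK code line for line: split on `"\n"` so line breaks survive, split each line on single spaces, and replace any word containing `#` while keeping one trailing punctuation mark from the same set the AHK list uses (`, . ; ? !`). Unlike the AHK version, it does not trim the lines.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanFromEmail splits the text into lines (so the line breaks survive),
// splits each line on single spaces, and replaces any word containing "#"
// with a placeholder, keeping one trailing punctuation mark.
func cleanFromEmail(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		if !strings.Contains(line, "#") {
			continue // lines without "#" are kept byte for byte
		}
		words := strings.Split(line, " ")
		for j, w := range words {
			if !strings.Contains(w, "#") {
				continue
			}
			replacement := "[email_removed]"
			// w contains "#", so it is non-empty and has a last byte.
			if last := w[len(w)-1:]; strings.Contains(",.;?!", last) {
				replacement += last
			}
			words[j] = replacement
		}
		lines[i] = strings.Join(words, " ")
	}
	return strings.Join(lines, "\n")
}

func main() {
	text := "This is some text and it continues like this\n" +
		"until sometimes an email address shows up asd#asd.com.\n" +
		"also some more text here and here."
	fmt.Println(cleanFromEmail(text))
}
```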

Related

Include string into another string using regex

I have a set of strings. Strings might have items listed between square brackets. For strings with brackets, I'd like to add a constant number of extra items inside them. Brackets might be empty or absent. For example:
string1 --> string1 # added nothing
string2[] --> string2[extra1="1",extra2="2"] # added two items
string3[item="1"] --> string3[item="1",extra1="1",extra2="2"] # added two items
Currently I achieve this with the following code (Golang):
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str1 := "test"
	str2 := `test[]`
	str3 := `test[item1="1"]`
	re := regexp.MustCompile(`\[(.+)?\]`)
	for _, s := range []string{str1, str2, str3} {
		s = re.ReplaceAllString(s, `[item1="a",item2="b",$1]`)
		fmt.Println(s)
	}
}
But in the output, when the brackets are empty, I also get an unwanted comma "," at the end:
test
test[item1="a",item2="b",]
test[item1="a",item2="b",item1="1"]
Is it possible to avoid inserting the comma when the brackets are empty?
Of course it's possible to parse the string again and trim the comma, but that seems suboptimal.
You can have two regexes: one that matches empty [] and another that matches [] with text inside. Below is the tested code:
https://play.golang.org/p/_DOOGDMUOCm
A second way is to look back at the string after replacing: if the last two characters are ,], take the substring up to the , and append ]. I guess you already know this approach.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	str1 := "test"
	str2 := `test[]`
	str3 := `test[item1="1"]`
	re := regexp.MustCompile(`\[(.*)\]`)       // captures the bracket contents, possibly empty
	nonEmpty := regexp.MustCompile(`\[(.+)\]`) // matches only brackets with something inside
	for _, s := range []string{str1, str2, str3} {
		if nonEmpty.MatchString(s) {
			s = re.ReplaceAllString(s, `[item1="a",item2="b",$1]`)
		} else {
			s = re.ReplaceAllString(s, `[item1="a",item2="b"]`)
		}
		fmt.Println(s)
	}
}
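A single-pass alternative, offered as a sketch rather than as either answer's method, is `regexp.ReplaceAllStringFunc`: the callback sees each bracket match and decides whether a comma is needed, so no second regex or post-trimming is required. Like the original `\[(.*)\]` pattern, it assumes at most one bracket group per string (the greedy `.*` would merge several). `addExtras` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// bracketRe matches a [...] group, including an empty one.
var bracketRe = regexp.MustCompile(`\[(.*)\]`)

// addExtras puts extras at the start of the bracket contents and adds the
// separating comma only when the brackets already contain something.
func addExtras(s, extras string) string {
	return bracketRe.ReplaceAllStringFunc(s, func(m string) string {
		inner := m[1 : len(m)-1] // strip the surrounding [ and ]
		if inner == "" {
			return "[" + extras + "]"
		}
		return "[" + extras + "," + inner + "]"
	})
}

func main() {
	extras := `item1="a",item2="b"`
	for _, s := range []string{"test", `test[]`, `test[item1="1"]`} {
		fmt.Println(addExtras(s, extras))
	}
}
```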

ahk - Get text after character (space)

I'm new to AutoHotkey. I'm trying to remove all the text up to the first space on each line, keeping everything after it.
Example:
txt1=something
txt2=other thing
var.="-1" " " txt1 " " txt2 "`n"
var.="2" " " txt1 " " txt2 "`n"
var.="4" " " txt1 " " txt2 "`n"
;; more add ...
FinalVar:=var
;...
msgbox % FinalVar
RETURN
Current output:
-1 something other thing
2 something other thing
4 something other thing
What I want (for all lines of FinalVar, without needing a Loop):
something other thing
something other thing
something other thing
In Bash I could use something like sed.
Is there a fast way to do the same thing in AHK?
Thanks for your attention, and sorry for my English!
You can use a combination of the InStr function
InStr()
Searches for a given occurrence of a string, from the left or the right.
FoundPos := InStr(Haystack, Needle , CaseSensitive := false, StartingPos := 1, Occurrence := 1)
and the SubStr function.
SubStr()
Retrieves one or more characters from the specified position in a string.
NewStr := SubStr(String, StartingPos , Length)
With InStr you find the position of the first space in var.
With SubStr you extract everything after that position to the end of the string like this:
StartingPos := InStr(var, " ")
var := SubStr(var, StartingPos + 1)
Note the + 1: it is there because you need to start extracting the text one position after the space; otherwise the space would be the first character of the extracted text.
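For readers coming from Go, the InStr + SubStr combination corresponds roughly to `strings.Index` plus slicing. This sketch (the helper name `afterFirstSpace` is illustrative) keeps the same behavior when there is no space: AHK's InStr returns 0, so SubStr(var, 1) returns the whole string. Like the AHK snippet, it only looks at the first space in the whole string, so on a multi-line value it trims only the first line.

```go
package main

import (
	"fmt"
	"strings"
)

// afterFirstSpace returns everything after the first space. strings.Index is
// 0-based and returns -1 when not found (AHK's InStr is 1-based and returns 0).
func afterFirstSpace(s string) string {
	i := strings.Index(s, " ")
	if i < 0 {
		return s // no space: keep the whole string, as SubStr(var, 0 + 1) does
	}
	return s[i+1:] // +1 skips the space itself, like StartingPos + 1
}

func main() {
	fmt.Println(afterFirstSpace("-1 something other thing"))
}
```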
To replace the leading text in all lines, you can use RegExReplace:
RegExReplace()
Replaces occurrences of a pattern (regular expression) inside a string.
NewStr := RegExReplace(Haystack, NeedleRegEx , Replacement := "", OutputVarCount := "", Limit := -1, StartingPosition := 1)
FinalVar := RegExReplace(var, "m`a)^(.*? )?(.*)$", "$2")
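m`a) are the RegEx options and ^(.*? )?(.*)$ is the actual search pattern: (.*? )? lazily matches everything up to and including the first space on a line (it is optional, so lines without a space are left as they are), and (.*) captures the rest of the line, which the replacement $2 puts back.
m Multiline. Views Haystack as a collection of individual lines (if it contains newlines) rather than as a single continuous line.
`a: `a recognizes any type of newline, namely `r, `n, `r`n, `v/VT/vertical tab/chr(0xB), `f/FF/formfeed/chr(0xC), and NEL/next-line/chr(0x85).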
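A rough Go counterpart of this multiline replacement, as a sketch: Go's `regexp` supports the `(?m)` flag, but it only treats `\n` as a line break and has no equivalent of the `` `a `` option. Instead of capturing the rest of the line and putting it back with `$2`, this version deletes the leading field and its space directly; lines without a space do not match and stay unchanged. `stripLeadingField` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// (?m) makes ^ match at the start of every line; [^ \n]* stops at the first
// space without running into the next line.
var leadingFieldRe = regexp.MustCompile(`(?m)^[^ \n]* `)

// stripLeadingField removes everything up to and including the first space
// on each line.
func stripLeadingField(s string) string {
	return leadingFieldRe.ReplaceAllLiteralString(s, "")
}

func main() {
	v := "-1 something other thing\n2 something other thing\n4 something other thing\n"
	fmt.Print(stripLeadingField(v))
}
```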

Remove all articles and other strings from a string using Go?

Is there a method in Go, or a regular expression, that will remove only the articles used in a string?
I have tried the code below, which removes them, but it also removes parts of other words from the string:
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	removalString := "This is a string"
	stringToRemove := []string{"a", "an", "the", "is"}
	for _, wordToRemove := range stringToRemove {
		removalString = strings.Replace(removalString, wordToRemove, "", -1)
	}
	space := regexp.MustCompile(`\s+`)
	trimedExtraSpaces := space.ReplaceAllString(removalString, " ")
	spacesCovertedtoDashes := strings.Replace(trimedExtraSpaces, " ", "-", -1)
	slug := strings.ToLower(spacesCovertedtoDashes)
	fmt.Println(slug)
}
In this case it also removes the "is" inside "This".
The expected output is this-string.
You can use strings.Split and strings.Join plus a loop for filtering and then building it together again:
package main

import (
	"fmt"
	"strings"
)

func main() {
	removalString := "This is a string"
	stringToRemove := []string{"a", "an", "the", "is"}
	filteredStrings := make([]string, 0)
	for _, w := range strings.Split(removalString, " ") {
		shouldAppend := true
		lowered := strings.ToLower(w)
		// Drop the word if it exactly matches one of the words to remove.
		for _, w2 := range stringToRemove {
			if lowered == w2 {
				shouldAppend = false
				break
			}
		}
		if shouldAppend {
			filteredStrings = append(filteredStrings, lowered)
		}
	}
	resultString := strings.Join(filteredStrings, "-")
	fmt.Println(resultString)
}
Output:
this-string
My version, using only regexp:
Construct a regexp of the form \ba\b|\ban\b|\bthe\b|\bis\b, which finds the words in the list that have "word boundaries" on both sides, so the "is" in "This" is not matched.
A second regexp turns each run of spaces into a single dash.
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	removalString := "This is a strange string"
	stringToRemove := []string{"a", "an", "the", "is"}
	// Wrap each word in \b word boundaries and join them with the alternation
	// operator; joining avoids a trailing "|" (an empty alternative).
	parts := make([]string, len(stringToRemove))
	for i, x := range stringToRemove {
		parts[i] = `\b` + regexp.QuoteMeta(x) + `\b`
	}
	regx := regexp.MustCompile(strings.Join(parts, "|"))
	slug := regx.ReplaceAllString(removalString, "")
	// Collapse each run of spaces into a single dash.
	regx2 := regexp.MustCompile(` +`)
	slug = regx2.ReplaceAllString(slug, "-")
	fmt.Println(slug)
}
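Neither snippet above lowercases before matching, so a capitalized "The" or "Is" at the start of a sentence would survive, and the regexp version keeps "This" capitalized. A possible variant, sketched under the assumption that the slug should be all lowercase as in the question's expected output, adds the `(?i)` flag and uses `strings.Fields` to split on whitespace runs (which also drops leading and trailing blanks). `slugify` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// slugify removes whole-word stop words (ignoring case), joins the remaining
// words with dashes, and lowercases the result.
func slugify(s string, stopWords []string) string {
	quoted := make([]string, len(stopWords))
	for i, w := range stopWords {
		quoted[i] = regexp.QuoteMeta(w) // treat each word literally
	}
	// (?i): case-insensitive, so "The" is removed too.
	// \b on both sides keeps the "is" inside "This" from matching.
	stopRe := regexp.MustCompile(`(?i)\b(?:` + strings.Join(quoted, "|") + `)\b`)
	out := stopRe.ReplaceAllString(s, "")
	// strings.Fields splits on any run of whitespace.
	return strings.ToLower(strings.Join(strings.Fields(out), "-"))
}

func main() {
	stop := []string{"a", "an", "the", "is"}
	fmt.Println(slugify("This is a string", stop))
}
```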

What is wrong with this StringReplace code?

I have the string XXX:ABC. I want to remove XXX: so that the string becomes ABC.
The variable Symbol contains the string XXX:ABC.
The code is as follows:
MsgBox, Symbol %Symbol%
SearchText := "XXX:"
ReplaceText := ""
StringReplace, newSymbol, Symbol, SearchText, ReplaceText, ALL
MsgBox, newSymbol %newSymbol%
From the message box output, the content of newSymbol is the same as Symbol. Can someone tell me what is wrong with my code?
I am using Autohotkey v1.1.14.03.
For command parameters, you have to distinguish between variable parameters and value parameters.
StringReplace, for instance, has the following argument list:
StringReplace, OutputVar, InputVar, SearchText [, ReplaceText, ReplaceAll?]
The docs further say:
OutputVar: The name of the variable in which to store the result of the replacement process.
InputVar: The name of the variable whose contents will be read from.
SearchText: The string to search for.
As you can see, some parameters are expected to be variable names, whereas others are expected to be values like strings or numbers. You can use variable contents as value parameters by either enclosing them in percent signs or using them within an expression:
StringReplace, newSymbol, Symbol, %SearchText%, %ReplaceText%, ALL
; or as an expression
StringReplace, newSymbol, Symbol, % SearchText, % ReplaceText, ALL
With the newer StrReplace() function, I noticed that for whatever reason I could not use it with variables, and the documentation here: https://autohotkey.com/docs/commands/StringReplace.htm
lacks an example. After a lot of tests, I couldn't figure it out.
So I wrote a "polyfill" for StrReplace, complete with test code.
; Author: John Mark Isaac Madison
; EMAIL : J4M4I5M7#hotmail.com
; I_SRC : input source text
; I_OLD : old token to find
; I_NEW : new token to replace old with
FN_POLYFILL_STR_REPLACE(I_SRC, I_OLD, I_NEW)
{
    ;Check length of input parameters:
    ;--------------------------------------------;
    L1 := StrLen(I_SRC)
    L2 := StrLen(I_OLD)
    L3 := StrLen(I_NEW)
    if( !(L1 > 0) )
    {
        msgbox BAD_PARAM_#1:STR_REP
    }
    if( !(L2 > 0) )
    {
        msgbox BAD_PARAM_#2:STR_REP
    }
    if( !(L3 > 0) )
    {
        msgbox BAD_PARAM_#3:STR_REP
    }
    ;--------------------------------------------;
    OP := "" ;output string
    f_ptr := 0 ;fill pointer
    max_i := StrLen(I_SRC)
    dx := 0 ;<--Loop counter / index
    LOOP ;;[LOOP_START];;
    {
        dx++
        if(dx > max_i)
        {
            break ;[BAIL_OUT_OF_LOOP]
        }
        h := FN_IS_TOKEN_HERE(I_SRC, I_OLD, dx)
        ;if(8==dx)
        ;{
        ;    msgbox, HACK_8 dx[%dx%] h[%h%] I_SRC[%I_SRC%] I_OLD[%I_OLD%]
        ;    src_len := StrLen( I_SRC )
        ;    old_len := StrLen( I_OLD )
        ;    msgbox src_len [%src_len%] old_len[%old_len%] I_OLD[%I_OLD%]
        ;}
        if( h > 0)
        {
            ;token found, replace it by concatenating
            ;the I_NEW onto output string:
            OP := OP . I_NEW
            ;OP := OP . "[X]"
            ;msgbox : I_SRC[%I_SRC%] h[%h%] dx[%dx%]
            ;jump pointer to last character of
            ;the found token to skip over
            ;now irrelevant characters:
            dx := h
            ;msgbox, DX: %dx%
        }
        else
        if( 0 == h)
        {
            msgbox, "H_SHOULD_NOT_BE_ZERO"
        }
        else
        if( h < 0 )
        {
            ;concat character to output:
            c := SubStr(I_SRC,dx,1)
            OP := OP . c
        }
    } ;;[LOOP_END];;
    msgbox OP : %OP%
    ;msgbox I_SRC[ %I_SRC%] I_OLD[ %I_OLD%] I_NEW[ %I_NEW%]
    return OP ;;<--return output string
}
; NOTE: FN_IS_TOKEN_HERE is called above but was not included in the original
; answer. This is a hypothetical reconstruction inferred from how its result is
; used: if I_OLD starts at position dx of I_SRC, return the position of the
; token's last character (always > 0); otherwise return -1.
FN_IS_TOKEN_HERE(I_SRC, I_OLD, dx)
{
    old_len := StrLen(I_OLD)
    if (SubStr(I_SRC, dx, old_len) == I_OLD) ; == is a case-sensitive comparison
        return dx + old_len - 1
    return -1
}
;Author: John Mark Isaac Madison
;EMAIL : J4M4I5M7#hotmail.com
;unit-test that will run when script boots up:
FN_POLYFILL_STR_REPLACE_TEST()
{
    T1 := FN_POLYFILL_STR_REPLACE("WHAT_IS_UP","UP","DOWN")
    ;;msgbox, T1 : %T1%
    i_src := "123_TOKEN_123"
    i_old := "TOKEN"
    i_new := "NEEEW"
    T2 := FN_POLYFILL_STR_REPLACE(i_src,i_old,i_new)
    ;;msgbox, T2 : %T2%
    ;;msgbox, "POLYFILL_TEST_RAN"
    i_src := "const IS_VARNAME"
    i_old := "VARNAME"
    i_new := "BASH"
    T3 := FN_POLYFILL_STR_REPLACE(i_src,i_old,i_new)
    ;msgbox, T3 : %T3%
    i_src := "123456VARNAME"
    i_old := "VARNAME"
    i_new := "AB"
    T4 := FN_POLYFILL_STR_REPLACE(i_src,i_old,i_new)
    if(T1 != "WHAT_IS_DOWN")
    {
        msgbox [PSR_TEST_FAIL#1]
    }
    if(T2 != "123_NEEEW_123")
    {
        msgbox [PSR_TEST_FAIL#2]
    }
    if(T3 != "const IS_BASH")
    {
        msgbox [PSR_TEST_FAIL#3]
    }
    if(T4 != "123456AB")
    {
        msgbox [PSR_TEST_FAIL#4]
    }
    return ;rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr;
}
FN_POLYFILL_STR_REPLACE_TEST()
Also, be aware that trimming newlines out of your data is harder than it should be as well, and newlines are bound to throw a monkey wrench into whatever string parsing you are doing.
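The polyfill's core loop (walk the source one position at a time; if the old token starts here, emit the new token and jump past it, otherwise copy one character) can be sketched in Go as follows. This translates the algorithm, not the AHK code line by line, and the guard for an empty token is an addition of this sketch. `replaceAllManual` is an illustrative name; in practice `strings.ReplaceAll` does the same job.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceAllManual scans src left to right. When old starts at index i it
// appends repl and skips past the token; otherwise it copies one byte.
func replaceAllManual(src, old, repl string) string {
	if old == "" {
		return src // an empty token would match everywhere, so leave src alone
	}
	var b strings.Builder
	for i := 0; i < len(src); {
		if strings.HasPrefix(src[i:], old) {
			b.WriteString(repl)
			i += len(old)
		} else {
			b.WriteByte(src[i])
			i++
		}
	}
	return b.String()
}

func main() {
	fmt.Println(replaceAllManual("XXX:ABC", "XXX:", ""))
}
```

The same four cases the AHK unit test checks make a quick sanity check, plus the empty-replacement case from the question, which the AHK polyfill flags with its BAD_PARAM_#3 message box.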
The problem still exists in newer versions.
I solved it with this alternative expression syntax:
newSymbol := StrReplace(Symbol, "" . SearchText, "" . ReplaceText)
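For contrast, Go has no distinction between variable-name parameters and value parameters: functions always receive values, so variables are passed directly. A sketch of the same replacement follows; `replaceSymbol` is a hypothetical helper, and `strings.TrimPrefix` is shown as an option for the case where only a leading "XXX:" should go.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceSymbol replaces every occurrence of searchText, like StringReplace
// with the ALL option. The variables are passed as plain values.
func replaceSymbol(symbol, searchText, replaceText string) string {
	return strings.ReplaceAll(symbol, searchText, replaceText)
}

func main() {
	symbol := "XXX:ABC"
	fmt.Println(replaceSymbol(symbol, "XXX:", ""))
	// Removes "XXX:" only if the string starts with it.
	fmt.Println(strings.TrimPrefix(symbol, "XXX:"))
}
```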

How do you replace a character in Go using the Regexp package ReplaceAll function?

I am not familiar with C-like syntax and would like to write code that finds and replaces, say, all 'A's with 'B's in a source string such as 'ABBA', using the Regexp package's ReplaceAll or ReplaceAllString functions. How do I set up the Regexp, src, and repl values? Here's the ReplaceAll code snippet from the Go documentation:
// ReplaceAll returns a copy of src in which all matches for the Regexp
// have been replaced by repl. No support is provided for expressions
// (e.g. \1 or $1) in the replacement text.
func (re *Regexp) ReplaceAll(src, repl []byte) []byte {
lastMatchEnd := 0; // end position of the most recent match
searchPos := 0; // position where we next look for a match
buf := new(bytes.Buffer);
for searchPos <= len(src) {
a := re.doExecute("", src, searchPos);
if len(a) == 0 {
break // no more matches
}
// Copy the unmatched characters before this match.
buf.Write(src[lastMatchEnd:a[0]]);
// Now insert a copy of the replacement string, but not for a
// match of the empty string immediately after another match.
// (Otherwise, we get double replacement for patterns that
// match both empty and nonempty strings.)
if a[1] > lastMatchEnd || a[0] == 0 {
buf.Write(repl)
}
lastMatchEnd = a[1];
// Advance past this match; always advance at least one character.
_, width := utf8.DecodeRune(src[searchPos:len(src)]);
if searchPos+width > a[1] {
searchPos += width
} else if searchPos+1 > a[1] {
// This clause is only needed at the end of the input
// string. In that case, DecodeRuneInString returns width=0.
searchPos++
} else {
searchPos = a[1]
}
}
// Copy the unmatched characters after the last match.
buf.Write(src[lastMatchEnd:len(src)]);
return buf.Bytes();
}
This is a routine to do what you want:
package main

import (
	"fmt"
	"os"
	"regexp"
)

func main() {
	// Replace every "A" with "B", as the question asks.
	reg, err := regexp.Compile("A")
	if err != nil {
		fmt.Printf("Compile failed: %s\n", err)
		os.Exit(1)
	}
	// strings.Bytes no longer exists; convert with []byte(...) instead.
	output := string(reg.ReplaceAll([]byte("ABBA"), []byte("B")))
	fmt.Println(output)
}
Here is a small example. You can also find good examples in the regexp package's test files.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile("e")
	input := "hello"
	replacement := "a"
	actual := string(re.ReplaceAll([]byte(input), []byte(replacement)))
	fmt.Printf("new pattern %s\n", actual)
}
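Since the question mentions ReplaceAllString as well, here is a short sketch using the string-based methods, which avoid the []byte conversions. One caveat: the doc comment quoted in the question predates current Go. Today, `ReplaceAll` and `ReplaceAllString` expand `$1`, `${name}` and similar in the replacement (a reference to a group that does not exist expands to an empty string), while `ReplaceAllLiteralString` inserts the replacement text exactly as written. The names `aRe` and `swapAs` are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var aRe = regexp.MustCompile("A")

// swapAs replaces every "A" with "B" using the string-based API.
func swapAs(s string) string {
	return aRe.ReplaceAllString(s, "B")
}

func main() {
	fmt.Println(swapAs("ABBA"))
	// "$1" is expanded; this pattern has no group 1, so it becomes "".
	fmt.Println(aRe.ReplaceAllString("ABBA", "$1"))
	// The Literal variant does not expand "$".
	fmt.Println(aRe.ReplaceAllLiteralString("ABBA", "$1"))
}
```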