replace all characters in string except last 4 characters - regex

Using Go, how do I replace all characters in a string with "X" except the last 4 characters?
This works fine in PHP/JavaScript but not in Go, because Go's regexp package does not support lookahead ("?=").
\w(?=\w{4,}$)
I tried this, but it does not work. I couldn't find anything similar for Go:
(\w)(?:\w{4,}$)
JavaScript working link
Go non-working link

A simple yet efficient solution that handles multi-byte UTF-8 characters is to convert the string to []rune, overwrite the runes with 'X' (except the last 4), then convert back to string.
func maskLeft(s string) string {
rs := []rune(s)
for i := 0; i < len(rs)-4; i++ {
rs[i] = 'X'
}
return string(rs)
}
Testing it:
fmt.Println(maskLeft("123"))
fmt.Println(maskLeft("123456"))
fmt.Println(maskLeft("1234世界"))
fmt.Println(maskLeft("世界3456"))
Output (try it on the Go Playground):
123
XX3456
XX34世界
XX3456
Also see related question: How to replace all characters in a string in golang

Let's say inputString is the string whose characters you want to mask (all except the last four). The slices below assume inputString is at least four bytes long; shorter input makes them panic.
First get the last four characters of the string:
last4 := inputString[len(inputString)-4:]
Then mask every word character in the part before them (a raw string literal is needed, because "\w" is not a valid escape in a double-quoted Go string):
re := regexp.MustCompile(`\w`)
maskedPart := re.ReplaceAllString(inputString[:len(inputString)-4], "X")
Then combine maskedPart and last4 to get your result:
maskedString := maskedPart + last4
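The steps above can be combined into one function that also copes with short input. This is only a sketch: maskWithRegexp is a hypothetical name, and it slices by rune rather than by byte so that multi-byte characters in the tail are not cut in half. Note that Go's \w is ASCII-only, so non-ASCII characters in the head are left unmasked.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordChar matches one ASCII word character: [0-9A-Za-z_].
var wordChar = regexp.MustCompile(`\w`)

// maskWithRegexp (hypothetical helper) replaces every word character
// with "X", except in the last four runes of s.
func maskWithRegexp(s string) string {
	rs := []rune(s)
	if len(rs) <= 4 {
		// Nothing to mask; also avoids a slice-bounds panic.
		return s
	}
	cut := len(rs) - 4
	return wordChar.ReplaceAllString(string(rs[:cut]), "X") + string(rs[cut:])
}

func main() {
	fmt.Println(maskWithRegexp("123"))            // 123
	fmt.Println(maskWithRegexp("1234-5678-9012")) // XXXX-XXXX-9012
}
```

Because only \w is replaced, separators such as "-" survive in the masked part, which may or may not be what you want.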

Simpler approach without regex and looping
package main
import (
"fmt"
"strings"
)
func main() {
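// Named s rather than string, so the built-in type is not shadowed.
// Assumes s holds at least 4 bytes of single-byte characters.
s := "thisisarandomstring"
head := s[:len(s)-4]
tail := s[len(s)-4:]
mask := strings.Repeat("x", len(head))
fmt.Printf("%v%v", mask, tail)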
}
// Output:
// xxxxxxxxxxxxxxxring
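The slicing above works on bytes, so it panics on strings shorter than four bytes and can split a multi-byte character. A possible variant of the same strings.Repeat idea that counts runes instead is sketched below; maskRepeat and its keep parameter are hypothetical names, and keep is assumed to be non-negative.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maskRepeat (hypothetical helper) masks all but the last keep runes
// of s with "x". keep is assumed to be >= 0.
func maskRepeat(s string, keep int) string {
	n := utf8.RuneCountInString(s)
	if n <= keep {
		return s // too short to mask anything
	}
	// Walk backwards keep runes to find the byte offset of the tail.
	cut := len(s)
	for i := 0; i < keep; i++ {
		_, size := utf8.DecodeLastRuneInString(s[:cut])
		cut -= size
	}
	return strings.Repeat("x", n-keep) + s[cut:]
}

func main() {
	fmt.Println(maskRepeat("abcdefgh", 4))
	fmt.Println(maskRepeat("世界3456", 4))
	fmt.Println(maskRepeat("abc", 4))
}
```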

Create a Regexp with
re := regexp.MustCompile(`\w{4}$`)
Let's say inputString is the string you want to remove the last four characters from. Use this code to return a copy of inputString without the last 4 characters:
re.ReplaceAllString(inputString, "")
Note: if your input string could start out with fewer than four characters, and you still want those characters removed since they are at the end of the string, you should instead use:
re := regexp.MustCompile(`\w{0,4}$`)

Related

Include string into another string using regex

I have a set of strings. A string might have items listed between square brackets. For strings that have brackets, I'd like to add a fixed set of extra items inside them. The brackets might be empty, or absent. For example:
string1 --> string1 # added nothing
string2[] --> string2[extra1="1",extra2="2"] # added two items
string3[item="1"] --> string3[item="1",extra1="1",extra2="2"] # added two items
Currently I achieve this with the following code (Golang):
str1 := "test"
str2 := `test[]`
str3 := `test[item1="1"]`
re := regexp.MustCompile(`\[(.+)?\]`)
for _, s := range []string{str1, str2, str3} {
s = re.ReplaceAllString(s, fmt.Sprintf(`[item1="a",item2="b",$1]`))
fmt.Println(s)
}
But in the output, in case of empty brackets I also got an unwanted comma "," in the end:
test
test[item1="a",item2="b",]
test[item1="a",item2="b",item1="1"]
Is it possible to avoid inserting the comma when the brackets are empty?
Of course it's possible to parse the string again and trim the comma, but that seems suboptimal.
Code example on Go playground
You can use two regexes: one that matches empty brackets [] and another that matches brackets with text inside. Below is the tested code:
https://play.golang.org/p/_DOOGDMUOCm
A second way is to inspect the string after the replacement: if the last two characters are ,] then cut the string before the comma and append ]. I guess you already know this approach.
package main
import (
"fmt"
"regexp"
)
func main() {
str1 := "test"
str2 := `test[]`
str3 := `test[item1="1"]`
re := regexp.MustCompile(`\[(.*)\]`)
nonEmpty := regexp.MustCompile(`\[(.+)\]`)
for _, s := range []string{str1, str2, str3} {
if nonEmpty.MatchString(s) {
// Brackets have content: keep it after the new items.
s = re.ReplaceAllString(s, `[item1="a",item2="b",$1]`)
} else {
// Empty or missing brackets: no trailing comma.
s = re.ReplaceAllString(s, `[item1="a",item2="b"]`)
}
fmt.Println(s)
}
}
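The same result can probably be had in a single pass with ReplaceAllStringFunc, which lets the replacement depend on what was matched. The sketch below is not the answerer's code: addExtras and the extras constant are hypothetical names, and the extras are appended after the existing items, as in the question's first example.

```go
package main

import (
	"fmt"
	"regexp"
)

// brackets matches a bracketed list, including an empty one.
var brackets = regexp.MustCompile(`\[(.*)\]`)

// extras holds the items to add (values taken from the question's example).
const extras = `extra1="1",extra2="2"`

// addExtras (hypothetical helper) appends extras inside the brackets and
// only emits a separating comma when the brackets already hold something.
func addExtras(s string) string {
	return brackets.ReplaceAllStringFunc(s, func(m string) string {
		inner := m[1 : len(m)-1] // strip the surrounding [ and ]
		if inner == "" {
			return "[" + extras + "]"
		}
		return "[" + inner + "," + extras + "]"
	})
}

func main() {
	for _, s := range []string{"test", "test[]", `test[item1="1"]`} {
		fmt.Println(addExtras(s))
	}
}
```

Strings without brackets pass through unchanged, because the callback only runs on matches.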

apostrophe in word not being recognized for string replace

I am having a problem replacing the word "you're" using regexp.
All of the other words are replaced correctly; only "you're" fails.
I think it is not parsing past the apostrophe.
I have to replace "you" with "I" and "you're" with "I'm".
It changes "you" to "I", but "you're" becomes "I're" because the match does not go past the apostrophe; it treats that as the end of the word for some reason. I have to escape the apostrophe somehow.
Please see below for the code in question.
package main
import (
"fmt"
"math/rand"
"regexp"
"strings"
"time"
)
//Function ElizaResponse to take in and return a string
func ElizaResponse(str string) string {
// replace := "How do you know you are"
/*Regex MatchString function isolating the word "father"
*with word boundaries and the ignore-case regex flag.
*/
if matched, _ := regexp.MatchString(`(?i)\bfather\b`, str);
//Condition to replace the original string if it has the word "father"
matched {
return "Why don’t you tell me more about your father?"
}
r1 := regexp.MustCompile(`(?i)\bI'?\s*a?m\b`)
//Match the words "I am" and capture for replacement
matched := r1.MatchString(str)
//condition if "I am" is matched
if matched {
capturedString := r1.ReplaceAllString(str, "$1")
boundaries := regexp.MustCompile(`\b`)
tokens := boundaries.Split(capturedString, -1)
// List the reflections.
reflections := [][]string{
{`I`, `you`},
{`you're`, `I'm`},
{`your`, `my`},
{`me`, `you`},
{`you`, `I`},
{`my`, `your`},
}
// Loop through each token, reflecting it if there's a match.
for i, token := range tokens {
for _, reflection := range reflections {
if matched, _ := regexp.MatchString(reflection[0], token); matched {
tokens[i] = reflection[1]
break
}
}
}
// Put the tokens back together.
return strings.Join(tokens, ``)
}
//Get random number from the length of the array of random struct
//an array of strings for the random response
response := []string{"I’m not sure what you’re trying to say. Could you explain it to me?",
"How does that make you feel?",
"Why do you say that?"}
//Return a random index of the array
return response[rand.Intn(len(response))]
}
func main() {
rand.Seed(time.Now().UTC().UnixNano())
fmt.Println("Im supposed to just take what you're saying at face value?")
fmt.Println(ElizaResponse("Im supposed to just take what you're saying at face value?"))
}
Note that the apostrophe character creates a word boundary, so your use of \b in regular expressions is probably tripping you up. That is, the string "I'm" has four word boundaries, one before and after each character.
┏━┳━┳━┓
┃I┃'┃m┃
┗━┻━┻━┛
│ │ │ └─ end of line creates a word boundary
│ │ └─── after punctuation character creates a word boundary
│ └───── before punctuation character creates a word boundary
└─────── start of line creates a word boundary
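The boundaries in the diagram can be observed directly by splitting on \b, which is what the question's code does. A small sketch (splitOnBoundaries is a hypothetical name used only for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

// splitOnBoundaries splits s at every word boundary, like the
// boundaries.Split call in the question.
func splitOnBoundaries(s string) []string {
	return regexp.MustCompile(`\b`).Split(s, -1)
}

func main() {
	// The apostrophe ends up as its own token, so "you're" is never
	// seen as a single word by the reflection loop.
	fmt.Printf("%q\n", splitOnBoundaries("you're")) // ["you" "'" "re"]
}
```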
There is no way to change the behavior of the word boundary metacharacter so you might be better off mapping regexes that include the full word with punctuation to the desired replacement, e.g.:
type Replacement struct {
rgx *regexp.Regexp
rpl string
}
replacements := []Replacement{
{regexp.MustCompile("\\bI\\b"), "you"},
{regexp.MustCompile("\\byou're\\b"), "I'm"},
// etc...
}
Note also that one of your examples contains a "right single quotation mark" (U+2019, encoded in UTF-8 as 0xE2 0x80 0x99), not to be confused with the ASCII apostrophe (U+0027, 0x27)!
fmt.Sprintf("% x", []byte("'’")) // => "27 e2 80 99"
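If both kinds of apostrophe can appear in the input, one option is to normalize them before matching. A minimal sketch, assuming that mapping U+2019 to the ASCII apostrophe is acceptable for your text (normalizeApostrophes is a hypothetical name):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeApostrophes replaces the right single quotation mark (U+2019)
// with the ASCII apostrophe (U+0027).
func normalizeApostrophes(s string) string {
	return strings.ReplaceAll(s, "\u2019", "'")
}

func main() {
	fmt.Println(normalizeApostrophes("what you\u2019re saying"))
}
```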
What you want to achieve here is to replace specific strings with specific replacements. It is easier to achieve that with a map of string keys and values, where each unique key is a literal phrase to search and the values are the texts to replace with.
This is how you may define the reflections:
reflections := map[string]string{
`you're`: `I'm`,
`your`: `my`,
`me`: `you`,
`you`: `I`,
`my`: `your`,
`I` : `you`,
}
Next, you need to get the keys sorted by length in descending order, so that longer alternatives such as you're are tried before you (here is some sample code):
type ByLenDesc []string
func (a ByLenDesc) Len() int {
return len(a)
}
func (a ByLenDesc) Less(i, j int) bool {
return len(a[i]) > len(a[j])
}
func (a ByLenDesc) Swap(i, j int) {
a[i], a[j] = a[j], a[i]
}
And then in the function:
var keys []string
for key := range reflections {
keys = append(keys, key)
}
sort.Sort(ByLenDesc(keys))
Then build the pattern:
pat := "\\b(" + strings.Join(keys, `|`) + ")\\b"
// fmt.Println(pat) // => \b(you're|your|you|me|my|I)\b
The pattern matches you're, your, you, me, my, or I as whole words.
res := regexp.MustCompile(pat).ReplaceAllStringFunc(capturedString, func(m string) string {
return reflections[m]
})
The above code creates a regex object and replaces all matches with the corresponding reflections values.
See the Go demo.
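Put together, the pieces above might look like the following self-contained sketch. It uses sort.Slice instead of the ByLenDesc type and quotes each key with regexp.QuoteMeta; both are substitutions made here for brevity and safety, and buildPattern and reflectWords are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var reflections = map[string]string{
	"you're": "I'm",
	"your":   "my",
	"me":     "you",
	"you":    "I",
	"my":     "your",
	"I":      "you",
}

// buildPattern joins the map keys, longest first, into one
// whole-word alternation such as \b(you're|your|you|me|my|I)\b.
func buildPattern(m map[string]string) *regexp.Regexp {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, regexp.QuoteMeta(k))
	}
	sort.Slice(keys, func(i, j int) bool { return len(keys[i]) > len(keys[j]) })
	return regexp.MustCompile(`\b(` + strings.Join(keys, "|") + `)\b`)
}

var reflectRe = buildPattern(reflections)

// reflectWords swaps every matched word for its reflection in one pass,
// so "I" -> "you" is not turned back into "I" afterwards.
func reflectWords(s string) string {
	return reflectRe.ReplaceAllStringFunc(s, func(m string) string {
		return reflections[m]
	})
}

func main() {
	fmt.Println(reflectWords("you're sure about your plan"))
}
```

Doing all replacements in a single pass matters here: applying the pairs one after another would reflect some words twice.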
I found that I just needed to change these two lines of code:
boundaries := regexp.MustCompile(`(\b[^\w']|$)`)
return strings.Join(tokens, ` `)
It stops the Split function from splitting at the ' character.
The tokens then need to be joined with a space; otherwise the output would be one continuous string.

Regex extracting sets of numbers from string when prefix occurs, while not matching said prefix

As stated in the title, given a situation where I have a string like so:
"somestring~200~122"
I want a regex that matches the numbers only when they are prefixed by "~", so that I ultimately end up with [200, 122].
Matching the prefix is necessary because I need to protect against a case like the one below, which should not be matched:
"somestring~abc200~def122"
For additional context: as stated in the title, I am using Go, so I am planning on doing something like the following to obtain the numbers within the string:
pattern := regexp.MustCompile("regex i need help with")
numbers := pattern.FindAllString(host, -1)
You can use FindAllStringSubmatch to extract the group containing just the digits. Below is an example that finds all instances of ~ followed by numbers. It additionally converts all the matches to ints and inserts them into a slice:
package main
import (
"fmt"
"regexp"
"strconv"
)
func main() {
host := "somestring~200~122"
pattern := regexp.MustCompile(`~(\d+)`)
numberStrings := pattern.FindAllStringSubmatch(host, -1)
numbers := make([]int, len(numberStrings))
for i, numberString := range numberStrings {
number, err := strconv.Atoi(numberString[1])
if err != nil {
panic(err)
}
numbers[i] = number
}
fmt.Println(numbers)
}
https://play.golang.org/p/09YyewtRXz
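The same idea can be wrapped in a function that returns an error instead of panicking, which may be more convenient when host comes from untrusted input. This is a sketch; extractNumbers and tildeNumber are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tildeNumber captures a run of digits that directly follows "~".
var tildeNumber = regexp.MustCompile(`~(\d+)`)

// extractNumbers returns every number in host that is prefixed by "~".
func extractNumbers(host string) ([]int, error) {
	matches := tildeNumber.FindAllStringSubmatch(host, -1)
	numbers := make([]int, 0, len(matches))
	for _, m := range matches {
		// m[0] is the whole match ("~200"); m[1] is the digits only.
		n, err := strconv.Atoi(m[1])
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", m[1], err)
		}
		numbers = append(numbers, n)
	}
	return numbers, nil
}

func main() {
	fmt.Println(extractNumbers("somestring~200~122"))
	fmt.Println(extractNumbers("somestring~abc200~def122"))
}
```

Note that this pattern still picks up digits that are followed by other text, as in "~200abc"; if that should be rejected, the pattern would need to be tightened.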

Replace every nth instance of character in string

I'm a bit new to Go, but I'm trying to replace every nth instance of a character in my string. So for example, a part of my data looks as follows:
"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769,"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506,"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991,
I want to replace every 6th comma with a '\n' so it looks like
"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769"
"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506"
"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991"
I've looked at the regexp package and that just seems to be a finder. The strings package does have a replace but I don't know how to use it to replace specific indices. I also don't know how to find specific indices without going through the entire string character by character. I was wondering if there is a regEx solution that is more elegant than me writing a helper function.
Strings are immutable so I'm not able to edit them in place.
EDIT: Convert the string to []byte. This allows me to edit the data in place. Then the rest is a fairly simple for loop, where dat is the data.
If that is your input, you should replace ," strings with \n". You may use strings.Replace() for this. This leaves a last, trailing comma, which you can remove by slicing.
Solution:
in := `"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769,"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506,"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991,`
out := strings.Replace(in, ",\"", "\n\"", -1)
out = out[:len(out)-1]
fmt.Println(out)
Output is (try it on the Go Playground):
"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769
"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506
"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991
If you want something more flexible:
package main
import (
"fmt"
"strings"
)
func main() {
input := `"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769,"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506,"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991,`
var result []string
for len(input) > 0 {
token := strings.SplitN(input, ",", 7)
s := strings.Join(token[0:6], ",")
result = append(result, s)
input = input[len(s):]
input = strings.Trim(input, ",")
}
fmt.Println(result)
}
https://play.golang.org/p/mm63Hx24ne
So I figured out what I was doing wrong. I initially had the data as a string, but if I convert it to a []byte then I can update it in place.
This allowed me to use the simple for loop below to solve the issue, relying on nothing other than the nth instance of the character:
count := 0
for i := 0; i < len(dat); i++ {
if dat[i] == ',' {
count++
// Every 6th comma becomes a newline.
if count%6 == 0 {
dat[i] = '\n'
count = 0
}
}
}
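The loop above can be generalized to any n and any pair of single-byte characters. The following is a sketch under that assumption: replaceEveryNth is a hypothetical name, and both old and repl are assumed to be ASCII, which keeps byte-wise replacement safe for UTF-8 input.

```go
package main

import "fmt"

// replaceEveryNth returns a copy of s in which every nth occurrence of
// the byte old is replaced by repl. old and repl are assumed to be ASCII.
func replaceEveryNth(s string, old, repl byte, n int) string {
	if n <= 0 {
		return s
	}
	b := []byte(s) // strings are immutable, so work on a copy
	count := 0
	for i, c := range b {
		if c != old {
			continue
		}
		count++
		if count%n == 0 {
			b[i] = repl
		}
	}
	return string(b)
}

func main() {
	fmt.Print(replaceEveryNth("a,b,c,d,e,f,", ',', '\n', 3))
}
```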

Golang Regex: FindAllStringSubmatch to []string

I download a multiline file from Amazon S3 in format like:
ColumnAv1 ColumnBv1 ColumnCv1 ...
ColumnAv2 ColumnBv2 ColumnCv2 ...
The file is of type []byte. Then I want to parse this with a regex:
matches := re.FindAllSubmatch(file,-1)
Then I want to feed the result row by row to a function which takes []string as input (string[0] is ColumnAv1, string[1] is ColumnBv1, ...).
How should I convert the [][][]byte result to a []string containing the first, second, etc. row? I suppose I should do it in a loop, but I cannot get this working:
for i := 0; i < len(matches); i++ {
tmp:=myfunction(???)
}
BTW, why does the function FindAllSubmatch return [][][]byte whereas FindAllStringSubmatch returns [][]string?
(Sorry I don't have right now access to my real example, so the syntax may not be proper)
It's all explained extensively in the package's documentation.
Read the paragraph that explains:
There are 16 methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
In your case, you probably want to use FindAllStringSubmatch.
In Go, a string is in effect a read-only slice of bytes.
You can choose either to keep passing []byte values around, or to convert the []byte values to string:
var byteSlice = []byte{'F','o','o'}
var str string
str = string(byteSlice)
You can simply iterate through the bytes result as you would for the strings result, using two nested loops, and convert each slice of bytes to a string in the inner loop:
package main
import "fmt"
func main() {
f := [][][]byte{{{'a', 'b', 'c'}}}
for _, line := range f {
for _, match := range line { // match is of type []byte
fmt.Println(string(match))
}
}
}
Playground
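Applied to the question, the nested loops can build one []string per row, ready to pass to a function that takes []string. The sketch below assumes a pattern with one capture group per column (the pattern shown is made up for the sample data) and skips element 0 of each match, which holds the whole matched line; rowsAsStrings is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// rowsAsStrings converts the result of FindAllSubmatch into one
// []string per match, containing only the capture groups.
func rowsAsStrings(matches [][][]byte) [][]string {
	rows := make([][]string, 0, len(matches))
	for _, m := range matches {
		row := make([]string, 0, len(m)-1)
		for _, group := range m[1:] { // m[0] is the whole match
			row = append(row, string(group))
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	file := []byte("ColumnAv1 ColumnBv1 ColumnCv1\nColumnAv2 ColumnBv2 ColumnCv2\n")
	// Assumed pattern: three space-separated columns per line.
	re := regexp.MustCompile(`(?m)^(\S+) (\S+) (\S+)$`)
	for _, row := range rowsAsStrings(re.FindAllSubmatch(file, -1)) {
		fmt.Println(row) // each row is a []string
	}
}
```

If the file fits in memory as a string anyway, calling re.FindAllStringSubmatch(string(file), -1) gives [][]string directly and avoids the manual conversion.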