Replace every nth instance of character in string - regex

I'm fairly new to Go, and I'm trying to replace every nth instance of a character in my string. For example, part of my data looks as follows:
"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769,"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506,"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991,
I want to replace every 6th comma with a '\n' so it looks like
"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769
"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506
"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991
I've looked at the regexp package and that just seems to be a finder. The strings package does have a replace but I don't know how to use it to replace specific indices. I also don't know how to find specific indices without going through the entire string character by character. I was wondering if there is a regEx solution that is more elegant than me writing a helper function.
Strings are immutable so I'm not able to edit them in place.
EDIT: Convert the string to a []byte. This allows me to edit the data in place. Then the rest is a fairly simple for loop, where dat is the data.

If that is your input, you should replace ," strings with \n". You may use strings.Replace() for this. This will leave a last, trailing comma, which you can remove by slicing.
Solution:
in := `"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769,"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506,"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991,`
out := strings.Replace(in, ",\"", "\n\"", -1)
out = out[:len(out)-1]
fmt.Println(out)
Output is (try it on the Go Playground):
"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769
"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506
"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991

If you want something more flexible:
package main
import (
"fmt"
"strings"
)
func main() {
input := `"2017-06-01T09:15:00+0530",1634.05,1635.95,1632.25,1632.25,769,"2017-06-01T09:16:00+0530",1632.25,1634.9,1631.65,1633.5,506,"2017-06-01T09:17:00+0530",1633.5,1639.95,1633.5,1638.4,991,`
var result []string
for len(input) > 0 {
token := strings.SplitN(input, ",", 7)
s := strings.Join(token[0:6], ",")
result = append(result, s)
input = input[len(s):]
input = strings.Trim(input, ",")
}
fmt.Println(result)
}
https://play.golang.org/p/mm63Hx24ne

So I figured out what I was doing wrong. I initially had the data as a string, but if I convert it to a []byte then I can update it in place.
This allowed me to use the simple for loop below to solve the issue, relying on nothing other than counting instances of the character:
// dat is the []byte holding the data
count := 0
for i := 0; i < len(dat); i++ {
	if dat[i] == ',' {
		count++
		if count%6 == 0 {
			// every 6th comma becomes a newline
			dat[i] = '\n'
		}
	}
}
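The loop above can be wrapped in a small reusable function. This is only a sketch: the name replaceEveryNth and its signature are made up for illustration, and it assumes the character to replace is a single byte (true for a comma).

```go
package main

import "fmt"

// replaceEveryNth returns a copy of s in which every nth occurrence of the
// byte old is replaced by the byte repl. The name and signature are
// hypothetical, chosen for this example.
func replaceEveryNth(s string, old, repl byte, n int) string {
	if n <= 0 {
		return s
	}
	b := []byte(s) // the conversion copies the data, so s itself is untouched
	count := 0
	for i := range b {
		if b[i] != old {
			continue
		}
		count++
		if count == n {
			b[i] = repl
			count = 0
		}
	}
	return string(b)
}

func main() {
	in := `"a",1,2,3,4,5,"b",6,7,8,9,10,`
	fmt.Print(replaceEveryNth(in, ',', '\n', 6))
}
```

Because the trailing comma is itself the 6th one of the last record, it also becomes a newline, so no extra trimming is needed for input shaped like the question's.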

Related

Remove all underscores until last number

I've run into the following problem. I need to remove all underscores between the start of the string and the last digit in the string (for example, 123_456__ becomes 123456__). I used an ordinary loop for it, which goes from string.length - 1 down to 0; when the character is a digit, I start a new loop from 0 to i, where i is the position of the found digit, and build a new string, skipping underscores. It seems there should be a way to do this with a regex or with more "Kotlin-style" code, but I do not know how. Is there a more convenient way to do it?
One way to do this is to use string functions like takeLastWhile / dropLast etc.
val s = "123_456__"
val end = s.takeLastWhile { !it.isDigit() }
val start = s.dropLast(end.length).filter { it != '_' } // or replace("_", "")
val result = start + end
println(result)

Can I split a column text as array using data factory data flow?

Inside my data flow pipeline I would like to add a derived column whose data type is array. I would like to split the existing column into chunks of at most 1000 characters without breaking words. I think we can use regexSplit:
regexSplit(<string to split> : string, <regex expression> : string) => array
But I do not know which regular expression I can use to split the existing column without breaking words.
Please help me figure it out.
I created a workaround for this, and it works fine for me:
filter(split(regexReplace(regexReplace(text, `[\t\n\r]`, ``), `(.{1,1000})(?:\s|$)`, `$1~~`), '~~'), #item !="")
I think there must be a better solution than this.
I wouldn't use a regex for this, but a truncating function like this one, courtesy of TimS:
public static string TruncateAtWord(this string input, int length)
{
if (input == null || input.Length < length)
return input;
int iNextSpace = input.LastIndexOf(" ", length, StringComparison.Ordinal);
return string.Format("{0}…", input.Substring(0, (iNextSpace > 0) ? iNextSpace : length).Trim());
}
Translated into expression functions it would look something* like this.
substring(Input, 1, iif(locate(' ', Input, 1000) > 0, locate(' ', Input, 1000), length(Input)))
Since you don't have a lastIndexOf available as an expression function, you would have to default to locate, which means that this expression truncates the string at the first space after the 1000th character.
*I don't have an environment where I can test this.
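To make the underlying logic concrete outside of data flow expressions, here is a rough sketch in Go of splitting text into chunks of at most a given size without breaking words. It is not a data flow expression; the function name chunkWords is made up for this example, whitespace runs are collapsed to single spaces, and sizes are counted in bytes rather than characters.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into pieces of at most limit bytes, breaking only
// between words. A single word longer than limit is kept whole in its own
// chunk. The name is hypothetical, chosen for this example.
func chunkWords(text string, limit int) []string {
	var chunks []string
	var cur strings.Builder
	for _, w := range strings.Fields(text) {
		// Start a new chunk if adding a space plus this word would exceed limit.
		if cur.Len() > 0 && cur.Len()+1+len(w) > limit {
			chunks = append(chunks, cur.String())
			cur.Reset()
		}
		if cur.Len() > 0 {
			cur.WriteByte(' ')
		}
		cur.WriteString(w)
	}
	if cur.Len() > 0 {
		chunks = append(chunks, cur.String())
	}
	return chunks
}

func main() {
	for _, c := range chunkWords("the quick brown fox", 10) {
		fmt.Println(c)
	}
}
```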

replace all characters in string except last 4 characters

Using Go, how do I replace all characters in a string with "X" except the last 4 characters?
This works fine for php/javascript but not for golang as "?=" is not supported.
\w(?=\w{4,}$)
Tried this, but does not work. I couldn't find anything similar for golang
(\w)(?:\w{4,}$)
A simple yet efficient solution that handles multi-byte UTF-8 characters is to convert the string to []rune, overwrite runes with 'X' (except the last 4), then convert back to string.
func maskLeft(s string) string {
rs := []rune(s)
for i := 0; i < len(rs)-4; i++ {
rs[i] = 'X'
}
return string(rs)
}
Testing it:
fmt.Println(maskLeft("123"))
fmt.Println(maskLeft("123456"))
fmt.Println(maskLeft("1234世界"))
fmt.Println(maskLeft("世界3456"))
Output (try it on the Go Playground):
123
XX3456
XX34世界
XX3456
Also see related question: How to replace all characters in a string in golang
Let's say inputString is the string you want to mask all the characters of (except the last four).
First get the last four characters of the string:
last4 := inputString[len(inputString)-4:]
Then get a string of X's which is the same length as inputString, minus 4:
re := regexp.MustCompile(`\w`)
maskedPart := re.ReplaceAllString(inputString[:len(inputString)-4], "X")
Then combine maskedPart and last4 to get your result:
maskedString := maskedPart + last4
Simpler approach without regex and looping
package main
import (
"fmt"
"strings"
)
func main() {
s := "thisisarandomstring"
head := s[:len(s)-4]
tail := s[len(s)-4:]
mask := strings.Repeat("x", len(head))
fmt.Printf("%v%v", mask, tail)
}
// Output:
// xxxxxxxxxxxxxxxring
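Note that the slicing approach above counts bytes and panics when the input is shorter than four bytes. A guarded, rune-aware variant might look like the sketch below; the name maskAllButLast and the keep parameter are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maskAllButLast replaces every character of s with "X" except the last
// keep characters. Strings with keep or fewer characters are returned
// unchanged. The name and signature are hypothetical.
func maskAllButLast(s string, keep int) string {
	if keep < 0 {
		keep = 0
	}
	n := utf8.RuneCountInString(s)
	if n <= keep {
		return s
	}
	// Walk backwards over keep runes to find the byte offset of the tail.
	cut := len(s)
	for i := 0; i < keep; i++ {
		_, size := utf8.DecodeLastRuneInString(s[:cut])
		cut -= size
	}
	return strings.Repeat("X", n-keep) + s[cut:]
}

func main() {
	fmt.Println(maskAllButLast("123", 4))
	fmt.Println(maskAllButLast("123456", 4))
	fmt.Println(maskAllButLast("1234世界", 4))
}
```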
Create a Regexp with
re := regexp.MustCompile(`\w{4}$`)
Let's say inputString is the string you want to remove the last four characters from. Use this code to return a copy of inputString without the last 4 characters:
re.ReplaceAllString(inputString, "")
Note: if it's possible that your input string could start out with fewer than four characters, and you still want those characters removed since they are at the end of the string, you should instead use:
re := regexp.MustCompile(`\w{0,4}$`)

Allow user to pass a separator character by doubling it in C++

I have a C++ function that accepts strings in below format:
<WORD>: [VALUE]; <ANOTHER WORD>: [VALUE]; ...
This is the function:
std::wstring ExtractSubStringFromString(const std::wstring String, const std::wstring SubString) {
std::wstring S = std::wstring(String), SS = std::wstring(SubString), NS;
size_t ColonCount = NULL, SeparatorCount = NULL; WCHAR Separator = L';';
ColonCount = std::count(S.begin(), S.end(), L':');
SeparatorCount = std::count(S.begin(), S.end(), Separator);
if ((SS.find(Separator) != std::wstring::npos) || (SeparatorCount > ColonCount))
{
// SEPARATOR NEED TO BE ESCAPED, BUT DON'T KNOW TO DO THIS.
}
if (S.find(SS) != std::wstring::npos)
{
NS = S.substr(S.find(SS) + SS.length() + 1);
if (NS.find(Separator) != std::wstring::npos) { NS = NS.substr(NULL, NS.find(Separator)); }
if (NS[NS.length() - 1] == L']') { NS.pop_back(); }
return NS;
}
return L"";
}
Above function correctly outputs MANGO if I use it like:
ExtractSubStringFromString(L"[VALUE: MANGO; DATA: NOTHING]", L"VALUE")
However, if the value contains escaped separators, as in the following string where I tried doubling them like ;;, I still get MANGO instead of ;MANGO;:
ExtractSubStringFromString(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE")
Here, the value assigner is a colon and the separator is a semicolon. I want to allow users to pass colons and semicolons to my function by doubling the extra ones, just as we escape double quotes, single quotes, and other characters in many scripting and programming languages, and in the parameters of many commands.
I thought hard but couldn't come up with a way to do it. Can anyone please help me with this?
Thanks in advance.
You should search the string for ;; and replace it with a temporary filler character or string, which can later be found and replaced with the real value.
So basically:
1) Search through the string and replace all instances of ;; with \tempFill. It would be best to pick a combination of characters that is highly unlikely to appear in the original string.
2) Parse the string
3) Replace all instances of \tempFill with ;
Note: It would be wise to assert that your \tempFill (or whatever you choose as the filler) is not in the original string, to prevent a bug. You could use a character such as \n and make sure there are none in the original string.
Disclaimer:
I can almost guarantee there are cleaner and more efficient ways to do this but this is the simplest way to do it.
First, as the substring does not need to be split, I assume that it does not need to be pre-processed to filter escaped separators.
Then, on the main string, the simplest way IMHO is to filter the escaped separators as you search for them in the string. Pseudocode (assuming the enclosing [] have been removed):
last_index = begin_of_string
index_of_current_substring = begin_of_string
loop: search a separator starting at last_index - if not found exit loop
    ok: found one at ix
    if char at ix+1 is a separator (meaning we have an escaped separator)
        remove character at ix from string by copying all characters after it one step to the left
        last_index = ix+1
        continue loop
    else this is a true separator
        search a colon in [ index_of_current_substring, ix [
        if not found: error incorrect string
        say found at c
        compare key_string with string[index_of_current_substring, c [
        if equal - ok we found the key
            value is string[ c+2 (skip a space after the colon), ix [
            return value - search is finished
        else - it is not our key, just continue searching
            index_of_current_substring = ix+1
            last_index = index_of_current_substring
            continue loop
It should now be easy to convert that to C++
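To illustrate the scanning idea, here is a rough sketch written in Go rather than C++; the same steps carry over. The function names splitEscaped and lookup are made up for this example, and the sketch assumes that only semicolons are doubled (doubled colons are not handled) and that the separator is a single ASCII byte.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEscaped splits s on sep, treating a doubled sep as one literal sep.
// The name is hypothetical, chosen for this example.
func splitEscaped(s string, sep byte) []string {
	var fields []string
	var cur strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != sep {
			cur.WriteByte(s[i])
			continue
		}
		if i+1 < len(s) && s[i+1] == sep {
			cur.WriteByte(sep) // escaped separator: keep one copy
			i++                // and skip the second one
			continue
		}
		fields = append(fields, cur.String()) // true separator ends the field
		cur.Reset()
	}
	return append(fields, cur.String())
}

// lookup returns the value stored under key in a string such as
// "[VALUE: MANGO; DATA: NOTHING]", or "" if the key is absent.
func lookup(s, key string) string {
	s = strings.TrimSuffix(strings.TrimPrefix(s, "["), "]")
	for _, field := range splitEscaped(s, ';') {
		k, v, ok := strings.Cut(field, ":")
		if ok && strings.TrimSpace(k) == key {
			return strings.TrimSpace(v)
		}
	}
	return ""
}

func main() {
	fmt.Println(lookup("[VALUE: ;;MANGO;;; DATA: NOTHING]", "VALUE"))
	fmt.Println(lookup("[VALUE: MANGO; DATA: NOTHING]", "DATA"))
}
```

Scanning left to right means a run such as ;;; is read as one escaped semicolon followed by a true separator, which matches the ;MANGO; result the question asks for.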

Golang Regex: FindAllStringSubmatch to []string

I download a multiline file from Amazon S3 in format like:
ColumnAv1 ColumnBv1 ColumnCv1 ...
ColumnAv2 ColumnBv2 ColumnCv2 ...
The file is of type []byte. Then I want to parse this with a regex:
matches := re.FindAllSubmatch(file, -1)
Then I want to feed the result row by row to a function which takes a []string as input (string[0] is ColumnAv1, string[1] is ColumnBv1, ...).
How should I convert the [][][]byte result to a []string containing the first, second, etc. row? I suppose I should do it in a loop, but I cannot get this working:
for i := 0; i < len(matches); i++ {
	tmp := myfunction(???)
}
BTW, why does FindAllSubmatch return [][][]byte whereas FindAllStringSubmatch returns [][]string?
(Sorry, I don't have access to my real example right now, so the syntax may not be exact.)
It's all explained extensively in the package's documentation.
Read the paragraph which explains:
There are 16 methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
In your case, you probably want to use FindAllStringSubmatch.
In Go, a string is in effect a read-only slice of bytes.
You can choose to either keep passing []byte variables around,
or convert the []byte values to string:
var byteSlice = []byte{'F','o','o'}
var str string
str = string(byteSlice)
You can simply iterate through the bytes result as you would for the strings result, using two nested loops, and convert each slice of bytes to a string in the inner loop:
package main
import "fmt"
func main() {
f := [][][]byte{{{'a', 'b', 'c'}}}
for _, line := range f {
for _, match := range line { // match is a []byte
fmt.Println(string(match))
}
}
}
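To pass each row to a function that takes a []string, the inner conversion can be collected into a slice. The sketch below assumes a made-up pattern with three space-separated columns per line, and toStrings is a hypothetical helper name; the real regex and the row-handling function would come from your own code.

```go
package main

import (
	"fmt"
	"regexp"
)

// toStrings converts one match (a [][]byte) into a []string.
// The name is hypothetical, chosen for this example.
func toStrings(match [][]byte) []string {
	out := make([]string, len(match))
	for i, b := range match {
		out[i] = string(b)
	}
	return out
}

func main() {
	file := []byte("ColumnAv1 ColumnBv1 ColumnCv1\nColumnAv2 ColumnBv2 ColumnCv2\n")
	// Assumed pattern: three space-separated columns per line.
	re := regexp.MustCompile(`(?m)^(\S+) (\S+) (\S+)$`)
	for _, m := range re.FindAllSubmatch(file, -1) {
		row := toStrings(m[1:]) // m[0] is the whole match; skip it
		fmt.Println(row)        // stand-in for the function taking []string
	}
}
```

Alternatively, calling re.FindAllStringSubmatch(string(file), -1) yields [][]string directly, at the cost of converting the whole file to a string first.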