I have multiple rows of string where the string is all wrong. Here is one row an an example of the geometry and output expected:
id
geometry
output
1
POLYGON (( 106.812271, -6.361551, 106.812111, -6.361339, 106.81205, -6.361177, 106.81206, -6.360905, 106.812055, -6.360582, 106.812065, -6.360218, 106.812293, -6.359295, 106.812593, -6.358644, 106.812436, -6.358406, 106.8121515, -6.3582051, 106.8123, -6.357823, 106.81244, -6.357407, 106.812612, -6.356842, 106.812719, -6.356544, 106.81274, -6.356384, 106.812864, -6.356148, 106.813019, -6.356021, 106.813287, -6.355797, 106.813781, -6.355286, 106.814076, -6.354751, 106.814277, -6.354393, 106.814403, -6.354027, 106.814553, -6.353814, 106.814736, -6.353526, 106.814993, -6.353302, 106.81516, -6.353024, 106.815358, -6.35279, 106.815509, -6.352588, 106.815675, -6.352331, 106.8153007, -6.3521138, 106.8151398, -6.3520137, 106.8149789, -6.3518005, 106.8147643, -6.3516939, 106.8144639, -6.3516245, 106.8141527, -6.3515392, 106.8135734, -6.351342, 106.813171, -6.3512034, 106.8123284, -6.3509219, 106.8122418, -6.3511298, 106.8118164, -6.3521534, 106.8116597, -6.3525047, 106.8111849, -6.3535692, 106.8102245, -6.3554942, 106.8093545, -6.3568947, 106.8085097, -6.3580518, 106.80795, -6.358832, 106.8077793, -6.3590429, 106.807668, -6.359441, 106.807499, -6.360346, 106.8072531, -6.3616378, 106.8071476, -6.3622599, 106.8070637, -6.3626798, 106.8070823, -6.3629367, 106.8071207, -6.3634531, 106.8078269, -6.363831, 106.809448, -6.364124, 106.810574, -6.364198, 106.81066, -6.362993, 106.811175, -6.36277, 106.812087, -6.361703, 106.812271, -6.361551))
POLYGON (( 106.812271 -6.361551, 106.812111 -6.361339, 106.81205 -6.361177, 106.81206 -6.360905, 106.812055 -6.360582, 106.812065 -6.360218, 106.812293 -6.359295, 106.812593 -6.358644, 106.812436 -6.358406, 106.8121515 -6.3582051, 106.8123 -6.357823, 106.81244 -6.357407, 106.812612 -6.356842, 106.812719 -6.356544, 106.81274 -6.356384, 106.812864 -6.356148, 106.813019 -6.356021, 106.813287 -6.355797, 106.813781 -6.355286, 106.814076 -6.354751, 106.814277 -6.354393, 106.814403 -6.354027, 106.814553 -6.353814, 106.814736 -6.353526, 106.814993 -6.353302, 106.81516 -6.353024, 106.815358 -6.35279, 106.815509 -6.352588, 106.815675, -6.352331, 106.8153007, -6.3521138, 106.8151398 -6.3520137, 106.8149789 -6.3518005, 106.8147643 -6.3516939, 106.8144639 -6.3516245, 106.8141527 -6.3515392, 106.8135734 -6.351342, 106.813171 -6.3512034, 106.8123284 -6.3509219, 106.8122418 -6.3511298, 106.8118164 -6.3521534, 106.8116597 -6.3525047, 106.8111849 -6.3535692, 106.8102245 -6.3554942, 106.8093545 -6.3568947, 106.8085097 -6.3580518, 106.80795 -6.358832, 106.8077793 -6.3590429, 106.807668 -6.359441, 106.807499 -6.360346, 106.8072531 -6.3616378, 106.8071476 -6.3622599, 106.8070637 -6.3626798, 106.8070823 -6.3629367, 106.8071207 -6.3634531, 106.8078269 -6.363831, 106.809448 -6.364124, 106.810574 -6.364198, 106.81066 -6.362993, 106.811175 -6.36277, 106.812087 -6.361703, 106.812271 -6.361551))
One example is as follows above. I need to get rid of all odd position commas and only keep the even position commas. So that the geometry can become output.
I tried doing a split(text.",") and concatenate however when the columns is blank it returns xxx,,,, which is not what I had in mind.
Since some have more than 200 commas means that I need to have more than 200 columns, is there a simpler way like using regex?Someone please help.
If the second number is always negative, this is simple as replacing , - (comma, space, dash) with (space).
=REGEXREPLACE(B2,", -"," ")
If not,
=REGEXREPLACE(B2,"(-??\d+\.?\d*),(\s*-?\d+\.?\d*)","$1$2")
Capture group #1: (-??\d+\.?\d*)
-?? zero or one of literal dash followed by
\d+ one or more digits followed by
.? zero or one of literal .
\d* zero or more of digits
literal ,
Capture group #2 (\s*-?\d+\.?\d*)
\s* zero or more of space characters
-? zero or one of literal dash followed by
\d+ one or more digits followed by
.? zero or one of literal .
\d* zero or more of digits
Replace with capture groups only: $1$2
try:
=INDEX(REGEXREPLACE(QUERY(FLATTEN(SPLIT(A1, ",")&IF(ISODD(
SEQUENCE(1, COLUMNS(SPLIT(A1, ",")))),, ",")),,9^9), ",$", ))
for array:
=INDEX(IFERROR(BYROW(A1:A3, LAMBDA(x, REGEXREPLACE(QUERY(FLATTEN(SPLIT(x, ",")&
IF(ISODD(SEQUENCE(1, COLUMNS(SPLIT(x, ",")))),, ",")),,9^9), ",$", )))))
An idea to match , and capture any non-commas with an optional comma after:
=REGEXREPLACE(A1; ",([^,]*,?)"; "$1")
Replace with $1 what was captured by the first group - See this demo at regex101
I have to check if a string follows the following patterns:
Field1=value1
Field1=value1,Field2=value2
7645a=fds23,Field2=dsd$
The words 'field1', 'value1' don't count, the important thing is that it has to be something=something and if there is more than 1, it should be a comma for each pair.
I reached the following regex:
((\w+)[^=])=((\w+)[^=])
"Match any one or more word except if it has =, then there should be an = and then match any one or more word except if it has =".
The thing is, it does take the comma but I think is because of \w. I don't think this is correct.
I'm using https://regexr.com/ to check for the correct regular expression.
If you need to match symbols like $, then don't use \w. This satisfies all your conditions:
(?:([^,=\n]+)=([^,=\n]+))(?:,([^,=\n]+)=([^,=\n]+))*
Explanation:
(?: // Begin non-capturing group (first key=value pair)
( // Begin capturing group (key)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (key)
= // Equals
( // Begin capturing group (value)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (value)
) // End non-capturing group (first key=value pair)
(?: // Begin non-capturing group (additional key=value pairs)
, // Starts with comma (otherwise entire group fails)
( // Begin capturing group (key)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (key)
= // Equals
( // Begin capturing group (value)
[^,=\n]+ // Match one or more characters that aren't comma, equals, or new line
) // End capturing group (value)
) // End non-capturing group (additional key=value pairs)
* // Match 0 or more of the additional key value pairs
Test Here
I am trying to match all content between parentheses, including parentheses in a non-greedy way. There should be a space before and after the opening parentheses (or the start of a line before the opening parentheses) and a space before and after the closing parentheses. Take the following text:
( )
( This is a comment )
1 2 +
\ a
: square dup * ;
( foo bar
baz )
(quux)
( ( )
(
( )
The first line should be matched, the second line including its content should be matched, the second last line should not be matched (or raise an error) and the last line should be matched. The two lines foo bar baz should be matched, but (quux) should not as it doesn't contain a space before and after the parentheses. The line with the extra opening parentheses inside should be matched.
I tried a few conventional regexes for matching content between parentheses but without much success. The regex engine is that of Go's.
re := regexp.MustCompile(`(?s)\(( | .*? )\)`)
s = re.ReplaceAllString(s, "")
Playground: https://play.golang.org/p/t93tc_hWAG
Regular expressions "can't count" (that's over-simplified, but bear with me), so you can't match on an unbounded amount of parenthesis nesting. I guess you're mostly concerned about matching only a single level in this case, so you would need to use something like:
foo := regexp.MustCompile(`^ *\( ([^ ]| [^)]*? \)$`)
This does require the comment to be the very last thing on a line, so it may be better to add "match zero or more spaces" there. This does NOT match the string "( ( ) )" or try to cater for arbitrary nesting, as that's well outside the counting that regular expressions can do.
What they can do in terms of counting is "count a specific number of times", they can't "count how many blah, then make sure there's the same number of floobs" (that requires going from a regular expression to a context-free grammar).
Playground
Here is a way to match all the 3 lines in question:
(?m)^[\t\p{Zs}]*\([\pZs}\t](?:[^()\n]*[\pZs}\t])?\)[\pZs}\t]*$
See the Go regex demo at the new regex101.com
Details:
(?m) - multiline mode on
^ - due to the above, the start of a line
[\t\p{Zs}]* - 0+ horizontal whitespaces
\( - a (
[\pZs}\t] - exactly 1 horizontal whitespace
(?:[^()\n]*[\pZs}\t])? - an optional sequence matching:
[^()\n]* - a negated character class matching 0+ characters other than (, ) and a newline
[\pZs}\t] - horizontal whitespace
\) - a literal )
[\pZs}\t]* - 0+ horizontal whitespaces
$ - due to (?m), the end of a line.
Go playground demo:
package main
import (
"regexp"
"fmt"
)
func main() {
var re = regexp.MustCompile(`(?m)^[\t\p{Zs}]*\([\pZs}\t](?:[^()\n]*[\pZs}\t])?\)[\pZs}\t]*$`)
var str = ` ( )
( This is a comment )
1 2 +
\ a
: square dup * ;
( foo bar
baz )
(quux)
( ( )
(
( )`
for i, match := range re.FindAllString(str, -1) {
fmt.Println("'", match, "' (found at index", i, ")")
}
}