Regex get value between bracket and comma - regex

I have a few strings like these:
new google.maps.LatLng(52.80359, -4.7127),
new google.maps.LatLng(53.80645306, -5.45455287),
new google.maps.LatLng(51.8035914546, -4.7123622894287),
I need to get both the longitude and the latitude, so one regex match for each number; the - sign needs to be included where present.
I have tried a few online tools, but none seem to pick up on a decent pattern.

Simply use grep: grep -oE '[-0-9]+\.[0-9]+'
$ echo "new google.maps.LatLng(52.80359, -4.7127)," | grep -oE '[-0-9]+\.[0-9]+'
52.80359
-4.7127
$ echo "new google.maps.LatLng(53.80645306, -5.45455287)," | grep -oE '[-0-9]+\.[0-9]+'
53.80645306
-5.45455287
$ echo "new google.maps.LatLng(51.8035914546, -4.7123622894287)," | grep -oE '[-0-9]+\.[0-9]+'
51.8035914546
-4.7123622894287
Grep is the command-line tool for matching lines in files (or piped input) against a particular pattern. The -o tells grep to display only the part of the line that matches (by default grep displays the whole line that matches the given pattern). The -E tells grep to use extended regexps.
The pattern [-0-9] matches either a minus sign - or a digit. The following + means "repeat the previous item one or more times", i.e. in abc123xyz it matches 123, not just 1. The \. matches the decimal point; it has to be escaped with \ because a bare . matches any character in a regexp. Finally, [0-9]+ matches the digits after the decimal point.
See the reference for more information on regular expressions.
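As a rough sketch of how the two matches might be captured in a script: the variable names lat and lng are illustrative, and the pattern here is a slightly stricter variant (my assumption, not the pattern above) that allows at most one leading minus sign.

```shell
line='new google.maps.LatLng(51.8035914546, -4.7123622894287),'

# -o prints each match on its own line; -e is needed because the
# pattern itself starts with a dash and would otherwise look like an option.
coords=$(printf '%s\n' "$line" | grep -oE -e '-?[0-9]+\.[0-9]+')

# The first match is the latitude, the second is the longitude.
lat=$(printf '%s\n' "$coords" | sed -n '1p')
lng=$(printf '%s\n' "$coords" | sed -n '2p')

echo "lat=$lat lng=$lng"
```

This relies on the input line containing exactly two decimal numbers, which holds for the sample strings in the question.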

I would use this approach:
LatLng\((-*\d+\.*\d+),\s(-*\d+\.*\d+)\)
While it matches more than you probably need, it places the latitude in capture group 1 and the longitude in capture group 2, both excluding the surrounding parentheses and the comma.
See it in action here: http://regexr.com?32od6
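If you want the same capture-group idea from a shell script, a sketch with sed -E could look like the following. Note that \d is not available in POSIX ERE, so this version spells the digits out as [0-9] and makes the fractional part an optional nested group; both changes are my adaptation, and lat/lng are illustrative names.

```shell
line='new google.maps.LatLng(53.80645306, -5.45455287),'

# Group 1 is the latitude and group 3 the longitude
# (groups 2 and 4 are the optional fractional parts).
pair=$(printf '%s\n' "$line" |
  sed -nE 's/.*LatLng\((-?[0-9]+(\.[0-9]+)?), *(-?[0-9]+(\.[0-9]+)?)\).*/\1 \3/p')

lat=${pair% *}   # everything before the space
lng=${pair#* }   # everything after the space

echo "lat=$lat lng=$lng"
```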

In C#, use Regex.Match as follows:
using System.Text.RegularExpressions;
...
Match match = Regex.Match(input, @"([-]?\d+(?:[.]\d+)?)\D+?([-]?\d+(?:[.]\d+)?)");
if (match.Success)
{
string Lat = match.Groups[1].Value;
string Lng = match.Groups[2].Value;
}

Related

Convert regex positive look ahead to sed operation

I would like to sed to find and replace every occurrence of - with _ but only before the first occurrence of = on every line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found this regex will match properly.
\-(?=.*=)
However, the regex uses a positive lookahead, and it appears that sed (even with -E, -e or -r) does not know how to work with positive lookaheads.
I tried the following but keep getting Invalid preceding regular expression
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note, I do not want to use perl. However I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - setting a label named a
s/^\([^=]*\)-/\1_/ - matches zero or more chars other than = from the start of the string (capturing them into Group 1, \1), then a - char, and replaces the match with the Group 1 value (\1) followed by a _ (which replaces the found - char)
ta - jump to label a upon a successful replacement. Otherwise, stop.
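The one-liner with ; after the label works in GNU sed. Some other sed implementations (BSD sed, for instance) read everything after :a up to the end of the line as the label name, so a more portable sketch passes each command as its own -e expression. The function name fix_keys is only illustrative.

```shell
# Replace every "-" that appears before the first "=" with "_".
fix_keys() {
  sed -e ':a' -e 's/^\([^=]*\)-/\1_/' -e 'ta'
}

result=$(printf '%s\n' 'ke-y_0-1="foo"' 'key-05-five="craz-"' 'key_two="bar"' | fix_keys)
printf '%s\n' "$result"
```

Because [^=]* can never cross the first =, dashes inside the quoted value are left alone.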
You might also use awk setting the field separator to = and replace all - with _ for the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file

How can I replace anything before first forward slash using bash script?

Using GitHub workflow I have the following command
echo MY_DIR=$(echo "${GITHUB_REF#refs/heads/}" | tr '[:upper:]' '[:lower:]')
This would return a value like something/something-else/another
I am looking to add to this script to replace everything before the first forward slash with thisword
Which would output thisword/something-else/another
Can regex be used on the single line script to do this replace? I believe I could use the following regex /^[^/]+/ but unsure how to combine with the current script.
Depending on the version and distro of sed (apologies, but there are many with different syntax and flags), you might be able to do something like:
echo MY_DIR=$(echo "${GITHUB_REF#refs/heads/}" | tr '[:upper:]' '[:lower:]' | sed 's/^[a-z]*\//thisword\//' )
Sed finds and replaces text anchored at the beginning of the line (^): any number (*) of lowercase letters ([a-z]) followed by the first slash. Note that [a-z] matches letters only, so a first segment containing digits or hyphens will not match. The slashes in the pattern are escaped with the backslash character \ because / is also sed's delimiter. To clarify sed's use of /, here's the same expression without the regex and the escaped slashes that form part of your search string: sed 's/find/replace/'.
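A variant that may be easier to read uses a different delimiter (| here) so the slashes need no escaping, and [^/]* so that digits or hyphens in the first segment are also covered. The sample ref value below is made up for illustration.

```shell
# Hypothetical branch name; in the workflow this would be ${GITHUB_REF#refs/heads/}
ref='Feature-1/something-else/another'

# Lowercase the value, then replace everything up to and including
# the first "/" with "thisword/".
MY_DIR=$(printf '%s\n' "$ref" | tr '[:upper:]' '[:lower:]' | sed 's|^[^/]*/|thisword/|')

echo "MY_DIR=$MY_DIR"
```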
Try the below regex
^([a-z]*)(\/)
function formatData() {
var str = "something/something-else/another";
var res = str.replace(/^([a-z]*)!?(\/)/gim, "otherword/");
document.getElementById("demo").innerHTML = res;
}
Assuming MY_DIR holds something/something-else/another, you can use
MY_DIR="something/something-else/another"
MY_DIR="thisword/${MY_DIR#*/}"
echo "$MY_DIR"
See the online demo.
This is an example of shell parameter expansion, where # means "remove the shortest matching prefix", and the glob */ matches any text up to and including the first /.
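To make the difference between the shortest and longest prefix match concrete, here is a small sketch; the variable names are illustrative.

```shell
ref='something/something-else/another'

shortest=${ref#*/}     # removes "something/"
longest=${ref##*/}     # removes "something/something-else/"

# With no "/" in the value there is nothing to remove,
# so the expansion returns the value unchanged.
no_slash='main'
unchanged=${no_slash#*/}

echo "thisword/$shortest"
```

The last case matters for branches without a slash: prepending thisword/ would then produce thisword/main rather than replacing anything.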

extract substring with SED

I have the following strings, for example:
input1 = abc-def-ghi-jkl
input2 = mno-pqr-stu-vwy
I want to extract the first word between "-" characters.
For the first string I want to get: def
If the input is the second string, I want to get: pqr
I want to use sed. Could you help me, please?
Use
sed 's,^[^-]*-\([^-]*\).*,\1,' file
The string after the first - will be captured up to the second - and the rest will be matched, then the matched line will be replaced with the group text.
With bash:
var='input1 = abc-def-ghi-jkl'
var=${var#*-} # remove shortest prefix `*-`, this removes `input1 = abc-`
echo "${var%%-*}" # remove longest suffix `-*`, this removes `-ghi-jkl`
Or with awk:
awk -F'-' '{print $2}' <<<'input1 = abc-def-ghi-jkl'
Use - as input field separator and print the second field.
Or with cut:
cut -d'-' -f2 <<<'input1 = abc-def-ghi-jkl'
When you want to use sed, you can choose between solutions like
# Double processing
echo "$input1" | sed 's/[^-]*-//;s/-.*//'
# Normal approach
echo "$input1" | sed -r 's/^[^-]*-([^-]*).*/\1/'
# Funny alternative
echo "$input1" | sed -r 's/(^[^-]*-|-.*)//g'
The obvious "external" tool would be cut. You can also look at a Bash builtin solution like
[[ ${input1} =~ ([^-]*)-([^-]*) ]] && printf %s "${BASH_REMATCH[2]}"
grep solution (in my opinion this is the most natural approach, as you are only trying to find matches to a regular expression - you are not looking to edit anything, so there should be no need for the more advanced command sed)
grep -oP '^[^-]*-\K[^-]*(?=-)' << EOF
> abc-qrs-bobo-the-clown
> 123-45-6789
> blah-blah-blah
> no dashes here
> mahi-mahi
> EOF
Output
qrs
45
blah
Explanation
Look at the inputs first, included here for completeness as a heredoc (more likely you would name your file as the last argument to grep.) The solution requires at least two dashes to be present in the string; in particular, for mahi-mahi it will find no match. If you want to find the second mahi as a match, you can remove the lookahead assertion at the end of the regular expression (see below).
The regular expression does this. First note the command options: -o to return only the matched substring, not the entire line; and -P to use Perl extensions. Then, the regular expression: start from the beginning of the line (^); look for zero or more non-dash characters followed by dash, and then (\K) discard this part of the required match from the substrings found to match the pattern. Then look for zero or more non-dash characters again - this will be returned by the command. Finally, require a dash following this pattern, but do not include it in the match. This is done with a lookahead (marked by (?= ... )).
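Where grep has no -P option, a roughly equivalent sketch uses sed -n with the p flag, which prints only the lines where the substitution succeeded. Like the grep version, it requires at least two dashes. The function name second_field is illustrative.

```shell
# Print the text between the first and second dash,
# skipping lines that have fewer than two dashes.
second_field() {
  sed -n 's/^[^-]*-\([^-]*\)-.*/\1/p'
}

out=$(printf '%s\n' 'abc-qrs-bobo-the-clown' '123-45-6789' 'blah-blah-blah' \
  'no dashes here' 'mahi-mahi' | second_field)
printf '%s\n' "$out"
```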

Regex -> extracting fixed position occurrences from complex string

I have a string like this one below (nvram extract) that is used by tinc VPN to define the network hosts:
1<host1<host1.network.org<<0<10.10.10.0/24<<Ed25519PublicKey = 8dtRRgAaTbUNtPxW9U3nGn6U7uvfIPwRo1wnx7xMIUH<Subnet = 10.10.3.0/24>1<host2<host2.network.org<<0<10.10.9.0/24<<Ed25519PublicKey = irn48tqF2Em4rIG0ggBmpEfaVKtkl6DmGdSzTHMmVEI<>0<host3<host3.network.org<<0<10.10.11.0/24<<Ed25519PublicKey = wQt1sFwOsd1hnBaNGHq4JDyib22fOg1YqzOp0p08ZTD<>
I'm trying to extract from the above:
host1.network.org
host2.network.org
host3.network.org
The hostnames and keys are made up, but the structure of the input string is accurate. By the way, the end node could just as well be defined as an IP address, so I'm trying to extract what's between the second occurrence of "<" and the first occurrence of "<<". Since this is a multi-match, the occurrences are counted from either the beginning of the line or the ">" character. So the above could be read as follows:
1<host1<host1.network.org<<0<10.10.10.0/24<<Ed25519PublicKey = 8dtRRgAaTbUNtPxW9U3nGn6U7uvfIPwRo1wnx7xMIUH<Subnet = 10.10.3.0/24>
1<host2<host2.network.org<<0<10.10.9.0/24<<Ed25519PublicKey = irn48tqF2Em4rIG0ggBmpEfaVKtkl6DmGdSzTHMmVEI<>
0<host3<host3.network.org<<0<10.10.11.0/24<<Ed25519PublicKey = wQt1sFwOsd1hnBaNGHq4JDyib22fOg1YqzOp0p08ZTD<>
As I need this info in a shell script, I guess I would need to store each host/IP as an element of an array.
I have used regexp online editors, and managed to work out this string:
^[0|1]<.*?(\<(.*?)\<<)|>[0|1]<.*?(\<(.*?)\<)
However, if I run
grep -Eo '^[0|1]<.*?(\<(.*?)\<<)|>[0|1]<.*?(\<(.*?)\<)'
against the initial string, I get the full string in return, so I must be doing something wrong :-/
P.S. running on BusyBox:
`BusyBox v1.25.1 (2017-05-21 14:11:58 CEST) multi-call binary.
Usage: grep [-HhnlLoqvsriwFE] [-m N] [-A/B/C N] PATTERN/-e PATTERN.../-f FILE [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file`
Thanks!
OK, no response to my comment so I'll enter it as answer. How about
\w*[a-z]\w*(\.\w*[a-z]\w*)+
It matches at least two parts of a fully qualified name, separated by a dot.
grep -Eo '\w*[a-z]\w*(\.\w*[a-z]\w*)+'
yields
host1.network.org
host2.network.org
host3.network.org
(assuming your string is entered in stdin ;)
The regex you have is based on capturing groups and with grep you can only get full matches. Besides, you use -E (POSIX ERE flavor), while your regex is actually not POSIX ERE compatible as it contains lazy quantifiers that are not supported by this flavor.
I think you can extract all non-< chars between < and << followed with a digit and then a < with a PCRE regex (-P option):
s='1<host1<host1.network.org<<0<10.10.10.0/24<<Ed25519PublicKey = 8dtRRgAaTbUNtPxW9U3nGn6U7uvfIPwRo1wnx7xMIUH<Subnet = 10.10.3.0/24>1<host2<host2.network.org<<0<10.10.9.0/24<<Ed25519PublicKey = irn48tqF2Em4rIG0ggBmpEfaVKtkl6DmGdSzTHMmVEI<>0<host3<host3.network.org<<0<10.10.11.0/24<<Ed25519PublicKey = wQt1sFwOsd1hnBaNGHq4JDyib22fOg1YqzOp0p08ZTD<>'
echo $s | grep -oP '(?<=<)[^<]+(?=<<[0-9]<)'
See the regex demo and a grep demo.
Output:
host1.network.org
host2.network.org
host3.network.org
Here, (?<=<) is a positive lookbehind that only checks for the < presence immediately to the left of the current location but does not add < to the match value, [^<]+ matches 1+ chars other than < and (?=<<[0-9]<) (a positive lookahead) requires <<, then a digit, and then a < but again does not add these chars to the match.
If you have no PCRE option in grep, try replacing all the text you do not need with some char, and then either split with awk, or use grep:
echo $s | \
sed 's/[^<]*<[^<]*<\([^<][^<]*\)<<[0-9]<[^<]*<<[^<]*[<>]*/|\1/g' | \
grep -oE '[^|]+'
See another online demo.
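Another option that avoids regex flavors altogether, sketched here under the assumption that ">" only ever appears as the record terminator: turn each ">" into a newline, then let awk split each record on "<" and print the third field. The loop at the end is just one illustrative way to iterate over the results in a shell without arrays.

```shell
s='1<host1<host1.network.org<<0<10.10.10.0/24<<Ed25519PublicKey = 8dtRRgAaTbUNtPxW9U3nGn6U7uvfIPwRo1wnx7xMIUH<Subnet = 10.10.3.0/24>1<host2<host2.network.org<<0<10.10.9.0/24<<Ed25519PublicKey = irn48tqF2Em4rIG0ggBmpEfaVKtkl6DmGdSzTHMmVEI<>0<host3<host3.network.org<<0<10.10.11.0/24<<Ed25519PublicKey = wQt1sFwOsd1hnBaNGHq4JDyib22fOg1YqzOp0p08ZTD<>'

# One record per line, then field 3 of each "<"-separated record.
hosts=$(printf '%s' "$s" | tr '>' '\n' | awk -F'<' 'NF >= 3 { print $3 }')

# Hostnames and IPs contain no whitespace, so plain word splitting is safe here.
for h in $hosts; do
  echo "host: $h"
done
```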

Bash - Find and replace regex with another string

I have the following string libVersion = '1.23.45.6' and I need to replace 1.23.45.6 with 1.23.45.7.
Obviously the version could be any number with similar format (it does not have to be 4 numbers).
I tried to use the following but doesn't work
echo "libVersion = '1.23.45.6'" |sed "s/([0-9\.]+)/1.23.45.7/g"
Plain sed, i.e. sed without -E or -r, uses BRE (Basic Regular Expressions). In BRE, an unescaped + is a literal character; GNU sed accepts \+ to mean "repeat the previous token one or more times", and capturing groups are likewise written \(regex\). That is why your pattern never matches.
echo "libVersion = '1.23.45.6'" | sed "s/[0-9.]\+/1.23.45.7/"
You may also use a negated character class to replace everything inside the single quotes.
echo "libVersion = '1.23.45.6'" | sed "s/'[^']*'/'1.23.45.7'/"
Since the replacement should occur only one time, you don't need a g global modifier.
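If the new version comes from a variable, a sketch with extended regexps (-E) might look like this. The variable names are illustrative, and it assumes the new version string contains no / or & characters, which would need escaping in the replacement.

```shell
new_version='1.23.45.7'
line="libVersion = '1.23.45.6'"

# Match one or more dot-separated groups of digits, e.g. 1.2 or 1.23.45.6.
updated=$(printf '%s\n' "$line" | sed -E "s/[0-9]+(\.[0-9]+)+/$new_version/")

echo "$updated"
```

Requiring at least one dot keeps the pattern from matching a lone number elsewhere on the line.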