In regex capture group, exclude one word

I have URLs of this type:
https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245
I want to extract only the word between com and app:
https://example.com/en/app/893245 -> en
https://example.com/ru/app/wq23245 -> ru
https://example.com/app/8984245 ->
I tried to exclude app from the capture group, but the only way I know to do it is like this:
.*com\/((?!app).*)\/app
Is it possible to do something like this, but without capturing the word app? example\.com\/(\w+|?!app)\/
Rubular link: https://rubular.com/r/NnojSgQK7EuelE

If you need a plain regex you may use lookarounds:
/(?<=example\.com\/)\w+(?=\/app)/
Or, probably better in the context of a URL:
/(?<=example\.com\/)[^\/]+(?=\/app)/
See the Rubular demo.
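For comparison, a minimal Python sketch of the same lookaround idea, assuming Python's re module (which, like Ruby's engine, accepts a fixed-width lookbehind such as example\.com/). The helper name lang_segment is hypothetical.

```python
import re

# Lookbehind (?<=...) requires "example.com/" right before the match and
# lookahead (?=...) requires "/app" right after it; neither is consumed,
# so the whole match is just the segment and no capture group is needed.
PATTERN = re.compile(r"(?<=example\.com/)[^/]+(?=/app)")


def lang_segment(url):
    """Return the path segment between 'example.com/' and '/app', or None."""
    match = PATTERN.search(url)
    return match.group() if match else None


urls = [
    "https://example.com/en/app/893245",
    "https://example.com/ru/app/wq23245",
    "https://example.com/app/8984245",
]
print([lang_segment(u) for u in urls])  # ['en', 'ru', None]
```

The third URL yields None because the only candidate segment there is app itself, which is followed by /8984245 rather than /app.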
In Ruby, you may use String#[] with a regex and a capture group index:
strs = ['https://example.com/en/app/893245','https://example.com/ru/app/wq23245','https://example.com/app/8984245']
p strs.map { |s| s[/example\.com\/(\w+)\/app/, 1] }
# => ["en", "ru", nil]

You could use sed:
sed -n -f script.sed yourinput.txt
and inside script.sed:
s/.*com\/\(.*\)\/app.*/\1/p
Example input:
https://example.com/en/app/893245
https://example.com/ru/app/wq23245
https://example.com/app/8984245
Example output:
$ sed -n -f comapp.sed comapp.txt
en
ru
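A minimal Python sketch of what the sed script does, assuming Python's re module as a stand-in for sed (the helper extract is hypothetical). It also shows one property of the pattern worth knowing: because (.*) is greedy, a URL containing /app twice captures everything up to the last /app.

```python
import re

# Same pattern as script.sed, minus sed's backslash escaping: greedy ".*com/",
# a greedy capture "(.*)", then "/app" and the rest of the line.
SED_PATTERN = re.compile(r".*com/(.*)/app.*")


def extract(line):
    """Return what the sed script prints for this line, or None if it prints nothing."""
    match = SED_PATTERN.search(line)
    return match.group(1) if match else None


for url in [
    "https://example.com/en/app/893245",
    "https://example.com/app/8984245",
    "https://example.com/en/app/x/app/y",  # hypothetical URL with "/app" twice
]:
    print(url, "->", extract(url))
```

If only the first segment is wanted even in such URLs, [^/]* in place of (.*) keeps the capture inside a single path segment.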

Related

Replace Proxy Credentials with sed

I am trying to replace these samples:
http_proxy="http://QQ#127.0.0.1:8080/"
http://test1:test2#127.0.0.1:8080/
http_proxy="http://QQ#127.0.0.1:8080/
http://#127.0.0.1:8080/"
with this regex (^.+?\/\/).+?(#.*$)
to get them like this:
http_proxy="http://user:pass#127.0.0.1:8080/"
http://user:pass#127.0.0.1:8080/
http_proxy="http://user:pass#127.0.0.1:8080/"
http://user:pass#127.0.0.1:8080/
According to https://regex101.com/r/AE3Wxi/3, the regex seems to work for the first 3 lines.
But when I try it with
echo http_proxy=\"http://QQ#127.0.0.1:8080/\" | sed 's/\(^.+?\/\/\).+?\(#.*$\)/\1user:pass\2/g'
I get this output, with nothing replaced:
http_proxy="http://QQ#127.0.0.1:8080/"
In sed's basic regular expressions you have to escape the plus as \+ (a GNU extension), and sed does not support non-greedy quantifiers like .\+? at all.
If you also want a match for the last example, http://#127.0.0.1:8080/", the quantifier after the double forward slash should be * instead of +, so that an empty user part is allowed.
You could write the command as:
echo http_proxy=\"http://QQ#127.0.0.1:8080/\" | sed 's/\(^.\+\/\/\).*\(#.*$\)/\1user:pass\2/'
Output
http_proxy="http://user:pass#127.0.0.1:8080/"
If you only want to replace the first occurrence in the line, you might shorten it to the following (using ~ as the delimiter, so the slashes need no escaping):
echo http_proxy=\"http://QQ#127.0.0.1:8080/\" | sed 's~\(//\)[^#]*\(#\)~\1user:pass\2~'
See a regex demo and here for the captured group values.
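Engines that do support lazy quantifiers can use the question's original idea directly. A minimal Python sketch, assuming Python's re module; set_credentials is a hypothetical helper, and the samples keep the # separator exactly as the question shows it.

```python
import re

# Lazy "+?" works in Python; "*?" after "//" lets the user part be empty,
# which the last sample needs.
PROXY = re.compile(r"(^.+?//).*?(#.*$)")


def set_credentials(line, creds="user:pass"):
    """Replace the text between '//' and the first '#' with creds."""
    # A function replacement avoids any backslash handling in creds.
    return PROXY.sub(lambda m: m.group(1) + creds + m.group(2), line)


samples = [
    'http_proxy="http://QQ#127.0.0.1:8080/"',
    "http://test1:test2#127.0.0.1:8080/",
    'http_proxy="http://QQ#127.0.0.1:8080/',
    'http://#127.0.0.1:8080/"',
]
for s in samples:
    print(set_credentials(s))
```

With .+? instead of .*? in the middle, the last sample would not match, which is the same point the sed answer makes about * versus +.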

How can I replace anything before first forward slash using bash script?

Using GitHub workflow I have the following command
echo MY_DIR=$(echo "${GITHUB_REF#refs/heads/}" | tr '[:upper:]' '[:lower:]')
This would return a value like something/something-else/another.
I am looking to extend this script to replace everything before the first forward slash with thisword,
which would output thisword/something-else/another.
Can a regex be used in this single-line script to do the replacement? I believe I could use the regex /^[^/]+/, but I am unsure how to combine it with the current script.
Depending on the version and distro of sed (apologies, but there are many with different syntax and flags), you might be able to do something like:
echo MY_DIR=$(echo "${GITHUB_REF#refs/heads/}" | tr '[:upper:]' '[:lower:]' | sed 's/^[a-z]*\//thisword\//' )
sed finds and replaces text starting at the beginning of the line (^) that consists of any number (*) of lowercase letters ([a-z]) followed by the first slash. The slashes in the pattern are escaped with a backslash (\/) so they are not confused with the / delimiters of sed's own syntax, which without the regex is simply sed 's/find/replace/'. Note that [a-z]* only matches when the first segment consists solely of lowercase letters; [^/]* would also cover digits and hyphens.
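To check the logic of the whole pipeline, a minimal Python sketch of the same three steps, assuming Python 3.9+ for str.removeprefix; the function my_dir is hypothetical. It uses [^/]* rather than [a-z]* so that a first segment with digits or hyphens is also replaced.

```python
import re

# Everything up to and including the first "/", anchored at the start.
FIRST_SEGMENT = re.compile(r"^[^/]*/")


def my_dir(github_ref, word="thisword"):
    """Mimic the workflow line: strip refs/heads/, lowercase, swap the first segment."""
    branch = github_ref.removeprefix("refs/heads/")  # ${GITHUB_REF#refs/heads/}
    branch = branch.lower()                          # tr '[:upper:]' '[:lower:]'
    # count=1 is redundant with "^" but makes the single replacement explicit.
    return FIRST_SEGMENT.sub(lambda m: word + "/", branch, count=1)


print(my_dir("refs/heads/Something/Something-Else/Another"))
# thisword/something-else/another
```

A branch name without any slash, such as main, passes through unchanged because the pattern requires a /.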
Try the regex below:
^([a-z]*)(\/)
For example, in JavaScript:
function formatData() {
  var str = "something/something-else/another";
  // Replace the leading lowercase segment and its slash.
  var res = str.replace(/^([a-z]*)(\/)/, "thisword/");
  document.getElementById("demo").innerHTML = res;
}
Assuming MY_DIR holds something/something-else/another, you can use
MY_DIR="something/something-else/another"
MY_DIR="thisword/${MY_DIR#*/}"
echo "$MY_DIR"
See the online demo.
This is an example of shell parameter expansion: ${MY_DIR#*/} removes the shortest prefix of the value that matches the glob */, and since * matches any text, the shortest such prefix is everything up to and including the first /.
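A minimal Python sketch of what # does here, with ## (longest matching prefix) for contrast; both helper names are hypothetical, and the no-slash case mirrors bash leaving the value unchanged when the pattern does not match.

```python
def strip_shortest_prefix_through_slash(value):
    """Analogue of ${value#*/}: drop the shortest prefix matching */."""
    head, sep, tail = value.partition("/")   # split at the FIRST "/"
    return tail if sep else value            # no "/" -> pattern fails, keep value


def strip_longest_prefix_through_slash(value):
    """Analogue of ${value##*/}: drop the longest prefix matching */."""
    head, sep, tail = value.rpartition("/")  # split at the LAST "/"
    return tail if sep else value


my_dir = "something/something-else/another"
print("thisword/" + strip_shortest_prefix_through_slash(my_dir))
# thisword/something-else/another
print(strip_longest_prefix_through_slash(my_dir))
# another
```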

Regex to extract content from each line of a log file output from '_m' to the end of the line

Format of log line:
Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx.
I want to extract from '_m' to the end of the line, removing the '_' before the 'm'.
New to regex...
Thanks!
If your tool/language supports look-behind, this works: it matches from the first _m to the end of the line and leaves out the leading _.
(?<=_)m.*
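Test with grep (-P, available in GNU grep, enables Perl-compatible regular expressions, which support look-behind):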
kent$ echo "Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx."|grep -Po '(?<=_)m.*'
mxxxxxmmxx [XXX xxxx.
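With sed (the greedy ^.* makes this start at the last _m on the line; the sample line has only one, so the result is the same):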
sed -n 's/^.*_\(m.*$\)/\1/p' file
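The difference between the look-behind answer and the sed answer only shows up when a line contains _m more than once. A minimal Python sketch, assuming Python's re module; the helper names and the a_m1_m2 test string are hypothetical.

```python
import re

LOG_LINE = "Xxx x xx:xx:xx xmmxxx XXXXXX: XXXXXXX:XXX: xxx_Mxxx_Xxxxxx_mxxxxxmmxx [XXX xxxx."


def from_first_m(line):
    """Like grep -Po '(?<=_)m.*': leftmost 'm' preceded by '_', to end of line."""
    match = re.search(r"(?<=_)m.*", line)
    return match.group() if match else None


def from_last_m(line):
    """Like the sed answer: the greedy ^.* pushes the match to the LAST '_m'."""
    match = re.search(r"^.*_(m.*)$", line)
    return match.group(1) if match else None


print(from_first_m(LOG_LINE))  # mxxxxxmmxx [XXX xxxx.
print(from_first_m("a_m1_m2"), from_last_m("a_m1_m2"))  # m1_m2 m2
```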
It is quite easy. This example is written in C#, but the regex is quite general and will probably work anywhere:
Regex regex = new Regex(@"_(m.*)"); // If you look for _M, the regex should be @"_(M.*)"
Match match = regex.Match(logLine);
if (match.Success)
Console.WriteLine(match.Groups[1].Value);
Hope this will help you on your quest.

Regex get value between bracket and comma

I have a few strings like these:
new google.maps.LatLng(52.80359, -4.7127),
new google.maps.LatLng(53.80645306, -5.45455287),
new google.maps.LatLng(51.8035914546, -4.7123622894287),
I need to get both the latitude and the longitude, ideally with one regex for each number, and the - sign needs to be included where present.
I have tried a few online tools, but none seem to pick up on a decent pattern.
Simply use grep: grep -oE '[-0-9]+\.[0-9]+'
$ echo "new google.maps.LatLng(52.80359, -4.7127)," | grep -oE '[-0-9]+\.[0-9]+'
52.80359
-4.7127
$ echo "new google.maps.LatLng(53.80645306, -5.45455287)," | grep -oE '[-0-9]+\.[0-9]+'
53.80645306
-5.45455287
$ echo "new google.maps.LatLng(51.8035914546, -4.7123622894287)," | grep -oE '[-0-9]+\.[0-9]+'
51.8035914546
-4.7123622894287
Grep is the command-line tool for matching lines in files (or standard input) against a pattern. The -o option tells grep to print only the part of the line that matches (by default grep prints the whole matching line), and -E tells grep to use extended regular expressions.
In the pattern, [-0-9] matches either a minus sign - or a digit, and the following + repeats the previous item one or more times (in abc123xyz it matches 123, not just 1). \. matches the decimal point; it has to be escaped with \ because a bare . matches any character in a regex. Finally, [0-9]+ matches the digits after the decimal point.
See the reference for more information on regular expressions.
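The same extraction as a minimal Python sketch, assuming Python's re module, where findall plays the role of grep -o. The character class also accepts a stray hyphen in the middle of a number (for example 1-2.3), which does not matter for this input.

```python
import re

# Same pattern as the grep answer: minus signs/digits, a literal dot, digits.
NUMBER = re.compile(r"[-0-9]+\.[0-9]+")

lines = [
    "new google.maps.LatLng(52.80359, -4.7127),",
    "new google.maps.LatLng(53.80645306, -5.45455287),",
    "new google.maps.LatLng(51.8035914546, -4.7123622894287),",
]
for line in lines:
    print(NUMBER.findall(line))
# ['52.80359', '-4.7127']
# ['53.80645306', '-5.45455287']
# ['51.8035914546', '-4.7123622894287']
```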
I would use this approach:
LatLng\((-*\d+\.*\d+),\s(-*\d+\.*\d+)\)
While it matches more than you probably need, it places the latitude in capture group 1 and the longitude in capture group 2, both excluding the surrounding parentheses and the comma.
See it in action here: http://regexr.com?32od6
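A minimal Python sketch of the capture-group approach, assuming Python's re module. The pattern is a tightened variant (my choice, not the answer's exact regex): -? allows at most one minus, (?:\.\d+)? makes the fractional part optional, and \s* tolerates any spacing after the comma. parse_latlng is a hypothetical helper.

```python
import re

LATLNG = re.compile(r"LatLng\((-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\)")


def parse_latlng(line):
    """Return (lat, lng) as floats, or None if the line has no LatLng(...)."""
    match = LATLNG.search(line)
    if match is None:
        return None
    # Group 1 is the latitude, group 2 the longitude.
    return float(match.group(1)), float(match.group(2))


print(parse_latlng("new google.maps.LatLng(52.80359, -4.7127),"))
# (52.80359, -4.7127)
```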
In C#, use Regex.Match as follows:
using System.Text.RegularExpressions;
...
Match match = Regex.Match(input, @"([-]?\d+(?:[.]\d+)?)\D+?([-]?\d+(?:[.]\d+)?)");
if (match.Success)
{
string Lat = match.Groups[1].Value;
string Lng = match.Groups[2].Value;
}

Why does this bash/sed call work?

I've been looking at examples of using sed to extract a substring using regex and I have a test script working. Problem is I don't understand why and would like to. Here's the script:
#!/bin/bash
string=" ID : s0016b54e23bc.ab.cd.efghig\
Name : cd167095"
echo -e "string: '$string'"
name=`echo $string | sed 's/.*\(cd.*\)/\1/'`
echo -e "\nExtracted: $name"
And it outputs:
string: ' ID : s0016b54e23bc.ab.cd.efghigName : cd167095'
Extracted: cd167095
The regex should have two matches:
cd.efghigName : cd167095
and
cd167095
Why is the second match returned?
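Because .* is "greedy".
sed's s command (without g) replaces only the first, leftmost match, and here that match starts at the beginning of the line. Within it, the first .* matches as much as possible while still letting the expression as a whole succeed, so it consumes everything up to the last cd, and \1 is only what remains.
To see this, change the second cd to ef or something, and you will see the script return the first.
Now, if you use something like Ruby, Python, or Perl, you get more elaborate regular expressions, and you can use .*?, which is the "non-greedy" form of .*.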
#!/usr/bin/env ruby
string=" ID : s0016b54e23bc.ab.cd.efghig\
Name : cd167095"
puts string.gsub /.*?(cd.*)/, '\1'
so ross$ ./qq3
cd.efghigName : cd167095
Though really, I would just write:
string[/cd.*/]
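For readers who want to try the same comparison outside Ruby, a minimal Python sketch, assuming Python's re module (which also supports the lazy .*?). The variable names are hypothetical; s is the value the bash script builds, since bash removes a backslash-newline inside double quotes.

```python
import re

s = " ID : s0016b54e23bc.ab.cd.efghigName : cd167095"

greedy = re.sub(r".*(cd.*)", r"\1", s)  # like the sed call: .* runs to the last cd
lazy = re.sub(r".*?(cd.*)", r"\1", s)   # like the Ruby gsub: .*? stops at the first cd
direct = re.search(r"cd.*", s).group()  # like string[/cd.*/]: leftmost cd, no prefix

print(greedy)  # cd167095
print(lazy)    # cd.efghigName : cd167095
print(direct)  # cd.efghigName : cd167095
```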