Can we use a regex to concatenate chars at different indexes?

Say, for example, the input string is:
D8DB2F1F0F21R123
We need to extract the chars at indexes 4, 8, 6 and 1 (assuming indexing starts at 0), i.e. '2', '0', '1', '8', and concatenate them.
Final output should be: 2018
Can we achieve this result with just a regex?

Using bash:
$ x=D8DB2F1F0F21R123
$ echo ${x:4:1}${x:8:1}${x:6:1}${x:1:1}
2018
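A single regex match returns text in the order it appears in the string, so a match on its own cannot produce 2018. A regex substitution can, though: capture the four characters left to right and reorder them in the replacement template. A minimal Python sketch under that assumption, with plain indexing (the bash approach) shown alongside. The variable names are illustrative only.

```python
import re

s = "D8DB2F1F0F21R123"

# Direct indexing, the same idea as the bash ${x:i:1} expansions.
by_index = "".join(s[i] for i in (4, 8, 6, 1))

# Regex substitution: the groups capture indexes 1, 4, 6 and 8 in string order
# (group 1 = index 1, group 2 = index 4, group 3 = index 6, group 4 = index 8),
# and the replacement template reorders them as 4, 8, 6, 1.
by_regex = re.sub(r"^.(.).{2}(.).(.).(.).*$", r"\2\4\3\1", s)

print(by_index)  # 2018
print(by_regex)  # 2018
```

The regex version only works because the indexes are fixed; the pattern hard-codes how many characters to skip between captures.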

Replace or remove period/dot from expression

I have this PCRE expression and I want to remove the . (period) from klantnummer.
Expression:
^h:\/Klant\/(?<klantnummer>[^\/]+)\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
Input:
h:/Klant/12345678.9/map 1/map 2
Outcome: 12345678.9
Desired result: 123456789
https://regex101.com/r/EVv47V/1
So klantnummer should contain 123456789.
You can't do that in one step. You could capture it in two capture groups:
^h:\/Klant\/(?<klantnummer1>[^\.\/]+)\.(?<klantnummer2>[^\/]+)\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
and join them afterwards by string concatenation, or use two regex steps and filter out the period in the second, as mentioned in the comments.
The regex above assumes there is always a period; this one works for zero or one period in the number:
h:\/Klant\/(?<klantnummer1>[^\.\/]+)(?:\.?(?<klantnummer2>[^\.\/]+))\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
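A minimal Python sketch of the "two groups, then concatenate" idea, using the zero-or-one-period regex above translated to Python's (?P<name>...) named-group syntax (Python's re spells named groups that way). The function name join_klantnummer is hypothetical. Because the second group must always match at least one character, both groups are always set, so plain concatenation needs no special case.

```python
import re

# The zero-or-one-period regex, rewritten with Python's (?P<name>...) syntax.
PATTERN = re.compile(
    r"h:/Klant/(?P<klantnummer1>[^./]+)(?:\.?(?P<klantnummer2>[^./]+))"
    r"/(?P<folder1>[^/]+)/(?P<folder2>[^/]+)"
)

def join_klantnummer(path):
    """Return klantnummer with the period dropped, or None if the path doesn't match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # Both groups always participate in a match, so concatenation is enough.
    return m.group("klantnummer1") + m.group("klantnummer2")

print(join_klantnummer("h:/Klant/12345678.9/map 1/map 2"))  # 123456789
print(join_klantnummer("h:/Klant/123456789/map 1/map 2"))   # 123456789
```

Without a period, the regex backtracks so that klantnummer2 takes the last character, which is why the second call still returns the full number.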
As already discussed, you can't do this in one step.
Both solutions, using two regex stages or splitting klantnummer into two groups before and after the period, will work.
However, I believe the simplest and most efficient approach, both in computing power and in code to write, is to replace . with an empty string '' after the regex has matched and before you use the result.
You haven't said which programming language you are using, so I can't give you the exact syntax or an example.
If all you are doing is splitting the string on the slashes, you will probably find it easier to split it into an array.
For example, in Python:
s = "h:/Klant/12345678.9/map 1/map 2"
array = s.split('/')
Klantnummer=array[2].replace('.','')
folder1=array[3]
folder2=array[4]
print(Klantnummer)
print(folder1)
print(folder2)
Output:
123456789
map 1
map 2
Tested on https://www.online-python.com/
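For the "match first, then replace the period" approach, here is a minimal Python sketch, assuming Python is the host language. The pattern is the one from the question, with PCRE's (?<name>...) rewritten as Python's (?P<name>...) and the unneeded \/ escapes dropped.

```python
import re

# The question's pattern, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"^h:/Klant/(?P<klantnummer>[^/]+)/(?P<folder1>[^/]+)/(?P<folder2>[^/]+)")

m = PATTERN.match("h:/Klant/12345678.9/map 1/map 2")
if m is not None:
    # Match first, then strip the period from the captured value.
    klantnummer = m.group("klantnummer").replace(".", "")
    print(klantnummer)  # 123456789
```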

How to use zgrep to display all words of length x from a wordlist?

I want to display all the words in my wordlist that start with a w and are 9 letters long. Yesterday I learnt a bit more about how to use zgrep, so I came up with:
zgrep '\(^w\)\(^.........$\)' a.gz
But this doesn't work, and I think it's because I don't know how to AND the two conditions. I found that it should be (?=expr)(?=expr), but I can't figure out how to build my command from that.
So how can I build my command using (?=expr)?
For example, if I have a wordlist like this:
Washington
Sausage
Walalalalalaaaa --> shouldn't match
Wwwwwwwww --> should match
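You may use:
zgrep '^w[[:alpha:]]\{8\}$' a.gz
This POSIX BRE pattern matches a string that:
^w - starts with w
[[:alpha:]]\{8\} - then has eight more letters
$ - and is followed by the end of the string.
Matching is case-sensitive; add -i (zgrep -i '^w[[:alpha:]]\{8\}$' a.gz) to also match words that start with an uppercase W, such as Wwwwwwwww in the sample list.
Also, see section 9.3, Basic Regular Expressions, of the POSIX regular expression definitions.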
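POSIX BREs have no lookaheads, so the (?=expr)(?=expr) idea from the question cannot be used with a plain zgrep BRE. To show what that AND would look like, here is a sketch in Python's re (which does support lookaheads), with gzip compressing the sample list in memory as a stand-in for a.gz. [^\W\d_] is a Python idiom for "one letter", used here in place of [[:alpha:]], which Python's re does not support.

```python
import gzip
import re

# AND of two conditions via lookaheads, as the question asked about:
# (?=w) means "starts with w", (?=[^\W\d_]{9}$) means "exactly nine letters".
# IGNORECASE lets words starting with an uppercase W match too.
AND_LOOKAHEADS = re.compile(r"^(?=w)(?=[^\W\d_]{9}$)", re.IGNORECASE)

# Equivalent single pattern, mirroring the BRE ^w[[:alpha:]]\{8\}$.
PLAIN = re.compile(r"^w[^\W\d_]{8}$", re.IGNORECASE)

# Stand-in for a.gz: the sample wordlist from the question, gzip-compressed in memory.
data = gzip.compress("Washington\nSausage\nWalalalalalaaaa\nWwwwwwwww\n".encode("utf-8"))
words = gzip.decompress(data).decode("utf-8").splitlines()

via_lookaheads = [w for w in words if AND_LOOKAHEADS.match(w)]
via_plain = [w for w in words if PLAIN.match(w)]
print(via_lookaheads)  # ['Wwwwwwwww']
print(via_plain)       # ['Wwwwwwwww']
```

Both patterns select the same word; the lookahead form only pays off when the conditions cannot be written as one left-to-right sequence.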

bash regular expression that will match YYMMDD but not longer numbers

The general problem
I am trying to understand how to prevent some pattern from appearing before or after the pattern I'm searching for when writing regexes!
A more specific example
I'm looking for a regex that will match dates in the format YYMMDD ((([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1]))) inside a long string while ignoring longer numeric sequences.
It should be able to match:
text151124moretext
123text151124moretext
text151124
text151124moretext1944
151124
but should ignore:
text15112412moretext
(reason: it has 8 numbers instead of 6)
151324
(reason: it is not a valid date YYMMDD - there is no 13th month)
How can I make sure, within one single regex (I would rather avoid preprocessing the string), that a number with more than these 6 digits won't be picked up as a date?
I've thought of \D((19|20)([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1]))\D, but doesn't this mean there has to be some character before and after?
I'm using bash 3.2 (ERE).
Thanks!
Try:
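#!/usr/bin/env bash
# Print the YYMMDD date found in $1 if it is a valid date; otherwise return 1.
extract_date() {
  local string="$1"
  local _date
  # If the string has a run of exactly 6 digits with a non-digit on each side, sed
  # reduces it to those digits; otherwise sed leaves the string unchanged and the
  # date check below rejects it (a bare 6-digit string passes through as-is).
  _date=$(echo "$string" | sed -E 's/.*[^0-9]([0-9]{6})[^0-9].*/\1/')
  # On Linux, use instead: if date -d "$_date" &> /dev/null; then
  if date -jf '%y%m%d' "$_date" &> /dev/null; then   # macOS (BSD date)
    echo "$_date"
  else
    return 1
  fi
}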
extract_date text15111224moretext # ignore n_digits > 6
extract_date text151125moretext # take
extract_date text151132 # ignore day 32
extract_date text151324moretext1944 # ignore month 13
extract_date text150931moretext1944 # ignore 31 Sept
extract_date 151126 # take
Output:
151125
151126
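The same "extract with a regex, then let a date parser validate" idea can be sketched in Python, assuming Python is acceptable for testing the logic outside bash. Here datetime.strptime plays the role of date -jf; it rejects month 13 and impossible days such as 31 September. The fallback to the whole string mirrors what sed does when its pattern does not match.

```python
import re
from datetime import datetime

# Same pattern as the sed command: a run of exactly 6 digits with a non-digit on each side.
SIX_DIGITS = re.compile(r".*[^0-9]([0-9]{6})[^0-9].*")

def extract_date(s):
    m = SIX_DIGITS.fullmatch(s)
    # Like sed, fall back to the whole string when the pattern doesn't match.
    candidate = m.group(1) if m else s
    try:
        # strptime rejects month 13 and impossible days such as 31 September.
        datetime.strptime(candidate, "%y%m%d")
    except ValueError:
        return None
    return candidate

tests = ["text15111224moretext", "text151125moretext", "text151132",
         "text151324moretext1944", "text150931moretext1944", "151126"]
print([d for d in map(extract_date, tests) if d is not None])  # ['151125', '151126']
```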
If your tokens are line-separated (i.e. there is only one token per line):
^[\D]*[\d]{6}([\D]*|[\D]+[\d]{1,6})$
Basically, this regex looks for:
Any number of non-digits at the beginning of the string;
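Exactly 6 digits;
Then either any number of non-digits up to the end, OR at least one non-digit followed by 1 to 6 digits at the end of the string.
This regex rejects the 8-digit case, but it only counts digits: it still accepts 151324 (there is no 13th month), and it rejects 123text151124moretext because of the leading digits. Note also that \d and \D are PCRE-style escapes that POSIX ERE does not define.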
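You could use non-capturing groups to require a non-digit on either side of your date regex. This expression worked on the test data where the date has a character on both sides; like your \D...\D attempt, though, it does not match a date at the very start or end of the string (151124, text151124):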
(?:\D)([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])(?:\D)
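One way around the "character on each side" problem is to accept either a non-digit or the edge of the string: (^|[^0-9]) before the date and ([^0-9]|$) after it. That form uses only ERE syntax, so it should also work as a bash [[ $s =~ $re ]] pattern with the date in BASH_REMATCH[2], though this is untested on bash 3.2. Lookarounds ((?<![0-9]) and (?![0-9])) do the same without consuming characters, but POSIX ERE does not support them. A Python sketch checking both forms against the question's samples (find_date is a hypothetical helper):

```python
import re

# (^|[^0-9]) and ([^0-9]|$) accept a non-digit OR the edge of the string,
# so no extra character is required before or after the date. ERE syntax only.
ERE_STYLE = re.compile(
    r"(^|[^0-9])([0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01]))([^0-9]|$)"
)

# Lookaround form: asserts "no digit before/after" without consuming characters.
# Python/PCRE only; POSIX ERE has no lookarounds.
LOOKAROUND = re.compile(
    r"(?<![0-9])[0-9]{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12][0-9]|3[01])(?![0-9])"
)

def find_date(s, use_lookaround=False):
    """Return the first YYMMDD date in s, or None."""
    if use_lookaround:
        m = LOOKAROUND.search(s)
        return m.group(0) if m else None
    m = ERE_STYLE.search(s)
    return m.group(2) if m else None  # group 2 is the date itself

should_match = ["text151124moretext", "123text151124moretext", "text151124",
                "text151124moretext1944", "151124"]
should_ignore = ["text15112412moretext", "151324"]

for s in should_match + should_ignore:
    print(s, find_date(s), find_date(s, use_lookaround=True))
```

The edge-or-non-digit form consumes the neighbouring character, so two dates separated by a single non-digit can't both be found by repeated searches; the lookaround form has no such limitation.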

Regular expression grouping with parentheses in a grep command

I'm using a regular expression with the grep command. For example, say I have a file called regular.txt that contains dates like this:
$ cat regular.txt
july
jul
Fourth
4th
4
I am trying to match this text from the input file using the methods below:
Method 1: Match only Fourth|4th|4
$ egrep '(Fourth|4th|4)' regular.txt
Output of method 1:
Fourth
4th
4
Method 2: Match only Fourth|4th|4 using an optional group
$ egrep '(Fourth|4(th)?)' regular.txt
Output of method 2:
Fourth
4th
4
Method 3: Match every line of the file (july, jul, Fourth, 4th, 4). I am using this command:
$ egrep 'july? (Fourth|4(th)?)' regular.txt
Output of method 3: nothing matches. How can I do this?
Could you please help me with this?
Thanks,
Your july? (Fourth|4(th)?) regex matches a sequence of patterns: jul, followed by an optional y, then a space, and then one of two alternatives: Fourth, or 4 optionally followed by the substring th.
If you want to match jul or july as a third alternative, add it to the alternation:
'Fourth|4(th)?|july?'
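A quick way to see the difference is to run both patterns over the file's five lines. This Python sketch uses re.search, which, like grep, looks for a match anywhere in the line; alternation with | and optional groups behave the same way here as in egrep's ERE.

```python
import re

lines = ["july", "jul", "Fourth", "4th", "4"]  # contents of regular.txt

# Method 3 from the question: "jul", optional "y", a space, then Fourth/4th/4.
# None of the lines contains a space, so nothing matches.
sequence = re.compile(r"july? (Fourth|4(th)?)")

# The fix from the answer: july? becomes a third alternative, not a prefix.
alternation = re.compile(r"Fourth|4(th)?|july?")

seq_hits = [line for line in lines if sequence.search(line)]
alt_hits = [line for line in lines if alternation.search(line)]
print(seq_hits)  # []
print(alt_hits)  # ['july', 'jul', 'Fourth', '4th', '4']
```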

Adding a space after every 4th digit in Oracle 11g

I am trying to insert a space after every 4th digit (not every 4th character). This is what I came up with:
newStudentNumber := regexp_replace(newStudentNumber, '[[:digit:]](....)', '\1 ');
dbms_output.put_line(newStudentNumber);
Result:
NL 2345 7894 TUE
What I actually want:
NL 1234 5678 944 TUE
My code seems to replace a digit with a space instead of inserting a space, as in the desired result above.
Can anyone explain this to me?
Thanks in advance
You can use the following regex:
([[:digit:]]{4})
and replace with what you are using now: \1 followed by a space ('\1 ').
Why isn't yours working?
Your regex matches a digit and captures the next 4 characters (not only digits). So when you do the replacement, the digit that is matched but not captured is replaced as well; it is not that the regex is unable to insert.
Explanation for input = NL 12345678944 TUE and regex = [[:digit:]](....):
NL 12345678944 TUE (it matches the digit "1" and captures "2345")
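To see both behaviours side by side outside the database, here is a Python sketch using re.sub. Two assumptions: [[:digit:]] is swapped for [0-9] because Python's re does not support POSIX bracket classes, and re.sub replaces every occurrence, as Oracle's REGEXP_REPLACE does by default.

```python
import re

s = "NL 12345678944 TUE"

# The question's pattern: one digit is matched but not captured, so it is
# dropped from the output; the last match even swallows " TUE".
broken = re.sub(r"[0-9](....)", r"\1 ", s)

# The answer's pattern: capture four digits and put them back followed by a space.
fixed = re.sub(r"([0-9]{4})", r"\1 ", s)

print(repr(broken))  # 'NL 2345 7894  TUE '
print(repr(fixed))   # 'NL 1234 5678 944 TUE'
```

The third match in the broken version consumes the digit 4 and captures " TUE", which is why the output has a doubled space and a trailing space.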