Repeat expression until it has reached the end of parenthesis? - regex

I have been working in regex101 and experimenting with expressions, but I cannot figure out how to repeat an expression until a certain character (a closing parenthesis, ')') is reached.
I have tried [^)]* and it works but it does not repeat the expression.
This is my expression so far:
CREATE\s+TABLE\s+(\w*)(\s+(\w*)\s+(\w*)[,][^)]*
I am trying to get the (\s+(\w*)\s+(\w*)[,] part to repeat until the end of a parenthesis is reached.
My test string is:
CREATE TABLE tableName(columnName, type, primary, columnName2, type2)
The results I am getting were shown in a screenshot, which is not reproduced here.
Expected results should include column2 and type2 as their own individual group.
Any help is welcomed as well as any solutions I have not tried yet.

Not completely sure what exactly you want, but this example
echo 'CREATE TABLE tableName(columnName, type, primary, columnName2, type2)' \
| perl -nle '/CREATE\s+TABLE\s+(\w*)\s*\((\s*(\s*\w+ *,)+(\s*\w+)?)/ && print "$1\n$2"'
or this slightly more readable version with the /x modifier
echo 'CREATE TABLE tableName(columnName, type, primary, columnName2, type2)' \
| perl -nle '/CREATE\s+TABLE\s+(\w*) \s* \( (\s*(\s*\w+\s*,)+ (\s*\w+)? )/x && print "$1\n$2"'
give this output
tableName
columnName, type, primary, columnName2, type2
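In Perl and PCRE-style engines, a repeated capture group keeps only its last iteration, so one pattern cannot hand back every column as its own group. A common workaround, sketched here in bash on the assumption that separate items are what the asker ultimately wants, is to capture the whole parenthesised list once and then split it on commas. The function name `list_items` is hypothetical.

```bash
#!/usr/bin/env bash
# list_items is a hypothetical helper, not part of the answer above.
# It prints the table name, then each comma-separated item on its own line.
list_items() {
    local stmt=$1
    local re='CREATE[[:space:]]+TABLE[[:space:]]+([[:alnum:]_]+)[[:space:]]*\(([^)]*)\)'
    [[ $stmt =~ $re ]] || return 1

    printf '%s\n' "${BASH_REMATCH[1]}"

    # Split the captured list on commas, then trim surrounding whitespace.
    local -a items
    IFS=',' read -r -a items <<< "${BASH_REMATCH[2]}"
    local item
    for item in "${items[@]}"; do
        item="${item#"${item%%[![:space:]]*}"}"   # strip leading whitespace
        item="${item%"${item##*[![:space:]]}"}"   # strip trailing whitespace
        printf '%s\n' "$item"
    done
}

list_items 'CREATE TABLE tableName(columnName, type, primary, columnName2, type2)'
```

For the test string this prints tableName followed by columnName, type, primary, columnName2 and type2, one per line.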

Related

How to append after Regex Search in Perl script

I'm trying to convert a MySQL dump into SQLite database, for database migration. I need to edit the date to append time, so for example 2018-09-19 should be converted to 2018-09-19 00:00:00.00. The reason for this format has to do with how our application works. This is the solution I came up with but it doesn't work.
#!/usr/bin/perl
while (<>){
<Other Stuff>
....
s/([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))\[/$1[00:00:00.00]][/
print;
}
For testing, I created a test.txt file containing just:
2019-03-06
And in command line or terminal I used the following command to test if the append works.
perl -pe 's/([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))\[/$1[00:00:00.00]][/' < test.txt > testout.txt
This gives a clear error of:
syntax error at -e line 1, near "00:" Execution of -e aborted due to compilation errors.
Using @dada's solution gives no error, but it also doesn't append 00:00:00.00 at the end of the line.
The Expected output should be
2019-03-06 00:00:00.00
Your problem statement says you want to turn:
2018-09-19
into:
2018-09-19 00:00:00.00
However, your code is:
s/([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))\[/$1[00:00:00.00]][/
Using /x we can write it a bit more legibly as:
s/
(
[12]\d{3} # year
- # hyphen
( 0[1-9] | 1[0-2] ) # month (saved as $2)
- # hyphen
( 0[1-9] | [12]\d | 3[01] ) # day (saved as $3)
) # save this as $1
\[ # square bracket
/$1[00:00:00.00]][/x
From this, it is clear that 2018-09-19 does not match because it does not end with a square bracket.
The replacement value is:
$1[00:00:00.00]][
This (tries to) say:
lookup index 00:00:00.00 in array #1 and substitute value
append ][
However this is not valid perl and not what you wanted anyway.
What is happening is that instead of $x + [y] (scalar followed by string value), perl is seeing $x[y] (value from array). To prevent this, either use braces ( ${x}[y] ) or escape the bracket ( $x\[y] ). This results in:
${1}[00:00:00.00]][
which is still not what the problem said was needed as the zeros are wrapped in brackets.
To get what you say you want, remove the \[ from the end of the search part and remove the unnecessary brackets from the replacement part:
s/
(
[12]\d{3}
- ( 0[1-9] | 1[0-2] )
- ( 0[1-9] | [12]\d | 3[01] )
)
# no bracket here
/$1 00:00:00.00/x; # no brackets here
Note that your code as given has another bug: the final print needs to be separated from the s/// by a semicolon.

Using sed (or any other tool) to remove the quotes in a json file

I have a json file
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
I want to change it to
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
i.e. convert the requestId from String to Integer. I have tried some ways, but none seem to work :
sed -e 's/"requestId":"[0-9]"/"requestId":$1/g' test.json
sed -e 's/"requestId":"\([0-9]\)"/"requestId":444/g' test.json
Could someone help me out please?
Try
sed -e 's/\("requestId":\)"\([0-9]*\)"/\1\2/g' test.json
or
sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g' test.json
The main differences with your attempts are:
Your regular expressions were looking for [0-9] between double quotes, and that's a single digit. By using [0-9]* instead you are looking for any number of digits (zero or more digits).
If you want to copy a sequence of characters from your search in your replacing string, you need to define a group with a starting \( and a final \) in the regexp, and then use \1 in the replacing string to insert the string there. If there are multiple groups, you use \1 for the first group, \2 for the second group, and so on.
Also note that the final g after the last / is used to apply this substitution in all matches, in every processed line. Without that g, the substitution would only be applied to the first match in every processed line. Therefore, if you are only expecting one such replacement per line, you can drop that g.
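The effect of the final g is easiest to see on a line that contains the key twice. The input line below is made up for illustration, and the two function names are hypothetical wrappers around the command from this answer.

```bash
# Hypothetical wrappers: the same substitution without and with the g flag.
unquote_first() { sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/'; }
unquote_all()   { sed -e 's/"requestId":"\([0-9]*\)"/"requestId":\1/g'; }

# Made-up input with two occurrences of the key on one line.
line='{"requestId":"12","next":{"requestId":"34"}}'

printf '%s\n' "$line" | unquote_first   # {"requestId":12,"next":{"requestId":"34"}}
printf '%s\n' "$line" | unquote_all     # {"requestId":12,"next":{"requestId":34}}
```

One caveat with `[0-9]*`: because it also matches zero digits, an empty value such as "requestId":"" would be rewritten to "requestId": with nothing after the colon, which is no longer valid JSON.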
Since you said "or any other tool", I'd recommend jq! While sed is great for line-based text, JSON is not line-based, and newlines are sometimes added just to pretty-print the output and make developers' lives easier. Its rules also get trickier when handling Unicode or double quotes in string content. jq is specifically designed to understand the JSON format and can dissect it appropriately.
For your case, this should do the job:
jq '.requestId = (.requestId | tonumber)'
Note, this will throw an error if requestId is missing and not output the JSON object. If that's a concern, you might need something a little more sophisticated like this example:
jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end'
Also, jq pretty-prints and colorizes its output if sent to a terminal. To avoid that and just see a compact, one-line-per-object format, add -Mc to the command. jq will also work if provided multiple objects back-to-back without a newline in the input. Here's a full demo to show this filter:
$ (echo '{"doc_type":"bare"}{}'
echo '{"doc_type":"user","requestId":"0092","clientId":"11"}'
echo '{"doc_type":"user","requestId":"1000778","clientId":"42114"}'
) | jq 'if has("requestId") then .requestId = (.requestId | tonumber) else . end' -Mc
Which produced this output:
{"doc_type":"bare"}
{}
{"doc_type":"user","requestId":92,"clientId":"11"}
{"doc_type":"user","requestId":1000778,"clientId":"42114"}
sed -e 's/"requestId":"\([0-9]\+\)"/"requestId":\1/g' test.json
You were close. The "new" regex terms I had to add: \1 means "whatever is contained in the first \( \) on the search side", and \+ means "1 or more of the previous thing".
Thus, we search for the string "requestId":" followed by a group of 1 or more digits, followed by ", and replace it with "requestId": followed by that group we found earlier.
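Note that `\+` in a basic regular expression is a GNU extension, so other sed implementations may not accept it. A possibly more portable sketch switches to extended regular expressions with `-E`, where `+` and the grouping parentheses are written without backslashes. The function name `unquote_request_id` is hypothetical.

```bash
# unquote_request_id is a hypothetical wrapper; it filters stdin to stdout.
# With -E, ( ) group and + means "one or more" without any backslashes.
unquote_request_id() {
    sed -E 's/"requestId":"([0-9]+)"/"requestId":\1/g'
}

printf '%s\n' '{"doc_type":"user","requestId":"1000778","clientId":"42114"}' |
    unquote_request_id
# prints {"doc_type":"user","requestId":1000778,"clientId":"42114"}
```

Unlike jq's tonumber, this is a purely textual edit: a value such as "0092" would come out as 0092, and a number with a leading zero is not valid JSON.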
Perhaps the jq (json query) tool would help you out?
$ cat test
{"doc_type":"user","requestId":"1000778","clientId":"42114"}
$ cat test |jq '.doc_type' --raw-output
user
$

expr: Regex not detecting valid expression

I'm trying to create a script that automatically looks for plugged-in devices and makes a compressed backup of them. However, I'm having trouble finding the correct way to use expr:
#!/bin/bash
MountText=`mount`
# Show result of regex search
expr "$MountText" : '\/dev\/(sd[^a])\d on [\S]+\/[\s\S]+? type'
The expression by itself is \/dev\/(sd[^a])\d on [\S]+\/[\s\S]+? type, and captures the device name (sd*), while excluding mounts relating to sda.
I drafted the regex on Regexr (regex shared in link), and used what mount dumped (gist).
For some reason, it only gives this odd error:
0
I looked around, and I found this SO question. It didn't help me too much, because now it's implying that expr didn't recognize the parentheses I used to capture the device, and it also believed that the expression didn't capture anything!
I'm really confused. What am I doing wrong?
A few things to note with expr:
The regular expression is implicitly anchored at the beginning of the string (as if you started the expression with ^).
The capture group is indicated with escaped parentheses, \(...\).
Only basic regular expressions are supported. Specifically, \s, \S, and +? are not supported.
The following will match the one device.
expr "$MountText" : '.*/dev/\(sd[^a]\)[[:digit:]] on '
Note that you don't need to use expr with bash, which has regular-expression matching built in.
regex='/dev/(sd[^a])[[:digit:]] on '
mount | while IFS= read -r line; do
    [[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}"
done
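The points about expr can be tried on a single line of mount-style output. The sample line below is made up for illustration, and `device_of` is a hypothetical name.

```bash
# device_of is a hypothetical helper; it prints the captured device name.
# The leading .* is there because expr anchors the pattern at the start
# of the string, and \( \) marks the capture group in a basic regex.
device_of() {
    expr "$1" : '.*/dev/\(sd[^a]\)[[:digit:]] on '
}

# Made-up sample line in the usual "device on mountpoint type fs" layout.
device_of '/dev/sdb1 on /media/usb type vfat (rw)'   # prints sdb
```

When a pattern contains no \( \) group and fails to match, expr prints 0. That would explain the output in the question: unescaped parentheses are literal characters in a basic regular expression, so the pattern had no capture group and did not match.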

Regular Expression: Capture character pattern zero or one positions from start of string

I have a series of entries, which can be represented by this string:
my_string="-D-K4_NNNN_M116_R1_001.gz _D-K4_NNNN_M56_R1_001.gz R-K4_NNNN_KQ9_R1_001.gz D-K4_NNNN_M987_R1_001.gz _R-K4_NNNN_M987_R1_001.gz"
For each entry, I need to return whether it starts with 'R' or 'D'. In order to do this, I need to ignore any character that comes before it. So, I wrote this regular expression:
for i in $my_string; do echo $i | grep -E -o "^*?[RD]"; done
However, this is only returning R or D for entries which are not preceded by a character.
How do I get this regex to return the R or D value in every case, whether there is a character in front of it or not? Keep in mind that the only thing which can be 'hard-coded' into the expression is the pattern to be matched.
It will be easy if you use sed:
sed -r 's/^.?([RD]).*$/\1/'
i.e.
for i in $my_string; do echo $i | sed -r 's/^.?([RD]).*$/\1/'; done
Update:
Here is what each part of the command means:
-r : use extended regular expressions. Without it, sed uses basic
regular expressions, where a capturing group has to be written \( \);
that is why plain -e did not work in my testing. Anyway, not the main point
The script can be read as:
s/XXXX/YYYY/ : substitute from XXXX to YYYY
The "from" pattern (XXXX) means:
^ : start with
.? : zero or one occurrence of any character
( : start of group
[RD] : either R or D
) : end of group (which means the group will contain either R or D)
.* : any number of any character
$ : till the end
the "to" pattern (YYYY):
\1 : content of capture group 1 in the "from" pattern (which is the "R or D")
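Running the pattern broken down above over the sample entries shows what it returns for each one. This is a sketch with two assumptions: `-E` is used in place of `-r` for portability (GNU sed accepts both), and `-n` with the `p` flag is added so that a name which does not match prints nothing instead of being echoed unchanged. The function name `first_letters` is hypothetical.

```bash
my_string="-D-K4_NNNN_M116_R1_001.gz _D-K4_NNNN_M56_R1_001.gz R-K4_NNNN_KQ9_R1_001.gz D-K4_NNNN_M987_R1_001.gz _R-K4_NNNN_M987_R1_001.gz"

# first_letters is a hypothetical helper: for each argument it prints the
# R or D captured by the group, or nothing when the pattern does not match.
first_letters() {
    local name
    for name in "$@"; do
        printf '%s\n' "$name" | sed -n -E 's/^.?([RD]).*$/\1/p'
    done
}

# Deliberately unquoted so the string splits into one argument per entry.
first_letters $my_string
```

For the sample string this prints D, D, R, D, R, one per line.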
Use a parameter expansion to remove the prefix before using grep:
for i in $my_string; do echo ${i#[^RD]} | grep -o "^[RD]" ; done
or use a simple test without grep (since you already know that each item starts with an R or a D):
for i in $my_string; do
    if [[ $i =~ ^[^D]?R ]] ; then
        echo 'R'
    else
        echo 'D'
    fi
done
This regex worked in my local tests. Please have a try:
^.?[RD]
I can't think of a way to ONLY return the letter you want. I'd have a command after to detect whether the returned string is greater than 1 character long, and if so, I'd return only the second character.
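One way to return only the letter, sketched here with bash's built-in matching instead of grep, is to put the letter in a capture group and print just that group. The function name `first_rd` is hypothetical.

```bash
#!/usr/bin/env bash
# first_rd is a hypothetical helper. It prints the R or D found in the
# first or second position of its argument, and returns non-zero otherwise.
first_rd() {
    local re='^.?([RD])'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

first_rd '_R-K4_NNNN_M987_R1_001.gz'   # prints R
first_rd 'D-K4_NNNN_M987_R1_001.gz'    # prints D
```

`BASH_REMATCH[0]` would hold the whole match (for example `_R`), while `BASH_REMATCH[1]` holds only the parenthesised part, so no length check is needed afterwards.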
I'm not 100% sure what you are asking (I understood that you want to match only R and D at the beginning of a filename, whatever the character before it, if there is one), but I think you should use a lookbehind. In PHP you would do:
$re = "/(?<=^\S|\s\S|\s)[RD]/";
$str = "-D-K4_NNNN_M116_R1_001.gz _D-K4_NNNN_M56_R1_001.gz R-K4_NNNN_KQ9_R1_001.gz D-K4_NNNN_M987_R1_001.gz _R-K4_NNNN_M987_R1_001.gz";
preg_match_all($re, $str, $matches);
You can see the output here.
To use Perl syntax in bash you must enable it. https://unix.stackexchange.com/questions/84477/forcing-bash-to-use-perl-regex-engine
You can test your regexp here if you need https://regex101.com/r/vV3nS3/1
This does it when using the modifier 'g' for global: (^| ).?(R|D)
See the regex101 here

RegEx, colon separated list

I am trying to match a list of colon separated emails. For the sake of keeping things simple, I am going to leave the email expression out of the mix and match it with any number of characters with no spaces in between them.
The following will be matched...
somevalues ;somevalues; somevalues;
or
somevalues; somevalues ;somevalues
The ending ; shouldn't be necessary.
The following would not be matched.
somevalues ; some values somevalues;
or
some values; somevalues some values
I have gotten this so far, but it doesn't work. Since I allow spaces between the colons, the expression doesn't know if the space is in the word, or between the colon.
([a-zA-Z]*\s*\;?\s*)*
The following is matched (which shouldn't be)
somevalue ; somevalues some values;
How do I make the expression only allow spaces if there is a ; to the left or right of it?
Why not just split on semi colon and then regex out the email addresses?
This following PCRE Expression should work.
\w+\s*(?:(?:;(?:\s*\w+\s*)?)+)?
However, adding email address validation to this will require
replacing \w+ with (?:<your email validation regex>)
This is probably exactly what you want; tested on http://regexr.com?2rnce
EDIT: However, depending on the language, you might need to escape ; as \;
The problem comes from the ? in \;?
[a-zA-Z]*(\s*;\s*[a-zA-Z]*)*
should work.
Try
([a-zA-Z]+\s*;\s*)*([a-zA-Z]+\s*)?
Note that I changed * to + on the e-mail pattern since I assume you don't want strings like ; to match.
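Whichever pattern is used, it only validates the whole list if it is anchored at both ends; unanchored, a regex engine can report a match on just part of the input. Below is a minimal bash sketch of an anchored check. The name `is_list` is hypothetical, `[a-zA-Z]+` stands in for the email pattern, and POSIX classes replace `\s` because bash uses extended regular expressions.

```bash
#!/usr/bin/env bash
# is_list is a hypothetical helper: it succeeds only if the whole argument
# is a semicolon-separated list of items, with an optional trailing ";".
is_list() {
    local item='[a-zA-Z]+'
    local sep='[[:space:]]*;[[:space:]]*'
    local re="^[[:space:]]*${item}(${sep}${item})*(${sep})?[[:space:]]*\$"
    [[ $1 =~ $re ]]
}

# Two of the sample inputs from the question.
for s in 'somevalues ;somevalues; somevalues;' \
         'somevalues ; some values somevalues;'; do
    if is_list "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

The first input is reported as a match and the second as no match, because a space is only allowed next to a semicolon or at the very ends of the string.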
To solve this with a regex, you must prepend and append the delimiter to your input line; otherwise you cannot easily detect the first and last items.
#!/bin/bash
input=a:aa:aaa:aaaa
needle=aa
if [[ ":$input:" =~ ":$needle:" ]]
then
echo found
else
echo not found
fi
# -> found
This takes 45 nanoseconds.
Bash globbing is faster, at 35 nanoseconds:
input=a:aa:aaa:aaaa
needle=aa
if [[ ":$input:" == *":$needle:"* ]]
then
echo found
else
echo not found
fi
# -> found
Stupid solution: split by delimiter and match whole lines. This one is really slow, at 5100 nanoseconds:
echo a:aa:aaa:aaaa | tr ':' $'\n' | grep "^aa$"
# -> aa
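Why the added delimiters matter shows up with an input in which the needle occurs only as part of a longer item. The sketch below wraps the globbing test from above in a function; the name `has_item` and the input `a:aaa` are made up for illustration.

```bash
#!/usr/bin/env bash
# has_item is a hypothetical helper: it succeeds only if needle is a
# complete item of the colon-separated list.
has_item() {
    local list=$1 needle=$2
    [[ ":$list:" == *":$needle:"* ]]
}

# A plain substring test wrongly finds "aa" inside the item "aaa".
[[ 'a:aaa' == *aa* ]] && echo 'substring test: found (false positive)'

# With the delimiters added on both sides, only whole items match.
has_item 'a:aaa' aa || echo 'delimited test: not found'
has_item 'a:aa:aaa:aaaa' aa && echo 'delimited test: found'
```

All three messages are printed: the substring test reports a false positive, while the delimited test rejects `a:aaa` and accepts the original input.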