Remove leading 0 in String with letters and digits - regex

I have a comma-separated file in which I need to remove the leading zeroes from the string in the first column. The text file looks like this:
ABC-0001,ab,0001
ABC-0010,bc,0010
I need the output to look like this:
ABC-1,ab,0001
ABC-10,bc,0010
I tried a command-line replacement:
sed 's/ABC-0*[1-9]/ABC-[1-9]/g' file
I ended up with this output:
ABC-[1-9],ab,0001
ABC-[1-9]0,bc,0010
Can you please tell me what I am missing here?
Alternatively, I also tried to apply the formatting in the SQL that generates this file:
select regexp_replace(key,'((0+)|1-9|0+)','(1-9|0+)') from file where key in ('ABC-0001','ABC-0010')
which gives this output:
ABC-(1-9|0+)1
ABC-(1-9|0+)1(1-9|0+)
Help with either solution would be much appreciated!

Try this:
sed -E 's/ABC-0*([1-9])/ABC-\1/g' file
                -------     --
                   |        |
            capturing group |
                     captured group

To do it in the query using Oracle, assuming the value with the zeroes you want to remove is in a column called "key" in a table called "file", it would look like this:
select regexp_replace(key, '(-)(0+)(.*)', '\1\3')
from file;
The dash has to be captured because the regex "consumes" it when it matches. The second group matches one or more 0's, and the third group matches the rest of the field. Replacing the match with captured groups 1 and 3 leaves out the 0's in between. If no 0 directly follows a dash, nothing matches and the key is returned unchanged.
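For comparison, here is a minimal Python sketch of the same capture-and-back-reference idea, applied to whole CSV lines rather than to the key column alone. The function name strip_leading_zeros and the rule of always keeping at least one digit (so a hypothetical key such as ABC-0000 would become ABC-0) are assumptions, not part of the answers above.

```python
import re

# Anchored at the start of the line so only the first column is touched.
# Group 1 keeps the prefix up to and including the dash; the zeroes that
# follow are dropped, but the lookahead leaves at least one digit behind.
LEADING_ZEROS = re.compile(r"^([^,-]*-)0+(?=\d)")


def strip_leading_zeros(line: str) -> str:
    """Remove leading zeroes after the dash in the first CSV column."""
    return LEADING_ZEROS.sub(r"\1", line)


if __name__ == "__main__":
    for row in ["ABC-0001,ab,0001", "ABC-0010,bc,0010"]:
        print(strip_leading_zeros(row))
```

Because the pattern is anchored with ^, the 0001 and 0010 values in the third column are left alone.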

Related

PostgreSQL regex_replace substitution for 2 groups

I have a column in a Postgres database that stores text as character varying. The text includes a URI that contains a file name and looks like this:
The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\12345678.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.
I am trying to get the file name 12345678.PDF and the date 20221122 from the text content. However, regexp_replace gives me either everything up to the file name or the file name and everything after it. I am trying to get only the file name.
1>> Regexp_replace(data, '.+\\', '')
Yields the file name and everything after it.
2>> Regexp_replace(data, '\[.*', '')
Yields everything up to and including the file name.
If I combine the two patterns as below, I get the same result as 1.
Regexp_replace(data, '.+\\|\[', '')
How can I substitute 2 groups and get only the file name? Or is there a better way to achieve this? I also need to get the date value, but if I can figure this out, maybe I can apply what I learn to extract the date as well. Thanks for your time.
You can use
SELECT REGEXP_MATCHES(
'The file is a file of \\88-77-99-666.abc.example.com\Folder1\Folder2\Folder3\Folder4\20221122\2779780.PDF [9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file.',
'([^[:space:]\\/]+)\s+\[([^][]+)') AS Result;
See the DB fiddle.
Details:
([^[:space:]\\/]+) - Group 1: one or more chars other than \, / and whitespace
\s+ - one or more whitespaces
\[ - a [ char
([^][]+) - Group 2: one or more chars other than [ and ].
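The same pattern can be tried outside the database. The following is a rough Python sketch, not PostgreSQL code: it extends the pattern with one extra group for the date, on the assumption that the date is always an 8-digit folder name directly before the file name. The names PATTERN and extract are made up for illustration.

```python
import re

# Group 1: 8-digit folder name (assumed to be the date)
# Group 2: file name, i.e. characters other than \, / and whitespace
# Group 3: text between the square brackets (the SHA-1 in the sample)
PATTERN = re.compile(r"\\(\d{8})\\([^\s\\/]+)\s+\[([^\]\[]+)\]")


def extract(text: str):
    """Return (date, file_name, sha1), or None if the text does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


if __name__ == "__main__":
    sample = (
        r"The file is a file of \\88-77-99-666.abc.example.com\Folder1"
        r"\Folder2\Folder3\Folder4\20221122\12345678.PDF "
        "[9bc8rer55c655f4cb5df763c61862d3fdde9557b0] is the sha1 of the file."
    )
    print(extract(sample))
```

If the folder layout differs from the sample, the date group would need to be adjusted; the file name and bracket groups follow the answer's pattern.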

How to search for specific content inside round brackets and put it into a group?

I want to update roughly 3000 SQL scripts that contain either this
ALTER TABLE `database_name`.`table_name`
ADD INDEX `index_name` (`column_name` ASC); /* or DESC */
or this
ALTER TABLE `database_name`.`table_name` ADD INDEX `index_name` (`column_name` ASC); /* or DESC */
I'm using Notepad++ and want to replace both cases with a procedure call. This is what I have tried so far to find both cases:
ALTER TABLE `?(.*)`?.`?(.*)`? ADD INDEX `?(.*)`? \([^\),]+\);
It does not yet cover the multi-line case. I'm not sure where to check for the line break: when I add a \n before ADD, there are no hits at all.
The procedure will always be
CALL my_proc('database_name', 'table_name', 'column_name', 'index_name');
This is what I have tried so far as the replacement:
CALL my_proc($1, $2, ..., $3);
Unfortunately, I don't know the correct pattern to use for the ... part. I basically want to say: extract only the column name from what's inside the round brackets.
How do I have to modify my search regex so that it handles multiple lines and puts the column_name into a fourth group, so that I can use this?
CALL my_proc($1, $2, $4, $3);
Based on your work, you can try:
ALTER TABLE `?([^.`]+)`?\.`?([^.`]+)`?(?: | ?\n| ?\r\n)ADD INDEX `?([^.`]+)`? \(`?([^.`]+)`.*\);
And replace with:
CALL my_proc\(`$1`, `$2`, `$4`, `$3`\);
Explanation of the regex:
`?([^.`]+)`?
means:
zero or one backtick
a capturing group matching anything other than a . or a backtick, one or more times
zero or one backtick
Then comes a . followed by the same pattern.
(?: | ?\n| ?\r\n) is a non-capturing group matching a space, OR an optional space followed by a Unix newline, OR an optional space followed by a Windows newline.
etc.
Explanation of the replacement: the parentheses have to be escaped and, based on what you've done, the backticks must be added manually.
Example: https://regex101.com/r/QKPYsX/3
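To check the idea outside Notepad++, here is a small Python sketch of the same search and replace. It is an adaptation rather than the exact pattern above: \s+ stands in for the space-or-newline alternation, the dot between database and table is escaped, and the output uses single quotes as in the CALL statement from the question. The names ALTER_INDEX and to_call are hypothetical.

```python
import re

ALTER_INDEX = re.compile(
    r"ALTER TABLE `?([^.`]+)`?\.`?([^.`]+)`?"  # groups 1, 2: database, table
    r"\s+ADD INDEX `?([^.`]+)`? "              # group 3: index name
    r"\(`?([^.`\s]+)`?[^)]*\);"                # group 4: column name
)


def to_call(sql: str) -> str:
    """Rewrite ALTER TABLE ... ADD INDEX statements as my_proc calls."""
    return ALTER_INDEX.sub(r"CALL my_proc('\1', '\2', '\4', '\3');", sql)


if __name__ == "__main__":
    one_line = "ALTER TABLE `db`.`tbl` ADD INDEX `idx` (`col` ASC);"
    two_lines = "ALTER TABLE `db`.`tbl`\nADD INDEX `idx` (`col` DESC);"
    print(to_call(one_line))
    print(to_call(two_lines))
```

Anything after the column name inside the brackets (ASC or DESC) is matched but not captured, so it is dropped from the result.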

How to select a section with a regular expression in Linux commands

I have lines like the ones below, where each section begins with a word followed by an equals sign and then several sentences, and I would like to select a single section. For example:
delete = \account
user\
admin
admin right is good.
add = \
nothing
no out
input output is not good
edit = permission
bob
admin
alice killed bob!!!
I want to select one section, for example:
add = \
nothing
no out
input output is not good
I would like to do it with a regular expression.
Your question is a bit vague but you could try the following ...
/\s*(\w+) = ([^=]*\n)*/m
... subject to the requirement that the last section is terminated with \n.
This works as follows:
'\s*' matches optional leading whitespace
'(\w+)' captures the name of the section
' = ' matches the space-equals-space separator
'([^=]*\n)' captures a string that does not include an equals sign and ends with a newline
'*' repeats that last group zero or more times
The m flag sets multi-line mode. It only changes how ^ and $ match, so this pattern, which has no anchors, behaves the same without it.
See the following to quickly see the groups that are output for each match ...
https://regex101.com/r/oDKSy9/1
(NOTE: The g flag will probably not be required depending on how you use the regex.)
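As a rough illustration, the pattern can be exercised in Python as shown here. Two things differ from the pattern above and are assumptions of this sketch: the repeated group is wrapped in an outer group so that the whole body is captured (Python keeps only the last repetition of a repeated group), and the helper name split_sections is invented.

```python
import re

# Group 1: section name; group 2: every following line up to the next
# "name = " header. Like the original pattern, this relies on section
# bodies never containing "=" and on the text ending with a newline.
SECTION = re.compile(r"\s*(\w+) = ((?:[^=]*\n)*)")


def split_sections(text: str) -> dict:
    """Map each section name to the text that follows its "name = "."""
    return {m.group(1): m.group(2) for m in SECTION.finditer(text)}


if __name__ == "__main__":
    sample = (
        "add = \\\n"
        "nothing\n"
        "no out\n"
        "input output is not good\n"
        "edit = permission\n"
        "bob\n"
    )
    print(split_sections(sample)["add"])
```

Looking a section up by name, as in split_sections(sample)["add"], then returns just that section's body.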
Solution by OP.
I found this solution:
csplit -k fileName '/.*=/' '{*}'
Thanks #haggisandchips

Substitute all non matching characters between certain columns

I'm trying to substitute all non-matching characters in a single line between certain columns (after a search).
Example:
The search pattern can be anything.
In the example below, the search pattern is test.
The substitute for non-matching characters is a space.
I want to substitute all characters that are not part of "test" between columns 10 and 30.
Columns 10 and 30 are marked with |
before: djd<aj.testjal.kjetestjaja testlala ratesttsuvtesta !<-a-
                 |                   |
after:  djd<aj.test       test     testlala ratesttsuvtesta !<-a-
How can I achieve this?
Use the following substitution command on that line.
:s/\(test\)\zs\|\%>9v\%<31v./\=submatch(1)!=''?'':' '/g
If the range of columns is specified using visual selection, run
:'<,'>s/\(test\)\zs\|\%V./\=submatch(1)!=''?'':' '/g
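To make the logic of the substitution easier to follow, here is a minimal Python sketch of the same idea. It is not Vim script and is only an approximation: it counts characters rather than Vim's virtual columns, so tabs and wide characters would behave differently. The function name blank_non_matches is hypothetical.

```python
import re


def blank_non_matches(line: str, word: str, first_col: int, last_col: int) -> str:
    """Blank every character in columns first_col..last_col (1-based,
    inclusive) that is not part of an occurrence of word."""
    # Collect the indexes covered by any occurrence of word in the line,
    # including occurrences that only partly overlap the column range.
    keep = set()
    for m in re.finditer(re.escape(word), line):
        keep.update(range(m.start(), m.end()))

    chars = list(line)
    for i in range(first_col - 1, min(last_col, len(chars))):
        if i not in keep:
            chars[i] = " "
    return "".join(chars)


if __name__ == "__main__":
    before = "djd<aj.testjal.kjetestjaja testlala ratesttsuvtesta !<-a-"
    print(blank_non_matches(before, "test", 10, 30))
```

As in the Vim command, an occurrence of the word that starts before the first column or ends after the last one is kept whole.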
One method may be to select the appropriate column range using Visual-block mode (Ctrl+V).
Once the range is selected, the search and replace can be restricted to it using \%V (see this question):
%s/\%Vfoo/bar/g
A regular expression for not test can be found here: Regular expression to match a line that doesn't contain a word?

Find lines matching regex and select a different part of the line

I have two lines like below:
/pace =builtin\administrators Type=0x0 Flags=0x13 AccessMask=0x1f01ff
/pace =domain\user Type=0x0 Flags=0x13 AccessMask=0x1f01ff
I need a regular expression that selects only 0x1f01ff, and only on the line that contains domain\user.
This is what I have so far, but it selects /pace =domain\user Type=0x0 Flags=0x13 AccessMask= instead:
^(.+domain(.*)accessmask=)
Try this:
^.+domain\\user.+AccessMask=([^\s]+)
It matches any line that contains domain\user and captures the value of AccessMask (a run of non-whitespace characters) in group 1.
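A short Python sketch of how the captured group might be read back is given here. The multi-line flag and the helper name access_masks are assumptions of the sketch; \S is used as a shorthand for [^\s].

```python
import re

# re.MULTILINE lets ^ match at the start of every line, so each line of
# the input is checked separately; "." never crosses a line break.
ACCESS_MASK = re.compile(r"^.+domain\\user.+AccessMask=(\S+)", re.MULTILINE)


def access_masks(text: str) -> list:
    """Return the AccessMask value of every line containing domain\\user."""
    return ACCESS_MASK.findall(text)


if __name__ == "__main__":
    sample = (
        r"/pace =builtin\administrators Type=0x0 Flags=0x13 AccessMask=0x1f01ff"
        "\n"
        r"/pace =domain\user Type=0x0 Flags=0x13 AccessMask=0x1f01ff"
        "\n"
    )
    print(access_masks(sample))
```

Since the pattern has exactly one capturing group, findall returns only the captured values rather than the whole matched lines.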