How to make Ruby regex expression with some conditional inputs


This is what my inputs look like:
format 1: 2022-09-23 18:40:45.846 I/getUsers: fetching data
format 2: 11:54:54.619 INFO loadingUsers:23 - visualising: "Entered to dashboard
This is the expression that works for format 1. I want to modify it so that the same expression handles both formats:
^([0-9-]+ [:0-9.]+)\s(?<level>\w+)[\/+](?<log>.*)
For format 1 it yields:
level I
message getUsers: fetching data
For format 2 it should yield:
level INFO
message loadingUsers:23 - visualising: "Entered to dashboard
Any help would be appreciated. Thanks!

You can use
^([0-9-]+ [:0-9.]+|[0-9:.]+)\s(?<level>\w+)[\/+\s]+(?<log>.*)
See the Rubular demo.
Details:
^ - start of a line
([0-9-]+ [:0-9.]+|[0-9:.]+) - Group 1: either one or more digits/hyphens, a space and one or more digits/colons/dots, or just one or more digits/colons/dots
\s - a whitespace
(?<level>\w+) - Group "level": one or more letters, digits or underscores
[\/+\s]+ - one or more slashes, plus signs or whitespace chars
(?<log>.*) - Group "log": zero or more chars other than line break chars, as many as possible.
If you want to make your Group 1 pattern more precise (although I consider a loose pattern fine in these scenarios), you can replace ([0-9-]+ [:0-9.]+|[0-9:.]+) with (\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d+|\d{1,2}:\d{1,2}:\d{1,2}\.\d+), see this regex demo.
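A minimal Ruby sketch of the answer's pattern, assuming each log line is matched on its own. One deviation: the timestamp alternation is written as a named group called ts (a name chosen here for illustration), because as far as I know, once a Ruby regexp contains named groups, plain (...) groups stop capturing, so Group 1 would not be readable as m[1]. The method name parse_log is also hypothetical.

```ruby
# Loose pattern from the answer. The timestamp group is named "ts" here
# (hypothetical name): once a Ruby regexp contains named groups, plain
# (...) groups no longer capture.
LOG_RE = /^(?<ts>[0-9-]+ [:0-9.]+|[0-9:.]+)\s(?<level>\w+)[\/+\s]+(?<log>.*)/

# Stricter timestamp variant suggested in the answer.
STRICT_LOG_RE = /^(?<ts>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\.\d+|\d{1,2}:\d{1,2}:\d{1,2}\.\d+)\s(?<level>\w+)[\/+\s]+(?<log>.*)/

# Returns [timestamp, level, log] for a matching line, or nil.
def parse_log(line, re = LOG_RE)
  m = re.match(line)
  m && [m[:ts], m[:level], m[:log]]
end

[
  '2022-09-23 18:40:45.846 I/getUsers: fetching data',
  '11:54:54.619 INFO loadingUsers:23 - visualising: "Entered to dashboard'
].each do |line|
  _ts, level, log = parse_log(line)
  puts "level #{level}"
  puts "message #{log}"
end
```

Passing STRICT_LOG_RE as the second argument switches to the stricter timestamp check without touching the level/log part.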

Related

Remove duplicate lines containing same starting text

I have a massive list of numbers in which every line has the same format:
#976B4B|B|0|0
#970000|B|0|1
#974B00|B|0|2
#979700|B|0|3
#4B9700|B|0|4
#009700|B|0|5
#00974B|B|0|6
#009797|B|0|7
#004B97|B|0|8
#000097|B|0|9
#4B0097|B|0|10
#970097|B|0|11
#97004B|B|0|12
#970000|B|0|13
#974B00|B|0|14
#979700|B|0|15
#4B9700|B|0|16
#009700|B|0|17
#00974B|B|0|18
#009797|B|0|19
#004B97|B|0|20
#000097|B|0|21
#4B0097|B|0|22
#970097|B|0|23
#97004B|B|0|24
#2C2C2C|B|0|25
#979797|B|0|26
#676767|B|0|27
#97694A|B|0|28
#020202|B|0|29
#6894B4|B|0|30
#976B4B|B|0|31
#808080|B|1|0
#800000|B|1|1
#803F00|B|1|2
#808000|B|1|3
What I am trying to do is remove all duplicate lines that contain the same hex codes, regardless of the text after it.
For example, the hex code #976B4B from the first line (#976B4B|B|0|0) shows up again on line 32 as #976B4B|B|0|31. I want every line with a repeated hex code removed EXCEPT the first occurrence.
I have been trying to solve this with regex and found that replacing ^(.*)(\r?\n\1)+$ with $1 removes duplicate lines, but obviously that is not what I need. I'm looking for some guidance, and ideally something I can learn from.

You can use the following regex replacement. Click Replace All as many times as necessary, until no match is found:
Find What: ^((#[[:xdigit:]]+)\|.*(?:\R.+)*?)\R\2\|.*
Replace With: $1
See the regex demo.
Details:
^ - start of a line
((#[[:xdigit:]]+)\|.*(?:\R.+)*?) - Group 1 ($1, it will be kept):
(#[[:xdigit:]]+) - Group 2: # and one or more hex chars
\| - a | char
.* - the rest of the line
(?:\R.+)*? - any zero or more non-empty lines, as few as possible (if they can be empty, replace .+ with .*)
\R\2\|.* - a line break, Group 2 value, | and the rest of the line.
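The same replacement can be reproduced in Ruby as a sketch, assuming the whole list is loaded into one string. Ruby's regex engine understands \R and [[:xdigit:]], and re-running gsub until the text stops changing mirrors clicking Replace All repeatedly. For comparison, a non-regex alternative keyed on the text before the first | is included; both method names are made up for this example.

```ruby
# Pattern from the answer: Group 2 is "#hex", Group 1 is the first line plus
# any lines after it (lazily), and the match ends with a later line starting
# with the same "#hex|" prefix, which the replacement drops.
DUP_HEX_RE = /^((#[[:xdigit:]]+)\|.*(?:\R.+)*?)\R\2\|.*/

# Replaces until the text stops changing, like clicking Replace All
# until no match is found.
def remove_dup_hex(text)
  loop do
    replaced = text.gsub(DUP_HEX_RE, '\1')
    return replaced if replaced == text
    text = replaced
  end
end

# Non-regex alternative: keep the first line seen for each hex code.
def remove_dup_hex_uniq(text)
  text.lines(chomp: true).uniq { |line| line.split('|', 2).first }.join("\n")
end

SAMPLE = <<~'LIST'.chomp
  #976B4B|B|0|0
  #970000|B|0|1
  #974B00|B|0|2
  #970000|B|0|13
  #976B4B|B|0|31
  #808080|B|1|0
LIST

puts remove_dup_hex(SAMPLE)
```

On a 20k-line list the lazy (?:\R.+)*? part can rescan a lot of text per line, so the uniq version is likely the faster choice outside an editor.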

Removing everything between 2 strings with Google sheets RE2

I'm trying to remove part of a product title in a Google Sheet.
Examples:
Johner Gladstone Pinot Noir 2015, 75CL
Stella Artois Premium Lager Bottle, 1 X 660 Ml
Pepesza Ppsh-40 Vodka Tommy Gun, 1 L
I want to remove everything from the , up to and including the CL, ML or L.
The problem I'm running into is that I don't know enough about regex and I'm struggling to find a good place to learn!
What I've tried so far:
=REGEXREPLACE(A2,"[, ]\QML|CL\E","")
but this doesn't work, and I think it's because [, ] isn't valid.
Since I know that , is the only punctuation in the titles, I've also tried this, without success:
=REGEXREPLACE(A2,"\*\QML|CL\E","")

The pattern you are trying to write is
(?i), .*?[CM]?L
See the regex demo. Details:
(?i) - case insensitive flag
, .*? - comma, space, and then any zero or more chars other than line break chars, as few as possible (due to *?, if you need as many as possible use * instead)
[CM]?L - an optional C or M, followed by an L char.
However, you can simply match from the comma and space to the end of the line:
, .*
See this regex demo. Here, the first comma and space are matched, followed by the rest of the string (in fact, the rest of the line, since . does not match line breaks by default).
See the regular expression syntax accepted by RE2.
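To check what the two patterns remove from the sample titles, here is a quick Ruby sketch. Ruby's regex engine is not RE2, so this is only an assumption-laden stand-in: both patterns use constructs (an inline (?i) flag, a lazy .*?, a character class) that RE2 also accepts, but the final check belongs in the sheet itself with REGEXREPLACE.

```ruby
# The answer's first pattern: comma, space, as few chars as possible,
# then CL, ML or L (case-insensitive via the inline (?i) flag).
VOLUME_RE = /(?i), .*?[CM]?L/

# The answer's simpler pattern: the first ", " and the rest of the line.
TAIL_RE = /, .*/

TITLES = [
  'Johner Gladstone Pinot Noir 2015, 75CL',
  'Stella Artois Premium Lager Bottle, 1 X 660 Ml',
  'Pepesza Ppsh-40 Vodka Tommy Gun, 1 L'
].freeze

TITLES.each do |title|
  # gsub replaces every match in the string.
  puts title.gsub(VOLUME_RE, '')
  puts title.gsub(TAIL_RE, '')
end
```

For these three titles both patterns give the same result; they would differ only if text followed the volume (the lazy pattern stops at the first CL/ML/L, the tail pattern does not).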

PostgreSQL: .csv regex - test for repeating substrings within a string (digits)

Introduction:
I have the following scenario in PostgreSQL whereby I want to perform some data validation on a .csv string prior to inserting it into a table (see the fiddle here).
I've managed to get a regex (in a CHECK constraint) that disallows spaces within numbers (e.g. "12 34") and also disallows leading zeros ("00343").
Now, the icing on the cake would be if I could use regular expressions to disallow strings which contain a repeat of an integer - i.e. if a sequence \d+ matched another \d+ within the same string.
Is this beyond the capacities of regular expressions?
My table is as follows:
CREATE TABLE test
(
data TEXT NOT NULL,
CONSTRAINT d_csv_only_ck
CHECK (data ~ '^([ ]*([1-9]\d*)+[ ]*)(,[ ]*([1-9]\d*)+[ ]*)*$')
);
And I can populate it as follows:
INSERT INTO test VALUES
('992,1005,1007,992,456,456,1008'), -- want to make this line unacceptable - repeats!
('44,1005,1110'),
('13, 44 , 1005, 10078 '), -- acceptable - spaces before and after integers
('11,1203,6666'),
('1,11,99,2222'),
('3435'),
(' 1234 '); -- acceptable
But:
INSERT INTO test VALUES ('23432, 3433 ,00343, 567'); -- leading 0 - unacceptable
fails (as it should), and
INSERT INTO test VALUES ('12 34'); -- spaces within numbers - unacceptable
also fails (again, as it should).
The question:
However, notice that the first string has repeats of 992 and 456.
I would like to be able to match these.
These rules do not all have to be in the same regex; I can use a second CHECK constraint.
I would like to know if what I am asking is possible using Regular Expressions?
I did find this post which appears to go some (all?) of the way to solving my issue, but I'm afraid it's beyond my skillset to get it to work - I've included a small test at the bottom of the fiddle.
Please let me know should you require any further information.
P.S. As an aside, I'm not very experienced with regexes, and I would welcome any input on my basic one above.

Since PostgreSQL regex does not allow backreferences inside lookahead constraints, you cannot apply this restriction with a regex, because you would need a negative lookahead with a backreference in it.
Have a look at this PCRE regex:
^(?!.*\b(\d+)\b.*\b\1\b) *[1-9]\d* *(?:, *[1-9]\d* *)*$
See this regex demo.
Details:
^ - start of string
(?!.*\b(\d+)\b.*\b\1\b) - a negative lookahead that fails the match if the same number occurs twice, as a whole word, anywhere in the string
<space>* - zero or more spaces
[1-9]\d* - a non-zero digit and then any zero or more digits
<space>* - zero or more spaces
(?:, *[1-9]\d* *)* - zero or more occurrences of
, * - comma and zero or more spaces
[1-9]\d* - a non-zero digit and then any zero or more digits
<space>* - zero or more spaces
$ - end of string.
Even if you replace \b with \y (PostgreSQL regex word boundaries) in the PostgreSQL code, it won't work due to the drawback mentioned at the top of the answer.
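Outside PostgreSQL, the PCRE pattern can be exercised in Ruby, whose regex engine does accept a backreference inside a negative lookahead. This sketch swaps ^ and $ for \A and \z, because in Ruby ^ and $ always match at line boundaries; the method name valid_csv_ints? is hypothetical. It does not lift the PostgreSQL limitation described above.

```ruby
# The answer's PCRE pattern, anchored to the whole string with \A and \z.
CSV_INTS_RE = /\A(?!.*\b(\d+)\b.*\b\1\b) *[1-9]\d* *(?:, *[1-9]\d* *)*\z/

def valid_csv_ints?(s)
  CSV_INTS_RE.match?(s)
end

# Inputs from the question with the verdict the asker wants.
CASES = {
  '992,1005,1007,992,456,456,1008' => false, # repeats 992 and 456
  '44,1005,1110' => true,
  '13, 44 , 1005, 10078 ' => true,           # spaces around numbers
  '1,11,99,2222' => true,
  ' 1234 ' => true,
  '23432, 3433 ,00343, 567' => false,        # leading zero
  '12 34' => false                           # space inside a number
}.freeze

CASES.each do |input, expected|
  puts "#{input.inspect} -> #{valid_csv_ints?(input)} (expected #{expected})"
end
```

The \b around (\d+) is what keeps 1 and 11 in '1,11,99,2222' from counting as a repeat.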

Regex for SQL Query

Hello everyone, I have the following problem:
I have a long list of SQL queries that I need to adapt to one of my changes. Essentially it is a renaming problem, and I'm afraid it is more complicated to solve than I expected.
The query looks like this:
INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);
The INSERT column list had another column, which I removed in Notepad++ simply by searching for the term "example, " and replacing it with nothing. The matching entry in VALUES, however, can't be removed that way, because its text varies from query to query. So far I have only worked with the text file in which I adjusted the list of queries.
So, as you can see, VALUES has one more entry than the column list (the extra column was removed by my change).
It is the entry after the email address. I would like to remove this including the comma (N'This is the text i want to delete',).
My idea was to form a group and say that the 14th value (the one after the 13th comma) should be removed. However, even after some research I don't know how to do this.
I thought it could look like this (tried in https://regex101.com/)
VALUES\s?\((,) something here
Is this even the right approach, or is there another method? Regex is the only tool I know of for this, since the values differ from query to query.
And how can I then apply the regex to adapt the queries (they are stored locally on my computer and are not yet part of the code)?
Short summary:
Change the query from
VALUES (... test5, test6, test7 ...)
To
VALUES (... test5, test7 ...)

As per my comment, you could use find/replace, where you search for:
(\bVALUES +\((?:[^,]+,){13})[^,]+,
And replace with $1
See the online demo
( - Open 1st capture group.
\bVALUES +\( - Match a word boundary, the literal text VALUES, one or more spaces and a literal opening parenthesis.
(?: - Open non-capturing group.
[^,]+, - Match anything but a comma at least once followed by a comma.
){13} - Close non-capture group and repeat it 13 times.
) - Close 1st capture group.
[^,]+, - Match anything but a comma at least once followed by a comma.
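If you would rather script the change than use Notepad++'s Replace dialog, the same find/replace can be sketched in Ruby (reading and writing the file is left out). Like the pattern itself, it assumes none of the first 14 values contains a comma of its own; drop_14th_value is a hypothetical name.

```ruby
# The answer's pattern: Group 1 is "VALUES (" plus 13 "non-comma chars + comma"
# chunks; the 14th value and its comma follow and are dropped by the replacement.
VALUES_14TH_RE = /(\bVALUES +\((?:[^,]+,){13})[^,]+,/

def drop_14th_value(sql)
  sql.gsub(VALUES_14TH_RE, '\1')
end

QUERY = "INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, " \
        "bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, " \
        "N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', " \
        "N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', " \
        "N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);"

puts drop_14th_value(QUERY)
```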

You may use the following to remove / replace the value you need:
Find What: \bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,(?1)){12}\K,(?1)
Replace With: (empty string, or whatever value you need)
See the regex demo
Details
\bVALUES - whole word VALUES
\s* - 0+ whitespaces
\( - a (
(\s*(?:N'[^']*'|\w+)) - Group 1: 0+ whitespaces and then either N' followed with any 0 or more chars other than ' and then a ', or 1+ word chars
(?:,(?1)){12} - twelve repetitions of , followed with the Group 1 pattern
\K - match reset operator that discards the text matched so far from the match memory buffer
, - a comma
(?1) - Group 1 pattern.
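The second pattern can also be tried in Ruby as a sketch: Ruby supports \K, but writes the subexpression call (?1) as \g<1>, so that is the one change made here. This variant tolerates commas inside the quoted N'...' values, which the previous pattern does not. The method and constant names are illustrative only.

```ruby
# The answer's pattern in Ruby syntax: (?1) becomes \g<1>; \K discards
# everything up to the 13th value, so only ",<14th value>" is replaced.
VALUES_14TH_K_RE = /\bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,\g<1>){12}\K,\g<1>/

def drop_14th_value_k(sql)
  sql.gsub(VALUES_14TH_K_RE, '')
end

SAMPLE_QUERY = "INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, " \
               "bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, " \
               "N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', " \
               "N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', " \
               "N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);"

puts drop_14th_value_k(SAMPLE_QUERY)
```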

How to delete duplicate numbers in notepad ++?

I've been trying to use ^(.*?)$\s+?^(?=.*^\1$), but it doesn't work.
I have this scenario:
9993990487 - 9993990487
9993990553 - 9993990553
9993990554 - 9993990559
9993990570 - 9993990570
9993990593 - 9993990596
9993990594 - 9993990594
I want to collapse the "duplicate" entries, where both numbers are the same, and expect the following:
9993990487
9993990553
9993990554 - 9993990559
9993990570
9993990593 - 9993990596
9993990594
I would really appreciate some help, since I have 20k+ numbers to filter. Another program would also work, but this is the only one I have available on this PC.
Thanks,
Josue

You may use
^(\d+)\h+-\h+\1$
Replace with $1.
See the regex demo.
Details
^ - start of a line
(\d+) - Group 1: one or more digits
\h+-\h+ - a - char enclosed with 1+ horizontal whitespaces
\1 - an inline backreference to Group 1 value
$ - end of a line.
The replacement is a $1 placeholder that replaces the match with the Group 1 value.
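A Ruby sketch of the same replacement, assuming the numbers are held in one string. One translation detail matters here: in Ruby, \h means a hexadecimal digit, not horizontal whitespace, so [ \t] stands in for the answer's \h. The method name is hypothetical.

```ruby
# ^ and $ are line anchors in Ruby; [ \t] replaces Notepad++'s \h,
# because Ruby's \h matches a hex digit instead.
SAME_RANGE_RE = /^(\d+)[ \t]+-[ \t]+\1$/

def collapse_same_ranges(text)
  text.gsub(SAME_RANGE_RE, '\1')
end

RANGES = <<~'TXT'
  9993990487 - 9993990487
  9993990553 - 9993990553
  9993990554 - 9993990559
  9993990570 - 9993990570
  9993990593 - 9993990596
  9993990594 - 9993990594
TXT

puts collapse_same_ranges(RANGES)
```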
