I am currently trying to figure out how to find all columns of a table within an SQL statement using regex in Notepad++.
Let's take this query:
select
a.id,
a.id || a.name,
a.age,
b.id
From a,b
Now I want to retrieve all columns of a using regex. The problem is that the real query is much larger, and I do not want to go through the whole query by hand.
The desired result is:
id
name
age
I already figured out that with
(?<=a\.)(\S+)
I match the desired strings, but Notepad++ still returns the whole lines and not only the words I need.
Can anyone help me here?
You may use this two-step approach to extract the values after a.:
Find: \ba\.(\w+)|(?s:.)
Replace With: (?1$1\n:)
Then, you need to remove duplicate lines to get the expected results.
Details
\ba\. - a a. substring as a whole word
(\w+) - Group 1: one or more word chars (the group value will be kept + an LF will be appended in the replacement pattern)
| - or
(?s:.) - any char (it will be removed).
The (?1$1\n:) replacement means that the Group 1 value will be output and a line ending LF symbol will be appended to the result if Group 1 matches, else, empty string will be used as a replacement.
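Outside Notepad++, the same extraction and de-duplication can be sketched in a few lines of Python. This is only an illustration, assuming the query is available as a string; the variable names are made up for the example, and only the pattern \ba\.(\w+) comes from the answer above.

```python
import re

query = """select
a.id,
a.id || a.name,
a.age,
b.id
From a,b"""

# Capture the word after every "a." that starts at a word boundary.
matches = re.findall(r"\ba\.(\w+)", query)

# dict.fromkeys keeps the first occurrence of each name, in order,
# which stands in for the separate "remove duplicate lines" step.
columns = list(dict.fromkeys(matches))

print("\n".join(columns))
```

The de-duplication is done in the script here because re.findall returns every occurrence, including the second a.id.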
Maybe "matching non greedy" using "?" and looking for word boundaries can help? The expression would look like this (add a ? in the last bracket):
(?<=a\.)(\S+?\b)
This just came to mind as I read the question; I have not tested it.
More information on non-greedy modifier can be found here.
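Since the suggestion above is untested, here is a quick way to try it, sketched in Python rather than Notepad++. Python's re module may differ in detail from the Boost engine Notepad++ uses, so treat this as a rough check only; the variable names are hypothetical.

```python
import re

query = "select\na.id,\na.id || a.name,\na.age,\nb.id\nFrom a,b"

# Lookbehind for "a.", then lazily take non-space chars up to a word boundary.
# Note: the lookbehind has no \b, so a name such as "data.x" would match too.
lazy_matches = re.findall(r"(?<=a\.)(\S+?\b)", query)

print(lazy_matches)
```

The trailing commas are no longer part of the matches, but duplicates such as the second a.id still appear and need to be removed separately.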
Related
Regex checks wouldn't be a strong point of mine. This is trivial, but after playing around with it for 15 minutes already I think it would be quicker to post here. Ultimately I want to filter out any results of a table where a certain text column value ends with S(01-99), i.e. the letter S followed by 2 digits. Consider the following test query
select x.* from (
select
unnest(array['kjkjkj','jhjs01','kjkj11','kjhkjh','uusus','iiosis99']::text[])
as tests ) x
where RIGHT(x.tests,3) !~ 'S[0-9]{1,2}$'
This returns everything in the unnested array, whereas I'm hoping to return everything excluding the second and last values. Any pointers in the right direction would be much appreciated. I'm using PostgreSQL v11.9
You may actually use SIMILAR TO here since your pattern is not that complex:
SELECT * FROM table
WHERE column_name NOT SIMILAR TO '%S[0-9]{2}'
SIMILAR TO patterns require a full string match, so here, % matches any text from the start of the string, then S matches S and [0-9]{2} matches two digits that must be at the end of the string.
If you were to use a regex, you could use
WHERE column_name !~ 'S[0-9]{2}$'
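Or, 'S[0-9]{1,2}$' if there can be one or two digits. Since the regex search in PostgreSQL does not require a full string match, it just matches S followed by two digits (or one or two with {1,2}) at the end of the string ($). Note that ~ and !~ are case-sensitive, while the sample values end in a lowercase s followed by digits; use !~* if the match should be case-insensitive.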
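To see why the test query returns every row, the filter can be replayed in Python. This is a sketch, not PostgreSQL itself: re.search stands in for the ~ operator family, and re.IGNORECASE for the case-insensitive !~* operator. The list reuses the values from the question; the variable names are made up.

```python
import re

tests = ["kjkjkj", "jhjs01", "kjkj11", "kjhkjh", "uusus", "iiosis99"]

# Case-sensitive, like !~ : no value ends in an uppercase S plus two digits,
# so nothing is filtered out.
kept_case_sensitive = [t for t in tests if not re.search(r"S[0-9]{2}$", t)]

# Case-insensitive, like !~* : the values ending in s01 and s99 are dropped.
kept_case_insensitive = [
    t for t in tests if not re.search(r"S[0-9]{2}$", t, re.IGNORECASE)
]

print(kept_case_sensitive)
print(kept_case_insensitive)
```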
Hello everyone, I have the following problem:
I have a long list of SQL queries that I would like to adapt to a change I made. In the end it is a renaming problem, and I'm afraid I am making the solution more complicated than it needs to be.
The query looks like this:
INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);
The INSERT column list contained another column, which I removed in Notepad++ by searching for "example, " and replacing it with an empty string. Only the corresponding entry in VALUES cannot be removed this way, because its text varies. So far I have only worked with the text file that contains the list of queries.
So, as you can see, there is one more entry in VALUES than in the column list (there was another column here, but my change removed it).
It is the entry after the email address. I would like to remove it, including the comma (N'This is the text i want to delete',).
My idea was to form a group and say that the 14th value, together with its comma, should be removed. However, even after some research I do not know how to do this.
I thought it could look like this (tried in https://regex101.com/)
VALUES\s?\((,) something here
Is this even the right approach or is there another method? I only knew Regex to solve this problem, because of course the values look different here.
And how can I then apply the regex to adapt the queries? They are stored locally on my computer and are not yet included in the code.
Short summary:
Change the query from
VALUES (... test5, test6, test7 ...)
To
VALUES (... test5, test7 ...)
As per my comment, you could use find/replace, where you search for:
(\bVALUES +\((?:[^,]+,){13})[^,]+,
And replace with $1
See the online demo
( - Open 1st capture group.
\bVALUES +\( - Match a word boundary, the literal 'VALUES', at least one space, and a literal opening parenthesis.
(?: - Open non-capturing group.
[^,]+, - Match anything but a comma at least once followed by a comma.
){13} - Close non-capture group and repeat it 13 times.
) - Close 1st capture group.
[^,]+, - Match anything but a comma at least once followed by a comma.
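If the queries live in a local text file, the same find/replace can also be scripted. Below is a minimal Python sketch that applies the pattern above to the sample statement; the variable names are made up, and reading or writing the actual file is left out. Like the pattern itself, it assumes that none of the first 14 values contains a comma.

```python
import re

sql = (
    "INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, "
    "fax, bem, anrede, salutation, email, name2, name3, association, project) "
    "VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', "
    "N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, "
    "N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', "
    "N'Name2', N'Name3', NULL, NULL);"
)

# Group 1 keeps "VALUES (" and the first 13 comma-terminated values;
# the 14th value and its comma are matched outside the group and dropped.
pattern = re.compile(r"(\bVALUES +\((?:[^,]+,){13})[^,]+,")

fixed = pattern.sub(r"\1", sql)
print(fixed)
```

Python writes the backreference as \1 where Notepad++ uses $1.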
You may use the following to remove / replace the value you need:
Find What: \bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,(?1)){12}\K,(?1)
Replace With: (empty string, or whatever value you need)
See the regex demo
Details
\bVALUES - whole word VALUES
\s* - 0+ whitespaces
\( - a (
(\s*(?:N'[^']*'|\w+)) - Group 1: 0+ whitespaces and then either N' followed with any 0 or more chars other than ' and then a ', or 1+ word chars
(?:,(?1)){12} - twelve repetitions of , followed with the Group 1 pattern
\K - match reset operator that discards the text matched so far from the match memory buffer
, - a comma
(?1) - Group 1 pattern.
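Python's built-in re module supports neither \K nor the (?1) subroutine call, so a scripted version has to spell the idea out differently. The sketch below is one possible translation, not the Notepad++ pattern itself: the value sub-pattern is stored in a hypothetical VALUE string and repeated by string formatting, and a capturing group plus \1 takes the place of \K.

```python
import re

sql = (
    "INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, "
    "fax, bem, anrede, salutation, email, name2, name3, association, project) "
    "VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', "
    "N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, "
    "N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', "
    "N'Name2', N'Name3', NULL, NULL);"
)

# One value: optional whitespace, then an N'...' literal or a bare word.
VALUE = r"\s*(?:N'[^']*'|\w+)"

# Group 1 keeps "VALUES (" plus the first 13 values; the 14th is dropped.
pattern = re.compile(rf"(\bVALUES\s*\({VALUE}(?:,{VALUE}){{12}}),{VALUE}")

fixed = pattern.sub(r"\1", sql, count=1)
print(fixed)
```

Because a value is matched as a whole N'...' literal, a comma inside a quoted string (for example a street with an apartment number) does not shift the count.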
Need to match everything after the first / and until the 2nd / or end of string. Given the following examples:
/US
/CA
/DE/Special1
/FR/Special 1/special2
Need the following returned:
US
CA
DE
FR
Was using this in DataStudio which worked:
^(.+?)/
However the same in BigQuery is just returning null. After trying dozens of other examples here, decided to ask myself. Thanks for your help.
For such a simple extraction, consider using cheaper string functions instead of the more expensive regexp functions. See the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT '/US' line UNION ALL
SELECT '/CA' UNION ALL
SELECT '/DE/Special1' UNION ALL
SELECT '/FR/Special 1/special2'
)
SELECT line, SPLIT(line, '/')[SAFE_OFFSET(1)] value
FROM `project.dataset.table`
with result
Row  line                    value
1    /US                     US
2    /CA                     CA
3    /DE/Special1            DE
4    /FR/Special 1/special2  FR
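Your regex matches one or more chars, as few as possible, from the start of the string up to the next slash and puts this value in Group 1. Then it consumes that / char. Because your strings start with a slash, the captured text either begins with that slash or, as with /US, nothing matches at all, so it does not actually match what you need.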
You can use a regex in BigQuery that matches a string partially and capture the part you need to get as a result:
/([^/]+)
It will match the first occurrence of a slash followed with one or more chars other than a slash placing the captured substring in the result you get.
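Both suggestions can be tried on the sample values outside BigQuery. The Python sketch below is only an approximation of REGEXP_EXTRACT and SPLIT, using the four lines from the question; the variable names are made up.

```python
import re

lines = ["/US", "/CA", "/DE/Special1", "/FR/Special 1/special2"]

# Regex route: the first slash, then capture everything up to the next slash.
by_regex = [re.search(r"/([^/]+)", line).group(1) for line in lines]

# String-function route: split on "/" and take the element at index 1.
by_split = [line.split("/")[1] for line in lines]

# The original pattern needs at least one char before a slash, so "/US" fails.
original_on_us = re.search(r"^(.+?)/", "/US")

print(by_regex, by_split, original_on_us)
```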
I am trying to use the value.match command in OpenRefine 2.6 to split a column in two based on a 4-digit date.
A sample of the text is:
"first sentence, second sentence, third sentences, 2009"
What I do is going to "Add column based on this column" and insert
value.match(\d{4})
but I get the error
Parsing error at offset 12: Missing number, string, identifier, regex,
or parenthesized expression
any idea of the possible solution?
You need to fix 3 things to get this working:
1) As Wiktor says you need to start & end the regular expression with a forward slash /
2) The 'match' function requires you to match the whole string in the cell, not just the fragment you need - so your regular expression needs to match the whole string
3) To extract part of a string with 'match' you need to have capture groups in your regular expression, that is, use ( ) around the bit of the regular expression you want to extract. The captured values will be put in an array, and you will need to get the string out of the array to store it in a cell
So you'll need something like:
value.match(/.*(\d{4})/)[0]
To get the four digit year from the end of the string
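The whole-string requirement and the capture-group array can be illustrated with a rough Python analogue. This is not GREL: re.fullmatch is used here as a stand-in for the behaviour of 'match' described above, and the variable names are hypothetical.

```python
import re

value = "first sentence, second sentence, third sentences, 2009"

# fullmatch forces the pattern to cover the whole string.
m = re.fullmatch(r".*(\d{4})", value)

# groups() holds the captured values; index 0 is the first capture group.
year = m.groups()[0] if m else None

# Without the leading .* the pattern no longer covers the whole string.
no_match = re.fullmatch(r"\d{4}", value)

print(year, no_match)
```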
I have a spreadsheet which I'm using for importing prices using IMPORTHTML.
The import result contains the prices with text.
I'm using REGEXEXTRACT to get the price only.
The problem is that the extracted value does not compare as equal to the same value in another cell.
For example:
The import result is:
$58.00 & FREE Shipping. Details
in cell A1 - using REGEXEXTRACT with regular_expression "[0-9][0-9].[0-9][0-9]" the result is 58.00
in cell A2 - I typed 58.00
Trying to compare the two (using IF(A1=A2...)) fails.
Any idea why and how to fix it?
Thanks
You may use the following regex extraction:
REGEXEXTRACT(<CELL>, "^\W*([\d.]+)")
See the regex demo
The "^\W*([\d.]+)" means:
^ - start of string
\W* - zero or more non-word chars (non letters, digits, underscores)
([\d.]+) - Group 1: one or more digits or dots.
As per Rubén's details, you need to cast the string value extracted with the REGEXEXTRACT to the actual value of the extracted text with =VALUE.
Formula
Try
=VALUE(REGEXEXTRACT(A1,"[0-9][0-9].[0-9][0-9]"))=A2
Explanation
REGEXEXTRACT always returns a text value. If you type 58.00 into a cell, it is very likely identified as a number.
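The text-versus-number mismatch is not specific to spreadsheets. The Python sketch below reproduces it with the sample import result: the regex returns a string, and the comparison only succeeds after an explicit conversion, which is the role VALUE plays in the formula. The variable names are made up for the illustration.

```python
import re

imported = "$58.00 & FREE Shipping. Details"

# The extracted price is a string, like the result of REGEXEXTRACT.
extracted = re.search(r"^\W*([\d.]+)", imported).group(1)

# A value typed into a cell as 58.00 is stored as a number.
typed = 58.00

text_equals_number = (extracted == typed)              # str compared to float
converted_equals_number = (float(extracted) == typed)  # both numbers

print(repr(extracted), text_equals_number, converted_equals_number)
```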
The answer for this is:
=VALUE(REGEXEXTRACT(<CELL1>,"^\W*([\d.]+)"))
and after that using:
IF(A1=A2...)