PostgreSQL - finding string using regular expression - regex

What I am looking to do is to, within Postgres, search a column for a string (an account number). I have a log table, which has a parameters column that takes in parameters from the application. It is a paragraph of text and one of the parameters stored in the column is the account number.
The position of the account number is not consistent in the text and some rows in this table have nothing in the column (since no parameters are passed on certain screens). The account number has the following format: L1234567899. So for the account number, the first character is a letter and then it is followed by ten digits.
I am looking for a way to extract the account number alone from this column so I can use it in a view for a report.
So far what I have tried is getting it into an array, but since the position changes, I cannot count on it being in the same place.
select foo from regexp_split_to_array(
(select param from log_table where id = 9088), E'\\s+') as foo

You can use regexp_match() to achieve that result.
(regexp_match(foo,'[A-Z][0-9]{10}'))[1]

Use substring to pull out the match group.
select substring ('column text' from '[A-Z]\d{10}')
Reference: PostgreSQL regular expression capture group in select
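For checking the pattern outside the database, here is a minimal Python sketch of the same extraction. The function name `extract_account` and the sample parameter text are hypothetical. The pattern is the one from the answers above, and rows with no parameters are assumed to arrive as None or an empty string.

```python
import re

# Account format from the question: one uppercase letter followed by ten digits.
ACCOUNT_RE = re.compile(r"[A-Z][0-9]{10}")

def extract_account(param):
    """Return the first account number found in param, or None if there is none."""
    if not param:  # covers None and empty strings (rows with no parameters)
        return None
    match = ACCOUNT_RE.search(param)
    return match.group(0) if match else None

# Hypothetical parameter text; the position of the account number does not matter.
print(extract_account("screen=summary acct=L1234567899 user=jdoe"))
```

Note that the pattern does not enforce word boundaries, so a letter followed by more than ten digits would still yield a match on the first ten.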

Related

Google Sheets ArrayFormula to get INITIALS of arbitrary length name

Sample sheet.
As the title says, given a column containing an arbitrary number of words of arbitrary length, I want a single ArrayFormula that returns the first letters of all words in that column.
I have tried two methods, both shown in the sample sheet.
1) Using SPLIT and ARRAYFORMULA, I can get the initials in one cell but cannot extend the formula down the column.
2) Using two REGEXEXTRACT calls, I can get the first 2 initials and extend the formula down the column.
But is it possible to get the initials for an arbitrary number of words for the whole column using ArrayFormula?
Is it possible to use REGEXEXTRACT to return the first letters of many words?
This replaces every word with the captured first letter
=ARRAYFORMULA(UPPER(REGEXREPLACE(A1:A6,"(\w)\S*\s?","$1")))
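To see what the replacement does step by step, here is a rough Python equivalent of the same regex. The function name `initials` and the sample names are hypothetical. Python's `\w` is Unicode-aware by default, so behaviour on non-ASCII names may differ from the spreadsheet.

```python
import re

def initials(name):
    # Each match is one word plus at most one trailing whitespace character;
    # the whole match is replaced by the captured first character.
    return re.sub(r"(\w)\S*\s?", r"\1", name).upper()

# Hypothetical column values of varying word counts.
for row in ["john ronald reuel tolkien", "ada lovelace", "plato"]:
    print(initials(row))
```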

OpenRefine: split a cell based on a string of 5 digits (postal code)

I am new to OpenRefine and GREL.
In an address row, I am trying to extract the city and the postal code.
The row will typically contain: 12 rue du Paradis 75012 Paris
I'd like to split this row starting from the 5-digit number (75012). After that I could easily extract the city.
In the command "Split into several columns", what Regular expression would you put (or is it another command)?
Thanks!
The 'split into several columns' takes a regular expression as an argument to specify the separator to be used when doing the split. This is probably not what you need in this case - since there isn't a common expression for the separator.
Instead you would probably be better using the "Add column based on this column" option and then using a 'match' function to create the new column. The 'match' takes a regular expression as an argument, but allows you to capture the output - so you can use this to do pattern matching in a string. In this case for example you could use something like:
value.match(/.*\s+(\d{5})\s+(.*)/)
This would capture the 5 digit number and the city in an array:
["75012","Paris"]
You could then use this to create the values you want in the new column, or in two new columns. E.g.:
value.match(/.*\s+(\d{5})\s+(.*)/)[0]
will get the number, and index [1] will get the city.
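The capture behaviour can be tried outside OpenRefine with a small Python sketch. It assumes, as in GREL's match, that the pattern must cover the whole string, so `re.fullmatch` is used. The function name `split_address` is hypothetical.

```python
import re

# Same pattern as the GREL expression: anything, whitespace,
# a 5-digit postal code, whitespace, then the city.
ADDRESS_RE = re.compile(r".*\s+(\d{5})\s+(.*)")

def split_address(value):
    """Return [postal_code, city], or None when the pattern does not match."""
    match = ADDRESS_RE.fullmatch(value)
    return list(match.groups()) if match else None

print(split_address("12 rue du Paradis 75012 Paris"))  # ['75012', 'Paris']
```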

SPLUNK: extract and rank values from single event

Hi, I have a field which has repeated groups of "<id>:<Flag>:<Rank>:<weight>:<quantity>" separated by colons.
Example, 2113:X:1:2.92400000:14100.00000:613:X:7:2.92800:96300.00000:1132:L:2:2.92750000:14300.00000
I want to extract the id corresponding to the highest weight. In the above example I would get
Id Weight
613 2.928
From my regex knowledge, I can only handle a single repetition of the group within the event, not more than that.
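The intended logic can be sketched in Python rather than as a Splunk search: split on colons, regroup every five fields into one record, and take the record with the largest weight. This only illustrates the ranking step and assumes every group has exactly five fields. The function name `top_weight` is hypothetical.

```python
def top_weight(field):
    """Return (id, weight) for the group with the highest weight, or None."""
    parts = field.split(":")
    # Regroup the flat list into records of five fields:
    # id, flag, rank, weight, quantity. Incomplete trailing groups are ignored.
    usable = len(parts) - len(parts) % 5
    records = [parts[i:i + 5] for i in range(0, usable, 5)]
    if not records:
        return None
    best = max(records, key=lambda record: float(record[3]))
    return best[0], float(best[3])

event = ("2113:X:1:2.92400000:14100.00000:613:X:7:2.92800:96300.00000:"
         "1132:L:2:2.92750000:14300.00000")
print(top_weight(event))  # ('613', 2.928)
```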

SQLite: How to split a column

I have a column containing two names, which I'd like to extract into two separate columns surname1 and surname2 (I don't need the name nor the initial letter (e.g. N.)).
An example of the content of that column is:
AwyeEaef2012 MS101 N.Lopez-O.Lorenzi.txt
Lopez and Lorenzi are the two names we are looking for in this row.
What is good about my situation is that the first name comes always after the first dot (.) and ends just before the dash (-) and the second name comes just after second dot and ends just before the third dot and txt (.txt).
I know how to write a regex and using LIKE check if that column contains some specific surname but not the opposite way- how to read surnames and write them into two new columns.
Several rows from that column look like below:
WyeEaef MN2014 MS401 N.Lopez-O.Lorenzi.txt
AwyufEQ WCH2014 OS401 N.Lorenzi-O.Lopez.txt
THAFa5u WCH2014 LS107 N.Larry-O.Lolly.txt
So the pattern is as I mentioned *.Name1-[A-Z].Name2.txt
Where * is max 30 characters of capital and small letters and numbers
It could be approached in this manner: in other words, we need to divide the string into substrings separated by dots. The first substring is waste; the second, without its last two characters (a dash and a capital letter, e.g. -O), is the first name; the third substring is the second name; and the fourth is waste again (the file format).
I'd like to have an output of three columns:
initialColumn, firstName, secondName
Here is the workaround that I wrote as a formula in Excel. I personally don't love it, but it might be useful for someone in the future.
=MID(A1;FIND(".";A1;1)+1;FIND(".";A1;FIND(".";A1;1)+1)-FIND(".";A1;1)-3)
I was surprised that Excel can process about 0.5 million records in the blink of an eye.
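As a way to check the pattern described above before porting it to SQL, here is a small Python sketch that pulls both surnames out with one regex. The function name `split_surnames` is hypothetical, and the sketch assumes surnames contain neither dots nor dashes.

```python
import re

# A dot, the first surname, a dash plus one capital initial and a dot,
# the second surname, then the ".txt" suffix at the end of the string.
SURNAMES_RE = re.compile(r"\.([^.-]+)-[A-Z]\.([^.]+)\.txt$")

def split_surnames(value):
    """Return (first_surname, second_surname), or None if the pattern is absent."""
    match = SURNAMES_RE.search(value)
    return match.groups() if match else None

rows = [
    "WyeEaef MN2014 MS401 N.Lopez-O.Lorenzi.txt",
    "AwyufEQ WCH2014 OS401 N.Lorenzi-O.Lopez.txt",
    "THAFa5u WCH2014 LS107 N.Larry-O.Lolly.txt",
]
for row in rows:
    print(row, split_surnames(row))
```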

extract number from string in Oracle

I am trying to extract a specific text from an Outlook subject line. This is required to calculate turn around time for each order entered in SAP. I have a subject line as below
SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ
My final output should be like this: 3032641559
I have been able to do this in MS Excel with a formula like this:
=IFERROR(INT(MID([#[Normalized_Subject]],SEARCH(30,[#[Normalized_Subject]]),10)),"Not Found")
In the above formula, [#[Normalized_Subject]] is the name of the column in which the SO number exists. I have been asked to do this in Oracle, but I am very new to it. Your help on this would be greatly appreciated.
Note: in the above subject line the number 30 is common in every subject line.
The last parameter of REGEXP_SUBSTR() indicates the sub-expression you want to pick. In this case you can't just match 30 then some more numbers as the second set of digits might have a 30. So, it's safer to match the following, where x are more digits.
SO# 30xxxxxx
As a regular expression this becomes:
SO#\s30\d+
where \s indicates a space, \d indicates a numeric character, and the + means that you want to match as many as there are. But we can use the sub-expression support available; to do that you need to have sub-expressions, i.e. create groups where you want to split the string:
(SO#\s)(30\d+)
Put this in the function call and you have it:
regexp_substr(str, '(SO#\s)(30\d+)', 1, 1, 'i', 2)
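The same idea can be tried quickly in Python, where a capture group plays the role of the sub-expression argument. This is only a sketch for testing the pattern, not Oracle code. The function name `sales_order` is hypothetical, and only the number is wrapped in a group since the prefix is not needed in the result.

```python
import re

def sales_order(subject):
    """Return the digits following 'SO# ' that start with 30, or None."""
    # re.IGNORECASE mirrors the 'i' match parameter in the Oracle call.
    match = re.search(r"SO#\s(30\d+)", subject, re.IGNORECASE)
    return match.group(1) if match else None

subject = "SO# 3032641559 FW: Attached new PO 4500958640- 13563 TYCO LJ"
print(sales_order(subject))  # 3032641559
```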