MariaDB: REGEXP_SUBSTR not working with a non-capturing group in the regular expression

I am using a query to pull a user ID from a column that contains text. This is for a forum system, and I want to get the user ID portion out of a text field that contains the full message. The query I am using is:
SELECT REGEXP_SUBSTR(message, '(?:member: )(\d+)') AS user_id
FROM posts
WHERE message LIKE '%quote%';
Ignoring the fact that this is ugly SQL and not final, I just need to get to the point where it returns the user ID. The following is an example of the text you would see in the message column:
`[QUOTE="Moony, post: 967760, member: 22665"]I'm underwhelmed...[/QUOTE]
Hopefully we aren’t done yet and this is nothing but a depth signing!`
Is there something different about regular expressions when used in MariaDB's REGEXP_SUBSTR? This should be PCRE, it works in regex testers, and it should read correctly. It should look for the group "member: ", then take the digits after it, giving a single match for each of those posts.

This is an ugly hack/workaround that works by using a lookbehind for "member: " and a lookahead for the following "], but it will not work if there are multiple quotes in a post:
(?<=member: )\.+(?="])
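The difference is easiest to see outside the database. The sketch below uses Python's `re` module purely as a stand-in for a PCRE-style engine (it is not MariaDB, and the second, multi-quote message is hypothetical). It shows that a non-capturing group still consumes its text, so the overall match, which is what a "return the matched substring" function such as REGEXP_SUBSTR gives back, still contains "member: "; only the capture group isolates the digits. A lookbehind keeps the prefix out of the match, and using \d+ instead of .+ removes the dependence on the trailing "] delimiter, so several quotes in one post are handled. Separately, MariaDB string literals treat the backslash as an escape character (unless the NO_BACKSLASH_ESCAPES SQL mode is set), so inside SQL the pattern generally needs a doubled backslash, e.g. '(?<=member: )\\d+'.

```python
import re

# Sample message from the question (BBCode quote from the forum).
message = (
    '[QUOTE="Moony, post: 967760, member: 22665"]I\'m underwhelmed...[/QUOTE]\n'
    "Hopefully we aren't done yet and this is nothing but a depth signing!"
)

# A non-capturing group still consumes its text: the overall match (group 0)
# includes "member: ". Only group(1) isolates the digits.
m = re.search(r"(?:member: )(\d+)", message)
whole_match = m.group(0)  # 'member: 22665'
captured = m.group(1)     # '22665'

# A lookbehind asserts the prefix without including it in the match,
# so the whole match is just the digits.
lookbehind = re.search(r"(?<=member: )\d+", message).group(0)

# Hypothetical post with two quotes: \d+ stops at the digits, so no
# '"]' lookahead is needed and every ID is found.
multi = (
    'x [QUOTE="A, post: 1, member: 11"]a[/QUOTE] '
    '[QUOTE="B, post: 2, member: 22"]b[/QUOTE]'
)
all_ids = re.findall(r"(?<=member: )\d+", multi)

print(whole_match, captured, lookbehind, all_ids)
```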

Related

Regular expressions (RegEx) to filter string from URLs in Google Analytics

I want to filter a string out of the URLs in Google Analytics. This can be done with Views > Filter > Exclude using a RegEx, but I have been unable to get it to work.
An outline of how these filters are set up can be found here; however, I cannot work out how to isolate the string using RegEx. I believe it will need to be one filter per URL type.
The URLs follow this format:
/software/11F372288FA/pagename
/software/13F412C5FA/pagename/summary
/software/XIL1P0BFXCKM81/pagename2
I need to exclude this part of the URL:
/11F372288FA/
So that the URL data (e.g. Session time) is recorded against:
/software/pagename
/software/pagename/summary
/software/pagename2
I have worked out that I can isolate the string using the following RegEx:
^\/validate\/(..........)\/accounts\/summary$
It is not very elegant and would require a filter for every URL type.
Thanks for the help!
I'm not certain this will work in your exact case, but instead of using regex it might be easier to build a new string from the start of the path through "/software/" and append everything from pagename to the end. In Java this might look something like:
String newString = oldString.substring(0, "/software/".length()) + oldString.substring(oldString.indexOf("pagename")); // keep the "/software/" prefix, then everything from "pagename" on
Note, though, that this only works if the "/software/" prefix is always the same length and you really are only removing what lies between "software" and "pagename".
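If the prefix length or the page name varies, a single regular expression can cover all three URL shapes from the question instead. The Python sketch below is only an illustration of the pattern (how it is wired into a Google Analytics filter is not shown): it keeps "/software/", drops the next path segment, and keeps everything after it. It assumes, based only on the three examples, that IDs consist of upper-case letters and digits while page names are lower case, so a URL without an ID is left alone.

```python
import re

# Assumption: the ID is the segment right after "/software/" and is made of
# upper-case letters and digits (true for the three example URLs).
ID_SEGMENT = re.compile(r"^(/software/)[0-9A-Z]+/")

def strip_id(path: str) -> str:
    """Remove the ID segment, keeping "/software/" and the rest of the path."""
    return ID_SEGMENT.sub(r"\1", path, count=1)

for p in [
    "/software/11F372288FA/pagename",
    "/software/13F412C5FA/pagename/summary",
    "/software/XIL1P0BFXCKM81/pagename2",
]:
    print(strip_id(p))
```

Unlike the fixed-offset substring approach, this does not depend on the page being literally called "pagename".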

How to use Postgres Regex Replace with a capture group

As the title says, I am trying to reference capture groups in a regex replace in a Postgres query. I have read that regexp_replace does not support using regex capture groups. The regex I am using is:
r"(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?"gm
The above regex almost does what I need, but I need it to match only when the groups around "username" also capture something. There is no situation where "username" should be matched if it just happens to be a substring of a word. By ensuring it's surrounded by one of the characters above, I can be much more confident that it's a username.
An example application of the regex in Postgres would be something like this (of course, I would be doing an UPDATE rather than a SELECT):
select *, REGEXP_REPLACE(reqcontent,'(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?' ,'NEW-VALUE', 'gm') from table where column like '%username%' limit 100;
If more context would help, please let me know. I have also found similar posts (postgresql regexp_replace: how to replace captured group with evaluated expression (adding an integer value to capture group)), but they are more about splicing values back in, and I don't think they quite answer my question.
More context, and example values for the regex to work against: the text below may look familiar, as these are JQL filters from Jira. We are looking to update our usernames and all their occurrences in the table that contains the filters. Below are a few examples of filters. We originally did a plain find-and-replace, but that doesn't work because some usernames are only two characters long and were matched inside other words (e.g. the username je would be replaced inside the word project, completely malforming the JQL and producing something like proNEW-VALUEct = blah blah).
type = bug AND status not in (Closed, Executed) AND assignee in (test, username)
assignee=username
assignee = username
Definition of Answered:
A regex that will only match a 'username' if it's surrounded by one of the special characters
A way to regex-replace that username in a Postgres query.
Capturing groups are used to keep the important bits of information matched with a regex.
Either use capturing groups around the parts of the string you want to keep in the result, and use their placeholders in the replacement:
REGEXP_REPLACE(reqcontent,'([\s\(\)\=\)\,])username([\s\(\)\=\)\,])?' ,'\1NEW-VALUE\2', 'gm')
Or use lookarounds:
REGEXP_REPLACE(reqcontent,'(?<=[\s\(\)\=\)\,])(username)(?=[\s\(\)\=\)\,]|$)' ,'NEW-VALUE', 'gm')
Or, in this case, use word boundaries to ensure you only replace username as a whole word (in PostgreSQL the word boundary is written \y):
REGEXP_REPLACE(reqcontent,'\yusername\y' ,'NEW-VALUE', 'g')
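To compare the approaches, here is a small Python sketch run against a JQL-style string. Python's `re` module stands in for PostgreSQL's engine, and the two-character username "je" and the filter text are hypothetical. One spelling difference matters: Python writes the word boundary as \b, whereas in PostgreSQL's regex flavour \b means backspace and the word boundary is \y. The delimiter set [\s(),=] is the same set as in the answer, written without the redundant escapes.

```python
import re

# Hypothetical data: "je" stands in for a two-character username, as in the question.
OLD, NEW = "je", "NEW-VALUE"
jql = "project = ABC AND assignee in (je,test) AND reporter = je"

# Same delimiter set as the answer: whitespace, parentheses, '=' and ','.
SPECIALS = r"[\s(),=]"
NAME = re.escape(OLD)

def replace_word_boundary(text: str) -> str:
    # Python's \b plays the role of PostgreSQL's \y.
    return re.sub(rf"\b{NAME}\b", NEW, text)

def replace_lookaround(text: str) -> str:
    # Require a delimiter (or the start/end of the string) on each side
    # without making it part of the match.
    return re.sub(rf"(?:(?<={SPECIALS})|^){NAME}(?={SPECIALS}|$)", NEW, text)

def replace_captured(text: str) -> str:
    # Consume the delimiters and write them back; \g<1> avoids ambiguity
    # if the replacement value starts with a digit.
    return re.sub(rf"({SPECIALS}){NAME}({SPECIALS}|$)", rf"\g<1>{NEW}\g<2>", text)

naive = jql.replace(OLD, NEW)  # also rewrites the "je" inside "project"
word_boundary = replace_word_boundary(jql)
lookaround = replace_lookaround(jql)
captured = replace_captured(jql)

# Two usernames sharing one delimiter.
adjacent = "assignee in (je,je)"
adjacent_lookaround = replace_lookaround(adjacent)
adjacent_captured = replace_captured(adjacent)

for result in (naive, word_boundary, lookaround, captured,
               adjacent_lookaround, adjacent_captured):
    print(result)
```

The last two results show one practical difference. In "(je,je)" the capture-group version consumes the comma after the first username, so the second occurrence has no delimiter left in front of it and is skipped; lookarounds and word boundaries do not consume the delimiters, so they replace both. Word boundaries are also looser than the answer's list: any non-word character (a dot, for example) counts as a boundary.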

How to extract sub-directories from the URL using 'REGEXP_EXTRACT' in Data Studio

I'm trying to extract the product name from the URL between the two slashes using REGEXP_EXTRACT. For example, I want to extract ace-5 from the URLs below:
www.abc.com/products/phones/ace-5/
www.abc.com/products/phones/ace-5/?cid=dm66363&bid
www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7
www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130
I have a RegEx that extracts the domain name, but that is not what I'm actually looking for. Below is the RegEx:
REGEXP_EXTRACT(page,'^[^.]+.([^.]+)')
It gives the following result: abc
Assuming that the product name is always the fourth path element (counting the host), we can try:
REGEXP_EXTRACT(page, '(?:[^\/]+\/){3}([^\/]+).*')
or, if the above would not work:
REGEXP_EXTRACT(page, '[^\/]+\/[^\/]+\/[^\/]+\/([^\/]+).*')
I don't have the same page in my GDS (Google Data Studio), so I tried to recreate it with my own data source, i.e. pages from Google Analytics.
You may use the formula below, which returns the path segment that follows the first two segments (here products/phones/), as per your requirement.
REGEXP_EXTRACT(Page,'[^/]+/[^/]+/([^/]+)')
You need to create a calculated column with this formula. Once you have created it, you may need to add a filter to remove the rows where it is null.
Example Page: "/products/phones/ace-5/"
The calculated column value will be "ace-5".
Note that this regex only gives you the word after phones/; if a page has nothing after that, it returns null.
The REGEXP_EXTRACT calculated field below does the trick, extracting all characters after the 3rd / up to the next /:
REGEXP_EXTRACT(Page, "^(?:[^/]+/){3}([^/]+)")
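The answers differ mainly in how many leading segments they skip, which depends on whether the field holds a full URL (as in the question) or only a path (as in Google Analytics' Page dimension). The Python sketch below imitates REGEXP_EXTRACT by returning the first capture group of the first match, or None where Data Studio would return null; `regexp_extract` is just a local helper. Data Studio's regex functions use RE2 syntax rather than Python's `re`, but the constructs used here (negated character classes, a non-capturing group and {3}) behave the same way in both.

```python
import re
from typing import Optional

def regexp_extract(text: str, pattern: str) -> Optional[str]:
    """Local stand-in for REGEXP_EXTRACT: first capture group of the first match, else None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

URLS = [
    "www.abc.com/products/phones/ace-5/",
    "www.abc.com/products/phones/ace-5/?cid=dm66363&bid",
    "www.abc.com/products/phones/ace-5/?fbclid=iwar30dpnmmpwppnla7",
    "www.abc.com/products/phones/ace-5/?et_cid=em_367029&et_rid=130",
]

# Anchored: skips exactly three "segment/" pieces (host, "products/", "phones/").
FULL_URL = r"^(?:[^/]+/){3}([^/]+)"
# Unanchored: skips two segments wherever they first occur; suited to paths.
PATH_ONLY = r"[^/]+/[^/]+/([^/]+)"

for url in URLS:
    print(regexp_extract(url, FULL_URL))  # ace-5 for each URL

page = "/products/phones/ace-5/"
print(regexp_extract(page, PATH_ONLY))     # ace-5
print(regexp_extract(page, FULL_URL))      # None: the leading "/" leaves an empty first segment
print(regexp_extract(URLS[0], PATH_ONLY))  # phones: here the host counts as a segment
```

So the anchored pattern suits full URLs, and the two-segment pattern suits a Page field that starts with "/".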

How do I use regex to return text following specific prefixes?

I'm using an application called Firemon, which uses regex to pull text out of various fields. I'm unsure which regex flavor it uses; I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be a sentence that may span multiple lines, and the other fields will be strings of 1-2 words. There is always a line break after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix followed by a line break should work, because there is a line break after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't find one that works.
EDIT: To be clear, I'm trying to get the value after each prefix. Firemon has a section for each field; "APP", for example, is a field. I need a regex that finds "APP:" and returns the text after it. So something as simple as a regex that identifies "APP:" and grabs everything after the : and before the line break would probably work.
You can use (?=\w+ )(.*)
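The positive lookahead makes the match start at the first word that is followed by a space. Because each prefix is followed by a colon rather than a space, the prefix is skipped and each match holds only the text after it. This relies on the value itself containing a space, so a single-token value such as the CM number will not match.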
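A Python sketch can show both the lookahead pattern above and a per-field alternative that anchors each prefix at the start of a line, which is closer to what the question asks for. Python's `re` is only a stand-in: which regex flavour Firemon uses, and whether it returns the whole match or the first group, is not stated in the question. The second JST line in the sample record is hypothetical, added to show a value that spans lines, and `field` is an illustrative helper, not a Firemon feature.

```python
import re
from typing import Optional

# Sample record in the question's format; the second JST line is hypothetical.
RECORD = (
    "CM: 12345\n"
    "APP: App Name\n"
    "BZU: Dept Name\n"
    "REQ: First Last\n"
    "JST: Text text text text.\n"
    "More justification text."
)

# The lookahead answer: the match starts at the first word followed by a space,
# so a single-token value such as "12345" never matches.
LOOKAHEAD = re.compile(r"(?=\w+ )(.*)")

def field(prefix: str, text: str) -> Optional[str]:
    """Prefix at a line start, optional spaces/tabs, then the value."""
    name = re.escape(prefix)
    if prefix == "JST":
        # JST may span lines and is the last field: DOTALL lets '.' cross
        # line breaks and \Z is the end of the whole string.
        pattern, flags = rf"^{name}:[ \t]*(.*)\Z", re.MULTILINE | re.DOTALL
    else:
        # Without DOTALL, '.' stops at the line break after the value.
        pattern, flags = rf"^{name}:[ \t]*(.*)$", re.MULTILINE
    m = re.search(pattern, text, flags)
    return m.group(1) if m else None

for p in ("CM", "APP", "BZU", "REQ", "JST"):
    print(p, repr(field(p, RECORD)))
```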
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]*?)\s*;
will match on:
jst:anything in here;
and result in
anything in here
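Reading the sample as jst:\s*([^;]*?)\s*; (a lazy capture of everything up to the next semicolon, with surrounding whitespace left out of the group), a quick Python check confirms the result described above. Note that it relies on each value ending with a semicolon, which the question's format does not have. The second test string and the use of re.IGNORECASE (to accept the question's upper-case JST) are assumptions for illustration.

```python
import re

# FireMon-style sample: the value runs lazily up to the next ';', and any
# whitespace before the ';' is left out of the captured group.
JST = re.compile(r"jst:\s*([^;]*?)\s*;", re.IGNORECASE)

simple = JST.search("jst:anything in here;").group(1)
# Hypothetical: an upper-case prefix and a value spanning two lines.
# [^;] also matches '\n', so the value may cross line breaks.
multi = JST.search("JST: first line\nsecond line ;").group(1)

print(repr(simple))
print(repr(multi))
```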

How do I extract a postcode from one column in SSIS using regular expression

I'm trying to use a custom Regex Clean transformation (information found here) to extract a post code from a mixed address column (Address3) and move it to a new column (Post Code).
Example of incoming data:
Address3: "London W12 9LZ"
Incoming data could be any combination of place names with a post code at the start, middle or end (or not at all).
Desired outcome:
Address3: "London"
Post Code: "W12 9LZ"
Essentially, in plain English: "move (not copy) any post code found in Address3 into Post Code".
My regex skills aren't brilliant, but I've managed to extract the post code into its own column using the following regex, matching from Address3 and replacing into Post Code:
Match Expression:
(?<stringOUT>([A-PR-UWYZa-pr-uwyz]([0-9]{1,2}|([A-HK-Ya-hk-y][0-9]|[A-HK-Ya-hk-y][0-9] ([0-9]|[ABEHMNPRV-Yabehmnprv-y]))|[0-9][A-HJKS-UWa-hjks-uw])\ {0,1}[0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}|([Gg][Ii][Rr]\ 0[Aa][Aa])|([Ss][Aa][Nn]\ {0,1}[Tt][Aa]1)|([Bb][Ff][Pp][Oo]\ {0,1}([Cc]\/[Oo]\ )?[0-9]{1,4})|(([Aa][Ss][Cc][Nn]|[Bb][Bb][Nn][Dd]|[BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]|[Ss][Tt][Hh][Ll]|[Tt][Dd][Cc][Uu]|[Tt][Kk][Cc][Aa])\ {0,1}1[Zz][Zz])))
Replace Expression:
${stringOUT}
So this leaves me with:
Address3: "London W12 9LZ"
Post Code: "W12 9LZ"
My next thought is to keep the above match/replace, then add another to match anything that doesn't match the above regex. I think it might be a negative lookahead but I can't seem to make it work.
I'm using SSIS 2008 R2, and I think the Regex Clean transformation uses the .NET regex implementation.
Thanks.
Just solved this. As usual, the logic was simpler than I thought it would be. Instead of trying to match the non-post-code text and replace it with itself, I added another line that matches the post code again and replaces it with "".
So in total, I have:
Match the post code using the above regex and move it to the Post Code column
Match the post code using the above regex and replace it with "" in the Address3 column
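The two-step approach (match once to copy the post code out, match again to replace it with an empty string) can be sketched in Python. To keep the example short it uses a deliberately simplified, hypothetical UK postcode pattern rather than the full .NET pattern from the question, and the sample addresses are made up. The whitespace tidy-up is an addition: removing a post code from the middle or end of Address3 leaves stray spaces behind.

```python
import re
from typing import Optional, Tuple

# Simplified, hypothetical UK postcode pattern (NOT the full pattern from the question).
POSTCODE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b", re.IGNORECASE)

def split_address(address3: str) -> Tuple[str, Optional[str]]:
    # Step 1: match the post code and copy it into the Post Code column.
    m = POSTCODE.search(address3)
    post_code = m.group(0) if m else None
    # Step 2: match the post code again and replace it with "" in Address3,
    # then collapse the whitespace the removal leaves behind.
    remainder = " ".join(POSTCODE.sub("", address3).split())
    return remainder, post_code

for a in ["London W12 9LZ", "W12 9LZ London", "Shepherds Bush W12 9LZ London", "London"]:
    print(split_address(a))
```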