How to extract big MGRS using regex

I have an input json:
{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}
I would like to use a regular expression to extract the MGRS at 1000-meter (1 km) precision; in my example the result should be: 04QFJ1267
The first 2 symbols are always digits, the next 3 are always letters, and the rest are always digits. An MGRS always has a fixed length of 15 characters.
Is it possible?
Thanks.

All you really need to do is remove characters 8-10 and 13-15 of the MGRS. If you want or need to do that using regex, you could use a regex replace (edited so that the rest of the string is removed as well):
.*?(\w{7})\d{3}(\d{2})\d+.*
and replacement string:
$1$2
I see now you are using Java. So the relevant code line might look like:
resultString = subjectString.replaceAll(".*?(\\w{7})\\d{3}(\\d{2})\\d+.*", "$1$2");
The above assumes all your strings look like what you showed, and there is no need to test to be sure that "mgrs" is in the string.
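For comparison, here is a minimal JavaScript sketch of the same replacement, run on the sample input from the question. The `toOneKm` helper and the `JSON.parse` variant are illustrative assumptions, not part of the answer above; they check the full 15-character layout before trimming.

```javascript
// Sample input copied from the question.
const json = '{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}';

// Same pattern as the answer: keep the first 7 characters and digits 11-12,
// and drop everything else in the string.
const viaReplace = json.replace(/.*?(\w{7})\d{3}(\d{2})\d+.*/, "$1$2");

// Hypothetical helper: validate the 2 digits + 3 letters + 10 digits layout,
// then keep only the leading 2 digits of the easting and of the northing.
function toOneKm(mgrs) {
  const m = /^(\d{2}[A-Z]{3}\d{2})\d{3}(\d{2})\d{3}$/.exec(mgrs);
  return m ? m[1] + m[2] : null;
}

// Variant: read the field from the parsed JSON instead of the raw text.
const viaParse = toOneKm(JSON.parse(json).mgrs);

console.log(viaReplace, viaParse);
```

Parsing first avoids accidental matches in other fields, for example an id that happens to contain 12 or more digits.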

Related

Split complex string into multiple parts using regex

I've tried a lot to split this string into something I can work with, but my experience isn't enough to reach the goal. I went through the first 3 pages of Google results, which helped but still didn't give me an idea of how to do this properly:
I have a string which looks like this:
My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#
The result should be:
My Dogs
213,220
Gallery
635,210
Screenshot
219,530
Good Morning
412,408
Does anyone have an idea how to use regex to split the string as shown above?
Given the shared patterns, it seems you're looking for a regex like the following:
[A-Za-z ]+|\d+,\d+
It matches two patterns:
[A-Za-z ]+: any combination of letters and spaces
\d+,\d+: any combination of digits + a comma + any combination of digits
If you want a stricter regex, you can wrap the previous pattern in a lookbehind and a lookahead, so that every match is preceded by a comma, a # or the start of the string, and followed by a comma, a # or the end of the string.
(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)
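As a rough illustration, both patterns can be applied in JavaScript with a global match. This is a sketch using the sample string from the question. The variable names are made up for the example, and the strict version assumes an engine that supports lookbehind.

```javascript
// Sample string from the question.
const input = "My Dogs,213,220#Gallery,635,210#Screenshot,219,530#Good Morning,412,408#";

// Loose pattern: runs of letters/spaces, or two numbers joined by a comma.
const parts = input.match(/[A-Za-z ]+|\d+,\d+/g) ?? [];

// Strict pattern: the same alternatives, but each match must sit between
// delimiters (start of string, comma or # before; comma, # or end after).
const strictParts = input.match(/(?<=^|,|#)([A-Za-z ]+|\d+,\d+)(?=,|#|$)/g) ?? [];

// One part per line, as in the desired result.
console.log(parts.join("\n"));
```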

How to apply correct regex?

I have a special task which requires lots of regex and JavaScript parsing.
My head is almost exploding, so maybe I'm tired and missed some small thing. I'm not a newbie to regex, so perhaps someone can point me in the right direction and show me where I made a mistake.
So I have this regex code:
((?<=\ffmpg=).+(?=////u0026cs=nt))
to get the substring between two strings. The match should start right after the first string, ffmpg=, and end just before the second string, //u0026cs=nt
The problem is that it works fine only while the html page contains a single parameter with that name; the real source html contains tens of ffmpg parameters, each followed by the same end string cs=nt.
I cannot make the regex count characters either, because the number of characters differs on every visit to the html page, sometimes by +3, sometimes by +10. So the only way is to get this string from the start of param1 to the end of param2.
This is the string I need to get: 1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012
This is the source html example:
\u0026doc=IcuU5Oy8\u0026pen=V9PXaHoOp1gKD25rgAg\u0026ffmpg=1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012\u0026cs=nt\u0026token=gHgig8eLY3qsQ0bXa\\u0026doc=IcuU5Oy8\u0026pen=V9PXaHoOp1gKD25rgAg\u0026ffmpg=1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012\u0026cs=nt\u0026token=gHgig8eLY3qsQ0bXa\\u0026doc=IcuU5Oy8\u0026pen=V9PXaHoOp1gKD25rgAg\u0026ffmpg=1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012\u0026cs=nt\u0026token=gHgig8eLY3qsQ0bXa\
I have copied the same block 3 times for this purpose, because the real html source is very big and I doubt I can upload it here.
Thanks for your help.
In your question you use (?<=\ffmpg=), where \f matches a form feed character, which is not present in the example data. If you meant \\f, it would match a literal \f, which is not present in the example data either.
You could get the match using a capturing group instead of lookarounds, as lookbehinds are not supported by all browsers.
If you just want to get a single match, you can omit the /g global flag.
If you use .+ you will match too much: .+ first matches to the end of the string and then backtracks until the first position where \\u0026cs=nt can match, which is the last occurrence in the string.
What you could do instead is be specific about what you allow to match, which for the current string is the character class [AC0-9%]+
You could broaden the character class with a range, for example A-Z instead of AC, and add more characters or ranges as required.
ffmpg=([AC0-9%]+)\\\\u0026cs=nt
For example
const regex = /ffmpg=([AC0-9%]+)\\\\u0026cs=nt/;
const str = `\\\\u0026doc=IcuU5Oy8\\\\u0026pen=V9PXaHoOp1gKD25rgAg\\\\u0026ffmpg=1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012\\\\u0026cs=nt\\\\u0026token=gHgig8eLY3qsQ0bXa\\\\\\\\u0026doc=IcuU5Oy8\\\\u0026pen=V9PXaHoOp1gKD25rgAg\\\\u0026ffmpg=1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012\\\\u0026cs=nt\\\\u0026token=gHgig8eLY3qsQ0bXa\\\\\\\\u0026doc=IcuU5Oy8\\\\u0026pen=V9PXaHoOp1gKD25rgAg\\\\u0026ffmpg=1714248%2C23851735%2C23804281%2C23839597%2C23357901%2C3313341%2C3316343%2C23848795%2C3300132%2C26853996%2C3300114%2C3315790%2C23857451%2C23856472%2C23851936%2C3300161%2C3314786%2C23856652%2C23859863%2C23837993%2C23833479%2C23861502%2C23842630%2C23842986%2C23861012\\\\u0026cs=nt\\\\u0026token=gHgig8eLY3qsQ0bXa\\\\`;
console.log(str.match(regex)[1]);
Try this:
(?<=ffmpg=)([A-F0-9%]+)
Explanation
Since your string consists only of URL-encoded characters, you can use the [A-F0-9%]+ character class to capture it. The match stops where the next parameter starts, because the backslash there is not in the class.
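Because the page contains many ffmpg parameters, it may help to collect every value rather than only the first. The sketch below uses a capturing group instead of the lookbehind, so it does not depend on lookbehind support. The shortened `html` sample is invented for the example and only mimics the shape of the source shown in the question.

```javascript
// Made-up, shortened sample in the same shape as the source html.
// String.raw keeps the backslashes literal, so the text contains \u0026.
const html = String.raw`\u0026pen=V9PX\u0026ffmpg=1714248%2C23851735\u0026cs=nt\u0026token=gHg\u0026ffmpg=3313341%2C3316343\u0026cs=nt`;

// Global match; group 1 holds the value after each "ffmpg=".
// The class excludes the backslash, so each value ends before \u0026cs=nt.
const values = Array.from(html.matchAll(/ffmpg=([A-F0-9%]+)/g), (m) => m[1]);

console.log(values);
```

If only the first value is needed, `values[0]` is enough; the character class may need widening if the real data contains other characters.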

Get last characters up to specific character

Let's say I have a string something-123.
I need to get the last 5 (or fewer) characters of it, but only up to - if there is one in the string, so the result would be thing; if the string has no - in it, like something123, the result would be ng123, and if the string is like 123, the result would be 123.
I know how to match the last 5 characters:
/.{5}$/
I know how to match everything up to the first -:
/[^-]*/
But I cannot figure out how to combine them, and to make things worse I need to get the match without extracting it from specific groups and similar advanced regex stuff, because I want to use it in SQL Anywhere. Please help.
Thank you all for the help, but it looks like a complete regex solution is going to be too complicated for my problem, so I did it very simply: SELECT right(regexp_substr('something-123', '[^-]*'), 5).
One option is to group the result:
(.{4})-
Now you have captured the result but without the -.
Or using lookarounds you can:
.{4}(?=-)
which matches any 4 characters that appear before "-".
You can use:
.{5}(?=(?:-[^-]*)?$)
We match 5 characters other than a newline, only before the last - in the string or at the very end of the string (the lookahead (?=(?:-[^-]*)?$)). You only need to collect the matches; there is no need to check groups/submatches.
UPDATE
To match any 1 to 5 characters other than a hyphen before the first hyphen (if present in the string), you can use
([^-]{1,5})(?:(?:-[^-]*)*)?$
The trailing non-capturing group consumes any sequences of - followed by non-hyphens that come after the expected substring, so the value we need is in Group 1.
A faster alternative:
^[^-]*?([^-]{1,5})(?:-|$)
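This regex skips as few non-hyphen characters as possible from the start of the string, then captures the 1 to 5 non-hyphen characters that come right before the first - or the end of the string.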
Note that here, the value we need is in Group 1.
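To see how the Group 1 value is read, here is a small JavaScript sketch that wraps the anchored pattern in a helper and runs it on the three inputs from the question. The name `lastUpToFive` is hypothetical, and returning an empty string when nothing matches is an assumption made for the example.

```javascript
// Hypothetical helper around the anchored pattern.
function lastUpToFive(s) {
  // Lazily skip non-hyphens, then capture 1-5 non-hyphens that are
  // immediately followed by the first "-" or by the end of the string.
  const m = /^[^-]*?([^-]{1,5})(?:-|$)/.exec(s);
  return m ? m[1] : ""; // assumption: empty string when there is no match
}

for (const s of ["something-123", "something123", "123"]) {
  console.log(s, "->", lastUpToFive(s));
}
```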
How about:
(.{5})(?:-[^-]+)?$
The result is in group 1
Try this regex:
(.{1,5})(?:-.*|$)
Group 1 has the result you need

Regex Match terms in between delimiters

I'd say I'm getting the hang of Regex, but when it comes to extracting data, I'm lost. Here are the inputs I have to parse through:
Format:
String(String,...String,Integer)
Ex.
Jeff(White,Male,24)
Mark Zuckerberg(Facebook,9)
Grocery(Eggs,Cheese,Pancake,Bread,Milk,Strawberry,0)
I want to match the Strings and the Integer, but not the commas or parentheses.
This one is a bit easy because the strings don't have symbols in them, but the other day I needed to extract the word cake out of something like this:
<Header><Body><font=Tahoma,15pt><b>cake <\b><\font> and whenever I'd try, I'd match the entire statement, not just the cake word, because I'd do like:
.*<b>[a-zA-Z]+<\b>.*. So yeah... the whole concept of using Regex to extract bits of a string is foreign to me. How is it usually done in these two examples?
Try the following.
(?<=<b>)\s*cake\s*(?=<\\b>)
If you want to match a word other than cake, try the following.
(?<=<b>)\s*\w+\s*(?=<\\b>)
Regex to match the whole string in the first part of your question (String(String,...,Integer)):
^[\w ]+\((\w+,)+\d+\)$
In the first part of your question, if you want to match only the words and the number (Grocery, Eggs, ..., 0) in your string, try the following.
(?<=^|\(|\,)\w+
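A quick way to see what this last pattern extracts is to run it with a global match. The sketch below does that in JavaScript on the inputs from the question. The `terms` helper is a made-up name, and the pattern relies on lookbehind support in the regex engine.

```javascript
// A word counts as a term when it follows the start of the string, "(" or ",".
const termPattern = /(?<=^|\(|,)\w+/g;

// Hypothetical helper: return all terms, or an empty array if none match.
function terms(s) {
  return s.match(termPattern) ?? [];
}

console.log(terms("Jeff(White,Male,24)"));
console.log(terms("Grocery(Eggs,Cheese,Pancake,Bread,Milk,Strawberry,0)"));
console.log(terms("Mark Zuckerberg(Facebook,9)"));
```

Note that \w does not match a space, so for "Mark Zuckerberg(Facebook,9)" only "Mark" is picked up from the name; widening \w+ to [\w ]+ is one possible adjustment if names with spaces are expected.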

Reverse assertion to match every part of string that contains specified word (RegEx)

I would like to match every part of string that is limited by: ^.*|| or ||.*|| or ||.*$ and contains text Alpha.
For example:
BetaAlpha||Omega||AlphaBeta||Alpha||Omega
Right now I have something like this:
(?<=\|\||^)(((?!((?<=\|\||^)*Alpha(\|\||$)*)).)*)(?=$|\|\|)
http://rubular.com/r/yLwFDllJaf
It matches everything except the parts with "Alpha". I would like to reverse it, but can't figure out how.
The other values in this string can be dynamic; only the parts with Alpha occur in every string (but in random combinations).
This regex matches your example on Rubular:
(?<=\|\||^)[^|\s]*Alpha[^|\s]*(?=\|\||$)
I would use this regex:
(?<=^|\|\|)[^|]*Alpha[^|]*(?=\|\||$)
Explanation and demonstration here: http://regex101.com/r/fL8sK7
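As a quick check, the pattern can be run in JavaScript against the example string from the question. This is only a sketch: the variable names are invented, and the split-and-filter line is a lookbehind-free alternative added for comparison, not something proposed in the answers.

```javascript
// Example string from the question.
const line = "BetaAlpha||Omega||AlphaBeta||Alpha||Omega";

// Parts delimited by "||" (or the string boundaries) that contain "Alpha".
const withAlpha = line.match(/(?<=^|\|\|)[^|]*Alpha[^|]*(?=\|\||$)/g) ?? [];

// Alternative without regex lookarounds: split on "||", keep parts with "Alpha".
const viaSplit = line.split("||").filter((part) => part.includes("Alpha"));

console.log(withAlpha, viaSplit);
```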
If the input cannot contain a single "|", the previous answers are fine.
If the input CAN contain a single "|" (e.g. "Alpha|Beta||Gamma||Delta", with "Alpha|Beta" being a valid match), then this should work (tested in .NET and Rubular):
(?<=\||^)(?!\|)((?!\|\|).)*Alpha((?!\|\|).)*
Tested with "BetaAlpha||Omega||Alpha|Beta||Alpha||Omega||TestAlpha"
Output:
"BetaAlpha", "Alpha|Beta", "Alpha", "TestAlpha"
EDIT: removed unnecessary parentheses and tested in Rubular.
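The answer reports testing in .NET and Rubular; the following is a sketch of the same pattern in JavaScript on the same test string, assuming an engine with lookbehind support. The variable names are illustrative only.

```javascript
// Test string from the answer; "Alpha|Beta" contains a single pipe.
const data = "BetaAlpha||Omega||Alpha|Beta||Alpha||Omega||TestAlpha";

// ((?!\|\|).)* consumes characters one at a time as long as the next two
// are not "||", so a single "|" inside a part is allowed.
const segments = data.match(/(?<=\||^)(?!\|)((?!\|\|).)*Alpha((?!\|\|).)*/g) ?? [];

console.log(segments);
```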