Parse ddMMMyy date string with regex in Scala

I wanted to make a regex such that the following date can be matched and its elements passed to another function:
"21Feb14"
Now the problem is the first two digits. The user can write a date in which the 'day' field is one-digit long OR two-digit long:
"21feb14" and "1jan13"
both are valid inputs.
The regex I made looks like this:
val reg = """(\\d)([a-zA-Z][a-zA-Z][a-zA-Z])(\d\d)""".r
It clearly does not take into consideration that the first digit may or may not exist. How do I handle that?

The ? quantifier handles that by making the preceding digit optional, like this:
(\d?\d)([a-zA-Z][a-zA-Z][a-zA-Z])(\d\d)
But I suggest you use the following regex:
(\d?\d)([a-zA-Z]{3})(\d\d)
Or with a POSIX character class:
(\d?\d)([\p{Alpha}]{3})(\d\d)

This one is far more readable and maintainable:
val reg = """(\d{1,2})([a-zA-Z]{3})(\d{2})""".r
Explanation here: http://regex101.com/r/uZ9qI5
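For illustration, here is a minimal sketch of the same pattern used to split the date into its elements. It is written in Python rather than Scala, and the function name parse_ddmmmyy is made up for this example. It only checks the shape of the string; it does not validate the month name or the day range.

```python
import re

# Same pattern as the answer above: 1-2 digit day, 3-letter month, 2-digit year.
DATE_RE = re.compile(r"(\d{1,2})([a-zA-Z]{3})(\d{2})")


def parse_ddmmmyy(text):
    """Return (day, month, year), or None if text is not shaped like ddMMMyy."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return None
    day, month, year = match.groups()
    return int(day), month, int(year)


print(parse_ddmmmyy("21Feb14"))
print(parse_ddmmmyy("1jan13"))
```

Using fullmatch rather than search means the whole input has to be a date, so a string with a three-digit day is rejected instead of being partially matched.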

Related

Regex pattern for number or three letters

I need to sort a table field with different kinds of values:
number from 0 to 999+
group of three letters like AAA, AAB, AAC, AAD, etc.
StupidTable.js enables me to add a custom alphanumeric data type, but I'm not able to define the regex pattern.
I tried this code:
$("table").stupidtable({
  "alphanum": function(a, b) {
    console.log(a, b)
    var pattern = "^[a-zA-Z0-9_.-]*$";
    var re = new RegExp(pattern);
    var aNum = re.exec(a).slice(1);
    var bNum = re.exec(b).slice(1);
    return parseInt(aNum, 10) - parseInt(bNum, 10);
  }
})
but it doesn't work. You can check the issue on this page by clicking on the "nr" tab: Test
Try something like this:
const regexPattern = /^[\d\w]{3}/gm;
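This pattern matches the first three word characters at the start of each line, so it captures a 3-digit number or a 3-letter code. Note that \w already covers digits and the underscore, so \d is redundant and mixed values such as A1_ are accepted too. If you want to capture 0 and not 000, you will need to replace {3} with {1,3}, but this will also capture A instead of AAA.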
You might also consider normalizing your data in some ways, e.g. converting A to AAA and 0 to 000. This could be helpful for a number of reasons assuming your variable type is a string and not actually a number type. Does that make sense?
You can see how I've created this pattern at the link below, and try some tweaks to make it work well for you. I use this tool a lot and it will also generate some code for you in different languages. Good luck with your project, let me know how it goes.
Regex101.com
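As a rough sketch of how the two kinds of values could be told apart for sorting, the Python snippet below builds a sort key from two separate patterns. The ordering (numbers first, then letter codes, then anything else) is an assumption, since the question does not say which should come first, and sort_key is a made-up name.

```python
import re

NUMBER_RE = re.compile(r"\d+")            # 0, 12, 999, 1000, ...
LETTERS_RE = re.compile(r"[A-Za-z]{3}")   # AAA, AAB, ...


def sort_key(value):
    """Key that sorts numbers numerically, then three-letter codes alphabetically."""
    value = value.strip()
    if NUMBER_RE.fullmatch(value):
        return (0, int(value), "")
    if LETTERS_RE.fullmatch(value):
        return (1, 0, value.upper())
    # Anything that fits neither shape goes last (assumed behaviour).
    return (2, 0, value)


cells = ["AAB", "10", "2", "AAA", "999"]
print(sorted(cells, key=sort_key))
```

Comparing the numbers as integers avoids the usual string-sort problem where "10" lands before "2".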

Regex to insert space with certain characters but avoid date and time

I made a regex which inserts a space wherever any of the characters -:\*_/;, is present. For example,
JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c
should become
JET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
The regex I used is (?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)
I have added exceptions for some words, such as a/c and w/d. The \D conditions are there to keep date/time values from being separated, but this created an issue: numbers followed by the above-mentioned characters never get split.
My requirements are:
1. Insert a space after the characters -:\*_/;,
2. Date and time values, which may contain / or :, should not get split
3. Words like a/c and w/d need to be exceptions
The following is the full code:
Private Function formatColon(oldString As String) As String
Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)" '"(\D:|\D/|\D-|^w/d)"
Dim newString As String: newString = reg.Replace(oldString, "$1 ")
formatColon = XtraspaceKill(newString)
End Function
I would use 3 replacements.
Replace all date and time special characters with a special macro that should never be found in your text, e.g. for 05/15/2018 4:06 PM, something based on your name:
05MANUMOHANSLASH15MANUMOHANSLASH2018 4MANUMOHANCOLON06 PM
You can encode exceptions too, like this:
aMANUMOHANSLASHc
Now run your original regex to replace all special characters.
Finally, unreplace the macros MANUMOHANSLASH and MANUMOHANCOLON.
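The three replacements can be sketched roughly as follows. This is Python rather than VBA, and several things are assumptions made for the example: the placeholder tokens, the date and time shapes (such as 05/05/05 and 05:05:05), and the function names. The placeholders must never appear in real input and must not contain any of the characters being spaced.

```python
import re

# Hypothetical placeholder tokens standing in for the macros described above.
SLASH = "\x00SLASH\x00"
COLON = "\x00COLON\x00"

# Assumed shapes: dates like 05/05/05 or 05/15/2018, times like 4:06 or 05:05:05.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
# Word exceptions taken from the question.
WORD_RE = re.compile(r"\b(?:a/c|w/d|m/s|s/w|m/o)\b", re.IGNORECASE)
# Any of - : \ * _ / ; , that is not already followed by whitespace or the end.
SPECIAL_RE = re.compile(r"([-:\\*_/;,])(?!\s|$)")


def protect(match):
    """Hide the / and : inside one protected match behind placeholders."""
    return match.group(0).replace("/", SLASH).replace(":", COLON)


def format_separators(text):
    # Step 1: protect dates, times and the word exceptions.
    for pattern in (DATE_RE, TIME_RE, WORD_RE):
        text = pattern.sub(protect, text)
    # Step 2: insert a space after every remaining special character.
    text = SPECIAL_RE.sub(r"\1 ", text)
    # Step 3: restore the protected characters.
    return text.replace(SLASH, "/").replace(COLON, ":")


print(format_separators(r"JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c"))
```

Each pattern stays small and can be adjusted on its own, which is the main advantage over a single combined regex.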
Meanwhile, let me tell you why this is complicated in a single regex.
If trying to do this in a single regex, you have to ask, for each / or :, "Am I a part of a date or time?"
To answer that, you need to use lookahead and lookbehind assertions, the latter of which Microsoft has finally added support for.
But given a /, you don't know if you're between the first and second, or second and third parts of the date. Similar for time.
The number of cases you need to consider will render your regex unmaintainably complex.
So please just use a few separate replacements :-)

How can I match all strings unless it contains a certain string?

I want to match every string in this list except the ones that contain a product SKU, which looks like /s7892632 (an s followed by a random string of digits). I've been trying to do this for quite some time and have been unsuccessful. Any insight would be greatly appreciated.
/account/login?returnurl=/account/forgotpassword
/account/login?returnurl=/account/orders
/account/orders
/account/updateaddress
/account/updateemail
/account/updaterewardscard
/brands/havaianas
/careers
/Category List
/checkout
/checkout/addresses
/checkout/addresses/delivery
/checkout/addresses/deliverymethod
/checkout/affilinetbasket
/checkout/anonymous
/checkout/confirmation
/checkout/express
/checkout/login
/checkout/login?returnurl=/checkout/addresses
/checkout/null
/checkout/payment
/checkout/paypal
/checkout/quickshop/
/checkout/verify
/click-and-collect
/click-and-collect/click-and-collect-overview
/corporate/about-matalan
/corporate/careers
/corporate/cookies
/corporate/history
/customer-services/accessibility
/customer-services/contact
/customer-services/customer-services-home
/customer-services/delivery
/customer-services/faq
/customer-services/fitting-room
/customer-services/here-to-help
/customer-services/size-guides
/delivery
/events/mothers-day
/events/mothers-day/s2516241/tassle-detail-slouch-bag
/events/mothers-day/s2518752/waxed-jacket
/events/mothers-day/s2519237/fabric-buckle-tote-bag
/events/mothers-day/s2521182/heart-print-nightie
/events/mothers-day/s2521184/heart-print-dressing-gown
/events/mothers-day/s2521185/heart-print-pyjama-set
/events/mothers-day/s2521679/structured-tote-bag
/events/mothers-day/s2522143/chiffon-print-dress
/events/mothers-day/s2522347/butterfly-enamel-bowl-32cm-x-8cm
/events/mothers-day/s2526013/animal-print-jersey-blazer
/events/mothers-day/s2527624/croc-tote-bag
/events/mothers-day/s2529731/shift-dress
/events/mothers-day?page=1&size=120&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
/events/mothers-day?page=2&size=120&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
/events/mothers-day?page=2&size=36&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
/events/mothers-day?page=3&size=36&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
The following should work:
^(?!.*/s\d{7}/).*
Example: http://regexr.com?343nf
This assumes you have each string as a separate element in a list. If this is actually matching one big string with multiple lines you can use the same regex, but you may need to enable global and multiline options depending on the tool you are using (and make sure dotall/singleline is disabled).
Try this:
boolean noSku = !line.matches(".*/s\\d{5,}.*");
This uses {5,}, which matches five or more digits in the SKU (giving you flexibility with matching). You can change the number to whatever suits.
This matches lines that don't contain the SKU code:
^((?!s\d{7}).)*$
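To compare the approaches above, here is a small Python sketch that filters a few of the paths from the question two ways: by negating a plain search for the SKU, and with the negative-lookahead pattern. It assumes each path is a separate string and that a SKU is always an s followed by exactly seven digits between slashes; the variable names are made up.

```python
import re

SKU_RE = re.compile(r"/s\d{7}/")
LOOKAHEAD_RE = re.compile(r"^(?!.*/s\d{7}/).*")

paths = [
    "/checkout/verify",
    "/events/mothers-day",
    "/events/mothers-day/s2516241/tassle-detail-slouch-bag",
    "/events/mothers-day/s2529731/shift-dress",
    "/events/mothers-day?page=1&size=120&cols=4&sort=&id=/events/mothers-day",
]

# Option 1: search for the SKU and negate the result.
no_sku = [p for p in paths if not SKU_RE.search(p)]

# Option 2: let the negative lookahead reject paths that contain a SKU.
no_sku_lookahead = [p for p in paths if LOOKAHEAD_RE.fullmatch(p)]

print(no_sku)
print(no_sku == no_sku_lookahead)
```

When the surrounding code can negate a match itself, the first option tends to be easier to read; the lookahead is mainly useful where only a single pattern can be supplied.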

Regexp: Keyword followed by value to extract

I have had this question a couple of times before, and I still couldn't find a good answer.
In my current problem, I have a console program output (string) that looks like this:
Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
Now I want to extract those numbers and to check if there were failures. (That's a gacutil.exe output, btw.) In other words, I want to match any number [0-9]+ in the string that is preceded by 'failures = '.
How would I do that? I want to get the number only. Of course I can match the whole thing like /failures = [0-9]+/ .. and then trim the first characters with length("failures = ") or something like that. The point is, I don't want to do that, it's a lame workaround.
It's odd, because if my pattern-to-match-but-not-into-output ("failures = ") comes after the thing I want to extract ([0-9]+), there is a way to do it:
pattern(?=expression)
To show the absurdity of this, if the whole file was processed backwards, I could use:
[0-9]+(?= = seruliaf)
... so, is there no forward-way? :T
pattern(?=expression) is a regex positive lookahead. What you are looking for is a positive lookbehind, which is written (?<=expression)pattern, but this feature is not supported by all regex flavors; it depends on which language you are using.
For more information, see the comparison of lookaround support at regular-expressions.info (scroll about two thirds of the way down the page).
If your console output does actually look like that throughout, try splitting the string on "=" when the word "failure" is found, then get the last element (or the 2nd element). You did not say what your language is, but any decent language with string splitting capability would do the job. For example
gacutil.exe.... | ruby -F"=" -ane "print $F[-1] if /failure/"
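As an illustration in one flavor that does support it, Python's re module accepts fixed-width lookbehind, so the number can be matched on its own. The sketch below also shows the capture-group alternative, which works in practically every flavor. The sample output is copied from the question; the variable names are made up.

```python
import re

output = """Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3"""

# Lookbehind: the match itself is only the digits.
lookbehind = re.search(r"(?<=failures = )\d+", output)

# Capture group: the match includes the keyword, group 1 is only the digits.
captured = re.search(r"failures = (\d+)", output)

failures_a = int(lookbehind.group())
failures_b = int(captured.group(1))
print(failures_a, failures_b)
```

The capture group avoids trimming the prefix by hand, so it is a reasonable fallback when lookbehind is not available.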

Regex to replace string with another string in MS Word?

Can anyone help me with a regex to turn:
filename_author
to
author_filename
I am using MS Word 2003 and am trying to do this with Word's Find-and-Replace. I've tried the use wildcards feature but haven't had any luck.
Am I only going to be able to do it programmatically?
Here is the regex:
([^_]*)_(.*)
And here is a C# example:
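using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        string test = "filename_author";
        // Swap the part before the first underscore with the rest.
        string result = Regex.Replace(test, @"([^_]*)_(.*)", "$2_$1");
        Console.WriteLine(result); // author_filename
    }
}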
Here is a Python example:
from re import sub
test = "filename_author"
result = sub(r'([^_]*)_(.*)', r'\2_\1', test)
Edit: In order to do this in Microsoft Word using wildcards use this as a search string:
(<*>)_(<*>)
and replace with this:
\2_\1
Also, please see Add power to Word searches with regular expressions for an explanation of the syntax I have used above:
The asterisk (*) returns all the text in the word.
The less than and greater than symbols (< >) mark the start and end of each word, respectively. They ensure that the search returns a single word.
The parentheses and the space between them divide the words into distinct groups: (first word) (second word). The parentheses also indicate the order in which you want search to evaluate each expression.
Here you go:
s/^([a-zA-Z]+)_([a-zA-Z]+)$/\2_\1/
Depending on the context, that might be a little greedy.
Search pattern:
([^_]+)_(.+)
Replacement pattern:
$2_$1
In .NET you could use ([^_]+)_([^_]+) as the regex and then $2_$1 as the substitution pattern, for this very specific type of case. If you need more than 2 parts it gets a lot more complicated.
Since you're in MS Word, you might try a non-programming approach. Highlight all of the text, select Table -> Convert -> Text to Table. Set the number of columns at 2. Choose Separate Text At, select the Other radio, and enter an _. That will give you a table. Switch the two columns. Then convert the table back to text using the _ again.
Or you could copy the whole thing to Excel, construct a formula to split and rejoin the text and then copy and paste that back to Word. Either would work.
In C# you could also do something like this.
string[] parts = "filename_author".Split('_');
return parts[1] + "_" + parts[0];
You asked about regex of course, but this might be a good alternative.
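The same split-and-rejoin idea can be sketched in Python as well. The function name swap_parts is made up, and splitting only at the first underscore is an assumption about how names with several underscores should be handled.

```python
def swap_parts(name, sep="_"):
    """Move the text before the first separator to the end."""
    first, found, rest = name.partition(sep)
    if not found:
        # No separator present: leave the name unchanged.
        return name
    return rest + sep + first


print(swap_parts("filename_author"))
```

Unlike indexing into the result of a plain split, partition does not fail when the separator is missing.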