Regex for any number of alphanumeric phrases between two '.' - regex

I'm having a hard time phrasing this question correctly when researching solutions, so I thought I would ask here. I'm trying to validate a field in my UI where a user enters a string in "Java-package" format. A correct example would be "com.my.app.class1". However, it needs to be the full package path, so I don't want to accept '*' in the string. I'm trying to find a way to represent this in regex to validate it. My first thought was to split the string into pieces using . as the delimiter (var splitArray : any[] = packageInput.split('.')), then iterate over the array and check each piece against a regex. However, I wanted to know if I could do it all in one regex.

Something as simple as ^\w+(\.\w+)*$ will validate strings of the type you've described, as long as each dot-separated segment consists only of letters, digits, or _.
It matches all of:
class1
com.my.class1
com.my.app.class1
com.my.app.sub.class1
and doesn't match:
com.my.app.*
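For illustration, here is a minimal Python sketch of the same check (the question's UI code is TypeScript, but the pattern itself carries over). The names PACKAGE_RE and is_package_path are made up for this example. It uses re.fullmatch instead of ^...$ and re.ASCII so that \w means only [A-Za-z0-9_].

```python
import re

# One or more word characters, then zero or more ".segment" groups.
# re.ASCII restricts \w to [A-Za-z0-9_].
PACKAGE_RE = re.compile(r"\w+(\.\w+)*", re.ASCII)


def is_package_path(text: str) -> bool:
    """Return True if text is a dot-separated list of non-empty word segments."""
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return PACKAGE_RE.fullmatch(text) is not None


for candidate in ["class1", "com.my.app.class1", "com.my.app.*", "com..app"]:
    print(candidate, is_package_path(candidate))
```

Note that this also accepts segments starting with a digit (for example "1com.app"), which is not a legal Java identifier; tighten the segment pattern if that matters.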

Related

Regex Match terms in between delimiters

I'd say I'm getting the hang of regex, but when it comes to extracting data, I'm lost. Here are the inputs I have to parse:
Format:
String(String,...String,Integer)
Ex.
Jeff(White,Male,24)
Mark Zuckerberg(Facebook,9)
Grocery(Eggs,Cheese,Pancake,Bread,Milk,Strawberry,0)
I want to match the Strings and the Integer, but not the commas or parentheses.
This one is fairly easy because the strings don't have symbols in them, but the other day I needed to extract the word cake out of something like this:
<Header><Body><font=Tahoma,15pt><b>cake <\b><\font> and whenever I'd try, I'd match the entire statement, not just the cake word, because I'd do like:
.*<b>[a-zA-Z]+<\b>.*. So yeah... the whole concept of using Regex to extract bits of a string is foreign to me. How is it usually done in these two examples?
Try the following.
(?<=<b>)\s*cake\s*(?=<\\b>)
If you want to match a word other than cake, try the following.
(?<=<b>)\s*\w+\s*(?=<\\b>)
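An alternative to lookarounds, and probably the more common way to pull one piece out of a longer string, is to match the surrounding tags and read the wanted part from a capture group. A small Python sketch, using the sample markup from the question (the variable names are just for this example):

```python
import re

# Sample input from the question; a raw string keeps the backslashes literal.
markup = r"<Header><Body><font=Tahoma,15pt><b>cake <\b><\font>"

# Match the tags around the word, but capture only the word itself.
# In the pattern, \\ matches one literal backslash, so <\\b> matches <\b>.
match = re.search(r"<b>\s*(\w+)\s*<\\b>", markup)
word = match.group(1) if match else None
print(word)  # cake
```

The whole match still spans the tags, but group(1) holds only the word. This avoids the problem in the question, where .* on both sides made the match swallow the entire line.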
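Regex to match the whole string in the first part of your question (String(String,...,Integer)). Note \d+ so that multi-digit integers such as 24 match, and the space in the first character class so that names such as Mark Zuckerberg match:
^[\w ]+\((\w+,)+\d+\)$
If you would like to match only the words and the number (Grocery, Eggs, ..., 0) in the first part of your question, try the following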
(?<=^|\(|\,)\w+
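The lookbehind above relies on an engine that accepts alternatives of different widths; Python's re module does not. A two-step sketch that works there: first check the overall shape with capture groups, then split the middle part on commas. RECORD_RE and parse_record are hypothetical names for this example, and the record format is assumed to be exactly the one shown in the question.

```python
import re

# name ( string, string, ..., integer )
RECORD_RE = re.compile(r"([\w ]+)\(((?:\w+,)+)(\d+)\)")


def parse_record(line):
    """Split 'Name(a,b,...,n)' into (name, [strings], integer), or None."""
    m = RECORD_RE.fullmatch(line)
    if m is None:
        return None
    name, middle, number = m.groups()
    # middle looks like "White,Male," so drop the trailing comma before splitting.
    strings = middle.rstrip(",").split(",")
    return name, strings, int(number)


print(parse_record("Jeff(White,Male,24)"))
print(parse_record("Mark Zuckerberg(Facebook,9)"))
```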

How to extract big mgrs using regex

I have an input json:
{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}
I would like to use a regular expression to extract the MGRS at 1000-meter precision; in my example the result should be: 04QFJ1267
The first 2 symbols are always digits, the next 3 are always letters, and the rest are always digits. The MGRS always has a fixed length of 15 characters.
Is it possible?
Thanks.
All you really need to do is remove characters 8-10 and 13-15. If you want/need to do that using regex, then you could use the replace method with regex: (EDIT Edited to remove the rest of the string).
.*?(\w{7})\d{3}(\d{2})\d+.*
and replacement string:
$1$2
I see now you are using Java. So the relevant code line might look like:
resultString = subjectString.replaceAll(".*?(\\w{7})\\d{3}(\\d{2})\\d+.*", "$1$2");
The above assumes all your strings look like what you showed, and there is no need to test to be sure that "mgrs" is in the string.
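If you would rather validate the shape of the value than trust the input, one option is to parse the JSON first and apply a stricter pattern to the "mgrs" field only. A Python sketch of that idea (the answer above is Java; MGRS_RE and truncate_mgrs are names invented here, and the 2 digits + 3 letters + 10 digits layout is taken from the question):

```python
import json
import re

raw = '{"id":12345,"mgrs":"04QFJ1234567890","code":"12345","user":"db3e1a-3c88-4141-bed3-206a"}'

# 2 digits + 3 letters, then keep 2 of 5 easting digits and 2 of 5 northing digits.
MGRS_RE = re.compile(r"(\d{2}[A-Za-z]{3})(\d{2})\d{3}(\d{2})\d{3}")


def truncate_mgrs(mgrs):
    """Drop characters 8-10 and 13-15 of a 15-character MGRS string."""
    m = MGRS_RE.fullmatch(mgrs)
    if m is None:
        raise ValueError("unexpected MGRS format: " + repr(mgrs))
    return "".join(m.groups())


mgrs = json.loads(raw)["mgrs"]
print(truncate_mgrs(mgrs))  # 04QFJ1267
```

Because the pattern is applied to the field value alone, other numeric fields in the JSON (such as "id" or "code") cannot be matched by accident.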

Create a cell array in matlab

I have a file of tweets that I have read into matlab using dataread, and I have stored each line in a 30x1 cell. I was wondering if there was a way to take each hashtag out, store them in their own cell, and then find the average length of a hashtag? Any help would be greatly appreciated.
You have the right idea, I think, with your regexp call. I will just clarify a few things. If you want the text in every hashtag in the tweet, you would want to use regexp to search for the pound sign (#) and include every character after that, until you reach the end of the word, e.g.
text = '#this #is a #test';
regexpi(text,'\<#[a-z0-9_]*\>','match');
ans =
'#this' '#is' '#test'
where regexpi is a case-insensitive regexp, and the regex searches for '#' followed by any number of letters, digits, or underscores (which are, I believe, the valid hashtag characters). The 'match' flag makes the regexp function return the actual matches.
If you don't want the actual hashtag in the final text, you could use regex look-behinds to return only the text. For instance:
regexpi(text,'\<(?<=#)[a-z0-9_]*\>','match')
ans =
'this' 'is' 'test'
I think, technically, a hashtag must start with a letter, so this regex would return potentially invalid hashtags. It's not difficult to sort that out though.
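The question also asks for the average hashtag length, which the snippets above do not cover. A rough sketch of the whole pipeline in Python rather than MATLAB, assuming the tweets are already available as a list of strings (hashtags and average_length are names made up for this example):

```python
import re

# '#' followed by one or more letters, digits, or underscores; capture the text only.
HASHTAG_RE = re.compile(r"#(\w+)", re.ASCII)


def hashtags(lines):
    """Collect the text of every hashtag (without the '#') from all lines."""
    found = []
    for line in lines:
        found.extend(HASHTAG_RE.findall(line))
    return found


def average_length(tags):
    """Mean number of characters per hashtag; 0.0 if there are none."""
    return sum(len(tag) for tag in tags) / len(tags) if tags else 0.0


tags = hashtags(["#this #is a #test", "no tags here"])
print(tags)  # ['this', 'is', 'test']
print(average_length(tags))
```

In MATLAB the same idea would be to concatenate the per-line matches and average their lengths.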

Looking for a regular expression solution

I'm looking for a regular expression that matches the first two specific fields (variable strings written in Perl). In a file a line without a comment # starts with any character, length unspecific followed by a whitespace and another nonspecific length string followed by a whitespace: name info data1 data2 data3.
The following works for matching the second field only but I want the first two fields to match exactly: /^[^#].*\s$INFO\s/ where $INFO="info". I tried variations of the above to no avail. My first attempt was this: /^[^#]$NAME\s$INFO\s/ which seemed logical to me if $NAME="name" for the above record.
My first attempt was this: /^[^#]$NAME\s$INFO\s/
This won't work because [^#] has to consume one character, and (as implied by the question) $NAME begins at the very start of an uncommented line, so there is no character before it to consume. You just need to remove that first [^#]:
/^$NAME\s$INFO\s/
Which will match the string:
"$NAME $INFO <whatever or nothing>"
Although I'm not a regex expert, this may work (I also am not clear on the precise details of the question so I made some assumptions):
'$NAME=name #$INFO=info $DATA=data1 data2 data3'.replace(/#[\S]+/g,'').match(/\$[\S]+/g);
This returns an array. The first 2 elements are the 'fields' i.e. [0]='$NAME=name' AND [1]='$DATA=data1'
Hope that helps at all. And apologies to the gods for my regex.
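To illustrate the /^$NAME\s$INFO\s/ approach from the first answer outside Perl, here is a small Python sketch that builds the same anchored pattern from two variables. The function name starts_with_fields is hypothetical. re.escape guards against regex metacharacters in the field values; in Perl, \Q...\E serves the same purpose.

```python
import re


def starts_with_fields(line, name, info):
    """True if line begins with name, one whitespace, info, one whitespace."""
    # Escape the values so characters like '.' or '+' are matched literally.
    pattern = "^" + re.escape(name) + r"\s" + re.escape(info) + r"\s"
    return re.search(pattern, line) is not None


print(starts_with_fields("name info data1 data2 data3", "name", "info"))
print(starts_with_fields("#name info data1 data2 data3", "name", "info"))
```

A commented line fails automatically, because the ^ anchor forces the first field to start at column one, where the comment line has a # instead.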

Is this regex correct to denote only strings with min length of 3 and max length of 6?

Rules for the regex in English:
min length = 3
max length = 6
only letters from ASCII table, non-numeric
My initial attempt:
[A-Za-z]{3-6}
A second attempt
\w{3-6}
This regex will be used to validate input strings from an HTML form (i.e. validating an input field).
A modification to your first one would be more appropriate
\b[A-Za-z]{3,6}\b
The \b mark the word boundaries and avoid matching for example 'abcdef' from 'abcdefgh'. Also note the comma between '3' and '6' instead of '-'.
The problem with your second attempt is that it would include numeric characters as well, it again has no word boundaries, and the hyphen between '3' and '6' is incorrect.
Edit: The regex I suggested is helpful if you are trying to match words within some text. For validation, where you want to decide whether an entire string meets your criteria, you will have to use
^[A-Za-z]{3,6}$
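The difference between the two patterns is easy to see side by side. A short Python sketch, with names (is_valid_field, find_words) invented for this example; fullmatch plays the role of ^...$ here:

```python
import re

LENGTH_RE = re.compile(r"[A-Za-z]{3,6}")


def is_valid_field(value):
    """Validation: the whole input must be 3 to 6 ASCII letters."""
    return LENGTH_RE.fullmatch(value) is not None


def find_words(text):
    """Search: pick out the 3-to-6-letter words inside a longer text."""
    return re.findall(r"\b[A-Za-z]{3,6}\b", text)


print(is_valid_field("abcdef"))    # True
print(is_valid_field("abcdefgh"))  # False
print(find_words("ab abc abcdef abcdefgh"))  # ['abc', 'abcdef']
```

For a form field, the validating version is the one you want.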
I don't know which regex engine you are using (this would be useful information in your question), but your initial attempt, once the quantifier is corrected to {3,6}, will still find a match inside any alphabetic string of three or more characters. You'll want to include word-boundary markers such as \<[A-Za-z]{3,6}\>.
The markers vary from engine to engine, so consult the documentation for your particular engine (or update your question).
First one should be modified as below
([A-Za-z]{3,6})
The second one will allow numbers, which I think you don't want?
The first one should work once the quantifier is written as {3,6}; the second one will include digits as well, but you want to check non-numeric strings.