How to make a regular expression looking for a list of extensions separated by a space - regex

I want to be able to take a string of text from the user that should be formatted like this:
.ext1 .ext2 .ext3 ...
Basically, I am looking for a dot, a string of alphanumeric characters of any length, a space, and then the same again, repeated. I am a little confused about how to say "I need a period, a string of characters, and a space". The last extension could be followed by nothing, a single space, or a series of spaces. I guess the extensions in between could also be separated by any number of spaces?
EDIT: I made it clearer what I was looking for.
Thanks!

Try this:
^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$
(Rubular)
In a Java string literal you need to escape the backslashes:
"^(?:\\.[A-Za-z0-9]+ +)*\\.[A-Za-z0-9]+ *$"

(\.\w+)\s* Match this and get your results.
^((\.\w+)\s*)*$ Check this and if it's true, your String is exactly what you want.
For the last pattern thing, you can't (AFAIK) do both getting all extensions (separated) and checking that the last is followed by other things. Either you check your string, or you extract the extensions from it.
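The "check first, then extract" approach can be sketched in Python as follows. This is only an illustration under my own assumptions: the language choice, the function name parse_extensions and the sample input are not from the thread. It validates with the stricter pattern from the first answer, then extracts with \.\w+.

```python
import re

# Step 1: validate the whole string (pattern from the first answer).
VALID = re.compile(r"^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$")
# Step 2: pull out each extension. This is the extraction pattern from this
# answer without the trailing \s*, so only the extension itself is returned.
ONE_EXT = re.compile(r"\.\w+")

def parse_extensions(text):
    """Return the list of extensions in text, or raise ValueError if malformed."""
    if not VALID.match(text):
        raise ValueError(f"not a space-separated extension list: {text!r}")
    return ONE_EXT.findall(text)

if __name__ == "__main__":
    print(parse_extensions(".txt .jpg  .png "))
```

Keeping the two steps separate avoids relying on a repeated capturing group, which in most engines keeps only its last iteration.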

I'd start with something like: ^\.[a-z0-9]+([\t\n\v ]+\.[a-z0-9]+)*$ (the dots are escaped so that they match a literal dot rather than any character)

Related

Regular expression to check strings containing a set of words separated by a delimiter

As the title says, I'm trying to build up a regular expression that can recognize strings with this format:
word!!cat!!DOG!! ... Phone!!home!!
where !! is used as a delimiter. Each word must have a length between 1 and 5 characters. Empty words are not allowed, i.e. no strings like !!,!!!! etc.
A word can only contain alphabetical characters between a and z (case insensitive). After each word I expect to find the special delimiter !!.
I came up with the solution below, but since I need to add other checks (e.g. words can contain spaces), I would like to know if I'm on the right track.
(([a-zA-Z]{1,5})([!]{2}))+
Also note that empty strings are not allowed, hence the use of +
Help and advice are very welcome since I have just started learning how to build regular expressions. I ran some tests using http://regexr.com/ and it seems to be okay, but I want to be sure. Thank you!
Examples that shouldn't match:
a!!b!!aaaaaa!!
a123!!b!!c!!
aAaa!!bbb
aAaa!!bbb!
Splitting the string and using the values between the !!
It depends on what you want to do with the regular expression. If you want to match the values between the !!, here are two ways:
Matching with groups
([^!]+)!!
[^!]+ requires at least 1 character other than !
!! instead of [!]{2} because it is the same but much more readable
Matching with lookahead
If you only want to match the actual word (and not the two !), you can do this by using a positive lookahead:
[^!]+(?=!!)
(?=) is a positive lookahead. It requires everything inside, i.e. here !!, to be directly after the previous match. It however won't be in the resulting match.
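To compare the two approaches side by side, here is a small Python sketch (my choice of language; the sample string is a shortened version of the one in the question). Both calls should return the same list of words.

```python
import re

text = "word!!cat!!DOG!!"

# Capturing group: each match includes "!!", but findall returns only group 1.
with_group = re.findall(r"([^!]+)!!", text)

# Lookahead: "!!" must follow the word but is not part of the match itself.
with_lookahead = re.findall(r"[^!]+(?=!!)", text)

print(with_group)
print(with_lookahead)
```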
Validating the string
If you however want to check the validity of the whole string, then you need something like this:
^([^!]+!!)+$
^ start of the string
$ end of the string
It requires the whole string to consist only of ([^!]+!!), repeated one or more times.
If [^!] does not fit your requirements, you can of course replace it with [a-zA-Z] or similar.
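As a rough check, the anchored pattern with the question's constraints plugged in (1 to 5 letters per word) can be tried against the question's own examples. This is a Python sketch and the helper name is_valid is made up for illustration.

```python
import re

# Whole-string check: one or more words of 1-5 letters, each followed by "!!".
VALID = re.compile(r"^(?:[a-zA-Z]{1,5}!!)+$")

def is_valid(text):
    """Return True if text consists only of 1-5 letter words, each ending in '!!'."""
    return VALID.match(text) is not None

if __name__ == "__main__":
    samples = ["word!!cat!!DOG!!", "a!!b!!aaaaaa!!", "a123!!b!!c!!",
               "aAaa!!bbb", "aAaa!!bbb!", "!!", ""]
    for s in samples:
        print(repr(s), is_valid(s))
```

Without the ^ and $ anchors, a search would still find a valid-looking piece inside strings such as aAaa!!bbb, which is why the anchors matter for validation.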

Regex exact length of whole string

I want to match a string of exactly length 3. I am using the following regex
("\\d?[A-Za-z]{2,3}\\d?")
Here the string can have 1 digit either at the start or at the end, or the string can have 3 letters. Is there any way to define the length of the matching string, like:
("(\\d?[A-Za-z]{2,3}\\d?){3}") // it does not work
I have another solution for it.
("(\\d[A-Za-z]{2})|([A-Za-z]{2}\\d)|([A-Za-z]{3})")
But I just want to know if there is any way to define length of whole matching string.
^.{3}$
If this isn't really your answer, you need to specify it better. You have zero solutions, not several. What exactly are you trying to match? Give a couple of examples.
http://www.regexplanet.com/advanced/java/index.html
^(\d[a-zA-Z]{2}|[a-zA-Z]{2}\d|[a-zA-Z]{3})$
If you want that letters and numbers thing.
If you want the extra stuff at the end to be possible without the string being over you can just look for the space afterwards.
^(\d[a-zA-Z]{2}|[a-zA-Z]{2}\d|[a-zA-Z]{3})\s
From the comments:
So it's
^[^\s]{3}\s\d{7}\s.\d{6}
? -- '^' start of line, '[^\s]' not a space. '{3}' three of those. '\s' a space. '\d' a digit. '{7}' seven of those. '\s' a space. '.' some character. '\d' a digit. '{6}' of those.
Regex is basically just a programmatic way of describing what you're looking for. If you can properly state what you want to match, it's easy to write that directly in regex.
Your three solutions will also match longer strings. I suggest using a word boundary (\b) or line boundaries (^ and $):
\b([a-zA-Z]{2}\d|\d[a-zA-Z]{2}|[a-zA-Z]{3})\b
or
^([a-zA-Z]{2}\d|\d[a-zA-Z]{2}|[a-zA-Z]{3})$
based on the specific usage.
EDIT: fixed the regex, which previously also matched 3 digits.
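For comparison, here is a Python sketch of the anchored alternation next to one possible way of pinning the total length directly. The lookahead variant is not from this thread; it is my own assumption about what "define the length of the whole match" could look like, and the sample strings are made up.

```python
import re

# Alternation from the answers above, anchored to the whole string.
ALTERNATION = re.compile(r"^(\d[a-zA-Z]{2}|[a-zA-Z]{2}\d|[a-zA-Z]{3})$")

# Not from the thread: a lookahead that requires exactly 3 characters in
# total, followed by the asker's original, looser pattern.
LOOKAHEAD = re.compile(r"^(?=.{3}$)\d?[A-Za-z]{2,3}\d?$")

samples = ["1ab", "ab1", "abc", "123", "1ab2", "ab", "abcd", "1a2"]
results = {s: (bool(ALTERNATION.match(s)), bool(LOOKAHEAD.match(s)))
           for s in samples}

if __name__ == "__main__":
    for s, (alt, look) in results.items():
        print(s, alt, look)
```

On these samples the two patterns agree, so which one to use is mostly a matter of readability.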

REGEX for any file extension

I am trying to build a regex to tell if a string is a valid file extension. It could be any extension.
hello No
.hello Yes
..hello No
hello.world No
.hello.world No
.hello world No
I have tried ^\. and ^\.[\.] but can't get what I am looking for. This seems like it should be simple.
^\.[^.]+$
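This means: start with a dot, followed by one or more characters other than a dot. Note that whitespace is allowed too, so .hello world from the question would still match.
You can also use this one if you want only alphanumeric characters: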
^\.[a-zA-Z0-9]+$
Try this regex:
^\.[\w]+$
Matches a string starting with a ".", followed by one or more "word" character(s), until the end of the string.
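One way to compare the three patterns suggested so far is to run them over the six examples from the question. The Python sketch below does that; the dictionary labels and the helper name failures are made up for illustration.

```python
import re

# The six examples from the question and whether each should be accepted.
EXAMPLES = {
    "hello": False, ".hello": True, "..hello": False,
    "hello.world": False, ".hello.world": False, ".hello world": False,
}

PATTERNS = {
    "no further dots": r"^\.[^.]+$",
    "alphanumeric only": r"^\.[a-zA-Z0-9]+$",
    "word characters": r"^\.[\w]+$",
}

def failures(pattern):
    """Return the examples that pattern classifies differently from the question."""
    return [s for s, want in EXAMPLES.items()
            if bool(re.match(pattern, s)) != want]

if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        print(label, failures(pattern))
```

The negated-class pattern accepts ".hello world" because a space is not a dot; the other two reject it.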
Try this regex, which matches all strings starting with a dot followed by one or more characters that are not dots:
^\.[^.]+$
If you already have a string like ".hello" with the extension and you're just testing it to see if it matches then you can try something like ^\.[^\\/:*?"<>|\s.]{1,255}$. It works with all of your example cases.
The beginning ^\. means the whole string must start with a literal dot "."
The [^\\/:*?"<>|\s.] means that after the dot you can have any character except a backslash, forward slash, colon, asterisk, question mark, double quotation mark, less than or greater than symbol, vertical bar, whitespace character, or dot. Feel free to add whatever other characters you'd like to disallow inside the square brackets after the caret, or delete any characters that I added that you wish to allow.
(Note: the allowable characters for filenames/extensions depends on the file system.)
The {1,255}$ at the end quantifies the number of allowable characters that we just defined, all the way to the end of the string. So whatever comes after the dot can be between 1 and 255 allowed characters long, and it must run to the end of the string. Feel free to change the 255 to any number that you like.
(Note: the maximum length for filenames/extensions depends on the file system.)
If you are searching a string like "https://sub.example.com/directory1/directory2/file.php" for the file extension you should instead use \.[^\\/:*?"<>|\s.]{1,255}$ to search for the final extension including the dot.
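Both variants of this pattern, the anchored one for validation and the unanchored one for searching, can be tried out with a short Python sketch. The language and the constant names are my own choices; the URL is the one from the answer.

```python
import re

# Character class from the answer: anything except \ / : * ? " < > |,
# whitespace, or a dot, repeated 1 to 255 times.
EXT_CHARS = r'[^\\/:*?"<>|\s.]{1,255}'

VALIDATE = re.compile(r"^\." + EXT_CHARS + r"$")  # whole string is an extension
FIND_LAST = re.compile(r"\." + EXT_CHARS + r"$")  # extension at the end of a longer string

url = "https://sub.example.com/directory1/directory2/file.php"
match = FIND_LAST.search(url)
found = match.group(0) if match else None

if __name__ == "__main__":
    print(found)
    for s in ["hello", ".hello", "..hello", "hello.world", ".hello.world", ".hello world"]:
        print(repr(s), bool(VALIDATE.match(s)))
```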
This works for me in javascript
^[.][a-zA-Z0-9.,$;]+$
I use:
(?:.*\\)+([^\\]+)
on Windows, because it captures the short file name together with its extension.
If you are looking for a regex to get the file extension from a filename, here it is
(?<=\.)[^.\s]+$
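The last two patterns can be illustrated with a brief Python sketch. The file name and the Windows path are hypothetical sample inputs, not taken from the thread.

```python
import re

# Lookbehind: match the characters after the last dot, without the dot itself.
ext = re.search(r"(?<=\.)[^.\s]+$", "archive.tar.gz")

# Windows path pattern from the earlier answer: group 1 is everything after
# the last backslash, i.e. the file name with its extension.
name = re.match(r"(?:.*\\)+([^\\]+)", r"C:\Users\me\report.txt")

if __name__ == "__main__":
    print(ext.group(0))
    print(name.group(1))
```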

Regex all characters except string

I want to select all space characters except those preceded by the string "Send,".
A look-ahead using (?!) will not work. What is another way to do this?
Sounds like a negative lookbehind should suffice. If the string "Send," immediately precedes the spaces you want to exclude, then it would be:
(?<!Send,)\s
If the string doesn't come directly before the space then your options could depend a bit on your particular regex flavour, since many do not support variable length look behinds.
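A minimal Python sketch of the fixed-length case, with a made-up sample sentence. It replaces every matched whitespace character with an underscore so the effect is visible.

```python
import re

text = "Send, this to Bob"

# Replace every whitespace character unless "Send," sits immediately before it.
result = re.sub(r"(?<!Send,)\s", "_", text)

if __name__ == "__main__":
    print(result)  # Send, this_to_Bob
```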

Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
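Here is how the two expressions behave on the example string, sketched in Python rather than C# (my choice; the helper name split_fragment is hypothetical). The first expression always matches, while the second needs a # to be present.

```python
import re

def split_fragment(ref):
    """Return (before, after) around the first '#'; after is None without a '#'."""
    before = re.match(r"^([^#]*)", ref).group(1)  # first expression
    second = re.search(r"#(.*)$", ref)            # second expression
    return before, (second.group(1) if second else None)

if __name__ == "__main__":
    print(split_fragment("topics/install.xml#id_install"))
    print(split_fragment("topics/install.xml"))
```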
[a-zA-Z0-9]*[\#]
If your string contains any other special characters you need to add them into the first square bracket escaped.
I don't use C#, but I will assume that it uses PCRE... if so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*"
The parens define the 'keep group'; the [^#] means any character that is not a '#'
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers
return the entire matched sequence as group(0), the first paren-matched group as group(1),
and so forth.
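The difference between the greedy and the negated-class version shows up as soon as the input has more than one '#'. A small Python sketch with a made-up input:

```python
import re

text = "a#b#c"

# Greedy .* backs off only as far as the last '#', so group 1 keeps "a#b".
greedy = re.match(r"(.*)#.*", text).group(1)

# [^#]* cannot cross a '#', so group 1 stops at the first one.
negated = re.match(r"([^#]*)#.*", text).group(1)

if __name__ == "__main__":
    print(greedy, negated)
```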
PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always have it in your programming toolkit.
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*) Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-collecting group to test the second half, and if it finds it, returns everything after the pound.
Honestly though, for your situation, it is probably easier to use the Split method provided in String.
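The optional-section pattern can be checked with a few lines of Python (not C#, so treat it as a sketch; the helper name parts is made up and single-line input is assumed). Group 2 comes back as None when there is no '#'.

```python
import re

# Group 1: everything before the first '#'. Group 2: everything after it,
# or None when the input contains no '#'.
PARTS = re.compile(r"([^\#]*)(?:\#(.*))?")

def parts(s):
    """Return (before, after) for s, where after may be None."""
    m = PARTS.match(s)  # always matches, since every piece is optional
    return m.group(1), m.group(2)

if __name__ == "__main__":
    print(parts("topics/install.xml#id_install"))
    print(parts("topics/install.xml"))
```

In Python, the counterpart of the Split suggestion would be s.partition("#"), which also copes with a missing '#'.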
More on lookahead and lookbehind
first:
/[^\#]*(?=\#)/ edit: is faster than /.*?(?=\#)/
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
string[] split = exampleString.Split('#');
string firstString = split[0];
// Guard against input without a '#', where split has only one element.
string secondString = split.Length > 1 ? split[1] : "";