Extract text before _ in a string using regex


I have a large number of strings that start like DD_filename.
How can I extract the characters before _ using a regular expression?
I tried learning from here, and it says that a.b will retrieve characters starting from a and ending on b.
I tried ^._ in the same way, but it is not working for me.

^._ will only match one character before _. Try this pattern:
^.*?(?=_)
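A quick JavaScript check of why ^._ fails on DD_filename and what the suggested pattern returns. This is a minimal sketch; the helper name `beforeUnderscore` and the sample strings are illustrative only.

```javascript
// The asker's attempt: ^._ means "start, exactly one character, then _".
// "DD_filename" has two characters before the underscore, so nothing matches.
const attempt = /^._/;

// The suggested pattern: lazy .*? followed by the lookahead (?=_), so the
// underscore must be present but is not part of the match.
const prefixPattern = /^.*?(?=_)/;

// Hypothetical helper: returns the text before the first "_", or null.
function beforeUnderscore(s) {
  const m = s.match(prefixPattern);
  return m === null ? null : m[0];
}

console.log("DD_filename".match(attempt));     // null
console.log("D_filename".match(attempt)[0]);   // "D_" (the _ is included)
console.log(beforeUnderscore("DD_filename"));  // "DD"
console.log(beforeUnderscore("nounderscore")); // null
```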

Starting from the beginning of the string, capture all non-underscore characters:
"^[^_]*"
The first ^ (caret) character means that the match starts from the beginning of the string. The brackets allow you to define a set of possible characters (character class). The second ^ character means "not". So the character class is "not underscore". The star means "zero or more". So in plain English: "match from the start of the string zero or more non underscore characters".
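The same idea in JavaScript, as a minimal sketch (`leadingNonUnderscore` is a hypothetical helper name). One practical difference from the lookahead pattern: because * allows zero characters, ^[^_]* always matches, so a string without any underscore comes back whole instead of producing no match.

```javascript
// ^[^_]* : from the start of the string, zero or more non-underscore characters.
const notUnderscore = /^[^_]*/;

function leadingNonUnderscore(s) {
  // The pattern can match the empty string, so match() never returns null here.
  return s.match(notUnderscore)[0];
}

console.log(leadingNonUnderscore("DD_filename"));  // "DD"
console.log(leadingNonUnderscore("nounderscore")); // "nounderscore" (whole string)
console.log(leadingNonUnderscore("_filename"));    // "" (empty match)
```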

You can try something like
.*?(?=_)
. matches any character and *? is a reluctant quantifier. (?=_) is a positive lookahead to ensure our match is followed by an _.
If you want to only extract characters that occur at the beginning of a string you can add the ^ anchor: ^.*?(?=_). ^ matches the position before the first character in the string.
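To see what the reluctant quantifier and the ^ anchor change, here is a small JavaScript comparison (the sample strings are made up for illustration):

```javascript
const s = "DD_file_name";

// Greedy: .* grabs as much as possible, then backtracks to the LAST "_".
console.log(s.match(/^.*(?=_)/)[0]);  // "DD_file"

// Reluctant (lazy): .*? grows one character at a time and stops at the FIRST "_".
console.log(s.match(/^.*?(?=_)/)[0]); // "DD"

// The anchor matters when the start of the string cannot produce a match.
// Here the first line has no "_" and . does not match "\n", so the
// unanchored pattern keeps searching and matches on the second line.
const multi = "ab\ncd_e";
console.log(multi.match(/.*?(?=_)/)[0]); // "cd"
console.log(multi.match(/^.*?(?=_)/));   // null
```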

Just capture all characters that are not an underscore:
"[^_]*"

Regular Expression to get all characters before "-"
Check out @stema's answer. He gives four ways to do this, but the first is probably the best.
Match result = Regex.Match(text, @"^.*?(?=-)");
Console.WriteLine(result);

Related

Removing last character from a line using regex

I just started learning regex and I'm trying to understand how it possible to do the following:
If I have:
helmut_rankl:20Suzuki12
helmut1195:wasserfall1974
helmut1951:roller11
Get:
helmut_rankl:20Suzuki1
helmut1195:wasserfall197
helmut1951:roller1
I tried using .$, which does match the last character of a string, but it doesn't match letters and numbers.
How do I get these results from the input?

You could match the whole line and assert that a single character remains to the right, which leaves the last character out of the match. If you want to match at least a single character:
.+(?=.)
Regex demo
If you also want to match empty strings:
.*(?=.)
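Applied to the three lines from the question, a JavaScript sketch of both lookahead patterns (the g flag is added here so that every line yields a match):

```javascript
const input = [
  "helmut_rankl:20Suzuki12",
  "helmut1195:wasserfall1974",
  "helmut1951:roller11",
].join("\n");

// .+(?=.) : one or more characters, as long as one more character follows.
// "." does not match "\n", so each match stops just before a line's last character.
const trimmed = input.match(/.+(?=.)/g);
console.log(trimmed);
// [ 'helmut_rankl:20Suzuki1', 'helmut1195:wasserfall197', 'helmut1951:roller1' ]

// For a one-character line, .+(?=.) finds nothing, while .*(?=.) still
// produces an (empty) match.
console.log("x".match(/.+(?=.)/));    // null
console.log("x".match(/.*(?=.)/)[0]); // ""
```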

This will do what you want with regex's match function.
^(.*).$
Broken down:
^ matches the start of the string
( and ) denote a capturing group; the text matched inside it is returned as group 1.
.* matches everything, as much as it can.
The final . matches any single character (i.e. the last character of the line)
$ matches the end of the line/input
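A JavaScript sketch of reading group 1 (`dropLastChar` is a hypothetical helper name). The last part is not from the answers above: it shows that the asker's own .$ does match letters and digits, and works with replace() when the m flag makes $ match at the end of every line.

```javascript
function dropLastChar(line) {
  // ^(.*).$ : group 1 is everything except the final character.
  const m = line.match(/^(.*).$/);
  return m === null ? null : m[1];
}

console.log(dropLastChar("helmut1951:roller11")); // "helmut1951:roller1"
console.log(dropLastChar(""));                    // null (nothing to remove)

// Alternative: delete the last character of every line with replace().
const text = "helmut1195:wasserfall1974\nhelmut1951:roller11";
console.log(text.replace(/.$/gm, ""));
// "helmut1195:wasserfall197\nhelmut1951:roller1"
```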

What is the way to combine two regexes? [duplicate]

I want to design an expression for not allowing whitespace at the beginning and at the end of a string, but allowing in the middle of the string.
The regex I've tried is this:
/^[^\s][a-z\sA-Z\s0-9\s-()][^\s$]/

This should work:
^[^\s]+(\s+[^\s]+)*$
If you want to include character restrictions:
^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$
Explanation:
The starting ^ and ending $ anchor the pattern to the start and end of the string.
In the first regex, [^\s]+ means at least one non-whitespace character and \s+ means at least one whitespace character. The parentheses () group the second and third fragments together, and the * at the end means zero or more repetitions of this group.
So the expression reads: begins with at least one non-whitespace character, and ends with any number of groups of at least one whitespace character followed by at least one non-whitespace character.
For example, the input 'A' matches because it satisfies the begins with at least one non-whitespace condition. The input 'AA' matches for the same reason. The input 'A A' also matches: the first A satisfies the at least one non-whitespace condition, and ' A' satisfies the any number of groups of at least one whitespace followed by at least one non-whitespace condition.
' A' does not match because the begins with at least one non-whitespace condition is not satisfied. 'A ' does not match because the ends with any number of groups of at least one whitespace followed by at least one non-whitespace condition is not satisfied.
If you want to restrict which characters are accepted, see the second regex. It allows only a-z, A-Z, 0-9, the hyphen (-) and parentheses () in the non-whitespace parts of the string.
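The examples from this answer, checked in JavaScript as a quick sketch (the extra inputs "foo (bar)-baz" and "foo_bar" are made up to exercise the second regex):

```javascript
const noEdgeSpace = /^[^\s]+(\s+[^\s]+)*$/;
const restricted = /^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$/;

for (const s of ["A", "AA", "A A", " A", "A "]) {
  console.log(JSON.stringify(s), noEdgeSpace.test(s));
}
// "A" true, "AA" true, "A A" true, " A" false, "A " false

console.log(restricted.test("foo (bar)-baz")); // true
console.log(restricted.test("foo_bar"));       // false: "_" is not in the class
```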
Regex playground: http://www.regexr.com/

This RegEx will allow neither white-space at the beginning nor at the end of your string/word.
^[^\s].+[^\s]$
Any string that doesn't begin or end with whitespace will be matched, provided it is at least three characters long (one character each for [^\s], .+ and [^\s]) and contains no line breaks.
Explanation:
^ denotes the beginning of the string.
\s denotes whitespace, so [^\s] denotes NOT whitespace. You could alternatively use \S to denote the same.
. denotes any character except a line break.
+ is a quantifier meaning one or more times: the token that + follows can be repeated one or more times.
You can use this as a RegEx cheat sheet.
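A quick JavaScript check of this pattern, including a two-character input and a line break (the test strings are illustrative):

```javascript
const edges = /^[^\s].+[^\s]$/;

console.log(edges.test("abc"));   // true
console.log(edges.test("a b"));   // true
console.log(edges.test("ab"));    // false: too short for [^\s] .+ [^\s]
console.log(edges.test(" abc"));  // false
console.log(edges.test("abc "));  // false
console.log(edges.test("a\nbc")); // false: . does not match "\n"
```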

In cases when you have a specific pattern, say, ^[a-zA-Z0-9\s()-]+$, that you want to adjust so that spaces at the start and end were not allowed, you may use lookaheads anchored at the pattern start:
^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$
Here,
(?!\s) - a negative lookahead that fails the match if there is a whitespace char immediately at the start of the string (it runs right after ^)
(?![\s\S]*\s$) - a negative lookahead that fails the match if, immediately at the start of the string, there are any 0+ chars, as many as possible ([\s\S]*, equivalent to [^]* in JS), followed by a whitespace char at the end of the string ($). It also runs right after ^, because the previous pattern is a lookaround and does not consume any characters.
In JS, you may use the following equivalent regex declarations:
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = /^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = new RegExp("^(?!\\s)(?![^]*\\s$)[a-zA-Z0-9\\s()-]+$")
var regex = new RegExp(String.raw`^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$`)
If you know there are no linebreaks, [\s\S] and [^] may be replaced with .:
var regex = /^(?!\s)(?!.*\s$)[a-zA-Z0-9\s()-]+$/
See the regex demo.
JS demo:
var strs = ['a b c', ' a b b', 'a b c '];
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/;
for (var i = 0; i < strs.length; i++) {
  console.log('"', strs[i], '"=>', regex.test(strs[i]));
}

If the string must be at least 1 character long, if newlines are allowed in the middle together with any other characters, and if the first and last characters can really be anything except whitespace (including @#$!...), then you are looking for:
^\S$|^\S[\s\S]*\S$
explanation and unit tests: https://regex101.com/r/uT8zU0
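A few of those cases as a JavaScript sketch (the inputs are illustrative, not the ones from the linked unit tests):

```javascript
const pattern = /^\S$|^\S[\s\S]*\S$/;

console.log(pattern.test("a"));     // true: a single non-whitespace character
console.log(pattern.test("ab"));    // true
console.log(pattern.test("a\n b")); // true: line breaks allowed in the middle
console.log(pattern.test(" a"));    // false
console.log(pattern.test("a\n"));   // false: ends with whitespace
console.log(pattern.test(""));      // false
```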

This worked for me:
^[^\s].+[a-zA-Z]+[a-zA-Z]+$
Hope it helps.

How about:
^\S.+\S$
This will match any string of at least three characters, without line breaks, that doesn't begin or end with any kind of whitespace.

^[^\s].+[^\s]$
That's it! It allows any string containing any character (apart from \n) without whitespace at the beginning or end. If you want to allow \n in the middle, use the s (dotall) option, or replace .+ with [\s\S]+ (note that [.\n]+ would not work, because inside a character class . is a literal dot).
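A JavaScript sketch comparing the options for allowing a line break in the middle (the s flag needs an ES2018+ engine; the sample string is made up):

```javascript
const s = "first line\nsecond";

// Without a flag, . stops at "\n", so this multi-line string is rejected.
console.log(/^[^\s].+[^\s]$/.test(s));      // false
// The s (dotAll) flag lets . match "\n" too.
console.log(/^[^\s].+[^\s]$/s.test(s));     // true
// [\s\S] matches any character, with or without flags.
console.log(/^[^\s][\s\S]+[^\s]$/.test(s)); // true
// Inside a class, . is a literal dot, so [.\n]+ only allows dots and newlines.
console.log(/^[^\s][.\n]+[^\s]$/.test(s));  // false
```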

pattern="^[^\s]+[-a-zA-Z\s]+([-a-zA-Z]+)*$"
This will help you accept only characters, and it won't allow whitespace at the start.

This is a regex for no whitespace at the beginning or at the end, and only one space in between. It also works without the 3-character minimum:
/^([^\s]*[A-Za-z0-9]\s{0,1})[^\s]*$/ - just remove {0,1} and add * to allow unlimited spaces in between.

As a modification of @Aprillion's answer, I prefer:
^\S$|^\S[ \S]*\S$
It will not match a space at the beginning, end, or both.
It matches any number of spaces between a non-whitespace character at the beginning and end of a string.
It also matches only a single non-whitespace character (unlike many of the answers here).
It will not match any newline (\n), \r, \t, \f, or \v in the string (unlike Aprillion's answer). I realize this isn't explicit in the question, but it's a useful distinction.
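A JavaScript check of these claims (illustrative inputs):

```javascript
const spacesOnly = /^\S$|^\S[ \S]*\S$/;

console.log(spacesOnly.test("a"));     // true
console.log(spacesOnly.test("a   b")); // true: any number of plain spaces inside
console.log(spacesOnly.test(" a"));    // false
console.log(spacesOnly.test("a "));    // false
console.log(spacesOnly.test("a\tb"));  // false: a tab is not in [ \S]
console.log(spacesOnly.test("a\nb"));  // false
```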

Letters and numbers divided only by one space. Also, no spaces allowed at beginning and end.
/^[a-z0-9]+( [a-z0-9]+)*$/gi
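A JavaScript sketch of this pattern. For a whole-string check only the i flag is needed; with the g flag, RegExp.prototype.test() is stateful (it resumes from lastIndex), which the second half demonstrates.

```javascript
// Case-insensitive: letters and digits, words separated by exactly one space.
const words = /^[a-z0-9]+( [a-z0-9]+)*$/i;
console.log(words.test("Hello World 42")); // true
console.log(words.test("Hello  World"));   // false: two spaces
console.log(words.test(" Hello"));         // false

// With g, a successful test() moves lastIndex to the end of the match,
// and the next call on the same regex object starts searching from there.
const withG = /^[a-z0-9]+( [a-z0-9]+)*$/gi;
console.log(withG.test("abc")); // true  (lastIndex is now 3)
console.log(withG.test("abc")); // false (search starts at 3; lastIndex resets to 0)
```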

I found that a reliable way to do this is to specify what you do want to allow for the first character and check the other characters as normal, e.g. in JavaScript:
RegExp("^[a-zA-Z][a-zA-Z- ]*$")
So that expression accepts only a single letter at the start, and then any number of letters, hyphens or spaces thereafter.

Use /^[^\s].([A-Za-z]+\s)*[A-Za-z]+$/. It only accepts one space between words and no spaces at the beginning or end.

If we do not need a specific class of valid characters (i.e. characters from any language are accepted) and we just want to prevent spaces at the start and end, the simplest option can be this pattern:
/^(?! ).*[^ ]$/
Try it on an HTML input:
input:invalid {box-shadow:0 0 0 4px red}
/* Note: ^ and $ are removed from the pattern, because the HTML pattern attribute is already matched against the whole value. */
<input pattern="(?! ).*[^ ]">
Explanation
^ Start of the string
(?!...) Negative lookahead: the text that follows must not start with ...
  A plain space here (\s would also cover tabs and line breaks)
(?! ) Do not accept a space as the first character of the following set (.*)
. Any character (except the \n and \r line breaks)
* Zero or more (the length of the set)
[^ ] Character class: any character except a space
$ End of the string
Try it live: https://regexr.com/6e1o4
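The same pattern in JavaScript, as a small sketch. It also shows the point about a plain space versus \s: a leading tab passes the original pattern. The \s / \S variant in the last line is an illustrative adjustment, not part of the answer.

```javascript
const noEdgeSpaces = /^(?! ).*[^ ]$/;

console.log(noEdgeSpaces.test("héllo wörld")); // true: any characters allowed
console.log(noEdgeSpaces.test(" hello"));      // false
console.log(noEdgeSpaces.test("hello "));      // false
console.log(noEdgeSpaces.test("\thello"));     // true: only the space character is checked

// Using \s / \S instead of a literal space also rejects tabs.
console.log(/^(?!\s).*\S$/.test("\thello"));  // false
```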

^[^0-9 ]{1}([a-zA-Z]+\s{1})+[a-zA-Z]+$
- for no more than one whitespace character between words, and no spaces at the start or end.
^[^0-9 ]{1}([a-zA-Z ])+[a-zA-Z]+$
- to allow more than one whitespace character in between, with no spaces at the start or end.

Other answers introduce a minimum length for the match. This can be avoided using negative lookaheads and lookbehinds:
^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$
This starts by checking that the first character is not whitespace: ^(?!\s). It then captures the characters you want (a-zA-Z0-9\s) non-greedily (*?), and finally checks that the character before $ (end of string/line) is not \s.
Check that lookaheads/lookbehinds are supported in your platform/browser.
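A JavaScript sketch (lookbehind needs an ES2018+ engine, such as current Node.js). Note that with no minimum length, the empty string also matches.

```javascript
const noEdge = /^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$/;

console.log(noEdge.test("a"));     // true
console.log(noEdge.test("a b c")); // true
console.log(noEdge.test(" a"));    // false
console.log(noEdge.test("a "));    // false
console.log(noEdge.test(""));      // true: the empty string matches too
```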

Here you go,
\b^[^\s][a-zA-Z0-9]*\s+[a-zA-Z0-9]*\b
\b refers to word boundary
\s+ means allowing one or more whitespace characters in the middle.

(^(\s)+|(\s)+$)
This expression matches the leading and trailing whitespace of the text.
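One way to use it, sketched in JavaScript under the assumption that the goal is to strip or detect the edge whitespace: with the g flag, replace() removes both runs, similar to String.prototype.trim().

```javascript
const edgeWhitespace = /(^(\s)+|(\s)+$)/g;

// Remove leading and trailing whitespace.
console.log(JSON.stringify("  hello world \t".replace(edgeWhitespace, ""))); // "hello world"

// Detect it (a fresh regex without g, so test() keeps no state).
console.log(/(^(\s)+|(\s)+$)/.test("hello world")); // false
console.log(/(^(\s)+|(\s)+$)/.test("hello "));      // true
```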

Regex - No "p" at second position

I am learning Regex and after reading this post, I started doing some exercises and I got stuck on this exercise. Here are the two lists of words that should be matched and not matched
I started with
^(.).*\1$
and got stuck on sporous, which gets matched although it should not. So I found
^(.)(?!p).*\1$
that did the trick.
The best solution given here (it uses one character fewer than mine) is
^(.)[^p].*\1$
but I don't really understand this pattern. Actually I think I am confused about seeing the ^ anchor in a group [] and I am confused about seeing the ^ anchor somewhere else than at the beginning of the regex.
Can you help to understand what this regex is doing?

Anything in square brackets is a character class. This context uses its own mini-syntax which simply lists the allowed characters [abc] or a range of allowed characters [a-z] or disallowed characters by adding a caret as the very first character in the character class [^a-z].

Your solution uses a negative look-ahead (?!p) that does not consume characters, and just checks if the next character is not p.
The other solution uses a negated character class [^p] that will consume a character other than p.
So, the final solution depends on what you need to match/capture.
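A JavaScript sketch of where the two solutions differ (the word "sos" and the string "aa" are illustrative inputs; "sporous" is from the question):

```javascript
const firstTry = /^(.).*\1$/;            // the asker's first attempt
const withLookahead = /^(.)(?!p).*\1$/;  // (?!p) checks, but consumes nothing
const withClass = /^(.)[^p].*\1$/;       // [^p] consumes one non-p character

console.log(firstTry.test("sporous"));                                  // true
console.log(withLookahead.test("sporous"), withClass.test("sporous")); // false false
console.log(withLookahead.test("sos"), withClass.test("sos"));         // true true

// They differ on a two-character string: [^p] needs a character of its own,
// so it cannot reuse the final character that \1 must match.
console.log(withLookahead.test("aa"), withClass.test("aa"));           // true false
```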

Here is the pattern explanation of ^(.)[^p].*\1$
^ start of the string/line
(.) group first character
[^p] any character except p
.* zero or more characters
\1 first matched group again
$ end of the string/line
The above regex matches any string that starts and ends with the same character and does not contain p at the second position (so it needs at least three characters).
For a detailed explanation, see regex101.
Read more about Negated Character Classes.

[^p] simply means: match any single character that is not p.
I'll explain the regex step by step in the following sentences.
^ start of the string
(.) matches any character as group 1
[^p] matches any character that is not p
.* matches any character, zero or more times
\1 matches the exact matched character(s) from group 1
$ end of the string
A good source for learning regex is regex101.

^ normally asserts the position at the start of the line; however, as the first character inside a character class [ ] it means "match any character other than ...".
Example:
^test-[^p]-1234
Result:
test-q-1234 // match
test-p-1234 // no match
test-o-1234 // match
https://regex101.com/r/wN4zF9/1

Regular Expression (first character matching a-z)

I have this regex: /[^a-zA-Z0-9_-]/
What I want to add to above is:
first character can be only a-zA-Z
How could I write this regular expression?

Try something like this:
^[a-zA-Z][a-zA-Z0-9.,$;]+$
Explanation:
^ Start of line/string.
[a-zA-Z] Character is in a-z or A-Z.
[a-zA-Z0-9.,$;] Alphanumeric or `.` or `,` or `$` or `;`.
+ One or more of the previous token (change to * for zero or more).
$ End of line/string.

I think this would also work:
^[a-zA-Z].*
if you want to test only that the first character is alphabetical, and the rest of the string can be anything.
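A JavaScript sketch that combines the first-letter rule with the question's own character set. This assumes the remaining characters should be those allowed by the question's /[^a-zA-Z0-9_-]/ class, which the question does not state explicitly.

```javascript
// First character: a letter. Remaining characters: letters, digits, "_" or "-".
// Assumption: the allowed set comes from the question's /[^a-zA-Z0-9_-]/.
const identifier = /^[a-zA-Z][a-zA-Z0-9_-]*$/;

console.log(identifier.test("a"));       // true
console.log(identifier.test("abc_1-2")); // true
console.log(identifier.test("1abc"));    // false: starts with a digit
console.log(identifier.test("_abc"));    // false
console.log(identifier.test("ab c"));    // false: space is not allowed

// The answer's ^[a-zA-Z].* only checks the first character.
console.log(/^[a-zA-Z].*/.test("a b c!")); // true
```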

Regular Expression related: first character alphabet second onwards alphanumeric+some special characters

I have a question about regular expressions. In my case, I have to make sure that
the first letter is alphabetic, and from the second character onwards it can be any alphanumeric character plus some special characters.
Regards,
Anto

Try something like this:
^[a-zA-Z][a-zA-Z0-9.,$;]+$
Explanation:
^ Start of line/string.
[a-zA-Z] Character is in a-z or A-Z.
[a-zA-Z0-9.,$;] Alphanumeric or `.` or `,` or `$` or `;`.
+ One or more of the previous token (change to * for zero or more).
$ End of line/string.
The special characters I have chosen are just an example. Add your own special characters as appropriate for your needs. Note that a few characters (such as ], \, and, depending on their position, ^ and -) need escaping inside a character class; otherwise they have a special meaning in the regular expression.
I am assuming that by "alphabet" you mean A-Z. Note that in some other countries there are also other characters that are considered letters.
More information
Character Classes
Repetition
Anchors
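A quick JavaScript check of this answer's pattern (the inputs are illustrative). Because of the +, a single letter on its own is rejected; switching to * accepts it.

```javascript
const pattern = /^[a-zA-Z][a-zA-Z0-9.,$;]+$/;

console.log(pattern.test("a1.b,c$;")); // true
console.log(pattern.test("1abc"));     // false: first character must be a letter
console.log(pattern.test("a"));        // false: + needs at least one more character
console.log(pattern.test("ab-c"));     // false: "-" is not in the class
console.log(/^[a-zA-Z][a-zA-Z0-9.,$;]*$/.test("a")); // true with * instead of +
```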

Try this :
/^[a-zA-Z]/
where
^ -> Starts with
[a-zA-Z] -> characters to match

I think the simplest answer is to pick and match only the first character with regex.
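String str = "s12353467457458";
// String.matches() must match the whole string, so the ^ anchor is optional here;
// the isEmpty() check avoids a StringIndexOutOfBoundsException for "".
if (!str.isEmpty() && String.valueOf(str.charAt(0)).matches("[a-zA-Z]")) {
    System.out.println("Valid");
}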
