How to remove text after certain word - regex

I have a string
The best laid schemes of mice and men
How do I remove all text after the word "schemes" in ColdFusion? I suppose this can be done with regex.

Here ya go:
<cfset myString = "The best laid schemes of mice and men" />
<cfoutput>#REReplace(myString, "schemes(.*)", "schemes")#</cfoutput>

Your regex is:
schemes.*$
and replace with "schemes"
Explanation
.*$ matches any character (.) zero or more times (*) up to the end of the line ($)

Try this regex:
schemes(.*)
Then drop whatever group 1 ($1) captured, i.e. replace the whole match with just "schemes".
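For comparison outside ColdFusion, here is a minimal sketch of the same idea in Python (the choice of Python and the helper name truncate_after are assumptions for illustration, not part of the answers above). It matches the word plus everything after it and puts just the word back.

```python
import re

def truncate_after(text: str, word: str) -> str:
    """Drop everything that follows the first occurrence of `word`."""
    # re.escape guards against regex metacharacters inside `word`;
    # DOTALL lets .* run across line breaks as well.
    pattern = re.escape(word) + r".*"
    # A function replacement avoids backslash processing of `word`.
    return re.sub(pattern, lambda m: word, text, count=1, flags=re.DOTALL)

print(truncate_after("The best laid schemes of mice and men", "schemes"))
# The best laid schemes
```

If the word does not occur, the string comes back unchanged.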

Related

Regex matching all characters from the beginning of the string to the first underscore

I am trying to substring elements of a vector to only keep the part before the FIRST underscore. I am a bit of a newbie with taking substrings and don't fully understand all regex yet. I am close to the answer, I can get the part that I want to delete but still don't see how to get the opposite part. Any help and/or explanation of regex is appreciated!
my vector looks like the following, with multiple underscores in some elements
v = c("WL_Alk", "LQ_Frac_C_litter_origin", "MI_Nr_gat", "SED_C_N", "WL_CO2", "WL_S")
my desired output looks like
v_short = c("WL", "LQ", "MI", "SED", "WL", "WL")
The code that gets me the part I want to delete is sub("^[^_]*", "", v). I think I have to do something with $ in regex because sub("[_$]", "", v) deletes the first underscore, but I can't get it to delete the part behind it. Even with the regex helpfile I don't fully understand the meaning of ^, $ and * yet, so explanation on those is also appreciated!
You can use
> v = c("WL_Alk", "LQ_Frac_C_litter_origin", "MI_Nr_gat", "SED_C_N", "WL_CO2", "WL_S")
> sub("_.*", "", v)
[1] "WL" "LQ" "MI" "SED" "WL" "WL"
The "_.*" pattern matches the first _ and .* matches any 0+ characters up to the end of string greedily (that is, grabs them at one go).
With stringr str_extract, you can use your pattern:
> library(stringr)
> v_short = str_extract(v, "^[^_]*")
> v_short
[1] "WL" "LQ" "MI" "SED" "WL" "WL"
The ^[^_]* pattern matches the beginning of the string and 0 or more characters other than _.
If I understood correctly
gsub("(.*?)(_.*)","\\1",v, perl = TRUE)
Explanation:
(.*?) the first capturing group;
(_.*) the second capturing group;
\\1 return the first capturing group;
There are two ways to do it.
Either use ^[^_]+ to match the part of the string before the first _,
OR
select the part from the first _ onward using \_.+$ and remove it.
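The two approaches (delete from the first underscore onward, or extract what precedes it) can be sketched side by side. This is a Python illustration rather than R, which is an assumption on my part; the variable names removed and extracted are hypothetical.

```python
import re

v = ["WL_Alk", "LQ_Frac_C_litter_origin", "MI_Nr_gat", "SED_C_N", "WL_CO2", "WL_S"]

# Approach 1: delete the first underscore and everything after it.
# The greedy .* swallows the rest of the string in one go.
removed = [re.sub(r"_.*", "", s) for s in v]

# Approach 2: extract the leading run of characters that are not underscores.
# re.match anchors at the start of the string, so no explicit ^ is needed.
extracted = [re.match(r"[^_]*", s).group(0) for s in v]

print(removed)
# ['WL', 'LQ', 'MI', 'SED', 'WL', 'WL']
```

Both lists come out the same for this input; they only differ in whether you think of the task as removing or as keeping.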

How to make it start with specific word?

My job is to create a regular expression that matches any of the variations on the name: vic, Victor, Victoria, victor, Victoria, VICKY, VICTOR. VICTORIA, Vic, VIC
and so I wrote this
(Vic[a-z])|(vic[a-z])|(VIC[A-Z]*)
I have a problem though. For example if I typed "Hello aVictoria", it would match the string "Victoria"...How do I make it to match such that the first character has to be V or v?
You forgot the * quantifier:
(Vic[a-z]*)|(vic[a-z]*)|(VIC[A-Z]*)
In your regex below, the first two alternatives are missing a + or * quantifier:
(Vic[a-z])|(vic[a-z])|(VIC[A-Z]*)
(Vic[a-z]) matches exactly one lowercase letter after Vic; it won't match Vic on its own, and it stops after the first letter that follows Vic.
If you need start of the word, you can use \b (word boundary 0-width matcher). If you need start of the line, you should use ^ (start of the line 0-width matcher). Regexp engines that also have multi-line capabilities will also define \A (start of the string 0-width matcher). Depending on what you want, stick one of those in front of your regexp, like this:
^((Vic[a-z])|(vic[a-z])|(VIC[A-Z]*))
(You also need extra parentheses, because otherwise you would match "Victor" only at start, but "victor" and "VICTOR" anywhere at all.)
That said, this won't match "vic", since you are requiring another [a-z] after it. This would be the correct regexp:
^((Vic[a-z]*)|(vic[a-z]*)|(VIC[A-Z]*))
This will match "Victor", but not "aVictor". If you write \b instead of ^, you will additionally match "I am not a Victor" (while still disallowing "aVictor").
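To see why the extra parentheses matter, here is a small Python sketch (Python and the names ungrouped, grouped, word_start and found are assumptions for illustration). Without grouping, the ^ binds only to the first alternative.

```python
import re

# ^ applies only to the first alternative here.
ungrouped = re.compile(r"^Vic[a-z]*|vic[a-z]*|VIC[A-Z]*")
# Grouping makes the anchor apply to all three alternatives.
grouped = re.compile(r"^(?:Vic[a-z]*|vic[a-z]*|VIC[A-Z]*)")
# \b anchors at the start of a word instead of the start of the string.
word_start = re.compile(r"\b(?:Vic[a-z]*|vic[a-z]*|VIC[A-Z]*)")

def found(pattern, text):
    """Return the first match as a string, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(found(ungrouped, "Hello avictor"))       # victor (anchor was skipped)
print(found(grouped, "Hello avictor"))         # None
print(found(grouped, "Victor"))                # Victor
print(found(word_start, "I am not a Victor"))  # Victor
print(found(word_start, "Hello aVictoria"))    # None
```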
Use the ^ character to indicate the start of the string, and group the alternatives so that the anchor applies to all of them, like
^((Vic[a-z]*)|(vic[a-z]*)|(VIC[A-Z]*))
Try this
/\bvic[a-zA-Z]{0,}/ig
Example:
'vic, Victor, Victoria, victor, Victoria, VICKY, VICTOR. VICTORIA, Vic, VIC, Hello aVictoriaX'.match(/\bvic[a-zA-Z]{0,}/ig)
//output
["vic", "Victor", "Victoria", "victor", "Victoria", "VICKY", "VICTOR", "VICTORIA", "Vic", "VIC"]
//Hello aVictoriaX isn't matched

Delete numerals at the end but keep dates and text

I'm a beta tester for a hockey game and sometimes the schedules I get are fouled up. Can anyone help this Notepad-challenged newbie?
Turn this:
19;10;2012;Oklahoma City Barons;San Antonio Rampage323
19;10;2012;Milwaukee Admirals;Charlotte Checkers572
19;10;2012;Manchester Monarchs;Providence Bruins002
19;10;2012;Albany Devils;Syracuse Crunch579
Into this:
19;10;2012;Oklahoma City Barons;San Antonio Rampage
19;10;2012;Milwaukee Admirals;Charlotte Checkers
19;10;2012;Manchester Monarchs;Providence Bruins
19;10;2012;Albany Devils;Syracuse Crunch
Thanks!
To teach you some regex...
1. First, you can match digits with \d.
2. Second, you can "anchor" the match: $ means "the end of the string".
3. Finally, you want to specify 1 or more digits, so you add the + quantifier to the \d token mentioned earlier to create \d+.
3.1. If the numbers are not ALWAYS at the end, make them optional with * ("0 or more"): \d*
Full regex: \d+$ or \d*$
Assuming Perl:
perl -pe 's/\d+$//' file > newfile
Where file is the file with the numbers and newfile is the corrected output. (Note -p rather than -n: -n loops over the lines without printing them, so newfile would end up empty.)
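If Perl is not at hand, the same \d+$ idea can be sketched in Python (an assumption; the answer above uses Perl). The MULTILINE flag makes $ match at the end of every line, so the dates at the start of each line are left alone. The variable names schedule and cleaned are hypothetical.

```python
import re

schedule = """19;10;2012;Oklahoma City Barons;San Antonio Rampage323
19;10;2012;Milwaukee Admirals;Charlotte Checkers572
19;10;2012;Manchester Monarchs;Providence Bruins002
19;10;2012;Albany Devils;Syracuse Crunch579"""

# With re.MULTILINE, $ matches just before each newline and at the very end,
# so only the digits trailing each line are removed.
cleaned = re.sub(r"\d+$", "", schedule, flags=re.MULTILINE)

print(cleaned)
```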

Regex pattern with different rule at end of string?

The pattern should match words that start with a capital letter.
Words are separated with [ ]+
But the last word should not have [ ] after it.
There is no limit on the number of words.
I have managed to do : (http://regexr.com?32s1h)
^([A-Z]{1}[a-z]+([a-z]+)?[ ]+)+$
which works for Xav Tvc Dcc_ //notice the last space
but not for Xav Tvc Dcc
How can I fix my regex?
If space behind the last word is optional, use regex pattern
^(?:[A-Z][a-z]*(?:[ ]+|$))+$
...and if there should not be a space behind the last word, then go with
^(?:[A-Z][a-z]*(?:[ ]+(?=.)|$))+$
Require a word not followed by a space at the end:
^([A-Z]{1}[a-z]+([a-z]+)?[ ]+)*[A-Z]{1}[a-z]+([a-z]+)?$
Quick PowerShell test:
PS Home:> 'Xav Tvc Dcc ','Xav Tvc Dcc' -match '^([A-Z]{1}[a-z]+([a-z]+)?[ ]+)*[A-Z]{1}[a-z]+([a-z]+)?$'|%{"<$_>"}
<Xav Tvc Dcc>
If you're worried about possible errors introduced by changing the regex you could always construct it on the fly:
var word = "[A-Z]{1}[a-z]+([a-z]+)?";
var regex = string.Format("^({0}[ ]+)*{0}$", word);
Or similar for whatever language you use.
Why not do something like this:
^([A-Z]{1}[a-z]+\s?)*
And then run a trim operation to remove the trailing spaces?
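The "zero or more words followed by spaces, then one final word" structure can be checked with a short Python sketch (Python and the names WORD, PATTERN and is_valid are assumptions for illustration). It simplifies [A-Z]{1}[a-z]+([a-z]+)? to the equivalent [A-Z][a-z]+ and uses fullmatch instead of explicit ^ and $.

```python
import re

# One capitalised word: an uppercase letter followed by lowercase letters.
WORD = r"[A-Z][a-z]+"
# Any number of "word + spaces" units, then a final word with nothing after it.
PATTERN = re.compile(rf"(?:{WORD}[ ]+)*{WORD}")

def is_valid(text: str) -> bool:
    """True if `text` is capitalised words separated by spaces, with no trailing space."""
    return PATTERN.fullmatch(text) is not None

print(is_valid("Xav Tvc Dcc"))   # True
print(is_valid("Xav Tvc Dcc "))  # False (trailing space)
```

Building the pattern from a single WORD fragment keeps the two copies of the word rule from drifting apart.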

How to unpunctuate, lowercase, de-space and hyphenate a string with regex?

If I have a string like this
Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg
how would I convert that into the following using regex:
newsflash-the-big-brown-dogs-brother-tj-ate-the-small-blue-egg
In other words, punctuation is discarded and spaces are replaced with hyphens.
It sounds like you want to create a "URL slug" -- a URL-friendly version of an article's title, for example. That means you'll want to make sure you remove all possible non-URL-friendly characters, not just a few. You might do it this way (in order):
Remove all non-letter non-number non-space characters by:
Replacing regex [^A-Za-z0-9 ] with the empty string "".
Replace all spaces with a dash by:
Replacing regex \s+ with the string "-".
Lower-case the string by:
Java s = s.toLowerCase();
JavaScript s = s.toLowerCase();
C# s = s.ToLower();
Perl $s = lc($s);
Python s = s.lower()
PHP $s = strtolower($s);
Ruby s = s.downcase
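Put together, the three steps above might look like this in Python. This is a minimal sketch; the function name slugify is hypothetical, and stripping leading and trailing whitespace before inserting hyphens is my own addition.

```python
import re

def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated, ASCII-only slug."""
    # 1. Drop everything that is not an ASCII letter, digit, or whitespace.
    s = re.sub(r"[^A-Za-z0-9\s]", "", title)
    # 2. Replace each run of whitespace with a single hyphen
    #    (strip first so the slug has no leading or trailing hyphen).
    s = re.sub(r"\s+", "-", s.strip())
    # 3. Lowercase the result; this step is a string method, not a regex.
    return s.lower()

print(slugify("Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg"))
# newsflash-the-big-brown-dogs-brother-tj-ate-the-small-blue-egg
```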
Replace the regex [\s-]+ with "-", then replace [^\w-] with "".
Then, call ToLowerCase or equivalent.
In Javascript:
var s = "Newsflash: The Big(!) Brown Dog's Brother (T.J.) Ate The Small Blue Egg";
alert(s.replace(/[\s-]+/g, '-').replace(/[^\w-]/g, '').toLowerCase());
Replace /\W+/ with '-'; that replaces every run of non-word characters with a single dash. Note that this also turns apostrophes and periods into dashes ("Dog's" becomes "Dog-s"), so strip those first if you want the exact output above.
Then, collapse dashes by replacing /-+/ with '-'.
Then, lowercase the string; most regex flavors cannot do that on their own. You didn't say which language you are using, so I cannot give you an example, but your language probably has something like String.toLowerCase() or a tr/// call (tr/A-Z/a-z/, for example, in Perl).