Regex to delete everything behind first letter - regex

I have a regex \b\d+\K[a-z] Replace with: \u$0
This capitalizes the first letter after the leading numbers, for example:
123host
1643domain
into
123Host
1643Domain
What I need to figure out now is how can I delete the numbers.
So I need:
123host
to become
host
and so on. All entries have numbers in front of them, like this:
6410james
599stacks
Into
james
stacks
I tried \b\d+\K[a-z] replaced with nothing, but that just deletes the first letter. I'm a total noob, and any help would be appreciated.

You can simply find \d+ or [0-9]+ and replace it with an empty string, if all samples have the digits at the start. ^\d+ or ^[0-9]+ would also work for our cases; however, the anchored versions would not remove digits that come after the letters.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
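To illustrate the difference between the unanchored and anchored versions, here is a minimal sketch in Python. Python's re module is an assumption (the question does not say which tool is used), and the input 123host456 is a made-up example that is not in the question.

```python
import re

samples = ["123host", "1643domain", "6410james", "599stacks"]

# Unanchored: removes every run of digits, wherever it appears.
stripped_all = [re.sub(r"\d+", "", s) for s in samples]

# Anchored: removes only a run of digits at the very start of the string.
stripped_leading = [re.sub(r"^\d+", "", s) for s in samples]

# Hypothetical input with digits after the letters, where the two differ.
mixed = "123host456"
unanchored_result = re.sub(r"\d+", "", mixed)
anchored_result = re.sub(r"^\d+", "", mixed)
```

For the samples in the question both lists come out the same; only inputs like the made-up 123host456 tell the two patterns apart.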

The pattern you probably want to search for is:
^[^a-zA-Z]*
and then replace with an empty string. This is a literal translation of the requirement to remove every non-letter from the start of the string.
Demo
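A small Python sketch of this pattern, assuming Python's re module (the helper name strip_leading_non_letters and the input 12-34 host are made up for illustration):

```python
import re

# Everything from the start of the string up to (not including) the first ASCII letter.
LEADING_NON_LETTERS = re.compile(r"^[^a-zA-Z]*")


def strip_leading_non_letters(text):
    """Remove all leading characters that are not ASCII letters."""
    # count=1 because the pattern is anchored and can only match once anyway.
    return LEADING_NON_LETTERS.sub("", text, count=1)
```

Unlike ^\d+, this also removes leading punctuation and spaces, so strip_leading_non_letters("12-34 host") gives "host".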

Related

Finding a substring using regex

Disclaimer: This question is more from curiosity and will to learn a bit more about Regex, I know it can be achieved with other methods.
I have a string that represents a list, like so: "egg,eggplant,orange,egg", and I want to search for all the instances of the item egg in this list.
I can't search for the substring egg, because it would also return eggplant.
So, I tried to write a regex expression to solve this and got to this expression ((?:^|\w+,)egg(?:$|,\w+))+ (I used this website to build the regex)
Basically, it searches for the word egg at the beginning of the string, the end of the string and in-between commas (while making sure those aren't trailing commas).
And it works fine, except this edge case: "egg,eggplant,egg"
Based on this site, I can see that the first egg is matched but then the regex engine continues until the last comma. Then for the last egg it has the remaining string ,egg which doesn't match…
So, what can I do to fix the expression and find all the instances of a word in a string that represents a list?
You can use
(?<![^,])egg(?![^,])
Or its less efficient equivalent:
(?<=,|^)egg(?=,|$)
See the regex demo. Details:
(?<![^,]) - a negative lookbehind that requires start of string or comma to appear immediately to the left of the current location
egg - a word
(?![^,]) - a negative lookahead that requires end of string or comma to appear immediately to the right of the current location.
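Here is a minimal Python sketch of the double-negative lookaround version, run against the edge case from the question. Python is an assumption (the question names no language), and the helper name find_item_positions is made up. Only the first pattern is used because, as far as I know, Python's re module rejects the variable-width lookbehind in (?<=,|^).

```python
import re

# "Not preceded by a non-comma" and "not followed by a non-comma":
# egg must sit between commas and/or the string boundaries.
ITEM = re.compile(r"(?<![^,])egg(?![^,])")


def find_item_positions(csv_list):
    """Return the start index of every whole-item match of egg."""
    return [m.start() for m in ITEM.finditer(csv_list)]


edge_case = find_item_positions("egg,eggplant,egg")
original = find_item_positions("egg,eggplant,orange,egg")
```

Because lookarounds consume no characters, the comma between two adjacent items can serve as the right boundary of one match and the left boundary of the next, which is what the original expression could not do.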

Regex - Keep all digits with length of 10-13 digits

I'm searching for a regex that keeps all numbers 10 to 13 digits long and deletes the rest, in Notepad++.
My regex doesn't work:
[^\d{10,13}]
It finds numbers with commas too :(
Searching for
^(?:.*?(\d{10,13}).*|.*)$
and replacing with
\1
you keep just the 10 to 13 digit long numbers (and empty lines).
Remove the empty lines searching for
^\n
and replacing with nothing.
See it in action: RegEx101.
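The two replacement steps can be tried outside Notepad++ as well. The following is a rough sketch in Python, where re.MULTILINE stands in for Notepad++'s line-by-line matching; the sample text is made up, and it relies on Python 3.5 or later replacing an unmatched group with an empty string.

```python
import re

# Made-up sample: two lines contain a 10 to 13 digit number, one does not.
text = "id 1234567890 ok\nno digits here\nx 1234567890123 y"

# Step 1: replace each line by the captured number (or by nothing).
numbers_and_blanks = re.sub(
    r"^(?:.*?(\d{10,13}).*|.*)$", r"\1", text, flags=re.MULTILINE
)

# Step 2: remove the empty lines left behind.
numbers_only = re.sub(r"^\n", "", numbers_and_blanks, flags=re.MULTILINE)
```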
Addressing @WiktorStribiżew's comments: relying on the sought-after numbers always being surrounded by white space (which has been checked with the OP, though not for the potential case of lines that (effectively) hold just numbers), the search expression could be adjusted to
^(?:.*\s(\d{10,13})\s.*|.*)$
still replacing with
\1
to handle strings of numbers that contain commas correctly: RegEx101
By the way:
[^\d{10,13}]
is a negated character class, which matches any single character that is not:
a digit, or
any character out of "{10,13}" (without the quotes, but including the curly braces).
Please comment if and as this requires adjustment / further detail.
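To see what the character class does in practice, here is a short Python sketch (Python and the sample string are assumptions, not part of the question):

```python
import re

# Inside [...] the braces and comma are literal characters, not a quantifier.
NOT_DIGIT_BRACE_COMMA = re.compile(r"[^\d{10,13}]")

# Made-up sample containing letters, digits, a comma and both braces.
matched_chars = NOT_DIGIT_BRACE_COMMA.findall("ab1,{2}c")
```

Only the letters are matched, one character at a time; the class never looks at how long a run of digits is.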
To match numbers that are not 10 to 13 digits long:
\b(\d{1,9}|\d{14,})\b
You can find all standalone numbers that are 10 to 13 digits long like this:
(?<!\d)\d{10,13}(?!\d)
What you do then is up to you.
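As a quick check of the lookaround version, here is a minimal Python sketch (Python is an assumption, and the sample string is made up):

```python
import re

# A run of 10 to 13 digits that is neither preceded nor followed by another digit.
STANDALONE = re.compile(r"(?<!\d)\d{10,13}(?!\d)")

sample = "a 1234567890, b 123456789012345 c 123 d 1234567890123"
found = STANDALONE.findall(sample)
```

The 15-digit and 3-digit numbers are skipped entirely, and the number followed by a comma is still found, since a comma is not a digit.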
I don't know how Notepad++ works, but I think this is the regex you are looking for: ^([0-9]){10,13}$
A good page to create/test regex: http://regexr.com/

Regex that selects everything after first consecutive capitalized words

I'd like to select everything after the first few consecutive capitalized words. ie:
Terry Smith is a good school teacher. She works tirelessly.
would become;
is a good school teacher. She works tirelessly.
So far this doesn't work:
(^[A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)([\s\S]*)
I'm using it in Drupal's feeds tamper plugin with the "find replace regex" feature in order to replace everything after "Terry Smith" with blank space.
The following expression will match all consecutive capitalized words at the beginning of the sentence.
^(?:(?:[A-Z][a-z]+)(?>\s*))+
Regex101 Demo
If you want to remove that part from the sentence then all you have to do is replace it with the empty string.
If you want to replace the part that comes after it then you can use the following expression:
^((?:(?:[A-Z][a-z]+)(?>\s*))+)([\s\S]+)
and use a replacement string of $1, or whatever your language uses to reference the first captured group.
Regex101 Demo
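Both uses can be sketched in Python. Note the assumptions: Python is not the language of the question (that is Drupal's Feeds Tamper), and the atomic group (?>\s*) is swapped for a plain \s* here because Python only accepts atomic groups from version 3.11 on. For this input the simplified pattern matches the same text.

```python
import re

sentence = "Terry Smith is a good school teacher. She works tirelessly."

# Simplified variant of the answer's pattern (no atomic group).
LEADING_NAMES = re.compile(r"^(?:[A-Z][a-z]+\s*)+")

# Remove the leading capitalized words, keep the rest.
without_names = LEADING_NAMES.sub("", sentence, count=1)

# Keep only the leading capitalized words by replacing the whole match with group 1.
names_only = re.sub(r"^((?:[A-Z][a-z]+\s*)+)([\s\S]+)", r"\1", sentence)
```

The trailing \s* means the match includes the space after Smith, so names_only ends with a space.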
This will find the capital words:
[A-Z][a-z]+(?=\b)\s*
You might want to replace the + with * after [a-z] to also match single-character capital words.
To get all capitalized words at the beginning of the string, add ^( and )+ around it:
^([A-Z][a-z]+(?=\b)\s*)+

remove repeated character between words

I am trying out the quiz from Regex 101
In Task 6, the question is
Oh no! It seems my friends spilled beer all over my keyboard last night and my keys are super sticky now. Some of the time when I press a key, I get two duplicates. Can you pppllleaaaseee help me fix this? Content in bold should be removed.
I have tried this regex
([a-z])(\1{2})
But couldn't get the solution.
The solution for the riddle on that website is:
/(.)\1{2}/g
Since any key on the keyboard can get stuck, we need to use . (a dot).
\1 in the regex means match whatever the 1st capturing group (.) matches.
Replacement is $1 or \1.
The rest of your regex is correct, just that there are unnecessary capturing groups.
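A minimal Python sketch of that replacement, applied to the sentence from the quiz (Python is an assumption; the quiz itself runs on the Regex 101 site):

```python
import re

sticky = "Can you pppllleaaaseee help me fix this?"

# One character followed by exactly two more copies of itself, replaced by one copy.
fixed = re.sub(r"(.)\1{2}", r"\1", sticky)
```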
Your regex is correct if you want to match exactly three characters. If you want to match at least three, that is
([a-z])(\1{2,})
or
([a-z])(\1\1+)
Since you don't need to capture anything but the first occurrence, these are slightly better:
([a-z])\1{2} # your original regex (exactly three occurrences)
([a-z])\1{2,}
([a-z])\1\1+
Now, the replacement should be exactly one occurrence of the character, and nothing more:
\1
Replace:
(.)\1+
with:
\1
This of course requires that your regex engine supports backreferences... Also, in the replacement part, and depending on the regex engine, \1 may have to be written as $1.
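One thing worth checking with (.)\1+ is how it treats legitimate double letters. A short Python sketch (Python and the sample sentence are assumptions, not part of the quiz):

```python
import re

# Made-up sentence with ordinary double letters and no sticky keys.
clean = "I spilled beer"

# Two or more repeats collapse to one character, so real doubles are lost too.
collapse_any_repeat = re.sub(r"(.)\1+", r"\1", clean)

# Exactly-three version leaves ordinary doubles alone.
collapse_triples = re.sub(r"(.)\1{2}", r"\1", clean)
```

So (.)\1+ is only safe when the text contains no genuine doubled characters.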
I'd do it with (\w)(\1+)? but can't find out how to "remove" within the given site...
The best way would be to replace the results of the second group with empty strings.

How can I make a regex match the next 4 characters immediately after finding something?

I'm trying to write a regex to sift through a sizable amount of data. After it finds something, I want it to match the next 4 characters whatever they are. How can I do this?
/match long stuff here..../
The . in a regex is "Any character." Four of them gets you four characters. You could also do:
/match long stuff here.{4}/
This may depend on what language you are writing your regex in.
The expression .... matches any four characters. Append that to your pattern, and put parentheses around it so that whatever those characters are will be captured.
For example:
[Hh]ello [Ww]orld(....)
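In Python, for instance, the captured characters could be read back like this (Python and the sample input are assumptions; the question does not name a language):

```python
import re

# Made-up input: the four characters right after the match are "1234".
match = re.search(r"[Hh]ello [Ww]orld(....)", "say hello World1234 and more")

# group(1) holds whatever the four dots matched, or None if nothing matched.
next_four = match.group(1) if match else None
```

Note that . does not match a newline by default in most engines, so the four characters have to be on the same line unless a dot-all flag is set.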
Look at this example: I want to match an IP and the next 4 characters after it. I have a regex
(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(.{4})
If you match that against the string 192.167.45.45xabc, the first part (?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) will match the IP and the last part (.{4}) will match xabc. (I added ?: at the beginning to make the first block non-capturing; if you want to capture the IP too, just remove ?:.)
I hope this helps
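The same example as a runnable Python sketch (Python is an assumption here; the pattern and the input string are the ones from the answer above):

```python
import re

IP_THEN_FOUR = re.compile(r"(?:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(.{4})")

match = IP_THEN_FOUR.search("192.167.45.45xabc")

# group(0) is the whole match; group(1) is only the four trailing characters,
# because the IP part sits in a non-capturing group.
whole = match.group(0)
next_four = match.group(1)
```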