Is there a RegEx to remove the first instance of "."? - regex

I am trying to remove the first dot "." from a sequence of numbers like this: 2500155978.06. intending to have 250015597806.
Typically, I try to only match what I need and substitute later, i.e. match all "." and then remove the first match. I have been trying with ^[^.]+ but I am only getting the digits up to the first "."
Thought about using a capture group with a positive lookahead but it got me nowhere (still learning RegEx).
Thank you in advance for your time and assistance!

You can use
^(\d+)\.
and replace with $1, the placeholder pointing to the value stored in Group 1.
See the regex demo. Details:
^ - start of string
(\d+) - Group 1 (later referred to with $1 from the replacement pattern): one or more digits
\. - a dot.
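For illustration, here is a minimal Python sketch of the same substitution, assuming the value arrives as a plain string. The function name remove_first_dot is made up for this example, and Python writes the Group 1 placeholder as \1 rather than $1.

```python
import re

def remove_first_dot(value: str) -> str:
    """Drop the first dot when it directly follows the leading run of digits."""
    # ^(\d+)\. matches the leading digits plus the first dot;
    # the replacement \1 puts back only the digits captured in Group 1.
    return re.sub(r"^(\d+)\.", r"\1", value)

result = remove_first_dot("2500155978.06")
print(result)
```

Because the pattern is anchored with ^, it can match at most once, so any later dots are left alone.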

Related

Regex - extract last term between _ and before . from path

This is the regex that I'm currently testing
[\w\. ]+(?=[\.])
My ultimate goal is to include a regex expression to extract using regexp_extract in Impala/Hive query.
regexp_extract(col, '[\w\. ]+(?=[\.])', 1)
This doesn't work in Impala however.
Examples of path to extract from:
D:\mypath\Temp\abs\device\Program1.lua
D:\mypath\Temp\abs\device\SE1_Test-program.lua
D:\mypath\Temp\abs\device\Test_program.lua
D:\mypath\Temp\abs\device\Device_Test_Case-general.lua
The regex I've tested extracts the term I'm looking for, but it's not good enough: for the second, third, and fourth cases I need to extract only the part after the last underscore.
My expectations are:
Program1
Test-program
program
Case-general
Any suggestions? I'm also open to using something other than regexp_extract.
Note that Impala regex does not support lookarounds, and thus you need a capturing group to get a submatch out of the overall match. Also, if you use escaping \ in the pattern, make sure it is doubled.
You can use
regexp_extract(col, '([^_\\\\]+)\\.\\w+$', 1)
See the regex demo.
The regex means
([^_\\]+) - Group 1: one or more chars other than _ and \ (a hyphen is allowed, so Test-program and Case-general are kept whole)
\. - a dot
\w+ - one or more word chars
$ - end of string.
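Whether - is listed in the negated character class changes the result for names such as SE1_Test-program.lua. The following Python sketch compares the two variants on the sample paths. It is only a sanity check under the assumption that Python's re module behaves like Impala's engine for these simple constructs, and the helper name last_term is made up. In a Python raw string the backslash is escaped once, not twice as in the SQL literal.

```python
import re

# Sample paths from the question (raw strings keep the backslashes literal).
paths = [
    r"D:\mypath\Temp\abs\device\Program1.lua",
    r"D:\mypath\Temp\abs\device\SE1_Test-program.lua",
    r"D:\mypath\Temp\abs\device\Test_program.lua",
    r"D:\mypath\Temp\abs\device\Device_Test_Case-general.lua",
]

# Variant A: hyphen, underscore and backslash all end the captured name.
hyphen_excluded = re.compile(r"([^-_\\]+)\.\w+$")
# Variant B: only underscore and backslash end the captured name.
hyphen_allowed = re.compile(r"([^_\\]+)\.\w+$")

def last_term(pattern, path):
    """Return Group 1 of the first match, or None when nothing matches."""
    match = pattern.search(path)
    return match.group(1) if match else None

excluded_results = [last_term(hyphen_excluded, p) for p in paths]
allowed_results = [last_term(hyphen_allowed, p) for p in paths]
print(excluded_results)
print(allowed_results)
```

Only the variant that allows the hyphen returns Test-program and Case-general, which is what the question lists as the expected output.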
Note that \w also matches an underscore; you can use [a-zA-Z0-9] instead.
Add matching a dot and hyphen in the character class, capture that in group 1 and match the expected trailing dot.
Note that you don't have to escape dots in a character class.
([a-zA-Z0-9.-]+)[.]
See a regex101 demo
Example using regexp_extract where the , 1 gets the group 1 value:
regexp_extract(col, '([a-zA-Z0-9.-]+)[.]', 1)
If it should be at the end of the string only, matching the last dot without matching any backslashes in between:
regexp_extract(col, '([a-zA-Z0-9.-]+)[.][^\\\\.]+$', 1)
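The end-anchored pattern can be tried outside Impala as well. Below is a small Python sketch that maps each sample path to the term it is expected to yield. Python's re is a different engine from Impala's, so treat this as a quick check and not as proof of how the query behaves.

```python
import re

# Each sample path from the question paired with the term the asker expects.
expected = {
    r"D:\mypath\Temp\abs\device\Program1.lua": "Program1",
    r"D:\mypath\Temp\abs\device\SE1_Test-program.lua": "Test-program",
    r"D:\mypath\Temp\abs\device\Test_program.lua": "program",
    r"D:\mypath\Temp\abs\device\Device_Test_Case-general.lua": "Case-general",
}

# Same pattern as in the query; [^\\.]+$ is the extension (no backslash, no dot).
pattern = re.compile(r"([a-zA-Z0-9.-]+)[.][^\\.]+$")

extracted = {}
for path in expected:
    match = pattern.search(path)
    extracted[path] = match.group(1) if match else None
    print(path, "->", extracted[path])
```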

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.
If you indeed have multiple strings, as you claim, there's no need to jam all of that into one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match:
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg
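The two-test idea (one positive check, one negative check) carries over to other languages. Here is a rough Python equivalent, assuming the strings are already in a list. The helper name keep is invented for the example.

```python
import re

strings = [
    "APPLEJUCE1A", "APPLETREE2B", "APPLECAKE3C",
    "APPLETEA1B", "APPLEWINE3B", "APPLEWINE1C",
]

def keep(s: str) -> bool:
    """True when s starts with APPLE and contains neither TEA nor WINE1C."""
    return s.startswith("APPLE") and re.search(r"TEA|WINE1C", s) is None

kept = [s for s in strings if keep(s)]
print(kept)

# If everything arrives as one newline-separated string, split it first.
text = "\n".join(strings)
kept_from_text = [line for line in text.split("\n") if keep(line)]
```

Keeping the two conditions separate avoids lookaheads entirely, which tends to be easier to read and to extend with further excluded substrings.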
You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
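As a quick check of the two-lookahead pattern, this Python sketch runs it over the sample data joined into one multi-line string. The re.MULTILINE flag is an addition for this example so that ^ matches at the start of every line.

```python
import re

text = "\n".join([
    "APPLEJUCE1A", "APPLETREE2B", "APPLECAKE3C",
    "APPLETEA1B", "APPLEWINE3B", "APPLEWINE1C",
])

# One negative lookahead per forbidden substring, both checked right after APPLE.
pattern = re.compile(r"^APPLE(?!.*TEA)(?!.*WINE1C).*", re.MULTILINE)

# The pattern has no capturing groups, so findall returns the full matches.
matches = pattern.findall(text)
print(matches)
```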
If you don't want to match a string that has both of them (which is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
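To see what this pattern rejects, here is a short Python sketch. The two strings containing both TEA and WINE1C are invented for the test, since the question's data has no such case.

```python
import re

# Rejects a line only when BOTH values occur (in either order).
pattern = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")

candidates = [
    "APPLETEA1B",       # only TEA: still matches
    "APPLEWINE1C",      # only WINE1C: still matches
    "APPLETEAWINE1C",   # hypothetical: both values, rejected
    "APPLEWINE1CTEA",   # hypothetical: both values in reverse order, rejected
]

accepted = [s for s in candidates if pattern.match(s)]
print(accepted)
```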

How can I remove something from the middle of a string with regex?

I have strings which look like this:
/xxxxx/xxxxx-xxxx-xxxx-338200.html
With my regex:
(?<=-)(\d+)(?=\.html)
It matches just the numbers before .html.
Is it possible to write a regex that matches everything that surrounds the numbers (matches the .html part and the part before the numbers)?
In your current pattern you already use a capturing group. In that case you might also match what comes before and after instead of using the lookarounds
-(\d+)\.html
To get what comes before and after the digits, you could use 2 capturing groups:
^(.*-)\d+(\.html)$
In the replacement use the 2 groups.
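A minimal Python sketch of that replacement, assuming the goal is to drop the digits and keep everything around them. Python refers to the two groups as \1 and \2.

```python
import re

url = "/xxxxx/xxxxx-xxxx-xxxx-338200.html"

# Group 1 is everything up to and including the last hyphen before the digits,
# Group 2 is the .html suffix; the digits in between are matched but not kept.
without_digits = re.sub(r"^(.*-)\d+(\.html)$", r"\1\2", url)
print(without_digits)
```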
This should do the job:
.*-\d+\.html
Explanation: .* matches anything up to the -\d+ part, which requires a - followed by a sequence of digits, right before \.html (where \. represents the literal character .).
To capture groups, just do (.*-)(\d+)(\.html). This will put everything before the number in a group, the number in another group and everything after the number in another group.
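The three-group version can be inspected like this in Python. This is only a sketch, and the variable names are chosen for the example.

```python
import re

url = "/xxxxx/xxxxx-xxxx-xxxx-338200.html"

# fullmatch requires the pattern to cover the whole string.
match = re.fullmatch(r"(.*-)(\d+)(\.html)", url)

# before: text up to the digits, number: the digits, after: the suffix.
before, number, after = match.groups()
print(before, number, after)
```

With the pieces separated, the surrounding text is simply before + after, and the number is still available if it is needed later.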

How to create proper regular expression to find last character which I want to?

I need to create a regex to find the last underscore in a string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the matched part, but I could not get it to work.
I hope you can help me :)
Just use a negated character class [^_], which matches any character except an underscore (this ensures no other underscores are found afterwards), followed by the end-of-string anchor $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so the submatch is what you want to return. You would replace Group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
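Replacing only a submatch depends on the tool. As one possible approach, this Python sketch uses the span of Group 1 to swap out just the last underscore. The function name and the "-" replacement are arbitrary choices for the example.

```python
import re

samples = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]
pattern = re.compile(r"(_)[^_]*$")

def replace_last_underscore(s: str, replacement: str) -> str:
    """Replace only the text captured in Group 1 (the last underscore)."""
    match = pattern.search(s)
    if match is None:
        return s  # no underscore at all: leave the string untouched
    start, end = match.span(1)
    return s[:start] + replacement + s[end:]

replaced = [replace_last_underscore(s, "-") for s in samples]
print(replaced)
```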
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Matches any characters up to an underscore. Since the * quantifier is greedy, it will match up to the last underscore. Then \K discards what was matched so far, so only the underscore is reported as the match.
_(?=[^_]*$): Matches an underscore followed only by non-underscore characters until the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
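Python's standard re module is one of the flavours without \K, so the lookahead form is the one to use there. A small sketch, assuming both sample strings sit on separate lines of one text, with re.MULTILINE playing the role of the m flag:

```python
import re

text = "012344_2.0224.71_3\n012354_5.00123.AR_3.335_8"

# The whole match is just the underscore, so re.sub can replace it directly.
# The lookahead only succeeds when no further underscore precedes the line end.
last_underscore = re.compile(r"_(?=[^_]*$)", re.MULTILINE)

result = last_underscore.sub("-", text)
print(result)
```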
The pattern [^.]+$ matches one or more characters other than a dot and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_), if that is supported, which asserts that what is to the right does not contain an underscore
_(?!.*_)
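Both statements can be checked with a few lines of Python. This is a sketch, and the comparison against str.rfind is only there to confirm the position.

```python
import re

samples = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]

# What the question's original pattern returns: everything after the last dot.
tails = [re.search(r"[^.]+$", s).group() for s in samples]
print(tails)

# Index of the underscore that has no other underscore to its right.
positions = [re.search(r"_(?!.*_)", s).start() for s in samples]
print(positions)
```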

How can I match all instances of the first letter?

For example, for this string I want to match all A and a:
"All the apples make good cake."
Here's what I did: /(.)[^.]*\1*/ig
I started by getting the first character in the group, which can be any character: (.) Then I added [^.]* because I don't want to match any other character that isn't the first one. Finally I added \1* because I wanted to match the first character again. All other similar variations that I've tried don't seem to work.
The regex you are trying to build would capture the very first character and then match as much as possible up to the next occurrence of that same character, using a negative lookahead (a tempered dot):
(?i)(\w)(?:(?!\1).)*
Capturing group 1 holds the character you need. Try it on a live demo.
If the regex engine supports the \K match-reset token, you can append it to the regex above to match only the desired part:
(?i)(\w)(?:(?!\1).)*\K
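Python's standard re module does not support \K, so this sketch uses the version without it and reads Group 1 from every match. It assumes a single-line input, because . does not cross line breaks.

```python
import re

text = "All the apples make good cake."

# (\w) captures a character; the tempered dot then consumes everything
# up to (but not including) the next case-insensitive copy of it.
pattern = re.compile(r"(?i)(\w)(?:(?!\1).)*")

matches = list(pattern.finditer(text))
letters = [m.group(1) for m in matches]     # the captured character per match
positions = [m.start(1) for m in matches]   # where each one sits in the text
print(letters, positions)
```

Each match stops right before the next occurrence of the captured letter, so the following match starts on that letter again. That is why Group 1 holds an A or a every time.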