Regex - finding the last digits in a string

I've a string that is of the form:
<somedomain>/index.php?attachments/24322
I need to extract the number at the end, which can have any number of digits after the '/'. In this example, that is 24322. The number will always be preceded by 'attachments/', i.e. the URL always contains 'attachments/' followed by the number.
Can someone help me write the regex to achieve this?
I'm still a beginner with regex. I'll be using it with preg_match_all in my PHP code.
Thank you in advance for reading this question and your time.

https://regex101.com/r/iPy6av/1
$re = '/(\d+)$/';
$str = '<somedomain>/index.php?attachments/24322';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Then read the number from the $matches array (with PREG_SET_ORDER, it is in $matches[0][1]).

If the string always ends with attachments/[digits] you could use:
attachments/\K\d+$
That would match
attachments/ # Match attachments/
\K # Reset the starting point of the reported match
\d+ # Match one or more digits
$ # End of the string
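For comparison, here is a minimal sketch of the same extraction in Python. It assumes Python's standard re module, which does not support \K, so a fixed-width lookbehind (?<=attachments/) takes its place; the function name last_attachment_id and the example.com domain are illustrative only.

```python
import re
from typing import Optional

# (?<=attachments/) requires "attachments/" right before the digits without
# including it in the match (a stand-in for \K, which Python's re lacks).
# \d+$ then takes the run of digits at the end of the string.
ATTACHMENT_ID = re.compile(r"(?<=attachments/)\d+$")


def last_attachment_id(url: str) -> Optional[str]:
    """Return the trailing number after 'attachments/', or None if there is none."""
    match = ATTACHMENT_ID.search(url)
    return match.group() if match else None


print(last_attachment_id("example.com/index.php?attachments/24322"))  # 24322
```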


PHP, inconsistent results with preg_replace (RegEx)

I'm looking for a nudge in the right direction because the results I get from preg_replace don't make sense to me.
I've the following RegEx:
/([a-zA-Z0-9]{1,})/([a-zA-Z0-9]{1,})/([a-zA-Z0-9_]{1,})/([a-zA-Z0-9_]{1,})/([a-zA-Z0-9_]{1,})\b
I've a file which consists of lines like this:
1:
*/tdn/quota/plot_3/boot_tdd_8/Homes_Homes1/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*
2:
*/vdm/quota/plot_1/boot_tdd_1/Homes_Homes2/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*
3:
*/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
My goal is to keep only everything after the /Homes_Homes1/ (or /Homes_Homes2/) part.
I get the correct replacement for the first file path with my Regex:
*/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*
The second file path is also correct:
*/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*
However, for the last file path I get:
*/container.rig*
instead of
*/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
Why does preg_replace fail with the third file path?
The reason you get that last result is that preg_replace replaces all matches by default, and in the last example string the pattern matches twice: once for /vdm/quota/plot_5/boot_tdd_3/Homes_Homes1 and again for /boot/int/rlt111/pl1/Cookies.
One option is to set the 4th parameter, $limit, to 1 so that only a single replacement is made.
Not all the character classes in your pattern match an underscore, but if that is acceptable, you could shorten the pattern with the quantifier {5}, anchor it with ^ to assert the start of the string, and use \K to match and then discard the * at the start of the string.
^\*\K(?:/\w+){5}
For example
$re = '~^\*\K(?:/\w+){5}~';
$strings = [
"*/tdn/quota/plot_3/boot_tdd_8/Homes_Homes1/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*",
"*/vdm/quota/plot_1/boot_tdd_1/Homes_Homes2/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*",
"*/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*"
];
foreach ($strings as $s) {
    echo preg_replace($re, '', $s) . PHP_EOL;
}
Output
*/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*
*/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*
*/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*
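A Python sketch of both fixes, for comparison. It assumes that re.sub's count argument plays the role of PHP's $limit, and, because Python's re has no \K, the anchored version captures the leading * and writes it back. The helper names are hypothetical.

```python
import re

LINES = [
    "*/tdn/quota/plot_3/boot_tdd_8/Homes_Homes1/boot/bplsed/ruc001/No Files/pl1/Cookies/MMTException/container.rig,11/12/2017,29/11/2017,29/11/2017*",
    "*/vdm/quota/plot_1/boot_tdd_1/Homes_Homes2/.etc/nonrig_tile_edit.vids,07/08/2014,07/08/2014,07/08/2014*",
    "*/vdm/quota/plot_5/boot_tdd_3/Homes_Homes1/boot/int/rlt111/pl1/Cookies/container.rig,19/11/2019,13/11/2017,13/11/2017*",
]

# The question's pattern ({1,} written as +). It is unanchored, so it can
# match more than once in a single line.
ORIGINAL = re.compile(
    r"/([a-zA-Z0-9]+)/([a-zA-Z0-9]+)/([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)\b"
)

# Anchored variant: five /segment parts right after the leading "*".
# The "*" is captured and written back because Python's re has no \K.
ANCHORED = re.compile(r"^(\*)(?:/\w+){5}")


def strip_all(line: str) -> str:
    return ORIGINAL.sub("", line)  # every match replaced, like preg_replace by default


def strip_once(line: str) -> str:
    return ORIGINAL.sub("", line, count=1)  # at most one replacement, like $limit = 1


def strip_anchored(line: str) -> str:
    return ANCHORED.sub(r"\1", line)


for line in LINES:
    print(strip_anchored(line))
```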

How to replace a string with a dynamic string

Case 1.
I have a string of letters like fthhdtrhththjgyhjdtygbh. Using a regex, I want to change it to ftxxxxxxxxxxxxxxxxxxxxx, i.e. keep the first two letters and replace the rest with x.
After a lot of googling, I achieved this:
s/^(\w\w)(\w+)/$1 . "x" x length($2)/e;
Case 2.
I have a string of letters like sdsABCDEABCDEABCDEABCDEABCDEsdf. Using a regex, I want to change it to sdsABCDExyxyxyABCDEsdf, i.e. keep the first and last ABCDE and replace each ABCDE in the middle with xy.
I achieved this:
s/ABCDE((ABCDE)+)ABCDE/$len = length($1)\/5; ABCDE."xy"x $len . ABCDE/e;
Problem: I am not happy with my solutions to the problems above. Is there a better or neater solution?
Constraint: only one regex may be used.
Sorry for the poor English in the title and body of the question; English isn't my first language. Please ask in the comments if anything is unclear.
Task 1: Simplify the password hider regex
Use a Positive Lookbehind Assertion to replace all word characters preceded by two other word characters. This removes the need for the /e Modifier:
my $str = 'fthhdtrhththjgyhjdtygbh';
$str =~ s/(?<=\w{2})\w/x/g;
print $str;
Outputs:
ftxxxxxxxxxxxxxxxxxxxxx
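The same lookbehind works in Python's re module, which accepts fixed-width lookbehinds such as (?<=\w{2}). A minimal sketch (mask_after_two is a made-up name):

```python
import re


def mask_after_two(text: str) -> str:
    # (?<=\w{2}) is a fixed-width lookbehind: a word character is replaced
    # only when the two characters before it in the original string are
    # word characters, so no /e-style computed replacement is needed.
    return re.sub(r"(?<=\w{2})\w", "x", text)


print(mask_after_two("fthhdtrhththjgyhjdtygbh"))
```

Because the lookbehind only looks at word characters, each word keeps its own first two letters: mask_after_two("ab cdef") gives "ab cdxx".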
Task 2: Translate inner repeated pattern regex
Use both a Positive Lookbehind and Lookahead Assertion to replace all ABCDE that are bookended by the same string:
my $str = 'sdsABCDEABCDEABCDEABCDEABCDEsdf';
$str =~ s/(?<=(ABCDE))\1(?=\1)/xy/g;
print $str, "\n";
Output:
sdsABCDExyxyxyABCDEsdf
One regex with less redundancy, using \1 to refer to the first capture group:
s|(ABCDE)\K (\1+) (?=\1)| "xy" x (length($2)/length($1)) |xe;
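Python has neither \K nor the /e modifier, but passing a function to re.sub gives the same effect as the evaluated replacement above. A sketch under that assumption (collapse_inner is a hypothetical name); the first block is captured and re-emitted instead of being kept with \K:

```python
import re

# (ABCDE) captures the first block, ((?:\1)+) takes the copies after it,
# and (?=\1) insists that one more copy follows, so the last block survives.
INNER_REPEATS = re.compile(r"(ABCDE)((?:\1)+)(?=\1)")


def collapse_inner(text: str) -> str:
    def replace(m):
        block, inner = m.group(1), m.group(2)
        # Re-emit the first block, then one "xy" per inner copy.
        return block + "xy" * (len(inner) // len(block))

    return INNER_REPEATS.sub(replace, text)


print(collapse_inner("sdsABCDEABCDEABCDEABCDEABCDEsdf"))  # sdsABCDExyxyxyABCDEsdf
```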

Regex to match alphanumerics, URL operators except forward slashes

I've been trying for the past couple of hours to get this regex right, but unfortunately I still can't get it. I tried searching through existing threads too, but no dice. :(
I'd like a regex to match the following possible strings:
userprofile?id=123
profile
search?type=player&gender=male
someotherpage.htm
but not
userprofile/
helloworld/123
Basically, I'd like the regex to match alphanumerics and URL operators such as ?, = and &, but not forward slashes (i.e. as long as the string contains a forward slash, the regex should return 0 matches).
I've tried the following regexes but none seem to work:
([0-9a-z?=.]+)
(^[^\/]*$[0-9a-z?=.]+)
([0-9a-z?=.][^\/]+)
([0-9a-z?=.][\/$]+)
Any help will be greatly appreciated. Thank you so much!
The reason they all match is that your regexp matches part of the string and you've not told it that it needs to match the entire string. You need to make sure that it doesn't allow any other characters anywhere in the string, e.g.
^[0-9a-z&?=.]+$
Here's a small Perl script to test it:
#!/usr/bin/perl
use strict;
use warnings;
my @testlines = (
    "userprofile?id=123",
    "userprofile",
    "userprofile?type=player&gender=male",
    "userprofile.htm",
    "userprofile/",
    "userprofile/123",
);
foreach my $testline (@testlines) {
    if ($testline =~ /^[0-9a-z&?=.]+$/) {
        print "$testline matches\n";
    } else {
        print "$testline doesn't match - bad regexp, no cookie\n";
    }
}
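The same whole-string check in Python, as a sketch: re.fullmatch requires the entire string to match, which plays the role of the ^...$ anchors in the Perl test (is_allowed is a hypothetical helper).

```python
import re

# Only these characters are allowed; there is no "/" in the class.
ALLOWED = re.compile(r"[0-9a-z&?=.]+")


def is_allowed(s: str) -> bool:
    # fullmatch succeeds only if the whole string consists of allowed characters.
    return ALLOWED.fullmatch(s) is not None


for s in ["userprofile?id=123", "profile", "search?type=player&gender=male",
          "someotherpage.htm", "userprofile/", "helloworld/123"]:
    print(s, "matches" if is_allowed(s) else "doesn't match")
```

fullmatch is used instead of a trailing $ because in Python $ also matches just before a final newline, so "profile\n" would pass a ^...$ search but not fullmatch.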
This should do the trick:
/^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i
To break this down:
^ // Anchor to start of string
\w+ // Match [A-Za-z0-9_] (1 or more), to specify resource
( // Alternation group (i.e., a OR b)
\.htm // Match ".htm"
| // OR
\? // Match "?"
\w+=\w* // Match first term of query string (e.g., something=foo)
(&\w+=\w*)* // Match remaining terms of query string (zero or more)
)
? // Make alternation group optional
$ // Anchor to end of string
Without the leading ^, a string such as helloworld/123 would still match on its trailing 123.
The i flag makes the .htm part case-insensitive (\w already covers both cases).
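A Python sketch of the structured pattern, assuming re.fullmatch as the anchor at both ends. The END_ONLY variant, which keeps just the trailing $, is included to show that a tail such as 123 in helloworld/123 is still found when the start is not anchored. The names are hypothetical.

```python
import re

# Resource name, then optionally ".htm" or a query string of key=value pairs.
PATTERN = r"\w+(?:\.htm|\?\w+=\w*(?:&\w+=\w*)*)?"
RESOURCE = re.compile(PATTERN, re.IGNORECASE)
# End-anchored only, for comparison.
END_ONLY = re.compile(PATTERN + "$", re.IGNORECASE)


def is_resource(s: str) -> bool:
    # fullmatch anchors both ends of the string.
    return RESOURCE.fullmatch(s) is not None


print(is_resource("helloworld/123"))               # False
print(END_ONLY.search("helloworld/123").group())   # 123
```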

How to return the first five digits using Regular Expressions

How do I return the first 5 digits of a string using regular expressions?
For example, if I have the following text as input:
15203 Main Street
Apartment 3 63110
How can I return just "15203".
I am using C#.
This isn't really the kind of problem that's ideally solved by a single-regex approach; the regex language just isn't especially meant for it. Assuming you're writing code in a real language (and not some ill-conceived embedded use of regex), you could perhaps do this (examples in Perl):
# Capture all the digits into an array
my @digits = $str =~ /(\d)/g;
# Then take the first five and put them back into a string
my $first_five_digits = join "", @digits[0..4];
or
# Copy the string, removing all non-digits
(my $digits = $str) =~ tr/0-9//cd;
# And cut off all but the first five
$first_five_digits = substr $digits, 0, 5;
If for some reason you really are stuck doing a single match, and you have access to the capture buffers and a way to put them back together, then wdebeaum's suggestion works just fine, but I have a hard time imagining a situation where you can do all that, but don't have access to other language facilities :)
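Both approaches translate directly to Python; a sketch, assuming Python's re module (the function names are hypothetical). findall collects every digit, and the re.sub variant mirrors the tr/0-9//cd copy-and-strip version:

```python
import re


def first_five_digits(text: str) -> str:
    # findall(r"\d") returns each digit as a separate string; keep the first five.
    return "".join(re.findall(r"\d", text)[:5])


def first_five_digits_by_stripping(text: str) -> str:
    # Delete every run of non-digits, then keep the first five characters.
    return re.sub(r"\D+", "", text)[:5]


address = "15203 Main Street\nApartment 3 63110"
print(first_five_digits(address))  # 15203
```

Slicing never raises, so if the text holds fewer than five digits you simply get however many there are.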
It would depend on your regex flavor and programming language (C#, Perl, etc.), but in C# you'd do something like
// requires: using System; using System.Text.RegularExpressions;
string rX = @"\D+";
input = Regex.Replace(input, rX, "");
return input.Substring(0, Math.Min(5, input.Length));
Note: I'm not sure about that regex (others here may have a better one), but basically, since a regex itself doesn't "replace" anything and only matches patterns, you'd match any non-digit characters, replace them with your language's version of the empty string (string.Empty or "" in C#), and then take the first 5 characters of the resulting string.
You could capture each digit separately and put them together afterwards, e.g. in Perl:
$str =~ /(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)/;
$digits = $1 . $2 . $3 . $4 . $5;
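The same single match in Python, as a sketch: match.groups() returns the five captures as a tuple, so joining them rebuilds the number (the helper name is hypothetical). The function returns None when fewer than five digits are present.

```python
import re
from typing import Optional

# Five single-digit captures; each \D* skips any non-digits in between.
FIVE_DIGITS = re.compile(r"(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)")


def first_five_digits(text: str) -> Optional[str]:
    match = FIVE_DIGITS.search(text)
    # groups() gives the five captured digits as a tuple of strings.
    return "".join(match.groups()) if match else None


print(first_five_digits("15203 Main Street\nApartment 3 63110"))  # 15203
```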
I don't think a regular expression is the best tool for what you want.
Regular expressions are for matching patterns... the pattern you are looking for is "a(ny) digit".
Your logic external to the pattern is "five matches".
Thus, you either want to loop over the first five digit matches, or capture five digits and merge them together.
But look at that Perl example -- that's not one pattern -- it's one pattern repeated five times.
Can you do this via a regular expression? Just like parsing XML -- you probably could, but it's not the right tool.
Not sure this is best solved by regular expressions, since they are used for string matching and usually not for string manipulation (in my experience).
However, you could make a call to:
strInput = Regex.Replace(strInput, @"\D+", "");
to remove all non-digit characters and then just return the first 5 characters.
If you want a single regex that does all of this for you, I'm not sure one exists without using the Regex class in a similar way as above.
A different approach -
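# copy over
$temp = $str;
# Remove non-numbers (the /g flag removes all of them, not just the first)
$temp =~ s/\D//g;
# Get the first 5 numbers, exactly (the parentheses capture them into $1).
$temp =~ /(\d{5})/;
# Grab the match - ASSUMES that there will be a match.
$first_digits = $1;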
$result =~ s/^(\d{5}).*/$1/s;
Replace any text that starts with exactly five digits (\d{5}) followed by anything (.*) with $1, which is what is contained within the (), i.e. the first five digits. The /s flag lets . match newlines too, so multi-line input is trimmed as well. This only works if the string starts with five digits.
If you want the first 5 characters, whatever they are:
$result =~ s/^(.{5}).*/$1/s;
Use whatever programming language you are using to evaluate this,
e.g. in C#:
Regex.Replace(text, "^(.{5}).*", "$1", RegexOptions.Singleline);
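The substitution approach in Python, as a sketch: re.DOTALL plays the role of Perl's /s, and the replacement \1 refers back to the captured prefix (the function names are hypothetical). If the text does not start with five digits, nothing matches and the text is returned unchanged.

```python
import re


def leading_five_digits(text: str) -> str:
    # ^(\d{5}) must match at the very start; .* swallows the rest, and
    # re.DOTALL (like Perl's /s) lets . cross newlines.
    return re.sub(r"^(\d{5}).*", r"\1", text, flags=re.DOTALL)


def leading_five_chars(text: str) -> str:
    # Same idea for any first five characters.
    return re.sub(r"^(.{5}).*", r"\1", text, flags=re.DOTALL)


print(leading_five_digits("15203 Main Street\nApartment 3 63110"))  # 15203
```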

Get numbers from string with regex

I am trying to write a regex to get the numbers from strings like these ones:
javascript:ShowPage('6009',null,null,null,null,null,null,null)
javascript:BlockLink('2146',null,null,null)
I am having difficulty writing the regex to grab these numbers.
How should I do this?
Try this:
(\d+)
What language are you using to parse these strings?
If you let me know I can help you with the code you would need to use this regular expression.
Assuming:
you want to capture the digits
there's only one set of digits per line
Try this:
/(\d+)/
then $1 (Perl) or $matches[1] (PHP) or whatever your poison of choice is, should contain the digits.
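A Python sketch for the two sample lines, under the same assumption of one run of digits per line (first_number is a hypothetical helper); re.search returns the first match, or None if the line has no digits.

```python
import re
from typing import Optional

LINES = [
    "javascript:ShowPage('6009',null,null,null,null,null,null,null)",
    "javascript:BlockLink('2146',null,null,null)",
]


def first_number(line: str) -> Optional[str]:
    # (\d+) captures the first run of digits; group(1) corresponds to $1 in Perl.
    match = re.search(r"(\d+)", line)
    return match.group(1) if match else None


for line in LINES:
    print(first_number(line))
```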
Integer or float:
/\d+([.,]\d+)?/
To match only integers: \d+
// PHP
$string = 'ssss 12.2';
$pattern = '/\D*(\d+)([.,])?(\d+)?\D*/';
$replacement = '$1.$3';
$res = (float)preg_replace($pattern, $replacement, $string);
echo $res; // output 12.2
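A Python sketch of the integer-or-float case, assuming a comma should be read as a decimal separator (first_number_as_float is hypothetical). Using [.,] keeps the dot literal, and the comma is swapped for a dot before calling float().

```python
import re
from typing import Optional

# Digits, then optionally "." or "," followed by more digits.
# Inside [.,] the dot is literal, unlike a bare . in (.|,).
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def first_number_as_float(text: str) -> Optional[float]:
    match = NUMBER.search(text)
    if match is None:
        return None
    # Read a decimal comma as a decimal point before converting.
    return float(match.group().replace(",", "."))


print(first_number_as_float("ssss 12.2"))  # 12.2
```

One caveat of this design: a thousands separator is misread, so 1,234 becomes 1.234.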