How to grab a numeric id in URI?

I need to grab the first numeric ID after #/.
For example, I would like to grab only 68 and 112 from these URIs:
//www.domain.tld/category/14-457-myproduct.html#/68-attribute-fm_300_224_39_b
//www.domain.tld/category/36-578-myproduct.html#/112-attribute-fm_489_471_51_w
I even tried splitting the URI in several steps, but it still does not work on my site (Smarty template).

Have you tried something like this:
<?php
// Capture the run of digits that follows "#/" and precedes the first "-".
$pattern = '/#\/(\d+)-/';
$url1 = "//www.domain.tld/category/14-457-myproduct.html#/68-attribute-fm_300_224_39_b";
$url2 = "//www.domain.tld/category/36-578-myproduct.html#/112-attribute-fm_489_471_51_w";
echo getNumber($url1, $pattern); // 68
echo "\n";
echo getNumber($url2, $pattern); // 112

function getNumber($url, $pattern) {
    // preg_match() returns 1 on a match; return null when the URL has no such ID.
    return preg_match($pattern, $url, $matches) ? $matches[1] : null;
}
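For comparison, the same capture-group idea can be sketched in Python. This is only an illustration, not part of the original answer: the name get_number mirrors the PHP helper, and the pattern assumes the ID is always followed by a hyphen, as in the two sample URIs.

```python
import re

# Digits that follow "#/" and precede the first "-".
ID_AFTER_FRAGMENT = re.compile(r"#/(\d+)-")


def get_number(url):
    """Return the numeric ID after "#/" as a string, or None if there is none."""
    match = ID_AFTER_FRAGMENT.search(url)
    return match.group(1) if match else None


url1 = "//www.domain.tld/category/14-457-myproduct.html#/68-attribute-fm_300_224_39_b"
url2 = "//www.domain.tld/category/36-578-myproduct.html#/112-attribute-fm_489_471_51_w"
print(get_number(url1))
print(get_number(url2))
```

Anchoring on "#/" is what keeps the earlier numbers in the path (14-457, 36-578) from being picked up.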

Related

Regular expression only finding first match

I'm working on something that is similar to other designs I've done, but for some reason, it's only finding the first key/value pair, whereas other ones found all of them. It looks good in regex101.com, which is where I typically test these.
I'm parsing C++ code to build a reference spreadsheet for error tracking across a system; the results go into a spreadsheet or are used as keys to look up info in another file. I do something similar for about 20 files, plus there's other data coming from a SQL query or an Access/MDB file. The data for this file looks like this:
m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seFatal),
HOP_FATAL_ERROR ));
m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seNotSelected),
HOP_NOT_SELECTED));
m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seCoverOpen),
HOP_COVER_OPEN ));
m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seLeverPosition),
HOP_LEVER_POSITION ));
m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seJam),
HOP_JAM ));
I read this as a string from the file (looks good), and feed it into this Function as $fileContent:
Function Get-Contents60{
[cmdletbinding()]
Param ([string]$fileContent)
Process
{
#m_ErrorMap.insert(make_pair(
#MAKEWORD(scError,seJam),
#HOP_JAM ));
# construct regex
switch -Regex ($fileContent -split '\r?\n') { #this is splitting on each line test regex with https://regex101.com/
'MAKEWORD["("][\w]+,(\w+)[")"],' { #seJam
# add relevant key to key collection
$keys = $Matches[1] } #only match once
',(HOP.*?)[\s]' { # HOP_JAM
# we've reached the relevant error, set it for all relevant keys
foreach($key in $keys){
Write-Host "60 key: $key"
Write-Host "Matches[0]: $($Matches[0]) Matches[1]: $($Matches[1])"
$errorMap[$key] = $($Matches[1])
Write-Host "60 key: $key ... value: $($errorMap[$key])"
}
}
'break' {
# reset/clear key collection
$keys = @()
}
}#switch
#Write-Host "result:" $result -ForegroundColor Green
#$result;
return $errorMap
}#End of Process
}#End of Function
I stepped through it in VSCode, and it's finding the first key/value pair, but after that it's not finding anything. I looked at it in regex101.com, and it's finding line endings/breaks, and the MAKEWORD regex and HOP regex each match what they should on every line.
I'm not sure if the issue is that they aren't all in the same line, and maybe I need to change it so it doesn't break on newline and breaks on something else for each key/value pair? I'm a little fuzzy on this.
I'm using powershell 5.1, and VSCode.
Update:
I modified Theo's answer and it worked great. I had simplified the class name from m_HopErrorMap to m_ErrorMap for this question, and the regular expression was grabbing that for each one. I modified that slightly, and Theo's works.
function Get-Contents60{
[cmdletbinding()]
Param ([string]$fileContent)
# create an ordered hashtable to store the results
$errorMap = [ordered]@{}
# process the lines one-by-one
switch -Regex ($fileContent -split '\r?\n') {
'MAKEWORD\([^,]+,([^)]+)\),' { # seJam, seFatal etc.
$key = $matches[1]
}
'(HOP_[^)]+)' {
$errorMap[$key] = $matches[1].Trim()
}
}
# output the completed data as object
[PsCustomObject]$errorMap
return $errorMap
}

I would simplify your function to
function Get-Contents60{
[cmdletbinding()]
Param ([string]$fileContent)
# create an ordered hashtable to store the results
$errorMap = [ordered]@{}
# process the lines one-by-one
switch -Regex ($fileContent -split '\r?\n') {
'MAKEWORD\([^,]+,([^)]+)\),' { # seJam, seFatal etc.
$key = $matches[1]
}
'(HOP[^)]+)' {
$errorMap[$key] = $matches[1].Trim()
}
}
# output the completed data as object
[PsCustomObject]$errorMap
}
Then, using your example text (I'm using a Here-string for it, but in real life you would load the file content with $c = Get-Content -Path 'X:\TheErrors.txt' -Raw), you do
$result = Get-Contents60 -fileContent $c
To display on screen
$result | Format-Table -AutoSize
giving you
seFatal         seNotSelected    seCoverOpen    seLeverPosition    seJam
-------         -------------    -----------    ---------------    -----
HOP_FATAL_ERROR HOP_NOT_SELECTED HOP_COVER_OPEN HOP_LEVER_POSITION HOP_JAM
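The line-by-line approach above (remember the key from the MAKEWORD line, then store the value when the HOP line arrives) can be sketched in Python as follows. This is an illustration under assumptions: parse_error_map and both patterns are made up for this example, and the value pattern is case-sensitive so that a class name such as m_HopErrorMap is not mistaken for a value.

```python
import re

# Second argument of MAKEWORD(...), e.g. "seJam".
MAKEWORD_RE = re.compile(r"MAKEWORD\([^,]+,\s*([^)]+)\)")
# An upper-case HOP_ constant, e.g. "HOP_JAM".
HOP_RE = re.compile(r"\b(HOP_\w+)")


def parse_error_map(text):
    """Pair each MAKEWORD key with the HOP_ value on a following line."""
    error_map = {}
    key = None
    for line in text.splitlines():
        key_match = MAKEWORD_RE.search(line)
        if key_match:
            key = key_match.group(1).strip()
            continue
        value_match = HOP_RE.search(line)
        if value_match and key is not None:
            error_map[key] = value_match.group(1)
            key = None  # wait for the next MAKEWORD line
    return error_map


sample = """m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seFatal),
HOP_FATAL_ERROR ));
m_ErrorMap.insert(make_pair(
MAKEWORD(scError,seJam),
HOP_JAM ));"""
print(parse_error_map(sample))
```

Resetting the key after each value means a stray HOP_ line without a preceding MAKEWORD line is ignored rather than overwriting the previous entry.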

Perl deferred interpolation of string

I have a triage script that takes in a message and compares it against a list of regexes; the first one that matches sets the bucket. Some example code would look like this.
my $message = 'some message: I am bob';
my @buckets = (
{
regex => '^some message:(.*)',
bucket => '"remote report: $1"',
},
# more pairs
);
foreach my $e (@buckets) {
if ($message =~ /$e->{regex}/i) {
print eval "$e->{bucket}";
}
}
This code will give remote report: I am bob. I keep looking at this and feel like there has to be a better way to do this than it is done now, especially with the double quoting ('""') in the bucket. Is there a better way for this to be handled?

Perl resolves the interpolation when that expression is evaluated. For that, it is sufficient to use a subroutine, no eval needed:
...
bucket => sub { "remote report: $1" },
...
print $e->{bucket}->();
Note that you effectively eval your regexes as well. You can use pre-compiled regex objects in your hash, with the qr// operator:
...
regex => qr/^some message:(.*)/i,
...
if ($message =~ /$e->{regex}/) {
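The same two ideas (a callable instead of eval, and a pre-compiled pattern instead of a pattern string) can be sketched in Python. This is an illustration only: triage is a hypothetical helper, and because Python has no global $1, the callable receives the match object explicitly.

```python
import re

buckets = [
    {
        # Compiled once, like qr// in Perl.
        "regex": re.compile(r"^some message:(.*)", re.IGNORECASE),
        # Evaluated only when a match exists, so no eval is needed.
        "bucket": lambda m: f"remote report: {m.group(1).strip()}",
    },
    # more pairs
]


def triage(message):
    """Return the bucket text for the first matching pattern, or None."""
    for entry in buckets:
        match = entry["regex"].search(message)
        if match:
            return entry["bucket"](match)
    return None


print(triage("some message: I am bob"))
```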

You could use sprintf-style format strings:
use strict;
use warnings;
my $message = 'some message: I am bob';
my @buckets = (
{
regex => qr/^some message:(.*)/,
bucket => 'remote report: %s',
},
# more pairs
);
foreach my $e (@buckets) {
if (my @matches = ($message =~ /$e->{regex}/ig)) {
printf($e->{bucket}, @matches);
}
}
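A rough Python counterpart of the format-string approach, for comparison. The names are hypothetical, and the sketch assumes the templates are written by the script's author, not supplied by users.

```python
import re

# (pattern, template) pairs; {0} is filled with the first captured group.
buckets = [
    (re.compile(r"^some message:\s*(.*)", re.IGNORECASE), "remote report: {0}"),
    # more pairs
]


def triage(message):
    """Format the template of the first matching pattern with its groups."""
    for pattern, template in buckets:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    return None


print(triage("some message: I am bob"))
```

Compared with a callable, a template keeps each bucket entry as plain data, at the cost of allowing only substitution and no logic.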

Truncate a URL while ignoring DIV tags

We use the following code to display a value in the output of a WordPress site page. However, the output is occasionally too long to fit within the box we've set for it, so we'd like to truncate it.
$markup = str_replace('%%', get_post_meta($post_id, '_sf_submission_field_'.get_the_ID(), true), htmlspecialchars_decode(get_post_meta(get_the_ID(), 'markup', true)));
$text = preg_replace('#(script|about|applet|activex|chrome):#is', "\\1:", $markup);
$ret = ' ' . $text;
$ret = preg_replace("#(^|[\n ])([\w]+?://[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1\\2", $ret);
$ret = preg_replace("#(^|[\n ])((www|ftp)\.[\w\#$%&~/.\-;:=,?@\[\]+]*)#is", "\\1\\2", $ret);
$ret = preg_replace("#(^|[\n ])([a-z0-9&\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1\\2@\\3", $ret);
$ret = substr($ret, 1);
echo $ret;
Using substr as follows, $ret = substr($ret, 0, 30);, would be great. However, part of the input string contains styling div tags and other text which cannot be truncated. So my question is: how can I truncate JUST the part of the string that has a URL in it, without truncating the href itself, since it still needs to be a clickable link?
Here is a sample input string:
<i class="icon-twitter-squared"></i> http://www.stackoverflow.com/reallylongurl
...I'd like only the http://www.stackoverflow.com/reallylongurl to be truncated to www.stackoverfl...
for example - it needs to remain clickable as the original untruncated URL.
Many thanks for your suggestions!

Update: To get the link that is not part of href and also as you asked in the comment you can use this regex:
(?<!href=")https?://(.{9}).*?/\w+
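One way to picture the goal is to keep the full URL in the href and shorten only the visible label. The Python sketch below is an illustration, not the original answer's method: link_with_short_label and max_len are made-up names, the lookbehind skips only URLs that directly follow href=", and a URL that is already the text of an existing anchor would be wrapped a second time.

```python
import html
import re

# URLs that are not already the value of an href="..." attribute.
BARE_URL = re.compile(r'(?<!href=")https?://[^\s<>"]+')


def link_with_short_label(text, max_len=15):
    """Wrap bare URLs in anchors whose label is cut to max_len characters."""

    def replace(match):
        url = match.group(0)
        label = re.sub(r"^https?://", "", url)  # drop the scheme from the label
        if len(label) > max_len:
            label = label[:max_len] + "..."
        # The href keeps the full URL, so the link still works.
        return f'<a href="{html.escape(url, quote=True)}">{html.escape(label)}</a>'

    return BARE_URL.sub(replace, text)


sample = '<i class="icon-twitter-squared"></i> http://www.stackoverflow.com/reallylongurl'
print(link_with_short_label(sample))
```

The surrounding markup (the icon tag in the sample) is left untouched because only the matched URL is replaced.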

Perl Regex on a mechanize->content page

I am fiddling around in Perl and I managed to retrieve an HTML page from a source. However, I just want to retrieve one particular line. The line starts with a date formatted as follows: dd/mm/YYYY.
The HTML is displayed with print $resp->content(); $resp being a response from a $mechanice->submit_form();
This is where the resp is made:
my $resp = $m->submit_form(
//bunch of data
},
);
How do I achieve this? I am familiar with PHP but I just started with Perl.
Thanks

Here's an example from some Mechanize code that I have.
my $mech = WWW::Mechanize->new();
$mech->get("url that takes you to the page with the form");
$mech->submit_form(form_name => 'someform',
fields => {'user_name' => 'user',
'password' => 'password'},
button => 'submit');
return if not $mech->success();
my $content = $mech->content();
if ($content =~ m|(\d{2}/\d{2}/\d{4}.*)|) {
print "My line: $1\n";
}
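The matching step on its own can be sketched in Python. This is only an illustration: first_dated_line and the sample content are made up, and the pattern assumes the date sits at the very start of a line (after optional whitespace), which will not hold if the date is wrapped in a tag such as a table cell.

```python
import re

# A line that begins with dd/mm/YYYY; "." stops at the end of that line.
DATED_LINE = re.compile(r"^\s*(\d{2}/\d{2}/\d{4}.*)$", re.MULTILINE)


def first_dated_line(content):
    """Return the first line that starts with a dd/mm/YYYY date, or None."""
    match = DATED_LINE.search(content)
    return match.group(1).rstrip() if match else None


# Hypothetical page content, for illustration only.
sample = "<html>\n<body>\nHeader\n12/03/2011 Order shipped\nFooter\n</body>\n</html>"
print(first_dated_line(sample))
```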

Regexp to find youtube url, strip off parameters and return clean video url?

Imagine this URL:
http://www.youtube.com/watch?v=6n8PGnc_cV4&feature=rec-LGOUT-real_rn-2r-13-HM
What is the cleanest and best regexp to do the following?
1.) I want to strip off everything after the video URL, so that only http://www.youtube.com/watch?v=6n8PGnc_cV4 remains.
2.) I want to convert this URL into http://www.youtube.com/v/6n8PGnc_cV4
Since I'm not much of a regexp-ert, I need your help:
$content = preg_replace('http://.*?\?v=[^&]*', '', $content);
return $content;
edit: check this out! I want to create a really simple WordPress plugin that just recognizes every normal youtube URL in my $content and replaces it with the embed code:
<?php
function videoplayer($content) {
$embedcode = '<object class="video" width="308" height="100"><embed src="' . . '" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="308" height="100" wmode="opaque"></embed></object>';
//filter normal youtube url like http://www.youtube.com/watch?v=6n8PGnc_cV4&feature=rec-LGOUT-real_rn-2r-13-HM
//convert it to http://www.youtube.com/v/6n8PGnc_cV4
//use embedcode and pass along the new youtube url
$content = preg_replace('', '', $content);
//return embedcode
return $content;
}
add_filter('the_content', 'videoplayer');
?>

I use this search criteria in my script:
/((http|ftp)\:\/\/)?([w]{3}\.)?(youtube\.)([a-z]{2,4})(\/watch\?v=)([a-zA-Z0-9_-]+)(\&feature=)?([a-zA-Z0-9_-]+)?/

You could just split it on the first ampersand.
$content = explode('&', $content);
$content = $content[0];

Edit: Simplest regexp: /http:\/\/www\.youtube\.com\/watch\?v=.*/
YouTube watch links all have the same layout. To get the video id from them, first you slice off the extra parameters from the end and then slice off everything but the last 11 characters. See it in action:
$url = "http://www.youtube.com/watch?v=1rnfE4eo1bY&feature=...";
$url = substr($url, 0, 42); // "http://www.youtube.com/watch?v=1rnfE4eo1bY"
$url = substr($url, -11); // "1rnfE4eo1bY"
$result = "http://www.youtube.com/v/" . $url; // "http://www.youtube.com/v/1rnfE4eo1bY"
You can uniformize all your youtube links (by removing useless parameters) with a Greasemonkey script: http://userscripts.org/scripts/show/86758. Greasemonkey scripts are natively supported as addons in Google Chrome.
And as a bonus, here is a one (okay, actually two) liner:
$url = "http://www.youtube.com/watch?v=1rnfE4eo1bY&feature=...";
$result = "http://www.youtube.com/v/" . substr(substr($url, 0, 42), -11);
--3ICE

// Start from the watch URL; it is the one that contains "v=".
$url = "http://www.youtube.com/watch?v=6n8PGnc_cV4";
$start = strpos($url,"v=");
echo 'http://www.youtube.com/v/'.substr($url,$start+2);
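The answers above rely on fixed offsets or string positions. A variant that parses the query string instead is sketched below in Python as an illustration: to_embed_url is a made-up name, and the sketch does not check that the host is actually youtube.com.

```python
from urllib.parse import parse_qs, urlparse


def to_embed_url(watch_url):
    """Turn a watch?v=ID URL into the /v/ID form, ignoring other parameters."""
    query = parse_qs(urlparse(watch_url).query)
    video_ids = query.get("v")
    if not video_ids:
        return None  # no v= parameter present
    return f"http://www.youtube.com/v/{video_ids[0]}"


url = "http://www.youtube.com/watch?v=6n8PGnc_cV4&feature=rec-LGOUT-real_rn-2r-13-HM"
print(to_embed_url(url))
```

Because the parameter is looked up by name, this also works when v is not the first parameter in the query string.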
