Regex Check Facebook Video URL - regex

I am trying to validate Facebook video URLs using a regex.
This is an example of a valid Facebook video URL:
https://www.facebook.com/video.php?v=100000000000000 (VALID)
This is an example of a valid Facebook video URL with a username:
https://www.facebook.com/{username}/videos/100000000000000
Note: {username} can contain any string.
Examples:
https://www.facebook.com/username1/videos/100000000000000 (VALID)
https://www.facebook.com/username2/videos/100000000000000 (VALID)
But my regex still gives the wrong result when I check a Facebook video URL with a username.
This is my regex:
^http(s)?://(www\.)?facebook.([a-z]+)/(?!(?:video\.php\?v=\d+|usernameFB/videos/\d+)).*$
You can run it here:
https://regex101.com/r/dF5iP1/6

This will work for you:
^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$
Demo
https://regex101.com/r/sC6oR2/3
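As a quick check, the pattern above can be dropped into a small JavaScript helper. This is only a sketch: the helper name isFacebookVideoUrl and the /photos/ URL are made up for illustration, and the forward slashes are escaped only because the pattern is written as a JavaScript regex literal.

```javascript
// Pattern from the answer above, written as a JavaScript regex literal.
const FB_VIDEO_URL = /^(https?:\/\/www\.facebook\.com\/(?:video\.php\?v=\d+|.*?\/videos\/\d+))$/;

// Hypothetical helper name, used only for this illustration.
function isFacebookVideoUrl(url) {
  return FB_VIDEO_URL.test(url);
}

console.log(isFacebookVideoUrl("https://www.facebook.com/video.php?v=100000000000000"));      // true
console.log(isFacebookVideoUrl("https://www.facebook.com/username1/videos/100000000000000")); // true
console.log(isFacebookVideoUrl("https://www.facebook.com/username1/photos/100000000000000")); // false
```

Because the pattern ends with $ right after the digits, URLs with a trailing slash or a query string are rejected by this version.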

UPDATED October 2018
Neither of the two existing REGEX proposals worked for me, and there are more URL variants in use than the ones they consider.
Here's my REGEX Proposal:
^(?:(?:https?:)?\/\/)?(?:www\.)?facebook\.com\/[a-z\.]+\/videos\/(?:[a-z0-9\.]+\/)?([0-9]+)\/?(?:\?.*)?$
^(?:(?:https?:)?\/\/)?(?:www\.)?facebook\.com\/[a-zA-Z0-9\.]+\/videos\/(?:[a-zA-Z0-9\.]+\/)?([0-9]+)
I ignored video.php, I think it's old enough to safely ignore it.
Matches:
https://www.facebook.com/aguardos.nocturnos/videos/vb.1614866072064590/1828228624061666/?type=2&theater
https://www.facebook.com/aguardos.nocturnos/videos/vb.1614866072064590/1828228624061666?type=2&theater
https://www.facebook.com/aguardos.nocturnos/videos/1828228624061666/
https://www.facebook.com/latavernadelssomnis/videos/1609038972452561/?hc_ref=NEWSFEED
//www.facebook.com/aguardos.nocturnos/videos/1828228624061666/
https://facebook.com/aguardos.nocturnos/videos/1828228624061666/
http://www.facebook.com/aguardos.nocturnos/videos/1828228624061666/
www.facebook.com/aguardos.nocturnos/videos/18282286240612666/
facebook.com/aguardos.nocturnos/videos/18282286240612666/
https://www.facebook.com/aguardos.nocturnos/videos/1828228624061666
https://www.facebook.com/WEAU13News/videos/588612391555522/UzpfSTEzMzAzMDk4NjM6MTAyMTMxMjMzNDE3ODE0MTI/
I neither own nor have watched any of these videos. I just picked random ones that were on my Facebook feed.
Groups
Group 1: the video ID.
Gotchas
One of the most common Facebook video formats is more complex than I'd like it to be and matching every case perfectly with REGEX would probably lead to a very messy query.
https://www.facebook.com/RolandGarros/videos/10155404760334920/FOO (valid)
https://www.facebook.com/RolandGarros/videos/FOO/10155404760334920 (valid)
https://www.facebook.com/RolandGarros/videos/10155404760334920/FOO/FOO (invalid)
The way this one seems to work is by retrieving the numeric value in the first or second part after videos/.
https://www.facebook.com/RolandGarros/videos/10155361533554920/1015536153355492134
What about this one where two valid numeric values are involved? It seems like the second one is the one that will prevail.
For this reason the REGEX solution above was softened [1] to match only the beginning of the Facebook URL, up to the video group that we're looking for. Considering that your goal is probably to extract the video ID rather than verify the URL, I think that's a valid trade-off. At the end of the day, you'll be checking the video either way (through the API or by scraping) to extract the video information, since an ID doesn't mean that the video exists or that it's public.
[1] Not just softened, but also improved to match the test case format.
Test
You can easily test it yourself at Regex101.
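To illustrate the extraction use case, here is a minimal JavaScript sketch built on the second (softened) pattern from this answer. The helper name extractVideoId is hypothetical. The ID is kept as a string on purpose, since some of the IDs listed above are too large to be represented exactly as a JavaScript number.

```javascript
// The softened pattern from this answer: anchored at the start only, so anything
// after the video ID (trailing slash, query string, extra segment) is ignored.
const FB_VIDEO_ID = /^(?:(?:https?:)?\/\/)?(?:www\.)?facebook\.com\/[a-zA-Z0-9\.]+\/videos\/(?:[a-zA-Z0-9\.]+\/)?([0-9]+)/;

// Hypothetical helper: returns the video ID as a string, or null if there is no match.
function extractVideoId(url) {
  const match = FB_VIDEO_ID.exec(url);
  return match ? match[1] : null;
}

console.log(extractVideoId("https://www.facebook.com/aguardos.nocturnos/videos/vb.1614866072064590/1828228624061666/?type=2&theater"));
// 1828228624061666
console.log(extractVideoId("https://www.facebook.com/RolandGarros/videos/10155361533554920/1015536153355492134"));
// 1015536153355492134 (the second numeric segment wins, as described in the gotchas)
console.log(extractVideoId("https://www.facebook.com/video.php?v=100000000000000"));
// null (video.php is deliberately not covered)
```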

This is a little different than Pedro's, but it works well.
^http(?:s)?://(?:www\.)?facebook.(?:[a-z]+)/((?:video\.php\?v=\d+|username\d/videos/\d+)).*$
https://regex101.com/r/nV4rI3/1

Latest:
/(?:https?:\/\/)?(?:www.|web.|m.)?(facebook|fb).(com|watch)\/(?:video.php\?v=\d+|(\S+)|photo.php\?v=\d+|\?v=\d+)|\S+\/videos\/((\S+)\/(\d+)|(\d+))\/?/

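This should help:
regexr.com/4tdur
You can use it like this:
const myURL = "https://www.facebook.com/video.php?v=100000000000000";
// The ? after .php is escaped so that it matches a literal question mark.
// The g flag is dropped because test() on a global regex keeps state between calls.
const res = /^https?:\/\/www\.facebook\.com.*\/(video(s)?|watch|story)(\.php\?|\/).+$/.test(myURL);
console.log(res);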

Facebook video URLs nowadays come in the following formats:
https://www.facebook.com/NowThisPolitics/videos/968643940204333/
https://www.facebook.com/chandni.nathani2/videos/10158204539960536/UzpfSTEwMDAwMTc3MzU1MjI2NzoyNzMxNDUyMTYzNTkwNTQy/
Also, since facebook can be replaced by fb, I created this regex (with the dot before com escaped so that it matches only a literal dot):
/(?:https?:\/{2})?(?:w{3}\.)?(facebook|fb)\.com\/.*\/videos\/.*/

Related

Regex for dates format

I am working on a web application based on ASP.NET MVC 5, and I have a problem with a field that lets the user choose the format used to display dates in the application.
The goal is to create a RegularExpressionAttribute whose regex validates the date formats entered by the user.
Acceptable formats must be:
m/d/y,
m-d-y,
m:d:y,
d/m/y,
d-m-y,
d:m:y,
y/m/d,
y-m-d,
y:m:d
Each symbol may be repeated one to four times (from 'y' up to 'yyyy'), and the symbols may be upper case.
So, after hard-coding every combination, I came up with this acceptable one:
((([mM]{1,4})([\/]{1})([dD]{1,4})([\/]{1})([yY]{1,4}))|(([mM]{1,4})([\-]{1})([dD]{1,4})([\-]{1})([yY]{1,4}))|(([mM]{1,4})([\:]{1})([dD]{1,4})([\:]{1})([yY]{1,4})))|((([dD]{1,4})([\/]{1})([mM]{1,4})([\/]{1})([yY]{1,4}))|(([dD]{1,4})([\-]{1})([mM]{1,4})([\-]{1})([yY]{1,4}))|(([dD]{1,4})([\:]{1})([mM]{1,4})([\:]{1})([yY]{1,4})))|((([yY]{1,4})([\/]{1})([mM]{1,4})([\/]{1})([dD]{1,4}))|(([yY]{1,4})([\-]{1})([mM]{1,4})([\-]{1})([dD]{1,4}))|(([yY]{1,4})([\:]{1})([mM]{1,4})([\:]{1})([dD]{1,4})))|((([yY]{1,4})([\/]{1})([dD]{1,4})([\/]{1})([mM]{1,4}))|(([yY]{1,4})([\-]{1})([dD]{1,4})([\-]{1})([mM]{1,4}))|(([yY]{1,4})([\:]{1})([dD]{1,4})([\:]{1})([mM]{1,4})))
This one works... But given my scarce regex knowledge and experience, I hope to get some help and a better approach to solving this puzzle.
Thanks.
You have to generalize a bit.
m{1,4}([:/-])d{1,4}\1y{1,4}|d{1,4}([:/-])m{1,4}\2y{1,4}|y{1,4}([:/-])m{1,4}\3d{1,4}
Explanation:
instead of e.g. [mM] use m and set option for case insensitive match
([:/-]) all allowed delimiters as group
\1...\3 back reference to the delimiter group 1...3
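The back-reference idea can be tried out with a short sketch. It is written in JavaScript purely for illustration (the question itself targets .NET). The ^(?:...)$ anchors and the helper name isValidDateFormat are assumptions added here so that partial matches are rejected.

```javascript
// Same alternatives as above, anchored, with the i flag for case-insensitive matching.
// Each alternative captures its own delimiter and reuses it through \1, \2 or \3,
// so mixed delimiters such as "MM/dd-yyyy" are rejected.
const DATE_FORMAT = /^(?:m{1,4}([:\/-])d{1,4}\1y{1,4}|d{1,4}([:\/-])m{1,4}\2y{1,4}|y{1,4}([:\/-])m{1,4}\3d{1,4})$/i;

// Hypothetical helper name, used only for this illustration.
function isValidDateFormat(format) {
  return DATE_FORMAT.test(format);
}

console.log(isValidDateFormat("MM/dd/yyyy")); // true
console.log(isValidDateFormat("yyyy-MM-dd")); // true
console.log(isValidDateFormat("d:m:y"));      // true
console.log(isValidDateFormat("MM/dd-yyyy")); // false, delimiters differ
```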

Is my regex just wrong, or is there a bug in td-agent's format handling?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead though, I have this ?bug? that pri is always shortened to DEBU. Same for ERROR which becomes ERRO, only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, still it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file because I don't personally own these log files, and I am trying to keep this at a level where my boss won't get mad at me for posting sensitive information. Should it definitely be needed, I will post it later, after asking him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date, next what we call the subject, which is always written in [ ], and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should have matched the \: explicitly, along with each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
  type tail
  path /var/log/foo/bar.log
  pos_file /var/log/td-agent/foo-bar.log.pos
  tag foo.bar
  format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
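One plausible reading of the DEBU symptom (an interpretation, not something stated in the question): [DEBUG] is a character class that matches a single letter, and the three dots after it must consume exactly three characters, while the sample line has only a colon and one space before the date. The engine then backtracks and hands the final G to the dots. The sketch below reproduces this with JavaScript's regex engine, used here as a stand-in for the Ruby engine behind td-agent on the assumption that both backtrack the same way for these patterns.

```javascript
const line = "DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.";

// Format from the question: character classes plus "..." for the separator.
const original = /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/;

// Format from the answer: literal alternatives, an explicit colon and explicit whitespace.
const corrected = /(?<pri>(INFO|ERROR|DEBUG)):\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/;

console.log(original.exec(line).groups.pri);  // DEBU
console.log(corrected.exec(line).groups.pri); // DEBUG
```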
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.

Regex - Extract number from a link

I have this link www.xxx.yy/yyy/zzzzzz/xyz-z-yzy-/93797038 and I want to take the number 93797038 in order to pass it into another link.
For example: I want afterwards something like www.m.xxx.yy/93797038 which is the same page as before but in its mobile version.
In general, I know that I can use www.xxx.yy/(.*) to capture everything following the main URL and then reference the captured group as www.m.xxx.yy/%1, which redirects to the same page in its mobile version.
Any ideas how to do it?
EDIT: The link www.xxx.yy/yyy/zzzzzz/xyz-z-yzy-/93797038 is generated automatically. Only the www.xxx.yy part is the same each time; every run of the system produces different URLs. I want to take the number from each of those URLs, e.g. the 93797038 in my case.
\/(\d+?)$ will get the trailing digits after the final /.
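A minimal sketch of using that pattern to build the mobile link, in JavaScript for illustration. The helper name toMobileUrl is hypothetical, and the greedy \d+ is used instead of \d+? because, anchored with $, both capture the same trailing digits.

```javascript
// Hypothetical helper: capture the digits after the final "/" and reuse them
// in the mobile URL. Returns null when the URL does not end in a number.
function toMobileUrl(url) {
  const match = /\/(\d+)$/.exec(url);
  return match ? "www.m.xxx.yy/" + match[1] : null;
}

console.log(toMobileUrl("www.xxx.yy/yyy/zzzzzz/xyz-z-yzy-/93797038")); // www.m.xxx.yy/93797038
```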
Why do you want a regex? You can use
// Requires: using System.Linq; (for Last())
string str = @"www.xxx.yy/yyy/zzzzzz/xyz-z-yzy-/93797038";
string digit = str.Split('/').Last();
instead.

How to create Gmail filter searching for text only at start of subject line?

We receive regular automated build messages from Jenkins build servers at work.
It'd be nice to ferret these away into a label, skipping the inbox.
Using a filter is of course the right choice.
The desired identifier is the string [RELEASE] at the beginning of a subject line.
Attempting to specify any of the following regexes causes emails with the string release in any case anywhere in the subject line to be matched:
\[RELEASE\]*
^\[RELEASE\]
^\[RELEASE\]*
^\[RELEASE\].*
From what I've read subsequently, Gmail doesn't have standard regex support, and from experimentation it seems, as with google search, special characters are simply ignored.
I'm therefore looking for a search parameter which can be used, maybe something like atstart:mystring in keeping with their has:, in: notations.
Is there a way to force the match only if it occurs at the start of the line, and only in the case where square brackets are included?
Sincere thanks.
Regex is not on the list of search features. It was, more or less, on the list of pre-canned feature requests, as "Better message search functionality (i.e. Wildcard and partial word search)", so the answer is "you cannot do this via the Gmail web UI" :-(
There are no current Labs features which offer this. SIEVE filters would be another way to do this, but that too was not supported, and there no longer seems to be any definitive statement on SIEVE support in the Gmail help.
Updated for link rot: the pre-canned list of feature requests was, er, canned. The original is on archive.org dated 2012; now you just get redirected to a dumbed-down page telling you how to give feedback. Lack of SIEVE support was covered in answer 78761, "Does Gmail support all IMAP features?"; since some time in 2015 that answer silently redirects to the answer about IMAP client configuration, and archive.org has a copy dated 2014.
With the current search facility, brackets of any form () {} [] are used for grouping; they have no observable effect if there's just one term within. Using (aaa|bbb) and [aaa|bbb] is equivalent, and both will find the words aaa or bbb. Most other punctuation characters, including \, are treated as a space or a word separator; + - : and " do have special meaning though, see the help.
As of 2016, only the form "{term1 term2}" is documented for this, and is equivalent to the search "term1 OR term2".
You can do regex searches on your mailbox (within limits) programmatically via Google docs: http://www.labnol.org/internet/advanced-gmail-search/21623/ has source showing how it can be done (copy the document, then Tools > Script Editor to get the complete source).
You could also do this via IMAP as described here:
Python IMAP search for partial subject
and script something to move messages to a different folder. The IMAP SEARCH verb only supports substrings, not regex (Gmail search is further limited to complete words, not substrings), so further processing of the matches would be needed to apply a regex.
For completeness, one last workaround: Gmail supports plus addressing. If you can change the destination address to youraddress+jenkinsrelease@gmail.com, it will still be delivered to your mailbox, where you can filter by recipient address. Make sure to filter using the full email address, to:youraddress+jenkinsrelease@gmail.com. This is of course more or less the same thing as setting up a dedicated Gmail address for this purpose :-)
Using Google Apps Script, you can use this function to filter email threads by a given regex:
function processInboxEmailSubjects() {
  // Change this to whatever regex you want; this one should cover OP's scenario.
  const regex = /^\[RELEASE\]/;
  // addLabel() expects a GmailLabel object rather than a string, so look up the existing label first.
  const label = GmailApp.getUserLabelByName("customLabel");
  const threads = GmailApp.getInboxThreads();
  for (let i = 0; i < threads.length; i++) {
    const subject = threads[i].getFirstMessageSubject();
    if (regex.test(subject)) {
      Logger.log(subject);
      // Now do what you want with the email thread. For example, skip the inbox and add an already existing label:
      threads[i].moveToArchive().addLabel(label);
    }
  }
}
As far as I know, unfortunately there isn't a way to trigger this with every new incoming email, so you have to create a time trigger like so (feel free to change it to whatever interval you think best):
function createTrigger() { // You only need to run this once; the trigger then executes the function every hour in perpetuity.
  ScriptApp.newTrigger('processInboxEmailSubjects').timeBased().everyHours(1).create();
}
The only option I have found is to pick some exact wording and put it under the "Has the words" option. It's not the best option, but it works.
I was wondering how to do this myself; it seems Gmail has since silently implemented this feature. I created the following filter:
Matches: subject:([test])
Do this: Skip Inbox
And then I sent a message with the subject
[test] foo
And the message was archived! So it seems all that is necessary is to create a filter for the subject prefix you wish to handle.

preg match email and name from to

I want to extract the name and email address from strings in the following formats (if you know of any other format that mail applications use when sending email, please mention it in a comment :)).
How can I get the name and email from these strings? Each input is a single string and can be in any of the following formats:
- jon435@hotmail.com
- james jon435@hotmail.com
- "James Jordan" <jon435@hotmail.com> (gmail format)
- janne - jon44@hotmail.com (possible format)
The answer is straightforward, at least for the email portion. The rest can be special-cased away.
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Proof I'm not insane.
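As a rough illustration of the "special-case the rest" idea, here is a small sketch in JavaScript (the question is about PHP's preg functions, so treat this as pseudocode for the approach). It deliberately uses a much simpler email pattern than the one above, and the helper name parseAddress is hypothetical. Whatever precedes the address, minus quotes, angle brackets, dashes and spaces, is taken as the name.

```javascript
// Simplified email pattern (an assumption, far looser than the full one above):
// a run of non-space, non-bracket, non-quote characters around an "@".
const SIMPLE_EMAIL = /[^\s<>"]+@[^\s<>"]+/;

// Hypothetical helper: returns { name, email }, or null if no address is found.
function parseAddress(input) {
  const match = SIMPLE_EMAIL.exec(input);
  if (!match) return null;
  // Everything before the address is treated as the display name, after
  // stripping trailing separators (< " - space) and leading quotes/spaces.
  const name = input
    .slice(0, match.index)
    .replace(/[\s<"-]+$/, "")
    .replace(/^[\s"]+/, "");
  return { name: name || null, email: match[0] };
}

console.log(parseAddress('"James Jordan" <jon435@hotmail.com>'));
// { name: 'James Jordan', email: 'jon435@hotmail.com' }
console.log(parseAddress("jon435@hotmail.com"));
// { name: null, email: 'jon435@hotmail.com' }
```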
If you only have those strings, it is going to require more work than a simple regular expression. For instance, your first example doesn't include the full name, it is only the e-mail, thus, you would have to use the Microsoft Live ID API to retrieve that information...and that turns out to be really hard.
What exactly are you trying to do? Perhaps there is another way?