Neo4J, cypher and wildcards not working - regex

I have read the various answers on SO and also the help pages for neo4j. However, I can't get my wildcard match to work. For example, if I put in the cypher query
MATCH (author:Author )-[:WROTE]->(article:Article)
WHERE article.id =~ 'Art10526689' RETURN author, article.date
I get the correct answer. However, if I put in the query
MATCH (author:Author )-[:WROTE]->(article:Article)
WHERE article.id =~ "Art1052668*" RETURN author, article.date
I do not get anything returned. I have used '"' because it seems that Lucene might be sensitive to the quote style, and '=~' because it was suggested that it was better than simply doing (article:Article {id:'Art1052668*'}), though that doesn't work either.
As usual, any help will be deeply appreciated!
Regards, Richard

Richard, you are close to an answer. I think what is happening is that you are confusing wildcards with the regular expression syntax supported by Neo4j. In your query, 8* actually means "match zero or more 8s", and because =~ has to match the whole string, Art1052668* cannot match Art10526689. If you just want to replace the 9 in the article id with any single character, use the . character. If you would like zero or more arbitrary characters after the 8, use Art1052668.*. You can add case insensitivity too with (?i); see the example below.
MATCH (author:Author )-[:WROTE]->(article:Article)
WHERE article.id =~ "(?i)Art1052668.*"
RETURN author, article.date
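To see the difference between the patterns outside of Neo4j, here is a small sketch in Python. It is only an approximation: Neo4j uses Java regular expressions, and Python's re.fullmatch stands in here for the whole-string matching that =~ performs. The variable names are made up for the illustration.

```python
import re

article_id = "Art10526689"

# re.fullmatch requires the whole string to match, which is the closest
# Python analogue to Cypher's =~ operator (an assumption of this sketch).
exact = re.fullmatch(r"Art10526689", article_id)

# '8*' means "zero or more 8s", so the trailing 9 is left unmatched.
star_only = re.fullmatch(r"Art1052668*", article_id)

# '.' matches any single character, '.*' any run of characters.
single_dot = re.fullmatch(r"Art1052668.", article_id)
dot_star = re.fullmatch(r"Art1052668.*", article_id)

# (?i) switches on case-insensitive matching.
case_insensitive = re.fullmatch(r"(?i)art1052668.*", article_id)

print(bool(exact), bool(star_only), bool(single_dot),
      bool(dot_star), bool(case_insensitive))
```

Only the pattern with the bare 8* fails to match, which mirrors the empty result of the second query.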

Related

How to extract FirstName and LastName from html tags with regex?

I have response body which contains
"<h3 class="panel-title">Welcome
First Last </h3>"
I want to fetch 'First Last' as the output.
The regular expressions I have tried are
"Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))"
"Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)"
But I am not able to get the result. If I remove the newline and take it as
"<h3 class="panel-title">Welcome First Last </h3>" it is detected by an online regex tester.
I suspect your problem is the carriage return between "Welcome" and the user name. The "single-line mode" flag (?s) makes . match newlines as well (\s already matches them on its own). Try these:
(?s)Welcome(\s*([A-Za-z]+))(\s*([A-Za-z]+))
(?s)Welcome \s*([A-Za-z]+)\s*([A-Za-z]+)
(This works in jMeter and any other Java- or PHP-based regex engine, but not in JavaScript. In the comments on the question you say you're using JavaScript and also jMeter. If it is a jMeter question, then this will help; if JavaScript, try one of the other answers.)
Well, usually I don't recommend regex for this kind of work. DOM manipulation is a better fit.
But you can use the following regex to pull out the text:
/(?:<h3.*?>)([^<]+)(?:<\/h3>)/i
See demo at https://regex101.com/r/wA2sZ9/1
This will extract First and Last names including extra spacing. I'm sure you can easily deal with spaces.
In jmeter reg exp extractor you can use:
<h3 class="panel-title">Welcome(.*?)</h3>
Then take value using $1$.
In the data you have shown, "Welcome" is followed by a line break. If that is actually part of the response, then you have to use \n.
<h3 class="panel-title">Welcome\n(.*?)</h3>
Otherwise the one above is enough.
First verify this in jMeter using the regular expression tester on the response body.
Welcome([\s\S]+?)<
Try this, it will definitely work.
Regular expressions are greedy by default. Try this:
Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)
Groups 1 and 2 contain your data
Check it here
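The answers above can be compared with a short Python sketch. The body string is the sample from the question with an explicit \n after "Welcome" (an assumption about what the real response contains), and Python's re module is used as a stand-in for the jMeter engine, so treat this as an illustration rather than a jMeter recipe.

```python
import re

# Sample response body from the question, assuming a plain "\n" after Welcome.
body = '<h3 class="panel-title">Welcome\nFirst Last </h3>'

# \s matches newlines, so this pattern works without any flag.
names = re.search(r"Welcome\s*([A-Za-z]+)\s*([A-Za-z]+)", body)
first, last = names.group(1), names.group(2)

# Without (?s), '.' does not match a newline, so the lazy group cannot
# get past the line break and the search fails.
without_flag = re.search(r"Welcome(.*?)</h3>", body)

# With (?s), '.' also matches the newline; strip() removes the extra whitespace.
with_flag = re.search(r"(?s)Welcome(.*?)</h3>", body)
full_name = with_flag.group(1).strip()

print(first, last, without_flag, repr(full_name))
```

The [\s\S]+? variant shown above achieves the same effect as (?s) in engines that lack the flag.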

Selecting URLs using RegExp but ignoring them when surrounded by double quotes

I've searched around quite a bit now, but I can't get any suggestions to work in my situation. I've seen success with negative lookahead or lookaround, but I really don't understand it.
I wish to use RegExp to find URLs in blocks of text but ignore them when quoted. While not perfect yet I have the following to find URLs:
(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?
I want it to match the following:
www.test.com:50/stuff
http://player.vimeo.com/video/63317960
odd.name.amazone.com/pizza
But not match:
"www.test.com:50/stuff
http://plAyerz.vimeo.com/video/63317960"
"odd.name.amazone.com/pizza"
Edit:
To clarify, I could be passing a full paragraph of text through the expression. Sample paragraph of what I'd like below:
I would like the following link to be found www.example.com. However this link should be ignored "www.example.com". It would be nice, but not required, to have "www.example.com and www.example.com" ignored as well.
A sample of a different one that I have working is below. The language is PHP:
$articleEntry = "Hey guys! Check out this cool video on Vimeo: player.vimeo.com/video/63317960";
$pattern = array('/\n+/', '/(https?\:\/\/)?(player\.vimeo\.com\/video\/[0-9]+)/');
$replace = array('<br/><br/>',
'<iframe src="http://$2?color=40cc20" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>');
$articleEntry = preg_replace($pattern,$replace,$articleEntry);
The result of the above will replace any new lines "\n" with a double break "<br/><br/>" and will embed the Vimeo video by replacing the Vimeo address with an iframe and link.
I've found a solution!
(?=(([^"]+"){2})*[^"]*$)((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)
The first part, from (?= to *$), is what makes it work for me. I found this as an answer in "java Regex - split but ignore text inside quotes?" by https://stackoverflow.com/users/548225/anubhava
While I had read that question before, I had overlooked his answer because it wasn't the one that "solved" the question. I just changed the single quote to double quote and it works out for me.
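Here is a small Python sketch of how that lookahead behaves. The pattern is the one quoted above; the helper name find_unquoted_urls and the sample sentences are made up for the illustration. The lookahead succeeds only when an even number of double quotes remains between the current position and the end of the text, which is a reasonable proxy for "not inside quotes" as long as the quotes are balanced.

```python
import re

# Pattern from the answer above: the leading lookahead requires an even
# number of double quotes between the match start and the end of the text.
UNQUOTED_URL = re.compile(
    r'(?=(([^"]+"){2})*[^"]*$)'
    r'((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)'
)

def find_unquoted_urls(text):
    """Return URL-like matches that are not inside double quotes (hypothetical helper)."""
    # group(0) is the URL itself, because the lookahead consumes nothing.
    return [m.group(0) for m in UNQUOTED_URL.finditer(text)]

print(find_unquoted_urls('Find www.example.com but skip "www.quoted.com" here'))
print(find_unquoted_urls('see www.test.com:50/stuff now'))
```

Note that an unbalanced quote anywhere later in the text flips the parity, so this approach assumes well-formed quoting.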
Add ^ and $ to your regex:
^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$
Please note that you might need to escape the slashes after http (meaning https?\:\/\/).
Update:
If you want it to be case sensitive, you shouldn't use \w but [a-z]. \w matches all letters, digits and the underscore, so you should be careful when using it.

How to Regex Multiple URLs From Same Variable In Perl

I'm trying to search a field in a database to extract URLs. Sometimes there will be more than 1 URL in a field and I would like to extract those in to separate variables (or an array).
I know my regex isn't going to cover all possibilities. As long as I flag on anything that starts with http and ends with a space I'm ok.
The problem I'm having is that my efforts either seem to get only 1 URL per record or they get only the last letter from each URL. I've tried a couple of different techniques based on solutions others have posted, but I haven't found a solution that works for me.
Sample input line:
Testing http://marko.co http://tester.net Just about anything else you'd like.
Output goal
$var[0] = http://marko.co
$var[1] = http://tester.net
First try:
if ( $status =~ m/http:(\S)+/g ) {
print "$&\n";
}
Output:
http://marko.co
Second try:
@statusurls = ($status =~ m/http:(\S)+/g);
print "@statusurls\n";
Output:
o t
I'm new to regex, but since I'm using the same regex for each attempt, I don't understand why it's returning such different results.
Thanks for any help you can offer.
I've looked at these posts and either didn't find what I was looking for or didn't understand how to implement it:
This one seemed the most promising (and it's where I got the 2nd attempt from, but it didn't return the whole URL, just the last letter): How can I store regex captures in an array in Perl?
This has some great stuff in it. I'm curious if I need to look at the URL as a word since it's bookended by spaces: Regex Group in Perl: how to capture elements into array from regex group that matches unknown number of/multiple/variable occurrences from a string?
This one offers similar suggestions as the first two. How can I store captures from a Perl regular expression into separate variables?
Solution:
@statusurls = ($status =~ m/(http:\S+)/g);
print "@statusurls\n";
Thanks!
I think that you need to capture more than just one character. Try this regex instead:
m/http:(\S+)/g
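The difference between the three patterns comes down to where the parentheses sit. The following Python sketch reproduces it with re.findall, which, like a Perl match in list context, returns the captured group when the pattern has one. Python is used here only for illustration; the variable names are made up.

```python
import re

status = "Testing http://marko.co http://tester.net Just about anything else you'd like."

# (\S)+ repeats a one-character group, so the group only keeps the last
# character of each URL.
last_chars = re.findall(r"http:(\S)+", status)

# (\S+) captures everything after "http:", but not the "http:" prefix itself.
without_scheme = re.findall(r"http:(\S+)", status)

# Wrapping the whole pattern captures the complete URL.
full_urls = re.findall(r"(http:\S+)", status)

print(last_chars)
print(without_scheme)
print(full_urls)
```

This is why the second attempt printed "o t", and why the accepted solution moves the parentheses around the whole pattern.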

Regex to match URL not followed by " or <

I'm trying to modify the url-matching regex at http://daringfireball.net/2010/07/improved_regex_for_matching_urls to not match anything that's already part of a valid URL tag or used as the link text.
For example, in the following string, I want to match http://www.foo.com, but NOT http://www.bar.com or http://www.baz.com
www.foo.com <a href="http://www.bar.com">http://www.baz.com</a>
I was trying to add a negative lookahead to exclude matches followed by " or <, but for some reason, it's only applying to the "m" in .com. So, this regex still returns http://www.bar.co and http://www.baz.co as matches.
I can't see what I'm doing wrong... any ideas?
\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))(?!["<])
Here is a simpler example too:
((((ht|f)tps?:\/\/)|(www.))[a-zA-Z0-9_\-.:#/~}?]+)(?!["<])
I looked into this issue last year and developed a solution that you may want to look at - See: URL Linkification (HTTP/FTP) This link is a test page for the Javascript solution with many examples of difficult-to-linkify URLs.
My regex solution, written for both PHP and Javascript - is not simple (but neither is the problem as it turns out.) For more information I would recommend also reading:
The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber
The comments following Jeff's blog post are a must read if you want to do this right...
Note also that John Gruber's regex has a component that can go into realm of catastrophic backtracking (the part which matches one level of matching parentheses).
Yeah, it's actually trivial to make it work if you just want to exclude trailing characters: make your expression 'independent' (an atomic group), and then no backtracking will occur in that segment.
(?>\b ...)(?!["<])
A perl test:
use strict;
use warnings;
my $str = 'www.foo.com <a href="http://www.bar.com">http://www.baz.com</a>http://www.some.com';
while ($str =~ m~
(?>
\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
)
(?!["<])
~xg)
{
print "$1\n";
}
Output:
www.foo.com
http://www.some.com
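The same effect can be sketched in Python with a deliberately simplified URL pattern (not Gruber's regex). Two assumptions are made here: the sample string contains an anchor tag around the second and third URLs, and the atomic group is emulated with a lookahead plus backreference, (?=(...))\1, because (?>...) is only available in newer Python versions. A lookahead is not re-entered once it has succeeded, so the captured URL cannot be shortened by backtracking.

```python
import re

# Sample text, assuming the second and third URLs sit inside an anchor tag.
text = ('www.foo.com <a href="http://www.bar.com">http://www.baz.com</a>'
        'http://www.some.com')

# Simplified URL pattern for illustration only.
URL = r'(?:https?://|www\.)[\w./-]+'

# Naive version: when the lookahead fails, the engine backtracks one
# character and matches a truncated URL instead.
naive = re.findall(r'(' + URL + r')(?!["<])', text)

# Emulated atomic group: the URL is captured inside a lookahead and then
# consumed with \1, so it cannot give back its last character.
atomic = re.findall(r'(?=(' + URL + r'))\1(?!["<])', text)

print(naive)
print(atomic)
```

The naive list contains the truncated http://www.bar.co and http://www.baz.co, while the emulated atomic version returns only the two URLs that are not followed by " or <.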

How do I escape an apostrophe in my XPath text query with Perl and Selenium?

I have an XPath query which needs to match some text in a span attribute, as follows:
my $perl_query = qq(span[text\(\)='It's a problem']);
$sel->click_ok($perl_query);
Where the text has no apostrophe there is no problem.
I've tried the following instead of 'It's a problem':
'It\'s a problem'
'It&apos\;s a problem'
'It\${apos}s a problem' #some thread on Stackoverflow suggested that this was a solution implemented by Selenium, but it doesn't work.
Any ideas?
On a different note, if I can't solve this, I'd be happy enough matching 'a problem' but not sure how to do regex matching in XPath with Selenium.
Thanks for any pointers
It's an XPath problem rather than a Perl problem.
The problem was discussed and answered here in great detail:
http://kushalm.com/the-perils-of-xpath-expressions-specifically-escaping-quotes (broken link)
In a nutshell, modify your XPath query to assemble the quote-containing string using concat():
my $perl_query = qq(span[text\(\)=concat("It","'","s a problem")]);
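The concat() approach can be wrapped in a small helper so that the quoting is chosen automatically. Below is a minimal sketch in Python; the function name xpath_literal is hypothetical and not part of Selenium or any XPath library. It only builds the string and relies on the XPath 1.0 rule that a literal may be delimited by either single or double quotes.

```python
def xpath_literal(text: str) -> str:
    """Return text as an XPath 1.0 string expression (hypothetical helper)."""
    if "'" not in text:
        return f"'{text}'"          # no apostrophe: single quotes are safe
    if '"' not in text:
        return f'"{text}"'          # no double quote: double quotes are safe
    # Both quote types present: split on apostrophes and rebuild with concat().
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')   # the apostrophe, wrapped in double quotes
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"

query = f"//span[text()={xpath_literal(chr(73) + 't' + chr(39) + 's a problem')}]"
print(query)
print(xpath_literal('He said "it\'s"'))
```

For text that contains only an apostrophe, the helper simply switches to double quotes, which is the same fix suggested in some of the answers below.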
A couple of suggestions; hopefully at least one of them will work:
my $perl_query = qq!span[text()='It\\'s a problem']!;
my $perl_query = qq!span[text()="It's a problem"]!;
I just had the same problem and Google didn't give me a satisfying solution.
I tried to take the substring after this: value=' (ending with an apostrophe).
The XPath that works for me looks like:
"substring-after(., concat('value=', ''''))"
So four Apostrophes in a row.
Well, the post is quite old, but here is my working answer for those who still come looking for a way to escape a single apostrophe and cannot find a proper answer.
Text = It's a problem
Solution xpath = //div[text()=\"It's a problem\"]
or
Solution xpath = //div[contains(text(),\"It's a\")]
Is it possible that the actual text on the web page is a curly quote and not a straight apostrophe? Also, you may have extra space at the beginning and end of the span, so that the strict equality against your string won't match.
Consider breaking up your string if possible:
my $spanValue = q/text()='It's a problem'/;
my $perlQuery = qq/span[$spanValue]/;
# $perlQuery = span[text()='It's a problem']
In XPath 2.0, the solution to escaping apostrophes in string literals is to double the apostrophe (XPath 1.0 has no such escape, so this may not work with every Selenium backend), e.g.
qq(span[text()='It''s a problem'])