Compare words in a PrestaShop Smarty tpl file (Cyrillic symbols)

I almost found a solution here.
But as far as I can tell, {if $haystack1|strstr:"_thestring_"}Found!{/if} does not work with non-Latin symbols.
The problem: I need to check whether the string 'терминалы' exists in the $payment_method.desc variable.
Here is the Smarty code
(the variable **$payment_method.desc** contains the text 'Оплата наличными через кассы и терминалы'):
{assign "desc" $payment_method.desc}
{assign "var_1" "терминалы"}
{if $desc|@mb_stristr:$var_1|@var_dump}Found!{/if}
{if $desc|@mb_strstr:$var_1|@var_dump}Found!{/if}
{if $desc|@strstr:$var_1|@var_dump}Found!{/if}
The same code works with Latin symbols.

Smarty variable declarations use PHP's internal encoding.
You should check the last parameter of the mb_* functions, which sets the encoding. See: mb_strstr
This post could help you too: php case-insensitive comparison of russian characters
If you are sure the string contains Russian characters, consider converting it from the "Windows-1251" encoding.
Any PHP function could be called from Smarty, so you could test with all of them.
Good luck.
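To illustrate why the encoding matters, here is a minimal sketch in Python rather than Smarty/PHP. The names desc and needle are placeholders, and the Windows-1251 mismatch is an assumed scenario, not something confirmed by the question. A substring test only succeeds when both strings use the same representation:

```python
# Text from the question, held as Unicode strings.
desc = "Оплата наличными через кассы и терминалы"
needle = "терминалы"

# Same representation on both sides: the search succeeds.
found_unicode = needle in desc

# Assumed mismatch: needle stored as Windows-1251 bytes, haystack as UTF-8 bytes.
needle_cp1251 = needle.encode("cp1251")
found_mismatched = needle_cp1251 in desc.encode("utf-8")

# Converting the needle to the haystack's encoding restores the match.
needle_utf8 = needle_cp1251.decode("cp1251").encode("utf-8")
found_converted = needle_utf8 in desc.encode("utf-8")

# Case-insensitive comparison, roughly what mb_stristr is meant to do.
found_caseless = "ТЕРМИНАЛЫ".casefold() in desc.casefold()

print(found_unicode, found_mismatched, found_converted, found_caseless)
```

If the mismatched case is what happens in the template, converting one side before the comparison (or passing the encoding explicitly to the mb_* function) would be the thing to try.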


Selecting URLs using RegExp but ignoring them when surrounded by double quotes

I've searched around quite a bit now, but I can't get any suggestions to work in my situation. I've seen success with negative lookahead or lookaround, but I really don't understand it.
I wish to use RegExp to find URLs in blocks of text but ignore them when quoted. While not perfect yet I have the following to find URLs:
(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?
I want it to match the following:
www.test.com:50/stuff
http://player.vimeo.com/video/63317960
odd.name.amazone.com/pizza
But not match:
"www.test.com:50/stuff
http://plAyerz.vimeo.com/video/63317960"
"odd.name.amazone.com/pizza"
Edit:
To clarify, I could be passing a full paragraph of text through the expression. Sample paragraph of what I'd like below:
I would like the following link to be found www.example.com. However this link should be ignored "www.example.com". It would be nice, but not required, to have "www.example.com and www.example.com" ignored as well.
Below is a sample of a different expression that I already have working. The language is PHP:
$articleEntry = "Hey guys! Check out this cool video on Vimeo: player.vimeo.com/video/63317960";
$pattern = array('/\n+/', '/(https?\:\/\/)?(player\.vimeo\.com\/video\/[0-9]+)/');
$replace = array('<br/><br/>',
'<iframe src="http://$2?color=40cc20" width="500" height="281" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>');
$articleEntry = preg_replace($pattern,$replace,$articleEntry);
The result of the above replaces any newlines "\n" with a double break "<br/><br/>" and embeds the Vimeo video by replacing the Vimeo address with an iframe and link.
I've found a solution!
(?=(([^"]+"){2})*[^"]*$)((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)
The first part, from (?= to *$), is what makes it work for me. I found this as an answer to "java Regex - split but ignore text inside quotes?" by https://stackoverflow.com/users/548225/anubhava
While I had read that question before, I had overlooked his answer because it wasn't the one that "solved" the question. I just changed the single quote to a double quote and it worked for me.
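A minimal sketch of how the lookahead behaves, written in Python for illustration (the question itself uses PHP; the function name find_unquoted_urls is hypothetical). The lookahead succeeds only at positions where an even number of double quotes remains before the end of the text, so positions inside a quoted span are rejected:

```python
import re

# Pattern from the answer above. The leading lookahead requires an even
# number of double quotes between the current position and the end of text.
URL_OUTSIDE_QUOTES = re.compile(
    r'(?=(([^"]+"){2})*[^"]*$)'
    r'((https?:\/\/)?(\w+\.)+\w{2,}(:[0-9]+)?((\/\w+)+(\.\w+)?)?\/?)'
)


def find_unquoted_urls(text):
    """Return the URLs that are not inside a pair of double quotes."""
    # Group 3 is the URL itself; groups 1 and 2 belong to the lookahead.
    return [m.group(3) for m in URL_OUTSIDE_QUOTES.finditer(text)]


sample = ('I would like the following link to be found www.example.com. '
          'However this link should be ignored "www.example.com".')
print(find_unquoted_urls(sample))
```

Note that the approach assumes quotes are balanced; a stray unmatched quote changes the parity for everything before it.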
Add ^ and $ to your regex:
^(https?\://)?(\w+\.)+\w{2,}(:[0-9])?\/?((/?\w+)+)?(\.\w+)?$
Please notice that you might need to escape the slashes after http (meaning https?\:\/\/).
Update:
If you want it to be case sensitive, you shouldn't use \w but [a-z]. \w matches all letters, digits, and the underscore, so you should be careful when using it.

freepascal regexp replace

Is there an easy way to do a RegExp replace in FreePascal/Lazarus?
Hunting around I can see that I can do a match fairly easily, but I'm struggling to find functions to do a search and replace.
What I'm trying to achieve is as follows.
I have an XML file loaded into a SynEdit component.
The XML file has a declaration at the start.
The DTD is held in a separate file.
I don't want to combine the two in one file, but I do want to validate the XML as it is being edited.
I'm reading the XML into a string variable and I want to insert the DTD between the declaration and the XML content in a temporary string variable (to create a compliant XML with a self-contained DTD) that can be parsed and validated.
So essentially I have:
<?Line1?>
Line2
Line3
And I want to do a RegExp-type search and replace for '<?Line1?>', replacing it with '<?Line1?>\n<![DTD\nINFO WOULD\nGO HERE\n!]' to give me:
<?Line1?>
<![DTD
INFO WOULD
GO HERE
!]
Line2
Line3
For example in PHP I would use:
preg_replace('/(<\?.*\?>)/im','$1
<![DTD
INFO WOULD
GO HERE
!]',$sourcestring);
But there doesn't seem to be an equivalent set of regexp functions for FreePascal / Lazarus - just a simple/basic RegExp match function.
Or is there an easier way without using regular expressions - I don't want to assume that the declaration is always there in the correct position on Line 1 though - just to complicate things.
Thanks,
FM
As far as I know, the PerlRegEx unit isn't compatible with Free Pascal. But you can use the RegExpr unit, which comes with Free Pascal.
If I understand correctly, you want a replacement with substitution. Here is a simple example that you can adapt to your needs.
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  regexpr;
var
  s: string;
begin
  s := 'My name is Bond.';
  s := ReplaceRegExpr(
    'My name is (\w+?)\.',
    s,
    'His name is $1.',
    TRUE // Use substitution
  );
  WriteLn(s); // His name is Bond.
  ReadLn;
end.
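For comparison, the same capture-and-substitute idea applied to the question's own example, sketched in Python rather than Free Pascal. The function name insert_after_declaration is hypothetical, and the placeholder DTD text is taken from the question. It keeps the matched declaration and appends the DTD after it, wherever the declaration happens to sit:

```python
import re


def insert_after_declaration(source, dtd):
    """Insert dtd on a new line after the first <?...?> declaration, if any."""
    # A function replacement keeps backslashes in dtd from being
    # interpreted as part of a replacement template.
    return re.sub(
        r'(<\?.*?\?>)',
        lambda m: m.group(1) + '\n' + dtd,
        source,
        count=1,
    )


source = '<?Line1?>\nLine2\nLine3'
dtd = '<![DTD\nINFO WOULD\nGO HERE\n!]'
print(insert_after_declaration(source, dtd))
```

If no declaration is found, the source string comes back unchanged, which may or may not be the behaviour wanted for validation.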

URL safe characters RegEx that will allow UTF-8 accents!

I'm looking for a RegEx pattern to use in a rereplace() function that will keep URL safe characters, but include UTF-8 characters with accents. For example: ç and ã.
Something like: url = rereplace(local.url, "pattern") etc. I prefer a ColdFusion only solution, but I'm open to using Java too since it's so easy to integrate with CF.
My URL pattern will look like: /posts/[postId]/[title-with-accents-like-ç-and-ã]
I don't know what language you are using. Perl has some utf8 matching, see for example Tatsuhiko Miyagawa's URI::Find::UTF8
This can be done by matching alpha numeric characters using \w.
rereplace(string, "[^\w]", "", "all")
See this answer for reference.
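A rough sketch of the [^\w] idea in Python, for illustration only. In Python 3, \w applied to text strings is Unicode-aware and matches ç and ã; whether ColdFusion's rereplace treats \w the same way is an assumption that would need checking. The function name keep_url_safe and the choice to turn whitespace into hyphens are hypothetical:

```python
import re
import unicodedata


def keep_url_safe(title):
    """Keep letters (including accented ones), digits, '_' and '-'."""
    # NFC composes 'c' + combining cedilla into a single character,
    # which \w then matches as a letter.
    normalized = unicodedata.normalize("NFC", title.strip())
    hyphenated = re.sub(r"\s+", "-", normalized)
    return re.sub(r"[^\w-]", "", hyphenated)


print(keep_url_safe("Coração de maçã!"))
```

The accented characters would still need percent-encoding when the URL is actually emitted; this only decides which characters survive in the title.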

Find/Replace regex to remove html tags

Using find and replace, what regex would remove the tags surrounding something like this:
<option value="863">Viticulture and Enology</option>
Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable
I am still trying to learn but I can't get it to work.
I'm not using it to parse HTML. I have data from one of our company websites that we need in Excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags, using Notepad++ to find and replace.
This works for me Notepad++ 5.8.6 (UNICODE)
search : <option value="\d+">(.*?)</option>
replace : $1
Be sure to select "Regular expression" and ". matches newline"
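The same search and replace can be sketched outside Notepad++, here in Python. The function name strip_option_tags is hypothetical, and re.DOTALL stands in for the ". matches newline" checkbox:

```python
import re

# Same pattern as the Notepad++ search; group 1 is the option's text.
OPTION_TAG = re.compile(r'<option value="\d+">(.*?)</option>', re.DOTALL)


def strip_option_tags(html):
    """Replace each <option value="N">text</option> with just its text."""
    return OPTION_TAG.sub(r'\1', html)


print(strip_option_tags('<option value="863">Viticulture and Enology</option>'))
```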
I did this using the following regular expression:
Find this : <.*?>|</.*?>
and
replace with : \r\n (this inserts a new line)
With this regular expression (<.*?>|</.*?>) we can easily isolate the values between the HTML tags, as below.
I have this input:
<option value="123">1</option><option value="1234">2</option><option value="1235">3</option><option value="1236">4</option><option value="1237">5</option>
I need to find the values between the option tags, like 1,2,3,4,5.
After the replacement, each value is left on its own line.
This works perfectly for me:
Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.
as found here:
digoCOdigo - strip html tags in notepad++
Something like this would work (as long as you know the format of the HTML won't change):
<option value="(\d+)">(.+)</option>
String s = "<option value=\"863\">Viticulture and Enology</option>";
s.replaceAll ("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology
(Tested with scala, therefore the res1:)
With sed, you would use a little different syntax:
echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'
For Notepad++, I don't know the details, but "[0-9]+" should mean 'at least one digit', and "[^<]+" anything but an opening less-than sign, one or more times. Escaping and backreference syntax may differ.
Regexes are problematic: if the markup spans multiple lines or is hidden inside a comment, a regex will not handle it correctly.
However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you use the regex in throwaway code and can check your input beforehand.

How can I change my regular expression to read UTF-8?

I got very far in a script I am working on only to find out it has a problem reading UTF-8 characters.
I have a contact in Sweden that made a VM on his machine with some UTF-8 in it and when my script hit that VM it lost its mind, but it was able to read all of the other VMs that are in the "normal" charset.
Anyhow, maybe my code will make more sense.
#!/usr/bin/perl
use strict;
use warnings;
#use utf8;
use Net::OpenSSH;
# Create a hash for storing the options needed by Net::OpenSSH
my %ssh_options = (
    port     => '22',
    user     => 'root',
    password => 'password'
);
# Create a new Net::OpenSSH object
my $ssh = Net::OpenSSH->new('192.168.2.101', %ssh_options);
# Create an array and capture the ESX\ESXi output from the current server
my @getallvms = $ssh->capture('vim-cmd vmsvc/getallvms');
shift @getallvms;
# Process data gathered from server
foreach my $vm (@getallvms) {
    # Match ID, NAME
    $vm =~ m/^(?<id> \d+)\s+(?<name> .+?)\s+/xm;
    my $id = "$+{id}";
    my $name = "$+{name}";
    print "$id\n";
    print "$name\n";
    print "\n";
}
I have narrowed it down to my regular expression as the problem, because here is the raw output from the server before the regular expression is applied.
416
TEST Box åäö!"''*#
And this is what I get after I apply my regular expression
416
TEST
For some reason the regular expression is not matching, I just don't know why. And the current regular expression in the example is the third attempt at getting it to work.
The FULL line that I am matching looks like this. The way my regular expression was done was because I only need the first two blocks of information, the expression you have wants to copy the entire line.
The code:
432 TEST Box åäö!"''*# [Store] TEST Box +w6XDpMO2IQ-_''_+Iw/TEST Box +w6XDpMO2IQ _''_+Iw.vmx slesGuest vmx-04
The subpattern
(?<name> .+?)\s+
in your regular expression means “match and remember one or more non-newline characters, but stop as soon as you find whitespace,” so $name contains TEST because the pattern stopped matching when it saw the space just before Box.
The VI Toolkit wiki gives an example of the getallvms subcommand's output:
# vmware-vim-cmd -H 10.10.10.10 -U root -P password /vmsvc/getallvms
Vmid Name File Guest OS Version Annotation
64 bartPE [store] BartPE/BartPE.vmx winXPProGuest vmx-04
96 trustix [store] Trustix/Trustix.vmx otherLinuxGuest vmx-04
The case is slightly different from the example in your question, but it appears that we can look for [store] as a bumper for the match:
/^(?<id> \d+) \s+ (?<name> .+?) \s+ \[store]/mix
The non-greedy quantifier +? means match one or more of something, but hand control to the rest of the pattern as quickly as possible. Remember that [ has a special meaning in regular expressions, but the pattern \[ matches a literal left bracket rather than introducing a character class.
I think of this technique as bookending or tacking-and-stretching. If you want to extract a chunk of text that's difficult to characterize, look for surrounding features that are easy to match—often as simple as ^ or $. Then use a stretchy pattern to grab everything in between, usually (.+) or (.+?). Read the “Quantifiers” section of the perlre documentation for an explanation of your many options.
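The difference between the two patterns can be reproduced in a few lines. This is a sketch in Python, not Perl, so named groups are written (?P<name>...) and the /x and /i flags become re.VERBOSE and re.IGNORECASE. The sample line is a shortened, hypothetical version of the one in the question:

```python
import re

# Shortened stand-in for one line of getallvms output (hypothetical).
line = '416    TEST Box åäö!"\'\'*#    [Store] TEST Box/TEST Box.vmx    slesGuest    vmx-04'

# Original shape: the lazy .+? stops at the first whitespace it can.
first_space = re.compile(r'^(?P<id>\d+) \s+ (?P<name>.+?) \s+', re.VERBOSE)

# Bookended shape: the literal [store] forces .+? to stretch up to it.
bookended = re.compile(
    r'^(?P<id>\d+) \s+ (?P<name>.+?) \s+ \[store\]',
    re.VERBOSE | re.IGNORECASE,
)

short = first_space.search(line)
full = bookended.search(line)

print(short.group('name'))
print(full.group('id'), full.group('name'))
```

The first pattern yields only TEST, while the bookended one captures the whole name up to the spaces before [Store].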
This fixes the immediate problem, and you can also add polish in a few areas.
Do not use $1, $2, and friends unconditionally! Always test that the pattern matches before using capture variables. For example
if (/(foo|bar|baz)/) {
print "got $1\n";
}
else {
print "no match\n";
}
An unprotected print $1 can produce surprising results that are tough to debug.
Judicious use of Perl's defaults can help emphasize the computation and let the mechanism fade into the background. Dropping $vm in favor of $_ as the implicit loop variable and implicit match target makes for a nicer result.
Your comments merely translate from Perl to English. The most helpful comments explain the why, not the what. Also keep in mind Rob Pike's advice on commenting:
If your code needs a comment to be understood, it would be better to rewrite it so it's easier to understand.
In the assignments from %+, the quotes don't do anything useful. The values are already strings, so remove the quotes.
my $id = $+{id};
my $name = $+{name};
Below is a modified version of your code that captures everything after the number but before [store] into $name. The utf8 pragma declares that your source code—not, as with a common mistake, your input—contains UTF-8. The test below simulates with a canned echo the output from vim-cmd on the Swedish VM.
As Tom suggested, I use the Encode module to decode the output that arrives through the SSH connection and encode it for benefit of the local host before printing it out.
The perlunifaq documentation advises decoding external data into Perl's internal format and then encoding any output just before it's written. I assume that the value returned from $ssh->capture(...) uses UTF-8 encoding, that is, that the remote host is sending UTF-8. We see the expected result because I'm running a modern distribution of Linux and ssh-ing back to it, but in the wild, you may be dealing with some other encoding.
You're able to get away with skipping the calls to decode and encode because Perl's internal format happens to match those of the hosts you're using. In general, however, cutting corners can get you into trouble:
What if I don't decode?
What if I don't encode?
Finally, the code!
#! /usr/bin/env perl
use strict;
use utf8;
use warnings;
use Encode;
use Net::OpenSSH;
my %ssh_options = ();
my $ssh = Net::OpenSSH->new('localhost', %ssh_options);
# Create an array and capture the ESX\ESXi output from the current server
#my @getallvms = $ssh->capture('vim-cmd vmsvc/getallvms');
my @getallvms = $ssh->capture(<<EOEcho);
echo -e 'JUNK\n416 TEST Box åäö!"'\\'\\''*# [Store] TEST Box +w6XDpMO2IQ-_''_+Iw/TEST Box +w6XDpMO2IQ _''_+Iw.vmx slesGuest vmx-04'
EOEcho
shift @getallvms;
for (@getallvms) {
    $_ = decode "utf8", $_, Encode::FB_CROAK;
    if (/^(?<id> \d+) \s+ (?<name> .+?) \s+ \[store]/mix) {
        my $id = $+{id};
        my $name = $+{name};
        print encode("utf8", $id), "\n",
              encode("utf8", $name), "\n",
              "\n";
    }
    else {
        print "no match\n";
    }
}
Output:
416
TEST Box åäö!"''*#
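The decode-on-input, encode-on-output discipline described above is not specific to Perl. A minimal sketch in Python, assuming (as the answer does) that the remote side sends UTF-8; the variable names are placeholders:

```python
import re

# Simulated raw bytes as they might arrive from the remote host (assumed UTF-8).
raw = 'TEST Box åäö'.encode('utf-8')

# Decode on input: from here on the program works with characters, not bytes.
name = raw.decode('utf-8')  # raises UnicodeDecodeError on invalid input

bytes_len = len(raw)   # each of å, ä, ö takes two bytes in UTF-8
chars_len = len(name)  # one unit per character after decoding

# A bytes pattern only knows ASCII word characters; a text pattern is Unicode-aware.
word_in_bytes = re.fullmatch(rb'\w+', 'åäö'.encode('utf-8'))
word_in_text = re.fullmatch(r'\w+', 'åäö')

# Encode on output, just before the data is written.
encoded_again = name.encode('utf-8')

print(bytes_len, chars_len, word_in_bytes is None, word_in_text is not None)
```

Skipping the decode step can appear to work while the data is only being copied through, and then fail once character-level operations such as \w or length checks are involved.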
If you know the string you work on is UTF-8 and Net::OpenSSH doesn't (and hence doesn't mark it as such), you can convert it to an internal representation Perl can work on with one of:
use Encode;
decode_utf8( $in_place );
$decoded = decode_utf8( $raw );
So you have to make sure that Perl understands those names as UTF-8 encoded strings. So far I don't think it does. See: A comprehensive overview about UTF-8 in Perl.
You can test whether a string is flagged as Unicode with Encode::is_utf8 and decode it with Encode::decode('UTF-8', $your_string).
UTF-8 is still pretty messy in Perl, IMHO. You need to be pretty patient with it.
To print UTF-8 strings properly, you should use something like this in your script:
BEGIN {
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)'); # Error messages
}
Once Perl understands your UTF-8 names, you can apply regexes to them properly too.
Recent Net::OpenSSH releases have native support for charset encoding/decoding in capture methods:
my @getallvms = $ssh->capture({stream_encoding => 'utf8'},
                              'vim-cmd vmsvc/getallvms');