How to convert HTML special characters into their codes using PHP - html-entities

Actually I want to convert all the special characters into their codes as I've shown below:
! &#33;
" &#34;
# &#35;
$ &#36;
% &#37;
& &#38;
' &#39;
( &#40;
) &#41;
* &#42;
+ &#43;
, &#44;
- &#45;
. &#46;
/ &#47;
Is there any way to convert them into their respective codes in PHP?

There are functions called html_entity_decode() and htmlentities() in PHP: html_entity_decode() turns entity codes back into characters, while htmlentities() converts applicable characters into their HTML entities.

$str="&#33";
echo html_entity_decode($str); //output !

Related

Rearranging elements in Python

I am new to Python and I can't get this. I have a list, and I want to take the input from there and write the entries to files.
p = ['Eth1/1', 'Eth1/5','Eth2/1', 'Eth2/4','Eth101/1/1', 'Eth101/1/2', 'Eth101/1/3','Eth102/1/1', 'Eth102/1/2', 'Eth102/1/3','Eth103/1/1', 'Eth103/1/2', 'Eth103/1/3','Eth103/1/4','Eth104/1/1', 'Eth104/1/2', 'Eth104/1/3','Eth104/1/4']
What I am trying:
with open("abc1.txt", "w+") as fw1, open("abc2.txt", "w+") as fw2:
for i in p:
if len(i.partition("/")[0]) == 4:
fw1.write('int ' + i + '\n mode\n')
else:
i = 0
while i < len(p):
start = p[i].split('/')
if (start[0] == 'Eth101'):
i += 3
key = start[0]
i += 1
while i < len(p) and p[i].split('/')[0] == key:
i += 1
end = p[i-1].split('/')
fw2.write('confi ' + start[0] + '/' + start[1] + '-' + end[1] + '\n mode\n')
What I am looking for:
abc1.txt should have
int Eth1/1
mode
int Eth1/5
mode
int Eth2/1
mode
int Eth2/4
mode
abc2.txt should have:
int Eth101/1/1-3
mode
int Eth102/1/1-3
mode
int Eth103/1/1-4
mode
int Eth104/1/1-4
mode
So any Eth with 1 digit before the "/" (e.g. Eth1/1 or Eth2/2) should go in one file, abc1.txt. Any Eth with 3 digits before the "/" (e.g. Eth101/1/1 or Eth102/1/1) should go in the other file, abc2.txt, and since these are in ranges, they need to be written like Eth101/1/1-3, Eth102/1/1-3, etc.
Any idea?
I don't think you need a regex here at all. All your items begin with 'Eth' followed by one or more digits, so you can check the length of the part before the first / and write the item to the appropriate file.
p = ['Eth1/1', 'Eth1/5','Eth2/1', 'Eth2/4','Eth101/1/1', 'Eth101/1/2', 'Eth101/1/3','Eth102/1/1', 'Eth102/1/2', 'Eth102/1/3','Eth103/1/1', 'Eth103/1/2', 'Eth103/1/3','Eth103/1/4','Eth104/1/1', 'Eth104/1/2', 'Eth104/1/3','Eth104/1/4']
with open("abc1.txt", "w+") as fw1, open("abc2.txt", "w+") as fw2:
for i in p:
if len(i.partition("/")[0]) == 4:
fw1.write('int ' + i + '\n mode\n')
else:
fw2.write('int ' + i + '\n mode\n')
I refactored your code a little to bring the with statement into play, which handles closing the files correctly at the end. Also, it is not necessary to iterate twice over the sequence, so it's all done in one pass.
If the data is not as clean as provided, then you may want to use regexes. Independent of the regex itself, by writing if re.match(r'((Eth\d{1}\/\d{1,2})', "p" ) you test whether a match object can be created for the given regex on the literal string "p", not on the value of the variable p, because you put quotes around p.
So the code above should work for your example. If you really need a regex, the problem then turns into finding a good regex that matches your needs without any other issues.
As these are in ranges, they need to be written like Eth101/1/1-3, Eth102/1/1-3, etc.
This is something you can achieve by first computing the string and then writing it to the file. But that is more like a separate question.
UPDATE
It's not that trivial to compute the right network ranges. Here is one approach which doesn't change my code above but adds some functionality. The trick is to find groups of consecutive port numbers that aren't interrupted. For that I've copied consecutive_groups; you can of course also pip install more-itertools to get that functionality. I also transform the list into a dict to prepare the grouping and then turn it back into a list. There are definitely better ways of doing this, but it worked for your input data, at least.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from itertools import groupby
from operator import itemgetter

p = ['Eth1/1', 'Eth1/5', 'Eth2/1', 'Eth2/4', 'Eth101/1/1', 'Eth101/1/2',
     'Eth101/1/3', 'Eth102/1/1', 'Eth102/1/2', 'Eth102/1/3', 'Eth103/1/1',
     'Eth103/1/2', 'Eth103/1/3', 'Eth103/1/4', 'Eth104/1/1', 'Eth104/1/2',
     'Eth104/1/3', 'Eth104/1/4']


def get_network_ranges(networks):
    network_ranges = {}
    result = []
    for network in networks:
        parts = network.rpartition("/")
        network_ranges.setdefault(parts[0], []).append(int(parts[2]))
    for network, ranges in network_ranges.items():
        ranges.sort()
        for group in consecutive_groups(ranges):
            group = list(group)
            if len(group) == 1:
                result.append(network + "/" + str(group[0]))
            else:
                result.append(network + "/" + str(group[0]) + "-" +
                              str(group[-1]))
    result.sort()  # to get ordered results
    return result


def consecutive_groups(iterable, ordering=lambda x: x):
    """taken from more-itertools (latest)"""
    for k, g in groupby(
        enumerate(iterable), key=lambda x: x[0] - ordering(x[1])
    ):
        yield map(itemgetter(1), g)


# only one line added to do the magic
with open("abc1.txt", "w+") as fw1, open("abc2.txt", "w+") as fw2:
    p = get_network_ranges(p)
    for i in p:
        if len(i.partition("/")[0]) == 4:
            fw1.write('int ' + i + '\n mode\n')
        else:
            fw2.write('int ' + i + '\n mode\n')
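For reference, this is roughly what get_network_ranges returns for the original list (result.sort() orders the strings lexicographically, which is why the Eth2 entries end up after Eth104):

>>> get_network_ranges(p)   # p being the original, untransformed list
['Eth1/1', 'Eth1/5', 'Eth101/1/1-3', 'Eth102/1/1-3', 'Eth103/1/1-4',
 'Eth104/1/1-4', 'Eth2/1', 'Eth2/4']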

Matching regular expressions

I have a regular expression; it's basically to update log4j syntax to log4j2 syntax, removing the string replacement. The regular expression is as follows:
(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))(?:\s\+\s*|\s*\);)
This will successfully match the variables in the following strings
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
But not '+ thingCollection.get(0).getMyId()' in
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);
I am getting better with regular expressions, but this one has me a bit stumped. Thanks!
For some reason, when some people are writing a regex pattern, they forget that the whole of the Perl language is still available.
I would just delete all the strings and find the remaining substrings that look like variable names.
use strict;
use warnings 'all';

use feature qw/ say fc /;
use List::Util 'uniq';

my @variables;

while ( <DATA> ) {
    s/"[^"]*"//g;
    push @variables, /\b[a-z]\w*/ig;
}

say for sort { fc $a cmp fc $b } uniq @variables;
__DATA__
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);
output
address
countInUse
e
endTime
get
getMyId
otherThingId
secondThingId
size
startTime
thingCollection
thingId
things
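If Perl isn't at hand, the same idea (delete the quoted string literals first, then collect identifier-like tokens) can be sketched in Python; the identifier pattern and the sample lines are taken from above, everything else is just for illustration:

import re

lines = [
    '("Persisting " + things.size() + " new or updated thing(s)");',
    '("Count in use for thing=" + secondThingId + " is " + countInUse);',
]

names = set()
for line in lines:
    line = re.sub(r'"[^"]*"', '', line)                   # drop the quoted string literals
    names.update(re.findall(r'\b[a-z]\w*', line, re.I))   # keep identifier-like tokens

print(sorted(names, key=str.lower))   # ['countInUse', 'secondThingId', 'size', 'things']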
You should be able to simplify your regex to match things in between '+' signs.
(?:\+)([^"]*?)(?:[\+,])
(Note the ? after the *: it makes the * lazy, so it matches as little as possible, which lets the pattern catch all occurrences.)
If you want just the variable you could access the first capture group from that expression or ignore the capture group to get the full match.
Updated version: (?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);
Note that with the new version the variable might get placed into capture group 2 instead of 1.
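For example (sketched here in Python; any PCRE-style engine behaves the same), the variable can be pulled from whichever group matched:

import re

pattern = r'(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);'
line = '("Count in use for thing=" + secondThingId + " is " + countInUse);'
for m in re.finditer(pattern, line):
    # the name lands in group 1 or group 2, depending on which alternative matched
    print((m.group(1) or m.group(2)).strip())   # secondThingId, then countInUse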
You might be able to pare it down to this: (?:^\(\s*|\s*\+\s*|,\s*)(?:[\w().\s+]+|\([\w().\s+-]*\))(?:(?=,)|\s*\+\s*|\s*\);)
It consolidates some constructs.
To fix the immediate problem, I added a comma in some classes.
Note that this kind of regex is fraught with a problematic type of flow.
(?:
      ^ \( \s*
   |  \s* \+ \s*
   |  , \s*
)
(?:
      [\w().\s+]+
   |  \( [\w().\s+-]* \)
)
(?:
      (?= , )
   |  \s* \+ \s*
   |  \s* \);
)
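As a quick sanity check (sketched in Python here, but any PCRE-compatible engine behaves the same), the pared-down pattern also matches the line that the original regex missed:

import re

pattern = r'(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w().\s+]+|\([\w().\s+-]*\))(?:(?=,)|\s*\+\s*|\s*\);)'
line = '("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'
print(re.findall(pattern, line))
# now includes '+ thingCollection.get(0).getMyId()' among the matches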

RegEx for computer name validation (cannot be more than 15 characters long, be entirely numeric, or contain the following characters...)

I have these requirements to follow:
A Windows computer name cannot be more than 15 characters long, be
entirely numeric, or contain the following characters: ` ~ ! @ # $ % ^
& * ( ) = + _ [ ] { } \ | ; : . ' " , < > / ?.
I want to create a RegEx to validate a given computer name.
I can see that the only permitted special character is - and so far I have this:
/^[a-zA-Z0-9-]{1,15}$/
which matches almost all constraints except the "not entirely numeric" part.
How do I add the last constraint to my RegEx?
You could use a negative lookahead:
^(?![0-9]{1,15}$)[a-zA-Z0-9-]{1,15}$
Or simply use two regular expressions:
^[a-zA-Z0-9-]{1,15}$
AND NOT
^[0-9]{1,15}$
Here is a live example:
var regex1 = /^(?![0-9]{1,15}$)[a-zA-Z0-9-]{1,15}$/;
var regex2 = /^[a-zA-Z0-9-]{1,15}$/;
var regex3 = /^[0-9]{1,15}$/;
var text1 = "lklndlsdsvlk323";
var text2 = "4214124";
console.log(text1 + ":", !!text1.match(regex1));
console.log(text1 + ":", text1.match(regex2) && !text1.match(regex3));
console.log(text2 + ":", !!text2.match(regex1));
console.log(text2 + ":", text2.match(regex2) && !text2.match(regex3));
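Running that snippet should log something like the following (the all-numeric name fails both checks):
lklndlsdsvlk323: true
lklndlsdsvlk323: true
4214124: false
4214124: false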

regex remove punct removes non-punctuation characters in R

While filtering and cleaning text in Hebrew, I found that
gsub("[[:punct:]]", "", txt)
actually removes a relevant character. The character is "ק" and it is located in the "E" spot on the keyboard. Interestingly, the gsub function in R removes the "ק" character and then all words get messed up. Does anyone have an idea why?
According to Regular Expressions as used in R:
Certain named classes of characters are predefined. Their
interpretation depends on the locale (see locales); the interpretation
below is that of the POSIX locale.
According to the POSIX locale, [[:punct:]] should capture ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~. So you might need to adjust your regex to remove only the characters you want:
txt <- "!\"#$%&'()*+,\\-./:;<=>?@[\\\\^\\]_`{|}~"
gsub("[\\\\!\"#$%&'()*+,./:;<=>?@[\\^\\]_`{|}~-]", "", txt, perl = T)
Sample program output:
[1] ""

Convert punctuation to space

I have a bunch of strings with punctuation in them that I'd like to convert to spaces:
"This is a string. In addition, this is a string (with one more)."
would become:
"This is a string In addition this is a string with one more "
I can go through and do this manually with the stringr package (str_replace_all()) one punctuation symbol at a time (, / . / ! / ( / ) / etc.), but I'm curious whether there's a faster way, presumably using regexes.
Any suggestions?
x <- "This is a string. In addition, this is a string (with one more)."
gsub("[[:punct:]]", " ", x)
[1] "This is a string In addition this is a string with one more "
See ?gsub for doing quick substitutions like this, and ?regex for details on the [[:punct:]] class, i.e.
‘[:punct:]’ Punctuation characters:
‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~’.
Have a look at ?regex.
library(stringr)
str_replace_all(x, '[[:punct:]]',' ')
"This is a string In addition this is a string with one more "