Remove non-ASCII characters

Remove non-ASCII characters - regex

I have a problem where odd characters (pasted from Word, etc.) are getting into a database field, and when I display that field it shows spurious characters.
Is it possible to remove any non-ASCII characters with a regex? Obviously I still want people to be able to use special characters like !#£$%^&*()_-+= etc., just not non-ASCII characters.
If anyone could help, that would be great!
Many thanks!
Updated: This is in Classic ASP.

To do this, you will need to build up several regular expressions and run them in a subroutine before inserting the record into the database.
Here is an excerpt from an explanation by 1stclassmedia (JavaScript that strips Word formatting from pasted HTML):
str = str.replace( /\s*FONT-FAMILY:[^;"]*;?/gi, "" ) ;
str = str.replace(/<(\w[^>]*) class=([^ |>]*)([^>]*)/gi, "<$1$3") ;
str = str.replace( /<(\w[^>]*) style="([^\"]*)"([^>]*)/gi, "<$1$3" ) ;
str = str.replace( /\s*style="\s*"/gi, '' ) ;
str = str.replace( /<SPAN\s*[^>]*>\s* \s*<\/SPAN>/gi, ' ' ) ;
str = str.replace( /<SPAN\s*[^>]*><\/SPAN>/gi, '' ) ;
str = str.replace(/<(\w[^>]*) lang=([^ |>]*)([^>]*)/gi, "<$1$3") ;
str = str.replace( /<SPAN\s*>(.*?)<\/SPAN>/gi, '$1' ) ;
str = str.replace( /<FONT\s*>(.*?)<\/FONT>/gi, '$1' ) ;
//some RegEx code for the picky browsers
var re = new RegExp("(<P)([^>]*>.*?)(<\/P>)","gi") ;
str = str.replace( re, "<div$2</div>" ) ;
var re2 = new RegExp("(<font|<FONT)([^*>]*>.*?)(<\/FONT>|<\/font>)","gi") ;
str = str.replace( re2, "<div$2</div>") ;
//group-free form: without it, the alternation would also strip every bare "size"
str = str.replace( /size\s*=\s*\d/gi, '' ) ;
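The excerpt above targets Word's HTML markup rather than the characters themselves. As a complementary sketch of the character filter the question asks for, the JavaScript below drops everything outside printable ASCII. Note that £ (U+00A3) is itself outside the ASCII range, so it has to be allow-listed explicitly. The helper name `stripNonAscii` and its `extraAllowed` parameter are hypothetical. The same character class should carry over to a VBScript RegExp pattern in Classic ASP, but that is an assumption and is untested here.

```javascript
// Hypothetical helper: keep printable ASCII (0x20-0x7E), tab, CR, LF,
// plus any characters listed in extraAllowed. Everything else is removed.
function stripNonAscii(text, extraAllowed = "£") {
  // Escape characters that are special inside a character class.
  const escaped = extraAllowed.replace(/[\\\]\^\-]/g, "\\$&");
  const pattern = new RegExp("[^\\x20-\\x7E\\t\\r\\n" + escaped + "]", "g");
  return text.replace(pattern, "");
}

// Word-style curly quotes are dropped, the pound sign survives.
console.log(stripNonAscii("Price: £5 \u201Cquoted\u201D"));
```

Removing the characters outright loses information such as curly quotes and dashes; mapping them to ASCII equivalents before filtering is a possible refinement.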

Related

Replace % in the end of string with Regex?

I want to remove " %" at the end of some text. I'd like to do it with a regular expression, because ABAP does not easily handle text at the end of a string.
DATA lv_vtext TYPE c LENGTH 10 VALUE 'TEST %'.
REPLACE REGEX ' %$' IN lv_vtext WITH ''.
But it does not replace anything. When I leave out "$" the text will be removed as expected, but I fear it might find more occurrences than wanted.
I experimented with \z or \Z instead of $, but to no avail.

This answer offers an alternative without REGEX. POSIX regular expressions are quite slow, and some people are reluctant to use them, so if you are open to doing it in plain ABAP:
lv_vtext = COND #( WHEN contains( val = lv_vtext end = ` %` )
THEN substring( val = lv_vtext len = strlen( lv_vtext ) - 2 )
ELSE lv_vtext ).
Code with context:
DATA(lv_vtext) = `test %`.
lv_vtext = COND #( WHEN contains( val = lv_vtext end = ` %` )
THEN substring( val = lv_vtext len = strlen( lv_vtext ) - 2 )
ELSE lv_vtext ).
ASSERT lv_vtext = `test`.

You can use
REPLACE REGEX '\s%\s*$' IN lv_vtext WITH ''
The benefit of \s is that it matches any Unicode whitespace character. The \s*$ part matches any trailing whitespace that you might have missed.
The whole pattern matches:
\s - any whitespace character
% - a % character
\s* - zero or more whitespace characters
$ - the end of the string.
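The difference between the two patterns can be sketched in JavaScript. This assumes the fixed-length field in the question is padded with trailing blanks and that those blanks take part in the match; neither point is confirmed in the thread. The helper name `stripTrailingPercent` is hypothetical.

```javascript
// Simulate a 10-character, blank-padded field (assumption about the ABAP data).
const padded = "TEST %".padEnd(10, " ");

// Hypothetical helper using the pattern from the answer above.
function stripTrailingPercent(text) {
  return text.replace(/\s%\s*$/, "");
}

// ' %$' cannot match, because blanks sit between '%' and the end of the string.
console.log(JSON.stringify(padded.replace(/ %$/, "")));

// '\s%\s*$' tolerates the trailing blanks and removes them as well.
console.log(JSON.stringify(stripTrailingPercent(padded)));
```

A percent sign in the middle of the text is left alone, because the pattern is anchored to the end of the string.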

search and replace using regex_replace

I have a string to be searched:
QString sObjectName = "looolok";
A regex_search for ".?o" results in 3 matched texts, which I push to a vector matchedText:
"lo" "oo" "lo"
Now my replacement text is "o",
so I would expect the string to be changed to
oook
I am using Boost.Xpressive regex_replace for this operation. This is my code:
std::vector<QString>::iterator it = matchedText.begin();
wsregex regExp;
std::string strOut;
std::string::iterator itStr = strOut.begin(); ;
for( ; it != matchedText.end(); ++it )
{
regExp = wsregex::compile( (*it).toStdWString() );
boost::xpressive::regex_replace( itStr, sObjectName.begin(), sObjectName.end(), regExp, qReplaceBy.toStdString(), regex_constants::format_perl );
}
However, strOut contains ooook.
What am I missing?
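One possible reading of the discrepancy, sketched in JavaScript rather than Boost.Xpressive: a single global pass with the original pattern gives the expected result, while using each matched text as its own pattern (as the loop does) replaces every occurrence of that literal in the whole string. This assumes Xpressive treats these simple patterns the same way JavaScript does. It does not model the output iterator in the question's code.

```javascript
const source = "looolok";

// One pass with the original pattern: "lo", "oo", "lo" each become "o".
const singlePass = source.replace(/.?o/g, "o");
console.log(singlePass); // oook

// Using each matched text as a separate pattern against the original string.
const matchedText = ["lo", "oo", "lo"];
const perMatch = matchedText.map((m) => source.replace(new RegExp(m, "g"), "o"));
console.log(perMatch); // [ 'ooook', 'loolok', 'ooook' ]
```

If that reading is right, compiling ".?o" once and calling regex_replace a single time would be the more direct route.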

Regex for strings not starting with "My" or "By"

I need a regex that matches when my string does not start with "MY" or "BY".
I have tried something like:
r = /^my&&^by/
but it doesn't work for me.
For example:
mycountry = false ; byyou = false ; xyz = true ;

You could test whether the string does not start with "by" or "my", case-insensitively, using a negative lookahead:
var r = /^(?!by|my)/i;
console.log(r.test('My try'));
console.log(r.test('Banana'));
Without a negative lookahead (this version needs at least two characters in the string):
var r = /^([^bm][^y]|[bm][^y]|[^bm][y])/i;
console.log(r.test('My try'));
console.log(r.test('Banana'));
console.log(r.test('xyz'));

If you are only concerned with specific text at the start of the string, you can use the newer JavaScript string method .startsWith (note that it is case-sensitive):
let str = "mylove";
if(str.startsWith('my') || str.startsWith('by')) {
// handle this case
}
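Since the question mixes "MY" and "my", a case-insensitive variant of the .startsWith approach may be closer to what is wanted. The sketch below lowercases the input first; the helper name `lacksPrefix` and its default prefix list are hypothetical.

```javascript
// Hypothetical helper: true when `text` does NOT begin with any of the
// given prefixes, ignoring case.
function lacksPrefix(text, prefixes = ["my", "by"]) {
  const lowered = text.toLowerCase();
  return !prefixes.some((prefix) => lowered.startsWith(prefix));
}

console.log(lacksPrefix("mycountry")); // false
console.log(lacksPrefix("BYyou"));     // false
console.log(lacksPrefix("xyz"));       // true
```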

Try this (the regex is case-insensitive):
var r = /^(?![bm]y)/i; // remove 'i' to reject only lowercase "by" or "my"
console.log('mycountry = '+r.test('mycountry'));
console.log('byyou= '+r.test('byyou'));
console.log('xyz= '+r.test('xyz'));
console.log('Mycountry = '+r.test('Mycountry '));
console.log('Byyou= '+r.test('Byyou'));
console.log('MYcountry = '+r.test('MYcountry '));
console.log('BYyou= '+r.test('BYyou'));

A regex for extracting " ; " or "=" symbols from source code?

For example
int val = 13;
Serial.begin(9600);
val = DigitalWrite(900,HIGH);
I want to extract special symbols like = and ;.
I've been able to extract symbols that appear adjacent to each other in the code, but I need all occurrences.
I tried [^ "//"A-Za-z\t\n0-9]* and [\;\=\{\}\,]+. Neither worked.
What's wrong?
I had written rules for my scanner like the ones below (they have since been changed):
semicolon [;]([\n]|[^ "//"])
assignment (.)?[=]+
brace ([{]|[}])([\n]|[^ "//"])
roundbarcket ("()")" "
The problem occurred in situations like these:
int val= 13; // "=" is not recognized because "val" and "=" are adjacent; I want it recognized whether or not they are adjacent
serial.read(); // "()" and ";" are not recognized individually; if I add the semicolon rule and the roundbarcket rule, "();" is recognized
How can I solve this?

You want to break "DigitalWrite(900,HIGH);" into "DigitalWrite" "(" "900" "," "HIGH" ")" ";". I think looping over each character is the simplest way.
// Requires: using System; using System.Collections.Generic;
string text = "val = DigitalWrite(900,HIGH);";
string[] symbols = new string[] { "(", ")", ",", "=", ";" };
List<string> tokens = new List<string>();
string word = "";
for( int i = 0; i < text.Length; i++ )
{
    string letter = text.Substring( i, 1 );
    if( !letter.Equals( " " ) )
    {
        // Check against the symbol list (the token list starts out empty).
        if( Array.IndexOf( symbols, letter ) >= 0 )
        {
            // A symbol ends the word collected so far.
            if( word.Length > 0 )
            {
                tokens.Add( word );
                word = "";
            }
            tokens.Add( letter );
        }
        else
        {
            word += letter;
            // Flush the final word if the text does not end with a symbol.
            if( i == text.Length - 1 )
                tokens.Add( word );
        }
    }
}
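The same splitting can also be sketched with a single regular expression, shown here in JavaScript. Each alternative describes one kind of token, so symbols are found whether or not they touch an identifier. The function names `tokenize` and `symbolsOnly` are hypothetical, and the sketch deliberately ignores comments, string literals, and multi-character operators.

```javascript
// Hypothetical tokenizer: identifiers, integers, and a fixed set of symbols.
// Anything else (spaces, dots, and so on) is skipped.
function tokenize(source) {
  return source.match(/[A-Za-z_]\w*|\d+|[=;(),{}]/g) || [];
}

// Keep only the symbols of interest, in order of appearance.
function symbolsOnly(source, wanted = ["=", ";"]) {
  return tokenize(source).filter((token) => wanted.includes(token));
}

console.log(tokenize("val = DigitalWrite(900,HIGH);"));
console.log(symbolsOnly("int val= 13;")); // [ '=', ';' ]
```

In a lex-style scanner the equivalent idea is one rule per single symbol, with no surrounding context in the pattern.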

So searching for ";" and "=" is the ultimate goal you want to achieve?
In that case, why don't you just use something like a .find() function?
Or you can split the string by ";" first and then search for "=".
If you want to grab the text between "=" and ";", try using =([^;]*); or =(.*?);
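As a brief illustration of the capture-group pattern suggested above, here is a JavaScript sketch. The function name `rightHandSide` is hypothetical, and trimming the captured text is an added convenience rather than part of the pattern.

```javascript
// Hypothetical helper: returns the text between the first "=" and the
// following ";", or null when the line contains no such pair.
function rightHandSide(line) {
  const match = line.match(/=([^;]*);/);
  return match ? match[1].trim() : null;
}

console.log(rightHandSide("val = DigitalWrite(900,HIGH);")); // DigitalWrite(900,HIGH)
console.log(rightHandSide("Serial.begin(9600);"));           // null
```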

Find and Replace with ASP Classic

I have a function in Classic ASP (VBScript), and I need to replace an exact word in a string. For example, I have a string like "wool|silk/wool|silk". I want to remove just silk and not the silk in silk/wool.
' "|" is a divider
cur_val = "wool|silk/wool|silk"
cur_val_spl = Split("wool|silk/wool|silk", "|")
key_val = "silk"
For Each i In cur_val_spl
If i = key_val Then
cur_val = Replace(cur_val, ("|" & i), "")
cur_val = Replace(cur_val, i, "")
End If
Next
Response.Write(cur_val)
In this case my result is "wool/wool", but what I really want is "wool|silk/wool".
I would really appreciate any help.

You should build a new string as you go:
' "|" is a divider
cur_val = "wool|silk/wool|silk"
cur_val_spl = Split("wool|silk/wool|silk", "|")
result = ""
key_val = "silk"
addPipe = false
For Each i In cur_val_spl
If i <> key_val Then
if addPipe then
result = result & "|"
else
addPipe = true
end if
result = result & i
End If
Next
Response.Write(result)
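For comparison, the same build-a-new-string idea can be sketched in JavaScript as split, filter, join. This is an illustration of the logic only, not a drop-in replacement for the VBScript above, and the function name `removeEntry` is hypothetical.

```javascript
// Hypothetical helper: drop entries that equal `key` exactly, leaving
// entries that merely contain it (such as "silk/wool") untouched.
function removeEntry(list, key, divider = "|") {
  return list
    .split(divider)
    .filter((entry) => entry !== key)
    .join(divider);
}

console.log(removeEntry("wool|silk/wool|silk", "silk")); // wool|silk/wool
```

Comparing whole entries, rather than searching for the key inside the joined string, is what keeps "silk/wool" intact.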

You could do it with a regular expression, but this is shorter (the example swaps the exact entry "wool" for "silk"):
cur_val = "wool|silk/wool|silk"
Response.Write left(mid(replace("|"&cur_val&"|","|wool|","|silk|"),2),len(cur_val))
'=>silk|silk/wool|silk
Too bad you already accepted the other answer 8>)
