To remove garbage characters from a string using regex

I want to remove characters from a string other than a-z and A-Z (and, as in the code below, 0-9). I created the following function for this, and it works fine.
public String stripGarbage(String s) {
    String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
    String result = "";
    for (int i = 0; i < s.length(); i++) {
        if (good.indexOf(s.charAt(i)) >= 0) {
            result += s.charAt(i);
        }
    }
    return result;
}
Can anyone tell me a better way to achieve the same? A regex may be a better option.
Regards
Harry

Here you go:
result = s.replaceAll("[^a-zA-Z0-9]", "");
But if you understand your code and it's readable then maybe you have the best solution:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

The following should be faster than anything using regex, and your initial attempt.
public String stripGarbage(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        if ((ch >= 'A' && ch <= 'Z') ||
            (ch >= 'a' && ch <= 'z') ||
            (ch >= '0' && ch <= '9')) {
            sb.append(ch);
        }
    }
    return sb.toString();
}
Key points:
It is significantly faster to use a StringBuilder than string concatenation in a loop. (The latter generates N - 1 garbage strings and copies N * (N + 1) / 2 characters to build a String containing N characters.)
If you have a good estimate of the length of the result String, it is a good idea to preallocate the StringBuilder to hold that number of characters. (But if you don't have a good estimate, the cost of the internal reallocations etc amortizes to O(N) where N is the final string length ... so this is not normally a major concern.)
Testing a character against (up to) 3 character ranges will be significantly faster on average than searching for a character in a 62 character String.
A switch statement might be faster especially if there are more character ranges. However, in this case it will take many more lines of code to list the cases for all of the letters and digits.
If the non-garbage characters match existing predicates of the Character class (e.g. Character.isLetter(char) etc) you could use those. This would be a good option if you wanted to match any letter or digit ... rather than just ASCII letters and digits.
Other alternatives to consider are using a HashSet<Character> or a boolean[] indexed by character that were pre-populated with the non-garbage characters. These approaches work well if the set of non-garbage characters is not known at compile time.

This regex works (JavaScript syntax):
result = s.replace(/[^A-Z0-9a-z]/g, '');
Here s is the string passed to your function and result is the string with only letters and digits left.

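I know this post is old, but if you are working in C# you can shorten Stephen C's answer a little by using the System.Char structure.
public String RemoveNonAlphaNumeric(String value)
{
    // Pass the length as the initial capacity. Passing value itself would
    // pre-fill the builder with the unfiltered text.
    StringBuilder sb = new StringBuilder(value.Length);
    for (int i = 0; i < value.Length; i++)
    {
        char ch = value[i];
        if (Char.IsLetterOrDigit(ch))
        {
            sb.Append(ch);
        }
    }
    return sb.ToString();
}
This still accomplishes the same thing in a more compact fashion, although Char.IsLetterOrDigit also accepts letters and digits outside the ASCII range.
The Char structure has some really useful methods for checking text. Here are some for your future reference.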
Char.GetNumericValue()
Char.IsControl()
Char.IsDigit()
Char.IsLetter()
Char.IsLower()
Char.IsNumber()
Char.IsPunctuation()
Char.IsSeparator()
Char.IsSymbol()
Char.IsWhiteSpace()

This works:
public static String removeGarbage(String s) {
    String r = "";
    for (int i = 0; i < s.length(); i++)
        if (s.substring(i, i+1).matches("[A-Za-z]")) // use [A-Za-z0-9] if you want to include digits
            r = r.concat(s.substring(i, i+1));
    return r;
}
(edit: although it's not so efficient)

/**
 * Replace every character outside the ASCII range 1-127 with a space.
 */
private static StringBuffer goodBuffer = new StringBuffer();
// Static initializer: fill goodBuffer with the ASCII characters 1-127
static {
    for (int c = 1; c < 128; c++) {
        goodBuffer.append((char) c);
    }
}
public String stripGarbage(String s) {
    //String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
    String good = goodBuffer.toString();
    String result = "";
    for (int i = 0; i < s.length(); i++) {
        if (good.indexOf(s.charAt(i)) >= 0) {
            result += s.charAt(i);
        } else {
            result += " ";
        }
    }
    return result;
}

Related

Checking if the first character of all the strings are same or not in a array of strings

I have an array of strings, and I want to check whether the first characters of all the strings are the same.
I know how to retrieve the first character of a string:
char first_letter;
first_letter = (*str)[0];
Initially, I thought of going the brute-force way, checking the first letter of every string against every other one with a nested for loop.
int flag = 0;
char f1, f2;
for (int i = 0; i < size_arr - 1; i++) {
    f1 = (*str[i])[0];
    for (int j = i + 1; j < size_arr; j++) {
        f2 = (*str[j])[0];
        if (f1 != f2)
            flag += 1;
    }
}
if (!(flag))
    cout << "All first characters same";
else
    cout << "Different";
But I need a better approach to find out whether the first letters of all the strings in an array are the same. Is there a more efficient way?

You needn't use a nested for loop. Instead, compare each string's first character with the next one's:
int flag = 0;
for (int i = 0; i < size_arr - 1; i++) {
    f1 = (*str[i])[0];
    f2 = (*str[i+1])[0];
    if (f1 != f2) {
        printf("not same characters at first position");
        flag = 1; // set before break, otherwise this line is never reached
        break;
    }
}
if (flag == 0) printf("same characters at first position");

I wrote this in C for you (you have used character arrays here rather than C++'s std::string, so it is convenient to describe using C code):
#include <stdio.h>
#define MAX_LENGTH 128
int main(void) {
    char string[][MAX_LENGTH] = {"This is string ONE.", "This one is TWO.",
                                 "This is the third one."};
    char first_letter = string[0][0];
    int total_strs = sizeof(string) / sizeof(string[0]);
    int FLAG = 1;
    // Compare the first letter of each string with first_letter
    for (int i = 0; i < total_strs; i++)
        if (string[i][0] != first_letter) {
            FLAG = 0; // cleared as soon as one string starts
            break;    // with a different letter
        }
    if (FLAG)
        fprintf(stdout, "The strings have the same initial letters.\n");
    else
        fprintf(stdout, "Not all strings have the same initial letters.\n");
    return 0;
}
If you want to convert it to C++, it's no big issue: just replace stdio.h with iostream, int FLAG = 1 with bool FLAG = true, and the fprintf() calls with std::cout statements.
If you need to do the same job with std::string, take the array of strings, set the flag to true by default, then iterate through the strings comparing each one's initial letter with the first string's, and set the flag to false as soon as a mismatching string is found.
The program will display (if same initial vs. if not):
The strings have the same initial letters.
Not all strings have the same initial letters.
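The std::string variant described above could be sketched roughly like this. The function name allStartWithSameChar and the handling of empty inputs are assumptions made for illustration, not something from the question:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if every string begins with the same character.
// Assumptions: an empty list counts as "all the same", and an empty string
// never matches because it has no first character.
bool allStartWithSameChar(const std::vector<std::string>& strings) {
    if (strings.empty()) return true;
    if (strings.front().empty()) return false;
    const char first = strings.front().front();
    // all_of stops at the first mismatch, like the break in the loop version.
    return std::all_of(strings.begin(), strings.end(),
                       [first](const std::string& s) {
                           return !s.empty() && s.front() == first;
                       });
}
```

This is still a single pass over the array, so the cost grows linearly with the number of strings.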

C++ insert symbol if rotation is specific character

I'm trying my luck at encrypting/decrypting, and I want to insert characters if a given rotation would result in one of a few specific characters. I have a constant string, CHARS = "ABCXYZabcxyz". My encrypted string at the moment is "eDhrS3S0/".
I am using an ASCII rotation of 4, and if the rotated character would be one of the characters in CHARS, I want to add / before and / after it. I can't get it working; this is my code at the moment:
const string CHARS="ABCXYZabcxyz";
string crypt = "eDhrS3S0/", encrypted;
string cryptTemp = crypt;
for (int i=0; i<cryptTemp.length(); i++){
    for (int j=0; j<CHARS.length(); j++){
        if (((int)crypt[i]-4) == (int)CHARS[j]){
            crypt.insert(crypt[i],"0",-1);
            crypt.insert(crypt[i],CHARS[j], 0);
            crypt.insert(crypt[i],"0",+1);
        }
    }
}
I manage to replace the characters if they match CHARS without rotation, but once I add "-4" in the if statement nothing happens, and I am really stuck at this point. The first character in the string, "e", should translate to "a" after I subtract 4 from it, but I can't get it working.

Adding some separation of concerns will make your code clearer:
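Pull out the rot4 code into a separate function.
Explicitly call this function and assign the result to c.
Use std::string::find instead of a loop.
Accumulate all characters in ret and return that.
#include <cctype>
#include <string>
using namespace std;
const string CHARS = "ABCXYZabcxyz";
char rot4(char c) {
    // Assumption: non-letters (digits, '/', ...) pass through unchanged,
    // because the arithmetic below is only valid for letters.
    if (!isalpha(static_cast<unsigned char>(c))) return c;
    bool wasupper = isupper(static_cast<unsigned char>(c));
    c = tolower(static_cast<unsigned char>(c));
    int value = int(c - 'a') - 4;  // rotate back by 4 within the alphabet
    if (value < 0) value += 26;    // wrap around: 'a'..'d' become 'w'..'z'
    c = value + (wasupper ? 'A' : 'a');
    return c;
}
string decrypt(string crypt) {
    string ret;
    for (size_t i = 0; i < crypt.length(); i++) {
        char c = rot4(crypt[i]);
        if (CHARS.find(c) != string::npos) {
            // Rotated character is one of the special ones: wrap it in slashes.
            ret += '/';
            ret += c;
            ret += '/';
        } else {
            ret += c;
        }
    }
    return ret;
}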
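As for your original question, I'm pretty sure you were using the wrong overload of std::string::insert: its first argument is a position (an index or an iterator), but you were passing the character crypt[i].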
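For comparison, here is a minimal sketch of calling std::string::insert with an index as its first argument. The helper name wrapInSlashes is hypothetical, and the sketch assumes pos is a valid index into the string:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of s with '/' inserted before and after
// the character at index pos. Assumes pos < s.size().
std::string wrapInSlashes(std::string s, std::size_t pos) {
    // insert(index, count, ch): the first argument is a position, not a character.
    s.insert(pos + 1, 1, '/');  // insert after first, so pos still refers to the same character
    s.insert(pos, 1, '/');
    return s;
}
```

Inserting while iterating shifts every later index, which is one reason the answer above builds a separate result string instead of modifying crypt in place.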

How to replace certain items in a char array with an integer in C++?

Below is some example code that is not working the way I want.
#include <iostream>
using namespace std;
int main()
{
    char testArray[] = "1 test";
    int numReplace = 2;
    testArray[0] = (int)numReplace;
    cout << testArray << endl; // output is "? test"; I wanted a 2 there, not a '?'
    // I was trying different things and hoping (int) helped
    testArray[0] = '2';
    cout << testArray << endl; // "2 test", which is what I want, but it is hardcoded
    // Is there a way to do it based on a variable?
    return 0;
}
In a string with characters and integers, how do you go about replacing numbers? And when implementing this, is it different between doing it in C and C++?

If numReplace is in the range [0,9] you can do:
testArray[0] = numReplace + '0';
If numReplace is outside [0,9] you need to
a) convert numReplace into its string equivalent
b) write a function that replaces part of the string with the string produced in (a)
Ref: "Best way to replace a part of string by another in c" and other relevant posts on SO
Also, since this is C++ code, you might consider using std::string, where replacement, number-to-string conversion, etc. are much simpler.
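A rough sketch of the std::string route, assuming you already know where the old number starts and how many characters it occupies. The name replaceWithNumber and its parameters are made up for illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces count characters of text, starting at pos,
// with the decimal representation of value. Assumes pos <= text.size().
std::string replaceWithNumber(std::string text, std::size_t pos,
                              std::size_t count, int value) {
    // std::to_string does the number-to-string conversion; replace() grows or
    // shrinks the string, so value may have any number of digits.
    text.replace(pos, count, std::to_string(value));
    return text;
}
```

With the question's data, replaceWithNumber("1 test", 0, 1, 2) gives "2 test", and a multi-digit value simply makes the string longer.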

You should look over the ASCII table here: http://www.asciitable.com/
It's very handy: always look in the Decimal column for the ASCII value you're using.
In the line testArray[0] = (int)numReplace; you've actually put the character with decimal ASCII value 2 in the first position. numReplace + '0' would do the trick :)
As for the C/C++ question, it is the same in both. As for strings that mix characters and integers...
You should find where your number starts and ends.
You can do that with a loop that looks like this:
/* needs <stdio.h>, <stdlib.h> and <string.h>; assumes str contains a number */
int numberLen, i, j, isOk = 1, isOk2 = 1, from = 0, to = 0, num;
char str[] = "asd 12983 asd"; /* the number in it will be incremented by 1 */
char *nstr;
for (i = 0; i < (int)strlen(str) && isOk; i++)
{
    if (str[i] >= '0' && str[i] <= '9')
    {
        from = i;
        to = (int)strlen(str) - 1; /* default: the number runs to the end of the string */
        for (j = i; j < (int)strlen(str) && isOk2; j++)
        {
            if (str[j] < '0' || str[j] > '9') /* not a digit */
            {
                to = j - 1;
                isOk2 = 0;
            }
        }
        isOk = 0; /* for the outer loop to stop */
    }
}
numberLen = to - from + 1;
nstr = (char *)malloc(numberLen + 1); /* length of the number, +1 for the terminating '\0' */
for (i = from; i <= to; i++)
{
    nstr[i - from] = str[i];
}
nstr[numberLen] = '\0';
/* nstr now contains the number */
num = atoi(nstr);
num++; /* we want number+1 in the string */
/* Write num back into nstr (itoa is non-standard). This assumes the new
   number has as many digits as the old one; e.g. 999 -> 1000 would be cut off. */
snprintf(nstr, numberLen + 1, "%d", num);
for (i = from; i <= to; i++)
{
    str[i] = nstr[i - from];
}
free(nstr);
/* Now the string contains "asd 12984 asd" */
By the way, the most efficient way would probably be to just find the last digit and add 1 to its value (ASCII again), since the digits follow each other in ASCII: '0'=48, '1'=49 and so on. (That shortcut only works while the last digit is not '9'; otherwise you have to carry.) But I just showed you how to treat them as numbers and work with them as integers. Hope it helped :)
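If std::string is an option, the same find-the-number, convert, increment, write-back idea might look like the sketch below. The name incrementFirstNumber is hypothetical; the sketch assumes the number fits in a long long and does not preserve leading zeros:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: finds the first run of digits in text and replaces it
// with that number plus one. Text without digits is returned unchanged.
std::string incrementFirstNumber(std::string text) {
    const std::string digits = "0123456789";
    const std::size_t from = text.find_first_of(digits);
    if (from == std::string::npos) return text;  // no number to change
    std::size_t to = text.find_first_not_of(digits, from);  // one past the last digit
    if (to == std::string::npos) to = text.size();           // number runs to the end
    // std::stoll throws std::out_of_range if the digits do not fit in a long long.
    const long long value = std::stoll(text.substr(from, to - from));
    // replace() resizes the string, so a carry such as 999 -> 1000 is handled.
    text.replace(from, to - from, std::to_string(value + 1));
    return text;
}
```

Because the string can grow, this avoids the fixed-width limitation of writing the new digits back over the old ones.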

loop logic, encrypting array C++

I am trying to perform some operations on an array, with the final goal of doing a simple encryption. My array is 458 characters long and consists mostly of letters plus some commas, periods, etc. I am trying to start from the last character of the array, go to the first character, and uppercase all the letters in the array. It reads the last character "" correctly, but then the next step in the for loop is about 4 characters over and skips a few letters. Is something wrong with my control logic?
void EncryptMessage (ofstream& outFile, char charArray[], int length)
{
    int index;
    char upperCased;
    char current;
    for (index = length-1; index <= length; --index)
    {
        if (charArray[index] >= 'A' && charArray[index] <= 'Z')
        {
            upperCased = static_cast<char>(charArray[index]);
            current = upperCased;
            outFile << current;
        }
        else
        {
            charArray[index]++;
            current = charArray[index];
        }
    }
}

Change:
for (index = length-1; index <= length; --index)
to:
for (index = length-1; index >= 0; --index)

In the else leg of your if statement, you're setting the value of current, but never writing it out, so all that gets written out are what start as capital letters (and, as others have pointed out, your loop condition isn't correct).
If I were doing this, I'd structure it a bit differently. I'd write a small functor to encrypt a single letter:
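struct encrypt {
    char operator()(char input) const {
        // Uppercase letters pass through; everything else is shifted by one.
        if (isupper(static_cast<unsigned char>(input)))
            return input;
        else
            return input + 1;
    }
};
Then I'd put the input into a std::string, and operate on it using std::transform (from <algorithm>, with std::ostream_iterator from <iterator>):
std::string msg("content of string goes here.");
std::transform(msg.rbegin(), msg.rend(),
               std::ostream_iterator<char>(outFile, ""),
               encrypt());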

How to find string in a string

I need to find the longest run of one string inside another: if string1 is "Alibaba" and string2 is "ba", the longest run is "baba" (string2 repeated twice in a row). I have the lengths of the strings, but what next?
char* fun(char* a, char* b)
{
    int length1 = 0;
    int length2 = 0;
    int longer;
    int shorter;
    char end = '\0';
    int i = 0;
    while (a[i] != end)
    {
        i++;
        length1++;
    }
    i = 0;
    while (b[i] != end)
    {
        i++;
        length2++;
    }
    if (length1 > length2) {
        longer = length1;
        shorter = length2;
    }
    else {
        longer = length2;
        shorter = length1;
    }
    // logic goes here
    return a; // placeholder so the function compiles; the real result goes here
}
int main()
{
    char name1[] = "Alibaba";
    char name2[] = "ba";
    cout << fun(name1, name2) << endl;
    system("PAUSE");
    return 0;
}

Wow lots of bad answers to this question. Here's what your code should do:
Find the first instance of "ba" using the standard string searching functions.
In a loop look past this "ba" to see how many of the next N characters are also "ba".
If this sequence is longer than the previously recorded longest sequence, save its length and position.
Find the next instance of "ba" after the last one.
Here's the code (not tested):
#include <string>
using namespace std;
string FindLongestRepeatedSubstring(string longString, string shortString)
{
    // The number of repetitions in our longest string.
    int maxRepetitions = 0;
    size_t n = shortString.length(); // For brevity.
    if (n == 0)
        return ""; // An empty search string would never advance pos.
    // Where we are currently looking.
    size_t pos = 0;
    while ((pos = longString.find(shortString, pos)) != string::npos)
    {
        // Ok we found the start of a repeated substring. See how many repetitions there are.
        int repetitions = 1;
        // This is a little bit complicated.
        // First go past the "ba" we have already found (pos += n)
        // Then see if there is still enough space in the string for there to be another "ba"
        // (<= rather than <, so a repetition ending exactly at the end of the string counts)
        // Finally see if it *is* "ba"
        for (pos += n; pos + n <= longString.length() && longString.substr(pos, n) == shortString; pos += n)
            ++repetitions;
        // See if this sequence is longer than our previous best.
        if (repetitions > maxRepetitions)
            maxRepetitions = repetitions;
    }
    // Construct the string to return. You really probably want to return its position, or maybe
    // just maxRepetitions.
    string ret;
    while (maxRepetitions--)
        ret += shortString;
    return ret;
}
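As the closing comment in that code suggests, returning the position and the repetition count may be more useful than rebuilding the string. A possible sketch follows; the RepeatRun struct and the findLongestRun name are invented for illustration. Unlike the code above, it restarts the search one character after each hit rather than after the whole run, which costs more but also considers runs that begin inside an earlier run:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: where the longest run starts and how long it is.
struct RepeatRun {
    std::size_t position;     // index of the first repetition, npos if none
    std::size_t repetitions;  // 0 if needle never occurs
};

RepeatRun findLongestRun(const std::string& haystack, const std::string& needle) {
    RepeatRun best{std::string::npos, 0};
    const std::size_t n = needle.size();
    if (n == 0) return best;  // an empty needle would never advance

    std::size_t pos = 0;
    while ((pos = haystack.find(needle, pos)) != std::string::npos) {
        const std::size_t start = pos;
        std::size_t reps = 0;
        // Count back-to-back copies of needle beginning at start.
        // pos never exceeds haystack.size(), so compare() cannot throw.
        while (haystack.compare(pos, n, needle) == 0) {
            ++reps;
            pos += n;
        }
        if (reps > best.repetitions) best = RepeatRun{start, reps};
        pos = start + 1;  // also try runs that start inside this one
    }
    return best;
}
```

For the question's example, findLongestRun("Alibaba", "ba") reports position 3 with 2 repetitions, and haystack.substr(position, repetitions * needle.size()) recovers "baba".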

What you want should look like this pseudo-code:
i = j = count = max = 0
while (i < length1 && (c = name1[i++])) do
    if (j < length2 && name2[j] == c) then
        j++
    else
        max = (count > max) ? count : max
        count = 0
        j = 0
    end
    if (j == length2) then
        count++
        j = 0
    end
done
max = (count > max) ? count : max
for i = 0 to max-1 do
    print name2
done
The idea is here but I feel that there could be some cases in which this algorithm won't work (cases with complicated overlap that would require going back in name1). You may want to have a look at the Boyer-Moore algorithm and mix the two to have what you want.

The Algorithms Implementation Wikibook has an implementation of what you want in C++.

http://www.cplusplus.com/reference/string/string/find/
Maybe you did it on purpose, but you should use the std::string class and forget archaic things like the char* string representation.
It gives you access to lots of optimized methods, such as string searching.

Why don't you use the strstr function provided by C?
const char * strstr ( const char * str1, const char * str2 );
char * strstr ( char * str1, const char * str2 );
Locate substring
Returns a pointer to the first occurrence of str2 in str1,
or a null pointer if str2 is not part of str1.
The matching process does not include the terminating null-characters.
Now use the lengths, write a loop, and walk through the original string to find the longest run inside it.
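One way that strstr loop could be written, as a sketch only. The function name longestRunRepetitions is made up, and it returns the number of repetitions rather than the run itself:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: how many back-to-back copies of needle make up the
// longest such run inside text. Returns 0 if needle is empty or never occurs.
std::size_t longestRunRepetitions(const char* text, const char* needle) {
    const std::size_t n = std::strlen(needle);
    if (n == 0) return 0;
    std::size_t best = 0;
    // Visit every occurrence of needle, restarting one character after each hit.
    for (const char* hit = std::strstr(text, needle); hit != nullptr;
         hit = std::strstr(hit + 1, needle)) {
        std::size_t reps = 0;
        // strncmp stops at the terminating '\0', so p never reads past the end.
        for (const char* p = hit; std::strncmp(p, needle, n) == 0; p += n)
            ++reps;
        if (reps > best) best = reps;
    }
    return best;
}
```

Multiplying the result by the length of needle gives the length of the run in characters, which is 4 for "Alibaba" and "ba".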
