Input filtering using scanf

I want to filter input, and I don't know the best way to do it. I want to read only the words that start with a letter. For example, if the input is:
This is 1 EXAMPLE1 input.
The string should be like this:
This is EXAMPLE1 input
What is the easiest way to filter input like this?
I tried using "%[a-zA-Z]s", but it is not working.

Your scan string "%[a-zA-Z]s" probably isn't what you think it is. Drop that trailing s: the conversion is %[...] on its own, and the s that follows is treated as a literal character to match.
"%[a-zA-Z]" will scan a string consisting entirely of lowercase and uppercase letters, so digits are excluded. However, you want to scan alphanumeric strings that begin with a letter, and scanf doesn't provide a way to express that. You can instead scan for an alphanumeric string with "%[a-zA-Z0-9]", and then discard the scanned input if the first character of the string is a digit.
Using scanf is tricky for various reasons. The string may be longer than you expect, and cause buffer overflow. If the input isn't in the format you expect, then scanf may fail to advance past the unexpected input. It is usually more reliable to read the input into a buffer unconditionally, and parse the buffer. For example:
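#include <cctype>
#include <iostream>
#include <string>

const char *wants
    = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
std::string word;
while (std::cin >> word) {
    // Skip words that do not start with a letter.
    if (!isalpha(static_cast<unsigned char>(word[0]))) continue;
    // Cut the word at the first character that is not alphanumeric.
    std::string::size_type p = word.find_first_not_of(wants);
    word = word.substr(0, p);
    //... do something with word
}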
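The scanf-style approach mentioned above (scan an alphanumeric run with "%[a-zA-Z0-9]", then drop it if it starts with a digit) could be sketched roughly as follows. The helper name filter_words and the 63-character token limit are assumptions made for illustration; the sketch uses sscanf on an in-memory line together with %n so that it can always advance past unexpected input.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: keeps only alphanumeric tokens that start with a letter.
std::string filter_words(const char *input) {
    std::string result;
    char token[64];
    const char *p = input;
    while (*p != '\0') {
        int consumed = 0;
        // The width 63 leaves room for the terminating '\0' in token.
        if (std::sscanf(p, "%63[a-zA-Z0-9]%n", token, &consumed) == 1) {
            if (std::isalpha(static_cast<unsigned char>(token[0]))) {
                if (!result.empty()) result += ' ';
                result += token;
            }
            p += consumed;  // move past the token that was just scanned
        } else {
            ++p;  // not alphanumeric (space, punctuation): skip one character
        }
    }
    return result;
}
```

Calling filter_words("This is 1 EXAMPLE1 input.") yields "This is EXAMPLE1 input". Note that a token longer than 63 characters would be scanned in several pieces, so a real program would need to decide how to handle that case.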

Related

Splitting of string by spaces and outputting the columns into different arrays

I have a text file which basically has 2 columns of letters and numbers separated by spaces. I want to split these 2 columns and place them in separate arrays.
I tried using getline with a space as the delimiter, but I am only able to place them in the same array. I can do this by checking fileOpen.eof, but that causes too many problems in my program.
while(getline(openFile, letters, ' ')){
    index++;
    lettersArray[index] = letters;
}
I expect the output of lettersArray[index] to be a column of letters only.

I think you are using the getline function in the wrong way. Take a look at how it works here: http://www.cplusplus.com/reference/string/string/getline/
You are telling getline to use the space character as the delimiter, so a newline is no longer treated as a separator. The first call reads the letters of the first line, but every later call reads from just after one space up to the next space, which means the number from one line and the letters from the following line end up together in a single string.
If you want to stick to using the getline function, here is a possible modification to make it work.
while(getline(openFile, letters, ' ')){
    index++;
    lettersArray[index] = letters;
    // Discard the rest of the current line (the number column).
    getline(openFile, letters);
}
The call to getline on the last line of the while loop gets rid of the remaining part of the current line.

Reading from a file, multiple delimiters

I have some code which reads from a file and splits each line at the commas. However, some fields in the file have spaces before or after the commas, which causes problems when the code runs.
This is the code that reads the data from the file. Keeping the same general structure, is there a way to handle these spaces?
while(getline(inFile, line)){
    stringstream linestream(line);
    // each field of the inventory file
    string type;
    string code;
    string countRaw;
    int count;
    string priceRaw;
    int price;
    string other;
    //
    if(getline(linestream,type,',') && getline(linestream,code,',')
       && getline(linestream,countRaw,',') && getline(linestream,priceRaw,',')){
        // optional other
        getline(linestream,other,',');
        count = atoi(countRaw.c_str());
        price = atoi(priceRaw.c_str());
        StockItem *t = factoryFunction(code, count, price, other, type);
        list.tailAppend(t);
    }
}

A better approach for this kind of problem is a state machine, in which each character you read triggers one simple action. You don't say whether spaces inside a field (between words not separated by commas) must be kept, so I'll assume they must. I also don't know what you want to do with double spaces, so I'll assume they stay as they are. Read one character at a time and keep two variables: the start position and the limit position. You begin in state 1, where you are looking for the start of a field: when you find any character other than a space, set the start position to that character and switch to state 2. In state 2, whenever you find a non-space character, set the limit position to one past that character. When you find a comma, take the substring from start to limit as the field and switch back to state 1.
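A minimal sketch of such a state machine might look like the following. The function name split_trimmed is hypothetical, and two details are assumptions that the description leaves open: the limit position is also set when a field starts, and a field containing only spaces comes out as an empty string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on commas and trims spaces around each field.
std::vector<std::string> split_trimmed(const std::string &line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0, limit = 0;
    bool in_field = false;  // false: state 1 (seeking start), true: state 2
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char ch = line[i];
        if (ch == ',') {
            // Emit [start, limit); a field with no non-space character is empty.
            fields.push_back(in_field ? line.substr(start, limit - start)
                                      : std::string());
            in_field = false;  // back to state 1
        } else if (ch != ' ') {
            if (!in_field) {
                start = i;  // first non-space character of the field
                in_field = true;
            }
            limit = i + 1;  // one past the last non-space character seen
        }
    }
    // The last field has no trailing comma, so emit it after the loop.
    fields.push_back(in_field ? line.substr(start, limit - start)
                              : std::string());
    return fields;
}
```

Spaces inside a field, such as the one in "desk lamp", are preserved because only the positions of the first and last non-space characters matter.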

Strategy to replace spaces in string

I need to store a string with its spaces replaced by some other character. When I retrieve it, I need to turn that character back into spaces. I have thought of this strategy: while storing, I replace (space with _a) and (_a with _aa), and while retrieving, I replace (_a with space) and (_aa with _a), so that even if the user enters _a in the string it will be handled. But I don't think this is a good strategy. Does anyone have a better one?

Replacing spaces with something is a problem when something is already in the string. Why don't you simply encode the string - there are many ways to do that, one is to convert all characters to hexadecimal.
For instance
Hello world!
is encoded as
48656c6c6f20776f726c6421
The space is 0x20. Then you simply decode the string back (hex to ASCII).
This way there are no spaces in the encoded string.
-- Edit - optimization --
You replace all % and all spaces in the string with %xx where xx is the hex code of the character.
For instance
Wine having 12% alcohol
becomes
Wine%20having%2012%25%20alcohol
%20 is space
%25 is the % character
This way, neither % nor (space) are a problem anymore - Decoding is easy.
Encoding algorithm
- replace all `%` with `%25`
- replace all ` ` with `%20`
Decoding algorithm
- replace all `%xx` with the character having `xx` as hex code
(You could optimize further, since you only need to encode two characters: use %1 for % and %2 for a space. However, I recommend the %xx solution, as it is more portable and can be reused later if you need to encode more characters.)
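The encoding and decoding steps above could be sketched like this. The function names encode_spaces and decode_spaces are made up for illustration, and the decoder simply copies a % that is not followed by two characters, which is an assumption about how malformed input should be treated.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: replaces '%' with "%25" and ' ' with "%20".
std::string encode_spaces(const std::string &in) {
    std::string out;
    for (char ch : in) {
        if (ch == '%') out += "%25";
        else if (ch == ' ') out += "%20";
        else out += ch;
    }
    return out;
}

// Hypothetical helper: replaces every "%xx" with the character whose
// hexadecimal code is xx.
std::string decode_spaces(const std::string &in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const std::string hex = in.substr(i + 1, 2);
            out += static_cast<char>(std::strtol(hex.c_str(), nullptr, 16));
            i += 2;  // skip the two hex digits
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Because % itself is encoded, a literal "%20" typed by the user is stored as "%2520" and decodes back unchanged.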

I'm not sure your solution will work. When reading, how would you
distinguish between strings that were originally " a" and strings that
were originally "_a"? If I understand correctly, both will end up as
"_aa".
In general, given a situation where a specific set of characters cannot
appear as such, but must be encoded, the solution is to choose one of
allowed characters as an "escape" character, remove it from the set of
allowed characters, and encode all of the forbidden characters
(including the escape character) as a two (or more) character sequence
starting with the escape character. In C++, for example, a new line is
not allowed in a string or character literal. The escape character is
\; because of that, it must be encoded as an escape sequence as well.
So we have "\n" for a new line (the choice of n is arbitrary), and
"\\" for a \. (The choice of \ for the second character is also
arbitrary, but it is fairly usual to use the escape character, escaped,
to represent itself.) In your case, if you want to use _ as the
escape character, and "_a" to represent a space, the logical choice
would be "__" to represent a _ (but I'd suggest something a little
more visually suggestive—maybe ^ as the escape, with "^_" for
a space and "^^" for a ^). When reading, anytime you see the escape
character, the following character must be mapped (and if it isn't one
of the predefined mappings, the input text is in error). This is simple
to implement, and very reliable; about the only disadvantage is that in
an extreme case, it can double the size of your string.
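As a rough sketch of the escape-character scheme, using the suggested ^ as the escape with "^_" for a space and "^^" for a literal ^: the function names and the choice to throw std::invalid_argument on malformed input are assumptions for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: writes ' ' as "^_" and '^' as "^^".
std::string escape_spaces(const std::string &in) {
    std::string out;
    for (char ch : in) {
        if (ch == ' ') out += "^_";
        else if (ch == '^') out += "^^";
        else out += ch;
    }
    return out;
}

// Hypothetical helper: reverses escape_spaces and rejects unknown sequences.
std::string unescape_spaces(const std::string &in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] != '^') {
            out += in[i];
            continue;
        }
        // The escape character must be followed by a known second character.
        if (++i == in.size()) throw std::invalid_argument("dangling escape");
        if (in[i] == '_') out += ' ';
        else if (in[i] == '^') out += '^';
        else throw std::invalid_argument("unknown escape sequence");
    }
    return out;
}
```

With this scheme " a" becomes "^_a" while "_a" stays "_a", so the two inputs from the question no longer collide.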

Do you want to implement this in C/C++? I think you should split your string into multiple parts, separated by spaces.
If your string looks like "a__b" (where each _ stands for a space, so there are several consecutive spaces), it will be split into:
sub[0] = "a";
sub[1] = "";
sub[2] = "b";
Hope this will help!

If your string may use all X characters of its alphabet, you cannot encode it using only X-1 characters while still writing exactly one output character per input character.
You can use a combination of 2 characters to replace a given character (this is exactly what you are trying in your example).
To do this, loop through your string to count the spaces, allocate a new character array of the original length plus one extra character per space, and copy the string into it, writing "//" in place of each space (this is just an example). The problem with this approach is that the input string itself must not contain "//".
Another approach would be to use a rarely used character, for example "^", to replace the spaces.
The last approach is a combination of these two, and it is the popular one: pick an escape character and write each special character as a two-character sequence starting with it. Unix shells and PHP use this to put a syntax character into a string literally. For example, to include a " inside a quoted string, you simply write it as \".

Why don't you use the Replace function?
String* stringWithoutSpace = stringWithSpace->Replace(S" ", S"replacementCharOrText");
Now stringWithoutSpace contains no spaces. When you want to put those spaces back:
String* stringWithSpacesBack = stringWithoutSpace->Replace(S"replacementCharOrText", S" ");

I think simply encoding to ASCII hexadecimal is a neat idea, but of course it doubles the amount of storage needed.
If you want to do this using less memory, then you will need two-letter sequences, and you have to be careful that you can convert back easily.
You could, for example, replace a blank by _a, but you also need to take care of your escape character _. To do this, replace every _ by __ (two underscores). You need to scan through the string once and do both replacements simultaneously.
This way, in the resulting text all original underscores will be doubled, and the only other occurrence of an underscore will be in the combination _a. You can safely translate this back. Whenever you see an underscore, you need a lookahead of 1 to see what follows. If an a follows, then this was a blank before. If _ follows, then it was an underscore before.
Note that the point is to replace your escape character (_) in the original string, and not the character sequence to which you map the blank. Your idea of replacing _a breaks, as you do not know whether _aa was originally "_a" or " a" (a blank followed by a).

I'm guessing that there is more to this question than appears; for example, that the strings you are storing must not only be free of spaces, but must also look like words or some such. You should be clear about your requirements (and you might consider satisfying the curiosity of the spectators by explaining why you need to do such things).
Edit: As JamesKanze points out in a comment, the following won't work in the case where you can have more than one consecutive space. But I'll leave it here anyway, for historical reference. (I modified it to compress consecutive spaces, so it at least produces unambiguous output.)
// `in` is the input std::string.
std::string out;
char prev = 0;
for (char ch : in) {
    if (ch == ' ') {
        if (prev != ' ') out.push_back('_');
    } else {
        if (prev == '_' && ch != '_') out.push_back('_');
        out.push_back(ch);
    }
    prev = ch;
}
if (prev == '_') out.push_back('_');

Finding a substring within a string

In C++ I have a phonebook with many names, such as Sinatra, Frank, and I want the user to be able to input any length of string to scan the file for it. Once I have the user input a string of any desired length, how do I scan an entire string of "Sinatra, Frank" for just "Frank" or "Sinatra" or "atra" and see which name(s) it belongs to?

You can use the std::string::find method:
string s = "Sinatra, Frank";
string::size_type index = s.find("Frank");
This gets you the index of the match (which in this case is 9). If there is no match, find returns string::npos.

A question: is your phonebook a flat file with each name on a new line (as in your example with "Sinatra, Frank" in a format like "Lastname, Firstname", etc.), or do you have some structure of this phonebook where each name-string is a node of an array, a linked list, etc?
Note that for strstr():
strstr(const char *s1, const char *s2)
locates the first occurrence of string s2 in s1, which may be sufficient for you.
For your input string, always be sure to check size limits in one way or another; if the user enters a string through some interface, it should be explicitly handled to ensure it doesn't exceed your storage for it or contain malicious characters or code.
Ken's solution produces the position of the substring in the original string (so long as the result is not std::string::npos, or not a null pointer in the case of strstr, that means there's a 'hit') but doesn't tell you which entry of the phonebook is the hit; your code will need to track which entry or entries are hits so you can return a meaningful set of results.
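To track which entries are hits, one possible sketch is shown below. It assumes the phonebook has already been loaded into a std::vector of strings, one entry per element, which the question does not confirm; the function name find_entries is likewise hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the indices of all entries containing needle.
std::vector<std::size_t> find_entries(const std::vector<std::string> &phonebook,
                                      const std::string &needle) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < phonebook.size(); ++i) {
        // find returns std::string::npos when needle does not occur.
        if (phonebook[i].find(needle) != std::string::npos) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

The search is case-sensitive, so "frank" would not match "Sinatra, Frank" unless both strings are normalized first.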

You can use strstr() to locate one substring within a string.

If it's a std::string, you can call the .find() method on the "Sinatra, Frank" string.

input, output and \n's

I'm trying to solve a problem that asks me to look for palindromes in strings. It seems like I've got everything right, but the problem is with the output.
Here's the original output and my output:
http://pastebin.com/c6Gh8kB9
Here's what the problem says about its input and output:
Input format:
A file with no more than 20,000 characters. The file has one or more lines. No line is longer than 80 characters (not counting the newline at the end).
Output format:
The first line of the output should be the length of the longest palindrome found. The next line or lines should be the actual text of the palindrome (without any surrounding white space or punctuation but with all other characters) printed on a line (or more than one line if newlines are included in the palindromic text). If there are multiple palindromes of longest length, output the one that appears first.
Here's how I read the input:
string test;
string original;
while (getline(fin,test))
    original += test;
And here's how I output it:
int len = answer.length();
answer = cleanUp(answer);
while (len > 0){
    string s3 = answer.substr(0,80);
    answer.erase(0,80);
    fout << s3 << endl;
    len -= 80;
}
cleanUp() is a function to remove the illegal characters from the beginning and the end. I'm guessing that the problem is with \n's and the way I read the input. How can I fix this?

No line is longer than 80 characters (not counting the newline at the end)
does not imply that every line is 80 characters except for the last, while your output code does assume this by taking 80 characters off answer in every iteration.
You may want to keep the newlines in the string until the output phase. Alternatively, you might store newline positions in a separate std::vector. The first option complicates your palindrome search routine; the second your output code.
(If I were you, I'd also index into answer instead of taking chunks off with substr/erase; your output code is now O(n^2) while it could be O(n).)

After rereading, it appears that I misunderstood the question. I was thinking in terms of each line representing a single word, and the intent is to test whether that "word" is palindromic.
After rereading, I think the question is really more like: "Given a sequence of up to 20,000 characters, find the longest palindromic sub-sequence. Oh, incidentally, the input is broken up into lines of no more than 80 characters."
If that's correct, I'd ignore the line-length completely. I'd read the entire file into a single buffer, then search for palindromes in that buffer.
To find the palindromes, I'd simply walk through each position in the array, and find the longest possible palindrome with that as its center point:
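for (int i = 0; i < total_chars; i++) {
    int n = 1;
    // Grow outward while both ends stay in range and the characters match.
    while (n <= i && i + n < total_chars && array[i + n] == array[i - n])
        n++;
    // Candidate palindrome (odd length) is from array[i-n+1] to array[i+n-1]
}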
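A fuller sketch of the centre-expansion idea follows. It is an illustration under stated assumptions rather than a complete solution to the contest problem: the function name longest_palindrome is made up, characters are compared exactly as they appear (no skipping of punctuation or case folding), and even-length palindromes are handled by also trying a centre between two characters.

```cpp
#include <string>

// Hypothetical helper: returns the first longest palindromic substring of s.
std::string longest_palindrome(const std::string &s) {
    const int n = static_cast<int>(s.size());
    int best_start = 0, best_len = 0;
    for (int centre = 0; centre < n; ++centre) {
        // width 0: odd length, centred on one character;
        // width 1: even length, centred between two characters.
        for (int width = 0; width <= 1; ++width) {
            int lo = centre, hi = centre + width;
            while (lo >= 0 && hi < n && s[lo] == s[hi]) {
                --lo;
                ++hi;
            }
            // The loop overshoots by one on each side.
            const int len = hi - lo - 1;
            if (len > best_len) {  // strict '>' keeps the earliest match
                best_len = len;
                best_start = lo + 1;
            }
        }
    }
    return s.substr(best_start, best_len);
}
```

This runs in O(n^2) time in the worst case, which is manageable for the stated limit of 20,000 characters.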
