Counting number of words in char array C++

I am working on an algorithm that will count the number of words in a char array. So far it does not work the way it should. When a character is reached and it is not whitespace, it should be considered part of a word. Once we reach whitespace, we are no longer in a word. For example, "Hello World" is two words because of the space between "Hello" and "World".
Code:
for(int l = 0; l < count; l++){
if(isalpha(letters[l]) && !in_word){
num_words++;
in_word = true;
}else{
in_word = false;
}
}
sample input:
aaaaa bbb aaa lla bub www
sample output:
13 words
desired output: 6 words
Possible answer:
for(int l = 0; l < count; l++){
if(isalpha(letters[l]) && !in_word){
num_words++;
in_word = true;
}else if(!isalpha(letters[l])){
in_word = false;
}
}

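Step through that code (in a debugger, in your head, or on paper).
Given the input "abc def"
Assuming in_word = false initially
The first character is 'a', in_word is false, so num_words++, in_word=true
The next character is 'b', in_word is true, so the else branch runs and in_word=false
The third character is 'c', in_word is false again, so num_words++ runs a second time for the same word
Hopefully you can see what is wrong: the else branch resets in_word on every letter after the first, not only on whitespace.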
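To make the walkthrough concrete, here is a minimal sketch of the same state machine with the reset tied to the character class instead of to the else branch. The function name count_words and its signature are made up for illustration; they are not from the question.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: counts runs of alphabetic characters in the
// first `count` characters of `letters`.
int count_words(const char* letters, std::size_t count)
{
    int num_words = 0;
    bool in_word = false;
    for (std::size_t l = 0; l < count; ++l) {
        // Cast to unsigned char: passing a negative char value to
        // std::isalpha is undefined behaviour.
        const bool is_letter =
            std::isalpha(static_cast<unsigned char>(letters[l])) != 0;
        if (is_letter && !in_word) {
            ++num_words;       // first letter of a new word
        }
        in_word = is_letter;   // we stay "in a word" only while letters continue
    }
    return num_words;
}
```

Called with the sample input from the question and its length (25), this should return 6 rather than 13.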

An easy way to do this: trim the string, count the spaces, and add 1. This assumes the words are separated by single spaces.
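A rough sketch of that trim-and-count idea, assuming the text is held in a std::string and that words are separated by exactly one space. The name count_words_by_spaces is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: trims leading/trailing spaces, counts the spaces
// that remain, and adds 1. Assumes exactly one space between words.
int count_words_by_spaces(const std::string& text)
{
    const std::size_t first = text.find_first_not_of(' ');
    if (first == std::string::npos) {
        return 0;  // empty string, or nothing but spaces
    }
    const std::size_t last = text.find_last_not_of(' ');

    // Count the spaces inside the trimmed range [first, last].
    const auto begin = text.begin() + static_cast<std::ptrdiff_t>(first);
    const auto end = text.begin() + static_cast<std::ptrdiff_t>(last) + 1;
    return static_cast<int>(std::count(begin, end, ' ')) + 1;
}
```

The shortcut breaks as soon as two spaces appear in a row ("a  b" is reported as 3 words), which is why the in_word flag approach is the safer one for arbitrary input.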

If you want nice handling of newlines, spaces, punctuation, etc., you could use a regex. You may even be able to adapt this to work correctly with UTF-8 strings too. However, it requires C++11 support.
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this subject has a submarine as a subsequence");
std::smatch m;
std::regex e ("\\b(\\w+)\\b"); // the backslash in \w must be escaped too; + avoids matching an empty word
int count = 0;
while (std::regex_search (s,m,e)) {
++count;
s = m.suffix().str();
}
std::cout<<"Number of matches = "<<count<<std::endl;
return 0;
}


Checking whether the first character of all the strings in an array of strings is the same

I have an array of strings, and I want to check whether the first characters of all the strings are the same or not.
I know how to retrieve the first character of a string, like this:
char first_letter;
first_letter = (*str)[0];
Initially, I thought of going the brute-force way, checking the first letter of every string with a nested for loop.
int flag = 0;
char f1,f2;
for(int i = 0;i < size_arr - 1;i++){
f1 = (*str[i])[0];
for(int j = i + 1;j < size_arr;j++){
f2 = (*str[j])[0];
if(f1 != f2)
flag += 1;
}
}
if(!(flag))
cout<<"All first characters same";
else
cout<<"Different";
But I need an approach to find whether the first letters of all the strings present in an array are the same or not. Is there any efficient way?
You don't need a nested for loop. Compare each string's first character with the next string's instead:
for(int i = 0; i < size_arr - 1; i++){
    f1 = (*str[i])[0];
    f2 = (*str[i+1])[0];
    if(f1 != f2){
        printf("not same characters at first position");
        flag = 1; // set the flag before leaving the loop
        break;
    }
}
if(flag == 0) printf("same characters at first position");
Here is a C approach (you are using character arrays here rather than C++'s std::string, so it is convenient to describe it in C):
#include <stdio.h>
#define MAX_LENGTH 128
int main(void) {
char string[][MAX_LENGTH] = {"This is string ONE.", "This one is TWO.",
"This is the third one."};
char first_letter = string[0][0];
int total_strs = sizeof(string) / sizeof(string[0]);
int FLAG = 1;
// Compare the first letter of each string with first_letter
for (int i = 0; i < total_strs; i++)
if (string[i][0] != first_letter) {
FLAG = 0; // a mismatch was found, so clear the flag
break; // and stop looking
}
if (FLAG)
fprintf(stdout, "The strings have the same initial letters.\n");
else
fprintf(stdout, "Not all strings have the same initial letters.\n");
return 0;
}
If you want to convert it to C++, that is no big issue: replace stdio.h with iostream, int FLAG = 1 with bool FLAG = true, and the fprintf() calls with std::cout statements.
If you need to work with std::string instead, take the array of strings, set the flag to true by default, and iterate through the strings, comparing each one's initial letter with the first string's. Set the flag to false as soon as a mismatching string is found.
The program will display (if same initial vs. if not):
The strings have the same initial letters.
Not all strings have the same initial letters.
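For the std::string variant described above, a minimal sketch could look like the following. It assumes the strings live in a std::vector<std::string>, and it treats an empty string as a mismatch and an empty array as trivially "all the same". Both of those are choices made here, not something the question specifies. The name same_first_letter is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if every string starts with the same character.
bool same_first_letter(const std::vector<std::string>& strs)
{
    if (strs.empty()) {
        return true;   // assumption: nothing to disagree with
    }
    if (strs.front().empty()) {
        return false;  // assumption: an empty string has no first letter
    }
    const char first = strs.front()[0];
    // std::all_of stops at the first mismatch, like the break in the C version.
    return std::all_of(strs.begin(), strs.end(),
                       [first](const std::string& s) {
                           return !s.empty() && s[0] == first;
                       });
}
```

This is a single pass over the array, so it does the same amount of work as the flag-and-break loop.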

C++ Validating Comma Placement in Numerical Input

I'm taking my first cs course and I'm currently learning the different ways to validate numerical inputs. Below is a bool function I wrote to check comma positions, but when I enter 65,000 it thinks the comma is in the wrong place.
bool commas(string input) {
bool commas = true;
long len = input.length();
int counter = 0;
for(int z = len-1; z >= 0; --z) {
if(counter == 3) {
if(input[z] != ',') {
commas = false;
counter = 0;
}
}
else {
if(input[z] == ',') {
commas = false;
}
else {
++counter;
}
}
}
return commas;
}
The easiest way to figure out if the comma is in the correct position is to go backwards (starting from the rightmost character) within the string.
The reason this is easier is that if you start from the leftmost character and go forward, when you encounter a comma, you don't yet know whether that comma is in a valid position. You will only know later on in the iteration.
On the other hand, when you go from right to left, you can tell as soon as you encounter a comma whether it is in a valid position; there is no need to wait until you've gone further in the string.
To do this, the loop has to run backwards, with an additional counter to track the current group of 3 digits, since a comma can only occur after a group of 3 digits has been processed. Your loop already does that, but counter = 0 sits inside the if (input[z] != ',') block, so the counter is never reset after a correctly placed comma, and the next digit is then rejected for not being a comma.
This is an untested example (except for the simple tests in main):
#include <string>
#include <cctype>
#include <iostream>
bool commas(std::string input)
{
// if the string is empty, return false
if (input.empty())
return false;
// this keeps count of the digit grouping
int counter = 0;
// loop backwards
for (int z = static_cast<int>(input.size()) - 1; z >= 0; --z)
{
// check if the current character is a comma, and if it is in
// position where commas are supposed to be
if (counter == 3)
{
if (input[z] != ',')
return false;
// reset counter to 0 and process next three digits
counter = 0;
}
else
{
// this must be a digit, else error (a comma here is rejected too)
if (!std::isdigit(static_cast<unsigned char>(input[z])))
return false;
// go to next digit in group of 3
++counter;
}
}
// the first character must be a digit.
return isdigit(static_cast<unsigned char>(input[0]));
}
int main()
{
std::string tests[] = { "123,,345",
"123,345",
"123,345,678",
"12,345",
"1223,345",
",123,345",
"1",
"",
"65,00",
"123"};
const int nTests = sizeof(tests) / sizeof(tests[0]);
for (int i = 0; i < nTests; ++i)
std::cout << tests[i] << ": " << (commas(tests[i]) ? "good" : "no good") << "\n";
}
Output:
123,,345: no good
123,345: good
123,345,678: good
12,345: good
1223,345: no good
,123,345: no good
1: good
: no good
65,00: no good
123: good
The way this works is simple -- we just increment a counter and see if the current position we're looking at (position z) in the string is a position where a comma must exist.
The count simply counts each group of 3 digits -- when that group of 3 digits has been processed, then the next character (when going backwards) must be a comma, otherwise the string is invalid. We also check if the current position is where a comma cannot be placed.
Note that at the end, we need to check for invalid input like this:
,123
This is simply done by inspecting the first character in the string, and ensuring it is a digit.
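As a cross-check for the loop, the same rule ("one to three leading digits, then any number of groups consisting of a comma and exactly three digits") can be written as a regular expression. This is a different technique from the backwards loop above, shown only as a sketch. The name commas_regex is made up, and std::regex needs C++11.

```cpp
#include <regex>
#include <string>

// Hypothetical alternative to the loop: validates comma placement with a regex.
bool commas_regex(const std::string& input)
{
    // \d{1,3}    one to three leading digits
    // (,\d{3})*  zero or more groups of a comma followed by three digits
    static const std::regex pattern(R"(\d{1,3}(,\d{3})*)");
    // regex_match requires the whole string to match, not just a part of it.
    return std::regex_match(input, pattern);
}
```

Running it over the same test strings should reproduce the good / no good column above, which makes it handy for testing the hand-written loop against.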

Uppercase characters of a String

The following code is supposed to make the first character uppercase, as well as any other occurrences of that same character.
For example, if the input is "complication", the output should be "CompliCation". But the output is "Complication" instead.
#include <cctype>
#include <iostream>
#include <string>
int main()
{
std::string word;
std::cout << "Write a word: ";
std::cin >> word;
for (int i = 0; i < word.length(); i++)
{
if (word[0] == word[i])
word[i] = std::toupper(word[i]);
}
std::cout << word<< '\n';
}
What is wrong with my code?
By the time you compare the second c, the first c has already been converted to C. Hence, word[0] == word[i] is false.
Store the first character first and then compare it with the characters of the string.
char c = word[0];
for (std::size_t i = 0; i < word.length(); i++)
{
    if (c == word[i])
        word[i] = std::toupper(static_cast<unsigned char>(word[i]));
}
You can even pre-compute the uppercase character and use it in the loop.
char c = word[0];
char upperC = std::toupper(static_cast<unsigned char>(c));
for (std::size_t i = 0; i < word.length(); i++)
{
    if (c == word[i])
        word[i] = upperC;
}
Because you make the first character uppercase, the 7th character no longer matches it.
Modify your loop like this instead.
char c = word[0];
for (std::size_t i = 0; i < word.length(); i++)
{
    if (word[i] == c)
        word[i] = std::toupper(static_cast<unsigned char>(word[i]));
}
After the first iteration word[0] == 'C', so when you encounter another occurrence of this letter, the test is if ('C' == 'c'), which evaluates to false.
You should first capitalize the characters that are the same as the first character (start your loop at i=1), then capitalize the first character after your for loop.
Because once you apply toupper to the very first character, it becomes uppercase. When it is then compared with the same character in lowercase, the comparison returns false, because 'C' is not the same as 'c'.
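Putting the answers together, a self-contained sketch might look like this. The function name capitalize_first_and_matches is hypothetical, and returning the empty string unchanged is a choice made here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: uppercases the first character and every other
// occurrence of that same character.
std::string capitalize_first_and_matches(std::string word)
{
    if (word.empty()) {
        return word;  // nothing to do
    }
    // Remember the original first character BEFORE modifying the string,
    // so later comparisons are not made against the uppercased copy.
    const char first = word[0];
    const char upper = static_cast<char>(
        std::toupper(static_cast<unsigned char>(first)));

    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] == first) {
            word[i] = upper;
        }
    }
    return word;
}
```

With the input "complication" this should produce "CompliCation".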

Finding anagrams in a word list

I have a word list and a file containing a number of anagrams. These anagrams are words found in the word list. I need to develop an algorithm to find the matching words and write them to an output file. The code I have developed so far has only worked for the first two words. In addition, I can't get the code to play nice with strings that contain numbers anywhere in them. Please tell me how I can fix the code.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main (void)
{
int x = 0, y = 0;
int a = 0, b = 0;
int emptyx, emptyy;
int match = 0;
ifstream f1, f2;
ofstream f3;
string line, line1[1500], line2[50];
size_t found;
f1.open ("wordlist.txt");
f2.open ("file.txt");
f3.open ("output.txt");
while (f1.eof() == 0)
{
getline (f1, line);
line1[x] = line;
x++;
}
while (f2.eof() == 0)
{
getline (f2, line);
line2[y] = line;
y++;
}
//finds position of last elements
emptyx = x-1;
emptyy = y-1;
//matching algorithm
for (y = 0; y <= emptyy; y++)
{
for (x = 0; x <= emptyx; x++)
{
if (line2[y].length() == line1[x].length())
{
for (a = 0; a < line1[x].length(); a++)
{
found = line2[y].find(line1[x][a]);
if (found != string::npos)
{
match++;
line2[y].replace(found, 1, 1, '.');
if (match == line1[x].length())
{
f3 << line1[x] << ", ";
match = 0;
}
}
}
}
}
}
f1.close();
f2.close();
f3.close();
return 0;
}
Step 1: Build an index whose key is the sorted characters of each word in the word list and whose value is the word itself.
act - cat
act - act
dgo - dog
...
aeeilnppp - pineapple
....
etc...
Step 2: For each anagram you want to find, sort the characters in your anagram word, and then match against the index to retrieve all words from index with matching sorted key.
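The two steps above can be sketched roughly as follows, assuming the word list has already been read into a std::vector<std::string>. The names AnagramIndex, sorted_key, build_index and find_anagrams are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Key: the word's characters in sorted order. Value: all words with that key.
using AnagramIndex = std::unordered_map<std::string, std::vector<std::string>>;

// Takes the word by value so the caller's copy is left untouched.
std::string sorted_key(std::string word)
{
    std::sort(word.begin(), word.end());
    return word;
}

// Step 1: index every word in the word list under its sorted key.
AnagramIndex build_index(const std::vector<std::string>& word_list)
{
    AnagramIndex index;
    for (const std::string& word : word_list) {
        index[sorted_key(word)].push_back(word);
    }
    return index;
}

// Step 2: sort the scrambled word and look it up; an unknown key gives no matches.
std::vector<std::string> find_anagrams(const AnagramIndex& index,
                                       const std::string& scrambled)
{
    const auto it = index.find(sorted_key(scrambled));
    return it == index.end() ? std::vector<std::string>{} : it->second;
}
```

Because std::sort orders digits like any other character, strings containing numbers need no special handling, and the lookup never modifies the stored words.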
Trying to improve Mitch Wheat's solution:
Storing both the sorted key and the word is not really necessary; store only the sorted string for every word in the list.
Anyway, when we read a word from the file we have to sort it to check whether it equals a sorted string, and the index is keyed on the sorted string, so the index does not save us that sort.
Build a 'position independent' hash of the words in the word list, and also store the sorted string in the hash table.
For every word in the file, compute the 'position independent' hash and look it up in the hash table.
If there is a hit, sort the word and compare it to every sorted string stored at that position in the hash table (collisions!).
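One way to read the 'position independent' hash idea is sketched below: add up a scrambled value per character, so any permutation of the same letters gives the same sum, and only sort when the hash hits. The mixing constant, the names letter_hash, HashTable, build_table and has_anagram, and the use of std::unordered_multimap are all assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Order-independent hash: addition is commutative, so "cat" and "tac" agree.
std::uint64_t letter_hash(const std::string& word)
{
    std::uint64_t h = 0;
    for (unsigned char c : word) {
        std::uint64_t x = static_cast<std::uint64_t>(c) + 1;
        x *= 0x9E3779B97F4A7C15ULL;  // arbitrary odd constant (assumption)
        x ^= x >> 32;                // mix, so the sum is not just a multiple of the code sum
        h += x;                      // unsigned overflow wraps, which is fine here
    }
    return h;
}

// Maps hash -> sorted string; several entries may share a hash (collisions).
using HashTable = std::unordered_multimap<std::uint64_t, std::string>;

HashTable build_table(const std::vector<std::string>& word_list)
{
    HashTable table;
    for (const std::string& word : word_list) {
        std::string key = word;
        std::sort(key.begin(), key.end());
        table.emplace(letter_hash(word), key);
    }
    return table;
}

// True if some word in the table is an anagram of `scrambled`.
bool has_anagram(const HashTable& table, const std::string& scrambled)
{
    const auto range = table.equal_range(letter_hash(scrambled));
    if (range.first == range.second) {
        return false;  // hash miss: no sort needed
    }
    std::string key = scrambled;
    std::sort(key.begin(), key.end());
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second == key) {
            return true;  // confirmed, not just a hash collision
        }
    }
    return false;
}
```

Since only the sorted strings are stored, this variant can say that a match exists but cannot return the original word. Whether skipping the sort on a miss is worth the extra hashing would need measuring.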
Thoughts?

To remove garbage characters from a string using regex

I want to remove all characters from a string other than a-z and A-Z. I created the following function for this and it works fine.
public String stripGarbage(String s) {
String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
String result = "";
for (int i = 0; i < s.length(); i++) {
if (good.indexOf(s.charAt(i)) >= 0) {
result += s.charAt(i);
}
}
return result;
}
Can anyone tell me a better way to achieve the same? A regex may be a better option.
Here you go:
String result = s.replaceAll("[^a-zA-Z0-9]", "");
But if you understand your code and it's readable, then maybe you already have the best solution:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
The following should be faster than anything using a regex, and faster than your initial attempt.
public String stripGarbage(String s) {
StringBuilder sb = new StringBuilder(s.length());
for (int i = 0; i < s.length(); i++) {
char ch = s.charAt(i);
if ((ch >= 'A' && ch <= 'Z') ||
(ch >= 'a' && ch <= 'z') ||
(ch >= '0' && ch <= '9')) {
sb.append(ch);
}
}
return sb.toString();
}
Key points:
It is significantly faster to use a StringBuilder than string concatenation in a loop. (The latter generates N - 1 garbage strings and copies N * (N + 1) / 2 characters to build a String containing N characters.)
If you have a good estimate of the length of the result String, it is a good idea to preallocate the StringBuilder to hold that number of characters. (But if you don't have a good estimate, the cost of the internal reallocations etc. amortizes to O(N) where N is the final string length, so this is not normally a major concern.)
Testing a character against (up to) 3 character ranges will be significantly faster on average than searching for a character in a 62-character String.
A switch statement might be faster especially if there are more character ranges. However, in this case it will take many more lines of code to list the cases for all of the letters and digits.
If the non-garbage characters match existing predicates of the Character class (e.g. Character.isLetter(char) etc) you could use those. This would be a good option if you wanted to match any letter or digit ... rather than just ASCII letters and digits.
Other alternatives to consider are using a HashSet<Character> or a boolean[] indexed by character that were pre-populated with the non-garbage characters. These approaches work well if the set of non-garbage characters is not known at compile time.
This regex works (note that this is JavaScript syntax, not Java):
result=s.replace(/[^A-Z0-9a-z]/ig,'');
Here s is the string passed to your function, and result is the string containing only letters and digits.
I know this post is old, but you can shorten Stephen C's answer a little by using the System.Char structure.
public String RemoveNonAlphaNumeric(String value)
{
StringBuilder sb = new StringBuilder(value.Length); // reserve capacity only; passing value itself would pre-fill the builder
for (int i = 0; i < value.Length; i++)
{
char ch = value[i];
if (Char.IsLetterOrDigit(ch))
{
sb.Append(ch);
}
}
return sb.ToString();
}
Still accomplishes the same thing in a more compact fashion.
The Char structure has some really useful functions for checking text. Here are some for your future reference.
Char.GetNumericValue()
Char.IsControl()
Char.IsDigit()
Char.IsLetter()
Char.IsLower()
Char.IsNumber()
Char.IsPunctuation()
Char.IsSeparator()
Char.IsSymbol()
Char.IsWhiteSpace()
this works:
public static String removeGarbage(String s) {
String r = "";
for ( int i = 0; i < s.length(); i++ )
if ( s.substring(i,i+1).matches("[A-Za-z]") ) // [A-Za-z0-9] if you want include numbers
r = r.concat(s.substring(i, i+1));
return r;
}
(edit: although it's not so efficient)
/**
* Remove characters from a string other than ASCII
*
* */
private static StringBuffer goodBuffer = new StringBuffer();
// Static initializer for ASCII
static {
for (int c=1; c<128; c++) {
goodBuffer.append((char)c);
}
}
public String stripGarbage(String s) {
//String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
String good = goodBuffer.toString();
String result = "";
for (int i = 0; i < s.length(); i++) {
if (good.indexOf(s.charAt(i)) >= 0) {
result += s.charAt(i);
}
else
result += " ";
}
return result;
}