C++: check whether a word is spelled correctly

I'm looking for an easy way to check whether a certain string is a correctly-spelled English word. For example, 'looked' would return True while 'hurrr' would return False. I don't need spelling suggestions or any spelling-correcting features. Just a simple function that takes a string and returns a boolean value.
I could do this easily with Python using PyEnchant, but it seems you have to compile the library yourself if you want to use it with MS Visual C++.

PyEnchant is based on Enchant, which is a C library providing C and C++ interfaces. So you can just use that for C++. The minimal example will be something like this:
#include <cstddef>
#include <cstdio>
#include <memory>
#include "enchant.h"
#include "enchant++.h"
int main ()
{
    try
    {
        enchant::Broker *broker = enchant::Broker::instance ();
        // unique_ptr replaces auto_ptr, which was removed in C++17
        std::unique_ptr<enchant::Dict> dict (broker->request_dict ("en_US"));
        const char *check_checks[] = { "hello", "helllo" };
        for (std::size_t i = 0; i < (sizeof (check_checks) / sizeof (check_checks[0])); ++i)
        {
            // check () returns true for a correctly spelled word
            printf ("%s: %s\n", check_checks[i],
                    dict->check (check_checks[i]) ? "correct" : "misspelled");
        }
    } catch (const enchant::Exception &) {
        return 1;
    }
}
For more examples/tests, see their SVN repository.

If you want to implement such function on your own, you'll need a database to query in order to find out whether a given word is valid (usually a plain text file is enough, like /usr/share/dict/words on Linux).
Otherwise you could rely upon a third party spellcheck library that does just that.

You could take one of the GNU dictionaries out there (like /usr/share/dict/words as mentioned) and build it into an appropriate data structure that'll give you fast lookup and membership checking depending on your performance needs, something like a directed acyclic word graph or even just a trie might be sufficient.
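A minimal sketch of the trie idea, assuming the words are plain byte strings read from a word list. The names Trie, insert and contains are made up for illustration and do not come from any library:

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal trie for word membership tests.
class Trie {
public:
    // Add a word, creating child nodes as needed.
    void insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->is_word = true;
    }

    // True only if exactly this word was inserted; a mere prefix does not count.
    bool contains(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return node->is_word;
    }

private:
    struct Node {
        bool is_word = false;
        std::map<char, std::unique_ptr<Node>> children;
    };
    Node root_;
};
```

Lookup cost depends on the length of the word rather than on the size of the dictionary. A DAWG additionally shares common suffixes, which this sketch does not attempt.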

You'd need a word list, for starters. (/usr/share/dict/words maybe?)
You should read your word list into a std::set. Then a correct-spelling test consists simply of checking all the user input words for whether or not they are in the set.

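#include <iostream>
#include <string>
bool spell_check(std::string const& str)
{
    std::cout << "Is '" << str << "' spelled correctly? ";
    std::string input;
    std::getline(std::cin, input); // getline needs the stream to read from
    return !input.empty() && (input[0] == 'y' || input[0] == 'Y');
}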

Related

Having issues with a basic string operation [c++, visual studio 2013]

[RESOLVED]
I'm reading in lines from a .txt file that stores scrambled-up radio messages, not that it matters. One of the lines is called "it", and its opening characters are digits followed by a '/'. I'm trying to copy these leading digits into a string (called "s1") so it can later be used as a single integer. Here's the code that is supposed to do that:
for (i = 0; i < it.find('/'); i++)
{
s1[i] = it[i];
}
cout << s1;
But I get a "string subscript out of range" error message. What did I screw up?
EDIT
Issue is resolved now, thank you for helping out an absolute newbie :D My mistake was not knowing how strings work, for an actual answer from someone who understands the issue find Ben Voigt's replies. Correct code is: s1+= it[i];
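You most likely attempted to assign outside of the allocated memory for s1: if s1 is empty, s1[i] refers to a position that does not exist, because operator[] never grows the string.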
Assuming it and s1 are std::string:
for (auto c : it)
{
if (c == '/')
{
break;
}
s1 += c;
}
cout << s1;
I'm assuming the line you originally have is of the type std::string.
The error message "subscript out of range" basically tells you that you are trying to access an invalid position of an array or, in your case, a string. This commonly happens when the number between the brackets [ ] is outside the range of the container.
One way to avoid this type of bug is to use iterators. With iterators, you can traverse the entire container without manually calling operator[].
Consider the following code as a simple illustration. It copies the part before the / from the original string into a new variable:
#include <iostream>
#include <string>
int main() {
std::string s = "42/fortytwo";
std::string result = "";
for (std::string::iterator it=s.begin(); it!=s.end(); ++it) {
if (*it == '/') break;
result += *it;
}
std::cout << result;
return 0;
}
Note:
To be traversed this way, a container must provide begin() and end(), and its iterator type must correctly implement unary *, != and ++.
Since the loop does not modify the string, a const_iterator could be used instead.
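If the goal is only to get the digits before the '/' and later turn them into an integer, the explicit loop can be avoided. A minimal sketch, assuming the line is a std::string; prefix_before_slash is a hypothetical helper name:

```cpp
#include <string>

// Return the text before the first '/', or the whole string if there is none.
// find() returns npos when nothing is found, and substr(0, npos) copies everything.
std::string prefix_before_slash(const std::string& line) {
    return line.substr(0, line.find('/'));
}
```

The result can then be passed to std::stoi, which throws std::invalid_argument if the text does not start with a number, so it is worth checking the input or catching the exception.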

How to implement diacritics in c++?

I need help getting words from a .txt file that also contains diacritics. (So there are words containing ěščř etc. Btw those are Czech diacritics, if that helps.)
My function finds the words I type, but it won't find words typed in the console that contain diacritics.
I think I have to set something in my Microsoft Visual C++ 2010, but I'm not sure what and where. In case I'm wrong, here's the function.
bool find(char typedword[50])
{
bool found = false;
char * word = new char [50];
fstream dictionary;
dictionary.open("Dictionary.txt", ios::in);
while (dictionary >> word)
{
if (strcmp(typedword, word) == 0)
{
found = true;
break;
}
}
dictionary.close();
if (found == true)
return true;
else
return false;
}
Thank you for all your help!
You need locale support, so that sequences of combining characters and the composite equivalent compare equal.
The portable way is to call setlocale and use strcoll instead of strcmp.
The Windows way is to use CompareStringEx (which automatically uses OS locale settings) instead of strcmp. NormalizeString may also be helpful.
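A minimal sketch of the portable route, assuming the file and the console use the same encoding. The helper names use_environment_locale and same_word are hypothetical, and whether composed and decomposed forms actually collate as equal depends on the platform's locale data:

```cpp
#include <clocale>
#include <cstring>

// Switch from the default "C" locale to the user's environment locale.
// Returns false if the request could not be honored.
bool use_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// True if the two words collate as equal in the current locale.
bool same_word(const char* a, const char* b) {
    return std::strcoll(a, b) == 0;
}
```

Call use_environment_locale() once at startup, then replace the strcmp test in the loop with same_word(typedword, word). Without the setlocale call the program stays in the "C" locale, where strcoll behaves like strcmp.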

How to set the argv[ ] to be case-insensitive in a Win32 Console Application?

Solved below.
Original problem:
How can I get comparisons against argv[] to be case-insensitive? Here is a code fragment:
if (std::string(argv[2]) == "HKCU") //Si escriben HKCU
{
cout << "Has escrito HKCU" << endl;
}
else //Si no escriben la clave
{
cout << "Debes especificar HCKU o HKLM" << endl;
}
If I pass the parameter "hkcu" the test does not work, I have to type "HKCU". If I compare for either "HKCU" or "hkcu" in the program either string will work.
EDIT: I had to use _stricmp (Using VS2013) this way:
if (_stricmp(argv[2], "HKCU") == 0)
You need to rethink your question. You don't want the string to be case-insensitive; rather, you want your comparison to recognize that HKCU is the same as Hkcu or hKcU.
To this end, there are a number of options, one of which is the already mentioned function stricmp (spelled _stricmp in current Visual C++).
Prototype is:
#include <string.h>
int stricmp(const char *string1, const char *string2);
Meaning, you'd use it like:
if(stricmp(argv[2], "HKCU") == 0) {
}
Another option is the POSIX strcasecmp function (declared in <strings.h>), which operates similarly.
Hope this helps.
If boost is an option then you can use iequals
if (boost::iequals(std::string(argv[2]), "HKCU")) {
...
}
Another option is to just use strcasecmp
if (0 == strcasecmp(argv[2], "HKCU")) {
...
}
argv[] in a C/C++ program is just an array of strings as passed in by the shell. You have to change the comparison in your program to be case-insensitive.
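If neither _stricmp, strcasecmp nor Boost is available, the comparison can be written with the standard library alone. A minimal sketch, assuming ASCII input such as registry hive names; iequals_ascii is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive equality for ASCII text.
// The lambda takes unsigned char because std::tolower must not be given
// a negative value other than EOF.
bool iequals_ascii(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```

In the original program this would read if (iequals_ascii(argv[2], "HKCU")). Check argc first, since argv[2] only exists when at least two arguments were passed.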

check if a pointer has some string c++

I am not good with C++ and I cannot find this anywhere, so please excuse me if it is a bad question. I have a pointer and I want to know whether a name stored in it begins with a specific string. In Python it would be something like this (maybe it is a bad example):
if 'Pre' in pointer_name:
This is what I have:
double t = 0;
for (size_t i = 0; i < modules_.size(); ++i){
    if(modules_[i].name() == "pre"){ // here is where I want to introduce the condition
        if (modules_[i].status() == 2){
            std::cout << modules_[i].name() << "exists" << std::endl;
        }
    }
}
The equivalent of Python 'Pre' in string_name is:
string_name.find("Pre") != std::string::npos // if using string
std::strstr(pointer_name, "Pre") // if using char*
The equivalent of Python string_name.startswith('Pre') ("begins with some specific string") is:
string_name.size() >= 3 && std::equal(string_name.begin(), string_name.begin() + 3, "Pre"); // if using string
string_name.find("Pre") == 0 // less efficient when it misses, but shorter
std::strncmp(pointer_name, "Pre", 3) == 0 // if using char*
In two of those cases, in practice, you might want to avoid using a literal 3 by measuring the string you're searching for.
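One way to avoid the literal 3 is to wrap the checks in small helpers that measure the prefix themselves. A minimal sketch, assuming std::string arguments; starts_with and contains are hypothetical names chosen for illustration:

```cpp
#include <string>

// Equivalent of Python's text.startswith(prefix).
// compare() clips the range to the end of text, so a text shorter than
// the prefix simply compares unequal.
bool starts_with(const std::string& text, const std::string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Equivalent of Python's `needle in text`.
bool contains(const std::string& text, const std::string& needle) {
    return text.find(needle) != std::string::npos;
}
```

Unlike find("Pre") == 0, the compare form stops after looking at the first few characters instead of scanning the whole string when the prefix is absent.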
Check std::string::find; there are plenty of good examples. If you are using a C-style string, use strstr.
You can use the <algorithm> header to do most of the things that are usually one-liners in Python.
In this case, though, it might be easier to just use the string find method.
If your name variable is of type std::string, then you can use name().compare("Pre") == 0 to test for equality.
EDIT: Seems I misunderstood the question; for "contains" you can use string find, as others said.
Using C-style strings (char *) is not recommended in C++; they are error-prone.

Read file and extract certain part only

ifstream toOpen;
openFile.open("sample.html", ios::in);
if(toOpen.is_open()){
while(!toOpen.eof()){
getline(toOpen,line);
if(line.find("href=") && !line.find(".pdf")){
start_pos = line.find("href");
tempString = line.substr(start_pos+1); // i dont want the quote
stop_pos = tempString .find("\"");
string testResult = tempString .substr(start_pos, stop_pos);
cout << testResult << endl;
}
}
toOpen.close();
}
What I am trying to do is to extract the "href" value. But I cant get it works.
EDIT:
Thanks to Tony's hint, I now use this:
if(line.find("href=") != std::string::npos ){
// Process
}
it works!!
I'd advise against trying to parse HTML like this. Unless you know a lot about the source and are quite certain about how it'll be formatted, chances are that anything you do will have problems. HTML is an ugly language with an (almost) self-contradictory specification that (for example) says particular things are not allowed -- but then goes on to tell you how you're required to interpret them anyway.
Worse, almost any character can (at least potentially) be encoded in any of at least three or four different ways, so unless you scan for (and carry out) the right conversions (in the right order) first, you can end up missing legitimate links and/or including "phantom" links.
You might want to look at the answers to this previous question for suggestions about an HTML parser to use.
As a start, you might want to take some shortcuts in the way you write the loop over lines in order to make it clearer. Here is the conventional "read line at a time" loop using C++ iostreams:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
int main ( int, char ** )
{
std::ifstream file("sample.html");
if ( !file.is_open() ) {
std::cerr << "Failed to open file." << std::endl;
return (EXIT_FAILURE);
}
for ( std::string line; (std::getline(file,line)); )
{
// process line.
}
}
As for the inner part that processes the line, there are several problems.
It doesn't compile. I suppose this is what you meant with "I cant get it works". When asking a question, this is the kind of information you might want to provide in order to get good help.
There is confusion between variable names such as temp and tempString.
string::find() returns std::string::npos, a large positive integer (size_type is unsigned), when no match is found, so a find() result used directly as a condition is always true unless a match is found starting at character position 0, which is the one case where you probably do want to enter the if block.
Here is a simple test content for sample.html.
<html>
<a href="foo.pdf"/>
</html>
Sticking the following inside the loop:
if ((line.find("href=") != std::string::npos) &&
(line.find(".pdf" ) != std::string::npos))
{
const std::size_t start_pos = line.find("href");
std::string temp = line.substr(start_pos+6); // skip the 6 characters of href="
const std::size_t stop_pos = temp.find("\"");
std::string result = temp.substr(0, stop_pos);
std::cout << "'" << result << "'" << std::endl;
}
I actually get the output
'foo.pdf'
However, as Jerry pointed out, you might not want to use this in a production environment. If this is a simple homework or exercise on how to use the <string>, <iostream> and <fstream> libraries, then go ahead with such a procedure.
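For such an exercise, the extraction can be wrapped in a function that also handles lines without a match. A minimal sketch, assuming the attribute is written exactly as href="..." with double quotes on a single line; extract_href is a hypothetical helper name, and this is not a substitute for a real HTML parser:

```cpp
#include <cstddef>
#include <string>

// Return the value of the first href="..." attribute on the line, or an
// empty string if there is none or the closing quote is missing.
std::string extract_href(const std::string& line) {
    const std::string key = "href=\"";
    const std::size_t key_pos = line.find(key);
    if (key_pos == std::string::npos) return "";
    const std::size_t value_pos = key_pos + key.size();
    const std::size_t end_pos = line.find('"', value_pos);
    if (end_pos == std::string::npos) return "";
    return line.substr(value_pos, end_pos - value_pos);
}
```

Measuring key.size() instead of hard-coding an offset keeps the start position correct if the search text changes.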