How to find the longest word in a text document?


I'm looking for a way to find the longest word (by length) in a text document using the STL and Boost.
Here is my solution. However, it isn't very good: it takes too many operations (tokenizing, sorting, ...). Is there a simpler way to solve this problem?
// utility and memory
#include <utility>
#include <functional>
#include <memory>
#include <locale>
#include <string>
// data structure and algorithm
#include <stack>
#include <vector>
#include <list>
#include <set>
#include <map>
#include <deque>
#include <list>
#include <bitset>
#include <algorithm>
#include <iterator>
// numeric
#include <complex>
#include <numeric>
#include <valarray>
// input/output
#include <iostream>
#include <iomanip>
#include <ios>
#include <iosfwd>
#include <streambuf>
#include <sstream>
// standard C
#include <cctype>
#include <cmath>
#include <climits>
#include <cstdlib>
#include <ctime>
#include <cassert>
#include <cstring>
// boost
#include <boost/tokenizer.hpp>
int main() {
    std::string str = "Test, test!, test...string";
    boost::char_separator<char> sep( ",!,.-" );
    boost::tokenizer<boost::char_separator<char> > tokens( str, sep );
    std::vector<std::string> res;
    std::copy( tokens.begin(), tokens.end(), std::back_inserter( res ) );
    std::sort( res.begin(), res.end(), []( const std::string& l, const std::string& r ) { return l.length() > r.length(); } );
    std::cout << "longest : " << *res.begin() << "\n";
    return 0;
}
Best regards,
Chan

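You can use std::max_element. Just pass it the iterator pair and a comparator. Note that the comparator you've already written orders longer strings first, so with std::max_element it would pick the shortest word: either flip it to l.length() < r.length(), or keep it as is and call std::min_element instead.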
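A minimal sketch of that idea, assuming the tokens have already been collected into a std::vector<std::string>. The helper name longest_word is made up for illustration; the comparator is a "less than by length" ordering, which is what std::max_element expects.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the first longest string in `words`,
// or an empty string if `words` is empty.
std::string longest_word(const std::vector<std::string>& words) {
    auto it = std::max_element(
        words.begin(), words.end(),
        [](const std::string& l, const std::string& r) {
            return l.length() < r.length();  // order by length, shortest first
        });
    return it == words.end() ? std::string() : *it;
}
```

This is a single linear pass, so no sorting is needed. When several words share the maximum length, std::max_element returns the first of them.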

That's one way, and it works, but there's one problem with it: it runs in O(n log n), which is the complexity of the sorting algorithm.
A trivial O(n) algorithm is to scan the tokens one by one and keep track of the longest word seen so far. At the end of the loop you'll have one of the longest words in the set. I say "one of" because there may be more than one word with the maximum length, and this way you only capture the first one you encounter.
You might want to modify the algorithm to keep track of all the longest words seen at each step.
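One way that modification might look, as a sketch only: keep a list of the current best words and reset it whenever a strictly longer word shows up. The name all_longest is hypothetical, and the tokens are assumed to be in a vector already.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every word that has the maximum length,
// in the order encountered. Empty strings are never reported.
std::vector<std::string> all_longest(const std::vector<std::string>& words) {
    std::vector<std::string> best;
    std::size_t best_len = 0;
    for (const std::string& w : words) {
        if (w.length() > best_len) {
            // Strictly longer: discard the previous candidates.
            best_len = w.length();
            best.clear();
            best.push_back(w);
        } else if (w.length() == best_len && !w.empty()) {
            // Ties with the current maximum are kept as well.
            best.push_back(w);
        }
    }
    return best;
}
```

The scan is still O(n) in the number of tokens; only the bookkeeping changes.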

Rather than storing and sorting the entire array of word tokens, you can just find the length of each token and compare it with the max_length seen so far. This reduces your space requirements and eliminates the cost of sorting altogether.
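As a rough sketch of this approach without Boost: the function below walks the text once and remembers only the position and length of the best token so far, so no token container is built. The name longest_token and the choice to pass the delimiters as a string are assumptions for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the first longest run of characters in
// `text` that contains none of the characters in `delims`.
std::string longest_token(const std::string& text, const std::string& delims) {
    std::size_t best_pos = 0;
    std::size_t best_len = 0;
    std::size_t pos = text.find_first_not_of(delims);  // start of first token
    while (pos != std::string::npos) {
        std::size_t end = text.find_first_of(delims, pos);  // one past the token
        if (end == std::string::npos) end = text.size();
        if (end - pos > best_len) {  // compare with the max length seen so far
            best_pos = pos;
            best_len = end - pos;
        }
        pos = text.find_first_not_of(delims, end);  // start of next token
    }
    return text.substr(best_pos, best_len);
}
```

For the string in the question, a call such as longest_token(str, ",!.- ") would treat spaces as delimiters too, which the original separator did not.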

Related

Why does including several libraries throw Runtime Error(SIGILL)?

I was getting a SIGILL runtime error in one of my programs.
But then I noticed that just changing the headers included made it run normally.
Previous code (throws a runtime error on C++14):
#pragma GCC target("avx2")
#include <iostream>
#include <algorithm>
#include <vector>
#include <set>
#include <map>
#include <queue>
#include <stack>
#include <list>
#include <chrono>
#include <random>
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <cstring>
#include <iomanip>
Modified version (accepted on C++14):
#include<bits/stdc++.h>
What might be the reason for this?

Where to get _fileno and _O_U16TEXT?

I'm trying to print the text "Ääkkösiä ruutuun." to the console with C++. I have Windows 7 and am using the Code::Blocks editor. Searching on the subject, I found that maybe lines like these would help:
_setmode(_fileno(stdout), _O_U16TEXT);
wstring s{L"Ääkkösiä ruutuun."};
wcout<<s<<endl;
But when I try to compile it, I get the error: _fileno was not declared in this scope.
I have all these includes:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>
#include <set>
#include <stdexcept>
#include <cmath>
#include <sstream>
#include <fstream>
#include <codecvt>
#include <locale>
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <cstdio>
#include <ostream>
What am I missing?
Also, one other thing I tried was locale, but then locale::empty wasn't found! Why doesn't my C++ have anything in it?
EDIT
My program now prints out just the first letter (Ä). What happens to the rest?
OK, it seems that _setmode sets things up so that only one letter gets printed. (Even trying to print normal text with multiple commands results in just a single letter.) Without it, the Scandinavian letters don't print correctly, though.

The answer you found is for Visual Studio, not Code::Blocks.
While the C standard specifies what should be in <stdio.h>, it only specifies a minimum. Implementors may add their own functions, and should do so using a leading _ (underscore) prefix. This is why you should NOT use that prefix for your own names: you don't know what you'll break. Microsoft clearly signaled their non-standard extensions by using the correct prefix.
The question is tagged C++, but C++ inherits the contents of <stdio.h> from C.

The line
setlocale(LC_CTYPE, ".OCP");
works!
A complete example:
#include <clocale>   // setlocale
#include <fstream>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
using namespace std;
// Reads the whole file into a wide string using the user's default locale.
wstring readFile(const char* filename) {
    wifstream wif(filename);
    locale myLoc("");
    //locale utf8_locale(locale(), new gel::stdx::utf8cvt<true>);
    wif.imbue(myLoc);
    basic_stringstream<wchar_t> wss;
    wss << wif.rdbuf();
    return wss.str();
}
int main() {
    // Windows-specific locale name: use the system's OEM code page.
    setlocale(LC_CTYPE, ".OCP");
    wstring contents = readFile("test.txt");
    wcout<<L"Does anything get printed out at all???"<<endl;
    //wcout <<contents<<endl;
    wstring s{L"Ääkkösiä ruutuun."};
    wcout<<s<<endl;
    wcout<<L"Näkyykö äkköset?"<<endl;
    return 0;
}
The text read from the file (UTF-8) still doesn't print correctly, though.
It should be
Hei!
Täällä on kaksi riviä.
but the ä's go awry there.

std::getline get error in JNI

I am using JNI and want to read a file from a path. I used:
while (std::getline(file, str)) {
...
}
but I get an error: Function 'getline' could not be resolved. I have included:
#include <vector>
#include <string.h>
#include <jni.h>
#include <fstream>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <iostream>
and they are all resolved fine. How do I resolve this problem? Please help me.

Use:
#include <string>
instead of:
#include <string.h>
The latter is the C string-handling header (strlen, memcpy and friends). It does not declare std::string or the std::getline overload that reads into one. Also see:
Difference between <string> and <string.h>?
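A small sketch of the loop from the question with the right header in place. The helper name read_lines is made up here, and it takes any std::istream so that it works with a std::ifstream opened on your path as well as with an in-memory stream.

```cpp
#include <istream>
#include <sstream>
#include <string>   // std::string and the std::getline overloads for it
#include <vector>

// Hypothetical helper: reads all lines from `in` until end of stream.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

With a file you would construct a std::ifstream (from <fstream>) on the path and pass it in the same way.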

boost::trim_right_if and null characters

I have the following code:
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/trim.hpp>
#include <boost/assign/list_of.hpp>
#include <string>
#include <vector>
int main()
{
    std::vector<char> some_vec = boost::assign::list_of('1')('2')('3')('4')('5')('\0')('\0');
    std::string str(some_vec.begin(), some_vec.end());
    boost::trim_right_if(str, boost::is_any_of("\0"));
}
I expected str to contain "12345", but it still contains "12345\0\0". Why, and how can I solve it?

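This code works:
boost::trim_right_if(str, boost::is_any_of(boost::as_array("\0")));
The trick is to use boost::as_array (from <boost/range/as_array.hpp>), which passes the literal as a character array, including its null characters, rather than as a null-terminated string.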

I do not know the function boost::is_any_of, but since its argument is a string literal, it seems to treat "\0" as an empty set of characters (an empty string literal). So the algorithm trims nothing.
This is only my supposition.
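The supposition is consistent with how C strings work: read as a null-terminated string, "\0" has length zero, so there are no characters to match. If Boost is not a requirement for this step, a sketch with plain std::string avoids the issue by searching for the character '\0' rather than a C string. The helper name trim_trailing_nuls is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes trailing '\0' characters from `s` in place.
void trim_trailing_nuls(std::string& s) {
    // Index of the last character that is not '\0', or npos if there is none.
    const std::size_t last = s.find_last_not_of('\0');
    if (last == std::string::npos) {
        s.clear();          // the string was empty or consisted only of '\0'
    } else {
        s.erase(last + 1);  // drop everything after the last real character
    }
}
```

Because the search argument is a single char and not a const char*, no string length is ever computed from the null character.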

How to load Multidimensional array values into vector?

This is part of the code (header and the main part):
#include <iostream>
#include <sstream>
#include <string>
#include <gl\GL.h>
#include <gl\GLU.h>
#include <glut.h>
#include <RassHost.h>
#include <api\iomap.h>
#include <api\iotrans.h>
#include <api\cgeometry.h>
#include <vector>
using namespace std;
int main()
{
    cout << "Enter IP: " << endl;
    getline(cin, server_ip);
    enum(KEY_L = 'A', KEY_R = 'D', KEY_RUN = 'WW', KEY_JUMP='SPACE');
    typedef OBJECT_3D_SYS_TYPES_NUM OBJECT3D_RCN_TYPE;
    OBJECT3D_RCN_TYPE _psyObjects[][] = getPsyhicsPartObjects();
    vector<OBJECT3D_RCN_TYPE> _objects;
    //I would like to load _psyObjects[][] into vector<OBJECT3D_RCN_TYPE> _objects;
    Server::StartGame(Server::getIP(), 8888, "-r run", false);
    system("pause");
    return 0;
}
Is it possible to copy the _psyObjects values into a vector<OBJECT3D_RCN_TYPE>?
I want to work with the multidimensional array through the vector API, if that is possible.
Thanks!

You'll need to create a vector of vectors:
vector< vector<OBJECT3D_RCN_TYPE> > _objects;
Then just fill it like a normal vector.
I'd post more code, but you need to know the dimensions of the array, and I can't see those from the code.
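For what it's worth, here is a sketch of the fill step under the assumption that both dimensions are known at compile time. The template deduces them from the array type; to_nested_vector is a made-up name, and T stands in for your OBJECT3D_RCN_TYPE.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a fixed-size 2D array into a vector of vectors.
// Rows and Cols are deduced from the array type.
template <typename T, std::size_t Rows, std::size_t Cols>
std::vector<std::vector<T>> to_nested_vector(const T (&grid)[Rows][Cols]) {
    std::vector<std::vector<T>> out;
    out.reserve(Rows);
    for (std::size_t r = 0; r < Rows; ++r) {
        // Build one inner vector from the pointer range covering row r.
        out.emplace_back(grid[r], grid[r] + Cols);
    }
    return out;
}
```

If the array comes back from a function as a pointer with run-time dimensions, this deduction will not apply and the sizes would have to be passed in explicitly.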

You could also use a boost::multi_array. Its API isn't the same as std::vector's, but it is possibly similar enough to meet your needs.
