Can anyone help me make this function more efficient - c++

I am trying to put the contents of an unordered_map in order. The map is filled from a file that lists people. Each line in the file looks like rBC,biA and is stored as one element of the map. The second string in each element acts as a pointer to the next person in the list, so it appears again later as the first string of another line, in this case biA,TDV.
So far I can walk the list forwards by creating an unordered_map iterator and passing the second string to find() to get to the next element. My problem is going the other way. I can walk backwards as well, but my implementation takes a very long time to finish, as we have input files of 3 million people.
list<string> SortEast(unordered_map<string, string> &TempUMap, unordered_map<string, string>::iterator IT, list<string> &TempList)
{
IT = TempUMap.begin();
while (TempList.size() != (TempUMap.size() + 1))
{
if (IT->second == TempList.front())
{
TempList.emplace_front(IT->first);
IT = TempUMap.begin();
}
IT++;
}
return TempList;
}
I've tried to make this more efficient, but I cannot think of how. If I could find the value that goes at the start of the list, I could sort in order starting from that value, but I don't know how to find this value easily.
Any help would be appreciated.
EDIT:
A sample of one of our input is:
rBC,biA
vnN,CmR
CmR,gnz
Dgu,OWn
lnh,Dgu
OWn,YMO
YMO,SIZ
XbL,Cjj
TDV,jew
iVk,vnN
wTb,rBC
jew,sbE
sbE,iVk
Cjj,wTb
AGn,XbL
gnz,SMz
biA,TDV
SIZ,uvD
SMz,lnh
This is only 20 people. In this case AGn is the first value and uvD is the last. The output I end up with is:
AGn
XbL
Cjj
wTb
rBC
biA
TDV
jew
sbE
iVk
vnN
CmR
gnz
SMz
lnh
Dgu
OWn
YMO
SIZ
uvD
As this file starts with rBC, that is the point from which I need to sort backwards.

Can you not simply do something like this:
vector<string> orderAllTheNames(const unordered_map<string, string>& input, const string& begin)
{
vector<string> result;
result.reserve(input.size() + 1); // N links connect N + 1 names
string current = begin;
result.push_back(current);
while(result.size() < input.size() + 1)
{
// operator[] is not available on a const map, so use at()
current = input.at(current);
// copy rather than move: current is still needed for the next lookup
result.push_back(current);
}
return result;
}
I may have missed some details as I typed this on my phone. You can add some pointers and/or std::moves if you're worried about too many copies flying around.
I guess it's the same as your solution, but without the awkward list and emplace_front.
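On the question of how to find the value that goes at the start of the list: a minimal sketch, assuming the file describes a single unbroken chain, is to collect every second string in a set. The head is then the one key that never appears as anybody's successor. The names findFirstName and orderFromHead are hypothetical and not part of the original code.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Returns the key that never appears as a value, i.e. the head of the chain.
// Returns an empty string if no such key exists (empty map or a cycle).
std::string findFirstName(const std::unordered_map<std::string, std::string>& links)
{
    std::unordered_set<std::string> successors;
    successors.reserve(links.size());
    for (const auto& link : links) {
        successors.insert(link.second);
    }
    for (const auto& link : links) {
        if (successors.count(link.first) == 0) {
            return link.first;
        }
    }
    return std::string();
}

// Walks the chain forwards from the head, one hash lookup per person.
std::vector<std::string> orderFromHead(const std::unordered_map<std::string, std::string>& links)
{
    std::vector<std::string> result;
    result.reserve(links.size() + 1);
    std::string current = findFirstName(links);
    result.push_back(current);
    auto it = links.find(current);
    // The size check guards against malformed input that loops back on itself.
    while (it != links.end() && result.size() <= links.size()) {
        current = it->second;
        result.push_back(current);
        it = links.find(current);
    }
    return result;
}
```

Both passes are linear in the number of people on average, so nothing has to be rescanned from begin() for every element.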

Related

Moving object to front of vector pair string c++

I have a question about moving an object to the front of a vector of pairs. I already read that one post, but I didn't know how to use it with a vector of pairs or, better said, how to do it with strings. I don't know how to point an iterator at the position I need, because in my code I also need an index to compare the two strings. Alternatively, how is it possible to compare something like if(it==ite){..}? (I would need 2 loops for my example, so I would name the other iterator "ite".)
I have a vector<pair<string,string>> Dictionary and a vector<string> Text. I go through the dictionary and try to find each word of the text in it. I already accomplished that, but now I need to move the element that I found in the dictionary to the front and delete it from its old position.
I'm not sure how to do it with a vector of pairs.
Here's my code to show what I mean:
for(size_t i=0;i<Text.size();i++){
for(size_t j=0;j<Duden.size();j++){
if(Text[i]==Duden[j].first){
uebersetzung.push_back(Duden[j].first);
if(Duden[j].first.length()<4){
uebersetzung.push_back(" ");}
if(Duden[j].first.length()<8){
uebersetzung.push_back("\t");} // These are only so it looks cleaner at the end
uebersetzung.push_back("\t\t\t\t: ");
uebersetzung.push_back(Duden[j].second);
uebersetzung.push_back("\n");
// Now here should be the code to rotate the vector so the found element is now at the first position and not at the found position
break;
}
}
}
"Duden" here is dictionary if you´re wondering. I want to swap position of the elements from the dictionary so if the word occurs again in the text it doesnt need to go trough the whole dictionary again and instead just find it directly in the first positions.
How can I accomplish that using rotate? Or do I need to do it with erase and insert because rotate doesn't work with vectors of pairs?
Consider using the algorithms in the standard library. To use them, you have to get familiar with iterators. Then you can use something like std::rotate, and it does not really matter what is in your vector.
For example your code can be refactored in the following way:
for (auto const& word : Text) {
auto it = std::find_if(Duden.begin(), Duden.end(), [&word](auto const& entry) {
return entry.first == word;
});
if(it == Duden.end()) {
continue;
}
generate_translation(uebersetzung, *it);
std::rotate(Duden.begin(), it, it+1);
}
The formatting code should be moved into a separate function, generate_translation(std::vector<string>&, std::pair<string, string> const&), to make the code more readable.
Something like
void generate_translation(std::vector<string>& uebersetzung,
std::pair<string, string> const& entry)
{
uebersetzung.push_back(entry.first);
if(entry.first.length() < 4){
uebersetzung.push_back(" ");
}
if(entry.first.length() < 8){
uebersetzung.push_back("\t");
} // These are only so it looks cleaner at the end
uebersetzung.push_back("\t\t\t\t: ");
uebersetzung.push_back(entry.second);
uebersetzung.push_back("\n");
}
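To isolate the move-to-front step, here is a small self-contained sketch. The helper name moveToFront and the alias Entry are made up for illustration. It assumes the dictionary is a vector of (word, translation) pairs as in the question.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;

// Moves the first entry whose key equals `word` to the front of `dict`.
// All other entries keep their relative order. Returns false if not found.
bool moveToFront(std::vector<Entry>& dict, const std::string& word)
{
    auto it = std::find_if(dict.begin(), dict.end(),
                           [&word](const Entry& e) { return e.first == word; });
    if (it == dict.end()) {
        return false;
    }
    // Rotating [begin, it + 1) so that `it` becomes the first element shifts
    // the entries before it one step to the right.
    std::rotate(dict.begin(), it, it + 1);
    return true;
}
```

No separate erase and insert is needed: std::rotate moves the found element and closes the gap in one pass over the front part of the vector.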

how to find set of distinct strings from a given string after cyclic shifts?

I am solving a question on Codeforces where the problem statement asks me to find the set of all distinct strings that can be obtained from a given string by cyclic shifts.
For example:
Given string: "abcd"
the output should be 4 ("dabc", "cdab", "bcda", "abcd") [note: "abcd" is also counted]
So
t=s[l-1];
for(i=l-1;i>0;i--)
{
s[i]=s[i-1];
}
s[0]=t;
I applied the above method length - 1 times to get all possible strings, but I am unable to find the distinct ones.
Is there any STL function to do this?
You may use the following:
std::set<std::string>
retrieve_unique_rotations(std::string s)
{
std::set<std::string> res;
res.insert(s);
if (s.empty()) {
return res;
}
for (std::size_t i = 0, size = s.size() - 1; i != size; ++i) {
std::rotate(s.begin(), s.begin() + 1, s.end());
res.insert(s);
}
return res;
}
Not sure about STL specific functions, however a general solution could be to have all shifted strings in a list. Then you sort the list and then you iterate over the list elements. When the current element is different to the last, increment the counter.
There is probably a solution that is less memory intensive. For short strings this solution should be sufficient.
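The sort-and-compare idea above could be sketched as follows. The function name countDistinctRotations is hypothetical, and std::unique does the "different from the last element" comparison after sorting.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Counts the distinct cyclic shifts of s. Returns 0 for an empty string.
std::size_t countDistinctRotations(std::string s)
{
    std::vector<std::string> rotations;
    rotations.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        rotations.push_back(s);
        std::rotate(s.begin(), s.begin() + 1, s.end()); // shift left by one
    }
    std::sort(rotations.begin(), rotations.end());
    // After sorting, equal strings are adjacent, so unique() removes repeats.
    auto last = std::unique(rotations.begin(), rotations.end());
    return static_cast<std::size_t>(last - rotations.begin());
}
```

This stores all shifts at once, so memory grows with the square of the string length, which matches the caveat above about short strings.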
You can use a vector to collect the strings after each rotation with vector.push_back("string"). Before each push, you can check whether the string already exists by using something like:
if (std::find(vector.begin(), vector.end(), "string") == vector.end())
{
increment++;
vector.push_back("string");
}
Or else you can count the elements at the end with vector.size() and remove increment++.
Hope this helps

Reading from unordered_multiset results in crash

While refactoring some old code, a cumbersome multilevel map developed in-house was replaced by an std::unordered_multiset.
The multilevel map was something like [string_key1,string_val]. A complex algorithm was applied to derive the keys from string_val and resulted in duplicate string_val being stored in the map but with different keys.
Eventually, at some point in the application, the multilevel map was iterated to get each string_val and its number of occurrences.
It was replaced by an std::unordered_multiset, and the string_val values are just inserted into it. It seems much simpler than having an std::map<std::string,int> and checking, retrieving and updating the counter for every insertion.
What I want to do is retrieve the number of occurrences of each inserted element, but I do not have the keys beforehand. So I iterate over the buckets, but my program crashes upon creation of the string.
// hash map declaration
std::unordered_multiset<std::string> clevel;
// get element and occurences
for (size_t cbucket = clevel->bucket_count() - 1; cbucket != 0; --cbucket)
{
std::string cmsg(*clevel->begin(cbucket));
cmsg += t_str("times=") + \
std::to_string(clevel->bucket_size(cbucket));
}
I do not understand what is going on here. I tried to debug it, but I am somehow stuck (stack overflown?) :). The program crashes in std::string cmsg(*clevel->begin(cbucket));
You should consider how a multiset actually works as a hash table. For example, after reading this introduction you should notice that hash containers preallocate their internal buckets, and the number of buckets is optimized.
Therefore, if you insert the element "hello", you will probably get a number of buckets already created, but only the one corresponding to hash("hello") will actually hold an element that you may dereference. The rest will be empty.
Dereferencing the begin iterator of an empty bucket is undefined behavior, and in your case it results in a SEGV.
To remedy this, check every time that begin is not equal to end for that bucket.
// count down from bucket_count() so that bucket 0 is visited too
for (size_t cbucket = clevel->bucket_count(); cbucket-- > 0; )
{
auto it = clevel->begin(cbucket);
if (it != clevel->end(cbucket)) // skip empty buckets
{
std::string cmsg(*it);
cmsg += t_str("times=") + \
std::to_string(clevel->bucket_size(cbucket));
}
}
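Note that bucket_size() counts every element in a bucket, and two different strings can land in the same bucket, so it is not necessarily the number of occurrences of one string. A hedged alternative that avoids buckets entirely is sketched below. The function name countOccurrences is hypothetical. It relies on the guarantee that equivalent elements are adjacent when iterating an unordered_multiset.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>

// Returns each distinct string together with how often it was inserted.
std::map<std::string, std::size_t>
countOccurrences(const std::unordered_multiset<std::string>& values)
{
    std::map<std::string, std::size_t> counts;
    for (auto it = values.begin(); it != values.end(); ) {
        counts[*it] = values.count(*it);
        // Jump past the whole group of equal elements in one step.
        it = values.equal_range(*it).second;
    }
    return counts;
}
```

No keys need to be known beforehand, and empty buckets are never touched.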

Need suggestion to improve speed for word break (dynamic programming)

The problem is: Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "hithere",
dict = ["hi", "there"].
Return true because "hithere" can be segmented as "hi there".
My implementation is below. This code is OK for normal cases. However, it becomes very slow for input like:
s = "aaaaaaaaaaaaaaaaaaaaaaab", dict = {"aa", "aaaaaa", "aaaaaaaa"}.
I want to memoize the processed substrings, but I cannot get it right. Any suggestions on how to improve it? Thanks a lot!
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict) {
int len = s.size();
if(len<1) return true;
for(int i(0); i<len; i++) {
string tmp = s.substr(0, i+1);
if((wordDict.find(tmp)!=wordDict.end())
&& (wordBreak(s.substr(i+1), wordDict)) )
return true;
}
return false;
}
};
It's logically a two-step process. Find all dictionary words within the input, consider the found positions (begin/end pairs), and then see if those words cover the whole input.
So you'd get for your example
aa: {0,2}, {1,3}, {2,4}, ... {20,22}
aaaaaa: {0,6}, {1,7}, ... {16,22}
aaaaaaaa: {0,8}, {1,9} ... {14,22}
This is a graph, with nodes 0-23 and a bunch of edges. But node 23 (b) is entirely unreachable: it has no incoming edge. This is now a simple graph theory problem.
Finding all places where dictionary words occur is pretty easy if your dictionary is organized as a trie. But even an std::map is usable, thanks to its equal_range method. You have what appears to be an O(N*N) nested loop for begin and end positions, with O(log N) lookup of each word. But you can quickly determine if s.substr(begin,end) is still a viable prefix, and what dictionary words remain with that prefix.
Also note that you can build the graph lazily. Starting at begin=0 you find edges {0,2}, {0,6} and {0,8} (and no others). You can now search nodes 2, 6 and 8. You even have a good algorithm, A*, that suggests you try node 8 first (reachable in just 1 edge). Thus, you'll find nodes {8,10}, {8,14} and {8,16} etc. As you see, you'll never need to build the part of the graph that contains {1,3}, as it's simply unreachable.
Using graph theory, it's easy to see why your brute-force method breaks down. You arrive at node 8 (aaaaaaaa.aaaaaaaaaaaaaab) repeatedly, and each time search the subgraph from there on.
A further optimization is to run bidirectional A*. This would give you a very fast solution. At the second half of the first step, you look for edges leading to 23, b. As none exist, you immediately know that node {23} is isolated.
In your code, you are not using dynamic programming because you are not remembering the subproblems that you have already solved.
You can enable this remembering, for example, by storing the results based on the starting position of the string s within the original string, or even based on its length (because anyway the strings you are working with are suffixes of the original string, and therefore its length uniquely identifies it). Then, in the beginning of your wordBreak function, just check whether such length has already been processed and, if it has, do not rerun the computations, just return the stored value. Otherwise, run computations and store the result.
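A minimal sketch of that remembering, keyed on the start position of the suffix, might look like this. The names canBreakFrom and wordBreakMemo are hypothetical and the original Solution class is left out to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// memo[start]: -1 = not computed yet, 0 = suffix cannot be segmented, 1 = it can.
bool canBreakFrom(const std::string& s, std::size_t start,
                  const std::unordered_set<std::string>& dict,
                  std::vector<int>& memo)
{
    if (start == s.size()) {
        return true; // the empty suffix is trivially segmented
    }
    if (memo[start] != -1) {
        return memo[start] == 1; // already solved, do not recompute
    }
    bool ok = false;
    for (std::size_t len = 1; start + len <= s.size() && !ok; ++len) {
        if (dict.count(s.substr(start, len)) != 0) {
            ok = canBreakFrom(s, start + len, dict, memo);
        }
    }
    memo[start] = ok ? 1 : 0;
    return ok;
}

bool wordBreakMemo(const std::string& s, const std::unordered_set<std::string>& dict)
{
    std::vector<int> memo(s.size(), -1);
    return canBreakFrom(s, 0, dict, memo);
}
```

Each start position is now solved at most once, so the "aaaa...ab" input no longer causes repeated searches of the same suffix.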
Note also that your approach with unordered_set will not allow you to obtain the fastest solution. The fastest solution that I can think of is O(N^2) by storing all the words in a trie (not in a map!) and following this trie as you walk along the given string. This achieves O(1) per loop iteration not counting the recursion call.
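The trie idea could be sketched as below. This is an illustration under assumptions, not a tuned implementation: the Trie class, the constant kNoNode and the function wordBreakTrie are all made up here, and children are kept in a std::map for brevity, which makes each step logarithmic in the alphabet size rather than strictly O(1).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

const std::size_t kNoNode = static_cast<std::size_t>(-1);

// Minimal trie; nodes live in a vector and refer to each other by index.
class Trie {
public:
    Trie() : nodes_(1) {} // node 0 is the root

    void insert(const std::string& word) {
        std::size_t node = 0;
        for (char c : word) {
            std::size_t next = step(node, c);
            if (next == kNoNode) {
                next = nodes_.size();
                nodes_.emplace_back();
                nodes_[node].children[c] = next;
            }
            node = next;
        }
        nodes_[node].isWord = true;
    }

    // Follows the edge labelled c from node; returns kNoNode if there is none.
    std::size_t step(std::size_t node, char c) const {
        auto it = nodes_[node].children.find(c);
        if (it == nodes_[node].children.end()) return kNoNode;
        return it->second;
    }

    bool isWord(std::size_t node) const { return nodes_[node].isWord; }

private:
    struct Node {
        std::map<char, std::size_t> children;
        bool isWord = false;
    };
    std::vector<Node> nodes_;
};

bool wordBreakTrie(const std::string& s, const std::unordered_set<std::string>& dict)
{
    Trie trie;
    for (const auto& w : dict) trie.insert(w);

    // reachable[i] is true when s[0, i) can be segmented into dictionary words.
    std::vector<bool> reachable(s.size() + 1, false);
    reachable[0] = true;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!reachable[i]) continue;
        std::size_t node = 0;
        for (std::size_t j = i; j < s.size(); ++j) {
            node = trie.step(node, s[j]);
            if (node == kNoNode) break; // no dictionary word continues this way
            if (trie.isWord(node)) reachable[j + 1] = true;
        }
    }
    return reachable[s.size()];
}
```

Walking the trie one character at a time replaces the substr copy and hash computation that the unordered_set lookup needs for every candidate length.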
Thanks for all the comments. I changed my previous solution to the implementation below. At this point, I haven't explored optimizing the dictionary, but those insights are very valuable and very much appreciated.
For the current implementation, do you think it can be further improved? Thanks!
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict) {
int len = s.size();
if(len<1) return true;
if(wordDict.size()==0) return false;
vector<bool> dq (len+1,false);
dq[0] = true;
for(int i(0); i<len; i++) {// start point
if(dq[i]) {
for(int j(1); j<=len-i; j++) {// length of substring, 1:len
if(!dq[i+j]) {
auto pos = wordDict.find(s.substr(i, j));
dq[i+j] = dq[i+j] || (pos!=wordDict.end());
}
}
}
if(dq[len]) return true;
}
return false;
}
};
Try the following:
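class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict)
{
// base case: an empty string needs no further segmentation
if (s.empty()) return true;
for (const auto& w : wordDict)
{
if (w.empty()) continue; // an empty word would recurse forever
auto pos = s.find(w);
if (pos != string::npos)
{
if (wordBreak(s.substr(0, pos), wordDict) &&
wordBreak(s.substr(pos + w.size()), wordDict))
return true;
}
}
return false;
}
};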
Essentially, once you find a match, remove the matching part from the input string and continue testing on the smaller pieces that remain.

C++ Set Erase Entry Question

I encountered a problem here. I'm using C++ multiset. This is the test file.
Score: 3-1
Ben
Steven
Score: 1-0
Ben
Score: 0-0
Score: 1-1
Cole
Score: 1-2
Ben
I'm using while loop and ifstream (fin1) to read in from the test file above.
multiset<string, less<string> > myset;
while(!fin1.eof())
{
fin1 >> scoreName;
if(scoreName == "Score:")
{
//calculates number of matches played
}
else
{
goalCheck = scoreName.substr(1,1);
if(goalCheck == "-")
{
string lGoal, rGoal;
lGoal = scoreName.substr(0,1);
rGoal = scoreName.substr(2,1);
int leftGoal, rightGoal;
leftGoal = atoi(lGoal.c_str());
rightGoal = atoi(rGoal.c_str());
if(leftGoal > rightGoal) //if team wins
{
//some computations
}
else if(leftGoal < rightGoal) //if team loses
{
//computations
}
else if(leftGoal == rightGoal) //if team draws
{
//computations
}
else
{
myset.insert(myset.begin(), scoreName);
}
}
}
I'm inserting all names into myset (regardless of wins/losses/draws) in my last else statement. But I only need the names from matches that were won or drawn.
Names from lost matches should not be included in myset. In the test file above, there's only one lost match (1-2), and I want to remove "Ben". How can I do that?
I tried to use myset.erase(), but I'm not sure how to get it to point to Ben and remove it from myset.
Any help is much appreciated.
Thanks.
If I understand what you're trying to do, I think it would be easier to remember whether the team had won, drawn or lost when you read the "Score" line and only insert the following lines (ie. "Ben") if the team hasn't lost. Inserting everyone and then erasing ones you didn't want seems overcomplicated to me :)
For reference: If you do need to erase, you would use the find() member to locate an entry matching a given key, and then erase() on the returned iterator to delete it (after checking that find() didn't return an iterator equal to end(), which means the item wasn't found).
Also, you shouldn't pass begin() to insert(). The iterator is a hint as to where the container might insert the item; in practice that's rarely useful. There is an overload of that function which takes only one argument, the item to insert.
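For the find() and erase() route, a minimal sketch could look like this. The helper name eraseOne is hypothetical. Erasing through the iterator removes a single entry, whereas erase("Ben") called with the key would remove every "Ben" in the multiset.

```cpp
#include <set>
#include <string>

// Removes one occurrence of `name`; returns false if it was not present.
bool eraseOne(std::multiset<std::string>& names, const std::string& name)
{
    auto it = names.find(name);
    if (it == names.end()) {
        return false; // not found, nothing to erase
    }
    names.erase(it); // iterator overload: removes exactly this element
    return true;
}
```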
Create a Score class. Add a non-member operator>>() for it so you can parse it easily. Then it will be easy for you to decide whether to insert into the set or not:
if( scr.gained - scr.lost >= 0 )
myset.insert(myset.begin(), scr);
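One way the Score idea might be fleshed out is sketched below, assuming the file format from the question ("Score: 3-1" followed by zero or more names). The struct layout and the function collectNames are assumptions for illustration. Unlike the line above, this sketch stores the names rather than Score objects, and it decides at the "Score:" line whether the following names are kept.

```cpp
#include <istream>
#include <set>
#include <sstream> // only needed for the usage example
#include <string>

struct Score {
    int gained = 0;
    int lost = 0;
};

// Parses text of the form "3-1"; sets failbit if the separator is not '-'.
std::istream& operator>>(std::istream& in, Score& score)
{
    Score parsed;
    char dash = '\0';
    if (in >> parsed.gained >> dash >> parsed.lost) {
        if (dash == '-') {
            score = parsed;
        } else {
            in.setstate(std::ios::failbit);
        }
    }
    return in;
}

// Collects the names that follow a won or drawn match.
std::multiset<std::string> collectNames(std::istream& in)
{
    std::multiset<std::string> names;
    bool keep = false;
    std::string token;
    while (in >> token) { // stops cleanly at end of input
        if (token == "Score:") {
            Score score;
            if (!(in >> score)) break;
            keep = score.gained >= score.lost;
        } else if (keep) {
            names.insert(token);
        }
    }
    return names;
}
```

For example, feeding the test file through std::istringstream and calling collectNames leaves out the "Ben" that follows the 1-2 match. Testing the stream in the loop condition also avoids the extra iteration that while(!fin1.eof()) can produce.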