Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
I'm sorry if the title is undescriptive, I'm not sure how to summarize my issue into a few words.
I'm looking to find which characters are physically near other characters on my QWERTY (UK, but I don't mind if you provide information specific to US) keyboard.
e.g:
charsNearChar('j') // OUTPUT -> U,I,H,K,N,M.
I can't seem to wrap my head around any solutions beside switch cases for each individual character, any help is appreciated!
There is no (simple) calculation that you could perform to get the list of adjacent keys. You simply need to use an explicitly written list of adjacent keys for each key.
any solutions beside switch cases
You don't need switch cases. What you're essentially asking for is a graph where nodes are keys and edges are to "adjacent" keys.
There are many ways to represent graphs. For your use case, perhaps an easy to understand, and reasonably fast choice is to use an associative map from key to its adjacency list (a vector of chars, or a string):
std::unordered_map<char, std::string> {
{'J', "UIHKNM"},
....
};
Since you limit the functionality to alphanumeric keys, they have an interesting property of being in a hexagonal grid. Such grid could well be represented with a 2D matrix:
char grid[][] = {
"123...",
"QWE...",
"ASD...",
"ZXC...",
};
This representation has less repetition, and the adjacency lists can be generated form this matrix with an algorithm.
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I have a data that consists of DNA-sequences, where the words represented as kmers of length 6, and the sentences represented as DNA-sequences. Each DNA-sequence has 80 kmers (words)
The list of kmers I have is around 130,000 kmers, but after removing the duplicate elements, I would have 4500 kmers only. So, this huge gap confused me in regarding removing the duplicate kmers or not. My question is, is it recommended in this case to remove the duplicated kmers in the word2vec algorithm?
Thanks.
Without an example, it's unclear what you mean by "removing the duplicate elements". (Does that mean, when the same token appears twice in a row? Or twice in one "sentence"? Or, as I'm not familiar with what your data looks like in this domain, something else entirely?)
That you say there are 130,000 tokens in the vocabulary, but then 4,500 later, is also confusing. Typically the "vocabulary" size is the number of unique tokens. Removing duplicate tokens couldn't possibly change the number of unique tokens encountered.
In the usual domain of word2vec, natural language, words don't often repeat one-after-another. To the extent they sometimes might – as in say the utterance "it's very very hot in here" – it's not really an important enough case that I've noticed anyone commenting about handling that "very very" differently than any other two words.
(If a corpus had some artificially-duplicated full-sentences, it might be the case that you'd want to try discarding the exact-duplicate-sentences. Word2vec benefits from a variety of different usage-examples. Repeating the same sentence 10 times essentially just overweights those training examples – it's not nearly as good as 10 contrasting, but still valid, examples of the same words' usage.)
You're in a different domain that's not natural language, with different co-occurrence frequencies, and different end-goals. Word2vec might prove useful, but it's unlikely any general rules-of-thumb or recommendations from other domains will be useful. You should test things both ways, evaluate the results on your ultimate task in a robust repeatable way, and choose based on what you discover.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
So we know that the basic structures which form the backbone of C++ algorithms are:
trees set
queue
linkedlist
array
vector map
unordered_map and pair.
My question is which data structure is suitable for which application.For instance I know that for Database indexing and searching preferred choices are B+ tree and Hash table.Can anyone shed some more light on this,
This is not only a C++ problem, but also an algorithm question. It maybe too broad, but I can give you some advice.
set and map: They are ordered container, it is used for a both manytimes-insert-and-read structure. It can finish insert delete read in O(logn) time.
vector: used for something like dynamic array or a structure you will frequently push_back at it, and if no other reason, you should use it.
deque: much like vector, but it can also finish push_front in O(1) time
list: used for a structure you need to frequently insert, but less random access
unordered_map and unordered_set: look for hash table
array: used for a structure whose size is fixed.
pair and tuple: bind many object into one struct. Nothing special
Beside all of this, there are also some container meeting other requirement, you can serach them.
e.g. any and optional
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
Please describe (without implementation!) algorithm (possibly fastest)
which receives string of n letters and positive integer k as
arguments, and prints the most frequent substring of length k (if
there are multiple such substrings, algorithm prints any one of them).
String is composed of letters "a" and "b". For example: for string
ababaaaabb and k=3 the answer is "aba", which occurs 2 times (the fact
that they overlap doesn't matter). Describe an algorithm, prove its
correctness and calculate its complexity.
I can use only most basic functions of C++: no vectors, classes, objects etc. I also don't know about strings, only char tables. Can someone please explain to me what the algorithm would be, possibly with implementation in code for easier understanding? That's question from university exam, that's why it's so weird.
A simple solution is by trying all possible substrings from left to right (i.e. starting from indices i=0 to n-k), and comparing each to the next substrings (i.e. starting from indices j=i+1 to n-k).
For every i-substring, you count the number of occurrences, and keep a trace of the most frequentso far.
As a string comparison costs at worst k character comparisons, and you will be performing (n-k-1)(n-k)/2 such comparisons and the total cost is of order O(k(n-k)²). [In fact the cost can be lower because some of the string comparisons may terminate early, but I an not able to perform the evaluation.]
This solution is simple but probably not efficient.
In theory you can reduce the cost using a more efficient string matching algorithm, such as Knuth-Morris-Pratt, resulting in O((n-k)(n+k)) operations.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
Improve this question
One of my friend has been asked with a question in an interview:
The best possible way to search for a given value among N unsorted numbers in a array.
If the array is unsorted, you need to perform a linear scan of the list. This examines (worst case) every element in the array. Such a search is O(n).
Sorting won't help here, since the best sorts run in O(n log n).
If it was sorted I'd say std::binary_search, but for unsorted, just go with std::find (unless the container you use has a member find; if it does, then use that as it is probably faster).
It depends:
-if you just want to search a single value, the answer given by bush is enough.
-if you know what you want to perform repeated "query", it could be better to perform before some kind of preprocessing,in order to find fastly these value.If you want to know only if a value is contained in the array, you can use structures like hashset,bloom filter,etc..
In other cases,you would like to know also position of items inside the array. In this scenario,you can consider to use an hashmap
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
randomly came across this:
Develop an algorithm to compare two sentences to see if
they match or not. The key aspect of these sentences is that
the words could be in any order (e.g. "california is hot" and "
hot is california" are two sentences that would match).
any ideas?
Parse each sentence into words, use space as delimiters.
Add all std::string words to a std::vector<std::string>, then sort.
Use the ==operator to compare the two vectors for equality.
Perhaps put words into a std::map<string, int> and count up the element each time you find a word on the one side, and down on the other side, then iterate over the map and check that all entries are zero. [This assumes that "california is hot hot" isn't supposed to be the same as "hot is california", in which case you need a bit more logic, to only count words the first time you see them on each side]
Or put each word in each sentence into a std::vector<string>, then sort each vector and compare the two vectors. Again, strategy changes if the sentence needs to be recognised regardless of the number of times each word is seen.