Let's say I have a list of patterns such as ['AB', ')', '%%', '<.*>'].
I need to search for one of them forward or backward, starting from the cursor position.
Once the first one is found, how do I retrieve its index in the list? I.e., how do I know which one it is?
[EDIT]: the thing is that I actually have two lists of the same size. Once the first match is found in one direction, I'll need to search for the corresponding one in the other direction.
PLUS, each pattern is associated with a certain precedence (its index in the list), which I need to retrieve once it is found.
(The overall idea is to build something that would be able to answer this question, with custom delimiters and operators.)
Got it: the searchpos() function with the 'p' flag allows you to retrieve the position and the id of the match for a compound pattern; see :help searchpos.
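For the record, a minimal sketch of that approach, assuming the list from the question (variable names are just for illustration): wrap every item in its own \(...\) group, join them with \|, and let searchpos() report which group matched via the 'p' flag.
let patterns = ['AB', ')', '%%', '<.*>']
let compound = '\(' . join(patterns, '\)\|\(') . '\)'
" Forward search ('W' = no wrap); use 'bpW' instead to search backward.
" With 'p', searchpos() returns [lnum, col, subpat], where subpat is 2 for
" the first group, 3 for the second, and so on, so the list index (and thus
" the precedence) is subpat - 2.
let [lnum, col, subpat] = searchpos(compound, 'pW')
if lnum != 0
  echo 'matched pattern ' . (subpat - 2) . ': ' . patterns[subpat - 2]
endif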
I can search in the list of entries.
How do I proceed to the next match?
Just use the usual Up / Down arrow keys to navigate between speed search matches.
NOTE (just in case): such a search is performed on already opened/expanded nodes only; it will not search inside closed nodes or expand them for you.
https://www.jetbrains.com/help/webstorm/2016.3/speed-search-in-the-tool-windows.html
P.S. https://stackoverflow.com/a/24929227/783119
Example:
let hits = []
:5s/regex-search/\=join(add(hits, submatch(0)))/g
This adds all the matches on line 5 to the list.
However, it also performs the substitution in the text.
I tried adding the 'n' flag after the 'g',
but then the matches are not added to the list.
Is there any way to resolve my problem?
Almost there. First, I don't think you need the join. Second, add() returns the list with the match appended, so you can just select the last element of the list as the replacement text. (This makes it look as if nothing was replaced.)
s/regex-search/\=add(hits,submatch(0))[-1]/g
With a recent enough Vim version, you can prevent the actual substitution from taking place (and messing up your undo branches) while the expression on the right-hand side of an :s command is still evaluated.
You need at least Vim patch 7.3.627, and then you can simply use
:s/foobar/\=add(hits, submatch(0))/gn
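For completeness, a quick usage sketch combining both suggestions ('regex-search' stands in for whatever pattern you are collecting):
let hits = []
" works in any Vim, but rewrites each match with itself:
%s/regex-search/\=add(hits, submatch(0))[-1]/g
" with patch 7.3.627 or later, the buffer is not touched at all:
" %s/regex-search/\=add(hits, submatch(0))/gn
echo hits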
Good day,
I have found a strange quirk in Vim that I can't explain the cause of, so I will describe it to the best of my abilities.
If there is a word that appears multiple times in a file I am editing, I can highlight all instances of it by moving the cursor over the word and hitting the pound key (i.e. SHIFT+3 ==> #). I can then navigate to the next occurrence of this word by hitting 'N' (i.e. SHIFT+n), and to the previous instance by hitting 'n'.
However, if I search for a word (e.g. "int") with the search command (i.e. /int), 'N' searches backwards and 'n' searches forwards, which is the opposite of the mappings I get with the # key. Is there something I'm doing wrong? I'm using a minimalist vimrc at the moment.
Thank you.
No, it's correct. / searches forward and ? backwards (similarly, * searches forwards for the word under the cursor and # backwards). n repeats the search in the same direction and N in the opposite direction; it's relative to the direction of the initial search.
I have a homework assignment and I am stuck at one point. I am given facts like these:
word([h,e,l,l,o]).
word([m,a,n]).
word([w,o,m,a,n]). etc
and I have to write a rule so that, when the user inputs a list of letters, I compare that list with the words I have and correct any possible mistakes. Here is the code I am using when the first letter is in the correct place:
mistake_letter([],[]).
mistake_letter([X|L1],[X|L2]):-
    word([X|_]),
    mistake_letter(L1,L2).
The problem is that I don't know how to move to the next letter in the word fact. The next time backtracking runs, it will use the head of the word again, while I would like to use the second letter in the list. Any ideas on how to solve this?
I am sorry for any grammatical mistakes and I appreciate your help.
In order to move to the next letter in the word fact, you need to make the word from the fact a third argument, and take it along for the ride. In your mistake_letter/2, you will pick words one by one, and call mistake_letter/3, passing the word you picked along, like this:
mistake_letter(L1,L2):-
    word(W),
    mistake_letter(L1,L2,W).
Then you'll need to change your base case to do something when the letters of the word being corrected run out before the letters of the word you picked. What you do depends on your assignment: you could force backtracking with mistake_letter([],[],[])., declare a match with mistake_letter([],[],_)., attach the word's tail to the correction with mistake_letter([],W,W)., or do something else.
You also need an easy case to cover the situation when the first letter of the word being corrected matches the first letter of the word that you picked:
mistake_letter([X|L1],[X|L2],[X|WT]):-
    mistake_letter(L1, L2, WT).
Finally, you need the most important case: what to do when the initial letters do not match. This is probably the bulk of your assignment; the rest is just boilerplate recursion. In order to get it right, you may need to change mistake_letter/3 to mistake_letter/4 so that you can count the matching letters and later compare that count to the number of letters in the original word. This would let you drop "corrections" like [w,o,r,l,d] --> [h,e,l,l,o], which has only 20% matching letters.
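Not part of the answer above, but here is one minimal sketch of how the mistake_letter/4 idea could look. The 50% threshold, the choice to correct a mismatch with the word's own letter, and the assumption that the input is never longer than the word are all illustrative choices, not requirements of the assignment:
mistake_letter(L1, L2) :-
    word(W),
    mistake_letter(L1, L2, W, Matches),
    length(W, Len),
    Matches * 2 >= Len.          % keep corrections with at least 50% matching letters
mistake_letter([], [], _, 0).                       % input exhausted: stop counting
mistake_letter([X|L1], [X|L2], [X|WT], N) :-        % letters agree: count the match
    mistake_letter(L1, L2, WT, N0),
    N is N0 + 1.
mistake_letter([X|L1], [Y|L2], [Y|WT], N) :-        % mismatch: take the word's letter
    X \= Y,
    mistake_letter(L1, L2, WT, N).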
I am writing a program which will tokenize input text according to some specific rules. I am using C++ for this.
Rules
Letter 'a' should be converted to token 'V-A'
Letter 'p' should be converted to token 'C-PA'
The letter pair 'pp' should be converted to token 'C-PPA'
Letter 'u' should be converted to token 'V-U'
This is just a sample; in the real application I have around 500+ rules like this. If I provide 'appu' as input, it should be tokenized as 'V-A + C-PPA + V-U'. I have implemented an algorithm for doing this and wanted to make sure that I am doing the right thing.
Algorithm
All rules will be kept in an XML file with the corresponding mapping to the token. Something like
<rules>
<rule pattern="a" token="V-A" />
<rule pattern="p" token="C-PA" />
<rule pattern="pp" token="C-PPA" />
<rule pattern="u" token="V-U" />
</rules>
1 - When the application starts, read this XML file and keep the values in a 'std::map'. This will be available until the end of the application (singleton pattern implementation).
2 - Iterate over the characters of the input text. For each character, look for a match. If found, become more greedy and look for a longer match by taking the next characters from the input text. Do this until we get no match. So for the input text 'appu', first look for a match for 'a'. If found, try to get a longer match by taking the next character from the input text: it will try to match 'ap', find no match, and stop.
3 - Remove the letter 'a' from the input text, as we got a token for it.
4 - Repeat steps 2 and 3 with the remaining characters in the input text.
Here is a simpler explanation of the steps:
input-text = 'appu'
tokens-generated=''
// First iteration
character-to-match = 'a'
pattern-found = true
// since pattern found, going recursive and check for more matches
character-to-match = 'ap'
pattern-found = false
tokens-generated = 'V-A'
// since no match found for 'ap', taking the first success and replacing it from input text
input-text = 'ppu'
// second iteration
character-to-match = 'p'
pattern-found = true
// since pattern found, going recursive and check for more matches
character-to-match = 'pp'
pattern-found = true
// since pattern found, going recursive and check for more matches
character-to-match = 'ppu'
pattern-found = false
tokens-generated = 'V-A + C-PPA'
// since no match found for 'ppu', taking the first success and replacing it from input text
input-text = 'u'
// third iteration
character-to-match = 'u'
pattern-found = true
tokens-generated = 'V-A + C-PPA + V-U' // we're done!
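For reference, a minimal sketch of the steps above using std::map keyed by the full pattern (like the description, it stops extending as soon as a longer substring has no exact entry, which implicitly assumes every longer rule's prefix is itself a rule, as in the sample):
#include <iostream>
#include <map>
#include <string>
int main() {
    const std::map<std::string, std::string> rules = {
        {"a", "V-A"}, {"p", "C-PA"}, {"pp", "C-PPA"}, {"u", "V-U"}};
    std::string input = "appu", out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::string matched, token;
        // Step 2: greedily extend the match one character at a time.
        for (std::size_t len = 1; pos + len <= input.size(); ++len) {
            auto it = rules.find(input.substr(pos, len));
            if (it == rules.end()) break;           // e.g. 'ap' or 'ppu': stop extending
            matched = it->first;
            token = it->second;
        }
        if (matched.empty()) { ++pos; continue; }   // no rule for this character
        if (!out.empty()) out += " + ";
        out += token;                               // step 3: consume the match
        pos += matched.size();
    }                                               // step 4: repeat
    std::cout << out << '\n';                       // prints V-A + C-PPA + V-U
}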
Questions
1 - Does this algorithm look fine for this problem, or is there a better way to address it?
2 - If this is the right method, is std::map a good choice here? Or do I need to create my own key/value container?
3 - Is there a library available which can tokenize strings like the above?
Any help would be appreciated
:)
So you're going through all of the tokens in your map looking for matches? You might as well use a list or array, there; it's going to be an inefficient search regardless.
A much more efficient way of finding just the tokens suitable for starting or continuing a match would be to store them as a trie. A lookup of a letter there would give you a sub-trie which contains only the tokens which have that letter as the first letter, and then you just continue searching downward as far as you can go.
Edit: let me explain this a little further.
First, I should explain that I'm not familiar with the C++ std::map beyond the name, which makes this a perfect example of why one learns the theory of this stuff as well as the details of particular libraries in particular programming languages: unless that library is badly misusing the name "map" (which is rather unlikely), the name itself tells me a lot about the characteristics of the data structure. I know, for example, that there's going to be a function that, given a single key and the map, will very efficiently search for and return the value associated with that key, and that there's also likely a function that will give you a list/array/whatever of all of the keys, which you could search yourself using your own code.
My interpretation of your data structure is that you have a map where the keys are what you call a pattern, those being a list (or array, or something of that nature) of characters, and the values are tokens. Thus, you can, given a full pattern, quickly find the token associated with it.
Unfortunately, while such a map is a good match for converting your XML input format to an internal data structure, it's not a good match for the searches you need to do. Note that you're not looking up entire patterns, but the first character of a pattern, producing a set of possible tokens, followed by a lookup of the second character of a pattern within the set of patterns produced by that first lookup, and so on.
So what you really need is not a single map, but maps of maps of maps, each keyed by a single character. A lookup of "p" on the top level should give you a new map, with two keys: p, producing the C-PPA token, and "anything else", producing the C-PA token. This is effectively a trie data structure.
Does this make sense?
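To make the map-of-maps idea concrete, here is a minimal sketch of such a trie in C++ (the names and the longest-match driver are mine, not a reference implementation):
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <utility>
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    std::string token;                       // empty unless a rule ends at this node
};
void insert(TrieNode& root, const std::string& pattern, const std::string& token) {
    TrieNode* node = &root;
    for (char c : pattern) {
        auto& child = node->children[c];     // creates an empty slot if absent
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->token = token;
}
// Walk down from position pos as far as possible, remembering the deepest
// node that carried a token; return that token and the length it consumed.
std::pair<std::string, std::size_t> longestMatch(const TrieNode& root,
                                                 const std::string& text,
                                                 std::size_t pos) {
    const TrieNode* node = &root;
    std::string best;
    std::size_t bestLen = 0;
    for (std::size_t i = pos; i < text.size(); ++i) {
        auto it = node->children.find(text[i]);
        if (it == node->children.end()) break;
        node = it->second.get();
        if (!node->token.empty()) { best = node->token; bestLen = i - pos + 1; }
    }
    return {best, bestLen};
}
int main() {
    TrieNode root;
    insert(root, "a", "V-A");
    insert(root, "p", "C-PA");
    insert(root, "pp", "C-PPA");
    insert(root, "u", "V-U");
    std::string input = "appu", out;
    for (std::size_t pos = 0; pos < input.size();) {
        auto [token, len] = longestMatch(root, input, pos);
        if (len == 0) { ++pos; continue; }   // no rule starts here: skip the character
        if (!out.empty()) out += " + ";
        out += token;
        pos += len;
    }
    std::cout << out << '\n';                // prints V-A + C-PPA + V-U
}
Unlike a flat map of whole patterns, this inspects only one character per step and naturally yields the longest rule that matches at the current position.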
It may help if you start out by writing the parsing code first, in this manner: imagine someone else will write the functions to do the lookups you need, and he's a really good programmer who can do pretty much any magic that you want. While writing the parsing code, concentrate on making it as simple and clean as possible, creating whatever interface of these imagined functions you need (while not getting trivial and replacing the whole thing with one function!). Then look at the lookup functions you ended up with: they tell you how you need to access your data structure, which leads you to the type of data structure you need. Once you've figured that out, you can work out how to load it up.
This method will work - I'm not sure that it is efficient, but it should work.
I would use the standard std::map rather than your own system.
There are tools like lex (or flex) that can be used for this. The issue would be whether you can regenerate the lexical analyzer that it would construct when the XML specification changes. If the XML specification does not change often, you may be able to use tools such as lex to do the scanning and mapping more easily. If the XML specification can change at the whim of those using the program, then lex is probably less appropriate.
There are some caveats - notably that both lex and flex generate C code, rather than C++.
I would also consider looking at pattern matching technology - the sort of stuff that egrep in particular uses. This has the merit of being something that can be handled at runtime (because egrep does it all the time). Or you could go for a scripting language - Perl, Python, ... Or you could consider something like PCRE (Perl Compatible Regular Expressions) library.
Better yet, if you're going to use the boost library, there's always the Boost tokenizer library -> http://www.boost.org/doc/libs/1_39_0/libs/tokenizer/index.html
You could use a regex (perhaps the boost::regex library). If all of the patterns are just strings of letters, a regex like "(a|p|pp|u)" would find a greedy match. So:
1) Run a regex_search using the above pattern to locate the next match.
2) Plug the match text into your std::map to get the replacement text.
3) Print the non-matched consumed input and the replacement text to your output, then repeat step 1 on the remaining input.
And done.
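A minimal sketch of this approach using std::regex instead of boost::regex (with the default ECMAScript grammar the first alternative that matches wins, so longer alternatives such as "pp" should be listed before their prefixes; the rule table is hard-coded here for brevity):
#include <iostream>
#include <map>
#include <regex>
#include <string>
int main() {
    const std::map<std::string, std::string> rules = {
        {"a", "V-A"}, {"p", "C-PA"}, {"pp", "C-PPA"}, {"u", "V-U"}};
    const std::regex pattern("pp|a|p|u");    // would be generated from the XML rules
    std::string input = "appu", out;
    for (auto it = std::sregex_iterator(input.begin(), input.end(), pattern);
         it != std::sregex_iterator(); ++it) {
        if (!out.empty()) out += " + ";
        out += rules.at(it->str());          // map the matched text to its token
    }
    std::cout << out << '\n';                // prints V-A + C-PPA + V-U
}
In a real program the alternation string and the map would both be generated from the XML rules, with longer patterns sorted first.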
It may seem a bit complicated, but the most efficient way to do that is to use a graph to represent a state chart. At first, I thought boost.statechart would help, but I figured it wasn't really appropriate. This method can be more efficient than using a simple std::map IF there are many rules, the number of possible characters is limited, and the length of the text to read is quite high.
So anyway, using a simple graph:
0) create a graph with a "start" vertex
1) read the XML configuration file and create vertices when needed (transition from one "set of characters" (e.g. "pp") to an extended one (e.g. "ppa")). Inside each vertex, store a transition table to the next vertices. If the "key text" is complete, mark the vertex as final and store the resulting text
2) now read the text and interpret it using the graph. Start at the "start" vertex. (*) Use the table to interpret one character and to jump to a new vertex. If no new vertex is selected, an error can be issued. Otherwise, if the new vertex is final, print the resulting text and jump back to the start vertex. Go back to (*) until there is no more text to interpret.
You could use boost.graph to represent the graph, but I think it is overly complex for what you need. Make your own custom representation.
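For illustration, a minimal sketch of such a custom representation, assuming (as the answer suggests) that the number of possible characters is limited, here to the lowercase letters 'a'..'z'; every vertex stores a transition table, and final vertices carry the resulting text:
#include <array>
#include <iostream>
#include <string>
#include <vector>
struct State {
    std::array<int, 26> next;      // next vertex per letter 'a'..'z'; -1 = no transition
    std::string token;             // non-empty if this vertex is final
    State() { next.fill(-1); }
};
int main() {
    std::vector<State> states(1);  // vertex 0 is the "start" vertex
    // Step 1: build the graph from the rules (in the real program, from the XML file).
    auto addRule = [&](const std::string& pattern, const std::string& token) {
        int s = 0;
        for (char c : pattern) {
            int nxt = states[s].next[c - 'a'];
            if (nxt == -1) {
                nxt = static_cast<int>(states.size());
                states[s].next[c - 'a'] = nxt;
                states.emplace_back();
            }
            s = nxt;
        }
        states[s].token = token;
    };
    addRule("a", "V-A");
    addRule("p", "C-PA");
    addRule("pp", "C-PPA");
    addRule("u", "V-U");
    // Step 2: interpret the text. Follow transitions as far as possible, emit the
    // token of the last final vertex seen, then jump back to the start vertex.
    std::string input = "appu", out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        int s = 0, lastFinal = -1;
        std::size_t lastLen = 0;
        for (std::size_t i = pos; i < input.size(); ++i) {
            s = states[s].next[input[i] - 'a'];
            if (s == -1) break;
            if (!states[s].token.empty()) { lastFinal = s; lastLen = i - pos + 1; }
        }
        if (lastFinal == -1) { ++pos; continue; }   // no rule here: issue an error or skip
        if (!out.empty()) out += " + ";
        out += states[lastFinal].token;
        pos += lastLen;
    }
    std::cout << out << '\n';                       // prints V-A + C-PPA + V-U
}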