Is there a spelling library that recognizes incomplete words? [closed] - c++

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
My use case is the following: given a string, recognize all valid words starting at the beginning of the string. For example:
blueberryqqq
should output:
blue
blueberry
To do that, I have a dictionary structure that uses a trie<char>. For example, if my dictionary consists only of the two words above, it would look like this:
b->l->u->e->\0
          ->b->e->r->r->y->\0
When I investigate my input string, the spell-checking process can tell me, as I go from letter to letter, whether:
1. I'm on the path to a valid word
2. I have found a valid word
3. I am not on the path to a valid word
Note that these are flags and both 1 and 2 can be true at the same time. With that approach, I can efficiently find both blue and blueberry in one go and stop trying immediately when I reach the y. Continuing with the example, here's what happens as I go from letter to letter:
b:1, l:1, u:1, e:1|2, b:1, e:1, r:1, r:1, y:2
When I see 1|2, I know that "blue" is a valid word, but I also know to keep going further down the string because my dictionary tells me there are more words possible. Once I reach the y, I stop. This is quite efficient: I visit each letter only once for all valid words, and I stop spell-checking as soon as the dictionary tells me there is no point in going further. Perfect!
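The walk described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the asker's C++ trie<char>; the names TrieNode, build_trie, flags, and words_at_start are made up for this example, and the two-word dictionary is the one from the question.

```python
from typing import Dict, List

ON_PATH = 1  # flag 1: longer dictionary words continue past this letter
IS_WORD = 2  # flag 2: the letters read so far form a complete word


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    """Insert every word letter by letter, marking the node where it ends."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def flags(node: TrieNode) -> int:
    """Combine the two flags; both can be set at once (1|2)."""
    result = 0
    if node.children:
        result |= ON_PATH
    if node.is_word:
        result |= IS_WORD
    return result


def words_at_start(root: TrieNode, text: str) -> List[str]:
    """Return every dictionary word that is a prefix of text, in one pass."""
    found = []
    node = root
    for i, ch in enumerate(text):
        node = node.children.get(ch)
        if node is None:          # case 3: not on the path to any word
            break
        state = flags(node)
        if state & IS_WORD:
            found.append(text[: i + 1])
        if not state & ON_PATH:   # nothing longer can match, stop early
            break
    return found


trie = build_trie(["blue", "blueberry"])
print(words_at_start(trie, "blueberryqqq"))    # ['blue', 'blueberry']
print(words_at_start(trie, "blueberriesqqq"))  # ['blue']
```

Each input letter is visited at most once, and the loop ends as soon as the trie has no matching child, which is the early-stop behaviour the question relies on.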
My problem is that my dictionary trie is built from /usr/share/dict/words, and that file does not contain the plural form of "blueberry" ("blueberries"); in general, it won't contain all the "derivatives" of all the words. So if the input string is blueberriesqqq, I will only get blue as valid.
If I were to use a spell-checking library like aspell or hunspell, as far as I can tell, I would need to spell-check all substrings individually, e.g. b, bl, blu, etc. Quite inefficient! Not only that, but I wouldn't know when to stop checking: how do I know there aren't any words that start with blueberriesqq?
So, my question becomes: is there a spell-checking library out there that would accommodate my use case?
Note that spelling suggestions wouldn't cut it. Passing blueb to aspell does not return any spelling suggestions which start with blueb. Thus, I would end my search even though there is still the possibility of more valid words down the line.

Related

how to remove string before second ":" in notepad++ [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 5 years ago.
I want to keep only the last two fields, email#email.com:namesurname, from the strings below. I know how to remove the last item after : with :.*, but how can I also remove everything at the front? Any recommendation is welcome.
jobapplication:::2017-05-29:email#email.com:namesurname
also like this one:
skills:email#email.com:namesurname
I don't know where to start, and there are around 3200 job applications.
Use the regular expression ^.+\:(?=\w+\#) to find the unwanted prefix, then replace all matches with an empty string.
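To see what that pattern removes, here is a small sketch using Python's re module instead of Notepad++ (an assumption made only for illustration; the helper name strip_prefix is hypothetical). The greedy .+ runs up to the last colon that is directly followed by word characters and a #, so everything before the e-mail field is matched.

```python
import re

# Pattern copied from the answer; re.MULTILINE lets ^ match at every line start.
# The backslashes before ":" and "#" are not required here, but they are harmless.
PREFIX = re.compile(r"^.+\:(?=\w+\#)", re.MULTILINE)


def strip_prefix(text: str) -> str:
    """Delete everything before the e-mail field on each line."""
    return PREFIX.sub("", text)


sample = (
    "jobapplication:::2017-05-29:email#email.com:namesurname\n"
    "skills:email#email.com:namesurname"
)
print(strip_prefix(sample))
# email#email.com:namesurname
# email#email.com:namesurname
```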
Have you considered recording a quick macro in which you do it once, replace or whatever, then press home home and down arrow to advance to the next line? Then you could do Run For Rest of File and it'd be done. (Make a backup first. ;)) I find the quick macro feature of Notepad++ comes in handy for this kind of thing, and easier (for some of us) to remember how to use than arcane regexes.

Notepad++ ignoring end delimiter of RegEx [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 9 years ago.
I have a JSON file. There is some information I want to delete and the process would be quite tedious to be done manually, and incredibly quick to perform through RegEx.
I want to find matches starting with, let's say, "abc" (including the quotes), followed by any set of characters (including conflicting ones like brackets), and ending with , (the comma character), a newline, and " (a double-quote character).
Although RegEx is not my strength, I have read several questions that could be related, like this one, and tried several patterns. This is the one I have the most confidence in:
"abc"(.*),^"
But it doesn't work properly. It starts fine, but the part after the (.*) is completely ignored, so the rest of the text in the document is selected instead of only what I requested.
^ doesn't mean newline. It's a "zero-length anchor" that matches the position before the first character of a line.
You want something like
"abc"(.*),\r?\n"

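The difference between the anchor ^ and a literal line break can be checked with a short sketch. It uses Python's re module rather than Notepad++'s engine, so option handling (for example ". matches newline") may differ; the sample JSON text is made up for the demonstration.

```python
import re

# Hypothetical two-entry JSON fragment, one entry per line.
text = '{\n"abc": [1, 2],\n"def": 3\n}'

# The question's pattern: "^" is a zero-width anchor, so it can never match
# directly after the comma, where the next character is the line break itself.
broken = re.compile(r'"abc"(.*),^"', re.MULTILINE)

# The answer's pattern: the line break is matched explicitly with \r?\n.
fixed = re.compile(r'"abc"(.*),\r?\n"')

print(broken.search(text))         # None
print(repr(fixed.sub('"', text)))  # '{\n"def": 3\n}'
```

Replacing the match with a single " puts back the opening quote of the next key, so the remaining entry stays intact.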
Transliteration between different writing systems [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
I need to learn how to change a transliteration of a text to another writing system. Apparently the best way would somehow involve regular expressions and perl, probably from command line? I've been using regular expressions earlier in Notepad++ and TextWrangler, so I know some basics already. If there is some really good (and relatively easy and customizable) way to do this in Ruby or something else, I can start learning that as well. There is a constant need to transliterate linguistic sample texts in my field in Uralic linguistics, where many different variants of transliteration systems are used. So it is worth investing some time.
So the material I have now consists of lines with a sentence on each line. Some lines have other data like numbers, but those should stay as they are. I want to keep the punctuation marks as they are, this is just about converting one set of unicode letter characters to another. I searched the site but a lot was about converting from ascii to unicode and so on - this is not the problem here.
So the original text is like this (in broad Finno-Ugric Transcription):
mödis ivan velöććyny pećoraö ščötövödnej kurs vylö.
And I would need it in a form like this:
мӧдiс иван велӧччыны печораӧ щӧтӧвӧднэй курс вылӧ.
This continues for some thousand lines.
There is a clear correspondence between the characters used, but it is sometimes complex and involves dealing first with some digraphs and consonant + vowel combinations, etc. As you see from the example, in some situations Latin i corresponds to Cyrillic и, but in some positions it can remain as i. Different texts have different solutions, so I would need to adjust the rules in each case. I understand I would need to run a long series of regular expressions in a very specific order to make it work. This order I will figure out myself, but I need to know what kind of tool I have to feed these rules into and how to do it.
I also often have situations where I would like to have the original sentence and the transliterated one separated by a tab, so that the lines would have a form like this:
mödis ivan velöććyny pećoraö ščötövödnej kurs vylö.	мӧдiс иван велӧччыны печораӧ щӧтӧвӧдней курс вылӧ.
Of course there are many more questions, but after learning these basics I think I can move forward independently. Learning this would help me a lot. Thanks in advance!
Niko
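One possible shape for such a tool, sketched in Python (one of several languages that would work): an ordered list of context-sensitive regex rules, followed by a plain one-to-one letter table. Every rule below is inferred only from the single sample sentence in the question and is purely illustrative, not a real transliteration standard; the names ORDERED_RULES, LETTERS, transliterate, and with_original are hypothetical. Only lowercase letters are handled.

```python
import re
import unicodedata


def nfc(s: str) -> str:
    """Normalize so that letters like ö are single code points."""
    return unicodedata.normalize("NFC", s)


# Context-sensitive rules, applied in this order before the letter table.
ORDERED_RULES = [
    (re.compile(nfc("šč")), "щ"),   # digraph first, before single letters
    (re.compile("(?<=n)e"), "э"),   # e after n, as in the sample sentence
    (re.compile("(?<!d)i"), "и"),   # i stays unchanged after d, as in the sample
]

# One-to-one letters; anything not listed (digits, punctuation) is kept as-is.
LETTERS = {
    "a": "а", "d": "д", "e": "е", "j": "й", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "t": "т", "u": "у", "v": "в", "y": "ы", "ö": "ӧ", "ć": "ч",
}
TABLE = str.maketrans({nfc(k): nfc(v) for k, v in LETTERS.items()})


def transliterate(line: str) -> str:
    line = nfc(line)
    for pattern, replacement in ORDERED_RULES:
        line = pattern.sub(replacement, line)
    return nfc(line.translate(TABLE))


def with_original(line: str) -> str:
    """Original and transliteration on one line, separated by a tab."""
    return f"{line}\t{transliterate(line)}"


print(with_original("mödis ivan velöććyny pećoraö ščötövödnej kurs vylö."))
```

Adapting this to a different text means editing the two rule tables, then running each line of the input file through with_original or transliterate.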

How to use regex to find if a text contains specific words? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions must demonstrate a minimal understanding of the problem being solved. Tell us what you've tried to do, why it didn't work, and how it should work. See also: Stack Overflow question checklist
Closed 9 years ago.
I am using a file search tool which can use regex to find files which contain certain text. My regex skills are pretty simple. (I am going to assume the file is treated like a single text with some line breaks)
Let's say I want to find files which contain these 3 words: route, boy & skill.
How do I create two regexes: one that searches for those words where each must be a whole word (white space before or after, or at the beginning or end of a line), and another where one or more of the words can be part of another word (like a substring function)?
Update
I am not interested in regex tutorials and testers. If I need one, I certainly can google for one and find dozens. This is a regex that I simply can't create but which I will use over and over in that tool. Maybe regex doesn't support what I want and a regex expert can tell me that's the case. So no amount of regex tutorials and testers is going to help. I appreciate the links but they are not going to help me here.
Try the following regular expression:
(?=.*\broute\b)(?=.*\bboy\b)(?=.*\bskill\b)
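A quick way to check this lookahead pattern is the sketch below, written with Python's re module (the asker's search tool may use a different engine). The substring variant simply drops the \b word boundaries; it is an addition for illustration, not part of the answer above. re.DOTALL is assumed so that . also crosses line breaks when the file is treated as one text, and contains_all is a hypothetical helper name.

```python
import re

# Whole-word version, copied from the answer.
WHOLE_WORDS = re.compile(r"(?=.*\broute\b)(?=.*\bboy\b)(?=.*\bskill\b)", re.DOTALL)

# Substring version: same lookaheads without the \b word boundaries.
SUBSTRINGS = re.compile(r"(?=.*route)(?=.*boy)(?=.*skill)", re.DOTALL)


def contains_all(pattern: "re.Pattern[str]", text: str) -> bool:
    """True when every lookahead succeeds from the start of the text."""
    return pattern.match(text) is not None


exact = "The boy took a route\nthat tested his skill."
partial = "The boys rerouted skillfully"

print(contains_all(WHOLE_WORDS, exact))    # True
print(contains_all(WHOLE_WORDS, partial))  # False
print(contains_all(SUBSTRINGS, partial))   # True
```

Each lookahead scans independently from the same starting position, so the three words may appear in any order.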

Regex to get string till it hits a comma [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
Let's say I have a string like this: "1367,14,243,540" (it will always have 4 numbers and only numbers, no decimal places, and they are always separated by commas).
What should the regex look like to let me pick/filter out/return, let's say, 243 from the string?
Here is your regex, if you insist on a regex: /\d+/g (the g flag is for multiple matches in JS; use Matches with the Microsoft framework). You can also use split (example using JS):
var v='123,333,445,67';
console.log('split:');
console.log(v.split(',').map(function(n){return parseInt(n);}));
console.log('\nregex:');
console.log(v.match(/\d+/g).map(function(n){return parseInt(n);}));
The numbers will be returned in an array; use the index to access the desired one, let's say 2.
Note: split is faster than regex; you can test the difference in performance using jsperf.com.
Edit: For those who are interested in the performance difference, check this link.
Note 2: map here is just for parsing the strings into integers; you can remove it if you want to keep them as strings.
Try:
^([[:digit:]]+,){2}([[:digit:]]+)
Your desired number is in capture group #2.
As one of the comments says, you shouldn't really use a regex in this case. Always try to use the appropriate tool for the job, and in this case the regex is HUGE overkill.
Your problem is easily solved like this:
$sourceString = "1367,14,243,540";
$numbers = explode(",", $sourceString);
$neededNumber = $numbers[2];
You just need to describe your string:
^(\d+,){2}(\d+)
"From the start, number followed by comma appears two times, then another number."
You can pick the number of the second group, i.e. \2 or $2.
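For comparison, here are the capture-group approach and the split approach side by side in Python (chosen only for illustration; the function names are hypothetical). The \d form of the pattern is used because Python's re module does not support POSIX classes such as [[:digit:]].

```python
import re
from typing import Optional


def third_by_regex(s: str) -> Optional[str]:
    """Two 'number,' repetitions, then the wanted number in group 2."""
    m = re.match(r"^(\d+,){2}(\d+)", s)
    return m.group(2) if m else None


def third_by_split(s: str) -> str:
    """Split on commas and take index 2 (the third field)."""
    return s.split(",")[2]


print(third_by_regex("1367,14,243,540"))  # 243
print(third_by_split("1367,14,243,540"))  # 243
```

Both return the value as a string; wrap the result in int() if a number is needed.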