splitting this line using boost or regex - c++

I am new to C++ and I am struggling to get this string to split. Looking to create a multimap that has book as the key. Noun and the definition after the "-=>>" would be a pair and so would verb and its definition. Here is the strings I can not seem to split correctly:
book|noun -=>> A set of pages.|verb -=>> To arrange something on a particular date.
bookable|adjective -=>> Can be ordered.
This is the code I am trying. I figured out this code is not properly loading the multimap, because when I print parts[0] both names are put into the same index. It seems regex is a simpler solution, but after plugging away at this for the last couple of hours I need some help.
while (getline(myfile, line)) {
string delimiters("|-=>>.");
vector<string> parts;
boost::split(parts, line, boost::is_any_of(delimiters));
name = parts[0];
partOfSpeech = parts[1];
definition = parts[2];
dictionary.emplace(make_pair(name, make_pair(partOfSpeech, definition)));
}
Any guidance or feedback is much appreciated

First split your line by |. The first resulting part is your name. Then got through all the other resulting parts and divide them into halves by -=>>. This will give you the first (partOfSpeech) and second (definition) halves of your pairs.

Related

Swift 3: extract regex matches with non matching parts

I want to analyze a string by many different patterns for numbers, dates and other strings. So I have an array of patterns I want to check in that order.
let patterns = [... "\\d{6}", "\\d{4}", "\\d" ] // to be extended :-)
let s = "IMG_123456_2006.10.03-13.52.59 Testfile_2009_5"
Starting with the first item in pattern I need a search in string s. If found, the string should be split in found parts e.g. "2006" and "2009" and the non matching parts. The remaining parts will be searched with the next pattern and so on. Assuming I already had the pattern defined for time/date in the middle which should be placed at the first item, the splitted string should look like:
"IMG_", "123456", "_", "2006.10.03-13.52.59", " Testfile_", "2009", "_", "5"
Can I use a build in functionality of regex.matches, or do I have to write everything by my own?
I already been able to find a match. But then I have to use the ranges to split the string and do it again and again for the remaining parts until no further matches are indicated. This will need a lot more calculations than I would expect using the results in match.numberOfRanges. Any small solutions available?

find a pattern in string and remove that pattern of the string from excel cells without touching the pattern in the middle of the string

I have a column which has "--" pattern in the beginning, middle and end of the string. For example:
-- myString
my -- String
myString --
I want to find these two types of cells
-- myString
myString --
and remove the "--" pattern, so it will look fine! I am an amateur user of excel but can use functions if you suggest me. It should be possible with find and use the results of the Find in Replace functions, but I do not know how to pass the results to Replace.
Please note: The answer should take care all the cells in the column, which are hundreds. One solution for changing all, not one solution for one cell.
EDIT: Just reread the request, per instruction from Gary'sStudent. This will remove all instances of "--", not only those at the beginning/end.
If the data is in A1, use the following formula:
=SUBSTITUTE(A1,"--","")
With data in A1 in B1 enter:
=IF(LEFT(A1,2)="--",MID(A1,3,9999),IF(RIGHT(A1,2)="--",MID(A1,1,LEN(A1)-2),A1))
OK, I found the answer. The answer from #Dubison helped me to find the right answer.
If the left two characters in this cell is "--" and the last two characters are "--" the substitute the "--" with "", else to nothing.
=IF(LEFT(A1,2)="--",SUBSTITUTE(A1,"--",""),IF(RIGHT(A1,2)="--",SUBSTITUTE(A1,"--",""), A1))
This will be pretty much the same with previous answers, only using simpler logic. If your strings first or last character = "-" do nothing, else replace "--" with "".
=IF(LEFT(A1,1)="-",A1,IF(RIGHT(A1,1)="-",A1, SUBSTITUTE(A1,"--","")))
UPDATE:
I noticed that I have misread the question. Above code will remove the "--" only if it is in the middle. However original question was to remove "--" only if it is at the beginning or at the end. So formula should be:
=IF(OR(LEFT(A1,2)="--",RIGHT(A1,2)="--"),SUBSTITUTE(A1,"--",""),A1)

Extract string of numbers from URL using regex PIG

I'm using PIG to generate a list of URLs that have been recently visited. In each of the URLs, there is a string of numbers that represents the product page visited. I'm trying to use a regex_extract_all() function to extract just the string of numbers, which vary in length from 6-8. The string of digits can be found directly after jobs2/view/ and usually ends with +&cd but sometimes they may end with ).
Here are a few example URLs:
(http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=en&ct=clnk&gl=ca)
(http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=en&ct=clnk&gl=ca)
(http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk)
(http://a.com/search?q=cache:aNspmG11AJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk)
(http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=cl k&gl=hk)
Here is the current regex I am using:
J = FOREACH jpage GENERATE FLATTEN(REGEX_EXTRACT_ALL(TEXTCOLUMN, '\/view\/(\d+)\+\&')) as (output:chararray)
I have also tried other forms such as:
'[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]', 'view.([0-9]+)', 'view\/([\d]+)\+',
'[0-9][0-9][0-9]+', and
'[0-9][0-9][0-9]*'; none of which work.
Can anybody assist here or have another way of going about it?
Much appreciated,
MM
Reason for"Unexpected character 'D'" is, you need to put double backslash instead of single backslash. eg just replace [\d+] to [\\d+]
Here your solution, please validate all your inputs strings
input.txt
http://a.com/search?q=cache:QD7vZRHkPQoJ:ca.xyz.com/jobs2/view/17069404+&cd=1&hl=en&ct=clnk&gl=ca
http://a.com/search?q=cache:G9323j2oNbAJ:ca.xyz.com/jobs2/view/5977065+&cd=1&hl=en&ct=clnk&gl=ca
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk
http://a.com/search?q=cache:aNspmG11AJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clnk&gl=hk
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928+&cd=2&hl=zh-TW&ct=clk&gl=hk
http://a.com/search?q=cache:aNspmG11qAJ:hk.xyz.com/jobs2/view/16988928)=2&hl=zh-TW&ct=clk&gl=hk
http://webcache.googleusercontent.com/search?q=cache:http://my.linkedin.com/jobs2/view/9919248
Updated Pigscript:
A = LOAD 'input.txt' as line;
B = FOREACH A GENERATE REGEX_EXTRACT(line,'.*/view/(\\d+)([+|&|cd|)?]+)?',1);
dump B;
(17069404)
(5977065)
(16988928)
(16988928)
(16988928)
(16988928)
I'm not familiar with PIG, but this regex will match your target:
(?<=/jobs2/view/)\d+
By using a (non-consuming) look behind, the entire match (not just a group of the match) is your number.

Capitalize first letter of words in a string

I'm having trouble figuring out how to transform a string into camel case in groovy. Say I start out with a string that looks like "1-800 FOO.BAR". Ultimately, I want this to turn into "1800FooDotBar". I've been able to get 1800FOODotBar by doing the following:
String str = "1-800 FOO.BAR"
String tempStr = str.replaceAll(/(?i)\.com/, "DotCom")
String newStr = tempStr.replaceAll(/\\W/, "")
I'm just not sure how to get rid of those capital letters in the middle. I've come across some information about a capitalize() method that should be able to help, but I'm just not familiar enough with Groovy to know how to use it. I think I need to split the string into individual strings for each word and then capitalize the first letter of each of those strings, but then how do I build the end result back up? I know that similar questions have been asked, but I'm just not seeing how to take that information and make complete Groovy code from it. Thanks in advance!
Very roughly:
String str = "1-800 FOO.BAR"
println str.replaceAll(/\./, " Dot ").split(/[^\w]/).collect { it.toLowerCase().capitalize() }.join("")
=> 1800FooDotBar

How to replace a string between two substrings in a string in VC++/MFC?

Say I have a CString object strMain="AAAABBCCCCCCDDBBCCCCCCDDDAA";
I also have two smaller strings, say strSmall1="BB";
strSmall2="DD";
Now, I want to replace all occurence of strings which occur between strSmall1("BB") and strSmall2("DD") in strMain, with say "KKKKKKK"
Is there a way to do it without Regex. I cannot use regex as adding another file to the project is prohibited.
Is there a way in VC++/MFC to do it? Or any easy algorithm you can point me to?
int length = strMain.GetLength();
int begin = strMain.Find(strSmall1, 0) + strSmall1.GetLength();
int end = strMain.Find(strSmall2, 0);
CStringT left = strMain.Left(begin);
CStringT right = strMain.Right(length - end);
strMain = left + "KKKKKKK" + right
The easiest way is probably to handle the replacement recursively. Search for the starting delimiter and the ending delimiter. If you find them, put together a new string consisting of the string up to the starting delimiter, followed by the replacement string, followed by the return from recursively doing the replacement in the remainder of the string following the ending delimiter.
That, of course, assumes you want to replace all the occurrences in the main string -- if you only want to replace the first one, John Weldon's solution (for one example) will work quite nicely.
psudocode:
loop over string
if curlocation matches string strsmall1 save index break
loop over remaining string
replace till curlocation matches string strsmall2
Extra credit:
What will the next assignment be?
My answer:
Speed it up by jumping the length of strsmall1 and strsmall2 in loop iterations