Printing elements of a tuple - c++

I am trying to print the elements of a tuple returned by a function where I am comparing the elements of a vector of addresses to those in a database. The fields are: 32-bit int representing the address, int for prefix matching, string containing ASN, string containing matching address, string containing the original address being queried.
for (auto itr = IPs.begin(); itr != IPs.end(); itr++) {
    tuple<int,int,string,string,string> entry = Compare(*itr, database);
    string out = get<3>(entry) + "/" + to_string(get<1>(entry)) + " " + get<2>(entry) + " " + get<4>(entry) + "\n";
    cout << out;
}
I want each line of the output to look like this:
"{prefix}/{# bits of prefix} {ASN} {address}\n"
However, the output looks like this:
12.105.69.1528 15314
12.125.142.190 6402
57.0.208.2450 6085
208.148.84.30 4293
208.148.84.16 4293
208.152.160.797 5003
192.65.205.2509 5400
194.191.154.806 2686
199.14.71.79 1239
199.14.70.79 1239
The expected output is:
12.105.69.144/28 15314 12.105.69.152
12.125.142.16/30 6402 12.125.142.19
57.0.208.244/30 6085 57.0.208.245
208.148.84.0/30 4293 208.148.84.3
208.148.84.0/24 4293 208.148.84.16
208.152.160.64/27 5003 208.152.160.79
192.65.205.248/29 5400 192.65.205.250
194.191.154.64/26 2686 194.191.154.80
199.14.71.0/24 1239 199.14.71.79
199.14.70.0/24 1239 199.14.70.79
The part that confuses me the most is the fact that when I print each element on separate lines by replacing each separator with line breaks, it prints the elements correctly:
12.105.69.144
28
15314
12.105.69.152
12.125.142.16
30
6402
12.125.142.19
57.0.208.244
30
6085
57.0.208.245
208.148.84.0
30
4293
208.148.84.3
208.148.84.0
24
4293
208.148.84.16
208.152.160.64
27
5003
208.152.160.79
192.65.205.248
29
5400
192.65.205.250
194.191.154.64
26
2686
194.191.154.80
199.14.71.0
24
1239
199.14.71.79
199.14.70.0
24
1239
199.14.70.79
I suppose that I could just write another function that formats the line breaks into the correct format afterwards, but I am curious about what is causing this. Any ideas?

Could you provide a little more code, so it can be debugged to precisely track the problem?
I think tuple and get are used correctly.
I guess the problem is in the content of the strings, or at least in the string returned by get<2>(entry).
Here is a little example which shows what might be wrong:
std::string aa = "AAAAA\r"; //"\r" is extra character in aa string
std::string bb = "bbb";
std::cout << aa + " " + bb; //output is " bbbA" not "AAAAA bbb"
The problem obviously doesn't occur when each string is printed on its own line.
Double-check whether the string returned by get<X> contains any special characters, such as a carriage return (\r) left over from Windows (\r\n) or classic Mac OS (\r) line endings in data that is otherwise read with Unix (\n) line endings.
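If a stray carriage return does turn out to be the cause, one way to handle it is to strip it before building the output line. This is only a sketch: it assumes the unwanted character is a trailing \r, and stripTrailingCR is a hypothetical helper name, not something from the original code.

```cpp
#include <string>

// Hypothetical helper: removes any carriage returns at the end of a field.
// Assumes the stray character is a trailing '\r' (e.g. from a CRLF file).
std::string stripTrailingCR(std::string s) {
    while (!s.empty() && s.back() == '\r') {
        s.pop_back();
    }
    return s;
}
```

It could be applied to each string field, for example stripTrailingCR(get<2>(entry)), before the fields are concatenated.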

Related

Stata Regex for 'standalone' numbers in string

I am trying to remove a specific pattern of numbers from a string using the regexr function in Stata. I want to remove any pattern of numbers that are not bounded by a character (other than whitespace), or a letter. For example, if the string contained t370 or 6-test I would want those to remain. It's only when I have numbers next to each other.
clear
input id str40 string
1 "9884 7-test 58 - 489"
2 "67-tty 783 444"
3 "j3782 3hty"
end
I would like to end up with:
ID string
1 7-test
2 67-tty
3 j3782 3hty
I've tried different regex statements to find when numbers are wrapped in a word boundary: regexr(string, "\b[0-9]+\b", ""); in addition to manually adding the white space " [0-9]+" which will only replace if the pattern occurs in the middle, not at the start of a string. If it's easier to do this without regex expressions that's fine, I was just trying to become more familiar.
Following up on the loop suggested in the comments, you could do something like the following:
clear
input id str40 string
1 "9884 7-test 58 - 489"
2 "67-tty 783 444"
3 "j3782 3hty"
end
gen N_words = wordcount(string) // # words in each string
qui sum N_words
global max_words = r(max) // max # words in all strings
split string, gen(part) parse(" ") // split string at space (p.s. space is the default)
gen string2 = ""
forval i = 1/$max_words {
    * add in parts that contain at least one letter
    replace string2 = string2 + " " + part`i' if regexm(part`i', "[a-zA-Z]") & !missing(string2)
    replace string2 = part`i' if regexm(part`i', "[a-zA-Z]") & missing(string2)
}
drop part* N_words
where the result would be
. list

     +----------------------------------------+
     | id                 string      string2 |
     |----------------------------------------|
  1. |  1   9884 7-test 58 - 489       7-test |
  2. |  2         67-tty 783 444       67-tty |
  3. |  3             j3782 3hty   j3782 3hty |
     +----------------------------------------+
Note that I have assumed that you want all words that contain at least one letter. You may need to adjust the regexm here for your specific use case.
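The same split-and-filter logic can be sketched outside Stata. The C++ version below is only an illustration under the same assumption (keep every whitespace-separated word that contains at least one letter); keepWordsWithLetters is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Keeps only the whitespace-separated words that contain at least one letter,
// joined back together with single spaces.
std::string keepWordsWithLetters(const std::string& text) {
    std::istringstream iss(text);
    std::string word;
    std::string result;
    while (iss >> word) {
        bool hasLetter = std::any_of(word.begin(), word.end(),
            [](unsigned char c) { return std::isalpha(c) != 0; });
        if (!hasLetter) {
            continue;  // purely numeric or punctuation-only word: drop it
        }
        if (!result.empty()) {
            result += ' ';
        }
        result += word;
    }
    return result;
}
```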

Regex extract number between characters

I have a returned string formatted as below:
PR ER
89
>
from which the number can be extracted by using \n(\d+), but sometimes it returns:
23 PR P 10000>
Or, it could be something like:
23
PR P
10000
>
In these scenarios, how can I extract the number 10000 between PR and >?
This might work for you:
\d+(?=\s*>)
It looks for any sequence of digits followed by any number of whitespace characters and a '>'.
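In Java, if you need it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "23 PR P 10000>";
// Digits followed by optional whitespace and '>'
Pattern reg = Pattern.compile("\\d+(?=\\s*>)");
Matcher m = reg.matcher(str);
while (m.find()) {
    System.out.println("group : " + m.group() + " - start :" + m.start() + " - end :" + m.end());
}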
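The lookahead pattern \d+(?=\s*>) can be tried in C++ as well, since the default ECMAScript grammar of std::regex supports positive lookahead. A minimal sketch, where numberBeforeGt is a hypothetical name and returning an empty string for "no match" is an assumption made for brevity:

```cpp
#include <regex>
#include <string>

// Returns the first run of digits that is followed by optional whitespace
// and '>', or an empty string when there is no such run.
std::string numberBeforeGt(const std::string& text) {
    static const std::regex pattern(R"(\d+(?=\s*>))");
    std::smatch match;
    if (std::regex_search(text, match, pattern)) {
        return match.str();
    }
    return "";
}
```

Because \s also matches newlines, the same call should cover both the single-line and the multi-line forms of the returned string.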
I might just answer this myself:
\d+\n>
worked!
Thanks, all.

Printing duplicate strings and how many times they appear in a file C++

Here is the question I have to solve and the code I've written so far.
Write a function named printDuplicates that accepts an input stream and an output stream as parameters.
The input stream represents a file containing a series of lines. Your function should examine each line looking for consecutive occurrences of the same token on the same line and print each duplicated token along with how many times it appears consecutively.
Non-repeated tokens are not printed. Repetition across multiple lines (such as if a line ends with a given token and the next line starts with the same token) is not considered in this problem.
For example, if the input file contains the following text:
hello how how are you you you you
I I I am Jack's Jack's smirking smirking smirking smirking smirking revenge
bow wow wow yippee yippee yo yippee yippee yay yay yay
one fish two fish red fish blue fish
It's the Muppet Show, wakka wakka wakka
My expected result should be:
how*2 you*4
I*3 Jack's*2 smirking*5
wow*2 yippee*2 yippee*2 yay*3
\n
wakka*3
Here is my function:
1  void printDuplicates(istream& in, ostream& out)
2  {
3      string line; // Variable to store lines in
4      while(getline(in, line)) // While there are lines to get do the following
5      {
6          istringstream iss(line); // String stream initialized with line
7          string word; // Current word
8          string prevWord; // Previous word
9          int numWord = 1; // Starting index for # of a specific word
10         while(iss >> word) // Storing strings in word variable
11         {
12             if (word == prevWord) ++numWord; // If a word and the word
13                                              // before it are equal add to word counter
14             else if (word != prevWord) // Else if the word and the word before it are not equal
15             {
16                 if (numWord > 1) // And there are at least two copies of that word
17                 {
18                     out << prevWord << "*" << numWord << " "; // Print out "word*occurrences"
19                 }
20                 numWord = 1; // Reset the num counter variable for next word
21             }
22             prevWord = word; // Set current word to previous word, loop begins again
23         }
24         out << endl; // Prints new line between each iteration of line loop
25     }
26 }
My result thus far is:
how*2
I*3 Jack's*2 smirking*5
wow*2 yippee*2 yippee*2
I have tried adding (|| iss.eof()), (|| iss.peek == EOF), etc inside the nested else if statement on Line 14, but I am unable to figure this guy out. I need some way of knowing I'm at the end of the line so my else if statement will be true and try to print the last word on the line.
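One possible approach, offered as a sketch rather than a definitive answer: instead of trying to detect the end of the line inside the loop, print the pending run once more after the inner loop finishes. The run that reaches the end of a line (you*4, yay*3, wakka*3) is never followed by a different word, so the else-if branch never fires for it. In the version below, flushRun is a hypothetical helper, numWord starts at 0 to mean "no token seen yet", and tokens are separated by a single space with no trailing space; those are choices made for this sketch, not requirements from the problem statement.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: prints "token*count" if the run that just ended
// contained at least two copies of the token.
static void flushRun(std::ostream& out, const std::string& token,
                     int count, bool& firstOnLine) {
    if (count > 1) {
        if (!firstOnLine) {
            out << ' ';
        }
        out << token << '*' << count;
        firstOnLine = false;
    }
}

void printDuplicates(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        std::string word;
        std::string prevWord;
        int numWord = 0;          // length of the current run (0 = nothing read yet)
        bool firstOnLine = true;  // no "token*count" printed on this line yet
        while (iss >> word) {
            if (numWord > 0 && word == prevWord) {
                ++numWord;        // same token again: extend the run
            } else {
                flushRun(out, prevWord, numWord, firstOnLine);
                prevWord = word;  // start a new run
                numWord = 1;
            }
        }
        // The run that reaches the end of the line is flushed here.
        flushRun(out, prevWord, numWord, firstOnLine);
        out << '\n';
    }
}
```

A line with no repeated tokens produces an empty output line, which matches the blank line in the expected result.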

Error in writing output file through AWK scripting

I have an AWK script that writes specific values matching a specific pattern to a .csv file.
The code is as follows:
BEGIN{print "Query Start,Query End, Target Start, Target End,Score, E,P,GC"}
/^\>g/ { Query=$0 }
/Query =/{
    split($0,a," ")
    query_start=a[3]
    query_end=a[5]
    query_end=gsub(/,/,"",query_end)
    target_start=a[8]
    target_end=a[10]
}
/Score =/{
    split($0,a," ")
    score=a[3]
    score=gsub(/,/,"",score)
    e=a[6]
    e=gsub(/,/,"",e)
    p=a[9]
    p=gsub(/,/,"",p)
    gc=a[12]
    printf("%s,%s,%s,%s,%s,%s,%s,%s\n",query_start, query_end,target_start,target_end,score,e,p,gc)
}
The input file is as follows:
>gi|ABCDEF|
Plus strand results:
Query = 100 - 231, Target = 100 - 172
Score = 20.92, E = 0.01984, P = 4.309e-08, GC = 51
But I received the output in a .csv file as provided below:
100 0 100 172 0 0 0 51
The program failed to copy the values of:
Query end
Score
E
P
(Note: all the failed values are present before comma (,))
Any help to obtain the right output will be great.
Best regards,
Amit
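As @Jidder mentioned, you don't need to call split(), and as @jaypal mentioned, you're using gsub() incorrectly: it modifies its target in place and returns the number of substitutions made, so assigning its result overwrites your value with a count. You also don't need to call gsub() at all if you just include , in your FS.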
Try this:
BEGIN {
    FS = "[[:space:],]+"
    OFS = ","
    print "Query Start","Query End","Target Start","Target End","Score","E","P","GC"
}
/^\>g/ { Query=$0 }
/Query =/ {
    query_start=$4
    query_end=$6
    target_start=$9
    target_end=$11
}
/Score =/ {
    score=$4
    e=$7
    p=$10
    gc=$13
    print query_start,query_end,target_start,target_end,score,e,p,gc
}
Does that work? Note that the field numbers are bumped up by 1 because, when you don't use the default FS, awk no longer skips leading white space, so there's an empty field before the leading white space in your input.
Obviously, you are not using your Query variable so the line that populates it is redundant.
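The empty leading field can be reproduced outside awk. The C++ sketch below splits a record on the same separator pattern; it assumes the input line starts with white space (as the field numbers above imply), and splitFields is a hypothetical name. Index i in the returned vector corresponds to awk's $(i+1).

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits a record on runs of white space and commas. A separator at the very
// start of the record produces an empty first field, much like awk does with
// a non-default FS.
std::vector<std::string> splitFields(const std::string& record) {
    static const std::regex separator("[[:space:],]+");
    std::vector<std::string> fields;
    std::sregex_token_iterator it(record.begin(), record.end(), separator, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        fields.push_back(it->str());
    }
    return fields;
}
```

For a record such as " Query = 100 - 231, Target = 100 - 172", the first element is empty and the query start lands at index 3, which is $4 in the awk script.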

Ruby - Unable to count the array values?

I have a program :
Question: Input a series of two-digit integers. The output should show all the input values, but the loop should stop at 42:
example
input
1
2
87
42
99
output
1
2
87
my code
a = []
5.times do |i|
  a[i] = Integer(gets.chomp)
end
a.each do |e|
  break if e == '42'
  puts e
end
Few things to change. First of all gets will give you a string together with \n at the end, so you need to change it to gets.chomp to remove it.
Now your loop should look like this:
a.each do |e|
  break if e == '42'
  puts e
end
However, Ruby's Array has a much better method that is perfect for what you want:
puts a.take_while {|e| e != '42'}
Additional notes:
Note that it is operating on strings rather than numbers. You might need to validate the input at some point and convert it into integer values.
5.times do|i| - the |i| bit is obsolete.
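The take_while idea (keep elements until the first one that fails a condition) can also be sketched in C++. This version works on integers rather than strings, in line with the note about converting the input; takeUntil is a hypothetical name.

```cpp
#include <algorithm>
#include <vector>

// Returns the leading elements of `values` up to, but not including,
// the first occurrence of `stop`.
std::vector<int> takeUntil(const std::vector<int>& values, int stop) {
    auto firstStop = std::find(values.begin(), values.end(), stop);
    return std::vector<int>(values.begin(), firstStop);
}
```

If the stop value never appears, the whole input is returned unchanged.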