Building a Directed Graph from a file - c++

I need to read in a file and build a directed graph from it. I am working in C++. The file looks like this:
SFA SLC 700 59
SFO LV 420 39
LAX LV 231 23
LV SLC 362 29
LAX SFO 344 39
LAX SLC 581 57
SFA SFO 679 67
SFO SLC 605 19
PHX DEN 586 65
LV PHX 256 21
DEN SFA 1026 72
DEN LAX 844 69
SLC DEN 379 49
SLC SJC 585 29
SJC SFO 51 19
The first line means there is a flight from SFA to SLC that is 700 miles long and costs $59, and every line follows this format. I am having a really hard time figuring out a good way to do this. Any help would be much appreciated. Thank you in advance.

I suggest using Lemon, see tutorial: http://lemon.cs.elte.hu/pub/tutorial/a00011.html
Generally speaking, you separate the structure (the graph) from the data. With Lemon, you would read each line and split it into 4 fields (the delimiter being whitespace). While reading the file you should also maintain a hash or map (e.g. std::unordered_map) to quickly look up destinations already in the graph (you could use the graph API to find them, but that would be slower).
So:
#include <lemon/list_graph.h>
#include <string>
#include <unordered_map>
using namespace lemon;
ListDigraph g;
ListDigraph::NodeMap<std::string> gDestinations(g);
ListDigraph::ArcMap<int> gCosts(g);
ListDigraph::ArcMap<int> gDistances(g);
std::unordered_map<std::string, ListDigraph::Node> destinations;
And then for each row:
// First read the line and split it by whitespace into 4 fields, e.g. into these:
std::string destFrom, destTo;
int distance, cost;
// First the structure. Add a node only if the place is not in the map yet.
auto itFrom = destinations.find(destFrom); // destFrom is the first field in your file
if (itFrom == destinations.end())
    itFrom = destinations.emplace(destFrom, g.addNode()).first;
auto itTo = destinations.find(destTo); // destTo is the second field in your file
if (itTo == destinations.end())
    itTo = destinations.emplace(destTo, g.addNode()).first;
ListDigraph::Arc route = g.addArc(itFrom->second, itTo->second);
// Then the data
gDestinations[itFrom->second] = destFrom; // could be done only when the node is created; repeated here for brevity
gDestinations[itTo->second] = destTo; // ditto
gDistances[route] = distance; // distance is the third field
gCosts[route] = cost; // cost is the fourth field
And that is it. See the tutorial and the Lemon documentation for how to use graph algorithms, manipulate the graph, etc.
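If pulling in a graph library is not an option, the same read-and-split loop can be sketched with the standard library alone. This is only an illustration under stated assumptions: the Flight struct, the FlightGraph alias and the readFlights function are hypothetical names, and a plain adjacency list (a map from origin to outgoing flights) stands in for Lemon's graph and maps.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// One outgoing flight; the field names are illustrative, not from the question.
struct Flight {
    std::string to;
    int distance; // miles (third field)
    int cost;     // dollars (fourth field)
};

// Adjacency list: origin airport -> flights leaving it.
using FlightGraph = std::unordered_map<std::string, std::vector<Flight>>;

// Reads "FROM TO DISTANCE COST" lines from any input stream.
FlightGraph readFlights(std::istream& in) {
    FlightGraph graph;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string from, to;
        int distance = 0, cost = 0;
        if (!(fields >> from >> to >> distance >> cost))
            continue; // skip blank or malformed lines
        graph[from].push_back({to, distance, cost});
        graph[to]; // make sure the destination also exists as a node
    }
    return graph;
}
```

To use it on the real file, pass a std::ifstream that has been opened successfully; every airport then appears as a key, including those with no outgoing flights.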

Related

Is there a way to put a section of a line at the start of every subsequent line using regular expressions?

I have a text file in which there is a line with the category and then all items of that category in lines below it. This is followed by 2 empty lines and then the title of the next category and more items in the category. I want to know how I could use regular expressions (specifically with Notepad++) in order to put the category at the start of each of the item's lines so I can save the file as a CSV or TAB file.
I started by isolating one of the categories as such:
Городищенский поссовет 1541
Арабовщина 535
Болтичи 11
Бриксичи 59
Великое Село 160
Гарановичи 34
Грибовщина 3
Душковцы 5
Зеленая 182
Кисели 97
Колдычево 145
Конюшовщина 16
Микуличи 31
Мостытычи 18
Насейки 5
Новоселки 45
Омневичи 53
Поручин 43
Пруды 24
Станкевичи 42
Ясенец 33
I then got as far as searching for
(.+)(поссовет)(\t\d{4}\r\n)(^.*$\r\n)
and replacing with
$1$2\t$4
which makes the first line
Арабовщина 535
turn into
Городищенский поссовет Арабовщина 535
which is what I want to happen to the rest of the lines but I couldn't get any farther.

How to determine the number of filled drums, and the room left in each drum

Not quite a homework problem, but it may as well be:
You have a long list of positive integer values stored in column A. These are packets in unit U.
A Drum can fit up to 500 U, but you cannot break up packets.
How many drums are required for any given list of values in column A?
This does not have to be the most efficient answer, processing in row order is absolutely fine.
I think you should be able to solve this with a formula, but the closest I got was
=CEILING(SUM(A1:A1000)/500;1)
Of course, this breaks up packets.
Additionally, this problem requires me to be able to find the room left in each drum used, but emphasis for this question should remain on just the number required.
This cannot be done with a single simple formula. Each drum and packet needs to be counted. However contrary to my comment, for this particular problem a spreadsheet works well, and there is no need for a macro.
First, set B2 to 500 for use in other formulas. If column A is not yet filled, use the formula =RANDBETWEEN(1,B$2) to add some values.
Column C is the main formula that determines how full each drum is. Set C2 to =A2. C3 is =IF(C2+A3>B$2,A3,C2+A3). Fill C3 down to fill the remaining rows.
For column D, set D2 to =IF(C2+A3>B$2,B$2-C2,"") and fill it down. The last row of column D uses a shorter formula, =B$2-C21; change 21 to whatever the last row is.
Finally in column E we find the answer, which is simply =COUNT(D2:D21).
Packets  Drum Size  How Full  Room left in each drum used  Number of filled drums
-------  ---------  --------  ---------------------------  ----------------------
206      500        206       294                          13
309                 309
68                  377
84                  461       39
305                 305       195
387                 387       113
118                 118
8                   126       374
479                 479       21
492                 492       8
120                 120
291                 411       89
262                 262
108                 370       130
440                 440       60
88                  88
100                 188
102                 290       210
478                 478       22
87                  87        413
For OpenOffice Calc, use semicolons ; instead of commas , in formulas.
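For comparison, the same row-order logic as columns C and D can be sketched outside a spreadsheet. This is a minimal C++ illustration, not part of the spreadsheet solution: packDrums is a hypothetical name, and it assumes every packet is a positive integer no larger than the drum capacity.

```cpp
#include <vector>

// Fills drums in row order: a packet that does not fit starts a new drum.
// Returns the room left in each drum used; the number of drums is the
// size of the returned vector.
// Assumes every packet is positive and no larger than `capacity`.
std::vector<int> packDrums(const std::vector<int>& packets, int capacity) {
    std::vector<int> roomLeft;
    int fill = 0; // how full the current drum is (column C)
    for (int p : packets) {
        if (fill > 0 && fill + p > capacity) {
            roomLeft.push_back(capacity - fill); // close the drum (column D)
            fill = 0;
        }
        fill += p;
    }
    if (fill > 0)
        roomLeft.push_back(capacity - fill); // the last, partly filled drum
    return roomLeft;
}
```

With the 20 packets from the table and a capacity of 500, this gives 13 drums, with 294 left in the first and 413 left in the last.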

How to mix regex and string in value.replace in OpenRefine / GoogleRefine?

I'm just trying to add "+33 " and remove the first "0" in a phone number like 04 35 73 84 93 (in France) to get +33 4 35 73 84 93 in a database of contacts where a field contains only the phone number.
I tried :
value.replace(/^'0'/,'+33 ')
There is no error, but the result is the same as the original.
I thought it would be very simple (I am a beginner with Open Refine), but it seems I am missing a bigger thing here!
Can anyone help? I searched quite a lot, and this seems so simple that no one is talking about it!
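One observation on the attempt above: inside the slashes the quotes are part of the pattern, so /^'0'/ looks for a quote, a 0 and another quote at the start of the value, which a phone number never contains. The sketch below shows the difference using C++ std::regex rather than GREL; the function names are hypothetical, and the corresponding GREL fix would presumably be to drop the quotes from the pattern.

```cpp
#include <regex>
#include <string>

// Replaces a leading "0" with "+33 "; other strings are returned unchanged.
std::string toInternational(const std::string& phone) {
    static const std::regex leadingZero("^0");
    return std::regex_replace(phone, leadingZero, "+33 ");
}

// The same call with the quoted pattern from the question, for comparison.
// It only matches input that literally starts with '0' including the quotes.
std::string withQuotedPattern(const std::string& phone) {
    static const std::regex quoted("^'0'");
    return std::regex_replace(phone, quoted, "+33 ");
}
```

Calling toInternational on "04 35 73 84 93" returns "+33 4 35 73 84 93", while withQuotedPattern returns the input unchanged, which matches the behaviour described in the question.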

Working with files I/O for beginners [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Hi all, I am working on a beginner's school project using file I/O in C++.
The program consists of two parts:
1) reading and processing a student data file, and writing the results to a student report file
2) modifying part 1 to calculate some statistics and writing them to another file.
For this assignment, you will be reading one input file and writing out two other files.
Your program will be run using the referenced student data file.
Part 1 Detail
Read in the student data file. This 50 record file consists of a (8-digit numeric) student id, 8 assignment's points, midterm points, final points and lab exercise points. You must again follow the syllabus specifications for the determination of the letter grade, this time, processing 50 student grades. Extra credit points are not applicable for this assignment. You will write the input student data and the results of the processing into a student report file that looks like the output shown below. In addition to the input student data, the report should contain the "total" of the assignment grades, the total and percent of all points achieved, and the letter grade. You may assume that the input data file does not contain any errant data.
The file looks like the one below:
The file that we need to read from is hyperlinked here
The student report output file should look like this:
The Student Report Output File
Student  --- Asignment Grades -- Ass Mid Fin LEx Total Pct Gr
-------- -- -- -- -- -- -- -- -- --- --- --- --- ----- --- --
56049257 16 16 20 16 12 15 12 20 115  58 123  59   355  89 B+
97201934 19 15 13 19 16 12 13 18 113  72 101  55   341  85 B
93589574 13 16 19 19 18 12  6 14 111  58 108  50   327  82 B
85404010 17 19 19 19 19 10 17 19 129  70 102  58   359  90 A-
99608681 11 15 19 19 17 10 16 19 116  42 117  57   332  83 B
84918110 11 20 18 17 12  8 12 19 109  46 122  31   308  77 C
89307179 16 16 19 18 14 17 15 19 120  56 117  52   345  86 B
09250373 15 15 18 18 11 18 17 19 120  44 106  51   321  80 B-
91909583 12 14 16 19 20 11 20 16 117  66  92  50   325  81 B-
...
Part 2 Detail
Write a summary report file that contains the average total points and average percent for all students. Also, display the number of A's, B's, C's, D's and F's for the students. Your summary output file should look something like this:
The average total points = ???
The average percent total = ??
The number of A's = ??
The number of B's = ??
The number of C's = ??
The number of D's = ??
The number of F's = ??
Additional requirements
All files must be checked for a successful open. They should also be closed when you are finished with them.
Make sure you write the student id with a leading 0, if appropriate (i.e. the 8th id).
Add headings to your output report file. They should be aligned and correctly identify the column data.
Do not use global variables, except for constants, in your solution.
For part 1, how do I duplicate the file and format it so that the new file has the headings above the data and the grades at the end of each line?
Any help in this matter would be appreciated.
Thanks in advance.
Engineering is all about converting a large complex problem into many smaller, easy to solve, problems.
Here is how I would start.
1.) Open input file.
2.) Read one line from input file.
3.) Break the input string from one line into values.
4.) Close input file.
5.) Open output file.
6.) Write results to output file.
References:
1.)File I/O
2.)std::string
3.)File I/O C
Now you're pretty much there. Take it one step at a time.
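The steps above can be sketched roughly as follows. This is only a starting point under stated assumptions: StudentRecord, parseRecord and writeReport are hypothetical names, the heading text is illustrative, and the totals, percent and letter grade are left out because they depend on the syllabus rules. The functions take generic streams so the same code works with std::ifstream and std::ofstream once those have been checked for a successful open.

```cpp
#include <array>
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// One input record, laid out as the assignment text describes:
// id, 8 assignment scores, midterm, final, lab exercises.
struct StudentRecord {
    std::string id; // kept as text so a leading 0 survives
    std::array<int, 8> assignments{};
    int midterm = 0, finalExam = 0, labExercises = 0;
};

// Step 3: break one input line into values. Returns false on a short line.
bool parseRecord(const std::string& line, StudentRecord& rec) {
    std::istringstream fields(line);
    if (!(fields >> rec.id)) return false;
    for (int& score : rec.assignments)
        if (!(fields >> score)) return false;
    return static_cast<bool>(fields >> rec.midterm >> rec.finalExam >> rec.labExercises);
}

// Steps 2 and 6: read every line from `in`, write a heading and the
// formatted records to `out`. Returns the number of records written.
int writeReport(std::istream& in, std::ostream& out) {
    out << "Student  -- Assignment Grades -- Mid Fin LEx\n";
    int count = 0;
    std::string line;
    while (std::getline(in, line)) {
        StudentRecord rec;
        if (!parseRecord(line, rec)) continue; // skip blank or short lines
        out << rec.id;
        for (int score : rec.assignments) out << ' ' << std::setw(2) << score;
        out << ' ' << std::setw(3) << rec.midterm
            << ' ' << std::setw(3) << rec.finalExam
            << ' ' << std::setw(3) << rec.labExercises << '\n';
        ++count;
    }
    return count;
}
```

Reading the id as a string rather than an int is what keeps the leading 0 of an id such as 09250373 intact in the output.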

Reading txt files with various dimensions as input for k-means algorithm program

I'm trying to develop a C++ k-means program that will be reading text files. The problem is that the text files are not uniform.
For example, data1.txt looks like
0.1
3.0
0.7
0.5
0.2
1.5
1.6
and data3.txt looks like
33 37 53
35 36 52
34 37 53
35 37 51
34 38 52
33 38 51
33 39 52
33 37 52
34 37 52
34 39 52
I'm thinking I should store the data in a vector somehow. What's the best way to develop this without forcing the user to tell the program how many dimensions each entry in the text file has? I want the user to call the program like
program data_.txt #
Where data_.txt = any text file and # = the number of clusters
You can use the boost::split() function (or this method) to get the number of entries in every line. Once you know how many entries there are, you also know how many dimensions you're dealing with. This also allows you to implement simple input validation (a constant number of columns in a file).
There is also a good method using stringstream, which splits on any whitespace. You could use the following:
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
template<typename T>
std::vector<T> split(const std::string& line) {
std::istringstream is(line);
return std::vector<T>(std::istream_iterator<T>(is), std::istream_iterator<T>());
}
Usage:
std::string line = "1.2 3.4 5.6e7";
std::vector<double> vec = split<double>(line);
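Building on that idea, a whole file can be read without the user stating the dimension: the first non-empty row fixes it, and later rows are checked against it. The following is a minimal sketch; readPoints is a hypothetical name, and throwing std::runtime_error on a mismatched row is just one possible way to report the problem.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads whitespace-separated rows of numbers from a stream.
// The first non-empty row fixes the dimension; a later row with a
// different number of columns is reported as an error.
std::vector<std::vector<double>> readPoints(std::istream& in) {
    std::vector<std::vector<double>> points;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream is(line);
        // Extra parentheses keep this from being parsed as a function declaration.
        std::vector<double> row((std::istream_iterator<double>(is)),
                                std::istream_iterator<double>());
        if (row.empty()) continue; // skip blank lines
        if (!points.empty() && row.size() != points.front().size())
            throw std::runtime_error("inconsistent number of columns");
        points.push_back(row);
    }
    return points;
}
```

After the call, points.front().size() gives the number of dimensions, so data1.txt yields 1 and data3.txt yields 3 with no extra command-line argument.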