Sorting a file with 55K rows and varying columns - C++

I want to find a programmatic solution using C++.
I have 900 files, each 27 MB in size (just to give a sense of the scale).
Each file has 55K rows and a varying number of columns, but the header line names the columns.
I want to sort the rows by the value in one particular column.
I wrote my own sorting algorithm for this (a newbie attempt, admittedly).
It works for small inputs but fails for larger ones.
Here is the code.
Helper functions I defined for use in the main code:
int getNumberOfColumns(const string& aline)
{
    int ncols=0;
    istringstream ss(aline);
    string s1;
    while(ss>>s1) ncols++;
    return ncols;
}
vector<string> getWordsFromSentence(const string& aline)
{
    vector<string> words;
    istringstream ss(aline);
    string tstr;
    while(ss>>tstr) words.push_back(tstr);
    return words;
}
bool findColumnName(vector<string> vs, const string& colName)
{
    vector<string>::iterator it = find(vs.begin(), vs.end(), colName);
    if ( it != vs.end())
        return true;
    else return false;
}
int getIndexForColumnName(vector<string> vs, const string& colName)
{
    if ( !findColumnName(vs,colName) ) return -1;
    else {
        vector<string>::iterator it = find(vs.begin(), vs.end(), colName);
        return it - vs.begin();
    }
}
////////// I like recursive functions - I tried to create a recursive function
///here. This worked for small inputs, say 20 rows. But for 55K rows it core dumps.
void sort2D(vector<string>vn, vector<string> &srt, int columnIndex)
{
    vector<double> pVals;
    for ( int i = 0; i < vn.size(); i++) {
        vector<string>meancols = getWordsFromSentence(vn[i]);
        pVals.push_back(stringToDouble(meancols[columnIndex])); // stringToDouble: helper not shown
    }
    srt.push_back(vn[max_element(pVals.begin(), pVals.end())-pVals.begin()]);
    if (vn.size() > 1 ) {
        vn.erase(vn.begin()+(max_element(pVals.begin(), pVals.end())-pVals.begin()) );
        vector<string> vn2 = vn;
        //cout<<srt[srt.size() -1 ]<<endl;
        sort2D(vn2 , srt, columnIndex);
    }
}
Now the main code:
for ( int i = 0; i < TissueNames.size() -1; i++)
{
    for ( int j = i+1; j < TissueNames.size(); j++)
    {
        //string fname = path+"/gse7307_Female_rma"+TissueNames[i]+"_"+TissueNames[j]+".txt";
        //string fname2 = sortpath2+"/gse7307_Female_rma"+TissueNames[i]+"_"+TissueNames[j]+"Sorted.txt";
        string fname = path+"/gse7307_Male_rma"+TissueNames[i]+"_"+TissueNames[j]+".txt";
        string fname2 = sortpath2+"/gse7307_Male_rma"+TissueNames[i]+"_"+TissueNames[j]+"4Columns.txt";
        BioInputStream fin(fname);
        string aline;
        getline(fin, aline); // read the header line
        replace (aline.begin(), aline.end(), '"',' ');
        string headerline = aline;
        vector<string> header = getWordsFromSentence(aline);
        int pindex = getIndexForColumnName(header,"p-raw");
        int xcindex = getIndexForColumnName(header,"xC");
        int xeindex = getIndexForColumnName(header,"xE");
        int prbindex = getIndexForColumnName(header,"X");
        string newheaderline = "X\txC\txE\tp-raw";
        BioOutputStream fsrt(fname2);
        int newpindex=3;
        vector<string> AllLinesInFile, SortedLines;
        while ( getline(fin, aline) ){
            replace (aline.begin(), aline.end(), '"',' ');
            istringstream ss2(aline);
            string tstr;
            tstr = ss2.str().substr(tstr.length()+1);
            vector<string> words = getWordsFromSentence(tstr);
            string values = words[prbindex]+"\t"+words[xcindex]+"\t"+words[xeindex]+"\t"+words[pindex];
            AllLinesInFile.push_back(values); // collect the 4 selected columns
        }
        sort2D(AllLinesInFile, SortedLines,newpindex);
        for ( int si = 0; si < SortedLines.size(); si++)
            fsrt<<SortedLines[si]<<endl; // write the sorted rows
        cout<<"["<<i<<","<<j<<"] = "<<SortedLines.size()<<endl;
    }
}
Can someone suggest a better way of doing this?
Why is it failing for larger inputs?
The function of primary interest for this question is sort2D.
Thanks for your time and patience.

I'm not sure why your code is crashing, but recursion in that case is only going to make the code less readable. I doubt it's a stack overflow, however, because you're not using much stack space in each call.
C++ already has std::sort, why not use that instead? You could do it like this:
// functor to compare 2 strings
// (std::binary_function is deprecated since C++11 and removed in C++17;
//  on a modern compiler, simply drop the base class)
class CompareStringByValue : public std::binary_function<string, string, bool>
{
public:
    CompareStringByValue(int columnIndex) : idx_(columnIndex) {}
    bool operator()(const string& s1, const string& s2) const
    {
        double val1 = stringToDouble(getWordsFromSentence(s1)[idx_]);
        double val2 = stringToDouble(getWordsFromSentence(s2)[idx_]);
        return val1 < val2;
    }
private:
    int idx_;
};
To then sort your lines, you would call:
std::sort(vn.begin(), vn.end(), CompareStringByValue(columnIndex));
Now, there is one problem. This will be slow because stringToDouble and getWordsFromSentence are called multiple times on the same string. You would probably want to generate a separate vector that holds the precalculated value of each string, and then have CompareStringByValue just use that vector as a lookup table.
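A minimal sketch of the lookup-table idea (a decorate-sort-undecorate pattern): each line's key is parsed once into a (value, position) pair, the pairs are sorted, and the lines are rearranged to match. It assumes whitespace-separated columns and a numeric value in the chosen column; `columnValue` and `sortByColumn` are hypothetical names, and `std::stod` stands in for the unshown `stringToDouble` (it throws on non-numeric text such as `NA`).

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: numeric value of the whitespace-separated field at
// columnIndex (stands in for stringToDouble(getWordsFromSentence(s)[idx])).
double columnValue(const std::string& line, std::size_t columnIndex) {
    std::istringstream ss(line);
    std::string field;
    for (std::size_t i = 0; i <= columnIndex; ++i) {
        if (!(ss >> field)) return 0.0;   // assumption: a missing column sorts as 0
    }
    return std::stod(field);
}

// Sorts lines in ascending order of the value in columnIndex.
// Every line is parsed exactly once; the comparator only compares cached doubles.
void sortByColumn(std::vector<std::string>& lines, std::size_t columnIndex) {
    std::vector<std::pair<double, std::size_t>> keys;   // (value, original position)
    keys.reserve(lines.size());
    for (std::size_t i = 0; i < lines.size(); ++i)
        keys.emplace_back(columnValue(lines[i], columnIndex), i);

    std::stable_sort(keys.begin(), keys.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });

    std::vector<std::string> sorted;
    sorted.reserve(lines.size());
    for (const auto& k : keys)
        sorted.push_back(std::move(lines[k.second]));
    lines = std::move(sorted);
}
```

sort2D emits the largest value first; for that order, compare with `a.first > b.first` instead.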
Another way you can do this is to insert the strings into a std::multimap<double, std::string>. Just insert the entries as (value, str) and then read them out line by line. This is simpler but slower (though it has the same big-O complexity).
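A sketch of the multimap variant under the same assumptions (whitespace-separated columns, a numeric value in every row); `sortViaMultimap` is a hypothetical name. Iterating a std::multimap visits keys in ascending order, and entries with equal keys stay in insertion order.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Orders lines by the numeric value in columnIndex via std::multimap.
// Assumes every line has a numeric field at columnIndex (std::stod throws otherwise).
std::vector<std::string> sortViaMultimap(const std::vector<std::string>& lines,
                                         std::size_t columnIndex) {
    std::multimap<double, std::string> byValue;
    for (const auto& line : lines) {
        std::istringstream ss(line);
        std::string field;
        for (std::size_t i = 0; i <= columnIndex; ++i)
            ss >> field;                          // ends on the field at columnIndex
        byValue.emplace(std::stod(field), line);  // insert as (value, str)
    }
    std::vector<std::string> out;
    out.reserve(lines.size());
    for (const auto& entry : byValue)             // ascending key order
        out.push_back(entry.second);
    return out;
}
```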
EDIT: Cleaned up some incorrect code and derived from binary_function.

You could try a method that doesn't involve recursion. If your program crashes when sort2D is given large inputs, then you're probably overflowing the stack (a danger of recursion with a large number of nested calls). Try another sorting method, maybe using a loop.
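A sketch of what "using a loop" could look like: the same strategy as sort2D (repeatedly take the row with the largest value), written iteratively so the call stack no longer grows with the number of rows. It is still O(N^2), so for 55K rows std::sort is the better choice. `sort2DIterative` is a hypothetical name, and the column is parsed with `std::stod`, assuming every row has a numeric value there.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Loop-based version of sort2D's strategy: repeatedly move the line with the
// largest value in columnIndex to srt (descending order), without recursion.
void sort2DIterative(std::vector<std::string> vn, std::vector<std::string>& srt,
                     std::size_t columnIndex) {
    // Parse each line's column once instead of on every pass.
    std::vector<double> pVals;
    pVals.reserve(vn.size());
    for (const auto& line : vn) {
        std::istringstream ss(line);
        std::string field;
        for (std::size_t i = 0; i <= columnIndex; ++i)
            ss >> field;                     // assumes the column exists
        pVals.push_back(std::stod(field));
    }
    while (!vn.empty()) {
        std::size_t best = 0;                // first occurrence of the maximum
        for (std::size_t i = 1; i < pVals.size(); ++i)
            if (pVals[i] > pVals[best]) best = i;
        srt.push_back(std::move(vn[best]));
        // O(1) removal: move the last element into the gap, then shrink.
        // (Ties may therefore come out in a different order than sort2D's.)
        if (best != vn.size() - 1) {
            vn[best] = std::move(vn.back());
            pVals[best] = pVals.back();
        }
        vn.pop_back();
        pVals.pop_back();
    }
}
```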

sort2D crashes because every call makes its own copies of the strings left to sort (vn is passed by value, and vn2 is yet another copy), so across the whole recursion you are in effect using O(N^2) memory. If you really want to keep your recursive function, simply pass vn by reference and don't bother with vn2. And if you don't want to modify the original vn, move the body of sort2D into another function (say, sort2Drecursive) and call that from sort2D.
You might want to take another look at sort2D in general, since you are doing O(N^2) work for something that should take O(N log N).
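A sketch of the restructuring this answer suggests, assuming a hypothetical `columnValue` helper in place of the asker's parsing: the public `sort2D` copies its input once, and `sort2Drecursive` works on that single copy by reference, so memory stays O(N). The recursion is still one call per row, though, so a 55K-row file can still exhaust the stack.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: numeric value of the whitespace-separated column.
double columnValue(const std::string& line, std::size_t columnIndex) {
    std::istringstream ss(line);
    std::string field;
    for (std::size_t i = 0; i <= columnIndex; ++i)
        ss >> field;                       // assumes the column exists
    return std::stod(field);
}

// Recursive worker: vn is taken by reference, so no per-call copies are made.
void sort2Drecursive(std::vector<std::string>& vn, std::vector<std::string>& srt,
                     std::size_t columnIndex) {
    if (vn.empty()) return;
    // Same selection as sort2D: the row with the largest value goes out first.
    auto best = std::max_element(vn.begin(), vn.end(),
        [columnIndex](const std::string& a, const std::string& b) {
            return columnValue(a, columnIndex) < columnValue(b, columnIndex);
        });
    srt.push_back(*best);
    vn.erase(best);
    sort2Drecursive(vn, srt, columnIndex);  // still one stack frame per row
}

// Public entry point: vn is copied once here, so the caller's vector is untouched.
void sort2D(std::vector<std::string> vn, std::vector<std::string>& srt,
            std::size_t columnIndex) {
    sort2Drecursive(vn, srt, columnIndex);
}
```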

The problem is less your code than the tool you chose for the job. This is purely a text-processing problem, so choose a tool that is good at that. In this case, on Unix the best tool for the job is Bash and the GNU coreutils. On Windows you can use PowerShell, Python or Ruby. Python and Ruby will work on any Unix-flavoured machine too, but practically all Unix machines have Bash and the coreutils installed.
Let $FILES hold the list of files to process, delimited by whitespace. Here's the code for Bash:
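for FILE in $FILES; do
    echo "Processing file $FILE ..."
    # keep the header line, sort the rest (add e.g. -k4,4g to sort numerically on column 4)
    { head -n 1 "$FILE"; tail -n +2 "$FILE" | sort; } > "$FILE.tmp"
    mv "$FILE.tmp" "$FILE"
done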


C++ - checking a string for all values in an array

I have some parsed text from the Vision API, and I'm filtering it using keywords, like so:
if (finalTextRaw.find("File") != finalTextRaw.npos)
LogMsg("Found Menubar");
E.g., if the keyword "File" is found anywhere within the string finalTextRaw, then the function is interrupted and a log message is printed.
This method is very reliable. But I've inefficiently just made a bunch of if-else-if statements in this fashion, and as I'm finding more words that need filtering, I'd rather be a little more efficient. Instead, I'm now getting a string from a config file, and then parsing that string into an array:
string filterWords = GetApp()->GetFilter();
std::replace(filterWords.begin(), filterWords.end(), ',', ' '); ///replace ',' with ' '
vector<string> array;
stringstream ss(filterWords);
string temp;
while (ss >> temp)
    array.push_back(temp); ///create an array of filtered words
And I'd like to have just one if statement for checking that string against the array, instead of many of them for checking the string against each keyword I'm having to manually specify in the code. Something like this:
if (finalTextRaw.find(array) != finalTextRaw.npos)
LogMsg("Found filtered word");
Of course, that syntax doesn't work, and it's surely more complicated than that, but hopefully you get the idea: if any words from my array appear anywhere in that string, that string should be ignored and a log message printed instead.
Any ideas how I might fashion such a function? I'm guessing it's going to necessitate some kind of loop.
Borrowing from Thomas's answer, a range-based for loop offers a neat solution:
for (const auto &word : words)
{
    if (finalTextRaw.find(word) != std::string::npos)
    {
        // word is found.
        // do stuff here or call a function.
        break; // stop the loop.
    }
}
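The same loop can be wrapped in a small helper with std::any_of, which also stops at the first match. This is a sketch assuming the keywords are already in a std::vector<std::string>; `containsAnyKeyword` is a hypothetical name. Like find(), it matches substrings, so "File" also matches "Filename", and an empty keyword matches everything.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if any keyword occurs anywhere in text (substring match,
// like the original find() calls). std::any_of stops at the first hit.
bool containsAnyKeyword(const std::string& text,
                        const std::vector<std::string>& keywords) {
    return std::any_of(keywords.begin(), keywords.end(),
                       [&text](const std::string& word) {
                           return text.find(word) != std::string::npos;
                       });
}
```

Usage would then be a single if statement, e.g. `if (containsAnyKeyword(finalTextRaw, keywords)) LogMsg("Found filtered word");`.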
As pointed out by Thomas, the most efficient way is to split both texts into lists of words, then use std::set_intersection to find the words that occur in both lists. You can use std::vector as long as it is sorted. You end up with O(n*log(n)) (with n = max words), rather than O(n*m).
Split sentences into words:
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

auto split(std::string_view sentence) {
    auto result = std::vector<std::string>{};
    auto stream = std::istringstream{std::string{sentence}};
    std::copy(std::istream_iterator<std::string>(stream),
              std::istream_iterator<std::string>(), std::back_inserter(result));
    return result;
}
Find words existing in both lists. std::set_intersection only works on sorted ranges (like sets or sorted vectors), so the function sorts its by-value copies first.
auto intersect(std::vector<std::string> a, std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    auto result = std::vector<std::string>{};
    std::set_intersection(a.cbegin(), a.cend(),
                          b.cbegin(), b.cend(),
                          std::back_inserter(result));
    return result;
}
Example of how to use it:
int main() {
    const auto result = intersect(split("hello my name is mister raw"),
                                  split("this is the final raw text"));
    for (const auto& word: result) {
        // do something with word
    }
}
Note that this makes sense when working with a large or unknown number of words. If you know the limits, you might want to use simpler solutions (provided by other answers).
You could use a basic, brute-force loop:
unsigned int quantity_words = array.size();
for (unsigned int i = 0; i < quantity_words; ++i)
{
    std::string word = array[i];
    if (finalTextRaw.find(word) != std::string::npos)
    {
        // word is found.
        // do stuff here or call a function.
        break; // stop the loop.
    }
}
The above loop takes each word in the array and searches the finalTextRaw for the word.
There are better methods using some std algorithms. I'll leave that for other answers.
Edit 1: maps and association
The above code is bothering me because there are too many passes through the finalTextRaw string.
Here's another idea:
Create a std::set using the words in finalTextRaw.
For each word in your array, check for existence in the set.
This reduces the quantity of searches (it's like searching a tree).
You should also investigate creating a set of the words in array and finding the intersection between the two sets.
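A sketch of the set-based idea, assuming the words in finalTextRaw are separated by whitespace; `findFilteredWord` is a hypothetical name. Unlike the find() loop, this only matches whole words, so "Filename" does not count as a hit for "File".

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Builds a set of the whitespace-separated words in text once, then checks
// each keyword with a logarithmic-time lookup. Returns the first keyword
// found, or an empty string if none is present.
std::string findFilteredWord(const std::string& text,
                             const std::vector<std::string>& keywords) {
    std::set<std::string> words;
    std::istringstream ss(text);
    for (std::string w; ss >> w;)
        words.insert(w);
    for (const auto& k : keywords)
        if (words.count(k) != 0)
            return k;
    return "";
}
```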

C++ - Get the "difference" of 2 strings like git

I'm currently working on a project that includes a Win32 console program on my Windows 10 PC and an app for my Windows 10 Mobile phone. It's about controlling the master and audio session volumes on my PC from the app on my Windows Phone.
The "little" problem I have right now is to get the "difference" between 2 strings.
Let's take these 2 strings for example:
std::string oldVolumes = "MASTER:50:SYSTEM:50:STEAM:100:UPLAY:100";
std::string newVolumes = "MASTER:30:SYSTEM:50:STEAM:100:ROCKETLEAGUE:80:CHROME:100";
Now I want to compare these 2 strings. Let's say I explode each string into a vector with ":" as the delimiter (I have a function named explode that cuts the given string at each delimiter and pushes each piece into a vector).
Good enough. But as you can see, in the old string there's UPLAY with the value 100, but it's missing in the new string. Also, there are 2 new values (RocketLeague and Chrome), which are missing in the old one. But not only the "audio sessions/names" are different, the values are different too.
What I want now is this: for each session that is in both strings (like MASTER and SYSTEM), compare the values, and if the new value is different from the old one, append this change to another string, like:
std::string volumeChanges = "MASTER:30"; // Cause Master is changed, System not
If there's a session in the old string, but not in the new one, I want to append:
std::string volumeChanges = "MASTER:30:REMOVE:UPLAY";
If there's a session in the new one, which is missing in the old string, I want to append it like that:
The volumeChanges string is just to show you what I need. I'll try to make a better one afterwards.
Do you have any ideas of how to implement such a comparison? I don't need a specific code example, just some ideas of how I could do that in theory. It's a bit like Git: if you make changes in a text file, you see the deleted text in red and the added text in green. I need something similar, just with strings or vectors of strings.
Lets say I explode each string to a vector with the ":" as delimiter (I have a function named explode to cut the given string by the delimiter and write the string before into a vector).
I'd advise you to extend that logic further and separate them into property objects that each hold a name and a value:
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <set>
#include <string>
struct property {
    std::string name;
    std::int32_t value;
    bool same_name(property const& o) const {
        return name == o.name;
    }
    bool same_value(property const& o) const {
        return value == o.value;
    }
    bool operator==(property const& o) const {
        return same_name(o) && same_value(o);
    }
    bool operator<(property const& o) const {
        if(!same_name(o)) return name < o.name;
        else return value < o.value;
    }
};
This will dramatically simplify the logic needed to work out which properties were changed/added/removed.
The logic for "tokenizing" this kind of string isn't too difficult:
std::set<property> tokenify(std::string input) {
    bool finding_name = true;
    property prop;
    std::set<property> properties;
    while (input.size() > 0) {
        auto colon_index = input.find(':');
        if (finding_name) {
            prop.name = input.substr(0, colon_index);
            finding_name = false;
        }
        else {
            prop.value = std::stoi(input.substr(0, colon_index));
            finding_name = true;
            properties.insert(prop);
        }
        if(colon_index == std::string::npos)
            break;
        input = input.substr(colon_index + 1);
    }
    return properties;
}
Then, the function to get the difference:
std::string get_diff_string(std::string const& old_props, std::string const& new_props) {
    std::set<property> old_properties = tokenify(old_props);
    std::set<property> new_properties = tokenify(new_props);
    std::string output;
    //We first scan for properties that were either removed or changed
    for (property const& old_property : old_properties) {
        auto predicate = [&](property const& p) {
            return old_property.same_name(p);
        };
        auto it = std::find_if(new_properties.begin(), new_properties.end(), predicate);
        if (it == new_properties.end()) {
            //We didn't find the property, so we need to indicate it was removed
            output.append("REMOVE:" + old_property.name + ':');
        }
        else if (!it->same_value(old_property)) {
            //Found the property, but the value changed.
            output.append(it->name + ':' + std::to_string(it->value) + ':');
        }
    }
    //Finally, we need to see which were added.
    for (property const& new_property : new_properties) {
        auto predicate = [&](property const& p) {
            return new_property.same_name(p);
        };
        auto it = std::find_if(old_properties.begin(), old_properties.end(), predicate);
        if (it == old_properties.end()) {
            //We didn't find the property, so we need to indicate it was added
            output.append("ADD:" + new_property.name + ':' + std::to_string(new_property.value) + ':');
        }
        //The previous loop detects changes, so we don't need to bother here.
    }
    if (output.size() > 0)
        output = output.substr(0, output.size() - 1); //Trim off the last colon
    return output;
}
And we can demonstrate that it's working with a simple main function:
int main() {
    std::string diff_string = get_diff_string("MASTER:50:SYSTEM:50:STEAM:100:UPLAY:100", "MASTER:30:SYSTEM:50:STEAM:100:ROCKETLEAGUE:80:CHROME:100");
    std::cout << "Diff String was \"" << diff_string << '\"' << std::endl;
}
Which yields an output (according to
Which, although the contents are in a slightly different order than your example, still contains all the correct information. The contents are in different order because std::set implicitly sorted the attributes by name when tokenizing the properties; if you want to disable that sorting, you'd need to use a different data structure which preserves entry order. I chose it because it eliminates duplicates, which could cause odd behavior otherwise.
In this particular instance, you could do it as follows:
Split the old and new strings by the delimiter, and store the results in a vector.
Loop over the vector with the old data. Look for each word in the vector with new data: e.g. find("MASTER").
If not found add "REMOVE:MASTER" to your results.
If found, compare the numbers and add it to the results if it has been changed.
The added string can be found by looping over the new string and searching for the words in the old string.
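A sketch of these steps, with one swap: std::map<std::string, int> stands in for the vectors, so each lookup is a map find instead of a linear search. It assumes well-formed "NAME:VALUE:..." input with integer values; `parseVolumes` and `volumeChanges` are hypothetical names, and the "ADD:NAME:VALUE" format for new sessions is an assumption for illustration. Because std::map is ordered, entries come out sorted by name rather than in the original order.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses "NAME:VALUE:NAME:VALUE..." into name -> value.
// Assumes names and integer values strictly alternate.
std::map<std::string, int> parseVolumes(const std::string& s) {
    std::map<std::string, int> out;
    std::istringstream ss(s);
    std::string name, value;
    while (std::getline(ss, name, ':') && std::getline(ss, value, ':'))
        out[name] = std::stoi(value);
    return out;
}

// Changed and removed sessions come from a pass over the old data,
// added sessions from a pass over the new data.
std::string volumeChanges(const std::string& oldStr, const std::string& newStr) {
    const auto oldVals = parseVolumes(oldStr);
    const auto newVals = parseVolumes(newStr);
    std::string out;
    auto append = [&out](const std::string& part) {
        if (!out.empty()) out += ':';
        out += part;
    };
    for (const auto& [name, value] : oldVals) {
        auto it = newVals.find(name);
        if (it == newVals.end())
            append("REMOVE:" + name);                           // session disappeared
        else if (it->second != value)
            append(name + ':' + std::to_string(it->second));  // value changed
    }
    for (const auto& [name, value] : newVals)
        if (oldVals.find(name) == oldVals.end())
            append("ADD:" + name + ':' + std::to_string(value));  // new session
    return out;
}
```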
I suggest that you enumerate some features (in your case, for example: UPLAY is present, REMOVE is present, ...).
For each of those, assign a weight that counts whenever the two strings differ on that feature.
At the end, sum up the weights of the features present in one string and absent in the other to get a single number.
This number should represent what you are looking for.
You can adjust the weights until you are satisfied with the result.
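A rough sketch of the weighted-feature idea, assuming the features are simply "session NAME is present" and that the weights come from a hypothetical lookup table (unlisted features count as 1); `weightedDifference` is a hypothetical name.

```cpp
#include <map>
#include <set>
#include <string>

// Sums the weight of every feature (here: a session name) that is present in
// exactly one of the two sets. Features missing from weights count as 1.0.
double weightedDifference(const std::set<std::string>& a,
                          const std::set<std::string>& b,
                          const std::map<std::string, double>& weights) {
    auto weightOf = [&weights](const std::string& f) {
        auto it = weights.find(f);
        return it == weights.end() ? 1.0 : it->second;
    };
    double score = 0.0;
    for (const auto& f : a)
        if (b.count(f) == 0) score += weightOf(f);   // only in a
    for (const auto& f : b)
        if (a.count(f) == 0) score += weightOf(f);   // only in b
    return score;
}
```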
Maybe my answer will give you some new thoughts. In fact, by tweaking the current code, you can find all the missing words.
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> splitString(const std::string& str, const char delim)
{
    std::vector<std::string> out;
    std::stringstream ss(str);
    std::string s;
    while (std::getline(ss, s, delim)) {
        out.push_back(s);
    }
    return out;
}

std::vector<std::string> missingWords(const std::string& first, const std::string& second)
{
    std::vector<std::string> missing;
    const auto firstWords = splitString(first, ' ');
    const auto secWords = splitString(second, ' ');
    size_t i = 0, j = 0;
    for(; i < firstWords.size(); ++i){
        auto findSameWord = std::find(secWords.begin() + j, secWords.end(), firstWords[i]);
        if(findSameWord == secWords.end()) {
            missing.push_back(firstWords[i]);
        } else {
            j = std::distance(secWords.begin(), findSameWord);
        }
    }
    return missing;
}

Working with big text files

I have a file in the following format:
The number between brackets is the id of the entity. There are around 4500 entities. I need to parse through all entities and pick the ones matching my parameters and values. The file is around 20 MB. My first approach was to read the file line by line and store the entries in a list of structs like:
struct Component{
    std::string parameter;
    std::string value;
};
struct Entity{
    std::string id;
    std::list<Component> components;
};
std::list<Entity> g_entities;
But this approach took an enormous amount of memory and was very slow. I've also tried storing only the entities that match my parameters/values, but that was also really slow and took quite some memory. Ideally I would like to store all the data in memory, so that I won't have to load the file every time I need to filter my parameters/values, if that's possible with a reasonable amount of memory.
Edit 1:
I read the file line by line:
std::ifstream readTemp(filePath);
std::stringstream dataStream;
dataStream << readTemp.rdbuf();
std::string line;
while (std::getline(dataStream, line)){
    if (line.find('[') != std::string::npos){
        // Create Entity
        Entity entity;
        // Set entity id
        entity.id = line.substr(line.find('[') + 1, line.find(']') - 1);
        // Read all lines until EnumEnd=0
        while (1){
            std::getline(dataStream, line);
            // Break loop if end of entity
            if (line.find("EnumEnd=0") != std::string::npos){
                if (CheckMatch(entity))
                    g_entities.push_back(entity);
                break;
            }
            Component comp;
            int pos_eq = line.find('=');
            comp.parameter = line.substr(0, pos_eq);
            comp.value = line.substr(pos_eq + 1);
            entity.components.push_back(comp);
        }
    }
}
PS: After your edit and your comment concerning memory consumption:
500MB / 20MB = 25.
If each line is 25 chars long, the memory consumption looks OK.
OK, you could use a look-up table for mapping parameter names to numbers.
If the set of names is small, this could cut the memory consumption by up to half.
Your data structure could look like this:
std::map<int, std::map<int, std::string> > my_ini_file_data;
std::map<std::string, int> param_to_idx;
(provided the parameter names within sections (entities, as you call them) are not unique)
To store the data:
std::string param = "Param";
std::string value = "Val";
int entity_id = 0;
if ( param_to_idx.find(param) == param_to_idx.end() )
{
    int idx = param_to_idx.size(); // read the size before operator[] inserts the new key
    param_to_idx[param] = idx;
}
my_ini_file_data[entity_id][ param_to_idx[param] ] = value;
To get the data:
value = my_ini_file_data[entity_id][ param_to_idx[param] ];
If the values-set is also considerably smaller than the number of entries,
you could even map values to numbers:
std::map<int, std::map<int, int> > my_ini_file_data;
std::map<std::string, int> param_to_idx;
std::map<std::string, int> value_to_idx;
std::map<int, std::string> idx_to_value;
To store the data:
std::string param = "Param";
std::string value = "Val";
int entity_id = 0;
if ( param_to_idx.find(param) == param_to_idx.end() )
{
    int idx = param_to_idx.size(); // read the size before operator[] inserts the new key
    param_to_idx[param] = idx;
}
if ( value_to_idx.find(value) == value_to_idx.end() )
{
    int idx = value_to_idx.size();
    value_to_idx[value] = idx;
    idx_to_value[idx] = value;
}
my_ini_file_data[entity_id][ param_to_idx[param] ] = value_to_idx[value];
To get the data:
value = idx_to_value[my_ini_file_data[entity_id][ param_to_idx[param] ] ];
Hope, this helps.
Initial answer
Concerning memory, I wouldn't care unless you have a kind of embedded system with very small memory.
Concerning the speed, I could give you some suggestions:
Find out what the bottleneck is.
Use std::list! A std::vector reallocates and copies its elements whenever it outgrows its capacity. If for some reason you need a vector at the end, create the vector reserving the required number of entries, which you'll get by calling list::size().
Write a while loop that only calls getline. If this alone is already slow, read the entire block at once, create a reader stream out of the char* block and read line by line from that stream.
If the speed of the simple reading is OK, optimize your parsing code. You can reduce the number of find calls by storing the positions, e.g.:
std::string::size_type pos_begin = line.find('[');
if (pos_begin != std::string::npos){
    std::string::size_type pos_end = line.find(']');
    if (pos_end != std::string::npos){
        Entity entity;
        entity.id = line.substr(pos_begin + 1, pos_end - pos_begin - 1);
        // Read all lines until EnumEnd=0
        while (1){
            std::getline(readTemp, line);
            // Break loop if end of entity
            if (line.find("EnumEnd=0") != std::string::npos){
                if (CheckMatch(entity))
                    g_entities.push_back(entity);
                break;
            }
            Component comp;
            std::string::size_type pos_eq = line.find('=');
            comp.parameter = line.substr(0, pos_eq);
            comp.value = line.substr(pos_eq + 1);
            entity.components.push_back(comp);
        }
    }
}
Depending on how big your entities are, check whether CheckMatch is slow: the smaller the entities, the more often CheckMatch is called, so the more its cost matters.
You can use less memory by interning your params and values, so as not to store multiple copies of them.
You could have a map of strings to unique numeric IDs, that you create when loading the file, and then just use the IDs when querying your data structure. At the expense of possibly slower parsing initially, working with these structures afterwards should be faster, as you'd only be matching 32-bit integers rather than comparing strings.
Sketchy proof of concept for storing each string once:
#include <unordered_map>
#include <string>
#include <iostream>
using namespace std;

int string_id(const string& s) {
    static unordered_map<string, int> m;
    static int id = 0;
    auto it = m.find(s);
    if (it == m.end()) {
        m[s] = ++id;
        return id;
    } else {
        return it->second;
    }
}

int main() {
    // prints 1 2 2 1
    cout << string_id("hello") << " ";
    cout << string_id("world") << " ";
    cout << string_id("world") << " ";
    cout << string_id("hello") << endl;
}
The unordered_map will end up storing each string once, so you're set for memory. Depending on your matching function, you can define
struct Component {
    int parameter;
    int value;
};
and then your matching can be something like myComponent.parameter == string_id("some_key") or even myComponent.parameter == some_stored_string_id. If you want your strings back, you'll need a reverse mapping as well.
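A sketch of such a two-way mapping, assuming ids are dense integers starting at 0 (the snippet above starts at 1); `StringPool` is a hypothetical name. The hash map goes from string to id, and a vector indexed by id goes back. This keeps two copies of each distinct string, but still only one per distinct value rather than one per occurrence in the file.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Interner with both directions: string -> id via the hash map,
// id -> string via a vector indexed by id.
class StringPool {
public:
    int id(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;     // already interned
        int newId = static_cast<int>(names_.size());
        names_.push_back(s);
        ids_.emplace(s, newId);
        return newId;
    }
    // Reverse lookup; throws std::out_of_range for an unknown id.
    const std::string& name(int id) const { return names_.at(id); }
private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> names_;
};
```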

"String Iterators Incompatible" error message when running bubble sort program

I am a beginner trying to bubble sort a vector of objects in C++. My goal is to sort the vector by a member variable of each element. So in the end, I would like the attributes of all the vector elements to be the same, just in a different order. When I run the program, I get the following message:
Here is my code:
void sortInventory(vector<Vehicle> &carList)
{
    bool swap;
    Vehicle temp;
    do
    {
        swap = false;
        for (int count = 0; count < carList.size(); count++)
        {
            transform(carList[count].getVIN().begin(), carList[count].getVIN().end(), carList[count].getVIN().begin(), ::tolower);
            if (carList[count].getVIN() > carList[count + 1].getVIN())
            {
                temp = carList[count];
                carList[count] = carList[count + 1];
                carList[count + 1] = temp;
                swap = true;
            }
        }
    } while (swap);
}
Here is my class declaration:
class Vehicle
{
private:
    string VIN;
public:
    string getVIN();
    void setVIN(string);
};
Here is my class implementation:
string Vehicle::getVIN()
{ return VIN; }
void Vehicle::setVIN(string input)
{ VIN = input; }
By the way, I am aware that I am not using efficient methods, but I am just starting to learn the language and I am learning to write the code.
I asked a question similar to this here. However, none of the answers got me to where I wanted to go, although I feel like I am going in the right direction.
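This line of code attempts to convert the string for the VIN into lowercase text, but fails:
transform(carList[count].getVIN().begin(), carList[count].getVIN().end(), carList[count].getVIN().begin(), ::tolower);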
Because getVIN() returns the string by value, each call to getVIN() creates a separate string instance. The iterators passed to transform therefore come from different string instances, which is what causes the failure.
You don't show how you populate your carList, but one possible way to fix this is to save the VIN in lowercase at the time you save the VIN in the carList.
As jxh says, your transform line fails because you are making iterators to separate string objects. Why not try making the transform a separate routine?
If you want to be fancy you can define it inside the sort routine as a lambda function. Or you can just make it a separate routine defined separately.
// returns a lower case version of the string
std::string lower_case(std::string VIN_number){
    auto begin = std::begin(VIN_number);
    auto end = std::end(VIN_number);
    // Your code acting on one fixed string
    std::transform(begin, end, begin, ::tolower);
    return VIN_number;
}
Then when you do your comparison, do something like:
if ( lower_case(carList[count].getVIN()) > lower_case(carList[count + 1].getVIN()) )
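A self-contained sketch combining both answers, assuming a stripped-down Vehicle that only has a VIN: it compares lower-cased copies (so the stored VINs are untouched) and stops the inner loop at the second-to-last element. That second point addresses a separate problem in the question's loop, where carList[count + 1] reads past the end once count reaches size() - 1.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Vehicle {
public:
    std::string getVIN() const { return VIN; }
    void setVIN(const std::string& input) { VIN = input; }
private:
    std::string VIN;
};

// Returns a lower-case copy; transform works on one string object throughout.
std::string lower_case(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Bubble sort by VIN, case-insensitively. The stored VINs are not modified.
void sortInventory(std::vector<Vehicle>& carList) {
    bool swapped;
    do {
        swapped = false;
        // count + 1 < size() keeps carList[count + 1] in range (and handles empty lists)
        for (std::size_t count = 0; count + 1 < carList.size(); ++count) {
            if (lower_case(carList[count].getVIN()) >
                lower_case(carList[count + 1].getVIN())) {
                std::swap(carList[count], carList[count + 1]);
                swapped = true;
            }
        }
    } while (swapped);
}
```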

Key Value Pair implementation C

I have a .txt file that stores student names along with two of their best marks. If a student fails a course for some reason (e.g. dropping out), then no marks are recorded.
My file looks like this:
Samuel= 90.5, 95.9
Bill= 25.2, 45.3
Anthony= 99.9, 12.5
Basically, Tim, Mark and Rob failed the course, hence their marks are not stored. Also, to differentiate between students who failed and students who passed, I have used the = symbol. Basically, I want to store all the names in memory alongside their associated values.
This is my implementation; however, it is flawed in the sense that I have declared a double *marks[2] array to store all six marks, when clearly it will only store 3. I am having trouble storing the values into the double array.
This is my code...
istream& operator>> (istream& is, Students& student)
{
    student.names = new char*[6];
    for (int i=0; i<10; i++)
    {
        student.names[i] = new char[256];
        student.marks[i] = new double[2];
        is.getline(student.names[i], sizeof(student.names));
        for (int j=0; j < 256; j++)
        {
            if((student.names[i][j] == '='))
            {
                int newPos = j + 1;
                for (int k = newPos; k < 256; k++)
                    student.names[i][k - newPos] = student.names[k];
            }
        }
    }
    return is;
}
How can I go about storing the values of the students with valid marks? Please, no vectors or stringstreams, just pure C/C++ char arrays.
You have a few options. You could use a struct like so:
struct Record {
    std::string name;
    double marks[2];
};
And then stick that into something like std::vector<Record>, or an array of them like:
Record *r = new Record[1000];
You could also keep three different arrays (either automatically allocated or dynamically allocated or even std::vector), one to hold the name, two to hold the marks.
In each case, you would just indicate a fail by something like the marks being zero.
Also, you can use:
std::string name;
double first = 0, second = 0;
char comma;
std::cin >> name;
if (!name.empty() && name[name.size() - 1] == '=')
    std::cin >> first >> comma >> second; // comma consumes the ',' between the marks
And this will parse the input like you want it to for a single line. Once you've done that you can wrap the whole thing in a loop while sticking the values you get into some sort of data structure that I already described.
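A sketch of that loop, assuming one student per line in the `Name= mark1, mark2` form shown in the question and a bare name (no `=`) for students without marks; `readRecords` is a hypothetical name. The extra `char` read consumes the comma between the two marks, where `>>` on a double would otherwise stop.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Record {
    std::string name;
    double marks[2] = {0.0, 0.0};   // zero marks = failed / no marks recorded
};

// Reads entries like "Samuel= 90.5, 95.9"; a name without a trailing '='
// is stored with zero marks. Works with any std::istream (file, cin, string).
std::vector<Record> readRecords(std::istream& in) {
    std::vector<Record> records;
    std::string name;
    while (in >> name) {
        Record r;
        if (!name.empty() && name.back() == '=') {
            name.pop_back();                  // drop the '=' marker
            char comma;
            in >> r.marks[0] >> comma >> r.marks[1];
        }
        r.name = name;
        records.push_back(r);
    }
    return records;
}
```

It can be fed an std::ifstream for the real file, or an std::istringstream when testing.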
Hope that gives you a few ideas on where to go!
Here's a strategy:
First of all you need to implement a struct to hold the key-value pair, I suggest the following:
struct Student {
    char name[30];
    double marks[2];
};
Note that you can give the dimension of the char array inside the struct if you know that the length will never be greater (which is the case here).
Now what you need is to know how many lines are in your ifstream, you could make a loop of is.getline() calls to get there. (don't forget to call is.clear() and is.seekg(0) when finished, to be at the beginning for the real loop)
When you know how many lines are in your ifstream, you can dynamically allocate the array of your struct with the actual length of your file:
Student * students = new Student[lineCount]; // line count of is
As you can see, there's no need for a std::vector to hold the values. Consider that the getline() loop may be overkill just to get the line count; alternatively, you could give the students array a length at compile time by making it a length that will never be exceeded.
(e.g. Student students[128];)
Now you need to parse the lines. I'd suggest you make a loop like the following (line by line):
// int parseLine ( char* line, char* name, double* marks ) { ...
bool hasMarks=false;
int iLine=0; // Line pos iterator
int iName=0; // Name pos iterator
char mk1Str[5]; // String buffer, Mark 1 (4 chars such as "90.5", plus '\0' for atof)
char mk2Str[5]; // String buffer, Mark 2 (4 chars plus '\0')
for(int iMark=0;iMark<4;iMark++)
// ^^ You can hardcode the offsets (2,8) since they don't change
Now you need to parse the marks into double values; for this you could use the atof() function, which works with char*. The bool hasMarks tells you whether a student has marks; if not, you could store dummy values like -1 in the mark fields of your struct.
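A sketch of such a parser that sticks to char arrays and C library functions (no std::vector or stringstream), assuming lines of the form `Name= 90.5, 95.9` or just `Name`. It loosely follows the parseLine idea above but fills the Student struct directly; `parseLine` here is a hypothetical signature. It uses std::strtod rather than atof because strtod reports where parsing stopped, so a missing mark can be detected, and it stores -1 as the dummy value for students without marks.

```cpp
#include <cstdlib>
#include <cstring>

struct Student {
    char name[30];
    double marks[2];
};

// Parses "Name= 90.5, 95.9" (or just "Name") into student.
// Returns 1 if both marks were found, 0 otherwise (marks are then -1).
int parseLine(const char* line, Student& student) {
    const char* eq = std::strchr(line, '=');
    std::size_t nameLen = eq ? static_cast<std::size_t>(eq - line) : std::strlen(line);
    if (nameLen >= sizeof(student.name))
        nameLen = sizeof(student.name) - 1;      // truncate overly long names
    std::memcpy(student.name, line, nameLen);
    student.name[nameLen] = '\0';
    student.marks[0] = student.marks[1] = -1.0;  // dummy values for "no marks"
    if (!eq)
        return 0;
    char* end = nullptr;
    double first = std::strtod(eq + 1, &end);    // skips leading whitespace
    if (end == eq + 1 || *end != ',')
        return 0;
    const char* secondStart = end + 1;
    double second = std::strtod(secondStart, &end);
    if (end == secondStart)
        return 0;
    student.marks[0] = first;
    student.marks[1] = second;
    return 1;
}
```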
I think this works quite well for your case...