I wrote a service to fetch the complete details of the students in all classes. The service works fine, but the issue is that it takes too long to fetch the complete details. My code is given below:
List<StudentsDetails> allStudentsDetails = Lists.newArrayList();
List<ClassDetails> allClassDetailsDetails = getAllClassDetails();
allClassDetailsDetails.forEach(classDetailsDetails -> {
    StudentsDetails studentsDetails = new StudentsDetails();
    studentsDetails.setClassName(classDetailsDetails.getClassName());
    List<Student> allStudents = studentService.getAllStudentsByClass(classDetailsDetails.getClassName());
    studentsDetails.setAllStudents(allStudents);
    allStudentsDetails.add(studentsDetails);
});
My question: Is it a good idea to use a Java 8 parallel stream in this scenario? Does that really improve performance, and how can I implement a parallel stream in this approach?
This is how you would parallelize the operation, but the performance benefit cannot be guaranteed, as it also depends on the external database.
Function<ClassDetails, StudentsDetails> fetchStudentDetails = classDetailsDetails -> {
    StudentsDetails studentsDetails = new StudentsDetails();
    studentsDetails.setClassName(classDetailsDetails.getClassName());
    List<Student> allStudents = studentService.getAllStudentsByClass(classDetailsDetails.getClassName());
    studentsDetails.setAllStudents(allStudents);
    return studentsDetails;
};

List<StudentsDetails> allStudentsDetails = allClassDetailsDetails
        .parallelStream()
        .map(fetchStudentDetails)
        .collect(toList());
The slowness comes from DB access.
First, you should index your DB.
Second, if your student table is not very large, I recommend fetching all of the students in one shot:
List<Student> findStudentByClassNameIn(List<String> classNames);
Then map the students to their classes.
// Map class name to its students
Map<String, List<Student>> stdMap = listStudents().stream()
        .collect(groupingBy(Student::getClassName, toList()));

List<StudentsDetails> allStudentsDetails = allClassDetailsDetails.stream()
        .parallel()
        .map(c -> new StudentsDetails(stdMap.get(c.getClassName())))
        .collect(toList());
Third, if your student data is too large to fetch in one shot, then break it down and fetch, for example, 30 classes' data at a time.
Try my library AbacusUtil. It should be tens of times faster than the code in the original question:
List<StudentsDetails> allStudentsDetails = StreamEx.of(allClassDetailsDetails)
        .parallel(maxThreadNum) // generally maxThreadNum = 30 is a good choice
        .map(c -> new StudentsDetails(c.getClassName(), studentService.getAllStudentsByClass(c.getClassName())))
        .toList();
Related
Hello, I am pretty new to JSON parsing and parsing in general, so I am wondering what the best way is to assign the correct values for the price of the underlying stock I am looking at. Below is an example of the code I am working with, with comments showing roughly what I'm confused about.
Json::Value chartData = IEX::stocks::chart(symbolSearched);
int n = 390;
QVector<double> time(n), price(n);
//Time and Date Setup
QDateTime start = QDateTime(QDate::currentDate());
QDateTime local = QDateTime::currentDateTime();
QDateTime UTC(local);
start.setTimeSpec(Qt::UTC);
double startTime = start.toTime_t();
double binSize = 3600*24;
time[0] = startTime;
price[0] = //First market price of the stock at market open (930AM)
for(int i = 0; i < n; i++)
{
    time[i] = startTime + 3600*i;
    price[i] = // Stores prices of specific company stock price all the way until 4:30PM (market close)
}
The chartData is the JSON output with all the data.
I am wondering how I can get the various values inside the JSON and store them. Also, since it's intraday data, how can I arrange it so that it doesn't store price[i] if there is no data yet because it's early in the day? And what is the best way to update this every minute so it continuously reads in real-time data?
Hope I understood correctly (correct me if not) and you just want to save some subset of the JSON data to your QVector. Just iterate through all the JSON elements:
for (int idx = 0; idx < chartData.size(); ++idx) {
    time[idx] = convert2Timestamp(chartData[idx]["minute"]);
    price[idx] = convert2Price(chartData[idx]["high"], chartData[idx]["low"],
                               chartData[idx]["open"], chartData[idx]["close"],
                               chartData[idx]["average"]);
}
Then you should define the logic of convert2Timestamp (how you would like to store the time information) and the logic of convert2Price: how you would like to store the price info, whether only the highest/lowest, only the closing value, or maybe all of these numbers grouped together in a structure/class.
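For illustration, here is a minimal sketch of what those two helpers could look like. The "hh:mm" format of the "minute" field and the price policy (prefer the reported average, otherwise the high/low midpoint) are assumptions for the sketch, not part of the IEX API contract:

double convert2Timestamp(const Json::Value& minute)
{
    // "minute" is assumed to be a string like "09:30"; anchor it to today's date
    QTime t = QTime::fromString(QString::fromStdString(minute.asString()), "hh:mm");
    return QDateTime(QDate::currentDate(), t).toTime_t();
}

double convert2Price(const Json::Value& high, const Json::Value& low,
                     const Json::Value& open, const Json::Value& close,
                     const Json::Value& average)
{
    (void)open; (void)close; // kept to mirror the call above; unused in this sketch
    if (!average.isNull())
        return average.asDouble(); // prefer the minute's reported average
    return (high.asDouble() + low.asDouble()) / 2.0; // otherwise use the midpoint
}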
Then, if you want to execute similar logic every minute to update your locally recorded data, maybe instead of price[idx] = /* something */ you should push the items that are new onto your vector.
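For the per-minute refresh, one common Qt approach is a QTimer that re-fetches the chart and appends only the entries that are new. This is a sketch under stated assumptions: the surrounding class is a QObject, symbolSearched and the containers are members, and time/price grow dynamically instead of being pre-sized to 390:

QTimer* timer = new QTimer(this);
connect(timer, &QTimer::timeout, this, [this]() {
    Json::Value chartData = IEX::stocks::chart(symbolSearched);
    // resume where we left off, so early-day minutes without data are never stored
    for (Json::ArrayIndex idx = static_cast<Json::ArrayIndex>(time.size());
         idx < chartData.size(); ++idx) {
        time.push_back(convert2Timestamp(chartData[idx]["minute"]));
        price.push_back(convert2Price(chartData[idx]["high"], chartData[idx]["low"],
                                      chartData[idx]["open"], chartData[idx]["close"],
                                      chartData[idx]["average"]));
    }
});
timer->start(60 * 1000); // fire once per minute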
If there is a possibility that some of the JSON keys might not exist, in JsonCpp you can provide a default value, e.g. elem.get(KEY, DEFAULT_VAL).
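For example, a guarded read with an illustrative 0.0 fallback:

double avg = chartData[idx].get("average", Json::Value(0.0)).asDouble();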
I am currently working on a very beginner's version of the ID3 machine learning algorithm. I am stuck on how to recursively call my build_tree function to actually build the rest of the decision tree and output it in a nice format. I have calculated gains, entropies, gain ratios, etc., but I have no clue how to integrate recursion into my function.
I am given a data set which, after doing all the calculations mentioned above, I have split into two data sets. Now I need to be able to recursively call the function until both the left and right data sets become pure [which can easily be checked by a function I wrote called dataset.is_pure()], all while keeping track of the threshold at each node. I know that all my calculation and split methods are working, as I have done individual testing on them. It is just the recursive part that I am having trouble with.
Here is the build_tree function that I am having a recursion nightmare with. I am currently working in a Linux environment with the g++ compiler. The code I have right now compiles, but when run it gives me a segmentation fault. Any and all help would be greatly appreciated!
struct node
{
    vector<vector<string>> data;
    double atrb;
    node* parent;
    node* left = NULL;
    node* right = NULL;

    node(node* parent) : parent(parent) {}
};

node* root = new node(NULL);

void build_tree(node* current, dataset data_set)
{
    vector<vector<string>> l_d;
    vector<vector<string>> r_d;

    double global_entropy = calc_entropy(data_set.get_col(data_set.n_col()-1));
    int best_col = this->get_best_col(data_set, global_entropy);

    hash_map selected_atrb(data_set.n_row(), data_set.truncate(best_col));
    double threshold = get_threshold(selected_atrb, global_entropy);
    cout << threshold << "\n";

    split_data(threshold, best_col, data_set, l_d, r_d);
    dataset right_data(r_d);
    dataset left_data(l_d);
    right_data.delete_col(best_col);
    left_data.delete_col(best_col);

    if(left_data.is_pure())
        return;
    else
    {
        node* new_left = new node(current);
        new_left->atrb = threshold;
        current->left = new_left;
        new_left->data = l_d;
        return build_tree(new_left, left_data);
    }

    if(right_data.is_pure())
        return;
    else
    {
        node* new_right = new node(current);
        new_right->atrb = threshold;
        current->right = new_right;
        new_right->data = r_d;
        return build_tree(new_right, right_data);
    }
}

id3(dataset data)
{
    build_tree(root, data);
}
};
This is only a part of my class. If you wish to see any other code, just let me know!
I will explain with pseudocode how the recursive function works, and I will also leave you the code I wrote in JavaScript implementing the algorithm.
Before going into detail, I will mention certain concepts and classes used:
Attribute: a characteristic of the data set; it is usually the name of a column of the data set.
Class: the decision characteristic; it is generally binary-valued and is usually the last column of the data set.
Value: a possible value of an attribute in the data set, for example (Sunny, Cloudy, Rainy).
Tree: a class that has a number of nodes associated with each other.
Node: the entity in charge of storing the attribute (the question); it also has a list of arcs.
Arc: contains the value of an attribute and has a field that contains the following child node.
Leaf: contains a class. This node is the result of a decision, for example (Yes or No).
Best feature: the attribute with the highest information gain.
Function to create the tree from a set of data:
Obtain the values of a class.
Evaluate if there is only one type of class in the data set, for example (Yes).
If true, then we create a Leaf object and return this object
Obtain the information gain of each current attribute.
Choose the attribute with the highest information gain.
Create a node with the best feature.
Obtain the values of the best feature.
Iterate the list of those values.
Filter the list so that there are only records with the value that we are iterating (save it in a temporary variable).
Create an Arc with this value.
Assign the next node to the Arc (here comes the recursion): call the same function again, passing the filtered list of records, the class, the list of attributes without the best feature, and the list of values without the values of the best feature.
Add the arc to the node.
Return the node.
This is the segment of code that is responsible for creating the tree:
let crearArbol = (ejemplosLista, clase, atributos, valores) => {
    // If every remaining record has the same class value, return a leaf with it
    let valoresClase = obtenerValoresAtributo(ejemplosLista, clase);
    if (valoresClase.length == 1) {
        autoIncremental++;
        return new Hoja(valoresClase[0], autoIncremental);
    }
    // No attributes left to split on: return a leaf with the majority class
    if (atributos.length == 0) {
        let claseDominante = claseMayoritaria(ejemplosLista);
        autoIncremental++;
        return new Hoja(claseDominante, autoIncremental);
    }
    // Pick the attribute with the highest information gain and make it a node
    let gananciaAtributos = obtenerGananciaAtributos(ejemplosLista, valores, atributos);
    let atributoMaximo = atributos[maximaGanancia(gananciaAtributos)];
    autoIncremental++;
    let nodo = new Atributo(atributoMaximo, [], autoIncremental);
    // For each value of the best attribute, filter the records and recurse
    let valoresLista = obtenerValoresAtributo(ejemplosLista, atributoMaximo);
    valoresLista.forEach((valor) => {
        let ejemplosFiltrados = arrayDistincAtributos(ejemplosLista, atributoMaximo, valor);
        let arco = new Arco(valor);
        arco.sigNodo = crearArbol(ejemplosFiltrados, clase, [...eliminarAtributo(atributoMaximo, atributos)], [...eliminarValores(atributoMaximo, valores)]);
        nodo.hijos.push(arco);
    });
    return nodo;
};
Unfortunately, the identifiers in the code are only in Spanish.
This is the repository that contains my project with this implementation: Source code of id3
Trying to iterate over a list of objects to get the value from the last element of the iteration.
long lastSeen = 0L;
for(Object o : list) {
    lastSeen = o.getLastSeenId();
}
// will make use of the lastSeen.
I can't do the same with a lambda,
long lastSeen = 0L;
list.stream().forEach(o -> {
    lastSeen = o.getLastSeenId();
});
as I will end up with this compile-time error:
Local variable lastSeenId defined in an enclosing scope must be final or effectively final
I could do this to read the largest, but I don't want to:
Set<Long> set = new HashSet<>();
list.stream().forEach(o -> {
    set.add(o.getLastSeenId());
});
Is there a better way?
The reason for doing this is to monitor the last seen value the loop processed (in case an exception occurs and the loop has to terminate).
I don't see any reason why you need to iterate the entire source just to retrieve the last id. This can be accomplished without iteration:
long lastSeen = list.size() > 0 ? list.get(list.size()-1).getLastSeenId() : 0L;
However, if you plan to do some other stuff in each iteration not just the aforementioned then I'd stick with your current imperative approach. Attempting to use streams here is not a good fit and doesn't gain you anything.
First, reduce the stream of elements to get the last element. Then use map to get the desired value from that last element after the reduction.
final long lastSeenValue = objects.stream()
        .reduce((first, second) -> second)
        .map(LastSeen::getLastSeenId)
        .orElse(0L);
However, in my opinion, streams might not be a good fit for your problem statement. In that case, don't hesitate to fall back to the imperative approach.
In case you would like to keep using Stream, you may use AtomicLong to hold the lastSeenValue.
final AtomicLong lastSeen = new AtomicLong(0);
list.stream().forEach(o -> {
    lastSeen.set(o.getLastSeenId());
});
First of all, I would not iterate the list to fetch the last element.
If you do want to stream and get the last element, "one.util.StreamEx" has a better way to do that:
StreamEx.of(list)
        .collect(MoreCollectors.last())
        .get();
In my current assignment I'm having trouble figuring out how I can access this particular piece of data.
To start off, the assignment calls for data to be pulled from a file to simulate a store operating normally. The thing is, the data being pulled is about the customers: specifically, when each customer enters a queue and how long it takes for the cashiers to process their order.
Right now I have the customer data stored in an array of class objects.
for(int i = 0; i < entries; i++)
{
    /* !!!!!IMPORTANT!!!!!
     * The way this program extracts file data assumes that the file will follow a specific format
     * results will vary heavily based on how the file is set up
     *
     * before docking points please make sure that the file follows the format
     *
     * "Number of entries"
     * "Customer Number", "Arrival Time", "Service Time"
     * "Customer Number", "Arrival Time", "Service Time"
     * "Customer Number", "Arrival Time", "Service Time"
     */
    xfiles >> dataEntry;
    fileData[i].setNumber(dataEntry);
    // inserts a number from the file into the "number" variable in the customer class
    xfiles >> dataEntry;
    fileData[i].setArrival(dataEntry);
    // inserts a number from the file into the "arrival" variable in the customer class
    xfiles >> dataEntry;
    fileData[i].setServTime(dataEntry);
    // inserts a number from the file into the "servTime" variable in the customer class
}
xfiles.close();
It's not included in the code, but there is a line earlier in the program that takes the number of entries into account.
In my next block I have to queue and process customers simultaneously over a period of time. I have an idea of how to queue them, but I'm not too sure how to proceed with processing them. From what I know right now, I believe I want a conditional statement that checks whether a certain customer from the array has been queued or not.
The piece of data I'm currently trying to access would be the arrival time that was stored in the class.
So something like,
fileData[i].returnArrival();
but since that class is stored in a queue, I'm not sure how I would be able to access it.
Right now, this is how I have everything queued:
for(int x = 0; x < 570; x++)
{
    if(cusTime == x)
    {
        if(scully.isFull() == false)
            scully.enqueue(fileData[cusTime]);
        else if(mulder.isFull() == false)
            mulder.enqueue(fileData[cusTime]);
        else if(skinner.isFull() == false)
            skinner.enqueue(fileData[cusTime]);
        else
            cout << "queues are full, disposing..\n";
    }
    cusTime++;
}
At first I thought it would be something like
scully.returnFront()->temp->returnClass()->fileData.returnArrival();
but I'm unsure about that, since temp is only a pointer declared within the queue class.
A friend of mine suggested it would probably be something like this instead, but I ended up getting segmentation faults when I ran the code:
scully.returnFront()->returnClass().returnArrival();
I think it should be the following:
scully.returnFront().returnArrival()
Because you enqueue the items from your array, returnFront() retrieves an item on which your methods should be callable.
After discussing it a bit with the professor and TA, the cause of the problem turned out to be that the returnFront function was returning a pointer, which made it harder to access the data within the node. The solution was to have returnFront return the Customer class associated with the data, with its return statement calling the node's function that returns the class stored in the node.
so
Node *returnFront();
was changed to
Customer returnFront();
and the change within the function was from
return front;
to
return front->returnClass();
These changes made it easier to access the Customer class data from inside the main file, so I was able to instantiate a new placeholder variable for the class:
Customer scullyTemp;
and after that, store the Customer held in the node through an assignment statement:
scullyTemp = scully.returnFront();
scullyTemp.returnArrival();
It might be a little more complicated than it needs to be, but for now it does what I need it to do.
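Pieced together, the relevant parts might look roughly like this. The Node layout, the queue class name, and returnClass are assumptions reconstructed from the description above, not the actual assignment code:

struct Node
{
    Customer data;
    Node* next;
    Customer returnClass() { return data; } // hand back the stored Customer by value
};

class CustomerQueue
{
    Node* front = nullptr;
public:
    // previously: Node* returnFront() { return front; }
    Customer returnFront() { return front->returnClass(); } // assumes the queue is not empty
    // enqueue, dequeue, isFull, etc. omitted
};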
I am currently writing an A* pathfinding algorithm for a game and came across a very strange performance problem with priority_queue.
I am using a typical 'open nodes list', where I store found but not yet processed nodes. This is implemented as an STL priority_queue (openList) of pointers to PathNodeRecord objects, which store information about a visited node. They are sorted by the estimated total cost to get there (estimatedTotalCost).
Now I noticed that whenever the pathfinding method is called, the respective AI thread gets completely stuck and takes several (~5) seconds to process the algorithm and calculate the path. Subsequently, I used the VS2013 profiler to see why and where it was taking so long.
As it turns out, pushing to and popping from the open list (the priority_queue) takes up a very large amount of time. I am no expert in STL containers, but I never had problems with their efficiency before, and this is just weird to me.
The strange thing is that this only occurs in VS's 'Debug' build configuration. The 'Release' configuration works fine for me and the times are back to normal.
Am I doing something fundamentally wrong here, or why is the priority_queue performing so badly for me? The current situation is unacceptable to me, so if I cannot resolve it soon, I will need to fall back to a simpler container and insert into the right place manually.
Any pointers to why this might be occurring would be very helpful!
Here is a snippet of what the profiler shows me:
http://i.stack.imgur.com/gEyD3.jpg
Code parts:
Here is the relevant part of the pathfinding algorithm, where it loops the open list until there are no open nodes:
// set up arrays and other variables
PathNodeRecord** records = new PathNodeRecord*[graph->getNodeAmount()]; // holds records for all nodes
std::priority_queue<PathNodeRecord*> openList; // holds records of open nodes, sorted by estimated rest cost (most promising node first)

// null all record pointers
memset(records, NULL, sizeof(PathNodeRecord*) * graph->getNodeAmount());

// set up record for start node and put into open list
PathNodeRecord* startNodeRecord = new PathNodeRecord();
startNodeRecord->node = startNode;
startNodeRecord->connection = NULL;
startNodeRecord->closed = false;
startNodeRecord->costToHere = 0.f;
startNodeRecord->estimatedTotalCost = heuristic->estimate(startNode, goalNode);
records[startNode] = startNodeRecord;
openList.push(startNodeRecord);

// ### pathfind algorithm ###

// declare current node variable
PathNodeRecord* currentNode = NULL;

// loop-process open nodes
while (openList.size() > 0) // while there are open nodes to process
{
    // retrieve most promising node and immediately remove from open list
    currentNode = openList.top();
    openList.pop(); // ### THIS IS WHERE IT GETS STUCK

    // if current node is the goal node, end the search here
    if (currentNode->node == goalNode)
        break;

    // look at connections outgoing from this node
    for (auto connection : graph->getConnections(currentNode->node))
    {
        // get end node
        PathNodeRecord* toNodeRecord = records[connection->toNode];

        if (toNodeRecord == NULL) // UNVISITED -> path record needs to be created and put into open list
        {
            // set up path node record
            toNodeRecord = new PathNodeRecord();
            toNodeRecord->node = connection->toNode;
            toNodeRecord->connection = connection;
            toNodeRecord->closed = false;
            toNodeRecord->costToHere = currentNode->costToHere + connection->cost;
            toNodeRecord->estimatedTotalCost = toNodeRecord->costToHere + heuristic->estimate(connection->toNode, goalNode);
            // store in record array
            records[connection->toNode] = toNodeRecord;
            // put into open list for future processing
            openList.push(toNodeRecord);
        }
        else if (!toNodeRecord->closed) // OPEN -> evaluate new cost to here and, if better, update open list entry; otherwise skip
        {
            float newCostToHere = currentNode->costToHere + connection->cost;
            if (newCostToHere < toNodeRecord->costToHere)
            {
                // update record
                toNodeRecord->connection = connection;
                toNodeRecord->estimatedTotalCost = newCostToHere + (toNodeRecord->estimatedTotalCost - toNodeRecord->costToHere);
                toNodeRecord->costToHere = newCostToHere;
            }
        }
        else // CLOSED -> evaluate new cost to here and, if better, put back on open list and reset closed status; otherwise skip
        {
            float newCostToHere = currentNode->costToHere + connection->cost;
            if (newCostToHere < toNodeRecord->costToHere)
            {
                // update record
                toNodeRecord->connection = connection;
                toNodeRecord->estimatedTotalCost = newCostToHere + (toNodeRecord->estimatedTotalCost - toNodeRecord->costToHere);
                toNodeRecord->costToHere = newCostToHere;
                // reset node to open and push into open list
                toNodeRecord->closed = false;
                openList.push(toNodeRecord); // ### THIS IS WHERE IT GETS STUCK
            }
        }
    }

    // set node to closed
    currentNode->closed = true;
}
Here is my PathNodeRecord struct, with the 'less' operator overloaded to enable sorting in the priority_queue:
namespace AI
{
    struct PathNodeRecord
    {
        Node node;
        NodeConnection* connection;
        float costToHere;
        float estimatedTotalCost;
        bool closed;

        // overload less operator comparing estimated total cost; used by priority queue
        // nodes with a higher estimated total cost are considered "less"
        bool operator < (const PathNodeRecord &otherRecord)
        {
            return this->estimatedTotalCost > otherRecord.estimatedTotalCost;
        }
    };
}
std::priority_queue<PathNodeRecord*> openList
I think the reason is that you have a priority_queue of pointers to PathNodeRecord, and there is no meaningful ordering defined for the pointers: std::less<PathNodeRecord*> compares the addresses, not the records' estimatedTotalCost.
Try changing it to std::priority_queue<PathNodeRecord> first. If that makes a difference, then all you need is to pass in your own comparator that knows how to compare pointers to PathNodeRecord: it just dereferences the pointers first and then does the comparison.
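For example, if you keep the pointer-based queue (which avoids copying records around), a minimal comparator sketch would be:

// Orders the queue by the pointed-to records' estimatedTotalCost instead of by address.
struct PathNodeRecordPtrCompare
{
    bool operator()(const PathNodeRecord* lhs, const PathNodeRecord* rhs) const
    {
        // mirror the member operator<: a higher estimated cost compares "less",
        // so the cheapest node ends up at top()
        return lhs->estimatedTotalCost > rhs->estimatedTotalCost;
    }
};

std::priority_queue<PathNodeRecord*,
                    std::vector<PathNodeRecord*>,
                    PathNodeRecordPtrCompare> openList;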
EDIT:
Taking a wild guess about why you got such extremely slow execution: I think the pointers were compared based on their addresses, and the addresses were allocated starting from one point in memory and going up.
That resulted in the extreme case for your heap (the heap as in the data structure, not the memory region), so your heap was actually a list (a tree where each node has one child node, and so on).
And so your operations took linear time. Again, just a guess.
You cannot expect a debug build to be as fast as a release-optimized one, but you seem to be doing a lot of dynamic allocation, which may interact badly with the debug runtime.
I suggest adding _NO_DEBUG_HEAP=1 to the environment settings in the debug property page of your project.