How can you simulate a SQL join in C++ using STL and/or Boost

How can you simulate a SQL join between two dynamic data sets (i.e. data that is obtained at runtime) using C++?
For example, Table A is a 2D vector of vectors (any STL or Boost data structure is OK) of students: their IDs, names and course numbers. Table B is a 2D vector of vectors (any STL or Boost data structure is OK) of course numbers, descriptions and room numbers.
#include <string>
#include <vector>
#include <boost/assign/list_of.hpp>

//Table A
// Columns: StudentID FirstName LastName CourseNum
std::vector<std::string> a1 = boost::assign::list_of("3490")( "Saundra")( "Bribiesca")( "F100X");
std::vector<std::string> a2 = boost::assign::list_of("1288")( "Guy")( "Shippy")( "F103X");
std::vector<std::string> a3 = boost::assign::list_of("5383")( "Tia")( "Roache")( "F103X");
std::vector<std::string> a4 = boost::assign::list_of("5746")( "Jamie")( "Grunden")( "F101X");
std::vector<std::string> a5 = boost::assign::list_of("2341")( "Emilia")( "Hankinson")( "F120X");
std::vector<std::vector<std::string > > TableA = boost::assign::list_of(a1)(a2)(a3)(a4)(a5);
//Table B
//Columns: CourseNum CourseDesc Room
std::vector<std::string> b1 = boost::assign::list_of("F100X")("Human Biology")("400B");
std::vector<std::string> b2 = boost::assign::list_of("F103X")("Biology and Society")("500B");
std::vector<std::string> b3 = boost::assign::list_of("F101X")("The Dynamic Earth")("340A");
std::vector<std::string> b4 = boost::assign::list_of("F120X")("Glaciers, Earthquakes and Volcanoes")("300C");
std::vector<std::vector<std::string > > TableB = boost::assign::list_of(b1)(b2)(b3)(b4);
//Table C (result of joining A and B), using column 3 of TableA (CourseNum) and column 0 of TableB (CourseNum) as the key
//I want to produce a result set, Table C, like this:
Table C
StudentID FirstName LastName Room CourseNum CourseDesc
3490 Saundra Bribiesca 400B F100X Human Biology
1288 Guy Shippy 500B F103X Biology and Society
5383 Tia Roache 500B F103X Biology and Society
5746 Jamie Grunden 340A F101X The Dynamic Earth
2341 Emilia Hankinson 300C F120X Glaciers, Earthquakes and Volcanoes

SQL engines use various techniques to perform joins, depending on what indexes are available (or what hash tables the engine decides to create on the fly).
The simplest, though, is an O(N*M) nested loop over both tables: to do an inner join, you compare every pair of rows, one from A and one from B, and output a row whenever they match.
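As a sketch only, the nested-loop inner join could look like this in C++, assuming tables are stored as vectors of string rows like TableA and TableB in the question. `Row`, `Table` and `nestedLoopJoin` are illustrative names, not part of STL or Boost. Each output row is simply all of A's columns followed by all of B's, so the key column appears twice; reordering the columns into the Table C layout is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using Table = std::vector<Row>;

// Nested-loop inner join: compare every row of `a` with every row of `b`
// (N*M comparisons) and emit a combined row whenever a[keyA] == b[keyB].
Table nestedLoopJoin(const Table& a, std::size_t keyA,
                     const Table& b, std::size_t keyB) {
    Table result;
    for (const Row& ra : a) {
        for (const Row& rb : b) {
            if (ra.at(keyA) == rb.at(keyB)) {
                Row out = ra;                                 // all columns of A...
                out.insert(out.end(), rb.begin(), rb.end());  // ...then all columns of B
                result.push_back(out);
            }
        }
    }
    return result;
}
```

With the question's tables, `nestedLoopJoin(TableA, 3, TableB, 0)` pairs each student row with its course row; rows of A without a matching course are dropped, as in an inner join.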
If you need to speed things up, in this case you could create an "index" of table B on its first column, that is, a std::multimap with the first column as the key and a tuple[*] of the rest of the columns as the value. Then, for each row in A, look up its CourseNum column (index 3) in the index and output one row per match. If CourseNum is unique in table B, as seems sensible, you can use a map rather than a multimap.
Either way gets you from O(N*M) to O((N+M)*log M): building the index costs O(M log M) and the N lookups cost O(N log M). That is an improvement unless N (the size of table A) is very small. If your college has far fewer students than courses, something is badly wrong ;-)
[*] By "tuple" I mean anything that holds all the values; you've been using vectors, and that will do.
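Here is a rough sketch of that index-based version, again assuming vector-of-strings tables. The hypothetical `indexJoin` builds a std::multimap from table B's key column to the rest of that row (a vector standing in for the "tuple"), then uses `equal_range` to emit one output row per match, so duplicate keys in B are handled. If the key is unique, a std::map with `find` would do the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using Table = std::vector<Row>;

// Index join: build a multimap over b's key column (O(M log M)), then probe
// it once per row of a (O(N log M) plus the size of the output).
Table indexJoin(const Table& a, std::size_t keyA,
                const Table& b, std::size_t keyB) {
    // "Index" of b: key column -> remaining columns of that row.
    std::multimap<std::string, Row> index;
    for (const Row& rb : b) {
        Row rest;
        for (std::size_t c = 0; c < rb.size(); ++c)
            if (c != keyB) rest.push_back(rb[c]);
        index.emplace(rb.at(keyB), rest);
    }

    Table result;
    for (const Row& ra : a) {
        // All entries of b whose key equals this row's key (possibly none).
        auto range = index.equal_range(ra.at(keyA));
        for (auto it = range.first; it != range.second; ++it) {
            Row out = ra;  // A's columns, including the join key once
            out.insert(out.end(), it->second.begin(), it->second.end());
            result.push_back(out);
        }
    }
    return result;
}
```

Since C++11, std::multimap keeps elements with equal keys in insertion order, so duplicate matches come out in B's original order.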

Since the only other answer doesn't have any code...
Please note that the design (a vector of strings instead of a class) makes the code unreadable.
#include <algorithm>
#include <cassert>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

// TableA and TableB are the global tables defined in the question.
int main()
{
    // Index of TableB: primary key (CourseNum) -> iterator to the row in TableB with that key
    map<string, vector<vector<string> >::const_iterator> mapB;
    for (auto it = TableB.cbegin(); it != TableB.cend(); ++it)
    {
        mapB[(*it)[0]] = it;
    }
    assert(mapB.size() == TableB.size()); // is the primary key really unique?
    for_each(TableA.cbegin(), TableA.cend(),
        [&mapB](const vector<string>& entryA)
        {
            auto itB = mapB.find(entryA.at(3));
            if (itB != mapB.end()) // only rows with a match are output, as in an inner JOIN
            {
                auto entryB = itB->second;
                cout << entryA.at(0) << " " << entryA.at(1) << " " << entryA.at(2) << " "
                     << entryB->at(2) << " " << entryB->at(0) << " " << entryB->at(1) << endl;
            }
        });
}
Output:
C:\STL\MinGW>g++ test.cpp &&a.exe
3490 Saundra Bribiesca 400B F100X Human Biology
1288 Guy Shippy 500B F103X Biology and Society
5383 Tia Roache 500B F103X Biology and Society
5746 Jamie Grunden 340A F101X The Dynamic Earth
2341 Emilia Hankinson 300C F120X Glaciers, Earthquakes and Volcanoes

Related

Esper - Wrong sequence of attributes in query results

I am new to Esper and I am working on a Storm-Esper collaboration. From my main class I send queries to a bolt that contains Esper, and the Esper bolt sends the tuple containing the results to a printer bolt. My problem is that, although the values in a query result are correct, the attributes are not in the correct order. For example, I have a query that selects attributes from a pilots table (name, surname, airline), and I should get the result in that order. However, I get: name, airline, surname. I have tried everything with group by and order by. I suspect the problem is in how Esper creates the event's map that holds the attribute values. I paste the main class code and the Esper bolt code where the map is processed. Any idea why this is happening is most welcome!
**mainclass**
.addStatements(("insert into pilotStream " +
"select * " +
"from Log.win:time(120 second) A "))
.addStatements(("insert into employeeStream " +
"select * " +
"from Emp.win:time(120 second) A "))
.addStatements(("insert into CombinedEvent "+
"select tick.pilotName as p_name , " +
"tick.pilotSurname as p_surname , " +
"tick.airline as p_airline " +
"from pilotStream.win:time(120 second) as tick, " +
"employeeStream.win:time(120 second) as rom "+
"where tick.airline = rom.employeeAirline "
))
**espebolt**
Map<String, Object> emap = (Map<String, Object>) newEvent.getUnderlying();
String Event_name = newEvent.getEventType().getName();
//System.out.println(Event_name);
for (Map.Entry<String, Object> entry : emap.entrySet()) {
// String key = entry.getKey();
String val = String.valueOf(entry.getValue()) ;
//System.out.println(key+" :"+val);
//System.out.println(val);
values.add(val);
}
collector.emit(Event_name, toTuple(newEvent, values, false));
values.clear();
The result should be: source: Esper-Print:2, stream: CombinedEvent, id: {}, [John, Snow, Lufthansa]
Instead, I get: source: Esper-Print:2, stream: CombinedEvent, id: {}, [John, Lufthansa, Snow]
P.S. The toTuple function simply takes the attribute values from the values list of strings and puts them into a tuple, which is emitted to the printer bolt. The commented-out print statements in the espebolt code helped me see that the problem is in the map that Esper creates internally.
By default Esper generates Map events. This can be changed to object-array events with a configuration setting or with annotations. Map events use "HashMap", not "LinkedHashMap". A "HashMap" does not iterate its key-value pairs in any particular order, but it uses less memory. Object-array events are ordered. For ordered access to Map events, use the "EventType" you can get from the statement (EPStatement.getEventType()); its getPropertyNames() returns the property names in order.

How to split a list of strings into two columns and append it to a Spire.PDF table in C#

This is my list of strings:
List<string> Questions = new List<string>();
I am appending a list of static questions, with the answers taken from the model values:
Questions.Add("Q1? " + " : " + model.value);
Questions.Add("Q2?" + " : " + model.value);
string[][] datasource = new String[Questions.Count][];
for (int i = 0; i < Questions.Count; i++)
{
    datasource[i] = Questions[i].Split(';');
}
Appending it to a Spire.PDF table:
PdfTable table = new PdfTable();
table.DataSource = datasource;
My output:
What type of tax return does the entity file? : 604   (all in a single column)
Expected output:
column1 column2
What type of tax return does the entity file? 604
Note that each element of "Questions" holds only one value: the strings are joined with " : " and contain no ';', so Split(';') returns a single cell, and the table therefore has a single column. If you want the expected two-column output, put a ';' between the question and the answer, for example:
Questions.Add("Q1?" + ";" + model.value);
Questions.Add("Q2?" + ";" + model.value);

Get a unique record among duplicates using MapReduce

File.txt
123,abc,4,Mony,Wa
123,abc,4, ,War
234,xyz,5, ,update
234,xyz,5,Rheka,sild
179,ijo,6,all,allSingle
179,ijo,6,ball,ballTwo
1) column1, column2, column3 are the primary keys
2) column4, column5 are the comparison keys
I have a file with duplicate records like the ones above. Among each set of duplicates I need to keep only one record, chosen by sort order.
Expected Output:
123,abc,4, ,War
234,xyz,5, ,update
179,ijo,6,all,allSingle
Please help me. Thanks in advance.
You can try the code below:
data = LOAD 'path/to/file' using PigStorage(',') AS (col1:chararray,col2:chararray,col3:chararray,col4:chararray,col5:chararray);
B = group data by (col1,col2,col3);
C = foreach B {
    sorted = order data by col4 asc;
    first = limit sorted 1;
    generate flatten(first);
};
In the code above, you can change the order statement inside the nested block to choose the column and the direction used for sorting. Ascending order on col4 matches the expected output, because the blank value " " sorts before any letter and "all" sorts before "ball". If you need more than one record per group, increase the limit.
Hope this helps.
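Outside Pig, the same group, sort and keep-first logic can be sketched in C++ for a file small enough to fit in memory. This is only an illustration of the logic, not a MapReduce job; `splitCsv` and `dedup` are hypothetical names, fields are assumed to contain no quoted commas, and the smallest column4 value is kept because that is what the expected output shows.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Split one comma-separated line into fields (no support for quoted commas).
std::vector<std::string> splitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// For each (column1, column2, column3) group, keep the record with the smallest
// column4 (ascending string order), like GROUP BY + ORDER BY col4 + LIMIT 1.
// Groups are returned in the order they first appear in the input.
std::vector<std::string> dedup(const std::vector<std::string>& lines) {
    std::vector<std::string> groupOrder;  // group keys, first-seen order
    // group key -> (column4 of the best record so far, that record's line)
    std::unordered_map<std::string, std::pair<std::string, std::string>> best;
    for (const std::string& line : lines) {
        std::vector<std::string> f = splitCsv(line);
        if (f.size() < 5) continue;  // skip malformed rows
        std::string key = f[0] + ',' + f[1] + ',' + f[2];
        auto it = best.find(key);
        if (it == best.end()) {
            groupOrder.push_back(key);
            best.emplace(key, std::make_pair(f[3], line));
        } else if (f[3] < it->second.first) {
            it->second = std::make_pair(f[3], line);  // smaller column4 wins
        }
    }
    std::vector<std::string> result;
    for (const std::string& key : groupOrder) result.push_back(best.at(key).second);
    return result;
}
```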
The question isn't very clear, but I understand this is what you need:
A = LOAD 'file.txt' using PigStorage(',') as (column1,column2,colum3,column4,column5);
B = GROUP A BY (column1,column2,colum3);
C = FOREACH B GENERATE FLATTEN(group) as (column1,column2,colum3);
DUMP C;
Or
A = LOAD 'file.txt' using PigStorage(',') as (column1,column2,colum3,column4,column5);
B = DISTINCT(FOREACH A GENERATE column1,column2,colum3);
DUMP B;

How to search in a List that looks like: List<Object> myList = new ArrayList<Object>() [duplicate]

This question already has answers here:
How to filter a Java Collection (based on predicate)?
I want to search in a List that looks like this:
List<Employee> oneEmp= new ArrayList<Employee>();
List<Employee> twoEmp= new ArrayList<Employee>();
oneEmp= [Employee [eid=1001, eName=Sam Smith, eAddress=Bangluru, eSalary=10000000], Employee [eid=0, eName=, eAddress=, eSalary=null], Employee [eid=1003, eName=Amt Lime, eAddress=G Bhagyoday, eSalary=200000], Employee [eid=1004, eName=Ash Wake, eAddress=BMC, eSalary=200000], Employee [eid=1005, eName=Will Smith, eAddress= Delhi, eSalary=200000], Employee [eid=1006, eName=Shya Ymwar, eAddress=Madras, eSalary=50000], Employee [eid=1007, eName=Nag Gam, eAddress=Pune, eSalary=10000000], Employee [eid=1008, eName=Arti, eAddress=Delhi, eSalary=10000000]]
twoEmp= [Employee [eid=0, eName=null, eAddress=null, eSalary=100000], Employee [eid=0, eName=null, eAddress=null, eSalary=50000], Employee [eid=0, eName=null, eAddress=null, eSalary=200000]]
I am using code like this:
for(Employee two : twoEmp){
for (Iterator<Employee> iterator = oneEmp.iterator(); iterator.hasNext(); ) {
Employee e = iterator.next();
if (e.geteSalary() != null && two.geteSalary() != null && e.geteSalary().compareTo(two.geteSalary()) == 0) {
finalEmpList.add(e);
}
}
}
But this still requires two for loops.
I am using Java 1.6.
My Employee class has attributes:
//Employee class
int eid;
BigInteger eSalary;
String eName, eAddress;
Now I want to get all the objects in the List whose salary = 10000000.
The result should be:
[Employee [eid=1001, eName=Sam Smith, eAddress=Bangluru, eSalary=10000000], Employee [eid=1007, eName=Nag Gam, eAddress=Pune, eSalary=10000000], Employee [eid=1008, eName=Arti, eAddress=Delhi, eSalary=10000000],.........................]
I would like to achieve this without any loop, or with as few loops as possible, because the data will be large.
Yes, if you can move to Java 8 or later, you can avoid writing the loop yourself by using streams (they are not available in Java 1.6).
First, consider using a generic collection:
List<Employee> employees = new ArrayList<>();
//add all employees to the list
Now you can use streams to filter your list:
// needs java.math.BigInteger and java.util.stream.Collectors
List<Employee> filtered = employees.stream()
    .filter(emp -> BigInteger.valueOf(10000000).equals(emp.geteSalary())) // BigInteger must be compared with equals(), not ==
    .collect(Collectors.toList());
Edit: the Stream library most likely still uses some kind of loop internally; its implementation is just hidden from you.
A List is a sequential container; to do any kind of filtering on a list, your only option is to iterate over it.
For the query you mentioned, you can use a Map with a BigInteger key (representing the salary) and a List<Employee> as the mapped value. With a HashMap, this lets you find all the employees who earn a certain salary in (expected) constant time, without iterating over the whole list.
Unfortunately, this solution can't help with other queries such as "how many employees earn more than 60000"; to perform all kinds of queries on a large data set you should use a database.
PS: You don't need the BigInteger type for the salary unless you expect someone to earn more than 2,147,483,647 (the maximum int value).
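The salary index idea does not depend on the language, so here is a rough C++ sketch of it rather than Java. The `Employee` struct is a hypothetical stand-in for the question's class, `long long` replaces BigInteger for the salary, and `SalaryIndex`, `buildSalaryIndex` and `findBySalary` are illustrative names. Building the index still takes one pass over the list; after that, each salary lookup is an expected constant-time hash lookup (plus copying the matching employees).

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the question's Employee class.
struct Employee {
    int eid;
    std::string eName;
    std::string eAddress;
    long long eSalary;
};

// Salary index: salary -> all employees earning exactly that salary.
using SalaryIndex = std::unordered_map<long long, std::vector<Employee>>;

// One pass over the list to build the index.
SalaryIndex buildSalaryIndex(const std::vector<Employee>& employees) {
    SalaryIndex index;
    for (const Employee& e : employees) index[e.eSalary].push_back(e);
    return index;
}

// Employees with the given salary (in list order), or an empty vector.
std::vector<Employee> findBySalary(const SalaryIndex& index, long long salary) {
    auto it = index.find(salary);
    return it == index.end() ? std::vector<Employee>{} : it->second;
}
```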
Something like this should do the trick; iterate over the List, and remove the items which you don't want, leaving only the ones which you do want.
List<Employee> myList = new ArrayList<Employee>(); //... add items
[...]
BigInteger target = BigInteger.valueOf(10000000);
for (Iterator<Employee> iterator = myList.iterator(); iterator.hasNext(); ) {
    Employee e = iterator.next();
    if (!target.equals(e.geteSalary())) { // BigInteger must be compared with equals(), not !=
        iterator.remove();
    }
}
//your list now contains only employees whose salary = 10000000
Edit: And no, you cannot do this without a loop. To do this kind of thing, you have to iterate over your Collection with a loop. Even if you use a library or the Java Streams API, it will still use a loop of some sort under the hood. However, this will be quite efficient, even with a large dataset. (How large? Why do you want to avoid using a loop?)

C++ Data Parsing query

Hello guys, I hope you can enlighten me about an issue I am facing!
Initial Output:
The sample text file is below, and the format is as follows (Item Desc:Price:Quantity:Date).
STRAW:10:10:11NOV1991
BARLEY:5.10:5:19OCT1923
CHOCOLATE:50:50:11NOV1991
I am required to print a daily summary report of the total amount of sales on each day. Based on the sample above, on 11NOV1991 the total amount of sales (2 items) will be 60, while on 19OCT1923 the total amount of sales (1 item) will be 5.
Desired Output:
11Nov1991 Total amt of sales:60
19Oct1923 Total amt of sales:5
My question is: how do I write code that shows each date only once, together with its total amount of sales? For testing purposes I created a loop that checks whether a certain year exists, but it isn't working. I want it to iterate through a file and check whether a certain year already exists; if it does, the next vector element with the same year should not be written to the file, and only its item price should be added instead. Below is the code I am trying to implement.
ifstream readReport("DailyReport.txt");
ofstream writeDailyReport("EditedDailyReport.txt");
string temp1 = "";
//Read from empty report.txt
while (getline(readReport, temp1))
{
    for (size_t i = 0; i < itemDescVec.size(); i++)
    {
        stringstream streamYearSS;
        streamYearSS << itemDescVec[i].dateYear;
        string stringYear = streamYearSS.str();
        size_t found1 = temp1.find(stringYear);
        //If can find year
        if (found1 != string::npos)
        {
            cout << "Can find" << endl;
        }
        //If cannot find year
        else
        {
            cout << "Cannot find" << endl;
            writeDailyReport << itemDescVec[i].itemDescription << ":" << itemDescVec[i].unitPrice << ":"
                             << itemDescVec[i].quantity << ":" << itemDescVec[i].dateDay << "/" << itemDescVec[i].dateMonth << "/" << itemDescVec[i].dateYear
                             << endl;
        }
    }
}
readReport.close();
writeDailyReport.close();
remove("DailyReport.txt");
rename("EditedDailyReport.txt", "DailyReport.txt");
It would be better to use your own object to store the data - then you can write, for example, salesRecord.quantity (or salesRecord.getQuantity() if you have written a getter) instead of salesrecord.at(2). This is a lot more readable and is better practice than remembering that 2 = quantity (etc.).
Now as to your actual question... I would say something like this:
Iterate over the list. For each item:
Check if the date of the item has already been added to your new list.
If the date does not already exist, simply add the item in its entirety.
If it does already exist, edit the existing item so that previousQuantity += quantityToAdd.
On the sorted list
BARLEY:5.10:5:19OCT1923
STRAW:10:10:11NOV1991
CHOCOLATE:50:50:11NOV1991
this would work as follows:
Try to add barley data - nothing for that date so far, so add it:
19OCT1923 - 5 units
Try to add straw data - nothing for that date so far, so add it:
19OCT1923 - 5 units
11NOV1991 - 10 units
Try to add chocolate data - 11NOV1991 already exists, so add 50 to the 10 that's already there:
19OCT1923 - 5 units
11NOV1991 - 60 units
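The steps above could be sketched in C++ as follows, assuming the file's lines have already been read into a vector of strings. `dailyTotals` is a hypothetical name; like the walk-through, it sums the Quantity field (which gives the 60 and 5 in the desired output) and keeps the dates in the order they first appear.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Aggregate "Desc:Price:Quantity:Date" lines into one total per date,
// in the order each date first appears. Sums the Quantity field.
std::vector<std::pair<std::string, long>> dailyTotals(const std::vector<std::string>& lines) {
    std::vector<std::pair<std::string, long>> totals;       // (date, total)
    std::unordered_map<std::string, std::size_t> position;  // date -> index in totals
    for (const std::string& line : lines) {
        std::stringstream ss(line);
        std::string desc, price, qty, date;
        if (!std::getline(ss, desc, ':') || !std::getline(ss, price, ':') ||
            !std::getline(ss, qty, ':') || !std::getline(ss, date))
            continue;                              // skip malformed lines
        long quantity = std::stol(qty);            // throws if Quantity is not a number
        auto it = position.find(date);
        if (it == position.end()) {
            position.emplace(date, totals.size());
            totals.emplace_back(date, quantity);   // new date: add the item as-is
        } else {
            totals[it->second].second += quantity; // existing date: previousQuantity += quantityToAdd
        }
    }
    return totals;
}
```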