I have a use case where I have a list of values to fetch from the database and a list of dates for which the values need to be fetched. I want to use Akka Streams (Flow or Source with GraphDSL) to build a one-to-many (or many-to-one) relationship between them, so that I fetch each value for each of the dates.
For example,
animals = cow, goat, sheep
years=2018, 2019
expected stream output is
cow & 2018
goat & 2018
sheep & 2018
cow & 2019
goat & 2019
sheep & 2019
If you want a product like this, you don't need the Graph DSL.
import akka.NotUsed
import akka.stream.scaladsl.Source

def animalsAndYears(animals: Source[Animal, NotUsed], years: Source[Year, NotUsed]): Source[(Animal, Year), NotUsed] =
  years.flatMapConcat { year =>
    animals.map { animal =>
      animal -> year
    }
  }
So:
animalsAndYears(Source(listOfAnimals), Source(listOfYears))
would give you a stream of animal, year tuples. Let's say that you have a function:
def queryDBForAnimalYear(aandy: (Animal, Year)): Future[Seq[Row]] = ???
Then you can get a stream of the rows with:
val parallelism: Int = ??? // How many queries to have in-flight at a time
animalsAndYears(Source(listOfAnimals), Source(listOfYears))
.mapAsync(parallelism) { params => queryDBForAnimalYear(params) }
.mapConcat(identity) // gives you a Source[Row]
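For readers who want to check the ordering without running Akka, here is a minimal plain-Python analogue of the same nesting (outer loop over years, inner loop over animals). It is only a sketch: `query_db_for_animal_year` is a hypothetical stand-in that returns a list of rows, and `animals` is assumed to be re-iterable (a list), much like a Source that can be materialized once per year.

```python
def animals_and_years(animals, years):
    """Yield (animal, year) pairs, iterating all animals for each year."""
    for year in years:          # outer stream, like years.flatMapConcat
        for animal in animals:  # inner stream, like animals.map
            yield (animal, year)


def query_db_for_animal_year(pair):
    """Hypothetical stand-in for a database query; returns a list of rows."""
    animal, year = pair
    return [f"{animal} & {year}"]


pairs = list(animals_and_years(["cow", "goat", "sheep"], [2018, 2019]))

# Flatten the per-pair row lists into one sequence, like mapConcat(identity).
rows = [row for pair in pairs for row in query_db_for_animal_year(pair)]
```

The pairs come out year by year, which matches the expected output in the question.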
My application manages a user's bookings. Each booking consists of a start_date and an end_date, and the current layout in DynamoDB is the following:
PK SK DATA
USER#1#BOOKINGS BOOKING#1 {s: '20190601', e: '20190801'}
[GOAL] I would like to query all bookings that overlap a search time interval.
I tried to find a solution, but I only found a way to query the items that lie entirely inside a search time interval, which solves only that narrower problem.
I implemented it and tried to adapt it to my problem, but I didn't find a solution. Below is my implementation of "query inside interval" (this is not a DynamoDB implementation, but I will replace the isBetween function with the BETWEEN operator):
import { zip } from 'lodash';
const bookings = [
{ s: '20190601', e: '20190801', i: '' },
{ s: '20180702', e: '20190102', i: '' }
];
const search_start = '20190602'.split('');
const search_end = '20190630'.split('');
// s:20190601 e:20190801 -> i:2200119900680011
for (const b of bookings) {
b['i'] = zip(b.s.split(''), b.e.split(''))
.reduce((p, c) => p + c.join(''), '');
}
// (start_search: 20190502, end_search: 20190905) => 22001199005
const start_clause: string[] = [];
for (let i = 0; i < search_start.length; i += 1) {
if (search_start[i] === search_end[i]) {
start_clause.push(search_start[i] + search_end[i]);
} else {
start_clause.push(search_start[i]);
break;
}
}
const s_index = start_clause.join('');
// (end_search: 20190905, start_search: 20190502) => 22001199009
const end_clause: string[] = [];
for (let i = 0; i < search_end.length; i += 1) {
if (search_end[i] === search_start[i]) {
end_clause.push(search_end[i] + search_start[i]);
} else {
end_clause.push(search_end[i]);
break;
}
}
const e_index = (parseInt(end_clause.join('')) + 1).toString();
const isBetween = (s: string, e: string, v: string) => {
const sorted = [s,e,v].sort();
console.info(`sorted: ${sorted}`)
return sorted[1] === v;
}
const filtered_bookings = bookings
.filter(b => isBetween(s_index, e_index, b.i));
console.info(`filtered_bookings: ${JSON.stringify(filtered_bookings)}`)
There’s not going to be a beautiful and simple yet generic answer.
Probably the best approach is to pre-define your time period size (days, hours, minutes, seconds, whatever) and use that value as the PK. For each day (or hour, or whatever) the item collection then holds every item touching that day, with the start time as the sort key (so you can apply the inequality there), and you can use a filter on the end time attribute.
If your chosen time period is days and you need to query across a week, then you’ll issue seven queries. So pick a time unit that’s around the same size as the time periods you typically query.
Remember you need to put all items touching that day (or whatever) into the day collection. If an item spans a week, it needs to be inserted 7 times.
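The bucketing idea can be sketched in memory as follows. This is not DynamoDB code: `BookingIndex`, `put` and `overlapping` are hypothetical names, a dict keyed by day stands in for the partition key, and the overlap test stands in for the sort-key condition plus the filter on the end attribute.

```python
from collections import defaultdict
from datetime import date, timedelta


def days_touched(start, end):
    """Yield every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


class BookingIndex:
    """In-memory stand-in for a table keyed by day (PK) and start date (SK)."""

    def __init__(self):
        self.buckets = defaultdict(list)  # day -> [(start, end, booking_id)]

    def put(self, booking_id, start, end):
        # Write one copy of the booking into every day bucket it touches.
        for day in days_touched(start, end):
            self.buckets[day].append((start, end, booking_id))

    def overlapping(self, search_start, search_end):
        # One "query" per day of the search interval, de-duplicated by id.
        found = set()
        for day in days_touched(search_start, search_end):
            for start, end, booking_id in self.buckets.get(day, []):
                # With day-aligned searches every item in a touched bucket
                # already overlaps; this check matters when the search is
                # finer-grained than the bucket size.
                if start <= search_end and end >= search_start:
                    found.add(booking_id)
        return found


index = BookingIndex()
index.put("BOOKING#1", date(2019, 6, 1), date(2019, 8, 1))
index.put("BOOKING#2", date(2018, 7, 2), date(2019, 1, 2))
hits = index.overlapping(date(2019, 6, 2), date(2019, 6, 30))
```

The trade-off is visible in `put`: a long booking is written once per day it touches, in exchange for cheap reads.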
Disclaimer: This is a very use-case-specific and non-general approach I took when trying to solve the same problem; it picks up on @hunterhacker's approach.
Observations from my use case:
The data I'm dealing with is financial/stock data, which spans back roughly 50 years in the past up to 150 years into the future.
I have many thousands of items per year, and I would like to avoid pulling in all 200 years of information.
The vast majority of the items I want to query span a time that fits within a year (i.e. most items don't go from 30-Dec-2001 to 02-Jan-2002, but rather from 05-Mar-2005 to 10-Mar-2005).
Based on the above, I decided to add an LSI and store the relevant year for every item whose start-to-end time falls within a single year. For items that straddle a year boundary (or more), I set that LSI attribute to 0.
The querying looks like:
if query_straddles_year:
    # This doesn't happen often in my use case
    result = query_all_and_filter_after()
else:
    # Most cases end up here (looking for a single day, for instance)
    year_constrained_result = query_using_lsi_for_that_year()
    # This picks up any of the items that do straddle a year
    result_on_straddling_bins = query_using_lsi_marked_with_0()
    result = filter_and_combine(year_constrained_result, result_on_straddling_bins)
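To make the pseudocode above concrete, here is a small in-memory sketch. All names (`year_key`, `build_index`, `query`) are hypothetical, and a dict keyed by year (or 0) stands in for the LSI; a real implementation would issue DynamoDB queries instead.

```python
from collections import defaultdict
from datetime import date


def year_key(start, end):
    """Return the shared year if the item fits in one year, else 0."""
    return start.year if start.year == end.year else 0


def build_index(items):
    """Group items by their year key, mimicking the LSI attribute."""
    index = defaultdict(list)
    for item in items:
        index[year_key(item["start"], item["end"])].append(item)
    return index


def overlaps(item, q_start, q_end):
    return item["start"] <= q_end and item["end"] >= q_start


def query(index, q_start, q_end):
    if q_start.year != q_end.year:
        # Rare case: scan everything and filter afterwards.
        candidates = [i for group in index.values() for i in group]
    else:
        # Common case: that year's items plus the straddling items (key 0).
        candidates = index.get(q_start.year, []) + index.get(0, [])
    return [i for i in candidates if overlaps(i, q_start, q_end)]


items = [
    {"name": "A", "start": date(2005, 3, 5), "end": date(2005, 3, 10)},
    {"name": "B", "start": date(2001, 12, 30), "end": date(2002, 1, 2)},
    {"name": "C", "start": date(2010, 1, 1), "end": date(2010, 2, 1)},
]
index = build_index(items)
single_day = query(index, date(2002, 1, 1), date(2002, 1, 1))
```

Item B straddles a year, so it lives under key 0 and is still found by a single-day query in 2002.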
In F#, assume I have a person record as:
type person =
{ LastName: string option
BirthDate: System.DateTime option }
Now, I want to create a list of 100 persons (this fails; both the LastName expression and the System.DateTime(...) expression are incorrect):
let people = [for a in 1 .. 100
do yield {
LastName= Some "LastName"+a
BirthDate = System.DateTime(2012,11,27)
}]
How is this done?
TIA
There are two separate issues with the code, but your general approach is good!
First, Some "LastName"+a is interpreted as (Some "LastName")+a, which is not the intended parenthesization. Also, a is an int, which cannot be automatically converted to a string, so you need to convert it explicitly. The correct version is Some("LastName" + string a).
Second, System.DateTime(2012,11,27) is a DateTime, but you need an option. You can fix this just by adding Some and the right parentheses, i.e. Some(System.DateTime(2012,11,27)).
As a bonus, you can reduce do yield to -> (this is just syntactic sugar to make this kind of thing shorter). I would write:
open System
let people =
    [ for a in 1 .. 100 ->
        { LastName = Some("LastName" + string a)
          BirthDate = Some(DateTime(2012, 11, 27)) } ]
class Year
{
public int YearNumber;
public List<Month> Months = new List<Month>();
}
class Month
{
public int MonthNumber;
public List<Day> Days = new List<Day>();
}
class Day
{
public int DayNumber;
public string Event;
}
So I have a list of years (List<Year> years). How do I get another list containing the events that fall on the same day in different years? An event can happen on multiple dates; that does not matter. What matters is finding out whether the same event happens on the same date (day and month) in different years. Lastly, filter so that only events occurring 3 or more times remain. For example, 5 July 2014, 5 July 2017 and 5 July 2019 are all 'Abc Festival', which occurs 3 times. The result should contain the date, the event, and the count.
Using just the classes you show we can only group dates, where a "date" is a day in a month:
var query = from y in years
            from m in y.Months
            from d in m.Days
            select new { m.MonthNumber, d.DayNumber }
            into date
            group date by date
            into dateGroup
            where dateGroup.Count() > 2
            select dateGroup;
As you see, the core solution is to build new { m.MonthNumber, d.DayNumber } objects and group them.
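The same grouping can be sketched in Python. Note one deliberate difference, which is an assumption on my part rather than part of the answer above: the key here also includes the event name, so the result carries the date, the event and the count that the question asks for. The dataclass names mirror the C# classes but are otherwise hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Day:
    day_number: int
    event: str


@dataclass
class Month:
    month_number: int
    days: List[Day] = field(default_factory=list)


@dataclass
class Year:
    year_number: int
    months: List[Month] = field(default_factory=list)


def recurring_events(years, min_count=3):
    """Count (month, day, event) keys across years; keep frequent ones."""
    counts = Counter(
        (m.month_number, d.day_number, d.event)
        for y in years
        for m in y.months
        for d in m.days
    )
    return {key: n for key, n in counts.items() if n >= min_count}


years = [Year(y, [Month(7, [Day(5, "Abc Festival")])]) for y in (2014, 2017, 2019)]
years.append(Year(2020, [Month(1, [Day(1, "New Year")])]))
result = recurring_events(years)
```

Here `min_count=3` plays the role of `Count() > 2` in the LINQ query.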
I want to search in a List, and my lists look like this:
List<Employee> oneEmp= new ArrayList<Employee>();
List<Employee> twoEmp= new ArrayList<Employee>();
oneEmp= [Employee [eid=1001, eName=Sam Smith, eAddress=Bangluru, eSalary=10000000], Employee [eid=0, eName=, eAddress=, eSalary=null], Employee [eid=1003, eName=Amt Lime, eAddress=G Bhagyoday, eSalary=200000], Employee [eid=1004, eName=Ash Wake, eAddress=BMC, eSalary=200000], Employee [eid=1005, eName=Will Smith, eAddress= Delhi, eSalary=200000], Employee [eid=1006, eName=Shya Ymwar, eAddress=Madras, eSalary=50000], Employee [eid=1007, eName=Nag Gam, eAddress=Pune, eSalary=10000000], Employee [eid=1008, eName=Arti, eAddress=Delhi, eSalary=10000000]]
twoEmp= [Employee [eid=0, eName=null, eAddress=null, eSalary=100000], Employee [eid=0, eName=null, eAddress=null, eSalary=50000], Employee [eid=0, eName=null, eAddress=null, eSalary=200000]]
I am using code like this:-
for(Employee two : twoEmp){
for (Iterator<Employee> iterator = oneEmp.iterator(); iterator.hasNext(); ) {
Employee e = iterator.next();
if (e.geteSalary() != null && two.geteSalary() != null && e.geteSalary().compareTo(two.geteSalary()) == 0) {
finalEmpList.add(e);
}
}
}
But this still requires 2 for loops.
I am using Java 1.6.
My Employee class has attributes:
//Employee class
int eid;
BigInteger eSalary;
String eName, eAddress;
Now I want to get all the objects in the List whose salary = 10000000.
The result should be:
[Employee [eid=1001, eName=Sam Smith, eAddress=Bangluru, eSalary=10000000], Employee [eid=1007, eName=Nag Gam, eAddress=Pune, eSalary=10000000], Employee [eid=1008, eName=Arti, eAddress=Delhi, eSalary=10000000],.........................]
I would like to achieve this without any loop, or with as few loops as possible, because the data will be large.
Yes, it is possible to avoid writing the loop yourself by using streams (this requires Java 8 or later).
First, consider using a generic collection:
List<Employee> employees = new ArrayList<>();
//add all employees to the list
Now you can use streams to filter your list:
// eSalary is a BigInteger, so compare with equals(), not ==
BigInteger target = BigInteger.valueOf(10000000);
List<Employee> filtered = employees.stream()
        .filter(emp -> target.equals(emp.geteSalary()))
        .collect(Collectors.toList());
Edit: The Stream library probably still uses some kind of loop internally, but since that implementation detail is hidden from me, I do not worry about it.
A List is a sequential container; to do any kind of filtering on a list, your only option is to iterate over it.
For the query you mentioned, you can use a Map with BigInteger keys (the salary) and List<Employee> values. Once the map is built, this lets you look up all the employees who earn a certain salary in constant time on average (with a HashMap), without iterating over the whole list.
Unfortunately, this solution can't help with other queries such as "how many employees earn more than 60000". To perform all types of queries on a large data set, you should use a database.
PS: You don't need the BigInteger type for the salary unless you think someone earns more than 2,147,483,647 (the maximum value of an int).
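As a rough illustration of the map idea (in Python rather than Java, with employees as plain dicts and `index_by_salary` as a hypothetical helper name): building the index still takes one pass over the list, but each later lookup by salary is a single dictionary access.

```python
from collections import defaultdict


def index_by_salary(employees):
    """Build the map once (one pass); later lookups avoid scanning the list."""
    by_salary = defaultdict(list)
    for emp in employees:
        if emp["salary"] is not None:  # skip employees with no salary set
            by_salary[emp["salary"]].append(emp)
    return by_salary


employees = [
    {"eid": 1001, "name": "Sam Smith", "salary": 10000000},
    {"eid": 0, "name": "", "salary": None},
    {"eid": 1003, "name": "Amt Lime", "salary": 200000},
    {"eid": 1007, "name": "Nag Gam", "salary": 10000000},
]
by_salary = index_by_salary(employees)

# Lookup by exact salary; .get avoids creating an empty entry on a miss.
top = by_salary.get(10000000, [])
```

This only pays off when the same list is queried by salary more than once; for a single query, one filtering pass is just as good.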
Something like this should do the trick; iterate over the List, and remove the items which you don't want, leaving only the ones which you do want.
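List<Employee> myList = new ArrayList<Employee>(); //... add items
[...]
// eSalary is a BigInteger, so compare with equals() rather than !=
BigInteger target = BigInteger.valueOf(10000000);
for (Iterator<Employee> iterator = myList.iterator(); iterator.hasNext(); ) {
    Employee e = iterator.next();
    if (!target.equals(e.geteSalary())) {
        iterator.remove();
    }
}
//your list now contains only employees whose salary = 10000000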
Edit: And no, you cannot do this without a loop. In order to do this kind of thing, you have to iterate over your Collection using a loop. Even if you use a library or the Java Streams API to do this, it will still use a loop of some sort under the hood. However, this will be quite efficient, even with a large dataset. (How large? Why do you want to avoid using a loop?)
For example, I have an Erlang record:
-record(state, {clients
}).
Can I make the clients field a list, so that I can keep values in the clients field as in a normal list? And how can I add values to this list?
Thank you.
Maybe you mean something like:
-module(reclist).
-export([empty_state/0, some_state/0,
add_client/1, del_client/1,
get_clients/1]).
-record(state,
{
clients = [] ::[pos_integer()],
dbname ::string()
}).
empty_state() ->
#state{}.
some_state() ->
#state{
clients = [1,2,3],
dbname = "QA"}.
del_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = lists:delete(Client, C)}.
add_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = [Client|C]}.
get_clients(#state{clients = C, dbname = _D}) ->
C.
Test:
1> reclist:empty_state().
{state,[],undefined}
2> reclist:some_state().
{state,[1,2,3],"QA"}
3> reclist:add_client(4).
{state,[4,1,2,3],"QA"}
4> reclist:del_client(2).
{state,[1,3],"QA"}
::[pos_integer()] means that the type of the field is a list of positive integers (starting from 1). It is a hint for the static analysis tool Dialyzer, which uses it when performing type checking.
Erlang also allows you to use pattern matching on records:
5> reclist:get_clients(reclist:some_state()).
[1,2,3]
Further reading:
Records
Types and Function Specifications
dialyzer(1)
@JUST MY correct OPINION's answer made me remember that I love how Haskell goes about getting the values of the fields in the data type.
Here's a definition of a data type, stolen from Learn You a Haskell for Great Good!, which leverages record syntax:
data Car = Car {company :: String
,model :: String
,year :: Int
} deriving (Show)
It creates the functions company, model and year, which look up fields in the data type. We first make a new car:
ghci> Car "Toyota" "Supra" 2005
Car {company = "Toyota", model = "Supra", year = 2005}
Or, using record syntax (the order of fields doesn't matter):
ghci> Car {model = "Supra", year = 2005, company = "Toyota"}
Car {company = "Toyota", model = "Supra", year = 2005}
ghci> let supra = Car {model = "Supra", year = 2005, company = "Toyota"}
ghci> year supra
2005
We can even use pattern matching:
ghci> let (Car {company = c, model = m, year = y}) = supra
ghci> "This " ++ c ++ " " ++ m ++ " was made in " ++ show y
"This Toyota Supra was made in 2005"
I remember there were attempts to implement something similar to Haskell's record syntax in Erlang, but I'm not sure whether they were successful.
Some posts about these attempts:
In Response to "What Sucks About Erlang"
Geeking out with Lisp Flavoured Erlang. However I would ignore parameterized modules here.
It seems that LFE uses macros similar to those Scheme (Racket, for instance) provides for creating a new value of some structure:
> (define-struct car (company model year))
> (define supra (make-car "Toyota" "Supra" 2005))
> (car-model supra)
"Supra"
I hope we'll have something close to Haskell's record syntax in the future; that would be really useful and handy.
Yasir's answer is the correct one, but I'm going to show you WHY it works the way it works so you can understand records a bit better.
Records in Erlang are a hack (and a pretty ugly one). Using the record definition from Yasir's answer...
-record(state,
{
clients = [] ::[pos_integer()],
dbname ::string()
}).
...when you instantiate this with #state{} (as Yasir did in empty_state/0 function), what you really get back is this:
{state, [], undefined}
That is to say your "record" is just a tuple tagged with the name of the record (state in this case) followed by the record's contents. Inside BEAM itself there is no record. It's just another tuple with Erlang data types contained within it. This is the key to understanding how things work (and the limitations of records to boot).
Now when Yasir did this...
add_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = [Client|C]}.
...the S#state.clients bit translates into code internally that looks like element(2,S). You're using, in other words, standard tuple manipulation functions. S#state.clients is just a symbolic way of saying the same thing, but in a way that lets you know what element 2 actually is. It's syntactic saccharine that's an improvement over keeping track of individual fields in your tuples in an error-prone way.
Now for that last S#state{clients = [Client|C]} bit, I'm not absolutely positive as to what code is generated behind the scenes, but it is likely just straightforward stuff that does the equivalent of {state, [Client|C], element(3,S)}. It:
tags a new tuple with the name of the record (provided as #state),
copies the elements from S (dictated by the S# portion),
except for the clients piece overridden by {clients = [Client|C]}.
All of this magic is done via a preprocessing hack behind the scenes.
Understanding how records work behind the scenes is beneficial both for understanding code written using records as well as for understanding how to use them yourself (not to mention understanding why things that seem to "make sense" don't work with records -- because they don't actually exist down in the abstract machine...yet).
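The "record is just a tagged tuple" idea can be mimicked with plain Python tuples. This is only an analogy, not what the Erlang compiler emits: `new_state`, `get_clients` and `set_clients` are hypothetical helpers, and `None` stands in for Erlang's `undefined`.

```python
def new_state():
    """Analogue of #state{}: tag, clients defaulting to [], dbname unset."""
    return ("state", [], None)


def get_clients(state):
    # S#state.clients ~ element(2, S). Erlang tuples are 1-indexed,
    # Python tuples are 0-indexed, so the same field is at index 1 here.
    return state[1]


def set_clients(state, clients):
    # S#state{clients = X}: build a new tuple, copying the other fields.
    tag, _old_clients, dbname = state
    return (tag, clients, dbname)


s0 = new_state()
s1 = set_clients(s0, [4] + get_clients(s0))  # like [Client|C]
```

As in Erlang, the update does not modify `s0`; it produces a new tuple with the same tag and the other fields copied over.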
If you are only adding or removing single items from the clients list in the state you could cut down on typing with a macro.
-record(state, {clients = [] }).
-define(AddClientToState(Client,State),
State#state{clients = lists:append([Client], State#state.clients) } ).
-define(RemoveClientFromState(Client,State),
State#state{clients = lists:delete(Client, State#state.clients) } ).
Here is a test escript that demonstrates:
#!/usr/bin/env escript
-record(state, {clients = [] }).
-define(AddClientToState(Client,State),
State#state{clients = lists:append([Client], State#state.clients)} ).
-define(RemoveClientFromState(Client,State),
State#state{clients = lists:delete(Client, State#state.clients)} ).
main(_) ->
%Start with a state with an empty list of clients.
State0 = #state{},
io:format("Empty State: ~p~n",[State0]),
%Add foo to the list
State1 = ?AddClientToState(foo,State0),
io:format("State after adding foo: ~p~n",[State1]),
%Add bar to the list.
State2 = ?AddClientToState(bar,State1),
io:format("State after adding bar: ~p~n",[State2]),
%Add baz to the list.
State3 = ?AddClientToState(baz,State2),
io:format("State after adding baz: ~p~n",[State3]),
%Remove bar from the list.
State4 = ?RemoveClientFromState(bar,State3),
io:format("State after removing bar: ~p~n",[State4]).
Result:
Empty State: {state,[]}
State after adding foo: {state,[foo]}
State after adding bar: {state,[bar,foo]}
State after adding baz: {state,[baz,bar,foo]}
State after removing bar: {state,[baz,foo]}