I have a table where the entries are something like this
Row  Column1  Column2  Column3  Column4
1    0X0A     1        2        A
2    0X0B     2        2        B
3    0x0C     3        2        C
Now I want to use a map so that I can use either Column1 or Column2 as the key to get the row.
What kind of map should I use to achieve this?
(Note: the table is just for explanation and is not the exact requirement.)
I thought of using a multimap, but that is not going to solve the problem.
Try the multi-index containers from Boost (Boost.MultiIndex).
Define a class similar to pair with a custom comparator that indicates equality if either the first member or the second member matches, but not necessarily both. You could then use that class as your key type. You would probably need a particular value for each member that will never be used in your data, to use as default values in your constructor, to avoid keys where only the first member has been initialized accidentally matching on the second member due to leftover data. Be aware that std::map requires its comparator to be a strict weak ordering, and an "either member matches" equivalence is not transitive, so this approach is not reliable with the standard ordered containers.
You could use one map to map from column 1 to the row and another to map from column 2 to the row. Repeat for as many columns as needed.
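A minimal sketch of the two-map idea, assuming both key columns hold unique integer values. The names Row, Table, findByColumn1 and findByColumn2 are made up for illustration and are not from the question. The rows are stored once in a vector, and each map stores only the position of a row:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical row layout mirroring the example table.
struct Row {
    int column1;
    int column2;
    int column3;
    std::string column4;
};

// Hypothetical table with one index per key column.
class Table {
public:
    void insert(const Row& row) {
        // Assumption of this sketch: each key column holds unique values.
        assert(byColumn1_.count(row.column1) == 0);
        assert(byColumn2_.count(row.column2) == 0);
        rows_.push_back(row);
        const std::size_t pos = rows_.size() - 1;
        byColumn1_[row.column1] = pos;
        byColumn2_[row.column2] = pos;
    }

    // Return a pointer to the matching row, or nullptr if the key is absent.
    const Row* findByColumn1(int key) const { return lookup(byColumn1_, key); }
    const Row* findByColumn2(int key) const { return lookup(byColumn2_, key); }

private:
    const Row* lookup(const std::map<int, std::size_t>& index, int key) const {
        const auto it = index.find(key);
        return it == index.end() ? nullptr : &rows_[it->second];
    }

    std::vector<Row> rows_;                 // the rows themselves, stored once
    std::map<int, std::size_t> byColumn1_;  // column1 value -> position in rows_
    std::map<int, std::size_t> byColumn2_;  // column2 value -> position in rows_
};
```

The returned pointers stay valid only until the next insert, because the vector may reallocate. If a key column can contain repeated values, std::multimap would be the natural replacement for that index.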
So if we want a collection of unique items we can use a 'set'.
If we already have a collection of items that we want to dedupe, we could pass them to the set function, or alternatively we could use the distinct or dedupe functions.
What are the situations for using each of these (pros/cons)?
Thanks.
The differences are:
set will create a new set collection eagerly.
distinct will create a lazy sequence with duplicates from the input collection removed. It has an advantage over set if you process big collections and laziness might save you from eagerly evaluating the whole input collection (e.g. with take).
dedupe removes only consecutive duplicates from the input collection, so it has different semantics from set and distinct. For example, it will return (1 2 3 1 2 3) when applied to (1 1 1 2 3 3 1 1 2 2 2 3 3).
Sets and lazy seqs have different APIs available (e.g. disj, get vs nth) and different performance characteristics (e.g. O(log32 n) lookup for a set and O(n) for a lazy seq), and they should be chosen depending on how you would like to use their results.
Additionally, distinct and dedupe return a transducer when called without arguments.
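The semantic difference between distinct and dedupe can be illustrated outside Clojure as well. The following is only a C++ analogue, not the Clojure implementation: the function names are made up, and both functions are eager, whereas Clojure's distinct and dedupe are lazy.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Analogue of dedupe: drop only consecutive duplicates.
std::vector<int> dedupeConsecutive(std::vector<int> xs) {
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    return xs;
}

// Analogue of distinct: keep the first occurrence of each value, in order.
std::vector<int> distinctInOrder(const std::vector<int>& xs) {
    std::unordered_set<int> seen;
    std::vector<int> out;
    for (int x : xs) {
        if (seen.insert(x).second) {  // true only the first time x is seen
            out.push_back(x);
        }
    }
    return out;
}

// Checks the example input from the answer above.
void checkDedupeExample() {
    const std::vector<int> input = {1, 1, 1, 2, 3, 3, 1, 1, 2, 2, 2, 3, 3};
    assert((dedupeConsecutive(input) == std::vector<int>{1, 2, 3, 1, 2, 3}));
    assert((distinctInOrder(input) == std::vector<int>{1, 2, 3}));
}
```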
Consider the following fictional data, which illustrate my problem; the real data contain thousands of rows.
Figure 1
Each individual is characterized by values attached to A, B, C, D, E. In figure 1, I show 3 individuals for which some characteristics are missing. Do you have any idea how I can get the following completed table (figure 2)?
Figure 2
With an ID in figure 1 I could have used the carryforward command to fill in the values. But since each individual has a different number of rows, I don't know how to create the ID.
Edit: All individuals share the characteristic "A".
Edit: the existing order of observations is informative.
To detect a change of id, the idea is to check, for each row, whether the value of char in the previous row is >= the value in the current row.
This works only if your data are ordered, but that seems to be a requirement of your data anyway.
gen id= 1 if (char[_n-1] >= char[_n]) | _n ==1
replace id = sum(id) if id==1
replace id = id[_n-1] if missing(id)
fillin id char
drop _fillin
If one individual has only the characteristics A and C and the next individual has only the characteristics D and E, this won't work, but that case seems impossible to detect with your data.
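The boundary rule used by the Stata code above (start a new id on the first row, or when the previous char is >= the current one) can also be written as a plain loop. This is only an illustrative C++ sketch of that rule, not a replacement for the Stata commands; the function name and the sample labels are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assign a 1-based individual id to every row, given the characteristic
// label of each row in its original (informative) order.
std::vector<int> assignIds(const std::vector<std::string>& chars) {
    std::vector<int> ids;
    ids.reserve(chars.size());
    int current = 0;
    for (std::size_t i = 0; i < chars.size(); ++i) {
        // A new individual starts on the first row, or when the label
        // does not increase relative to the previous row.
        if (i == 0 || chars[i - 1] >= chars[i]) {
            ++current;
        }
        ids.push_back(current);
    }
    return ids;
}

// Made-up input: three individuals with labels (A B D), (A C), (A B C D E).
void checkAssignIdsExample() {
    const std::vector<std::string> chars =
        {"A", "B", "D", "A", "C", "A", "B", "C", "D", "E"};
    assert((assignIds(chars) == std::vector<int>{1, 1, 1, 2, 2, 3, 3, 3, 3, 3}));
}
```

As in the Stata version, the (A C) followed by (D E) case would be merged into one individual, since nothing in the labels marks the boundary.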
I have a question about MDX tuples, I would like to gain some insight on something that seems confusing to me.
Most of the literature I have read talks about tuples being a set of co-ordinates essentially pointing to a cell which contains a measure value. From what I understand, a tuple is defined as containing only one distinct member from each dimension. Typically when writing queries we don't specify every member for every dimension; we let the SSAS engine use the default members and aggregate the measure data accordingly.
Straight out of the adventure works sample OLAP database (cube) "adventure works"
A super simple query that I understand represents a tuple:
SELECT
([Date].[Calendar Quarter of Year].&[CY Q3],[Measures].[Sales Amount]) --Tuple
ON COLUMNS
FROM [Adventure Works]
SS Management studio returns this result
No problem here: the tuple specified by the &[CY Q3] member points to the cell containing the displayed measure amount. Clearly a tuple has been returned.
Typically though I use this sort of thing more often:
select
non empty ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) --Tuple??
ON COLUMNS
FROM [Adventure Works]
Which returns all the quarter totals across all years for said measure (not a great example but it's just an example):
I see this result as a set because more than one distinct member has been returned from the same dimension (date). In fact, by default all members are being returned. If so, how can it be a tuple?
So my question is this. The parenthesis around the "tuple" in the query above, indicate to me that I'm selecting a tuple, the query engine processes and a result is returned that to me looks like a set, not because more than one cell value is returned but because more than one member from the date dimension has been used.
The query indicates that a tuple is being selected, and the query engine seems to accept it as one however the result set, includes multiple members from the same dimension and corresponding cell values indicating to me that more than one tuple will be returned --> set.
Also, The query engine throws no error when I treat it as a set and use set functions on it:
select
nonempty({([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount])}) --Set
ON COLUMNS
FROM [Adventure Works]
My question is this, Assuming I am correct and that the results do in fact represent a set (a set of tuples denoted by each distinct member instance), why does the query engine allow you to specify parenthesis indicating selection of a tuple to return something that is not a tuple?
This makes more sense to me :
SELECT
nonemptycrossjoin(
{[Date].[Calendar].[Calendar Quarter]}, --Set 1
{[Measures].[Sales Amount]} --Set 2
)
ON COLUMNS
FROM [Adventure Works]
At least this code reflects the result set that's returned. Thoughts?
Or is it all just Analysis Services semantics?
Thanks
Unfortunately, MDX is an ambiguous language. From what I've understood, the () notation is a tuple or an operator precedence notation. And:
{...} , {...}
is actually a crossjoin:
{...} * {...}
Then when you specify a "level" where a set is expected, level.members is used by default. When you specify a tuple where a set is expected, a singleton set containing only that tuple is created. So:
( [level], [measures].[amount] )
is equivalent to:
crossjoin( [level].members, { [measures].[amount] } )
The only tuple (a member is a tuple) specified is [Measures].[Amount] which by the way does not use the () notation ;-)
You mention that you typically use this syntax - but I write mdx nearly every day and never use this syntax -
SELECT
NON EMPTY ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) ON COLUMNS
FROM [Adventure Works];
I'm a little surprised it runs as it seems to be indicating to analysis services to create a tuple from a level and a measure member.
MarcP mentions that mdx is ambiguous, but I'm more in favour of saying it can be written ambiguously. Unfortunately it fails quite gracefully most of the time, and you get no numbers returned or the wrong numbers. I wish it threw more errors and enforced tighter syntax rules, as this might make it more understandable.
For your original script, I would just use the * operator rather than typing out the full crossjoin function when you need it. In your script it is much more readable to move the measure onto the rows and delete that tuple. As Marc mentions, SSAS implicitly applies the MEMBERS function; I find things more readable when it is included explicitly wherever it is being used:
SELECT
NON EMPTY
[Date].[Calendar].[Calendar Quarter].MEMBERS ON 0,
[Measures].[Sales Amount] ON 1
FROM [Adventure Works];
[Date].[Calendar].[Calendar Quarter]
As Marc mentioned, this is actually the same as [Date].[Calendar].[Calendar Quarter].MEMBERS. In this case MSDN is your friend (and for mdx, unlike some other MS languages, I find MSDN very good). Here is the definition of the MEMBERS function:
https://msdn.microsoft.com/en-us/library/ms144851.aspx
telling you the return type:
Returns the set of members in a dimension, level, or hierarchy.
nonemptycrossjoin
You mentioned nonemptycrossjoin. This is not needed any more: a simple crossjoin via * is all that is needed with the last two or three versions of SSAS.
I have C/C++ code with embedded SQL for Oracle through Pro*C. Is there any mechanism to get the difference between the values of an array and the values of a DB column? For example, say I have an array like this:
int nums[] = {10,20,35,45};
vector<int> vnums (nums, nums + sizeof(nums) / sizeof(int) );
Now, I have a DB table tbl1 with col1 containing values:
20
40
60
I would like to get the unmatched array values that are not present in tbl1.
So, result should be:
10
35
45
I know one way. I may run the following SQL query:
select col1 from tbl1
and store the results in a vector, say vec2.
Then I compute the difference of the two vectors vnums and vec2.
Can you suggest a better way?
Unfortunately, there isn't a way to directly do that and your approach is fine.
What you can do is wrap the call to the DB with a stream and create a function that takes your input values, using something like:
CREATE OR REPLACE TYPE numArray_t AS TABLE OF NUMBER
CREATE OR REPLACE FUNCTION extractDif (p_data IN numArray_t) return numArray_t;
and in the function you will perform a MINUS of the passed numArray_t with the values from your table and stream the differences.
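For the client-side approach from the question (fetch col1 into a second vector and diff the two), a minimal sketch using std::set_difference is shown here. The function name missingFrom is made up, and the database fetch is left out: the second argument stands for whatever vector the Pro*C query filled.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Values present in `wanted` but absent from `present`.
// Both vectors are taken by value so that sorting does not modify the
// callers' data; std::set_difference requires sorted input ranges.
std::vector<int> missingFrom(std::vector<int> wanted, std::vector<int> present) {
    std::sort(wanted.begin(), wanted.end());
    std::sort(present.begin(), present.end());
    std::vector<int> result;
    std::set_difference(wanted.begin(), wanted.end(),
                        present.begin(), present.end(),
                        std::back_inserter(result));
    return result;
}

// Checks the example from the question: array {10,20,35,45}, column {20,40,60}.
void checkMissingFromExample() {
    const std::vector<int> vnums = {10, 20, 35, 45};
    const std::vector<int> vec2 = {20, 40, 60};  // stands in for the fetched col1 values
    assert((missingFrom(vnums, vec2) == std::vector<int>{10, 35, 45}));
}
```

The result comes back sorted rather than in the original array order, which does not matter for the example in the question.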
I need to keep data of the following form:
(a,b,1),
(c,d,2),
(e,f,3),
(g,h,4),
(i,j,5),
(k,l,6),
(m,a,7)
...
such that the integers within the data (3rd column) are consecutively ordered and are unique. Also there are 2,954,208,208 such rows. I am searching for a data structure which returns the value of the 3rd column given the value of first two columns e.g.
Given: (i,j) it returns 5
And given the value of 3rd column, first two columns can be retrieved. For example,
Given: 5 it returns (i,j)
Is there some data structure which may help me achieve this?
My approach towards solving this problem was to use hash maps, but hash maps do not turn out to be efficient. Is there some other way?
The values in the first, second and third columns are all 64-bit.
I have 4GB of RAM.