I've been thinking about this for quite a few days and I can't seem to find an answer for b).
It goes like this, fellas:
Johnny has taken a very important course and wants a lot of his
friends to find out about his success by posting on Facebook (// yes,
stupid, I know). Johnny knows N users, represented by numbers from 1 to
N. Between them there are m friendships of the form i,j, where i and
j are users; N, m != 0. A user cannot be friends with himself, and a
friendship tells us that each user is friends with the other one.
Johnny wants to find out which are the most 'connected' people in his
friends list so that his post will be well spread across Facebook. For
this, Johnny has to find the biggest sub-set of well-known users.
In this sub-set, each user has at least k friends who are also
present in the sub-set (k != 0).
Input: N, m and k on the same line, separated by a single space,
followed by a sequence of 2*m natural numbers (each in the interval [1,N]).
Output (standard):
a) The number of friends of each user, in order from 1 to N.
b) The members of the biggest sub-set of users having the property that
each user in this set has at least k friends (which, again, can be found
in that specific sub-set). If there is no such sub-set for a given k, print
"NO".
For this problem you can't use any specialised libraries, so I'm stuck
with the standard ones.
Again, this concerns the mathematical concept of sets, NOT the specialised C++ set, multiset, etc. libraries.
a) is pretty easy but like I said, b) is giving me some trouble.
Examples: 1)
Input: 5 5 2
1 2 5 1 3 2 4 5 1 4
Output:
a) 3 2 1 2 2
b) 1 4 5
2) Input:
5 5 3
1 2 5 1 3 2 4 5 1 4
Output:
a) 3 2 1 2 2
b) NO
and 3) Input:
11 18 3
1 8 4 7 7 10 11 10 2 1 2 3 8 9 8 3 9 3 9 2 5 6 5 11 1 4 10 6 7 6 2 8 11 7 11 6
Output:
a) 3 4 3 2 2 4 4 4 3 3 4
b) 2 3 6 7 8 9 10 11
Any help would be appreciated. Also, sorry for the bulky content, it had to be roughly translated. :)
Thx a lot
The problem calls for you to compute the k-core of an N-node graph with m edges. There's a simple algorithm for this: while the lowest degree vertex has degree less than k, delete it. The remaining vertices are the desired subset. Use a bucket queue to keep the nodes sorted by degree for efficient operation.
On second thought, we just need to track (1) the degree of each node and (2) which nodes have degree less than k. In untested Python:
import collections

def kcore(edges, k):
    # adjacency sets; len(neighbors[u]) is the current degree of u
    neighbors = collections.defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # nodes that currently have fewer than k neighbors
    bad = {u for (u, neigh) in neighbors.items() if len(neigh) < k}
    while bad:
        u = bad.pop()
        # deleting u lowers the degree of each of its neighbors
        for v in neighbors[u]:
            neighbors[v].remove(u)
            if len(neighbors[v]) < k:
                bad.add(v)
        del neighbors[u]
    return set(neighbors)
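Since the question rules out specialised containers, the same peeling idea can also be written with plain lists only. The following is a minimal sketch, not part of the answer above: `solve` and `pairs` are hypothetical helper names, nodes are assumed to be numbered 1..N, and the edge list is assumed to contain no duplicate friendships. It returns the degree list for a) and the sorted subset for b), with an empty list standing for "NO".

```python
def solve(n, k, edges):
    """Return (degrees, core) for nodes 1..n; core is [] when no subset exists."""
    adj = [[] for _ in range(n + 1)]      # adjacency lists, index 0 unused
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    degrees = [len(adj[u]) for u in range(1, n + 1)]   # answer to a)

    deg = [0] + degrees                   # working copy, shrinks as nodes are peeled
    removed = [False] * (n + 1)
    stack = [u for u in range(1, n + 1) if deg[u] < k]
    for u in stack:
        removed[u] = True                 # mark when queued so no node is queued twice

    while stack:
        u = stack.pop()
        for v in adj[u]:
            if not removed[v]:
                deg[v] -= 1               # v loses the friend u
                if deg[v] < k:
                    removed[v] = True
                    stack.append(v)

    core = [u for u in range(1, n + 1) if not removed[u]]   # answer to b)
    return degrees, core


def pairs(flat):
    """Turn the flat sequence of 2*m numbers into a list of (i, j) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


degrees, core = solve(5, 2, pairs([1, 2, 5, 1, 3, 2, 4, 5, 1, 4]))
print(degrees)  # [3, 2, 1, 2, 2]
print(core)     # [1, 4, 5]
```

Each node is queued at most once and each edge is looked at a constant number of times, so the sketch runs in time proportional to N + m.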
Source table Cricket_Score:
Overs Balls Runs
1 1 1
1 2 2
1 3 4
1 4 0
1 5 1
1 6 2
2 1 3
2 2 1
2 3 1
2 4 4
2 5 6
2 6 0
3 1 2
3 2 1
3 3 1
3 4 6
3 5 0
3 6 4
I want an output like this:
Overs Total_Runs
1 10
2 25
3 39
Description: for the first over (the first 6 balls) I want the sum of the first 6 balls, which is 10. For the second over I want the sum of the first over plus the second 6 balls, which is 25 [10 + 15 = 25]. For the third over I want the sum of the first, second and third 6 balls, which is 39 [10 + 15 + 14 = 39].
Note: 6 balls make one over.
How do I create a mapping for this scenario in Informatica? Which logic should I use?
I will assume your data is EXACTLY as shown in your question. If it is not like this in the source, it will be a major issue; a table where the data is not sorted will be an issue too, because the running total depends on row order.
Solution -
Create an Expression transformation with the ports below, in this order (in_* = input port, v_* = variable port, out_* = output port):
in_balls
in_runs
in_overs
v_cumulative_runs = in_runs + iif(isnull(v_cumulative_runs), 0, v_cumulative_runs)
out_total_runs = v_cumulative_runs
out_overs = in_overs
Then use an Aggregator -
in_total_runs
in_out_overs -- group by this port
out_total_runs = max(in_total_runs)
Link in_out_overs and out_total_runs to the target.
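To make the logic easier to check outside Informatica, here is a rough Python sketch of the same two steps: a running total carried from row to row (the variable port), followed by the maximum per over (the aggregator). The function name `cumulative_runs_per_over` is made up for illustration, and the rows are assumed to be already sorted by over and ball.

```python
def cumulative_runs_per_over(rows):
    """rows: (over, ball, runs) tuples, already sorted by over and ball."""
    running = 0            # plays the role of the variable port
    totals = {}            # over -> largest running total seen (the MAX in the aggregator)
    for over, _ball, runs in rows:
        running += runs
        totals[over] = max(totals.get(over, running), running)
    return sorted(totals.items())


rows = [
    (1, 1, 1), (1, 2, 2), (1, 3, 4), (1, 4, 0), (1, 5, 1), (1, 6, 2),
    (2, 1, 3), (2, 2, 1), (2, 3, 1), (2, 4, 4), (2, 5, 6), (2, 6, 0),
    (3, 1, 2), (3, 2, 1), (3, 3, 1), (3, 4, 6), (3, 5, 0), (3, 6, 4),
]
print(cumulative_runs_per_over(rows))  # [(1, 10), (2, 25), (3, 39)]
```

Taking the maximum works here because runs are never negative, so the running total within an over never decreases.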
I did some research but I am having difficulty finding an answer.
I am using Python 2.7 and pandas so far, but I am still learning.
I have two CSVs; let's say one holds the alphabet A-Z and the other the digits 0-100.
I want to merge the two files to get A0 to A100, and so on up through Z.
For information, the two files hold DNA sequences, so I believe they are strings.
I tried to create arrays with numpy and build a matrix, but to no avail.
Here is a preview of the files:
barcode
0 GGAAGAA
1 CCAAGAA
2 GAGAGAA
3 AGGAGAA
4 TCGAGAA
5 CTGAGAA
6 CACAGAA
7 TGCAGAA
8 ACCAGAA
9 GTCAGAA
10 CGTAGAA
11 GCTAGAA
12 GAAGGAA
13 AGAGGAA
14 TCAGGAA
659
barcode
0 CGGAAGAA
1 GCGAAGAA
2 GGCAAGAA
3 GGAGAGAA
4 CCAGAGAA
5 GAGGAGAA
6 ACGGAGAA
7 CTGGAGAA
8 CACGAGAA
9 AGCGAGAA
10 TCCGAGAA
11 GTCGAGAA
12 CGTGAGAA
13 GCTGAGAA
14 CGACAGAA
1995
I am putting here the way I found to do it; there might be a sexier way:
import pandas as pd
# df8 and df7 are the two barcode DataFrames read from the CSVs
index = pd.MultiIndex.from_product([df8.barcode, df7.barcode], names=["df8", "df7"])
df = pd.DataFrame(index=index).reset_index()
def concat_BC(x):
    # concatenate the two sequences into one new column
    return str(x["df8"]) + str(x["df7"])
df["BC"] = df.apply(concat_BC, axis=1)
– Stephane Chiron
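If only the concatenated strings are needed, the same Cartesian product can also be built without a row-wise `apply`. This is a minimal sketch using only the standard library; the two short lists are stand-ins (taken from the first rows of the previews above) for `df8.barcode` and `df7.barcode`, and the names `barcodes_8`, `barcodes_7` and `combined` are made up for illustration.

```python
import itertools

# Stand-ins for df8.barcode and df7.barcode (first rows of the previews above).
barcodes_8 = ["CGGAAGAA", "GCGAAGAA"]
barcodes_7 = ["GGAAGAA", "CCAAGAA", "GAGAGAA"]

# Every 8-mer paired with every 7-mer, concatenated in that order.
combined = [a + b for a, b in itertools.product(barcodes_8, barcodes_7)]

print(len(combined))   # 6
print(combined[0])     # CGGAAGAAGGAAGAA
```

The resulting list can then be placed in a single-column frame, for example with `pd.DataFrame({"BC": combined})`.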
I am having trouble using the L1 operator in Stata 14 to create lag variables.
The resulting lag variable is 100% missing values!
gen d = L1.equity
Thanks in advance
There is hardly enough information in the question to know for certain, but as @Dimitriy V. Masterov suggested by asking how your data is tsset, you likely have an issue there.
As a quick example, imagine a panel with two countries, country 1 and country 3, with gdp by country measured over five years:
clear
input float(id year gdp)
1 1 5
1 2 2
1 3 7
1 4 9
1 5 6
3 1 3
3 2 4
3 3 5
3 4 3
3 5 4
end
Now, if you improperly tsset this data, you can easily generate the missing values you describe:
tsset year id
gen lag_gdp = L1.gdp
And notice now how you have 10 missing values generated. In this example, it happens because the panel and time variables are out of order and the (incorrectly specified) time variable has gaps (period 1 and period 3, but no period 2).
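The effect can be imitated in a few lines of Python. This is only an illustration of the idea that a lag is looked up by time value within a panel rather than by row position; it is not how Stata is implemented, and `lag1` is a made-up helper name.

```python
def lag1(rows, panel, time, value):
    """Emulate a one-period lag: same panel at time - 1, or None if absent."""
    lookup = {(r[panel], r[time]): r[value] for r in rows}
    return [lookup.get((r[panel], r[time] - 1)) for r in rows]


data = [(1, 1, 5), (1, 2, 2), (1, 3, 7), (1, 4, 9), (1, 5, 6),
        (3, 1, 3), (3, 2, 4), (3, 3, 5), (3, 4, 3), (3, 5, 4)]
rows = [{"id": i, "year": y, "gdp": g} for i, y, g in data]

# Roles swapped, as in "tsset year id": ids 1 and 3 act as time periods,
# and there is never a period id - 1 inside the same "panel" (year).
wrong = lag1(rows, panel="year", time="id", value="gdp")

# Roles as intended: id is the panel, year is the time variable.
right = lag1(rows, panel="id", time="year", value="gdp")

print(wrong.count(None))  # 10
print(right.count(None))  # 2
```

With the roles the right way round, only the first year of each country has no lag.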
Something else I have witnessed is someone trying to tsset by their time variable and their analysis variable, which is also incorrect:
clear
input float(year gdp)
1 5
2 3
3 2
4 4
5 7
end
tsset year gdp
gen d = L1.gdp
I suspect you are having a similar issue.
Without knowing what your data looks like or how it is tsset there is no possible way to diagnose this, but it is very likely an issue with how the data is tsset.
[I copied part of the below example from a separate post and changed it to suit my specific needs]
pos_1 pos_2
2 4
2 5
1 2
3 9
4 2
9 3
The above is read as person_2 is connected to person_4,...,person_4 is connected to person_2, and person_9 is connected to person_3.
I want to create a third categorical [edited] variable, component, that lets me know if the observed link is part of a connected component (subnetwork) within this network. In this case, there are two connected components in the network:
pos_1 pos_2 component
2 4 1
2 5 1
1 2 1
3 9 2
4 2 1
9 3 2
All nodes in component 1 are connected to each other, but not to the nodes in component 2 and vice versa. Is there a way to generate this component variable in Stata? I know there are alternative programs to do this in, but my code would be more seamless if I can integrate it into Stata.
If you reshape the data to long form, you can use group_id (from SSC) to get what you want:
clear
input pos_1 pos_2
2 4
2 5
1 2
3 9
4 2
9 3
end
gen id = _n
reshape long pos_, i(id) j(n)
clonevar comp = id
list, sepby(comp)
group_id comp, match(pos)
reshape wide pos_, i(id) j(n)
egen component = group(comp)
list
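For readers who want to cross-check the result outside Stata, the same labelling can be sketched in Python with a small union-find structure. This is an independent illustration, not what group_id does internally; `label_components` is a made-up name, and components are numbered in order of first appearance.

```python
def label_components(edges):
    """Return one component number per edge, numbered by first appearance."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the two endpoints of every link into one set.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Number the sets in the order their first edge is seen.
    labels = {}
    result = []
    for a, _b in edges:
        root = find(a)
        if root not in labels:
            labels[root] = len(labels) + 1
        result.append(labels[root])
    return result


edges = [(2, 4), (2, 5), (1, 2), (3, 9), (4, 2), (9, 3)]
print(label_components(edges))  # [1, 1, 1, 2, 1, 2]
```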
I have a Day Structure table with the following columns that I want to display:
DoW HoD Value
1 1 1
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
1 11 4
1 12 4
1 13 4
1 14 4
1 15 4
1 16 4
1 17 4
1 18 4
1 19 4
1 20 4
1 21 1
1 22 1
1 23 1
1 24 1
DoW is the day of the week (Monday etc.), HoD is the hour of the day, and Value is the actual value.
Now I want to bind this Day Structure entity collection directly to a control so that any changes are bound TwoWay.
Like this format:
I think the best way to achieve this is to use a template and/or a converter, but I just don't know how ;)
I already read this article, but its lack of TwoWay binding functionality makes it not useful for me :(
I hope you can help me
Jonny
Again I solved it on my own ;)
For this problem I created a Grid with a fixed amount of rows and columns. Inside this Grid I put an ItemsControl bound to my list of data. Inside the DataTemplate I placed a TextBox bound to the current value and bound the Grid.Row and Grid.Column properties to the day of the week / hour of day.
Pro:
The TextBox is TwoWay data-bound to a certain object or element.
Very easy to implement if the row and column properties are numeric.
Con:
Limited to a fixed amount of rows/columns.
A lot of XAML code to write (copy and paste).
Kinda "dirty" code. It doesn't feel like the best way to do it.
I'm still open to other suggestions.