Pre-increment assignment as Row Number to List

I am trying to assign a row number and a set number to each item in a List, but the sets end up containing the wrong number of rows.
var objx = new List<x>();
var i = 0;
var r = 1;
objY.ForEach(x => objx.Add(new x
{
    RowNumber = ++i,
    DatabaseID = x.QuestionID,
    SetID = i == 5 ? r++ : i % 5 == 0 ? r += 1 : r
}));
For the code above, suppose objY contains 23 rows and I want to break those 23 rows into sets of 5.
The code above gives this sequence (considering only RowNumber):
[1 2 3 4 5][6 7 8 9][10 11 12 13 14]...
which is valid according to the logic.
If I change the logic for SetID to
SetID = i % 5 == 0 ? r += 1 : r
the result looks like
[1 2 3 4][5 6 7 8 9][10 11 12 13 14]...
Again, that is the correct output for the code,
but what I expect is sets of 5:
[1 2 3 4 5][6 7 8 9 10]...
What am I missing?
I should have taken my maths classes more seriously.

I think you want something like this:
var objX = objY.Select((x, i) => new { ObjX = x, Index = i })
    .GroupBy(x => x.Index / 5)
    .Select((g, i) =>
        g.Select(x => new objx
        {
            RowNumber = x.Index + 1,
            DatabaseID = x.ObjX.QuestionID,
            SetID = i + 1
        }).ToList())
    .ToList();
Note that I'm grouping by x.Index / 5 to ensure that every group has at most 5 items.
Here's a demo.
Update
"It will be very helpful if you can explain your logic."
Where should I start? I'm using LINQ methods to select and group the original list to create a new List<List<ObjX>> where every inner list has at most 5 elements (fewer in the last one if the total count is not divisible by 5).
Enumerable.Select lets you project each element of the input sequence into something new. This method is comparable to a variable in a loop. In this case I project an anonymous type holding the original object and its index in the list (Select has an overload that incorporates the index). I create this anonymous type to simplify the query and because I need the index later in the GroupBy.
Enumerable.GroupBy lets you group the elements of a sequence by a specified key. This key can be anything that is derivable from the element. Here I'm using the index to build groups with a maximum size of 5:
.GroupBy(x => x.Index / 5)
That works because integer division in C# (or C) always results in an int, with the fractional part discarded (unlike VB.NET, by the way), so 3/4 results in 0. You can use this fact to build groups of the specified size.
Then I use Select on the groups to create the inner lists, again using the index overload to be able to set the SetID of the group:
.Select((g, i) =>
    g.Select(x => new objx
    {
        RowNumber = x.Index + 1,
        DatabaseID = x.ObjX.QuestionID,
        SetID = i + 1
    }).ToList())
The last step is calling ToList on the IEnumerable<List<ObjX>> to create the final List<List<ObjX>>. That also "materializes" the query. Have a look at deferred execution, and especially Jon Skeet's blog, to learn more.
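For comparison, here is a minimal sketch of the same integer-division idea in Python rather than C# (the function name assign_sets and the sample IDs are made up for illustration). It also shows the arithmetic the question was missing: with a 1-based row number n, the set number is (n - 1) / 5 + 1 using integer division, so no second counter is needed.

```python
def assign_sets(ids, set_size=5):
    """Return (row_number, database_id, set_id) tuples, all numbered from 1."""
    rows = []
    for index, database_id in enumerate(ids):
        row_number = index + 1
        # Floor division on the 0-based index: 0-4 -> set 1, 5-9 -> set 2, ...
        set_id = index // set_size + 1
        rows.append((row_number, database_id, set_id))
    return rows


rows = assign_sets(range(101, 124))  # 23 hypothetical database IDs

# Collect the row numbers that ended up in each set.
sets = {}
for row_number, _, set_id in rows:
    sets.setdefault(set_id, []).append(row_number)

print(sets[1])  # [1, 2, 3, 4, 5]
print(sets[2])  # [6, 7, 8, 9, 10]
print(sets[5])  # [21, 22, 23]
```

Only the last set is shorter than 5, which matches the grouping produced by the LINQ query.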

writing to columns in same row in csv file (python)

I'm trying to write values to a CSV file such that the results of every two iterations end up in the same row, and the next values are then written to a new row. Any help would be greatly appreciated. Thank you!
This is what I have so far:
import csv
import math
savePath = '/home/dehaoliu/opencv_test/Engineering_drawings_outputs/'
with open(str(savePath) +'outputsTest.csv','w') as f1:
    writer=csv.writer(f1, delimiter='\t',lineterminator='\n',)
    temp = []
    for k in range(0,2):
        temp = []
        for i in range(0,4):
            a = 2 +i
            b = 3+ i
            list = [a,b]
            temp.append(list)
        writer.writerow(temp)
The result I am getting now is
[2 3][3 4][4 5][5 6]
[2 3][3 4][4 5][5 6]
But I would like to get this (without the brackets) where each number in a row is in a separate column:
2 3 3 4
4 5 5 6
Try the following:
import csv
import math
savePath = '/home/dehaoliu/opencv_test/Engineering_drawings_outputs/'
with open(str(savePath) +'outputsTest.csv','w') as f1:
    writer=csv.writer(f1, delimiter='\t',lineterminator='\n',)
    temp = [2, 3]
    for i in range(2):
        temp = [x + i for x in temp]
        additional = [y+1 for y in temp]
        writer.writerow(temp + additional)
        temp = additional[:]
This should return:
# 2 3 3 4
# 4 5 5 6
You start with a temporary list containing the numbers 2 and 3. Then you loop from 0 to 2 (exclusive). At every iteration, you increment the values of the temporary list by the current index and then create an additional list from these new values, each increased by one. Once that's done, you join the two lists together and write the result out to your file. At this point, you set your temporary list to a copy of the additional list before moving on to the next iteration.
I hope this helps.
The way you present it, you can do it with a simple seed and increment:
import csv
import os
save_path = "/home/dehaoliu/opencv_test/Engineering_drawings_outputs/"
with open(os.path.join(save_path, "outputsTest.csv"), "w") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    temp = [2, 3, 3, 4]  # init seed
    increment = len(temp) // 2  # how many pairs we have, used to increase our seed each row
    for _ in range(2):  # how many rows do you need, any positive integer will do
        writer.writerow(temp)  # write the current value
        temp = [x + increment for x in temp]  # add 'increment' to the elements
Resulting in:
2 3 3 4
4 5 5 6
But if your seed is: temp = [2, 3, 3, 4, 4, 5] and you decide to generate 4 rows, it will still adapt:
2 3 3 4 4 5
5 6 6 7 7 8
8 9 9 10 10 11
11 12 12 13 13 14
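If the pairs come from an existing computation rather than a fixed seed, another option is to build the pairs first and then flatten every two of them into one row. The following is only a sketch under that assumption: the function name write_pairs and the pairs_per_row parameter are made up for illustration, and io.StringIO stands in for the real output file so the snippet runs without touching the disk.

```python
import csv
import io


def write_pairs(out, pairs, pairs_per_row=2):
    """Write `pairs_per_row` pairs per CSV row, one number per column."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for start in range(0, len(pairs), pairs_per_row):
        chunk = pairs[start:start + pairs_per_row]
        # Flatten [[2, 3], [3, 4]] into [2, 3, 3, 4] before writing.
        writer.writerow([value for pair in chunk for value in pair])


# The same pairs the question generates: [a, b] = [2 + i, 3 + i].
pairs = [[2 + i, 3 + i] for i in range(4)]

buffer = io.StringIO()  # stand-in for open(..., "w")
write_pairs(buffer, pairs)
print(buffer.getvalue(), end="")
# 2	3	3	4
# 4	5	5	6
```

Passing a list to writerow whose elements are themselves lists is what produced the bracketed output in the question; flattening first gives one number per column.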

Random.randint on lists in Python

I want to create a list and fill it with 15 zeros, then change the 0 to 1 in 5 random spots of the list, so that it has 10 zeros and 5 ones. Here is what I tried:
import random, time
dasos = []
for i in range(1, 16):
    dasos.append(0)
for k in range(1, 6):
    dasos[random.randint(0, 15)] = 1
Sometimes I would get anywhere from 0 to 5 ones but I want exactly 5 ones,
if I add:
print(dasos)
...to see my list I get:
IndexError: list assignment index out of range
I think the best solution would be to use random.sample:
import random
my_lst = [0 for _ in range(15)]
for i in random.sample(range(15), 5):
    my_lst[i] = 1
You could also consider using random.shuffle and use the first 5 entries:
my_lst = [0 for _ in range(15)]
candidates = list(range(15))
random.shuffle(candidates)
for i in candidates[0:5]:
    my_lst[i] = 1
TL;DR: Read the Python random documentation; this can be done in multiple ways.
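Both symptoms in the question have the same root. random.randint(a, b) includes both endpoints, so randint(0, 15) can return 15, which is out of range for a 15-element list (valid indices are 0 to 14) and explains the IndexError. It can also return the same index more than once, which is why fewer than 5 ones sometimes appear. A minimal sketch of the random.sample approach, wrapped in a helper whose name random_ones is made up for illustration:

```python
import random


def random_ones(length=15, ones=5):
    """Return a list of `length` zeros with exactly `ones` positions set to 1."""
    result = [0] * length
    # sample() picks distinct indices from 0 .. length - 1, so no index
    # repeats and none falls outside the list.
    for index in random.sample(range(length), ones):
        result[index] = 1
    return result


dasos = random_ones()
print(dasos)
print(sum(dasos))  # always 5
```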

Subtract value in one data frame from the next value in a second data frame

I have a data frame that is composed of several datasets (about 146 and counting). Two of my columns are labeled "start_time" and "stop_time", which represent the start and stop of a response (i.e., together they give the total duration of the response).
I need to get the "inter-response time", that is, each stop_time subtracted from the next corresponding value in start_time. Basically, if:
start_time = [1,4,7]
stop_time = [2,5,8]
I need:
start_time[1] - stop_time[0]
start_time[2] - stop_time[1]
in order to get:
iri = [2,2]
My code looks like this:
iri_t = []
def grps():
    for grp in lset2_name_grps.groups:
        beg_eng_t = pd.DataFrame([lset2_name_grps.stop_time, lset2_name_grps.start_time], columns=['end_t','beg_t'])
        end_t = [i for i in lset2_name_grps.stop_time]
        beg_t = [i for i in lset2_name_grps.start_time]
        beg_t = np.insert(beg_t, len(beg_t),0)
        end_t = np.insert(end_t, 0,0)
        iri_t.append(np.subtract(end_t, beg_t))
        # for i,j in zip(end_t, beg_t):
        #     iri_t.append(np.subtract(i,j))
        # lset2_name_grps['iri'] = iri_t
grps()
Essentially, it doesn't do anything close to what I'm trying to accomplish, and the only output I get is either "Not Implemented" or an error.
How about something like this:
import pandas as pd
starts = pd.Series([1, 4, 7])
stops = pd.Series([2, 5, 8])
iri_t = [0]
for i in range(1, len(starts)):
    iri_t.append(starts[i] - stops[i-1])
times_df = pd.concat([starts, stops, pd.Series(iri_t)], axis=1)
This creates the following data_frame:
0 1 2
0 1 2 0
1 4 5 2
2 7 8 2
I think what you're asking (correct me if I'm wrong) is best accomplished by putting the two columns in a single dataframe, using shift to offset one of your columns, then doing an ordinary subtraction.
df = pd.DataFrame({'start_time':[1,4,7], 'stop_time':[2,5,8]})
df.start_time - df.stop_time.shift()
Out[5]:
0 NaN
1 2
2 2
dtype: float64
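To make the offset explicit, here is a plain-Python sketch of the inter-response time as the question defines it (each start minus the previous stop). It deliberately avoids pandas so it runs anywhere; the function name inter_response_times is made up for illustration.

```python
def inter_response_times(start_times, stop_times):
    """Return start_times[i + 1] - stop_times[i] for every consecutive pair."""
    # zip() pairs each start (from the second one on) with the stop before it
    # and stops at the shorter sequence, so the last stop is simply unused.
    return [next_start - stop
            for next_start, stop in zip(start_times[1:], stop_times)]


start_time = [1, 4, 7]
stop_time = [2, 5, 8]
iri = inter_response_times(start_time, stop_time)
print(iri)  # [2, 2]
```

The result has one element fewer than the inputs because the first response has no preceding stop; a shift-based pandas version marks that position as NaN instead.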

Stata - assign different variables depending on the value within a variable

Sorry that title is confusing. Hopefully it's clear below.
I'm using Stata and I'd like to assign the value 1 to a variable that depends on the value within a different variable. I have 20 order variables and also 20 corresponding variables. For example if order1 = 3, I'd like to assign variable3 = 1. Below is a snippet of what the final dataset would look like if I had only 3 of each variable.
Right now I'm doing this with two loops, but I have to add another loop around this that goes through it 9 more times, plus I'm doing this for a couple hundred data files. I'd like to make it more efficient.
forvalues i = 1/20 {
    forvalues j = 1/20 {
        replace variable`j' = 1 if order`i'==`j'
    }
}
Is it possible to use the value of order`i' to assign the variable[order`i'VALUE] directly? Then I can get rid of the j loop above. Something like this:
forvalues i = 1/20 {
    replace variable[`order`i'value] = 1
}
Thanks for your help!
CLARIFICATION ADDED Feb 2nd.
I simplified my problem and the dataset too much: the suggested solutions work for what I presented, but they do not get at what I'm really attempting to do. Thank you three for your solutions, though. I was not clear enough in my post.
To clarify, my data doesn't have a one-to-one correspondence where each order# assigns variable# a 1 if it's not missing. For example, for the first observation, order1=3, so variable1 isn't supposed to get a 1; variable3 should get a 1. What I didn't include in my original post is that I'm actually checking for other conditions to set it equal to 1.
For more background, I'm counting up births of women by birth order (1st child, 2nd child, etc.) that occurred at different ages of the mothers. So in the data, each row is a woman and each order# is the birth number (order1=3 means it's her third child). The corresponding variable#s are the counts (variable# means the woman has a child of birth order #). I mentioned in the post that I do this 9 times because I'm doing it for 5-year age groups (15-19; 20-24; etc.). So the first set of variable# would be counts of births by order when women were ages 15-19; the second set of variable# would be counts of births by order when women were 20-24, and so on. After this, I sum up the counts in different ways (by woman's education, geography, etc.).
So with the additional loop what I do is something more like this
forvalues k = 1/9{
    forvalues i = 1/20 {
        forvalues j = 1/20 {
            replace variable`k'_`j' = 1 if order`i'==`j' & age`i'==`k' & birth_age`i'<36
        }
    }
}
Not sure if it's possible, but I wanted to simplify so I only need to cycle through each child once, without cycling through the birth orders and directly use the value in order# to assign a 1 to the correct variable. So if order1=3 and the woman had the child at the specific age group, assign variable[agegroup][3]=1; if order1=2, then variable[agegroup][2] should get a 1.
forvalues k=1/9{
    forvalues i = 1/20 {
        replace variable`k'_[`order`i'value] = 1 if age`i'==`k' & birth_age`i'<36
    }
}
I would reshape twice. First reshape to long, then condition variable on !missing(order), then reshape back to wide.
* generate your data
clear
set obs 3
forvalues i = 1/3 {
    generate order`i' = .
    local k = (3 - `i' + 1)
    forvalues j = 1/`k' {
        replace order`i' = (`k' - `j' + 1) if (_n == `j')
    }
}
list
*. list
*
* +--------------------------+
* | order1 order2 order3 |
* |--------------------------|
* 1. | 3 2 1 |
* 2. | 2 1 . |
* 3. | 1 . . |
* +--------------------------+
* I would reshape to long, then back to wide
generate id = _n
reshape long order, i(id)
generate variable = !missing(order)
reshape wide order variable, i(id) j(_j)
order order* variable*
drop id
list
*. list
*
* +-----------------------------------------------------------+
* | order1 order2 order3 variab~1 variab~2 variab~3 |
* |-----------------------------------------------------------|
* 1. | 3 2 1 1 1 1 |
* 2. | 2 1 . 1 1 0 |
* 3. | 1 . . 1 0 0 |
* +-----------------------------------------------------------+
Using a simple forvalues loop with generate and missing() is orders of magnitude faster than the other solutions proposed so far. For this problem you need only one loop to traverse the complete list of variables, not two as in the original post. Below is some code that shows both points.
*----------------- generate some data ----------------------
clear all
set more off
local numobs 60
set obs `numobs'
quietly {
    forvalues i = 1/`numobs' {
        generate order`i' = .
        local k = (`numobs' - `i' + 1)
        forvalues j = 1/`k' {
            replace order`i' = (`k' - `j' + 1) if (_n == `j')
        }
    }
}
timer clear
*------------- method 1 (gen + missing()) ------------------
timer on 1
quietly {
    forvalues i = 1/`numobs' {
        generate variable`i' = !missing(order`i')
    }
}
timer off 1
* ----------- method 2 (reshape + missing()) ---------------
drop variable*
timer on 2
quietly {
    generate id = _n
    reshape long order, i(id)
    generate variable = !missing(order)
    reshape wide order variable, i(id) j(_j)
}
timer off 2
*--------------- method 3 (egen, rowmax()) -----------------
drop variable*
timer on 3
quietly {
    // loop over the order variables creating dummies
    forvalues v=1/`numobs' {
        tab order`v', gen(var`v'_)
    }
    // loop over the domain of the order variables
    // (may need to change)
    forvalues l=1/`numobs' {
        egen variable`l' = rmax(var*_`l')
        drop var*_`l'
    }
}
timer off 3
*----------------- method 4 (original post) ----------------
drop variable*
timer on 4
quietly {
    forvalues i = 1/`numobs' {
        gen variable`i' = 0
        forvalues j = 1/`numobs' {
            replace variable`i' = 1 if order`i'==`j'
        }
    }
}
timer off 4
*-----------------------------------------------------------
timer list
The timed procedures give
. timer list
1: 0.00 / 1 = 0.0010
2: 0.30 / 1 = 0.3000
3: 0.34 / 1 = 0.3390
4: 0.07 / 1 = 0.0700
where timer 1 is the simple gen, timer 2 the reshape, timer 3 the egen, rowmax(), and timer 4 the original post.
The reason you need only one loop is that Stata's approach is to execute the command for all observations in the dataset, from top (first observation) to bottom (last observation). For example, variable1 is generated according to whether order1 is missing or not; this is done for all observations of both variables, without an explicit loop.
I wonder if you actually need to do this. For future questions, if you have a further goal in mind, I think a good strategy is to mention it in your post.
Note: I've reused code from other posters' answers.
Here's a simpler way to do it (that still requires 2 loops):
// loop over the order variables creating dummies
forvalues v=1/20 {
    tab order`v', gen(var`v'_)
}
// loop over the domain of the order variables (may need to change)
forvalues l=1/3 {
    egen variable`l' = rmax(var*_`l')
    drop var*_`l'
}

Computing all values or stopping and returning just the best value if found

I have a list of items and for each item I am computing a value. Computing this value is a bit computationally intensive so I want to minimise it as much as possible.
The algorithm I need to implement is this:
1. I have a value X
2. For each item
   a. compute the value for it; if it is < 0, ignore it completely
   b. if (value > 0) && (value < X)
      return pair (item, value)
3. Return all (item, value) pairs in a List (that have the value > 0), ideally sorted by value
To make it a bit clearer, step 3 only happens if none of the items have a value less than X. In step 2, when we encounter the first item that is less than X we should not compute the rest and just return that item (we can obviously return it in a Set() by itself to match the return type).
The code I have at the moment is as follows:
val itemValMap = items.foldLeft(Map[Item, Int]()) {
  (map : Map[Item, Int], key : Item) =>
    val value = computeValue(key)
    if ( value >= 0 ) //we filter out negative ones
      map + (key -> value)
    else
      map
}
val bestItem = itemValMap.minBy(_._2)
if (bestItem._2 < bestX)
{
  List(bestItem)
}
else
{
  itemValMap.toList.sortBy(_._2)
}
However, what this code is doing is computing all the values in the list and choosing the best one, rather than stopping as a 'better' one is found. I suspect I have to use Streams in some way to achieve this?
OK, I'm not sure what your whole setup looks like, but I tried to prepare a minimal example that would mirror your situation.
Here it is then:
object StreamTest {
  case class Item(value : Int)
  def createItems() = List(Item(0),Item(3),Item(30),Item(8),Item(8),Item(4),Item(54),Item(-1),Item(23),Item(131))
  def computeValue(i : Item) = { Thread.sleep(3000); i.value * 2 - 2 }
  def process(minValue : Int)(items : Seq[Item]) = {
    val stream = Stream(items: _*).map(item => item -> computeValue(item)).filter(tuple => tuple._2 >= 0)
    stream.find(tuple => tuple._2 < minValue).map(List(_)).getOrElse(stream.sortBy(_._2).toList)
  }
}
Each calculation takes 3 seconds. Now let's see how it works:
val items = StreamTest.createItems()
val result = StreamTest.process(2)(items)
result.foreach(r => println("Original: " + r._1 + " , calculated: " + r._2))
Gives:
[info] Running Main
Original: Item(3) , calculated: 4
Original: Item(4) , calculated: 6
Original: Item(8) , calculated: 14
Original: Item(8) , calculated: 14
Original: Item(23) , calculated: 44
Original: Item(30) , calculated: 58
Original: Item(54) , calculated: 106
Original: Item(131) , calculated: 260
[success] Total time: 31 s, completed 2013-11-21 15:57:54
Since there's no value smaller than 2, we got a list ordered by the calculated value. Notice that two pairs are missing, because calculated values are smaller than 0 and got filtered out.
OK, now let's try with a different minimum cut-off point:
val result = StreamTest.process(5)(items)
Which gives:
[info] Running Main
Original: Item(3) , calculated: 4
[success] Total time: 7 s, completed 2013-11-21 15:55:20
Good, it returned a list with only one item, the first value (second item in the original list) that was smaller than 'minimal' value and was not smaller than 0.
I hope that the example above is easily adaptable to your needs...
A simple way to avoid the computation of unneeded values is to make your collection lazy by using the view method:
val weigthedItems = items.view.map{ i => i -> computeValue(i) }.filter(_._2 >= 0 )
weigthedItems.find(_._2 < X).map(List(_)).getOrElse(weigthedItems.sortBy(_._2))
For example, here is a test in the REPL:
scala> :paste
// Entering paste mode (ctrl-D to finish)
type Item = String
def computeValue( item: Item ): Int = {
  println("Computing " + item)
  item.toInt
}
val items = List[Item]("13", "1", "5", "-7", "12", "3", "-1", "15")
val X = 10
val weigthedItems = items.view.map{ i => i -> computeValue(i) }.filter(_._2 >= 0 )
weigthedItems.find(_._2 < X).map(List(_)).getOrElse(weigthedItems.sortBy(_._2))
// Exiting paste mode, now interpreting.
Computing 13
Computing 1
defined type alias Item
computeValue: (item: Item)Int
items: List[String] = List(13, 1, 5, -7, 12, 3, -1, 15)
X: Int = 10
weigthedItems: scala.collection.SeqView[(String, Int),Seq[_]] = SeqViewM(...)
res27: Seq[(String, Int)] = List((1,1))
As you can see, computeValue was only called up to the first value < X (that is, up to 1).
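The same early-exit behaviour can also be written as one explicit loop that remembers what it has already computed, so nothing is evaluated twice when no item falls below the threshold. This is a minimal sketch in Python rather than Scala; the function name best_or_all is made up for illustration, and negative values are dropped with the same >= 0 test the answers above use.

```python
def best_or_all(items, compute_value, threshold):
    """Return [(item, value)] for the first value in [0, threshold),
    otherwise every non-negative (item, value) pair sorted by value."""
    computed = []
    for item in items:
        value = compute_value(item)   # the expensive call, made at most once per item
        if value < 0:
            continue                  # negative values are ignored completely
        if value < threshold:
            return [(item, value)]    # early exit: remaining items are never computed
        computed.append((item, value))
    return sorted(computed, key=lambda pair: pair[1])


calls = []  # records which items were actually computed


def compute_value(item):
    calls.append(item)
    return int(item)


items = ["13", "1", "5", "-7", "12", "3", "-1", "15"]
result = best_or_all(items, compute_value, threshold=10)
print(result, calls)  # [('1', 1)] ['13', '1']
```

With the same items and X = 10 as in the REPL session, only "13" and "1" are computed.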