Duplication of data entries by id if they meet a certain condition

Duplication of data entries by id if they meet a certain condition - stata

In the original choice data set, individuals (id) are captured making purchases (choice) among all the product options possible (assortchoice is a product code). Every individual always faces the same set of products to choose from; as a result the value of choice is always either 0 or 1 ("was the product chosen or not?").
clear
input
id assortchoice choice sumchoice
2 12 1 2
2 13 0 2
2 14 0 2
2 15 0 2
2 16 0 2
2 17 0 2
2 18 0 2
2 19 0 2
2 20 0 2
2 21 0 2
2 22 0 2
2 23 1 2
3 12 1 1
3 13 0 1
3 14 0 1
3 15 0 1
3 16 0 1
3 17 0 1
3 18 0 1
3 19 0 1
3 20 0 1
3 21 0 1
3 22 0 1
3 23 0 1
4 12 1 3
4 13 0 3
4 14 1 3
4 15 1 3
4 16 0 3
4 17 0 3
4 18 0 3
4 19 0 3
4 20 0 3
4 21 0 3
4 22 0 3
4 23 0 3
end
I created the following code to understand how many choices were made by each individual:
egen sumchoice=total(choice), by(id)
In this example, an individual 3 (id=3) only chose one product (since sumchoice=1), but individual 2 made two choices (sumchoice=2). Finally, individual 4 made three choices (sumchoice=3).
Since this is a choice data, I need to transform all the instances of multiple choices into sets of single choices.
What I mean by that: if an individual made two purchases, I need to duplicate the choice set for that individual twice; for an individual who made 3 purchases, I need to replicate the choice set three times, so the final structure looks like the data set below.
clear
input
id transaction assortchoice choice
2 1 12 1
2 1 13 0
2 1 14 0
2 1 15 0
2 1 16 0
2 1 17 0
2 1 18 0
2 1 19 0
2 1 20 0
2 1 21 0
2 1 22 0
2 1 23 0
2 2 12 0
2 2 13 0
2 2 14 0
2 2 15 0
2 2 16 0
2 2 17 0
2 2 18 0
2 2 19 0
2 2 20 0
2 2 21 0
2 2 22 0
2 2 23 1
3 1 12 1
3 1 13 0
3 1 14 0
3 1 15 0
3 1 16 0
3 1 17 0
3 1 18 0
3 1 19 0
3 1 20 0
3 1 21 0
3 1 22 0
3 1 23 0
4 1 12 1
4 1 13 0
4 1 14 0
4 1 15 0
4 1 16 0
4 1 17 0
4 1 18 0
4 1 19 0
4 1 20 0
4 1 21 0
4 1 22 0
4 1 23 0
4 2 12 0
4 2 13 0
4 2 14 1
4 2 15 0
4 2 16 0
4 2 17 0
4 2 18 0
4 2 19 0
4 2 20 0
4 2 21 0
4 2 22 0
4 2 23 0
4 3 12 0
4 3 13 0
4 3 14 0
4 3 15 1
4 3 16 0
4 3 17 0
4 3 18 0
4 3 19 0
4 3 20 0
4 3 21 0
4 3 22 0
4 3 23 0
end
***update:
transaction indicates which transaction order this is:
bysort id assortchoice (choice): gen transaction=_n
Hence, choice=1 should appear only once per each transaction.

The answer isn't quite "use expand" as there is a twist that you don't want exact replicates.
expand sumchoice
bysort id assortchoice (choice) : replace choice = 0 if _n != _N & choice == 1
list if id == 2 , sepby(assortchoice)
+-----------------------------------+
| id assort~e choice sumcho~e |
|-----------------------------------|
1. | 2 12 0 2 |
2. | 2 12 1 2 |
|-----------------------------------|
3. | 2 13 0 2 |
4. | 2 13 0 2 |
|-----------------------------------|
5. | 2 14 0 2 |
6. | 2 14 0 2 |
|-----------------------------------|
7. | 2 15 0 2 |
8. | 2 15 0 2 |
|-----------------------------------|
9. | 2 16 0 2 |
10. | 2 16 0 2 |
|-----------------------------------|
11. | 2 17 0 2 |
12. | 2 17 0 2 |
|-----------------------------------|
13. | 2 18 0 2 |
14. | 2 18 0 2 |
|-----------------------------------|
15. | 2 19 0 2 |
16. | 2 19 0 2 |
|-----------------------------------|
17. | 2 20 0 2 |
18. | 2 20 0 2 |
|-----------------------------------|
19. | 2 21 0 2 |
20. | 2 21 0 2 |
|-----------------------------------|
21. | 2 22 0 2 |
22. | 2 22 0 2 |
|-----------------------------------|
23. | 2 23 0 2 |
24. | 2 23 1 2 |
+-----------------------------------+

Related

Subtract Set Value at Aggregated Level

Values are for two groups by quarter.
In DAX, need to summarize all the data but also need to remove -5 from each quarter (-20 for full year) in 2021 for Group 1, without allowing the value to go below 0.
This only impacts:
Group 1
2021
However, I also need to retain the data details without the adjustment. So I can't do this in Power Query.
Data:
Group
Date
Value
1
01/01/2020
10
1
02/01/2020
9
1
03/01/2020
10
1
04/01/2020
8
1
05/01/2020
10
1
06/01/2020
11
1
07/01/2020
18
1
08/01/2020
2
1
09/01/2020
1
1
10/01/2020
0
1
11/01/2020
1
1
12/01/2020
0
1
01/01/2021
1
1
02/01/2021
12
1
03/01/2021
12
1
04/01/2021
3
1
05/01/2021
13
1
06/01/2021
14
1
07/01/2021
7
1
08/01/2021
1
1
09/01/2021
0
1
10/01/2021
1
1
11/01/2021
2
1
12/01/2021
1
2
01/01/2020
18
2
02/01/2020
7
2
03/01/2020
6
2
04/01/2020
8
2
05/01/2020
12
2
06/01/2020
13
2
07/01/2020
14
2
08/01/2020
8
2
09/01/2020
7
2
10/01/2020
6
2
11/01/2020
5
2
12/01/2020
4
2
01/01/2021
12
2
02/01/2021
18
2
03/01/2021
19
2
04/01/2021
20
2
05/01/2021
12
2
06/01/2021
12
2
07/01/2021
7
2
08/01/2021
18
2
09/01/2021
16
2
10/01/2021
15
2
11/01/2021
13
2
12/01/2021
1
Result:
Qtr/Year
Group 1 Value
Group 2 Value
Total
Q1-2020
29
31
60
Q2-2020
29
33
62
Q3-2020
21
29
50
Q4-2020
1
15
16
2020
80
108
188
Q1-2021
20
49
69
Q2-2021
25
44
69
Q3-2021
3
41
44
Q4-2021
0
29
29
2021
48
271
211

I'd suggest summarizing at the Year/Quarter/Group granularity and summing that up as follows:
SumValue =
VAR Summary =
SUMMARIZE (
Table2,
Table2[Year],
Table2[Qtr],
Table2[Group],
"#RawValue", SUM ( Table2[Value] ),
"#RemoveValue", IF ( Table2[Year] = 2021 && Table2[Group] = 1, 5 )
)
RETURN
SUMX ( Summary, MAX ( [#RawValue] - [#RemoveValue], 0 ) )
(This assumes the amount to remove for a year is the same as for four quarters.)

Solve two equations of Linear Inequality involving three different variables through code

I have two linear inequality equations with me, I am trying to understand how can I solve them using the code.
2x + z <= A
3y + z <= B
I know the values of A and B. I am trying to find all the possible positive values of x , y and z that will satisfy the given equation.
I want to solve the above equation using code but I haven't been able to come up with anything.
Any pointers are helpful. Thanks.

Linear inequalities can be represented as boolean expressions. Either the inequality is true, or false. We can use a function in C++ just like a function in mathematics here.
bool equation_one(double x, double z, double A)
{
return ((2 * x) + z <= A);
}
bool equation_two(double y, double z, double A)
{
return ((3 * y) + z <= A);
}
Then, if we wanted to see if any values satisfies these equations, we can just try values.
NOTE: This is horribly inefficient. We are looking at cubic time bounds. But, this is much more straight forward then using a linear algebra technique.
for(double x = 0; x <= A/2; x++)
for(double y = 0; y <= B/3; y++)
for(double z = 0; z <= A && z <= B; z++)
if(equation_one(x, z, A) && equation_two(y, z, B))
//These are the values we want!
EDIT: My bad, you said positive values only. Fixed.
EDIT 2: Thought I'd provide some output. For A = 20 and B = 20, these are all possible positive integer values that satisfy the equations with x, y, and z respectively.
0 0 0
0 0 1
0 0 2
0 0 3
0 0 4
0 0 5
0 0 6
0 0 7
0 0 8
0 0 9
0 0 10
0 0 11
0 0 12
0 0 13
0 0 14
0 0 15
0 0 16
0 0 17
0 0 18
0 0 19
0 0 20
0 1 0
0 1 1
0 1 2
0 1 3
0 1 4
0 1 5
0 1 6
0 1 7
0 1 8
0 1 9
0 1 10
0 1 11
0 1 12
0 1 13
0 1 14
0 1 15
0 1 16
0 1 17
0 2 0
0 2 1
0 2 2
0 2 3
0 2 4
0 2 5
0 2 6
0 2 7
0 2 8
0 2 9
0 2 10
0 2 11
0 2 12
0 2 13
0 2 14
0 3 0
0 3 1
0 3 2
0 3 3
0 3 4
0 3 5
0 3 6
0 3 7
0 3 8
0 3 9
0 3 10
0 3 11
0 4 0
0 4 1
0 4 2
0 4 3
0 4 4
0 4 5
0 4 6
0 4 7
0 4 8
0 5 0
0 5 1
0 5 2
0 5 3
0 5 4
0 5 5
0 6 0
0 6 1
0 6 2
1 0 0
1 0 1
1 0 2
1 0 3
1 0 4
1 0 5
1 0 6
1 0 7
1 0 8
1 0 9
1 0 10
1 0 11
1 0 12
1 0 13
1 0 14
1 0 15
1 0 16
1 0 17
1 0 18
1 1 0
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 1 6
1 1 7
1 1 8
1 1 9
1 1 10
1 1 11
1 1 12
1 1 13
1 1 14
1 1 15
1 1 16
1 1 17
1 2 0
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 2 8
1 2 9
1 2 10
1 2 11
1 2 12
1 2 13
1 2 14
1 3 0
1 3 1
1 3 2
1 3 3
1 3 4
1 3 5
1 3 6
1 3 7
1 3 8
1 3 9
1 3 10
1 3 11
1 4 0
1 4 1
1 4 2
1 4 3
1 4 4
1 4 5
1 4 6
1 4 7
1 4 8
1 5 0
1 5 1
1 5 2
1 5 3
1 5 4
1 5 5
1 6 0
1 6 1
1 6 2
2 0 0
2 0 1
2 0 2
2 0 3
2 0 4
2 0 5
2 0 6
2 0 7
2 0 8
2 0 9
2 0 10
2 0 11
2 0 12
2 0 13
2 0 14
2 0 15
2 0 16
2 1 0
2 1 1
2 1 2
2 1 3
2 1 4
2 1 5
2 1 6
2 1 7
2 1 8
2 1 9
2 1 10
2 1 11
2 1 12
2 1 13
2 1 14
2 1 15
2 1 16
2 2 0
2 2 1
2 2 2
2 2 3
2 2 4
2 2 5
2 2 6
2 2 7
2 2 8
2 2 9
2 2 10
2 2 11
2 2 12
2 2 13
2 2 14
2 3 0
2 3 1
2 3 2
2 3 3
2 3 4
2 3 5
2 3 6
2 3 7
2 3 8
2 3 9
2 3 10
2 3 11
2 4 0
2 4 1
2 4 2
2 4 3
2 4 4
2 4 5
2 4 6
2 4 7
2 4 8
2 5 0
2 5 1
2 5 2
2 5 3
2 5 4
2 5 5
2 6 0
2 6 1
2 6 2
3 0 0
3 0 1
3 0 2
3 0 3
3 0 4
3 0 5
3 0 6
3 0 7
3 0 8
3 0 9
3 0 10
3 0 11
3 0 12
3 0 13
3 0 14
3 1 0
3 1 1
3 1 2
3 1 3
3 1 4
3 1 5
3 1 6
3 1 7
3 1 8
3 1 9
3 1 10
3 1 11
3 1 12
3 1 13
3 1 14
3 2 0
3 2 1
3 2 2
3 2 3
3 2 4
3 2 5
3 2 6
3 2 7
3 2 8
3 2 9
3 2 10
3 2 11
3 2 12
3 2 13
3 2 14
3 3 0
3 3 1
3 3 2
3 3 3
3 3 4
3 3 5
3 3 6
3 3 7
3 3 8
3 3 9
3 3 10
3 3 11
3 4 0
3 4 1
3 4 2
3 4 3
3 4 4
3 4 5
3 4 6
3 4 7
3 4 8
3 5 0
3 5 1
3 5 2
3 5 3
3 5 4
3 5 5
3 6 0
3 6 1
3 6 2
4 0 0
4 0 1
4 0 2
4 0 3
4 0 4
4 0 5
4 0 6
4 0 7
4 0 8
4 0 9
4 0 10
4 0 11
4 0 12
4 1 0
4 1 1
4 1 2
4 1 3
4 1 4
4 1 5
4 1 6
4 1 7
4 1 8
4 1 9
4 1 10
4 1 11
4 1 12
4 2 0
4 2 1
4 2 2
4 2 3
4 2 4
4 2 5
4 2 6
4 2 7
4 2 8
4 2 9
4 2 10
4 2 11
4 2 12
4 3 0
4 3 1
4 3 2
4 3 3
4 3 4
4 3 5
4 3 6
4 3 7
4 3 8
4 3 9
4 3 10
4 3 11
4 4 0
4 4 1
4 4 2
4 4 3
4 4 4
4 4 5
4 4 6
4 4 7
4 4 8
4 5 0
4 5 1
4 5 2
4 5 3
4 5 4
4 5 5
4 6 0
4 6 1
4 6 2
5 0 0
5 0 1
5 0 2
5 0 3
5 0 4
5 0 5
5 0 6
5 0 7
5 0 8
5 0 9
5 0 10
5 1 0
5 1 1
5 1 2
5 1 3
5 1 4
5 1 5
5 1 6
5 1 7
5 1 8
5 1 9
5 1 10
5 2 0
5 2 1
5 2 2
5 2 3
5 2 4
5 2 5
5 2 6
5 2 7
5 2 8
5 2 9
5 2 10
5 3 0
5 3 1
5 3 2
5 3 3
5 3 4
5 3 5
5 3 6
5 3 7
5 3 8
5 3 9
5 3 10
5 4 0
5 4 1
5 4 2
5 4 3
5 4 4
5 4 5
5 4 6
5 4 7
5 4 8
5 5 0
5 5 1
5 5 2
5 5 3
5 5 4
5 5 5
5 6 0
5 6 1
5 6 2
6 0 0
6 0 1
6 0 2
6 0 3
6 0 4
6 0 5
6 0 6
6 0 7
6 0 8
6 1 0
6 1 1
6 1 2
6 1 3
6 1 4
6 1 5
6 1 6
6 1 7
6 1 8
6 2 0
6 2 1
6 2 2
6 2 3
6 2 4
6 2 5
6 2 6
6 2 7
6 2 8
6 3 0
6 3 1
6 3 2
6 3 3
6 3 4
6 3 5
6 3 6
6 3 7
6 3 8
6 4 0
6 4 1
6 4 2
6 4 3
6 4 4
6 4 5
6 4 6
6 4 7
6 4 8
6 5 0
6 5 1
6 5 2
6 5 3
6 5 4
6 5 5
6 6 0
6 6 1
6 6 2
7 0 0
7 0 1
7 0 2
7 0 3
7 0 4
7 0 5
7 0 6
7 1 0
7 1 1
7 1 2
7 1 3
7 1 4
7 1 5
7 1 6
7 2 0
7 2 1
7 2 2
7 2 3
7 2 4
7 2 5
7 2 6
7 3 0
7 3 1
7 3 2
7 3 3
7 3 4
7 3 5
7 3 6
7 4 0
7 4 1
7 4 2
7 4 3
7 4 4
7 4 5
7 4 6
7 5 0
7 5 1
7 5 2
7 5 3
7 5 4
7 5 5
7 6 0
7 6 1
7 6 2
8 0 0
8 0 1
8 0 2
8 0 3
8 0 4
8 1 0
8 1 1
8 1 2
8 1 3
8 1 4
8 2 0
8 2 1
8 2 2
8 2 3
8 2 4
8 3 0
8 3 1
8 3 2
8 3 3
8 3 4
8 4 0
8 4 1
8 4 2
8 4 3
8 4 4
8 5 0
8 5 1
8 5 2
8 5 3
8 5 4
8 6 0
8 6 1
8 6 2
9 0 0
9 0 1
9 0 2
9 1 0
9 1 1
9 1 2
9 2 0
9 2 1
9 2 2
9 3 0
9 3 1
9 3 2
9 4 0
9 4 1
9 4 2
9 5 0
9 5 1
9 5 2
9 6 0
9 6 1
9 6 2
10 0 0
10 1 0
10 2 0
10 3 0
10 4 0
10 5 0
10 6 0

fortran read and write from file(reading from .msh and writing to dat)

I am trying to read a .msh file and want to generate .dat file in rearranged manner (node number, x1 ,y1 , z1, x2, y2, z2)
$MeshFormatv
2.2 0 8
$EndMeshFormat
$PhysicalNames
4
1 1 "inlet"
1 2 "top"
1 3 "exit"
1 4 "bottom"
$EndPhysicalNames
$Nodes
45
1 -2 -2 0
2 2 -2 0
3 2 2 0
4 -2 2 0
5 -1.666666666666667 -2 0
6 -1.333333333333333 -2 0
7 -1 -2 0
8 -0.6666666666666665 -2 0
9 -0.3333333333333335 -2 0
10 0 -2 0
11 0.3333333333333335 -2 0
12 0.666666666666667 -2 0
13 1 -2 0
14 1.333333333333333 -2 0
15 1.666666666666667 -2 0
16 2 -1.666666666666667 0
17 2 -1.333333333333333 0
18 2 -1 0
19 2 -0.6666666666666665 0
20 2 -0.3333333333333335 0
21 2 0 0
22 2 0.3333333333333335 0
23 2 0.666666666666667 0
24 2 1 0
25 2 1.333333333333333 0
26 2 1.666666666666667 0
27 1.666666666666667 2 0
28 1.333333333333333 2 0
29 1 2 0
30 0.6666666666666665 2 0
31 0.3333333333333335 2 0
32 0 2 0
33 -0.3333333333333335 2 0
34 -0.666666666666667 2 0
35 -1 2 0
36 -1.333333333333333 2 0
37 -1.666666666666667 2 0
38 -2 1.555555555555556 0
39 -2 1.111111111111111 0
40 -2 0.6666666666666667 0
41 -2 0.2222222222222223 0
42 -2 -0.2222222222222223 0
43 -2 -0.6666666666666665 0
44 -2 -1.111111111111111 0
45 -2 -1.555555555555555 0
$EndNodes
$Elements
45
1 1 2 4 1 1 5
2 1 2 4 1 5 6
3 1 2 4 1 6 7
4 1 2 4 1 7 8
5 1 2 4 1 8 9
6 1 2 4 1 9 10
7 1 2 4 1 10 11
8 1 2 4 1 11 12
9 1 2 4 1 12 13
10 1 2 4 1 13 14
11 1 2 4 1 14 15
12 1 2 4 1 15 2
13 1 2 3 2 2 16
14 1 2 3 2 16 17
15 1 2 3 2 17 18
16 1 2 3 2 18 19
17 1 2 3 2 19 20
18 1 2 3 2 20 21
19 1 2 3 2 21 22
20 1 2 3 2 22 23
21 1 2 3 2 23 24
22 1 2 3 2 24 25
23 1 2 3 2 25 26
24 1 2 3 2 26 3
25 1 2 2 3 3 27
26 1 2 2 3 27 28
27 1 2 2 3 28 29
28 1 2 2 3 29 30
29 1 2 2 3 30 31
30 1 2 2 3 31 32
31 1 2 2 3 32 33
32 1 2 2 3 33 34
33 1 2 2 3 34 35
34 1 2 2 3 35 36
35 1 2 2 3 36 37
36 1 2 2 3 37 4
37 1 2 1 4 4 38
38 1 2 1 4 38 39
39 1 2 1 4 39 40
40 1 2 1 4 40 41
41 1 2 1 4 41 42
42 1 2 1 4 42 43
43 1 2 1 4 43 44
44 1 2 1 4 44 45
45 1 2 1 4 45 1
$EndElements
I have tried with allocatable, I want to skip the lines till character '$Nodes' appear and and one more line then read it in a array and then skip the three lines of character. Read the next in another array and then rearrange the no as mentioned above.
program coordinates
implicit none
INTEGER:: ierror, nodeno, elementno, i, j, k , t, p, l=0, n
CHARACTER:: command
real::data(2,100)
!CHARACTER (len=5)::N!odes
!CHARACTER (len=8)::EndN!odes
!CHARACTER (len=8)::E!lements
!CHARACTER (len=11)::EndE!lements
!CHARACTER :: No*5, EndN*8, E*8, EndE*11
! CHARACTER*5 :: Nod
! CHARACTER*8 :: Ele
! real, allocatable, dimension(:,4)::node
! real, allocatable, dimension(:,7)::element
! real, allocatable, dimension(:)::n,x,y,z,a,b,c,d,g,h
!call system(l='grep -n '$Nodes' /home/user/Nitesh/Fortran/rect.msh|tail -LineNumberToStartWith|grep regEX')
! 'l = 'grep -n '$Nodes' /home/user/Nitesh/Fortran/rect.msh
command = 'grep -n $Nodes /home/user/Nitesh/Fortran/rect.msh|cut -f1 -d:'
! call system('command')
call system('grep -n '$Nodes' /home/user/Nitesh/Fortran/rect.msh|cut -f1 -d:')
! call system('l')
! print*, "enter the no. of nodes"
! read*,t
! print*, "enter the no. of elements"
! read*,p
! print*, "enter the line no. nodes data(array) starting from"
! read*,l
!allocate(node(t),element(j))
print*, "opening file"
! allocate(n(t),x(t),y(t),z(t),a(t),b(t),c(t),d(t),g(t),h(t))
OPEN (FILE='/home/user/Nitesh/Fortran/my.dat',UNIT=8, STATUS='OLD', ACTION='READ', &
IOSTAT=ierror)
if(ierror/=0)then
print*,"File rect.msh cannot be open"
stop
end if
do i=1,n
read(8,'(/)')
end do
do i=1,t+l
read(8,*) data(:,i)
!read(8,*) j,k
end do
! 8 format('',F10.6,F10.6,F10.6,F10.6)
! if(Nod == 'Nodes') then
! read*,(n(i),x(i),y(i),z(i),i=1,t)
! end if
! if(Ele == 'Elements') then
! read*,(n(j),a(j),b(j),c(j),d(j),g(j),h(j),k=1,p)
! end if
! OPEN (UNIT=10, FILE='/home/Nitesh/rect_new.msh', STATUS='NEW', ACTION='WRITE', &
! IOSTAT=ierror)
! write(*,10)
! 10 format (' ',n())
! CLOSE (UNIT=10)
CLOSE (UNIT=8)
print*, "file read"
! do i=1,n
! n(t)=g(j)
! x(t),y(t),z(t)
open(file='/home/user/Nitesh/Fortran/rect.dat', unit=24, status='replace', action='write', &
IOSTAT=ierror)
if(ierror/=0)then
print*,"File rect.msh cannot be open"
stop
end if
print*,"writing data"
do i=1,l+t
write(24,*) data(:,i)
end do
print*,"data written"
! OPEN (UNIT=10, FILE='rect_new.msh', STATUS='NEW', ACTION='WRITE', &
! IOSTAT=ierror)
! write(*,10)
! 10 format (' ','The coordinates of elements')
! CLOSE (UNIT=10)
end program coordinates

Consecutive tagging in Stata

The task is to identify which consecutive week a product (in a specific store) has been on promotion.
clear
input ///
upc week store promo
1 1 86 1
1 2 86 1
1 3 86 1
1 4 86 1
3 1 86 0
3 2 86 1
4 1 86 0
4 2 86 1
4 3 86 1
end
The end result should look something like this:
upc week store promo promocount
1 1 86 1 1
1 2 86 1 2
1 3 86 1 3
1 4 86 1 4
3 1 86 0 0
3 2 86 1 1
4 1 86 0 0
4 2 86 1 1
4 3 86 1 2
end
I have 800K obs., and I am encountering a problem with the real data set. When I run bysort upc week store promo: gen prcount = _n if promo==1, my data set is sorted in a different way (which, as a result, yields wrong tagging):
upc week store promo
1 1 86 1
3 1 86 0
4 1 86 0
1 2 86 1
3 2 86 1
4 2 86 1
1 3 86 1
4 3 86 1
1 4 86 1
Anyway, I now realize my code is wrong. Any suggestions?

I think
. quietly input ///
> upc week store promo
. generate promocount = 0
. bysort store upc (week): replace promocount = 1+cond(_n==1,0,promocount[_n-1]) if promo>0
(7 real changes made)
. list, clean noobs
upc week store promo promoc~t
1 1 86 1 1
1 2 86 1 2
1 3 86 1 3
1 4 86 1 4
3 1 86 0 0
3 2 86 1 1
4 1 86 0 0
4 2 86 1 1
4 3 86 1 2
does do what you want.

awk if-greater-than and replacement under condition

I have following data
......
6 4 4 17 154 93 309 0 11930
7 3 2 233 311 0 11936 11932 111874
8 3 1 15 0 11938 11943 211004 11449
9 3 2 55 102 0 11932 11941 111883
10 3 2 197 231 0 11925 11921 111849
11 3 2 160 777 0 11934 11928 111875
......
I hope to replace any values greater than 5000 to 0, from column 4 to column 9. How can I do this work with awk?

To print with lots of spaces like the input, something like this:
awk '{for(i=4;i<=NF;i++)if($i>5000)$i=0; for(i=1;i<=NF;i++)printf "%7d",$i;printf"\n"}' file
Output
6 4 4 17 154 93 309 0 0
7 3 2 233 311 0 0 0 0
8 3 1 15 0 0 0 0 0
9 3 2 55 102 0 0 0 0
10 3 2 197 231 0 0 0 0
11 3 2 160 777 0 0 0 0
For scrunched up together (TM) output, you can use this:
awk '{for(i=4;i<=NF;i++)if($i>5000)$i=0}1' file
6 4 4 17 154 93 309 0 0
7 3 2 233 311 0 0 0 0
8 3 1 15 0 0 0 0 0
9 3 2 55 102 0 0 0 0
10 3 2 197 231 0 0 0 0
11 3 2 160 777 0 0 0 0

An alternative approach (requires gawk4+):
{
patsplit($0, a, "[0-9]+", s)
printf s[0]
for (i=1; i<=length(a); i++){
if(i>4 && a[i]>5000) {
l=length(a[i])
a[i]=0
}
else l=0
printf "%"l"s%s", a[i], s[i]
}
printf "\n"
}
It is more flexible when the spacing would vary, as opposed to the example data. It might also be faster than the accepted answer, in case the number of fields is way bigger than 9.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Duplication of data entries by id if they meet a certain condition - stata

Related

Subtract Set Value at Aggregated Level

Solve two equations of Linear Inequality involving three different variables through code

fortran read and write from file(reading from .msh and writing to dat)

Consecutive tagging in Stata

awk if-greater-than and replacement under condition

Categories

Resources