Lost Update Scenario Concurrency


Time  T1            T2
t1    READ(A)
t2    A = A - 50
t3                  READ(A)
t4                  A = A - 100
t5                  WRITE(A)
t6                  READ(B)
t7    WRITE(A)
t8    READ(B)
t9    B = B + 50
t10   WRITE(B)
t11                 B = B + 10
t12                 WRITE(B)
Can someone help me understand the schedule above and tell me whether my interpretation is correct?
1) Assuming A = 300, T1 first reads A and subtracts 50, so A = 250.
2) Then T2 (at time t3) does READ(A), but it reads 300 (the original value of A, not the updated value from time t2), since T1 has not yet executed WRITE(A) after A = A - 50. Correct?
3) Then T2 (at time t4) computes A = A - 100, so A is 200, and T2 writes that value to A. Then it reads B.
4) Now my question: when WRITE(A) executes at t7, will it write the value of A from step 3 (200) or the value of A from step 1 (250)?

I share your interpretation.
If T1 modifies A locally, then yes, it will write 250 from within T1.
But this sounds really architecture dependent. If A is shared by both threads, then T1 will write 200, because A will have been re-read and then decreased by 100.
To get a more accurate answer, I suggest you define what READ(), A = A + N and WRITE() really do. They seem to involve memory and files, but it is hard to guess what you have in mind.
Probably another easy way to find out is to implement this.
Hope it helps.
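One way to "implement this" is to replay the schedule by hand with each transaction holding private working copies, which is the usual textbook reading of READ/WRITE. The sketch below is only an illustration under that assumption. The function name replaySchedule is made up, the starting value of B is arbitrary, and steps t8 to t12 are assigned to T1 and T2 by inference from the question (T2 already read B at t6).

```cpp
#include <cassert>
#include <utility>

// Replays t1..t12 with private working copies per transaction (an assumption:
// READ copies the shared value into a local, WRITE copies the local back).
// Returns the final shared values {A, B}.
std::pair<int, int> replaySchedule(int A, int B) {
    int a1 = 0, b1 = 0;  // T1's local copies
    int a2 = 0, b2 = 0;  // T2's local copies

    a1 = A;      // t1  T1: READ(A)
    a1 -= 50;    // t2  T1: A = A - 50 (local only, nothing written yet)
    a2 = A;      // t3  T2: READ(A) still sees the original value
    a2 -= 100;   // t4  T2: A = A - 100
    A = a2;      // t5  T2: WRITE(A)
    b2 = B;      // t6  T2: READ(B)
    A = a1;      // t7  T1: WRITE(A) overwrites T2's update (the lost update)
    b1 = B;      // t8  T1: READ(B)
    b1 += 50;    // t9  T1: B = B + 50
    B = b1;      // t10 T1: WRITE(B)
    b2 += 10;    // t11 T2: B = B + 10, based on the stale value read at t6
    B = b2;      // t12 T2: WRITE(B) overwrites T1's update of B
    return {A, B};
}
```

With A = 300 and B = 100, this model ends with A = 250 and B = 110, whereas running the two transactions one after the other would give A = 150 and B = 160. Under these assumptions the answer to question 4 is 250.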

Related

Assign the work for 2 people so that the time is minimum

Question:
There are N jobs that need to be assigned to 2 people. Person A can finish job i in a[i] time, person B can finish job i in b[i] time.
Each job can only be assigned to 1 person. After the jobs are assigned, each person will do their jobs separately.
The overall time will be the larger of the total times taken by the 2 people.
Find a way to assign the jobs so that the overall time is minimum.
Example:
N = 6
a[] = 10 100 30 50 50 80
b[] = 100 30 40 40 60 90
Answer: 130
Explanation:
Person A do work 1, 3, 6 -> total time: 120
Person B do work 2, 4, 5 -> total time: 130
Overall time: 130
Constraints:
N <= 100
a[i], b[i] <= 30.000
My take
I tried solving it with dynamic-programming, more specifically: DP[i][p][c]
Here i is the number of jobs done so far, p is the total time of person A so far, and c is the total time of person B so far. For each i, we can try giving the job to either person A or B, then save the best answer in DP[i][p][c] so we don't have to recalculate it.
But p and c can get up to 3.000.000, so I tried to shrink it to DP[i][max(p,c)]
The code below gives the right answer for the example case, and some other case I generated:
int n, firstCost[105], secondCost[105];
int dp[105][300005];

int solve(int i, int p, int c){
    if(i > n) return max(p, c);
    int &res = dp[i][max(p, c)];
    if(res != -1) return res;
    res = INT_MAX;
    int tmp1 = solve(i+1, p + firstCost[i], c);
    int tmp2 = solve(i+1, p, c + secondCost[i]);
    res = min(tmp1, tmp2);
    return res;
}

int main(){
    // input...
    cout << solve(1, 0, 0);
}
But when I submitted it, it gave the wrong answer for this case:
20
4034 18449 10427 4752 8197 7698 17402 16164 12306 5249 19076 18560 16584 18969 3548 11260 6752 18052 14684 18113
19685 10028 938 10379 11583 10383 7175 4557 850 5704 14156 18587 2869 16300 15393 14874 18859 9232 6057 3562
My output was 77759 but the answer is supposed to be 80477.
I don't know what I did wrong. Is there any way to improve my solution?
P/S:
Here's the original problem, the page is in Vietnamese, you can create an account and submit there

The trick that you're missing is the idea of an optimal fringe.
You are trying to shrink it to max(p,c), but it may well be that you need to send the first half the jobs to person A, and that initially looks like a terrible set of choices. You are right that you could get the right answer with DP[i][p][c], but that quickly gets to be too much data.
But suppose that p0 <= p1 and c0 <= c1. Then there is absolutely no way that looking at a path through (p1, c1) can ever lead to a better answer than (p0, c0). And therefore we can drop (p1, c1) immediately.
I won't give you code, but I'll show you a bit of how this starts with your example.
4034 18449 10427 4752 8197 7698 17402 16164 12306 5249 19076 18560 16584 18969 3548 11260 6752 18052 14684 18113
19685 10028 938 10379 11583 10383 7175 4557 850 5704 14156 18587 2869 16300 15393 14874 18859 9232 6057 3562
At first we start off with DP = [[0,0]].
After we assign the first element, you get [[0,19685], [4034,0]].
After we assign the second we get, [[0,29713], [4034,10028], [18449,19685], [22483,0]]. We can drop [18449,19685] because it isn't as good as [4034,10028], so we get to [[0,29713], [4034,10028], [22483,0]].
The third element gives [[0,30651], [4034,10966], [10427,29713], [14461,10028], [22483,938], [32910,0]] and then we can drop [10427,29713] as being worse than [4034,10966]. And now we are at [[0,30651], [4034,10966], [14461,10028], [22483,938], [32910,0]].
And so on.
As an additional optimization I'd first sort the indexes by b[i]/a[i] in decreasing order and produce a greedy solution where we assign all of the beginning indexes to A and all of the end to B. From the existence of that greedy solution, we never need to look at any partial solution whose p or c is already worse than that known solution's overall time. After we get half-way through the jobs, this should become a useful filter.
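The answer deliberately gives no code, so the following is only one possible sketch of the pruning idea, not the answerer's implementation. The names State, pruneDominated and minOverallTime are made up for illustration. It keeps, after each job, only the (p, c) pairs that are not dominated by another pair. The frontier can still grow large for the full constraints, and the greedy filter mentioned above is not included.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// (total time of person A, total time of person B) for one partial assignment.
using State = std::pair<long long, long long>;

// Drops every state (p1, c1) for which another state (p0, c0) exists with
// p0 <= p1 and c0 <= c1. After sorting by p (then c), a state survives only
// if its c is strictly smaller than every c seen so far.
std::vector<State> pruneDominated(std::vector<State> states) {
    std::sort(states.begin(), states.end());
    std::vector<State> frontier;
    long long bestC = LLONG_MAX;
    for (const State& s : states) {
        if (s.second < bestC) {
            frontier.push_back(s);
            bestC = s.second;
        }
    }
    return frontier;
}

// Returns the minimum possible value of max(total of A, total of B).
long long minOverallTime(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<State> frontier{State(0, 0)};
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::vector<State> next;
        next.reserve(frontier.size() * 2);
        for (const State& s : frontier) {
            next.emplace_back(s.first + a[i], s.second);  // job i goes to A
            next.emplace_back(s.first, s.second + b[i]);  // job i goes to B
        }
        frontier = pruneDominated(std::move(next));
    }
    long long best = LLONG_MAX;
    for (const State& s : frontier) {
        best = std::min(best, std::max(s.first, s.second));
    }
    return best;
}
```

Calling minOverallTime with the six-job example from the question returns 130, and pruning the four states listed for the second job leaves the same three states shown in the answer.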

C++ Multithreading nested for loops

First off, I know very little about multithreading and I am having trouble finding the best way to optimize this code, but multithreading seems to be the path I should be on.
double
applyFilter(struct Filter *filter, cs1300bmp *input, cs1300bmp *output)
{
    long long cycStart, cycStop;
    cycStart = rdtscll();
    output->width = input->width;
    output->height = input->height;
    int temp1 = output->width;
    int temp2 = output->height;
    int width = temp1 - 1;
    int height = temp2 - 1;
    int getDivisorVar = filter->getDivisor();
    int t0, t1, t2, t3, t4, t5, t6, t7, t8, t9;
    int keep0 = filter->get(0,0);
    int keep1 = filter->get(1,0);
    int keep2 = filter->get(2,0);
    int keep3 = filter->get(0,1);
    int keep4 = filter->get(1,1);
    int keep5 = filter->get(2,1);
    int keep6 = filter->get(0,2);
    int keep7 = filter->get(1,2);
    int keep8 = filter->get(2,2);
    //Declare variables before the loop
    int plane, row, col;
    for (plane = 0; plane < 3; plane++) {
        for (row = 1; row < height; row++) {
            for (col = 1; col < width; col++) {
                t0 = (input->color[plane][row - 1][col - 1]) * keep0;
                t1 = (input->color[plane][row][col - 1]) * keep1;
                t2 = (input->color[plane][row + 1][col - 1]) * keep2;
                t3 = (input->color[plane][row - 1][col]) * keep3;
                t4 = (input->color[plane][row][col]) * keep4;
                t5 = (input->color[plane][row + 1][col]) * keep5;
                t6 = (input->color[plane][row - 1][col + 1]) * keep6;
                t7 = (input->color[plane][row][col + 1]) * keep7;
                t8 = (input->color[plane][row + 1][col + 1]) * keep8;
                // NEW LINE HERE
                t9 = t0 + t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8;
                t9 = t9 / getDivisorVar;
                if ( t9 < 0 ) {
                    t9 = 0;
                }
                if ( t9 > 255 ) {
                    t9 = 255;
                }
                output->color[plane][row][col] = t9;
            } ....
All of this code most likely isn't necessary, but it does provide some context. Because the first of the 3 "for" loops only goes from 0 to 2, I was hoping there was a way to thread the bottom two "for" loops so that they all run at the same time, each for a different "plane" value. Is this even possible? And if so, would it actually make my program faster?

I would also look into OpenMP. It is a great API that allows for threading in a VERY simple manner using pragmas. OpenMP is supported by compilers on many platforms; you just have to make sure yours supports it!
I have a set of code that had 8 levels of for loops, and it threaded it very nicely.
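As a rough idea of what the pragma approach could look like for a 3x3 filter like the one in the question, here is a minimal sketch. It does not use the question's cs1300bmp or Filter types; Plane, Image and applyFilterOmp are hypothetical stand-ins, and the kernel is passed as nine ints in row-major order. If the compiler is not run with OpenMP enabled (for example -fopenmp with GCC), the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the question's image type: 3 colour planes,
// each a rows x cols grid of ints.
using Plane = std::vector<std::vector<int>>;
using Image = std::array<Plane, 3>;

// Applies a 3x3 kernel (row-major) to every interior pixel of every plane.
// Border pixels are copied from the input unchanged.
Image applyFilterOmp(const Image& input, const std::array<int, 9>& kernel, int divisor) {
    assert(divisor != 0);
    Image output = input;
    const int height = static_cast<int>(input[0].size());
    const int width = height > 0 ? static_cast<int>(input[0][0].size()) : 0;

    // Each (plane, row) pair is an independent unit of work: iterations only
    // read from input and write to distinct elements of output.
    #pragma omp parallel for collapse(2)
    for (int plane = 0; plane < 3; ++plane) {
        for (int row = 1; row < height - 1; ++row) {
            for (int col = 1; col < width - 1; ++col) {
                int sum = 0;
                for (int dr = -1; dr <= 1; ++dr) {
                    for (int dc = -1; dc <= 1; ++dc) {
                        sum += input[plane][row + dr][col + dc] *
                               kernel[(dr + 1) * 3 + (dc + 1)];
                    }
                }
                // Clamp to the 0..255 range, as the original loop does.
                output[plane][row][col] = std::max(0, std::min(255, sum / divisor));
            }
        }
    }
    return output;
}
```

Collapsing the plane and row loops gives the runtime more than three chunks to distribute, which may matter on machines with more than three cores. Whether it is actually faster is something to measure.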

Yes, it's perfectly possible. In this case, you should even get away without worrying about access synchronisation (i.e. race conditions), as the threads would be operating on different sets of data.
This would definitely speed up your code on a multicore machine.
You might want to have a look at std::thread (if you're OK with C++11) for a cross-platform threading implementation (since you haven't specified your target platform), or more generally at the standard thread support library.
You may also think about detecting the number of cores and launching an appropriate number of threads, as in threadcount = min(planes, cores), and providing each worker function with access to a single plane's set of data.
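The threadcount = min(planes, cores) idea might be sketched with std::thread roughly as follows. This is an illustration under assumptions, not a drop-in replacement: Plane, filterPlane and applyFilterThreaded are hypothetical names, the kernel is nine ints in row-major order, and the question's own image and filter classes are not used. On some toolchains threads also need an extra link flag such as -pthread.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <thread>
#include <vector>

using Plane = std::vector<std::vector<int>>;  // hypothetical rows x cols grid

// Filters the interior pixels of one plane; 'out' must already have the
// same dimensions as 'in'.
void filterPlane(const Plane& in, Plane& out, const std::array<int, 9>& kernel, int divisor) {
    const int height = static_cast<int>(in.size());
    const int width = height > 0 ? static_cast<int>(in[0].size()) : 0;
    for (int row = 1; row < height - 1; ++row) {
        for (int col = 1; col < width - 1; ++col) {
            int sum = 0;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc)
                    sum += in[row + dr][col + dc] * kernel[(dr + 1) * 3 + (dc + 1)];
            out[row][col] = std::max(0, std::min(255, sum / divisor));
        }
    }
}

// Launches min(planes, cores) workers; worker t handles planes t, t + n, ...
// No locking is needed because each plane is written by exactly one thread.
std::array<Plane, 3> applyFilterThreaded(const std::array<Plane, 3>& in,
                                         const std::array<int, 9>& kernel, int divisor) {
    assert(divisor != 0);
    std::array<Plane, 3> out = in;  // border pixels keep their input values
    const unsigned planes = 3;
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;      // the call may return 0 when unknown
    const unsigned threadCount = std::min(planes, cores);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            for (unsigned p = t; p < planes; p += threadCount)
                filterPlane(in[p], out[p], kernel, divisor);
        });
    }
    for (std::thread& w : workers) w.join();
    return out;
}
```

With only three planes this caps the speed-up at three threads. Splitting by rows instead would scale further, at the cost of slightly more bookkeeping.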

It certainly looks like you could break this into threads, and you would probably see a good speed increase. However, your compiler will already be trying to unroll the loop for you and gain parallelism by vectorizing instructions. Your gains may not be as large as you expect, especially if you are saturating the memory bus with reads from disparate locations.
What you might consider is, if this is a 2d graphics operation, try and use OpenGL or similar as it will leverage hardware on your system, and that has some parallelism built into it.

The threaded version of the code will be slower than the simple implementation, because in the threaded version a lot of time will be spent on synchronisation. The threaded version will also have a cache performance drawback.
Also, there is a high probability that the outer for loop with 3 passes will be unrolled by the compiler and executed in parallel.
You can try making a threaded version and comparing the performance. Either way it will be a useful experience.

For a situation like this you could do worse than use a compiler that automatically turns for loops into threads.
With code like that the compiler can determine whether or not there is any inter-iteration data dependency. If not then it knows that it can safely split the for loop(s) across multiple threads, putting bog standard thread syncing at the end. Normally such a compiler is able to insert code that determines at run time whether the overhead of having the threads is going to be outweighed by the benefits.
The only thing is, do you have a compiler that does it? If so then it's by far the easiest way to get the benefits of threads for straightforward, almost overt parallelism like this.
I know that Sun's C compiler does it (I think they were one of the earliest to do this. It might be on only the Solaris version of their compiler). I think that Intel's compiler can too. I have doubts about GCC (though I'd be very happy to be corrected on that point), and I'm not too sure about Microsoft's compiler.

How many times will process P0 print "0"?

The following program consists of 3 concurrent processes and 3 binary semaphores. The semaphores are initialized as S0=1, S1=0, S2=0.
Process P0:
while(1)
{
    wait (S0);
    print '0';
    release (S1);
    release (S2);
}
Process P1:
wait (S1);
release (S0);
Process P2:
wait (S2);
release (S0);
How many times will process P0 print '0'?
(A) At least twice (B) Exactly twice (C) Exactly thrice (D) Exactly once
My confusion is whether processes P1 and P2 execute only once or continue after executing once, since they have no while loop like process P0. If they execute only once, then I think the answer should be (B); if they execute again, then the answer would be (A).
Please help. Thanks in advance.

Initially only P0 can execute, because only S0=1. It prints a single 0.
Once S1 and S2 are released by P0, either P1 or P2 can execute.
Suppose P1 executes and releases S0 (now the value of S0 is 1).
Now there are two possibilities: either P0 or P2 can execute.
Suppose P2 executes and releases S0; then at the end P0 executes and prints 0 (two 0's in total).
But if P0 executes before P2, then a total of three 0's are printed (P0 prints a second 0, then P2 releases S0, so P0 executes again).
So the correct answer is at least two 0's.

The solution proceeds as below:
Only process P0 can execute first. That's because semaphore used by process P0 i.e S0 has an initial value of 1. Now when P0 calls wait on S0 the value of S0 becomes 0, implying that S0 has been taken by P0. As far as Process P1 and P2 are concerned, when they call wait on S1 and S2 respectively, they can't proceed because the semaphores are already initialized as taken i.e 0, so they have to wait until S1 and S2 are released!
P0 proceeds first and prints 0. Now the next statements release S1 and S2! When S1 is released the wait of process P1 is over as the value of S1 goes up by 1 and is flagged not taken. P1 takes S1 and makes S1 as taken. The same goes with Process P2.
Now P1 and P2 can run in either order, because only one of them executes at a given time. Suppose P2 executes. It releases S0 and terminates.
Let P1 execute next. P1 starts, releases S0 and terminates.
Now only P0 can execute, because it's in a while loop whose condition is always true. P0 executes, prints 0 a second time and releases S1 and S2. But P1 and P2 have already terminated, so P0 waits forever for the release of S0.
Here's a second solution which prints 0 three times:
P0 starts, prints 0 and releases S1 and S2.
Let P2 execute. P2 starts, releases S0 and terminates. After this only P0 or P1 can execute.
Let P0 execute. Prints 0 for second time and releases S1 and S2. At this point only P1 can execute.
P1 starts, releases S0, P1 terminates. At this point only P0 can execute because its in a while loop whose condition is set to true!
P0 starts, prints 0 for the 3rd time and releases S1 and S2. It then waits for someone to release S0 which never happens.
So the answer is either exactly twice or exactly thrice, which can also be stated as "at least twice"!
Please tell me if I am wrong anywhere!!
For more problems on semaphores, refer to this

I'm assuming wait() decrements the semaphore and blocks if it becomes <= 0, whereas release increases the counter and wakes up the next process.
Given your code, P1 and P2 execute once (there is no loop around them). This means each of them triggers S0 once. And as P0 blocks waiting on S0 before every print, it will finally print '0' twice.
One more thing to check is the initial state of S0, because P0 will only block if S0 is 0. This is the case given your statement. Therefore, the answer is that P0 will print 0 exactly twice.

Reading the question and the code I would also say (A). I am assuming that the processes cannot be preempted before completing their task.
It says the initial state is S0=1 S1=0 S2=0, and from what we know P1 and P2 will execute exactly once.
Concurrent processes can be complex and however I try to describe the flow people will find faults with the way I think about it, that is ok, I'm here to learn too.
Now you have a few situations yielding different results depending on order of processes.
P0 -> P1 -> P0 -> P2 -> P0 = Three times
P0 -> P1 -> P2 -> P0 = Twice
P0 -> P2 -> P1 -> P0 = Twice
This gives us the answer of at least twice.
Edit:
All this is made under the assumption of wait() blocking while semaphore == 0 and that release() sets semaphore = 1 because otherwise the code would just mostly be insanity.
If the processes can be interrupted at any time, then things can get interesting.
P0 starts out running because S0=1 at start
P0 print '0';
P0 release(S1);
-- here P1 may take over or not --
P0 release(S2);
-- here P2 may take over or not --
P0 goes back to wait(S0)
-- here P0 continues, or blocks if neither P1 nor P2 has run yet --
-- it may also be that only P1 or P2 ran and now the other will run --
Now I tried figuring out a way to visualize how things would work out, and I failed to find a good way to put it in the code block.
If both P1 and P2 run as soon as they can, then since the semaphores are binary and can be in only one of two states, P0 will only run twice; however, if scheduling is perverse enough to delay either P1 or P2 until P0 has passed wait() once more, P0 will run three times.
But I think this question was not meant to have interruptible processes; it just gets messy.

P0 will execute first because only S0=1. Hence it will print 0 (for the first time). Also P0 releases S1 and S2. Since S1=1 and S2=1, therefore P1 or P2, any one of them can be executed.
Let us assume that P1 executes and releases S0 (Now value of S0 = 1). Note that P1 process is completed.
Now S0=1 and S2=1, hence either P0 can execute or P2 can execute. Let us check both the conditions:-
Let us assume that P2 executes, releases S0 and completes its execution. Now P0 executes; S0=0 and it prints 0 (i.e. the second 0). Then it releases S1 and S2. But note that processes P1 and P2 have already finished their execution. If P0 tries to execute again, it goes into the sleep condition because S0=0. Therefore, the minimum number of times '0' gets printed is 2.
Now, let us assume that P0 executes instead. Hence S0=0 (due to wait(S0)), and it prints 0 (second 0) and releases S1 and S2. Now only P2 can execute, because P1 has already completed its execution and P0 cannot execute because S0=0. P2 executes, releases S0 (i.e. S0=1) and finishes its execution. Now P0 starts its execution again, prints 0 (third 0) and releases S1 and S2 (note that now S0=0). P1 and P2 have already completed their execution, so P0 takes its turn again, but since S0=0 it goes into the sleep condition, and the processes P1 and P2 which could wake up P0 have already finished. Therefore, the maximum number of times '0' gets printed is 3.
Reference: http://www.btechonline.org/2013/01/gate-questions-os-synchronization.html

Initially only P0 can go inside the while loop as S0 = 1, S1 = 0, S2 = 0. P0 first prints '0' then, after releasing S1 and S2, either P1 or P2 will execute and release S0. So 0 is printed again.
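The interleavings walked through in the answers above can also be checked by brute force. The sketch below is not from any of the answers. It assumes binary semaphores where release() sets the value to 1 and wait() blocks while it is 0, and it allows a process switch after every single statement. The names State, stepP0, stepWorker, explore and possiblePrintCounts are made up for illustration.

```cpp
#include <cassert>
#include <set>

// One snapshot of the whole system.
struct State {
    int s0 = 1, s1 = 0, s2 = 0;  // binary semaphores S0, S1, S2
    int pc0 = 0;                 // P0: 0 = wait(S0), 1 = print, 2 = release(S1), 3 = release(S2)
    int pc1 = 0, pc2 = 0;        // P1/P2: 0 = wait, 1 = release(S0), 2 = terminated
    int printed = 0;             // number of '0' characters printed so far
};

// Executes P0's next statement if it is not blocked; returns false if blocked.
bool stepP0(State& st) {
    switch (st.pc0) {
        case 0:
            if (st.s0 == 0) return false;
            st.s0 = 0; st.pc0 = 1; return true;
        case 1: ++st.printed; st.pc0 = 2; return true;
        case 2: st.s1 = 1; st.pc0 = 3; return true;
        default: st.s2 = 1; st.pc0 = 0; return true;  // loop back to wait(S0)
    }
}

// Executes the next statement of P1 or P2 (same shape, different semaphore).
bool stepWorker(int& pc, int& own, int& s0) {
    if (pc == 0) {
        if (own == 0) return false;
        own = 0; pc = 1; return true;
    }
    if (pc == 1) { s0 = 1; pc = 2; return true; }
    return false;  // already terminated
}

// Depth-first search over every possible scheduling decision.
void explore(const State& st, std::set<int>& outcomes) {
    assert(st.printed <= 3);  // S0 is made available at most three times
    bool anyMove = false;
    State next = st;
    if (stepP0(next)) { anyMove = true; explore(next, outcomes); }
    next = st;
    if (stepWorker(next.pc1, next.s1, next.s0)) { anyMove = true; explore(next, outcomes); }
    next = st;
    if (stepWorker(next.pc2, next.s2, next.s0)) { anyMove = true; explore(next, outcomes); }
    if (!anyMove) outcomes.insert(st.printed);  // everything blocked or finished
}

// Returns every final print count reachable under some schedule.
std::set<int> possiblePrintCounts() {
    std::set<int> outcomes;
    explore(State{}, outcomes);
    return outcomes;
}
```

Under these assumptions the set returned by possiblePrintCounts() contains exactly 2 and 3, which matches option (A), "at least twice".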

How do I detect if two dates straddle a weekend?

Problem
Given two datetimes, dt0 and dt1 (could be out of order), what is an algorithm which can determine if there is at least 24 hours worth of weekend (SAT, SUN) between the two dates?
Assume that there is a dayofweek() function that returns 0 for SUN, 1 for MON, etc...
Note: the problem is easy to visualize geometrically in terms of line segments, but the calculation eludes me for the moment.
Solution
The solution below will work for UTC, but it will fail for DST.
weekdayno() implementation not included: SUN==0, MON==1, etc...
isWeekday() is also not shown, but is trivial to implement once you have dayofweek()
binary operator-() implementation also not shown, but we simply convert both instances to UNIX-time (no. of secs since Epoch) and take the difference to yield the number of seconds between two DateTimes
hh() mm() ss() are just const accessors for returning hours, minutes, and seconds, respectively
James McNellis is right on the mark concerning DST.
Getting this code to work for the general DST case is non-trivial: you need to add time zone handling, and anywhere you do any kind of date arithmetic requires careful consideration. Additional unit tests will be needed.
Lessons Learned
Query stackoverflow for various ways to look at a problem.
You can never have too many unit tests: need them to flush out weird edge cases
Use visualization, if possible, to look at a problem
What appears to be a trivial problem can actually be a bit tricky when you look at the details (eg DST).
Keep the solution as simple as possible because your code will very likely change: in order to fix for bugs/new test cases or in order to add new features (eg make it work for DST). Keep it as readable and easy to understand as possible: prefer algorithms over switch/cases.
Be brave and try things out: keep hammering at the solution until something that works comes about. Use unit tests so you can continuously refactor. It takes a lot of work to write simple code, but in the end, it's worth it.
Conclusion
The current solution is sufficient for my purposes (I will use UTC to avoid DST problems). I will select holygeek's Answer for his suggestion that I draw some ASCII art. In this case, doing so has helped me come up with an algorithm that is easy-to-understand and really, as simple as I can possibly make it. Thanks to all for contributing to the analysis of this problem.
static const size_t ONEDAYINSECS = (24 * 60 * 60);

DateTime
DateTime::nextSatMorning() const
{
    // 0 is SUN, 6 is SAT
    return *this + (6 - weekdayno()) * ONEDAYINSECS -
        (((hh() * 60) + mm())*60 + ss());
}

DateTime
DateTime::previousSunNight() const
{
    return *this - ((weekdayno() - 1 + 7)%7) * ONEDAYINSECS -
        (((hh() * 60) + mm())*60 + ss());
}

bool
DateTime::straddles_24HofWeekend_OrMore( const DateTime& newDt ) const
{
    const DateTime& t0 = min( *this, newDt );
    const DateTime& t1 = max( *this, newDt );
    // ------------------------------------
    //
    //    <--+--F--+--S--+--S--+--M--+-->
    //      t0     ^           ^    t1
    //       +---->+           +<----|
    //             |           |
    //             +<--nSecs-->+
    //           edge0       edge1
    //
    // ------------------------------------
    DateTime edge0 = t0.isWeekday() ? t0.nextSatMorning() : t0;
    DateTime edge1 = t1.isWeekday() ? t1.previousSunNight() : t1;
    return (edge1 - edge0) > ONEDAYINSECS;
}
John Leidegren asked for my unit tests, so here they are (using googletest).
Note that they pass for the non-DST cases above (running for UTC) - I expect the current implementation to fail for DST cases (haven't added them to the test cases below yet).
TEST( Test_DateTime, tryNextSatMorning )
{
    DateTime mon{ 20010108, 81315 };
    DateTime exp_sat{ 20010113, 0ul };
    EXPECT_EQ( exp_sat, mon.nextSatMorning() );
}

TEST( Test_DateTime, tryPrevSunNight )
{
    DateTime tue{ 20010109, 81315 };
    DateTime exp_sun1{ 20010108, 0ul };
    EXPECT_EQ( exp_sun1, tue.previousSunNight() );
    DateTime sun{ 20010107, 81315 };
    DateTime exp_sun2{ 20010101, 0ul };
    EXPECT_EQ( exp_sun2, sun.previousSunNight() );
}

TEST( Test_DateTime, straddlesWeekend )
{
    DateTime fri{ 20010105, 163125 };
    DateTime sat{ 20010106, 101515 };
    DateTime sun{ 20010107, 201521 };
    DateTime mon{ 20010108, 81315 };
    DateTime tue{ 20010109, 81315 };
    EXPECT_FALSE( fri.straddles_24HofWeekend_OrMore( sat ));
    EXPECT_TRUE( fri.straddles_24HofWeekend_OrMore( sun ));
    EXPECT_TRUE( fri.straddles_24HofWeekend_OrMore( mon ));
    EXPECT_TRUE( sat.straddles_24HofWeekend_OrMore( sun ));
    EXPECT_TRUE( sat.straddles_24HofWeekend_OrMore( mon ));
    EXPECT_FALSE( sun.straddles_24HofWeekend_OrMore( mon ));
    EXPECT_TRUE( fri.straddles_24HofWeekend_OrMore( tue ));
    EXPECT_FALSE( sun.straddles_24HofWeekend_OrMore( tue ));
}
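For comparison, here is a different way to look at the same question that avoids the edge calculation: count the weekend seconds that fall between the two instants. This is not the DateTime class above but a small sketch on plain UTC seconds since the Unix epoch (no DST, no time zones). It relies on the fact that 1970-01-03 00:00 UTC was a Saturday. The function names are made up, and it counts total weekend time, so the 24 hours need not be contiguous.

```cpp
#include <algorithm>
#include <cassert>

constexpr long long kDay = 24LL * 60 * 60;
constexpr long long kWeek = 7 * kDay;
// 1970-01-01 was a Thursday, so the first Saturday 00:00 UTC is 2 days in.
constexpr long long kFirstSaturday = 2 * kDay;

// Number of weekend (SAT or SUN) seconds in the range [kFirstSaturday, t).
long long weekendSecondsBefore(long long t) {
    assert(t >= kFirstSaturday);
    const long long shifted = t - kFirstSaturday;  // weeks now start on Saturday
    const long long fullWeeks = shifted / kWeek;
    const long long intoWeek = shifted % kWeek;
    // Each full week contributes 2 days; a partial week contributes at most
    // its first 2 days (Saturday and Sunday).
    return fullWeeks * 2 * kDay + std::min(intoWeek, 2 * kDay);
}

// True if at least 24 hours of weekend lie between the two instants,
// which may be given in either order.
bool hasAtLeast24hOfWeekend(long long a, long long b) {
    const long long t0 = std::min(a, b);
    const long long t1 = std::max(a, b);
    return weekendSecondsBefore(t1) - weekendSecondsBefore(t0) >= kDay;
}
```

For the dates used in the straddlesWeekend test above this gives the same true/false results. A span from late Sunday to the following Saturday morning is reported only if the two pieces add up to 24 hours.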

This diagram may help:
     SAT       SUN
|---------|---------|
a---------b
  a---------b
...
          a---------b

The silliest way I can think of to solve this is: copy the smaller datetime value, then continuously add to it until it's either larger than the other datetime value (the bigger one) or dayofweek() no longer equals 0 or 6. Then check whether the total amount of time you added is less than 24 hours.
A slightly less silly way would be to check that it's a weekend, add 24 hours of time, and then check once more to make sure it's still a weekend and still less than the second datetime.
Daylight saving time shouldn't really come into play as long as your function for finding the day of the week works.

Approach it in a structured manner: What are some of the simple/edge cases? What is the average case?
Simple cases I can think of off the top of my head (note, since you've already forced t0 to be the lower value, I'm assuming that below):
If t1 - t0 is less than 1 day, return false
If t1 - t0 is >= 6 days, return true (there's ALWAYS 24 hours of weekend time in any given 6 day block, even if you start on a Sunday).
Then we take dayofweek() for both t0 and t1, and do some checks (this is the average case). We can be extra cheap here now because we know t0 is only up to 5 days earlier than t1.
Edit: Removed my conditions because there were nasty little edge cases I wasn't considering. Anyway, the solution I recommended is still viable, I just won't do it here.

This site calculates business days in C#, does that help? http://www.infopathdev.com/forums/t/7156.aspx

When exactly is the postfix increment operator evaluated in a complex expression?

Say I have an expression like this
short v = ( p[ i++ ] & 0xFF ) << 4 | ( p[ i ] & 0xF0000000 ) >> 28;
with p being a pointer to a dynamically allocated array of 32 bit integers.
When exactly will i be incremented? I noticed that the above code delivers a different value for v than the following code:
short v = ( p[ i++ ] & 0xFF) << 4;
v |= ( p[ i ] & 0xF0000000 ) >> 28;
My best guess for this behaviour is that i is not incremented before the right side of the above | is evaluated.
Any insight would be appreciated!
Thanks in advance,
\Bjoern

The problem is order of evaluation:
The C++ standard does not define the order of evaluation of sub expressions. This is done so that the compiler can be as aggressive as possible in optimizations.
Lets break it down:
        a1                         a2
v = ( p[ i++ ] & 0xFF ) << 4 | ( p[ i ] & 0xF0000000 ) >> 28;
-----
(1) a1 = p[i]
(2) i = i + 1 (i++) after (1)
(3) a2 = p[i]
(4) t3 = a1 & 0xFF after (1)
(5) t4 = a2 & 0xF0000000 after (3)
(6) t5 = t3 << 4 after (4)
(7) t6 = t4 >> 28 after (5)
(8) t7 = t5 | t6 after (6) and (7)
(9) v = t7 after (8)
Now the compiler is free to re-arrange these sub-expressions as long as the above 'after' clauses are not violated. So one quick, easy optimization is to move (3) up one slot and then do common sub-expression removal: (1) and (3) (now beside each other) are the same, and thus we can eliminate (3).
But the compiler does not have to do this optimization (and it is probably better than me at it and has other tricks up its sleeve). You can see how the value of (a1) will always be what you expect, but the value of (a2) will depend on the order in which the compiler decides to do the other sub-expressions.
The only guarantee you have is that the compiler cannot move sub-expressions past a sequence point. Your most common sequence point is ';' (the end of the statement). There are others, but I would avoid using this knowledge, as most people don't know the compiler workings that well. If you write code that uses sequence point tricks, then somebody may refactor the code to make it look more readable, and now your trick has just turned into undefined behavior.
short v = ( p[ i++ ] & 0xFF) << 4;
v |= ( p[ i ] & 0xF0000000 ) >> 28;
-----
(1) a1 = p[i]
(2) i = i + 1 (i++) after (1)
(4) t3 = a1 & 0xFF after (1)
(6) t5 = t3 << 4 after (4)
(A) v = t5 after (6)
------ Sequence Point
(3) a2 = p[i]
(5) t4 = a2 & 0xF0000000 after (3)
(7) t6 = t4 >> 28 after (5)
(8) t7 = v | t6 after (7)
(9) v = t7 after (8)
Here everything is well defined, as the write to i is completed before the sequence point and i is not re-read in the same expression.
Simple rule: don't use ++ or -- operators inside a larger expression.
Your code looks just as readable like this:
v = ( p[ i ] & 0xFF ) << 4 | ( p[ i + 1 ] & 0xF0000000 ) >> 28;
++i; // prefer pre-increment (it makes no difference here, but is a useful habit)
See this article for detailed explanation of evaluation order:
What are all the common undefined behaviours that a C++ programmer should know about?
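To make the rule concrete, here is a small sketch that keeps every read and the single write of i in separate statements. The function name packNibbles and the use of std::uint32_t for the array elements are assumptions for illustration; the question only says the array holds 32 bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Combines the low byte of p[i] with the top nibble of p[i + 1] and
// advances i by one. Each statement reads or writes i at most once, so
// the result does not depend on the compiler's evaluation order.
short packNibbles(const std::uint32_t* p, std::size_t& i) {
    assert(p != nullptr);
    const std::uint32_t low = p[i] & 0xFFu;          // read p[i] first
    ++i;                                             // increment in its own statement
    const std::uint32_t high = p[i] & 0xF0000000u;   // now reads the next element
    return static_cast<short>((low << 4) | (high >> 28));
}
```

For an array starting with 0x000000AB and 0xC0000000 and i equal to 0, this returns 0x0ABC and leaves i at 1.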

i is incremented sometime before the next sequence point. The only sequence point in the expression you have given is at the end of the statement - so "sometime before the end of the statement" is the answer in this case.
That's why you shouldn't both modify an lvalue and read its value without an intervening sequence point: the result is undefined.
The &&, ||, comma and ? operators introduce sequence points, as do the end of a full expression and a function call (the latter means that if you do f(i++, &i), the body of f() will see the updated value if it uses the pointer to examine i).

The first example is undefined behavior. You cannot read a variable more than once in an expression that also changes the value of the variable. See this (among other places on the Internet).

Sometime before the end of the expression.
It is undefined to read an object which is also modified, for any purpose other than determining the new value, just as it is undefined to write an object twice. You may even get an inconsistent value (i.e. read something which is neither the old nor the new value).

Your expression has undefined behavior, see for example this about sequence points in C and C++ statements.
