What are alternatives to functional programming for handling shared mutability? - concurrency

After watching some videos on the Rust language, I'm increasingly interested in examining my coding decisions based on mitigating the complexity of shared mutable state. Functional programming/Lambda Calculus seems to be the most popular standard to overcome the problem of shared mutable state. Are there alternatives though? Is there a consensus now that functional programming is a reasonable default approach to solve the problem?

Disclaimer:
I am aware that this post might not directly answer your question.
However, many programmers still overlook that they can sometimes avoid shared mutability altogether. I want to show how with an example, and I hope it helps you.
TL;DR: Ask yourself whether unshared mutability or shared immutability are also options.
Do you really need shared mutability?
If you flip either of the two terms into its opposite, you gain two useful alternatives:
unshared mutability
shared immutability
Let's have an example in Java 8 to illustrate what I mean.
This example of shared mutability uses synchronized to avoid visibility issues and race conditions:
public class MutablePoint {
private int x, y;
void move(int dx, int dy) {
x += dx;
y += dy;
}
@Override
public String toString() {
return "MutablePoint{x=" + x + ", y=" + y + '}';
}
}
public class SharedMutability {
public static void main(String[] args) {
final MutablePoint mutablePoint = new MutablePoint();
final Thread moveRightThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
synchronized (mutablePoint) {
mutablePoint.move(1, 0);
}
Thread.yield();
}
}, "moveRight");
final Thread moveDownThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
synchronized (mutablePoint) {
mutablePoint.move(0, 1);
}
Thread.yield();
}
}, "moveDown");
final Thread displayThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
synchronized (mutablePoint) {
System.out.println(mutablePoint);
}
Thread.yield();
}
}, "display");
moveRightThread.start();
moveDownThread.start();
displayThread.start();
}
}
Explanation:
We have got 3 threads. While the two threads moveRight and moveDown write on the mutable point, the one thread display reads from it. All 3 threads must synchronize on the mutable point to avoid visibility issues and race conditions.
How can you apply unshared mutability?
Unshared means "only one thread reading and writing on a mutable object".
You don't need much for that. You only ever access a given mutable object from the same ONE thread. Therefore you need neither the synchronized keyword, nor locks, nor the volatile keyword. Moreover, this one thread can be very fast, because it pays for neither locks nor memory barriers, as long as it only focuses on reading and writing values in the mutable object.
However, you are limited to that one thread. That's usually no problem unless you block that thread with tasks like I/O (don't do that!). Furthermore, you must ensure that the mutable object doesn't "escape" by being assigned to a variable or field outside that thread and accessed from there.
If you apply unshared mutability to the example, it could look like this:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class UnsharedMutability {
private static final ExecutorService accessorService = Executors.newSingleThreadExecutor(); // only ONE thread!
private static final MutablePoint mutablePoint = new MutablePoint();
public static void main(String[] args) {
final Thread moveRightThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
accessorService.submit(() -> {
mutablePoint.move(1, 0);
});
Thread.yield();
}
}, "moveRight");
final Thread moveDownThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
accessorService.submit(() -> {
mutablePoint.move(0, 1);
});
Thread.yield();
}
}, "moveDown");
final Thread displayThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
accessorService.submit(() -> {
System.out.println(mutablePoint);
});
Thread.yield();
}
}, "display");
moveRightThread.start();
moveDownThread.start();
displayThread.start();
// Wait for the three submitting threads, then shut the executor down; otherwise
// its non-daemon worker thread keeps the JVM alive after main returns.
try {
moveRightThread.join();
moveDownThread.join();
displayThread.join();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
accessorService.shutdown();
}
}
Explanation:
We have got all 3 threads again. However, none of the 3 threads needs to synchronize on the mutable point, because they only access it through the one thread that runs inside the single-threaded ExecutorService accessorService.
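A reader might wonder how to get a value back out of the confined object. Here is a minimal sketch, assuming a plain int[] stands in for MutablePoint and that reads are routed through the same owner thread as writes. ConfinedPoint and run are hypothetical names, not part of the example above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConfinedPoint {
    /** Moves a point from two caller threads and returns the final {x, y}. */
    static int[] run(int moves) {
        final int[] xy = {0, 0};   // mutable state, only ever touched by the owner thread
        ExecutorService owner = Executors.newSingleThreadExecutor();
        try {
            Thread right = new Thread(() -> {
                for (int i = 0; i < moves; i++) owner.execute(() -> xy[0]++);
            });
            Thread down = new Thread(() -> {
                for (int i = 0; i < moves; i++) owner.execute(() -> xy[1]++);
            });
            right.start();
            down.start();
            right.join();
            down.join();           // every move has been queued by now
            // The read is a task too. The queue is FIFO, so it runs after all moves.
            Callable<int[]> read = () -> xy.clone();
            return owner.submit(read).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            owner.shutdown();      // let the worker thread exit
        }
    }

    public static void main(String[] args) {
        int[] xy = run(1000);
        System.out.println(xy[0] + ", " + xy[1]);   // prints "1000, 1000"
    }
}
```

The caller only ever receives a copy, so the mutable array itself never escapes the owner thread.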
How can you apply shared immutability?
Immutability means "no ability to change the state of the object after its creation". Immutable objects always have only one state. Therefore they are always threadsafe. Immutable objects can create new immutable objects when you want to change them though.
However, creating too many objects too fast can cause high memory consumption and more GC activity. Sometimes you can deduplicate immutable objects if you have many equal instances of them.
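Deduplication can be sketched as a small interning cache. This is only an illustration under assumptions: InternedPoint and its factory method of are hypothetical names, and the cache is unbounded, which a real program would probably want to limit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InternedPoint {
    // One shared instance per distinct (x, y); grows without bound in this sketch.
    private static final Map<Long, InternedPoint> CACHE = new ConcurrentHashMap<>();

    final int x;
    final int y;

    private InternedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    /** Returns the shared instance for (x, y), creating it on first use. */
    static InternedPoint of(int x, int y) {
        long key = ((long) x << 32) | (y & 0xFFFFFFFFL);   // pack both ints into one key
        return CACHE.computeIfAbsent(key, k -> new InternedPoint(x, y));
    }

    /** "Changing" the point just looks up (or creates) another shared instance. */
    InternedPoint move(int dx, int dy) {
        return of(x + dx, y + dy);
    }

    public static void main(String[] args) {
        System.out.println(of(0, 0).move(1, 2) == of(1, 2));   // prints "true"
    }
}
```

Because the instances are immutable, handing the same object to many threads is safe, and equal points can be compared by reference.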
If you apply shared immutability to the example, it could look like this:
import java.util.Objects;
public class ImmutablePoint {
private final int x;
private final int y;
public ImmutablePoint(int x, int y) {
this.x = x;
this.y = y;
}
ImmutablePoint move(int dx, int dy) {
return new ImmutablePoint(x+dx, y+dy);
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
ImmutablePoint that = (ImmutablePoint) o;
return x == that.x && y == that.y;
}
@Override
public int hashCode() {
return Objects.hash(x, y);
}
@Override
public String toString() {
return "ImmutablePoint{x=" + x + ", y=" + y + '}';
}
}
import java.util.concurrent.atomic.AtomicReference;
public class SharedImmutability {
private static AtomicReference<ImmutablePoint> pointReference = new AtomicReference<>(new ImmutablePoint(0, 0));
public static void main(String[] args) {
final Thread moveRightThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
pointReference.updateAndGet(point -> point.move(1, 0));
Thread.yield();
}
}, "moveRight");
final Thread moveDownThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
pointReference.updateAndGet(point -> point.move(0, 1));
Thread.yield();
}
}, "moveDown");
final Thread displayThread = new Thread(() -> {
for (int i = 0; i < 1000; i++) {
System.out.println(pointReference.get());
Thread.yield();
}
}, "display");
moveRightThread.start();
moveDownThread.start();
displayThread.start();
}
}
Explanation:
We have got all 3 threads again. However, we use an immutable point instead of a mutable point. The two threads moveRight and moveDown replace the older instance of the immutable point with a newer one in the atomic reference pointReference. The thread display can get the current instance from pointReference and display it whenever it wants, because each instance of the immutable point is independent of older and newer ones.
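To see why no lock is needed here, it can help to spell out what an atomic update amounts to: read the current instance, compute the replacement, and retry if another thread swapped the reference in between. The following is a rough sketch of that compare-and-set loop, not the JDK's actual source; CasUpdate and its nested Point are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class CasUpdate {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        Point move(int dx, int dy) { return new Point(x + dx, y + dy); }
    }

    /** Retries until the reference was not changed between the read and the write. */
    static <T> T updateAndGet(AtomicReference<T> ref, UnaryOperator<T> fn) {
        while (true) {
            T current = ref.get();
            T next = fn.apply(current);   // may run more than once, so keep it side-effect free
            if (ref.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    /** Two threads move the point concurrently; no update is lost. */
    static Point demo(int moves) {
        AtomicReference<Point> ref = new AtomicReference<>(new Point(0, 0));
        Thread right = new Thread(() -> {
            for (int i = 0; i < moves; i++) updateAndGet(ref, p -> p.move(1, 0));
        });
        Thread down = new Thread(() -> {
            for (int i = 0; i < moves; i++) updateAndGet(ref, p -> p.move(0, 1));
        });
        right.start();
        down.start();
        try {
            right.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ref.get();
    }

    public static void main(String[] args) {
        Point p = demo(1000);
        System.out.println(p.x + ", " + p.y);   // prints "1000, 1000"
    }
}
```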
Remark:
The calls to yield() should force thread switches because a loop with only 1000 iterations is just too small. Most CPUs execute such a loop in one time slice.


physx multithreading copy transform data

I have a scene with tons of similar objects moved by PhysX, and I want to draw all of them using OpenGL instancing. So I need to build an array with the transform data of each object and pass it to an OpenGL shader. Currently, filling that array is the bottleneck of my app, because the PhysX simulation uses 16 threads but creating the array uses just one.
So I created a data_transfer_task class, which holds two indexes, start and stop, and moves the transform data of the PhysX objects between these indexes into the array.
class data_transfer_task : public physx::PxTask {
public:
int start;
int stop;
start_transfer_task* base_task;
data_transfer_task(int start, int stop, start_transfer_task* task, physx::PxTaskManager *mtm) :physx::PxTask() {
this->start = start;
this->stop = stop;
this->mTm = mtm;
base_task = task;
}
void update_transforms();
virtual const char* getName() const { return "data_transfer_task"; }
virtual void run();
};
void data_transfer_task::update_transforms() {
for (int i = start; i < stop; i++) {
auto obj = base_task->objects->at(i);
auto transform = obj->getGlobalPose();
DrawableObject* dr = (DrawableObject*)obj->userData;
auto pos = transform.p;
auto rot = transform.q;
dr->set_position(glm::vec3(pos.x, pos.y, pos.z));
dr->set_rotation(glm::quat(rot.w, rot.x, rot.y, rot.z));
}
}
void data_transfer_task::run() { update_transforms(); }
I created another class, start_transfer_task, which creates and schedules tasks according to the thread count.
class start_transfer_task : public physx::PxLightCpuTask{
public:
start_transfer_task(physx::PxCpuDispatcher* disp, std::vector<physx::PxRigidDynamic*>* obj, physx::PxTaskManager* mtm) :physx::PxLightCpuTask() {
this->mTm = mtm;
this->dispatcher = disp;
this->objects = obj;
}
physx::PxCpuDispatcher* dispatcher;
std::vector<physx::PxRigidDynamic*>* objects;
void start();
virtual const char* getName() const { return "start_transfer_task"; }
virtual void run();
};
void start_transfer_task::start() {
int thread_count = dispatcher->getWorkerCount();
int obj_count = objects->size();
int batch_size = obj_count / thread_count;
int first_size = batch_size + obj_count % thread_count;
auto task = new data_transfer_task(0, first_size, this, this->mTm);
this->mTm->submitUnnamedTask(*task, physx::PxTaskType::TT_CPU);
task->removeReference();
if (batch_size > 0) {
for (int i = 1; i < thread_count; i++) {
task = new data_transfer_task(first_size + batch_size * (i - 1), first_size + batch_size * i, this, this->mTm);
this->mTm->submitUnnamedTask(*task, physx::PxTaskType::TT_CPU);
task->removeReference();
}
}
}
void start_transfer_task::run() { start(); }
I create a start_transfer_task instance before calling simulate and pass it to simulate. I expect start_transfer_task to run after all PhysX tasks have finished their work, so that write and read API calls don't overlap, and I expect fetchResults(block=true) to return only once all of my tasks have finished copying the transform data.
while (is_simulate) {
auto transfer_task = new start_transfer_task(gScene->getCpuDispatcher(), &objects, gScene->getTaskManager());
gScene->simulate(1.0f / 60.0f, transfer_task);
gScene->fetchResults(true);
//some other logic to call graphics api and sleep to sustain 30 updates per second
}
But I get many warnings about overlapping read and write API calls, like this:
\physx\source\physx\src\NpWriteCheck.cpp (53) : invalid operation : Concurrent API write call or overlapping API read and write call detected during physx::NpScene::simulateOrCollide from thread 8492! Note that write operations to the SDK must be sequential, i.e., no overlap with other write or read calls, else the resulting behavior is undefined. Also note that API writes during a callback function are not permitted.
And sometimes, after starting my app, I get a strange assert message:
physx\source\task\src\TaskManager.cpp(195) : Assertion failed: !mPendingTasks"
So, what am I doing wrong?
The concurrent API call warning is essentially telling you that you are calling, from multiple threads, PhysX API functions that are supposed to be called from one thread at a time.
You have to be very careful when using the PhysX API because it is not thread safe; thread safety is left to the user.
Read this for more information.
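The rule that reads must not overlap a write applies to many APIs that are not thread safe. What follows is a language-neutral sketch in Java, not PhysX code, of one way to honor it: let the writer phase finish completely, then split the read-only copy into the same batch ranges the question computes and wait for all of them. TwoPhaseCopy, ranges and parallelCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TwoPhaseCopy {
    /** Splits [0, count) into at most `workers` ranges; the first one absorbs the remainder. */
    static List<int[]> ranges(int count, int workers) {
        List<int[]> out = new ArrayList<>();
        int batch = count / workers;
        int first = batch + count % workers;
        out.add(new int[] {0, first});
        for (int i = 1; i < workers && batch > 0; i++) {
            out.add(new int[] {first + batch * (i - 1), first + batch * i});
        }
        return out;
    }

    /** Read phase only: call this after the writer (the simulation step) has finished. */
    static void parallelCopy(float[] source, float[] target, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int[] r : ranges(source.length, workers)) {
                tasks.add(() -> {
                    System.arraycopy(source, r[0], target, r[0], r[1] - r[0]);
                    return null;
                });
            }
            pool.invokeAll(tasks);   // blocks until every range has been copied
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int[] r : ranges(10, 4)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

With 10 items and 4 workers the ranges are 0..4, 4..6, 6..8 and 8..10, matching the first_size and batch_size arithmetic in the question.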

In Unreal C++ why are structs added to a TArray in an async thread not removed from RAM?

I am asking this question from an Unreal Engine C++ code point of view but I am wondering if my problem is more to do with the nuances of C++'s way of operating.
I have an Unreal actor: a simple class that holds an array of my own structs and runs a timer which triggers my own function. This function passes a reference to the actor's array to an asynchronous task.
This async task then goes to work, first creating a new struct, then adding two floats to the struct's internal TArray of floats, and then adding that struct to the main actor's array.
The problem:
After the async task has completed and I delete the actor from the level editor window, the system RAM usage decreases, since I call the Empty() function on the main actor's array in the Destroyed() function, but the RAM used by all of the structs (i.e. the float array inside each struct) is left in memory and never cleared out.
Observations:
If I do not use an async task and run the same function inside the main thread ALL of the memory is cleared successfully.
If I do not create the structs inside the async task and instead initialize the array on the main thread with a load of structs, each in turn initialized with N floats, and then pass it by reference to the async task which works on the data, the memory is also cleared out successfully.
What I would like to happen
I would like to pass a reference to the main actor's array of structs to the async task. The async task would then go to work creating the data. Once it is complete, the main actor would then have access to the data and when the actor is deleted in the level editor window, ALL of the memory would be freed.
The code:
The definition of the data struct I am using:
struct FMyDataStruct
{
TArray<float> ArrayOfFloats;
FMyDataStruct()
{
ArrayOfFloats.Empty();
ArrayOfFloats.Shrink();
}
FMyDataStruct(int32 FloatCount)
{
ArrayOfFloats.Init(0.f, FloatCount);
}
~FMyDataStruct()
{
ArrayOfFloats.Empty();
ArrayOfFloats.Shrink();
}
};
The main actor's definition of the array I am using:
TArray<FMyDataStruct> MyMainArray;
The main actor's custom function I am running:
//CODE 1: This part DOES empty the RAM when run (ie: Run on main thread)
/*for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyMainArray.Add(MyDataStruct);
}*/
//CODE 2: This does NOT empty the RAM when run. The two floats * 50,000,000 are left in system memory after the actor is deleted.
auto Result = Async(EAsyncExecution::Thread, [&]()
{
for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyMainArray.Add(MyDataStruct);
}
});
An example of initializing the array in the main thread, then working on it inside the async task:
//Initialize the array and its structs (plus the float array inside the struct)
MyMainArray.Init(FMyDataStruct(2), 50000000);
//TFuture/Async task
auto Result = Async(EAsyncExecution::Thread, [Self]()
{
for (int32 Index = 0; Index < 50000000; Index++)
{
Self->MyMainArray[Index].ArrayOfFloats[0] = FMath::Rand();
Self->MyMainArray[Index].ArrayOfFloats[1] = FMath::Rand();
}
//Call the main threads task completed function
AsyncTask(ENamedThreads::GameThread, [Self]()
{
if (Self != nullptr)
{
Self->MyTaskComplete();
}
});
});
Final thoughts:
Ultimately what I am asking is can anyone explain to me why from a C++ point of view the structs and their data would be removed from memory successfully when created/added from the main thread but then not removed from memory if created inside the async task/thread?
Update #1:
Here is a minimum reproducible example:
Create a new project in either Unreal Engine 4.23, 4.24 or 4.25.
Add a new C++ actor to the project and name it "MyActor".
Edit the source with the following:
MyActor.h
#pragma once
#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "MyActor.generated.h"
struct FMyDataStruct
{
FMyDataStruct()
{
//Default Constructor
}
FMyDataStruct(const FMyDataStruct& other)
: ArrayOfFloats(other.ArrayOfFloats)
{
//Copy constructor
}
FMyDataStruct(FMyDataStruct&& other)
{
//Move constructor
if (this != &other)
{
ArrayOfFloats = MoveTemp(other.ArrayOfFloats);
}
}
FMyDataStruct& operator=(const FMyDataStruct& other)
{
//Copy assignment operator
if (this != &other) //avoid self assignment
{
ArrayOfFloats = other.ArrayOfFloats; //UE4 TArray deep copy
}
return *this;
}
FMyDataStruct& operator=(FMyDataStruct&& other)
{
//Move assignment operator
if (this != &other) //avoid self assignment
{
ArrayOfFloats = MoveTemp(other.ArrayOfFloats);
}
return *this;
}
FMyDataStruct(int32 FloatCount)
{
//Custom constructor to initialize the float array
if (FloatCount > 0)
{
ArrayOfFloats.Init(0.f, FloatCount);
}
}
~FMyDataStruct()
{
//Destructor
ArrayOfFloats.Empty();
ArrayOfFloats.Shrink();
}
public:
TArray<float> ArrayOfFloats;
};
UCLASS()
class BASICPROJECT1_API AMyActor : public AActor
{
GENERATED_BODY()
public:
AMyActor();
protected:
virtual void Destroyed() override;
public:
bool IsEditorOnly() const override;
bool ShouldTickIfViewportsOnly() const override;
virtual void Tick(float DeltaTime) override;
void DoSomething();
void AsyncTaskComplete();
bool bShouldCount = true;
float TimeCounter = 0.f;
TArray<FMyDataStruct> MyMainArray;
};
MyActor.cpp
#include "MyActor.h"
AMyActor::AMyActor()
{
PrimaryActorTick.bCanEverTick = true;
}
void AMyActor::Tick(float DeltaTime)
{
if (!HasAnyFlags(RF_ClassDefaultObject)) //Check for not CDO. We only want to run in the instance
{
if (bShouldCount)
{
TimeCounter += DeltaTime;
if (TimeCounter >= 5.f)
{
bShouldCount = false;
DoSomething();
}
}
}
}
void AMyActor::Destroyed()
{
Super::Destroyed();
MyMainArray.Empty();
MyMainArray.Shrink();
UE_LOG(LogTemp, Warning, TEXT("Actor got Destroyed!"));
}
bool AMyActor::IsEditorOnly() const
{
return true;
}
bool AMyActor::ShouldTickIfViewportsOnly() const
{
return true;
}
void AMyActor::DoSomething()
{
//Change the code that is run:
//1 = Main thread only
//2 = Async only
//3 = Init on main thread and process in async task
//======================
int32 CODE_SAMPLE = 1;
UE_LOG(LogTemp, Warning, TEXT("Actor is running DoSomething()"));
TWeakObjectPtr<AMyActor> Self = this;
if (CODE_SAMPLE == 1)
{
//CODE 1: Run on main thread. This part DOES empty the RAM when run. BLOCKS the editor window.
//=========================================================================
MyMainArray.Empty();
MyMainArray.Shrink();
MyMainArray.Reserve(50000000);
for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Reserve(2);
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
MyMainArray.Emplace(MyDataStruct);
}
UE_LOG(LogTemp, Warning, TEXT("Main thread array fill is complete!"));
}
else if (CODE_SAMPLE == 2)
{
//CODE 2: Run on async task. This does NOT empty the RAM when run
//(4 bytes per float * 2 floats * 50,000,000 structs = 400Mb is left in system memory after the actor is deleted)
//=========================================================================
auto Result = Async(EAsyncExecution::Thread, [Self]()
{
if (Self != nullptr)
{
Self->MyMainArray.Empty();
Self->MyMainArray.Shrink();
Self->MyMainArray.Reserve(50000000);
for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Reserve(2);
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
Self->MyMainArray.Emplace(MyDataStruct);
}
AsyncTask(ENamedThreads::GameThread, [Self]()
{
if (Self != nullptr)
{
Self->AsyncTaskComplete();
}
});
}
});
}
else if (CODE_SAMPLE == 3)
{
//CODE 3: Initialize the array in the main thread and work on the data in the async task
//=========================================================================
MyMainArray.Init(FMyDataStruct(2), 50000000);
auto Result = Async(EAsyncExecution::Thread, [Self]()
{
if (Self != nullptr)
{
for (int32 Index = 0; Index < 50000000; Index++)
{
Self->MyMainArray[Index].ArrayOfFloats[0] = FMath::Rand();
Self->MyMainArray[Index].ArrayOfFloats[1] = FMath::Rand();
}
AsyncTask(ENamedThreads::GameThread, [Self]()
{
if (Self != nullptr)
{
Self->AsyncTaskComplete();
}
});
}
});
}
}
void AMyActor::AsyncTaskComplete()
{
UE_LOG(LogTemp, Warning, TEXT("Async task is complete!"));
}
Compile and run the project.
Drag the actor into the level editor window.
After 5 seconds the code will run and the RAM usage will increase to 1750Mb.
Select the actor in the outliner window and delete it.
The RAM usage will perform like this:
CODE 1: RAM is cleared out all the way to the starting RAM usage of 650Mb.
CODE 2: RAM is cleared down to 1000Mb and never returns to starting usage.
CODE 3: RAM is cleared out all the way to the starting RAM usage of 650Mb.
I thank you for your help.

Java 8 streams vs iterator performance

I'm comparing 2 ways to filter lists, with and without using streams. It turns out that the method without streams is faster for a list of 10,000 items. I'd like to understand why that is so. Can anyone explain the results please?
public static int countLongWordsWithoutUsingStreams(
final List<String> words, final int longWordMinLength) {
words.removeIf(word -> word.length() <= longWordMinLength);
return words.size();
}
public static int countLongWordsUsingStreams(final List<String> words, final int longWordMinLength) {
return (int) words.stream().filter(w -> w.length() > longWordMinLength).count();
}
Microbenchmark using JMH:
@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsWithoutUsingStreams() {
countLongWordsWithoutUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}
@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsUsingStreams() {
countLongWordsUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}
public static void main(String[] args) throws RunnerException {
final Options opts = new OptionsBuilder()
.include(PracticeQuestionsCh8Benchmark.class.getSimpleName())
.warmupIterations(5).measurementIterations(5).forks(1).build();
new Runner(opts).run();
}
java -jar target/benchmarks.jar -wi 5 -i 5 -f 1
Benchmark Mode Cnt Score Error Units
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsUsingStreams thrpt 5 10.219 ± 0.408 ops/ms
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsWithoutUsingStreams thrpt 5 910.785 ± 21.215 ops/ms
Edit: (as someone deleted the update posted as an answer)
public class PracticeQuestionsCh8Benchmark {
private static final int NUM_WORDS = 10000;
private static final int LONG_WORD_MIN_LEN = 10;
private final List<String> words = makeUpWords();
public List<String> makeUpWords() {
List<String> words = new ArrayList<>();
final Random random = new Random();
for (int i = 0; i < NUM_WORDS; i++) {
if (random.nextBoolean()) {
/*
* Do this to avoid string interning. c.f.
* http://en.wikipedia.org/wiki/String_interning
*/
words.add(String.format("%" + LONG_WORD_MIN_LEN + "s", i));
} else {
words.add(String.valueOf(i));
}
}
return words;
}
@Benchmark
@BenchmarkMode(AverageTime)
@OutputTimeUnit(MILLISECONDS)
public int benchmarkCountLongWordsWithoutUsingStreams() {
return countLongWordsWithoutUsingStreams(words, LONG_WORD_MIN_LEN);
}
@Benchmark
@BenchmarkMode(AverageTime)
@OutputTimeUnit(MILLISECONDS)
public int benchmarkCountLongWordsUsingStreams() {
return countLongWordsUsingStreams(words, LONG_WORD_MIN_LEN);
}
}
public static int countLongWordsWithoutUsingStreams(
final List<String> words, final int longWordMinLength) {
final Predicate<String> p = s -> s.length() >= longWordMinLength;
int count = 0;
for (String aWord : words) {
if (p.test(aWord)) {
++count;
}
}
return count;
}
public static int countLongWordsUsingStreams(final List<String> words,
final int longWordMinLength) {
return (int) words.stream()
.filter(w -> w.length() >= longWordMinLength).count();
}
Whenever your benchmark says that some operation over 10000 elements takes 1ns (edit: 1µs), you have probably found a case of the JVM cleverly figuring out that your code doesn't actually do anything.
Collections.nCopies doesn't actually make a list of 10000 elements. It makes a sort of a fake list with 1 element and a count of how many times it's supposedly there. That list is also immutable, so your countLongWordsWithoutUsingStreams would throw an exception if there was something for removeIf to do.
You do not return any values from your benchmark methods, so JMH has no chance to consume the computed values and your benchmark suffers from dead code elimination. You are measuring how long it takes to do nothing. See the JMH page for further guidance.
That said, streams can be slower in some cases: Java 8: performance of Streams vs Collections
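The point about Collections.nCopies is easy to check outside of JMH. This is a small sketch using plain JDK calls only; NCopiesDemo and its helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NCopiesDemo {
    /** True if the list refuses to remove an element that matches the predicate. */
    static boolean removalRejected(List<String> words) {
        try {
            words.removeIf(w -> w.length() > 3);   // every copy matches, so a removal is attempted
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    /** The stream variant from the question; it never tries to modify the list. */
    static long countLong(List<String> words, int minLength) {
        return words.stream().filter(w -> w.length() > minLength).count();
    }

    public static void main(String[] args) {
        List<String> fake = Collections.nCopies(10000, "IAmALongWord");
        System.out.println(removalRejected(fake));                    // prints "true"
        System.out.println(removalRejected(new ArrayList<>(fake)));   // prints "false"
        System.out.println(countLong(fake, 3));                       // prints "10000"
    }
}
```

In the question's benchmark the removeIf predicate matches nothing, so no removal is ever attempted and no exception surfaces, which hides the fact that the two methods are doing very different amounts of work.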

D parallel loop

First, how does D create a parallel foreach (what is the underlying logic)?
int main(string[] args)
{
int[] arr;
arr.length = 100000000;
/* Why does this work? It's a simple foreach working with a
reference to an int from arr. The parallel function returns ParallelForeach!R
(ParallelForeach!(int[])), but I don't know what that is.
parallel is part of the Phobos library, not a D builtin, so what
kind of magic is used for this? */
foreach (ref e;parallel(arr))
{
e = 100;
}
foreach (ref e;parallel(arr))
{
e *= e;
}
return 0;
}
And second, why is it slower than a simple foreach?
Finally, if I create my own taskPool (and don't use the global taskPool object), the program never ends. Why?
parallel returns a struct (of type ParallelForeach) that implements the opApply(int delegate(...)) foreach overload.
When called, the struct submits a parallel function to the private submitAndExecute, which submits the same task to all threads in the pool.
Each thread then runs:
scope(failure)
{
// If an exception is thrown, all threads should bail.
atomicStore(shouldContinue, false);
}
while (atomicLoad(shouldContinue))
{
immutable myUnitIndex = atomicOp!"+="(workUnitIndex, 1);
immutable start = workUnitSize * myUnitIndex;
if(start >= len)
{
atomicStore(shouldContinue, false);
break;
}
immutable end = min(len, start + workUnitSize);
foreach(i; start..end)
{
static if(withIndex)
{
if(dg(i, range[i])) foreachErr();
}
else
{
if(dg(range[i])) foreachErr();
}
}
}
where workUnitIndex and shouldContinue are shared variables and dg is the foreach delegate.
The reason it is slower is simply the overhead of passing the function to the threads in the pool and of atomically accessing the shared variables.
The reason your custom pool doesn't shut down is likely that you don't shut down the thread pool with finish.
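For readers more at home on the JVM, here is a rough Java analogue of the loop above. It is not std.parallelism code: WorkUnitLoop and parallelApply are hypothetical names, and an AtomicInteger plays the role of workUnitIndex. Every worker runs the same task and claims the next work unit by bumping the shared counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

class WorkUnitLoop {
    /** Applies fn to every element in place; workers claim chunks via a shared counter. */
    static void parallelApply(int[] data, int workUnitSize, int workers, IntUnaryOperator fn) {
        AtomicInteger nextUnit = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> running = new ArrayList<>();
        try {
            for (int w = 0; w < workers; w++) {
                running.add(pool.submit(() -> {
                    while (true) {
                        // Atomically claim the next unit; this is the per-chunk overhead.
                        long start = (long) nextUnit.getAndIncrement() * workUnitSize;
                        if (start >= data.length) break;
                        int end = (int) Math.min(data.length, start + workUnitSize);
                        for (int i = (int) start; i < end; i++) {
                            data[i] = fn.applyAsInt(data[i]);
                        }
                    }
                }));
            }
            for (Future<?> f : running) f.get();   // wait for all workers
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();   // without this, the pool's threads keep the program alive
        }
    }

    public static void main(String[] args) {
        int[] arr = new int[1_000_000];
        parallelApply(arr, 100, 4, e -> 100);
        parallelApply(arr, 100, 4, e -> e * e);
        System.out.println(arr[arr.length - 1]);   // prints "10000"
    }
}
```

For a loop body as cheap as e = 100, the atomic increment and the thread hand-off can easily cost more than the work itself, which is the slowdown the answer describes.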

Good example of livelock?

I understand what livelock is, but I was wondering if anyone had a good code-based example of it? And by code-based, I do not mean "two people trying to get past each other in a corridor". If I read that again, I'll lose my lunch.
Here's a very simple Java example of livelock where a husband and wife are trying to eat soup, but only have one spoon between them. Each spouse is too polite, and will pass the spoon if the other has not yet eaten.
public class Livelock {
static class Spoon {
private Diner owner;
public Spoon(Diner d) { owner = d; }
public Diner getOwner() { return owner; }
public synchronized void setOwner(Diner d) { owner = d; }
public synchronized void use() {
System.out.printf("%s has eaten!", owner.name);
}
}
static class Diner {
private String name;
private boolean isHungry;
public Diner(String n) { name = n; isHungry = true; }
public String getName() { return name; }
public boolean isHungry() { return isHungry; }
public void eatWith(Spoon spoon, Diner spouse) {
while (isHungry) {
// Don't have the spoon, so wait patiently for spouse.
if (spoon.owner != this) {
try { Thread.sleep(1); }
catch(InterruptedException e) { continue; }
continue;
}
// If spouse is hungry, insist upon passing the spoon.
if (spouse.isHungry()) {
System.out.printf(
"%s: You eat first my darling %s!%n",
name, spouse.getName());
spoon.setOwner(spouse);
continue;
}
// Spouse wasn't hungry, so finally eat
spoon.use();
isHungry = false;
System.out.printf(
"%s: I am stuffed, my darling %s!%n",
name, spouse.getName());
spoon.setOwner(spouse);
}
}
}
public static void main(String[] args) {
final Diner husband = new Diner("Bob");
final Diner wife = new Diner("Alice");
final Spoon s = new Spoon(husband);
new Thread(new Runnable() {
public void run() { husband.eatWith(s, wife); }
}).start();
new Thread(new Runnable() {
public void run() { wife.eatWith(s, husband); }
}).start();
}
}
Run the program and you'll get:
Bob: You eat first my darling Alice!
Alice: You eat first my darling Bob!
Bob: You eat first my darling Alice!
Alice: You eat first my darling Bob!
Bob: You eat first my darling Alice!
Alice: You eat first my darling Bob!
...
This will go on forever if uninterrupted. This is a livelock because both Alice and Bob are repeatedly asking each other to go first in an infinite loop (hence live). In a deadlock situation, both Alice and Bob would simply be frozen waiting on each other to go first — they won't be doing anything except wait (hence dead).
Flippant comments aside, one example which is known to come up is in code which tries to detect and handle deadlock situations. If two threads detect a deadlock, and try to "step aside" for each other, without care they will end up being stuck in a loop always "stepping aside" and never managing to move forwards.
By "step aside" I mean that they would release the lock and attempt to let the other one acquire it. We might imagine the situation with two threads doing this (pseudocode):
// thread 1
getLocks12(lock1, lock2)
{
lock1.lock();
while (lock2.locked())
{
// attempt to step aside for the other thread
lock1.unlock();
wait();
lock1.lock();
}
lock2.lock();
}
// thread 2
getLocks21(lock1, lock2)
{
lock2.lock();
while (lock1.locked())
{
// attempt to step aside for the other thread
lock2.unlock();
wait();
lock2.lock();
}
lock1.lock();
}
Race conditions aside, what we have here is a situation where both threads, if they enter at the same time, will end up running in the inner loop without proceeding. Obviously this is a simplified example. A naive fix would be to put some kind of randomness in the amount of time the threads would wait.
The proper fix is to always respect the lock hierarchy. Pick an order in which you acquire the locks and stick to that. For example, if both threads always acquire lock1 before lock2, then there is no possibility of deadlock.
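A fixed acquisition order can be sketched in Java as follows. LockOrdering and withBothLocks are hypothetical names, and ReentrantLock stands in for the unspecified lock type in the pseudocode above.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockOrdering {
    private static final ReentrantLock lock1 = new ReentrantLock();
    private static final ReentrantLock lock2 = new ReentrantLock();

    /** Every caller takes lock1 first, then lock2, and releases them in reverse order. */
    static void withBothLocks(Runnable action) {
        lock1.lock();
        try {
            lock2.lock();
            try {
                action.run();
            } finally {
                lock2.unlock();
            }
        } finally {
            lock1.unlock();
        }
    }

    /** Two threads contend for both locks; with a single order neither can get stuck. */
    static int demo(int iterations) {
        int[] counter = {0};   // guarded by lock1 and lock2
        Runnable work = () -> {
            for (int i = 0; i < iterations; i++) {
                withBothLocks(() -> counter[0]++);
            }
        };
        Thread a = new Thread(work, "thread-1");
        Thread b = new Thread(work, "thread-2");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(10000));   // prints "20000"
    }
}
```

Since no thread ever holds lock2 while waiting for lock1, there is nothing to "step aside" for, so both the deadlock and the livelock-prone recovery loop disappear.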
As no answer is marked as accepted, I have attempted to create a livelock example.
I originally wrote this program in April 2012 to learn various multithreading concepts. This time I have modified it to create deadlock, race condition, livelock, etc.
So let's understand the problem statement first.
Cookie Maker Problem
There are some ingredient containers: ChocoPowederContainer, WheatPowderContainer. A CookieMaker takes some amount of powder from the ingredient containers to bake a Cookie. If a cookie maker finds a container empty, it checks another container to save time, and waits until the Filler fills the required container. There is a Filler who checks the containers at regular intervals and fills in some quantity if a container needs it.
Please check the complete code on github;
Let me explain the implementation in brief.
I start Filler as a daemon thread, so it keeps filling containers at regular intervals. To fill a container, it first locks the container -> checks whether it needs some powder -> fills it -> signals all makers who are waiting for it -> unlocks the container.
I create a CookieMaker and configure it to bake up to 8 cookies in parallel. Then I start 8 threads to bake cookies.
Each maker thread creates 2 callable sub-threads to take powder from the containers.
A sub-thread takes a lock on a container and checks whether it has enough powder. If not, it waits for some time. Once Filler fills the container, it takes the powder and unlocks the container.
Now it completes other activities like making the mixture, baking, etc.
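The lock, check, fill, signal and unlock sequence described above can be sketched with a ReentrantLock and a Condition. This is not the code from the linked repository: PowderContainer, fill and take are hypothetical names, and the bounded wait in take is an assumption added here to show one way a maker could give up instead of retrying forever when no filler is running.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class PowderContainer {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition refilled = lock.newCondition();
    private int quantityHeld = 0;

    /** Filler side: lock, top up, wake all waiting makers, unlock. */
    void fill(int amount) {
        lock.lock();
        try {
            quantityHeld += amount;
            refilled.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Maker side: waits up to timeoutMillis for n units; false means nobody refilled in time. */
    boolean take(int n, long timeoutMillis) {
        long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (quantityHeld < n) {
                if (remaining <= 0) {
                    return false;                              // give up rather than spin forever
                }
                remaining = refilled.awaitNanos(remaining);    // releases the lock while waiting
            }
            quantityHeld -= n;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        PowderContainer wheat = new PowderContainer();
        System.out.println(wheat.take(5, 50));      // prints "false": no filler is running
        new Thread(() -> wheat.fill(10), "filler").start();
        System.out.println(wheat.take(5, 5000));    // prints "true" once the filler has run
    }
}
```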
Let's have a look at the code:
CookieMaker.java
private Integer getMaterial(final Ingredient ingredient) throws Exception{
:
container.lock();
while (!container.getIngredient(quantity)) {
container.empty.await(1000, TimeUnit.MILLISECONDS);
//Thread.sleep(500); //For deadlock
}
container.unlock();
:
}
IngredientContainer.java
public boolean getIngredient(int n) throws Exception {
:
lock();
if (quantityHeld >= n) {
TimeUnit.SECONDS.sleep(2);
quantityHeld -= n;
unlock();
return true;
}
unlock();
return false;
}
Everything runs fine as long as Filler is filling the containers. But if I forget to start the filler, or the filler goes on unexpected leave, the sub-threads keep changing their states to allow other makers to go and check the container.
I have also created a daemon ThreadTracer which keeps watch on thread states and deadlocks. This is the output from the console:
2016-09-12 21:31:45.065 :: [Maker_0:WAITING, Maker_1:WAITING, Maker_2:WAITING, Maker_3:WAITING, Maker_4:WAITING, Maker_5:WAITING, Maker_6:WAITING, Maker_7:WAITING, pool-7-thread-1:TIMED_WAITING, pool-7-thread-2:TIMED_WAITING, pool-8-thread-1:TIMED_WAITING, pool-8-thread-2:TIMED_WAITING, pool-6-thread-1:TIMED_WAITING, pool-6-thread-2:TIMED_WAITING, pool-5-thread-1:TIMED_WAITING, pool-5-thread-2:TIMED_WAITING, pool-1-thread-1:TIMED_WAITING, pool-3-thread-1:TIMED_WAITING, pool-2-thread-1:TIMED_WAITING, pool-1-thread-2:TIMED_WAITING, pool-4-thread-1:TIMED_WAITING, pool-4-thread-2:RUNNABLE, pool-3-thread-2:TIMED_WAITING, pool-2-thread-2:TIMED_WAITING]
2016-09-12 21:31:45.065 :: [Maker_0:WAITING, Maker_1:WAITING, Maker_2:WAITING, Maker_3:WAITING, Maker_4:WAITING, Maker_5:WAITING, Maker_6:WAITING, Maker_7:WAITING, pool-7-thread-1:TIMED_WAITING, pool-7-thread-2:TIMED_WAITING, pool-8-thread-1:TIMED_WAITING, pool-8-thread-2:TIMED_WAITING, pool-6-thread-1:TIMED_WAITING, pool-6-thread-2:TIMED_WAITING, pool-5-thread-1:TIMED_WAITING, pool-5-thread-2:TIMED_WAITING, pool-1-thread-1:TIMED_WAITING, pool-3-thread-1:TIMED_WAITING, pool-2-thread-1:TIMED_WAITING, pool-1-thread-2:TIMED_WAITING, pool-4-thread-1:TIMED_WAITING, pool-4-thread-2:TIMED_WAITING, pool-3-thread-2:TIMED_WAITING, pool-2-thread-2:TIMED_WAITING]
WheatPowder Container has 0 only.
2016-09-12 21:31:45.082 :: [Maker_0:WAITING, Maker_1:WAITING, Maker_2:WAITING, Maker_3:WAITING, Maker_4:WAITING, Maker_5:WAITING, Maker_6:WAITING, Maker_7:WAITING, pool-7-thread-1:TIMED_WAITING, pool-7-thread-2:TIMED_WAITING, pool-8-thread-1:TIMED_WAITING, pool-8-thread-2:TIMED_WAITING, pool-6-thread-1:TIMED_WAITING, pool-6-thread-2:TIMED_WAITING, pool-5-thread-1:TIMED_WAITING, pool-5-thread-2:TIMED_WAITING, pool-1-thread-1:TIMED_WAITING, pool-3-thread-1:TIMED_WAITING, pool-2-thread-1:TIMED_WAITING, pool-1-thread-2:TIMED_WAITING, pool-4-thread-1:TIMED_WAITING, pool-4-thread-2:TIMED_WAITING, pool-3-thread-2:TIMED_WAITING, pool-2-thread-2:RUNNABLE]
2016-09-12 21:31:45.082 :: [Maker_0:WAITING, Maker_1:WAITING, Maker_2:WAITING, Maker_3:WAITING, Maker_4:WAITING, Maker_5:WAITING, Maker_6:WAITING, Maker_7:WAITING, pool-7-thread-1:TIMED_WAITING, pool-7-thread-2:TIMED_WAITING, pool-8-thread-1:TIMED_WAITING, pool-8-thread-2:TIMED_WAITING, pool-6-thread-1:TIMED_WAITING, pool-6-thread-2:TIMED_WAITING, pool-5-thread-1:TIMED_WAITING, pool-5-thread-2:TIMED_WAITING, pool-1-thread-1:TIMED_WAITING, pool-3-thread-1:TIMED_WAITING, pool-2-thread-1:TIMED_WAITING, pool-1-thread-2:TIMED_WAITING, pool-4-thread-1:TIMED_WAITING, pool-4-thread-2:TIMED_WAITING, pool-3-thread-2:TIMED_WAITING, pool-2-thread-2:TIMED_WAITING]
You'll notice that the sub-threads keep changing their states and waiting.
A real (albeit without exact code) example is two competing processes livelocking in an attempt to correct for a SQL Server deadlock, with each process using the same wait-retry algorithm. While it comes down to the luck of timing, I have seen this happen on separate machines with similar performance characteristics in response to a message added to an EMS topic (e.g. saving an update of a single object graph more than once), without being able to control the lock order.
A good solution in this case would be to have competing consumers (prevent duplicate processing as high up in the chain as possible by partitioning the work on unrelated objects).
A less desirable (ok, dirty-hack) solution is to break the bad luck in timing (in effect, forcing differences in processing), either in advance or after the deadlock, by using different algorithms or some element of randomness. This could still have issues, because it's possible that the order in which locks are taken is "sticky" for each process, and that this takes a certain minimum amount of time not accounted for in the wait-retry.
Yet another solution (at least for SQL Server) is to try a different isolation level (e.g. snapshot).
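The "element of randomness" option can be sketched as a jittered retry delay. This is a minimal sketch under assumptions: the helper name `RetryDelay`, the method `jitteredDelayMillis` and its parameters are hypothetical, inputs are assumed to be non-negative, and nothing here is specific to SQL Server.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; the name and parameters are illustrative only.
class RetryDelay {
    // Delay before retry number `attempt` (0-based): a random value in
    // [0, min(capMillis, baseMillis * 2^attempt)], so that two processes
    // that collided once are unlikely to retry in lockstep again.
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(attempt, 20); // keep the shift small to avoid overflow
        long ceiling = Math.min(capMillis, baseMillis << shift);
        return ThreadLocalRandom.current().nextLong(ceiling + 1);
    }
}
```

A caller would sleep for `jitteredDelayMillis(attempt, 100, 5000)` milliseconds before each retry. As noted above, this only makes a repeated collision less likely; it does not rule one out.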
I coded up the example of 2 persons passing in a corridor. The two threads will avoid each other as soon as they realise their directions are the same.
public class LiveLock {
public static void main(String[] args) throws InterruptedException {
Object left = new Object();
Object right = new Object();
Pedestrian one = new Pedestrian(left, right, 0); //one's left is one's left
Pedestrian two = new Pedestrian(right, left, 1); //one's left is two's right, so have to swap order
one.setOther(two);
two.setOther(one);
one.start();
two.start();
}
}
class Pedestrian extends Thread {
private Object l;
private Object r;
private Pedestrian other;
private volatile Object current; // read by the other thread, so make writes visible
Pedestrian (Object left, Object right, int firstDirection) {
l = left;
r = right;
if (firstDirection==0) {
current = l;
}
else {
current = r;
}
}
void setOther(Pedestrian otherP) {
other = otherP;
}
Object getDirection() {
return current;
}
Object getOppositeDirection() {
if (current.equals(l)) {
return r;
}
else {
return l;
}
}
void switchDirection() throws InterruptedException {
Thread.sleep(100);
current = getOppositeDirection();
System.out.println(Thread.currentThread().getName() + " is stepping aside.");
}
public void run() {
while (getDirection().equals(other.getDirection())) {
try {
switchDirection();
Thread.sleep(100);
} catch (InterruptedException e) {}
}
}
}
C# version of jelbourn's code:
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
namespace LiveLockExample
{
static class Program
{
public static void Main(string[] args)
{
var husband = new Diner("Bob");
var wife = new Diner("Alice");
var s = new Spoon(husband);
Task.WaitAll(
Task.Run(() => husband.EatWith(s, wife)),
Task.Run(() => wife.EatWith(s, husband))
);
}
public class Spoon
{
public Spoon(Diner diner)
{
Owner = diner;
}
public Diner Owner { get; private set; }
[MethodImpl(MethodImplOptions.Synchronized)]
public void SetOwner(Diner d) { Owner = d; }
[MethodImpl(MethodImplOptions.Synchronized)]
public void Use()
{
Console.WriteLine("{0} has eaten!", Owner.Name);
}
}
public class Diner
{
public Diner(string n)
{
Name = n;
IsHungry = true;
}
public string Name { get; private set; }
private bool IsHungry { get; set; }
public void EatWith(Spoon spoon, Diner spouse)
{
while (IsHungry)
{
// Don't have the spoon, so wait patiently for spouse.
if (spoon.Owner != this)
{
try
{
Thread.Sleep(1);
}
catch (ThreadInterruptedException e)
{
}
continue;
}
// If spouse is hungry, insist upon passing the spoon.
if (spouse.IsHungry)
{
Console.WriteLine("{0}: You eat first my darling {1}!", Name, spouse.Name);
spoon.SetOwner(spouse);
continue;
}
// Spouse wasn't hungry, so finally eat
spoon.Use();
IsHungry = false;
Console.WriteLine("{0}: I am stuffed, my darling {1}!", Name, spouse.Name);
spoon.SetOwner(spouse);
}
}
}
}
}
Consider a UNIX system having 50 process slots.
Ten programs are running, each of which has to create 6 (sub)processes.
After each program has created 4 processes, the 10 original processes and the 40 new processes have exhausted the table. Each of the 10 original processes now sits in an endless loop, forking and failing, which is exactly the situation of a livelock. The probability of this happening is very small, but it could happen.
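The arithmetic can be checked with a toy model. The sketch below does not create real processes; the class `ProcessTableDemo` and its `simulate` method are hypothetical names, and the model assumes the programs take turns forking one child at a time.

```java
import java.util.Arrays;

// Toy model of the process table; no real processes are created.
class ProcessTableDemo {
    // Returns how many children each program managed to create when
    // `programs` parents take turns forking in a table with `slots` entries.
    static int[] simulate(int slots, int programs, int childrenNeeded) {
        int used = programs; // the parents occupy slots too
        int[] children = new int[programs];
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int p = 0; p < programs; p++) {
                if (children[p] < childrenNeeded && used < slots) {
                    children[p]++; // fork succeeds
                    used++;
                    progress = true;
                }
                // otherwise the fork fails; a real program would simply retry
            }
        }
        return children;
    }

    public static void main(String[] args) {
        // prints [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
        System.out.println(Arrays.toString(simulate(50, 10, 6)));
    }
}
```

The model stops once no fork can succeed. The real programs would not stop at that point: each keeps retrying its fork, which is what makes the situation a livelock rather than a deadlock.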
One example here might be using a timed tryLock to obtain more than one lock and if you can't obtain them all, back off and try again.
// needs java.util.List, java.util.concurrent.TimeUnit, java.util.concurrent.locks.Lock
boolean tryLockAll(List<Lock> locks) throws InterruptedException {
    for (int i = 0; i < locks.size(); i++) {
        if (!locks.get(i).tryLock(5, TimeUnit.SECONDS)) {
            // undo the locks I already took in reverse order
            for (int j = i - 1; j >= 0; j--) {
                locks.get(j).unlock();
            }
            return false; // the caller backs off and tries again
        }
    }
    return true; // all locks are held
}
I could imagine such code would be problematic as you have lots of threads colliding and waiting to obtain a set of locks. But I'm not sure this is very compelling to me as a simple example.
Python version of jelbourn's code:
import threading
import time

lock = threading.Lock()


class Spoon:
    def __init__(self, diner):
        self.owner = diner

    def setOwner(self, diner):
        with lock:
            self.owner = diner

    def use(self):
        with lock:
            print("{0} has eaten".format(self.owner.name))


class Diner:
    def __init__(self, name):
        self.name = name
        self.hungry = True

    def eatsWith(self, spoon, spouse):
        while self.hungry:
            # Don't have the spoon, so wait patiently for spouse.
            if self != spoon.owner:
                time.sleep(1)  # blocks thread, not process
                continue
            # If spouse is hungry, insist upon passing the spoon.
            if spouse.hungry:
                print("{0}: you eat first, {1}".format(self.name, spouse.name))
                spoon.setOwner(spouse)
                continue
            # Spouse was not hungry, eat
            spoon.use()
            self.hungry = False
            print("{0}: I'm stuffed, {1}".format(self.name, spouse.name))
            spoon.setOwner(spouse)


def main():
    husband = Diner("Bob")
    wife = Diner("Alice")
    spoon = Spoon(husband)
    t0 = threading.Thread(target=husband.eatsWith, args=(spoon, wife))
    t1 = threading.Thread(target=wife.eatsWith, args=(spoon, husband))
    t0.start()
    t1.start()
    t0.join()
    t1.join()


if __name__ == "__main__":
    main()
I modified @jelbourn's answer.
When one of them notices that the other is hungry, he (or she) should release the spoon and wait for another notification, so a livelock happens.
public class LiveLock {
static class Spoon {
Diner owner;
public String getOwnerName() {
return owner.getName();
}
public void setOwner(Diner diner) {
this.owner = diner;
}
public Spoon(Diner diner) {
this.owner = diner;
}
public void use() {
System.out.println(owner.getName() + " use this spoon and finish eat.");
}
}
static class Diner {
public Diner(boolean isHungry, String name) {
this.isHungry = isHungry;
this.name = name;
}
private boolean isHungry;
private String name;
public String getName() {
return name;
}
public void eatWith(Diner spouse, Spoon sharedSpoon) {
try {
synchronized (sharedSpoon) {
while (isHungry) {
while (!sharedSpoon.getOwnerName().equals(name)) {
sharedSpoon.wait();
//System.out.println("sharedSpoon belongs to" + sharedSpoon.getOwnerName())
}
if (spouse.isHungry) {
System.out.println(spouse.getName() + " is hungry, I should give it to him (her).");
sharedSpoon.setOwner(spouse);
sharedSpoon.notifyAll();
} else {
sharedSpoon.use();
sharedSpoon.setOwner(spouse);
isHungry = false;
}
Thread.sleep(500);
}
}
} catch (InterruptedException e) {
System.out.println(name + " is interrupted.");
}
}
}
public static void main(String[] args) {
final Diner husband = new Diner(true, "husband");
final Diner wife = new Diner(true, "wife");
final Spoon sharedSpoon = new Spoon(wife);
Thread h = new Thread() {
@Override
public void run() {
husband.eatWith(wife, sharedSpoon);
}
};
h.start();
Thread w = new Thread() {
@Override
public void run() {
wife.eatWith(husband, sharedSpoon);
}
};
w.start();
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
e.printStackTrace();
}
h.interrupt();
w.interrupt();
try {
h.join();
w.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
package concurrently.deadlock;
import static java.lang.System.out;
/* This is an example of livelock */
public class Dinner {
public static void main(String[] args) {
Spoon spoon = new Spoon();
Dish dish = new Dish();
new Thread(new Husband(spoon, dish)).start();
new Thread(new Wife(spoon, dish)).start();
}
}
class Spoon {
boolean isLocked;
}
class Dish {
boolean isLocked;
}
class Husband implements Runnable {
Spoon spoon;
Dish dish;
Husband(Spoon spoon, Dish dish) {
this.spoon = spoon;
this.dish = dish;
}
@Override
public void run() {
while (true) {
synchronized (spoon) {
spoon.isLocked = true;
out.println("husband get spoon");
try { Thread.sleep(2000); } catch (InterruptedException e) {}
if (dish.isLocked == true) {
spoon.isLocked = false; // give away spoon
out.println("husband pass away spoon");
continue;
}
synchronized (dish) {
dish.isLocked = true;
out.println("Husband is eating!");
}
dish.isLocked = false;
}
spoon.isLocked = false;
}
}
}
class Wife implements Runnable {
Spoon spoon;
Dish dish;
Wife(Spoon spoon, Dish dish) {
this.spoon = spoon;
this.dish = dish;
}
@Override
public void run() {
while (true) {
synchronized (dish) {
dish.isLocked = true;
out.println("wife get dish");
try { Thread.sleep(2000); } catch (InterruptedException e) {}
if (spoon.isLocked == true) {
dish.isLocked = false; // give away dish
out.println("wife pass away dish");
continue;
}
synchronized (spoon) {
spoon.isLocked = true;
out.println("Wife is eating!");
}
spoon.isLocked = false;
}
dish.isLocked = false;
}
}
}
Example:
Thread 1
top:
    lock(L1);
    if (try_lock(L2) != 0) {
        unlock(L1);
        goto top;
    }
Thread 2
top:
    lock(L2);
    if (try_lock(L1) != 0) {
        unlock(L2);
        goto top;
    }
The only difference is that Thread 1 and Thread 2 try to acquire the locks in a different order. Livelock could happen as follows:
Thread 1 runs and acquires L1, then a context switch occurs. Thread 2 runs and acquires L2, then another context switch occurs. Thread 1 runs and cannot acquire L2, but before it releases L1 a context switch occurs. Thread 2 runs and cannot acquire L1, releases L2, and a context switch occurs. Thread 1 releases L1, and now we are basically back to the starting state; in theory these steps could keep repeating forever.
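One common way to make this endless repetition unlikely is to pause for a random time after a failed attempt, so the two threads stop retrying in step. The Java sketch below illustrates that idea; it is not part of the example above, and the class `BackoffLocking` and method `withBoth` are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper; names are illustrative only.
class BackoffLocking {
    // Acquires both locks, runs the action while holding them, then releases both.
    static void withBoth(Lock first, Lock second, Runnable action) throws InterruptedException {
        while (true) {
            first.lock();
            try {
                if (second.tryLock()) {
                    try {
                        action.run();
                        return; // both finally blocks still release the locks
                    } finally {
                        second.unlock();
                    }
                }
            } finally {
                first.unlock();
            }
            // Could not get the second lock: wait 1 to 9 ms (random) before retrying,
            // so two threads that collided are unlikely to collide again in lockstep.
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 10));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Lock l1 = new ReentrantLock();
        Lock l2 = new ReentrantLock();
        withBoth(l1, l2, () -> System.out.println("holding both locks"));
    }
}
```

Random backoff only lowers the chance of repeated collisions. If every thread can take the locks in the same fixed order (always L1 before L2), neither the retry loop nor the backoff is needed.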