confusion with semaphore definitions - c++

For semaphore implementations, what does process specify? In the context of the producer/consumer problem, is the process the producer method/Consumer method? Or is it P() if we are in P() and the value is less than 0?
P() {
value = value - 1;
if value < 0
add the calling process to this semaphore's list;
block this process
}
EXAMPLE
If Consumer runs first before Producer produces its first item
Consumer would decrement the full value -> full = -1
and then since the value is less than 0, it would add the calling process to this semaphore's list. But I'm not sure what process is.
And what does it mean to block this process? Does it mean that the entire method for consumer is at halt, and producer method runs?
code:
#define N 100
typedef int semaphore;
semaphore full = 0; // Initially, no item in buffer
semaphore empty = N; // Initially, N empty buffer slots
semaphore mutex = 1; // No thread updating the buffer
void producer(void) {
int item;
while(TRUE){
item = produce_item();
down(&empty);
down(&mutex);
insert_item(item);
up(&mutex);
up(&full);
}
}
void consumer(void) {
int item;
while(TRUE){
down(&full);
down(&mutex);
item = remove_item();
up(&mutex);
up(&empty);
consume_item(item);
}
}

A process, in this usage, is exactly like a thread: "the calling process" is whichever thread called P(), for example the consumer. Usually when 'multiprocess' is used instead of 'multithreaded', it implies that the kernel handles the threading, which allows the computer to take advantage of multiple cores. However, that isn't important for this specific implementation, and it is also false for this specific implementation, because nothing in it is atomic.
Blocking the process here means that a process that calls P and decrements the value to anything negative will halt its own execution when it reaches the 'block this process' command.
Assuming multi threading, your 'producer' command will continually decrease the empty semaphore unless it tries to decrement it below zero, in which case it will be halted, and only the 'consumer' command will run. At least, only 'consumer' will run until it increases the empty semaphore enough that 'producer' can now run. You can also switch both 'empty'<->'full' and 'producer'<->'consumer' in the previous two sentences, and they should remain correct.
Also, I suggest you read up on semaphores elsewhere, because they are a basic part of threading/multiprocessing, and other people have described them better than I ever could. (Look at the producer/consumer example there.)
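To make "block this process" concrete, here is a minimal sketch of a counting semaphore built from std::mutex and std::condition_variable, driving the same producer/consumer pattern. It is not the textbook P() from the question: the value never goes negative, and the "list of blocked processes" is kept implicitly by the condition variable. The names Semaphore and run_producer_consumer are made up for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    explicit Semaphore(int initial) : value_(initial) {}

    // P(): the calling thread sleeps here while no unit is available.
    void down() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return value_ > 0; });
        --value_;
    }

    // V(): release one unit and wake one sleeping caller, if any.
    void up() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++value_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int value_;
};

// Hypothetical demo: one producer and one consumer share a buffer of
// `capacity` slots; returns the sum of everything the consumer removed.
long run_producer_consumer(int items, int capacity) {
    Semaphore empty(capacity), full(0), mutex(1);
    std::queue<int> buffer;
    long sum = 0;

    std::thread producer([&] {
        for (int i = 0; i < items; ++i) {
            empty.down();   // blocks while the buffer is full
            mutex.down();
            buffer.push(i);
            mutex.up();
            full.up();      // may wake a blocked consumer
        }
    });
    std::thread consumer([&] {
        for (int i = 0; i < items; ++i) {
            full.down();    // blocks while the buffer is empty
            mutex.down();
            int item = buffer.front();
            buffer.pop();
            assert(static_cast<int>(buffer.size()) < capacity);
            mutex.up();
            empty.up();
            sum += item;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

If the consumer runs first, it sleeps inside full.down() and only that thread stops; the producer thread keeps running until its full.up() wakes the consumer.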

Bottleneck in parallel packet dispatcher

I'll say up front that very high throughput is needed and that calling ExecutePackets is very expensive.
The ExecutePackets function needs to process many packets at once, submitted in parallel from different threads.
struct Packet {
bool responseStatus;
char data[1024];
};
struct PacketPool {
int packet_count;
Packet* packets[10];
}packet_pool;
std::mutex queue_mtx;
std::mutex request_mtx;
bool ParallelExecutePacket(Packet* p_packet) {
p_packet->responseStatus = false;
struct QueuePacket {
bool executed;
Packet* p_packet;
}queue_packet{ false, p_packet };
static std::list<std::reference_wrapper<QueuePacket>> queue;
//make queue
queue_mtx.lock();
queue.push_back(queue_packet);
queue_mtx.unlock();
request_mtx.lock();
if (!queue_packet.executed)
{
ZeroMemory(&packet_pool, sizeof(packet_pool));
//move queue to packet_pool and clear queue
queue_mtx.lock();
auto iter = queue.begin();
while (iter != queue.end())
if (!(*iter).get().executed)
{
int current_count = packet_pool.packet_count++;
packet_pool.packets[current_count] = (*iter).get().p_packet;
(*iter).get().executed = true;
queue.erase(iter++);
}
else ++iter;
queue_mtx.unlock();
//execute packets
ExecutePackets(&packet_pool);
}
request_mtx.unlock();
return p_packet->responseStatus;
}
The ParallelExecutePacket function can be called from loops running in multiple threads at the same time. I want packets to be processed in batches. More precisely, each thread that gets to execute should process the entire queue. That would reduce the number of ExecutePackets calls without reducing the number of packets processed.
However, in my code with multiple threads, the total number of packets processed is equal to the number of packets processed by one thread. And I don't understand why this is happening.
In my test, I created several threads and in each thread called ParallelExecutePacket in a loop.
The results are the number of processed requests per second.
Multithread:
Summ:91902
Thread 0 : 20826
Thread 1 : 40031
Thread 2 : 6057
Thread 3 : 12769
Thread 4 : 12219
Singlethread:
Summ:104902
Thread 0 : 104902
And if my version cannot work, how do I implement what I need?
queue_mtx.lock();
auto iter = queue.begin();
while (iter != queue.end())
queue.erase(iter++);
queue_mtx.unlock();
Only one execution thread locks the queue at a time, drains all messages from it, and then unlocks it. Even if a thousand execution threads are available here only one of them will be able to do any work. All others get blocked.
The length of time the queue_mtx is held must be minimized as much as possible. It should be held for no more than the absolute minimum it takes to pluck one message out of the queue, removing it completely, and then the queue should be unlocked while all the real work is done.
int current_count = packet_pool.packet_count++;
packet_pool.packets[current_count] = (*iter).get().p_packet;
This appears to be the extent of the work that's done here. Currently the shown code enjoys the benefit of being protected by the queue_mtx. If it is no longer protected by that mutex, then thread safety must be implemented here in some other way, if that's needed (it's unclear what any of this is, and whether there's a thread synchronization issue here at all).
You never drop request_mtx during the while loop. That while loop includes ExecutePackets, so your thread blocks all of the others until it completes executing all the tasks it finds.
Also note that you won't actually see any speedups from this style of parallelism. To have n threads of parallelism with this code, you need to have n callers calling into ParallelExecutePacket. This is exactly the same parallelism that would happen if you just let each one work on its own. Indeed, statistically speaking you will find that almost always every thread just runs its own task. Every now and then you'll get a threading contention which causes one thread to execute another's task. When this occurs, both threads slow down to the slower of the two.
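One way to keep the queue lock short while still batching is to swap the whole pending list out under the lock and execute the batch with no lock held. The sketch below is only an illustration under assumptions: BatchQueue, execute_batch and run_batches are invented names, execute_batch stands in for the expensive ExecutePackets call, and the producers here do not wait for a response, which the real dispatcher would have to add (for example with a per-packet completion flag).

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the expensive ExecutePackets call: it just
// reports how many packets the batch contained.
std::size_t execute_batch(const std::vector<int>& batch) {
    return batch.size();
}

class BatchQueue {
public:
    void push(int packet) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(packet);
    }

    // Take everything queued so far; the lock is held only for the swap.
    std::vector<int> drain() {
        std::vector<int> batch;
        {
            std::lock_guard<std::mutex> lock(m_);
            batch.swap(pending_);
        }
        return batch;
    }

private:
    std::mutex m_;
    std::vector<int> pending_;
};

// Demo: `producers` threads enqueue `per_thread` packets each while the
// calling thread drains and executes batches. Returns packets executed.
std::size_t run_batches(int producers, int per_thread) {
    BatchQueue queue;
    std::vector<std::thread> threads;
    for (int t = 0; t < producers; ++t)
        threads.emplace_back([&queue, per_thread] {
            for (int i = 0; i < per_thread; ++i) queue.push(i);
        });

    const std::size_t total =
        static_cast<std::size_t>(producers) * per_thread;
    std::size_t executed = 0;
    while (executed < total) {
        std::vector<int> batch = queue.drain();
        if (batch.empty()) {
            std::this_thread::yield();      // nothing queued yet
            continue;
        }
        executed += execute_batch(batch);   // runs with no lock held
    }
    for (auto& th : threads) th.join();
    assert(queue.drain().empty());
    return executed;
}
```

Because producers only contend for the brief push and swap, they are never blocked for the duration of a batch execution.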

Using a mutex to block execution from outside the critical section

I'm not sure I got the terminology right but here goes - I have this function that is used by multiple threads to write data (using pseudo code in comments to illustrate what I want)
//these are initiated in the constructor
int* data;
std::atomic<size_t> size;
void write(int value) {
//wait here while "read_lock"
//set "write_lock" to "write_lock" + 1
auto slot = size.fetch_add(1, std::memory_order_acquire);
data[slot] = value;
//set "write_lock" to "write_lock" - 1
}
the order of the writes is not important, all I need here is for each write to go to a unique slot
Every once in a while though, I need one thread to read the data using this function
int* read() {
//set "read_lock" to true
//wait here while "write_lock"
int* ret = data;
data = new int[capacity];
size = 0;
//set "read_lock" to false
return ret;
}
so it basically swaps out the buffer and returns the old one (I've removed capacity logic to make the snippets shorter)
In theory this should lead to 2 operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers will have to wait, the reader will wait until all existing writes are finished, it will then do the read logic and scenario 1 can continue.
The question part is that I don't know what kind of a barrier to use for the locks -
A spinlock would be wasteful since there are many containers like this and they all need cpu cycles
I don't know how to apply std::mutex since I only want the write function to be in a critical section if the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
using mutex_type = std::shared_timed_mutex;
using shared_lock = std::shared_lock<mutex_type>;
using unique_lock = std::unique_lock<mutex_type>;
private:
mutable mutex_type mtx;
public:
// All updater threads can operate at the same time
auto lock_for_updates() const
{
return shared_lock(mtx);
}
// Reader threads need to kick all the updater threads out
auto lock_for_reading() const
{
return unique_lock(mtx);
}
};
// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
auto lock = sptr->lock_for_updates();
// update the data here
}
// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
auto lock = sptr->lock_for_reading();
// read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock gaining simultaneous access. When a reader thread tries to gain a unique_lock all shared_locks will be vacated before the unique_lock gets exclusive control.
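Applied to the buffer from the question, the same idea might look like the sketch below. It assumes a fixed capacity that is never exceeded (the question left capacity handling out), and the names SwapBuffer and run_swap_buffer are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

class SwapBuffer {
public:
    explicit SwapBuffer(std::size_t capacity)
        : capacity_(capacity), data_(new int[capacity]), size_(0) {}

    // Many writers may run concurrently: each holds a shared lock.
    void write(int value) {
        std::shared_lock<std::shared_timed_mutex> lock(mtx_);
        std::size_t slot = size_.fetch_add(1);
        assert(slot < capacity_);   // capacity handling is omitted here
        data_[slot] = value;
    }

    // The reader takes the lock exclusively, so no writer is in the
    // middle of a write while the buffer is swapped out.
    std::pair<std::unique_ptr<int[]>, std::size_t> read() {
        std::unique_ptr<int[]> fresh(new int[capacity_]);
        std::unique_lock<std::shared_timed_mutex> lock(mtx_);
        std::size_t count = size_.exchange(0);
        data_.swap(fresh);          // `fresh` now owns the old buffer
        return std::make_pair(std::move(fresh), count);
    }

private:
    const std::size_t capacity_;
    std::shared_timed_mutex mtx_;
    std::unique_ptr<int[]> data_;
    std::atomic<std::size_t> size_;
};

// Demo: several writers fill the buffer while the calling thread reads
// once concurrently and once at the end. Returns the values collected.
std::size_t run_swap_buffer(int writers, int per_thread) {
    SwapBuffer buf(static_cast<std::size_t>(writers) * per_thread);
    std::vector<std::thread> threads;
    for (int t = 0; t < writers; ++t)
        threads.emplace_back([&buf, per_thread] {
            for (int i = 0; i < per_thread; ++i) buf.write(i);
        });
    std::size_t seen = buf.read().second;   // may overlap with writers
    for (auto& th : threads) th.join();
    seen += buf.read().second;
    return seen;
}
```

Every write still pays for a shared lock and unlock, which is the cost the question hoped to avoid in the write-only scenario.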
You can also do this with regular mutexes and condition variables rather than shared. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called way more than read so maybe this is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;
void write(int x) {
++num_writers;
if (reading) {
--num_writers;
if (num_writers == 0)
{
std::lock_guard l(leaving);
cv.notify_one();
}
{ std::lock_guard l(entering); }
++num_writers;
}
auto slot = size.fetch_add(1, std::memory_order_acquire);
data[slot] = x;
--num_writers;
if (reading && num_writers == 0)
{
std::lock_guard l(leaving);
cv.notify_one();
}
}
int* read() {
int* other_data = new int[capacity];
{
std::unique_lock enter_lock(entering);
reading = true;
std::unique_lock leave_lock(leaving);
cv.wait(leave_lock, [] () { return num_writers == 0; });
swap(data, other_data);
size = 0;
reading = false;
}
return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false. So you do the usual, and pay for two additional atomic increments and two untaken branches. The common path therefore does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first; meanwhile writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now, reading is set to true. Immediately, any new calls to write enter the first branch, and eventually hit the entering lock, which they are unable to acquire (as it's already taken), and those threads then get put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could actually go through right away. If however there are threads in write in either of the two locations between incrementing and decrementing num_writers, then it will not. Each time a write thread decrements num_writers, it checks if it has reduced that number to zero, and when it does it will signal the condition variable. Because num_writers is atomic which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; it could also be notified more than once but this is ok and cannot result in bad behavior.
Once that condition variable has been signalled, that shows that all writers are either trapped in the first branch or are done modifying the array. So the read thread can now safely swap the data, and then unlock everything, and then return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).

Atomic thread counter

I'm experimenting with the C++11 atomic primitives to implement an atomic "thread counter" of sorts. Basically, I have a single critical section of code. Within this code block, any thread is free to READ from memory. However, sometimes, I want to do a reset or clear operation, which resets all shared memory to a default initialized value.
This seems like a great opportunity to use a read-write lock. C++11 doesn't include read-write mutexes out of the box, but maybe something simpler will do. I thought this problem would be a great opportunity to become more familiar with C++11 atomic primitives.
So I thought through this problem for a while, and it seems to me that all I have to do is :
Whenever a thread enters the critical section, increment an atomic counter variable
Whenever a thread leaves the critical section, decrement the atomic counter variable
If a thread wishes to reset all variables to default values, it must atomically wait for the counter to be 0, then atomically set it to some special "clearing flag" value, perform the clear, then reset the counter to 0.
Of course, threads wishing to increment and decrement the counter must also check for the clearing flag.
So, the algorithm I just described can be implemented with three functions. The first function, increment_thread_counter() must ALWAYS be called before entering the critical section. The second function, decrement_thread_counter(), must ALWAYS be called right before leaving the critical section. Finally, the function clear() can be called from outside the critical section only iff the thread counter == 0.
This is what I came up with:
Given:
A thread counter variable, std::atomic<std::size_t> thread_counter
A constant clearing_flag set to std::numeric_limits<std::size_t>::max()
...
void increment_thread_counter()
{
std::size_t expected = 0;
while (!std::atomic_compare_exchange_strong(&thread_counter, &expected, 1))
{
if (expected != clearing_flag)
{
thread_counter.fetch_add(1);
break;
}
expected = 0;
}
}
void decrement_thread_counter()
{
thread_counter.fetch_sub(1);
}
void clear()
{
std::size_t expected = 0;
while (!thread_counter.compare_exchange_strong(expected, clearing_flag)) expected = 0;
/* PERFORM WRITES WHICH WRITE TO ALL SHARED VARIABLES */
thread_counter.store(0);
}
As far as I can reason, this should be thread-safe. Note that the decrement_thread_counter function shouldn't require ANY synchronization logic, because it is a given that increment() is always called before decrement(). So, when we get to decrement(), thread_counter can never equal 0 or clearing_flag.
Regardless, since THREADING IS HARD™, and I'm not an expert at lockless algorithms, I'm not entirely sure this algorithm is race-condition free.
Question: Is this code thread safe? Are any race conditions possible here?
You have a race condition; bad things happen if another thread changes the counter between increment_thread_counter()'s test for clearing_flag and the fetch_add.
I think this classic CAS loop should work better:
void increment_thread_counter()
{
std::size_t expected = 0;
std::size_t updated;
do {
if (expected == clearing_flag) { // don't want to succeed while clearing,
expected = 0; //take a chance that clearing completes before CMPEXC
}
updated = expected + 1;
// if (updated == clearing_flag) TOO MANY READERS!
} while (!std::atomic_compare_exchange_weak(&thread_counter, &expected, updated));
}
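Putting the pieces together, the whole scheme can be sketched as one small class. This is an illustration only: ClearGate and run_clear_gate are invented names, all atomics use the default sequentially consistent ordering, and the overflow case (so many readers that the counter reaches the flag value) is ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

constexpr std::size_t clearing_flag =
    std::numeric_limits<std::size_t>::max();

class ClearGate {
public:
    // Enter the read-side critical section.
    void enter() {
        std::size_t expected = counter_.load();
        for (;;) {
            if (expected == clearing_flag) {   // a clear is in progress
                std::this_thread::yield();
                expected = counter_.load();
                continue;
            }
            // Succeeds only if the counter still holds `expected`; on
            // failure `expected` is refreshed with the current value.
            if (counter_.compare_exchange_weak(expected, expected + 1))
                return;
        }
    }

    void leave() { counter_.fetch_sub(1); }

    // Wait until no reader is inside, then run `reset` exclusively.
    template <typename F>
    void clear(F reset) {
        std::size_t expected = 0;
        while (!counter_.compare_exchange_weak(expected, clearing_flag)) {
            expected = 0;
            std::this_thread::yield();
        }
        reset();
        counter_.store(0);
    }

private:
    std::atomic<std::size_t> counter_{0};
};

// Hypothetical demo: readers check that two plain ints always match
// while the calling thread keeps resetting both. Returns mismatches.
int run_clear_gate(int readers, int iterations) {
    assert(readers > 0 && iterations >= 0);
    ClearGate gate;
    int a = 0, b = 0;
    std::atomic<int> mismatches{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < readers; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                gate.enter();
                if (a != b) ++mismatches;
                gate.leave();
            }
        });
    for (int k = 1; k <= 100; ++k)
        gate.clear([&] { a = k; b = k; });
    for (auto& th : threads) th.join();
    return mismatches.load();
}
```

Note that clear() can be starved for as long as readers keep overlapping, since it only proceeds when it observes a counter of exactly 0.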

How to make thread synchronization without using mutex, semaphore, spinLock and futex?

This is an interview question; the interview is already over.
How can threads be synchronized without using a mutex, semaphore, spinlock, or futex?
Given 5 threads, how do you make 4 of them wait at the same point for a signal from the remaining thread?
That is, when threads 1, 2, 3 and 4 each reach a certain point in their thread function, they stop and wait for a signal from thread 5; until thread 5 sends it, they do not proceed.
My idea:
Use a global bool variable as a flag. Until thread 5 sets it to true, all the other threads wait at one point, and each also sets its own flag variable to true. Once thread 5 finds that all the threads' flag variables are true, it sets the global flag to true.
It is a busy-wait.
Any better ideas?
Thanks
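The code (main plays the role of the signaling thread):
#include <pthread.h>
#include <atomic>
const int N = 10;
std::atomic<bool> globalflag(false);
std::atomic<bool> a[N]; // static storage, so every flag starts out false
void* threadfunc(void* arg)
{
int i = *static_cast<int*>(arg);
a[i] = true; // announce arrival at the rendezvous point
while (!globalflag) {} // busy-wait for the signal
return nullptr;
}
int main()
{
pthread_t tid[N];
int id[N];
for (int i = 0; i < N; i++)
{
id[i] = i;
pthread_create(&tid[i], nullptr, threadfunc, &id[i]);
}
while (1) // busy-wait until every thread has arrived
{
bool b = true;
for (int i = 0; i < N; i++)
{
b = a[i] && b;
}
if (b) break;
}
globalflag = true; // release the waiting threads
for (int i = 0; i < N; i++)
pthread_join(tid[i], nullptr);
}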
Start with an empty linked list of waiting threads. The head should be set to 0.
Use CAS, compare and swap, to insert a thread at the head of the list of waiters. If the head =-1, then do not insert or wait. You can safely use CAS to insert items at the head of a linked list if you do it right.
After being inserted, the waiting thread should wait on SIGUSR1. Use sigwait() to do this.
When ready, the signaling thread uses CAS to set the head of wait list to -1. This prevents any more threads from adding themselves to the wait list. Then the signaling thread iterates the threads in the wait list and calls pthread_kill(thread, SIGUSR1) to wake up each waiting thread.
If SIGUSR1 is sent before a call to sigwait, it stays pending (as long as the thread has SIGUSR1 blocked, which sigwait requires anyway), so sigwait will return immediately. Thus, there will not be a race between adding a thread to the wait list and calling sigwait.
EDIT:
Why is CAS faster than a mutex? Layman's answer (I'm a layman): it's faster for some things in some situations, because it has lower overhead when there is NO race. So if you can reduce your concurrency problem down to needing to change 8, 16, 32, 64, or 128 bits of contiguous memory, and a race is not going to happen very often, CAS wins. CAS is basically a slightly fancier, more expensive mov instruction right where you were going to do a regular "mov" anyway. On x86 it's a "lock cmpxchg".
A mutex, on the other hand, is a whole bunch of extra stuff that gets other cache lines dirty, uses more memory barriers, etc. (although CAS also acts as a memory barrier on x86, x64, etc.). Then of course you have to unlock the mutex, which is probably about the same amount of extra stuff.
Here is how you add an item to a linked list using CAS:
while (1)
{
pOldHead = pHead; // snapshot of the world. Start of the race.
pItem->pNext = pOldHead; // link to the snapshot, not to a second read of pHead
if (CAS(&pHead, pOldHead, pItem)) // end of the race if pHead still is pOldHead
break; // success
}
So how often do you think your code is going to have multiple threads at that CAS line at the exact same time? In reality....not very often. We did tests that just looped adding millions of items with multiple threads at the same time and it happens way less than 1% of the time. In a real program, it might never happen.
Obviously if there is a race you have to go back and do that loop again, but in the case of a linked list, what does that cost you?
The downside is that you can't do very complex things to that linked list if you are going to use that method to add items to the head. Try implementing a double linked list. What a pain.
EDIT:
In the code above I use a macro CAS. If you are using linux, CAS = macro using __sync_bool_compare_and_swap. See gcc atomic builtins. If you are using windows, CAS = macro using something like InterlockedCompareExchange. Here is what an inline function in windows might look like:
inline bool CAS(volatile WORD* p, const WORD nOld, const WORD nNew) {
return InterlockedCompareExchange16((short*)p, nNew, nOld) == nOld;
}
inline bool CAS(volatile DWORD* p, const DWORD nOld, const DWORD nNew) {
return InterlockedCompareExchange((long*)p, nNew, nOld) == nOld;
}
inline bool CAS(volatile QWORD* p, const QWORD nOld, const QWORD nNew) {
return InterlockedCompareExchange64((LONGLONG*)p, nNew, nOld) == nOld;
}
inline bool CAS(void*volatile* p, const void* pOld, const void* pNew) {
return InterlockedCompareExchangePointer(p, (PVOID)pNew, (PVOID)pOld) == pOld;
}
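For comparison, the same head insertion can be written portably with std::atomic instead of a platform-specific CAS macro. This is a sketch, not the code from the answer above: LockFreeStack and run_pushes are invented names, and only push is lock-free here (removing nodes safely is a harder problem that this example does not attempt).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Node {
    int value;
    Node* next;
};

class LockFreeStack {
public:
    ~LockFreeStack() {
        Node* n = head_.load();
        while (n) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    void push(int value) {
        Node* item = new Node{value, head_.load()};
        // On failure compare_exchange_weak stores the current head into
        // item->next, so the loop simply retries with a fresh snapshot.
        while (!head_.compare_exchange_weak(item->next, item)) {
        }
    }

    // Not thread-safe: call only after all pushing threads are done.
    int count() const {
        int n = 0;
        for (Node* p = head_.load(); p; p = p->next) ++n;
        return n;
    }

private:
    std::atomic<Node*> head_{nullptr};
};

// Demo: several threads push concurrently; returns the final length.
int run_pushes(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    LockFreeStack stack;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&stack, per_thread] {
            for (int i = 0; i < per_thread; ++i) stack.push(i);
        });
    for (auto& th : pool) th.join();
    return stack.count();
}
```

If no node is lost to a race, the final count equals threads * per_thread.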
1. Choose a signal to use, say SIGUSR1.
2. Use pthread_sigmask to block SIGUSR1.
3. Create the threads (they inherit the signal mask, hence step 2 must be done first!)
4. Threads 1-4 call sigwait, blocking until SIGUSR1 is received.
5. Thread 5 calls kill() or pthread_kill 4 times with SIGUSR1. Since POSIX specifies that signals will be delivered to a thread which is not blocking the signal, it will be delivered to one of the threads waiting in sigwait(). There is thus no need to keep track of which threads have already received the signal and which haven't, with associated synchronization.
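The steps above can be sketched as follows for a POSIX system. One deliberate difference: the sketch signals each waiter individually with pthread_kill rather than calling kill() four times, because standard signals are not queued and several process-directed SIGUSR1 instances sent in quick succession can merge into one. The function names are invented for the example, and the calling thread plays the role of thread 5.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <signal.h>

// Worker: parks in sigwait() until SIGUSR1 is pending for this thread.
void* wait_for_signal(void* arg) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int received = 0;
    sigwait(&set, &received);
    if (received == SIGUSR1)
        static_cast<std::atomic<int>*>(arg)->fetch_add(1);
    return nullptr;
}

// Returns how many waiters were released by the signaling thread.
int run_signal_rendezvous() {
    const int kWaiters = 4;
    std::atomic<int> woken{0};

    // Block SIGUSR1 before creating the threads so that every thread
    // inherits the mask and the signal stays pending until sigwait().
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int rc = pthread_sigmask(SIG_BLOCK, &set, nullptr);
    assert(rc == 0);
    (void)rc;

    pthread_t tid[kWaiters];
    for (int i = 0; i < kWaiters; ++i)
        pthread_create(&tid[i], nullptr, wait_for_signal, &woken);

    // "Thread 5": wake each waiter. A signal sent before the target
    // reaches sigwait() stays pending for it, so there is no lost wakeup.
    for (int i = 0; i < kWaiters; ++i)
        pthread_kill(tid[i], SIGUSR1);

    for (int i = 0; i < kWaiters; ++i)
        pthread_join(tid[i], nullptr);
    return woken.load();
}
```

The price of signaling per thread is that the signaler must know the thread IDs, which the kill() variant avoids.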
You can do this using SSE3's MONITOR and MWAIT instructions, available via the _mm_mwait and _mm_monitor intrinsics, Intel has an article on it here.
(there is also a patent for using memory-monitor-wait for lock contention here that may be of interest).
I think you are looking for Peterson's algorithm or Dekker's algorithm.
They synchronize threads using only shared memory.

C++ multithreading, simple consumer / producer threads, LIFO, notification, counter

I am new to multi-thread programming, I want to implement the following functionality.
There are 2 threads, producer and consumer.
Consumer only processes the latest value, i.e., last in first out (LIFO).
Producer sometimes generates new values at a faster rate than consumer can process. For example, producer may generate 2 new values in 1 millisecond, but it takes consumer approximately 5 milliseconds to process one.
If consumer receives a new value in the middle of processing an old value, there is no need to interrupt. In other words, consumer will finish the current execution first, then start an execution on the latest value.
Here is my design process, please correct me if I am wrong.
There is no need for a queue, since only the latest value is processed by consumer.
Are notifications sent from producer queued automatically?
I will use a counter instead.
ConsumerThread() checks the counter at the end, to make sure producer hasn't generated a new value.
But what happens if producer generates a new value just before consumer goes to sleep(), but after it checks the counter?
Here is some pseudo code.
boost::mutex mutex;
double x;
int counter = 0; // number of values produced since the consumer last reset it
void ProducerThread()
{
{
boost::mutex::scoped_lock lock(mutex);
x = rand();
counter++;
}
notify(); // wake up consumer thread
}
void ConsumerThread()
{
counter = 0; // reset counter, only process the latest value
... do something which takes 5 milli-seconds ...
if (counter > 0)
{
... execute this function again, not too sure how to implement this ...
}
else
{
... what happen if producer generates a new value here??? ...
sleep();
}
}
Thanks.
If I understood your question correctly, for your particular application, the consumer only needs to process the latest available value provided by the producer. In other words, it's acceptable for values to get dropped because the consumer cannot keep up with the producer.
If that's the case, then I agree that you can get away without a queue and use a counter. However, the shared counter and value variables will need to be accessed atomically.
You can use boost::condition_variable to signal notifications to the consumer that a new value is ready. Here is a complete example; I'll let the comments do the explaining.
#include <cstdlib>
#include <iostream>
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
boost::mutex mutex;
boost::condition_variable condvar;
typedef boost::unique_lock<boost::mutex> LockType;
// Variables that are shared between producer and consumer.
double value = 0;
int count = 0;
void producer()
{
while (true)
{
{
// value and count must both be updated atomically
// using a mutex lock
LockType lock(mutex);
value = std::rand();
++count;
// Notify the consumer that a new value is ready.
condvar.notify_one();
}
// Simulate exaggerated 2ms delay
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
}
}
void consumer()
{
// Local copies of 'count' and 'value' variables. We want to do the
// work using local copies so that they don't get clobbered by
// the producer when it updates.
int currentCount = 0;
double currentValue = 0;
while (true)
{
{
// Acquire the mutex before accessing 'count' and 'value' variables.
LockType lock(mutex); // mutex is locked while in this scope
while (count == currentCount)
{
// Wait for producer to signal that there is a new value.
// While we are waiting, Boost releases the mutex so that
// other threads may acquire it.
condvar.wait(lock);
}
// `lock` is automatically re-acquired when we come out of
// condvar.wait(lock). So it's safe to access the 'value'
// variable at this point.
currentValue = value; // Grab a copy of the latest value
currentCount = count; // and remember which update it was,
// while we hold the lock.
}
// Now that we are out of the mutex lock scope, we work with our
// local copy of `value`. The producer can keep on clobbering the
// 'value' variable all it wants, but it won't affect us here
// because we are now using `currentValue`.
std::cout << "value = " << currentValue << "\n";
// Simulate exaggerated 5ms delay
boost::this_thread::sleep(boost::posix_time::milliseconds(500));
}
}
int main()
{
boost::thread c(&consumer);
boost::thread p(&producer);
c.join();
p.join();
}
ADDENDUM
I was thinking about this question recently, and realized that this solution, while it may work, is not optimal. Your producer is using all that CPU just to throw away half of the computed values.
I suggest that you reconsider your design and go with a bounded blocking queue between the producer and consumer. Such a queue should have the following characteristics:
Thread-safe
The queue has a fixed size (bounded)
If the consumer wants to pop the next item, but the queue is empty, the operation will be blocked until notified by the producer that an item is available.
The producer can check if there's room to push another item and block until the space becomes available.
With this type of queue, you can effectively throttle down the producer so that it doesn't outpace the consumer. It also ensures that the producer doesn't waste CPU resources computing values that will be thrown away.
Libraries such as TBB and PPL provide implementations of concurrent queues. If you want to attempt to roll your own using std::queue (or boost::circular_buffer) and boost::condition_variable, check out this blogger's example.
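As a rough idea of what such a queue involves, here is a minimal sketch using the standard library's std::mutex and std::condition_variable rather than Boost. BoundedQueue and run_pipeline are names made up for this example, and the -1 sentinel used to stop the consumer is just a convention of the demo.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        lock.unlock();
        not_empty_.notify_one();
    }

    // Blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        lock.unlock();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// Demo: the producer pushes 0..items-1 followed by a -1 sentinel; the
// calling thread consumes until the sentinel and returns the sum.
long run_pipeline(int items, std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    std::thread producer([&queue, items] {
        for (int i = 0; i < items; ++i) queue.push(i);
        queue.push(-1);   // tell the consumer to stop
    });
    long sum = 0;
    int value;
    while ((value = queue.pop()) != -1) sum += value;
    producer.join();
    return sum;
}
```

With a small capacity the producer spends most of its time blocked in push, which is exactly the throttling effect described above.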
The short answer is that you're almost certainly wrong.
With a producer/consumer, you pretty much need a queue between the two threads. There are basically two alternatives: either your code will simply lose tasks (which usually equals not working at all) or else your producer thread will need to block until the consumer thread is idle before it can produce an item -- which effectively translates to single threading.
For the moment, I'm going to assume that the value you get back from rand is supposed to represent the task to be executed (i.e., is the value produced by the producer and consumed by the consumer). In that case, I'd write the code something like this:
void producer() {
for (int i=0; i<100; i++)
queue.insert(random()); // queue.insert blocks if queue is full
queue.insert(-1.0); // Tell consumer to exit
}
void consumer() {
double value;
while ((value = queue.get()) != -1) // queue.get blocks if queue is empty
process(value);
}
This relegates nearly all the interlocking to the queue. The rest of the code for both threads pretty much ignores threading issues entirely.
Implementing a pipeline is actually quite tricky if you are doing it ground-up. For example, you'd have to use condition variable to avoid the kind of race condition you described in your question, avoid busy waiting when implementing the mechanism for "waking up" the consumer etc... Even using a "queue" of just 1 element won't save you from some of these complexities.
It's usually much better to use specialized libraries that were developed and extensively tested specifically for this purpose. If you can live with a Visual C++ specific solution, take a look at the Parallel Patterns Library and its concept of pipelines.