26 Concurrency: An Introduction

Each thread is very much like a separate process, except for one difference: they share the same address space and thus can access the same data.

State of a single thread:

  • program counter
  • private set of registers (context switch is needed)

which are stored in process control block.

Any stack-allocated variables, parameters, return values, and other things that we put on the stack will be placed in what is sometimes called thread-local storage.

26.1 Why Use Threads?

  • parallelism: speed up using multi-core processors
  • avoid blocking during I/O

Computers are hard enough to understand without concurrency. Unfortunately, with concurrency, it simply gets worse. Much worse.

26.4 The Heart Of The Problem: Uncontrolled Scheduling

Race condition: the results depend on the timing execution of the code.

Critical section: a piece of code that accesses a shared variable (or more generally, a shared resource) and must not be concurrently executed by more than one thread.

Mutal exclusion: if one thread is executing within the critical section, the others will be prevented from doing so.

27 Interlude: Thread API

27.4 Condition Variables

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);

To wait for a signal:

pthread_mutex_lock(&lock);
while (ready == 0)
    pthread_cond_wait(&cond, &lock); // will release the lock when sleep
                                     // and re-acquire before wakeup
pthread_mutex_unlock(&lock);

To send a signal:

pthread_mutex_lock(&lock);
ready = 1;
pthread_cond_signal(&cond);
pthread_mutex_unlock(&lock);

28 Locks

28.4 Evaluating Locks

Criteria:

  • mutal exclusion
  • fairness
  • performance

28.6 A Failed Attempt: Just Using Loads/Stores

Problems:

  • No mutal exclusion
  • Poor performance: spin-waiting

The problem: set after test (maybe after an interrupt).

28.7 Building Working Spin Locks with Test-And-Set

Test-and-set instruction (xchg on x86) is something like:

int testAndSet(int *old_ptr, int new) {
    int old = *old_ptr; // fetch old value at old_ptr
    *old_ptr = new;     // store ’new’ into old_ptr
    return old;         // return the old value
}

Lock and unlock:

void lock(lock_t *lock) {
    while (testAndSet(&lock->flag, 1) == 1)
        ; // spin-wait (do nothing)
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

28.8 Evaluating Spin Locks

  • correctness: yes
  • fairness: no, maybe spinning forever (starvation)
  • performance: bad in single CPU case

28.9 Compare-And-Swap

int compareAndSwap(int *ptr, int expected, int new) {
    int original = *ptr;
    if (original == expected)
        *ptr = new;
    return original;

28.10 Load-Linked and Store-Conditional

Store-conditional: only succeeds (and updates the value stored at the address just load-linked from) if no intervening store to the address has taken place.

void lock(lock_t *lock) {
    while (loadLinked(&lock->flag) == 1 ||
           storeConditional(&lock->flag, 1) == 0)
        ;
}

28.11 Fetch-And-Add

The pseudo C code:

int fetchAndAdd(int *ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}

Lock and unlock:

void lock(lock_t *lock) {
    int myTurn = fetchAndAdd(&lock->ticket);
    while (lock->turn != myTurn)
        ;
}

void unlock(lock_t *lock) {
    lock->turn = lock->turn + 1;
}

This ensures progress for all threads.

28.13 A Simple Approach: Just Yield, Baby

Yield is simply a system call that moves the caller from the running state to the ready state, and thus promotes another thread to running. Thus, the yielding thread essentially deschedules itself.

  • Still costly: context switches.
  • Still unfair: endless yield loop.

28.14 Using Queues: Sleeping Instead Of Spinning

The real problem with our previous approaches is that they leave too much to chance.

Solaris provides two calls: park() to put a calling thread to sleep, and unpark(threadID) to wake a particular thread as designated by threadID.

typedef struct __lock_t {
    int flag;
    int guard;
    queue_t *q;
} lock_t;

void lock_init(lock_t *m) {
    m->flag  = 0;
    m->guard = 0;
    queue_init(m->q);
}

void lock(lock_t *m) {
    while (testAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // lock is acquired
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        setpark();
        m->guard = 0;
        park();
    }
}

void unlock(lock_t *m) {
    while (testAndSet(&m->guard, 1) == 1)
        ; // acquire guard lock by spinning
    if (queue_empty(m->q))
        m->flag = 0; // let go of lock; no one wants it
    else
        unpark(queue_remove(m->q)); // hold lock
                                    // (for next thread!)
    m->guard = 0;
}

29 Lock-based Concurrent Data Structures

29.1 Concurrent Counters

Approximate counting: record counter values locally, and periodically transfer them to the global counter.

29.2 Concurrent Linked Lists

A hand-over-hand linked list may not be faster than the simple single lock approach, as the overhead of acquiring and releasing every node along the traversal path is high.

29.3 Concurrent Queues

29.4 Concurrent Hash Table

30 Condition Variables

30.1 Definition and Routines

A condition variable is an explicit queue that threads can put themselves on when some state of execution (i.e., some condition) is not as desired (by waiting on the condition); some other thread, when it changes said state, can then wake one (or more) of those waiting threads and thus allow them to continue (by signaling on the condition).

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);

wait() unlocks a mutex (assumed to be locked) and put the calling thread to sleep; it re-acquires the lock before returning.

int done  = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t c  = PTHREAD_COND_INITIALIZER;

void thr_exit() {
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

void thr_join() {
    pthread_mutex_lock(&m);
    while (done == 0)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
}

Some questions:

Why is the done state needed?

If the child runs immediately and calls thr_exit() immediately, then although the child will signal, there is no thread asleep. When the parent calls wait(), it will be stuck.

Why is the mutex needed?

The child might change the state before the parent goes to sleep, and the parent will sleep forever. If there is a mutex, the child can only alter the state after the parent goes to sleep (and thus releasing the mutex).

Always hold the lock while signaling!

30.2 The Producer/Consumer (Bounded Buffer) Problem

Thanks to Mesa semantics, a simple rule to remember with condition variables is to always use while loops.

30.3 Covering Conditions

pthread_cond_broadcast(): waking up all waiting threads; downside: negative performace impact.