Skip to content

Commit

Permalink
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tip/tip

Pull RCU changes from Ingo Molnar:
 "SRCU changes:

   - These include debugging aids, updates that move towards the goal of
     permitting srcu_read_lock() and srcu_read_unlock() to be used from
     idle and offline CPUs, and a few small fixes.

  Changes to rcutorture and to RCU documentation:

   - Posted to LKML at https://lkml.org/lkml/2013/1/26/188

  Enhancements to uniprocessor handling in tiny RCU:

   - Posted to LKML at https://lkml.org/lkml/2013/1/27/2

  Tag RCU callbacks with grace-period number to simplify callback
  advancement:

   - Posted to LKML at https://lkml.org/lkml/2013/1/26/203

  Miscellaneous fixes:

   - Posted to LKML at https://lkml.org/lkml/2013/1/26/204"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock()
  srcu: Update synchronize_srcu_expedited()'s comments
  srcu: Update synchronize_srcu()'s comments
  srcu: Remove checks preventing idle CPUs from calling srcu_read_lock()
  srcu: Remove checks preventing offline CPUs from calling srcu_read_lock()
  srcu: Simple cleanup for cleanup_srcu_struct()
  srcu: Add might_sleep() annotation to synchronize_srcu()
  srcu: Simplify __srcu_read_unlock() via this_cpu_dec()
  rcu: Allow rcutorture to be built at low optimization levels
  rcu: Make rcutorture's shuffler task shuffle recently added tasks
  rcu: Allow TREE_PREEMPT_RCU on UP systems
  rcu: Provide RCU CPU stall warnings for tiny RCU
  context_tracking: Add comments on interface and internals
  rcu: Remove obsolete Kconfig option from comment
  rcu: Remove unused code originally used for context tracking
  rcu: Consolidate debugging Kconfig options
  rcu: Correct 'optimized' to 'optimize' in header comment
  rcu: Trace callback acceleration
  rcu: Tag callback lists with corresponding grace-period number
  rcutorture: Don't compare ptr with 0
  ...
  • Loading branch information
torvalds committed Feb 20, 2013
2 parents 19f949f + ac0e320 commit e84cf5d
Show file tree
Hide file tree
Showing 17 changed files with 556 additions and 229 deletions.
2 changes: 2 additions & 0 deletions Documentation/atomic_ops.txt
Original file line number Diff line number Diff line change
Expand Up @@ -253,6 +253,8 @@ This performs an atomic exchange operation on the atomic variable v, setting
the given new value. It returns the old value that the atomic variable v had
just before the operation.

atomic_xchg requires explicit memory barriers around the operation.

int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
Expand Down
1 change: 1 addition & 0 deletions Documentation/memory-barriers.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1685,6 +1685,7 @@ explicit lock operations, described later). These include:

xchg();
cmpxchg();
atomic_xchg();
atomic_cmpxchg();
atomic_inc_return();
atomic_dec_return();
Expand Down
15 changes: 11 additions & 4 deletions include/linux/rcupdate.h
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,10 @@ extern int rcutorture_runnable; /* for sysctl */
extern void rcutorture_record_test_transition(void);
extern void rcutorture_record_progress(unsigned long vernum);
extern void do_trace_rcu_torture_read(char *rcutorturename,
struct rcu_head *rhp);
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
static inline void rcutorture_record_test_transition(void)
{
Expand All @@ -63,9 +66,13 @@ static inline void rcutorture_record_progress(unsigned long vernum)
}
#ifdef CONFIG_RCU_TRACE
extern void do_trace_rcu_torture_read(char *rcutorturename,
struct rcu_head *rhp);
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
#define do_trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)
#define do_trace_rcu_torture_read(rcutorturename, rhp, secs, c_old, c) \
do { } while (0)
#endif
#endif

Expand Down Expand Up @@ -749,7 +756,7 @@ static inline void rcu_preempt_sleep_check(void)
* preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
* in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
* be preempted, but explicit blocking is illegal. Finally, in preemptible
* RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
* RCU implementations in real-time (with -rt patchset) kernel builds,
* RCU read-side critical sections may be preempted and they may also
* block, but only when acquiring spinlocks that are subject to priority
* inheritance.
Expand Down
26 changes: 3 additions & 23 deletions include/linux/srcu.h
Original file line number Diff line number Diff line change
Expand Up @@ -151,30 +151,14 @@ void srcu_barrier(struct srcu_struct *sp);
* Checks debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
*
* Note that if the CPU is in the idle loop from an RCU point of view
* (ie: that we are in the section between rcu_idle_enter() and
* rcu_idle_exit()) then srcu_read_lock_held() returns false even if
* the CPU did an srcu_read_lock(). The reason for this is that RCU
* ignores CPUs that are in such a section, considering these as in
* extended quiescent state, so such a CPU is effectively never in an
* RCU read-side critical section regardless of what RCU primitives it
* invokes. This state of affairs is required --- we need to keep an
* RCU-free window in idle where the CPU may possibly enter into low
* power mode. This way we can notice an extended quiescent state to
* other CPUs that started a grace period. Otherwise we would delay any
* grace period as long as we run in the idle task.
*
* Similarly, we avoid claiming an SRCU read lock held if the current
* CPU is offline.
* Note that SRCU is based on its own statemachine and it doesn't
* relies on normal RCU, it can be called from the CPU which
* is in the idle loop from an RCU point of view or offline.
*/
static inline int srcu_read_lock_held(struct srcu_struct *sp)
{
if (!debug_lockdep_rcu_enabled())
return 1;
if (rcu_is_cpu_idle())
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
return lock_is_held(&sp->dep_map);
}

Expand Down Expand Up @@ -236,8 +220,6 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
int retval = __srcu_read_lock(sp);

rcu_lock_acquire(&(sp)->dep_map);
rcu_lockdep_assert(!rcu_is_cpu_idle(),
"srcu_read_lock() used illegally while idle");
return retval;
}

Expand All @@ -251,8 +233,6 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__releases(sp)
{
rcu_lockdep_assert(!rcu_is_cpu_idle(),
"srcu_read_unlock() used illegally while idle");
rcu_lock_release(&(sp)->dep_map);
__srcu_read_unlock(sp, idx);
}
Expand Down
31 changes: 21 additions & 10 deletions include/trace/events/rcu.h
Original file line number Diff line number Diff line change
Expand Up @@ -44,8 +44,10 @@ TRACE_EVENT(rcu_utilization,
* of a new grace period or the end of an old grace period ("cpustart"
* and "cpuend", respectively), a CPU passing through a quiescent
* state ("cpuqs"), a CPU coming online or going offline ("cpuonl"
* and "cpuofl", respectively), and a CPU being kicked for being too
* long in dyntick-idle mode ("kick").
* and "cpuofl", respectively), a CPU being kicked for being too
* long in dyntick-idle mode ("kick"), a CPU accelerating its new
* callbacks to RCU_NEXT_READY_TAIL ("AccReadyCB"), and a CPU
* accelerating its new callbacks to RCU_WAIT_TAIL ("AccWaitCB").
*/
TRACE_EVENT(rcu_grace_period,

Expand Down Expand Up @@ -393,15 +395,15 @@ TRACE_EVENT(rcu_kfree_callback,
*/
TRACE_EVENT(rcu_batch_start,

TP_PROTO(char *rcuname, long qlen_lazy, long qlen, int blimit),
TP_PROTO(char *rcuname, long qlen_lazy, long qlen, long blimit),

TP_ARGS(rcuname, qlen_lazy, qlen, blimit),

TP_STRUCT__entry(
__field(char *, rcuname)
__field(long, qlen_lazy)
__field(long, qlen)
__field(int, blimit)
__field(long, blimit)
),

TP_fast_assign(
Expand All @@ -411,7 +413,7 @@ TRACE_EVENT(rcu_batch_start,
__entry->blimit = blimit;
),

TP_printk("%s CBs=%ld/%ld bl=%d",
TP_printk("%s CBs=%ld/%ld bl=%ld",
__entry->rcuname, __entry->qlen_lazy, __entry->qlen,
__entry->blimit)
);
Expand Down Expand Up @@ -523,22 +525,30 @@ TRACE_EVENT(rcu_batch_end,
*/
TRACE_EVENT(rcu_torture_read,

TP_PROTO(char *rcutorturename, struct rcu_head *rhp),
TP_PROTO(char *rcutorturename, struct rcu_head *rhp,
unsigned long secs, unsigned long c_old, unsigned long c),

TP_ARGS(rcutorturename, rhp),
TP_ARGS(rcutorturename, rhp, secs, c_old, c),

TP_STRUCT__entry(
__field(char *, rcutorturename)
__field(struct rcu_head *, rhp)
__field(unsigned long, secs)
__field(unsigned long, c_old)
__field(unsigned long, c)
),

TP_fast_assign(
__entry->rcutorturename = rcutorturename;
__entry->rhp = rhp;
__entry->secs = secs;
__entry->c_old = c_old;
__entry->c = c;
),

TP_printk("%s torture read %p",
__entry->rcutorturename, __entry->rhp)
TP_printk("%s torture read %p %luus c: %lu %lu",
__entry->rcutorturename, __entry->rhp,
__entry->secs, __entry->c_old, __entry->c)
);

/*
Expand Down Expand Up @@ -608,7 +618,8 @@ TRACE_EVENT(rcu_barrier,
#define trace_rcu_invoke_kfree_callback(rcuname, rhp, offset) do { } while (0)
#define trace_rcu_batch_end(rcuname, callbacks_invoked, cb, nr, iit, risk) \
do { } while (0)
#define trace_rcu_torture_read(rcutorturename, rhp) do { } while (0)
#define trace_rcu_torture_read(rcutorturename, rhp, secs, c_old, c) \
do { } while (0)
#define trace_rcu_barrier(name, s, cpu, cnt, done) do { } while (0)

#endif /* #else #ifdef CONFIG_RCU_TRACE */
Expand Down
12 changes: 11 additions & 1 deletion init/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -453,14 +453,16 @@ config TREE_RCU

config TREE_PREEMPT_RCU
bool "Preemptible tree-based hierarchical RCU"
depends on PREEMPT && SMP
depends on PREEMPT
help
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
is also required. It also scales down nicely to
smaller systems.

Select this option if you are unsure.

config TINY_RCU
bool "UP-only small-memory-footprint RCU"
depends on !PREEMPT && !SMP
Expand All @@ -486,6 +488,14 @@ config PREEMPT_RCU
This option enables preemptible-RCU code that is common between
the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.

config RCU_STALL_COMMON
def_bool ( TREE_RCU || TREE_PREEMPT_RCU || RCU_TRACE )
help
This option enables RCU CPU stall code that is common between
the TINY and TREE variants of RCU. The purpose is to allow
the tiny variants to disable RCU CPU stall warnings, while
making these warnings mandatory for the tree variants.

config CONTEXT_TRACKING
bool

Expand Down
75 changes: 65 additions & 10 deletions kernel/context_tracking.c
Original file line number Diff line number Diff line change
@@ -1,3 +1,19 @@
/*
* Context tracking: Probe on high level context boundaries such as kernel
* and userspace. This includes syscalls and exceptions entry/exit.
*
* This is used by RCU to remove its dependency on the timer tick while a CPU
* runs in userspace.
*
* Started by Frederic Weisbecker:
*
* Copyright (C) 2012 Red Hat, Inc., Frederic Weisbecker <fweisbec@redhat.com>
*
* Many thanks to Gilad Ben-Yossef, Paul McKenney, Ingo Molnar, Andrew Morton,
* Steven Rostedt, Peter Zijlstra for suggestions and improvements.
*
*/

#include <linux/context_tracking.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
Expand All @@ -6,8 +22,8 @@

struct context_tracking {
/*
* When active is false, hooks are not set to
* minimize overhead: TIF flags are cleared
* When active is false, probes are unset in order
* to minimize overhead: TIF flags are cleared
* and calls to user_enter/exit are ignored. This
* may be further optimized using static keys.
*/
Expand All @@ -24,6 +40,15 @@ static DEFINE_PER_CPU(struct context_tracking, context_tracking) = {
#endif
};

/**
* user_enter - Inform the context tracking that the CPU is going to
* enter userspace mode.
*
* This function must be called right before we switch from the kernel
* to userspace, when it's guaranteed the remaining kernel instructions
* to execute won't use any RCU read side critical section because this
* function sets RCU in extended quiescent state.
*/
void user_enter(void)
{
unsigned long flags;
Expand All @@ -39,40 +64,70 @@ void user_enter(void)
if (in_interrupt())
return;

/* Kernel threads aren't supposed to go to userspace */
WARN_ON_ONCE(!current->mm);

local_irq_save(flags);
if (__this_cpu_read(context_tracking.active) &&
__this_cpu_read(context_tracking.state) != IN_USER) {
__this_cpu_write(context_tracking.state, IN_USER);
/*
* At this stage, only low level arch entry code remains and
* then we'll run in userspace. We can assume there won't be
* any RCU read-side critical section until the next call to
* user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
* on the tick.
*/
rcu_user_enter();
}
local_irq_restore(flags);
}


/**
* user_exit - Inform the context tracking that the CPU is
* exiting userspace mode and entering the kernel.
*
* This function must be called after we entered the kernel from userspace
* before any use of RCU read side critical section. This potentially include
* any high level kernel code like syscalls, exceptions, signal handling, etc...
*
* This call supports re-entrancy. This way it can be called from any exception
* handler without needing to know if we came from userspace or not.
*/
void user_exit(void)
{
unsigned long flags;

/*
* Some contexts may involve an exception occuring in an irq,
* leading to that nesting:
* rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit()
* This would mess up the dyntick_nesting count though. And rcu_irq_*()
* helpers are enough to protect RCU uses inside the exception. So
* just return immediately if we detect we are in an IRQ.
*/
if (in_interrupt())
return;

local_irq_save(flags);
if (__this_cpu_read(context_tracking.state) == IN_USER) {
__this_cpu_write(context_tracking.state, IN_KERNEL);
/*
* We are going to run code that may use RCU. Inform
* RCU core about that (ie: we may need the tick again).
*/
rcu_user_exit();
}
local_irq_restore(flags);
}


/**
* context_tracking_task_switch - context switch the syscall callbacks
* @prev: the task that is being switched out
* @next: the task that is being switched in
*
* The context tracking uses the syscall slow path to implement its user-kernel
* boundaries probes on syscalls. This way it doesn't impact the syscall fast
* path on CPUs that don't do context tracking.
*
* But we need to clear the flag on the previous task because it may later
* migrate to some CPU that doesn't do the context tracking. As such the TIF
* flag may not be desired there.
*/
void context_tracking_task_switch(struct task_struct *prev,
struct task_struct *next)
{
Expand Down
7 changes: 7 additions & 0 deletions kernel/rcu.h
Original file line number Diff line number Diff line change
Expand Up @@ -111,4 +111,11 @@ static inline bool __rcu_reclaim(char *rn, struct rcu_head *head)

extern int rcu_expedited;

#ifdef CONFIG_RCU_STALL_COMMON

extern int rcu_cpu_stall_suppress;
int rcu_jiffies_till_stall_check(void);

#endif /* #ifdef CONFIG_RCU_STALL_COMMON */

#endif /* __LINUX_RCU_H */
Loading

0 comments on commit e84cf5d

Please sign in to comment.