===================================================
Adding reference counters (krefs) to kernel objects
===================================================

:Author: Corey Minyard <minyard@acm.org>
:Author: Thomas Hellstrom <thellstrom@vmware.com>

A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
presentation on krefs, which can be found at:

  - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
  - http://www.kroah.com/linux/talks/ols_2004_kref_talk/

Introduction
============

krefs allow you to add reference counters to your objects.  If you
have objects that are used in multiple places and passed around, and
you don't have refcounts, your code is almost certainly broken.  If
you want refcounts, krefs are the way to go.

To use a kref, add one to your data structures like::

    struct my_data
    {
        .
        .
        struct kref refcount;
        .
        .
    };

The kref can occur anywhere within the data structure.

Initialization
==============

You must initialize the kref after you allocate it.  To do this, call
kref_init() as follows::

     struct my_data *data;

     data = kmalloc(sizeof(*data), GFP_KERNEL);
     if (!data)
            return -ENOMEM;
     kref_init(&data->refcount);

This sets the refcount in the kref to 1.

Kref rules
==========

Once you have an initialized kref, you must follow these rules:

1) If you make a non-temporary copy of a pointer, especially if
   it can be passed to another thread of execution, you must
   increment the refcount with kref_get() before passing it off::

       kref_get(&data->refcount);

   If you already have a valid pointer to a kref-ed structure (the
   refcount cannot go to zero) you may do this without a lock.

2) When you are done with a pointer, you must call kref_put()::

       kref_put(&data->refcount, data_release);

   If this is the last reference to the pointer, the release
   routine will be called.  If the code never tries to get
   a valid pointer to a kref-ed structure without already
   holding a valid pointer, it is safe to do this without
   a lock.

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   so that a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

For example, if you allocate some data and then pass it to another
thread to process::

    void data_release(struct kref *ref)
    {
        struct my_data *data = container_of(ref, struct my_data, refcount);

        kfree(data);
    }

    /* kthread_run() expects a thread function that returns int. */
    int more_data_handling(void *cb_data)
    {
        struct my_data *data = cb_data;
        .
        . do stuff with data here
        .
        kref_put(&data->refcount, data_release);
        return 0;
    }

    int my_data_handler(void)
    {
        int rv = 0;
        struct my_data *data;
        struct task_struct *task;

        data = kmalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
                return -ENOMEM;
        kref_init(&data->refcount);

        /* Take the thread's reference before handing the pointer off. */
        kref_get(&data->refcount);
        task = kthread_run(more_data_handling, data, "more_data_handling");
        if (IS_ERR(task)) {
                /* The thread never ran, so drop its reference here. */
                rv = PTR_ERR(task);
                kref_put(&data->refcount, data_release);
                goto out;
        }

        .
        . do stuff with data here
        .
    out:
        kref_put(&data->refcount, data_release);
        return rv;
    }

This way, it doesn't matter in what order the two threads handle the
data; kref_put() knows when the data is no longer referenced and
releases it.  The kref_get() does not require a lock, since we already
have a valid pointer that we own a refcount for.  The put needs no
lock because nothing tries to get the data without already holding a
pointer.

In the above example, kref_put() is called twice in both the success
and error paths.  This is necessary because kref_init() sets the
reference count to 1 and kref_get() then raises it to 2.

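The counting in the example above can be checked with a small userspace
model.  This is only a sketch: struct model_kref, struct model_data and
the model_* functions are hypothetical stand-ins built on C11 atomics,
not the kernel implementation (which builds on refcount_t).  It shows
the count going 1, 2, 1, 0 and the release routine running only on the
final put::

    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>

    /* Hypothetical userspace stand-in for struct kref. */
    struct model_kref {
            atomic_int refcount;
    };

    struct model_data {
            int released;           /* set by the release routine */
            struct model_kref ref;
    };

    static void model_kref_init(struct model_kref *k)
    {
            atomic_init(&k->refcount, 1);
    }

    static void model_kref_get(struct model_kref *k)
    {
            /* Rule 1: the caller must already hold a reference. */
            int old = atomic_fetch_add(&k->refcount, 1);

            assert(old > 0);
    }

    /* Returns 1 if this put dropped the last reference, 0 otherwise. */
    static int model_kref_put(struct model_kref *k,
                              void (*release)(struct model_kref *k))
    {
            if (atomic_fetch_sub(&k->refcount, 1) == 1) {
                    release(k);
                    return 1;
            }
            return 0;
    }

    static void model_release(struct model_kref *k)
    {
            /* Userspace equivalent of container_of(). */
            struct model_data *d = (struct model_data *)
                    ((char *)k - offsetof(struct model_data, ref));

            d->released = 1;        /* a real release routine would free d */
    }

    /* Replays my_data_handler(): init, get for the thread, two puts. */
    static int model_handoff(struct model_data *d)
    {
            int last;

            d->released = 0;
            model_kref_init(&d->ref);                       /* count = 1 */
            model_kref_get(&d->ref);                        /* count = 2 */
            last = model_kref_put(&d->ref, model_release);  /* count = 1 */
            assert(!last && !d->released);
            return model_kref_put(&d->ref, model_release);  /* count = 0 */
    }
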
Note that the "before" in rule 1 is very important.  You should never
do something like::

        task = kthread_run(more_data_handling, data, "more_data_handling");
        if (IS_ERR(task)) {
                rv = PTR_ERR(task);
                goto out;
        } else {
                /* BAD BAD BAD - get is after the handoff */
                kref_get(&data->refcount);
        }

Don't assume you know what you are doing and use the above construct.
First of all, you may not know what you are doing.  Second, you may
know what you are doing (there are some situations where locking is
involved where the above may be legal) but someone else who doesn't
know what they are doing may change the code or copy the code.  It's
bad style.  Don't do it.

There are some situations where you can optimize the gets and puts.
For instance, if you are done with an object and enqueuing it for
something else or passing it off to something else, there is no reason
to do a get then a put::

        /* Silly extra get and put */
        kref_get(&obj->ref);
        enqueue(obj);
        kref_put(&obj->ref, obj_cleanup);

Just do the enqueue.  A comment about this is always welcome::

        enqueue(obj);
        /* We are done with obj, so we pass our refcount off
           to the queue.  DON'T TOUCH obj AFTER HERE! */

The last rule (rule 3) is the nastiest one to handle.  Say, for
instance, you have a list of items that are each kref-ed, and you wish
to get the first one.  You can't just pull the first item off the list
and kref_get() it.  That violates rule 3 because you are not already
holding a valid pointer.  You must add a mutex (or some other lock).
For instance::

        static DEFINE_MUTEX(mutex);
        static LIST_HEAD(q);
        struct my_data
        {
                struct kref      refcount;
                struct list_head link;
        };

        static struct my_data *get_entry(void)
        {
                struct my_data *entry = NULL;

                mutex_lock(&mutex);
                if (!list_empty(&q)) {
                        entry = container_of(q.next, struct my_data, link);
                        kref_get(&entry->refcount);
                }
                mutex_unlock(&mutex);
                return entry;
        }

        /* Called with the mutex held, via kref_put() in put_entry(). */
        static void release_entry(struct kref *ref)
        {
                struct my_data *entry = container_of(ref, struct my_data, refcount);

                list_del(&entry->link);
                kfree(entry);
        }

        static void put_entry(struct my_data *entry)
        {
                mutex_lock(&mutex);
                kref_put(&entry->refcount, release_entry);
                mutex_unlock(&mutex);
        }

The kref_put() return value is useful if you do not want to hold the
lock during the whole release operation.  Say you didn't want to call
kfree() with the lock held in the example above (since it is kind of
pointless to do so).  You could use kref_put() as follows::

        static void release_entry(struct kref *ref)
        {
                /* All work is done after the return from kref_put(). */
        }

        static void put_entry(struct my_data *entry)
        {
                mutex_lock(&mutex);
                if (kref_put(&entry->refcount, release_entry)) {
                        list_del(&entry->link);
                        mutex_unlock(&mutex);
                        kfree(entry);
                } else {
                        mutex_unlock(&mutex);
                }
        }

This is really more useful if you have to call other routines as part
of the free operations that could take a long time or might claim the
same lock.  Note that doing everything in the release routine is still
preferred as it is a little neater.

The above example could also be optimized using kref_get_unless_zero() in
the following way::

        static struct my_data *get_entry(void)
        {
                struct my_data *entry = NULL;

                mutex_lock(&mutex);
                if (!list_empty(&q)) {
                        entry = container_of(q.next, struct my_data, link);
                        if (!kref_get_unless_zero(&entry->refcount))
                                entry = NULL;
                }
                mutex_unlock(&mutex);
                return entry;
        }

        static void release_entry(struct kref *ref)
        {
                struct my_data *entry = container_of(ref, struct my_data, refcount);

                mutex_lock(&mutex);
                list_del(&entry->link);
                mutex_unlock(&mutex);
                kfree(entry);
        }

        static void put_entry(struct my_data *entry)
        {
                kref_put(&entry->refcount, release_entry);
        }

This removes the need to hold the mutex around kref_put() in put_entry().
However, kref_get_unless_zero() must be enclosed in the same critical
section that finds the entry in the lookup table; otherwise
kref_get_unless_zero() may reference already freed memory.
Note that it is illegal to use kref_get_unless_zero() without checking its
return value.  If you are sure (by already having a valid pointer) that
kref_get_unless_zero() will return true, then use kref_get() instead.

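Why the return value must be checked can be seen in a userspace model.
As a sketch only (model_get_unless_zero() is a hypothetical name, built
on a C11 compare-and-swap loop rather than the kernel's
refcount_inc_not_zero()), the function takes a reference only while the
count is non-zero and reports failure once the last reference is gone::

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical userspace model; not the kernel implementation. */
    static bool model_get_unless_zero(atomic_int *refcount)
    {
            int old = atomic_load(refcount);

            /* Retry until the increment succeeds or the count is zero. */
            while (old != 0) {
                    /* On failure, old is reloaded with the current value. */
                    if (atomic_compare_exchange_weak(refcount, &old, old + 1))
                            return true;
            }
            return false;   /* the last reference has already been dropped */
    }

A caller that ignored a false return would go on to use an object whose
release routine may already be running.
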
Krefs and RCU
=============

The function kref_get_unless_zero() also makes it possible to use RCU
locking for lookups in the above example::

        struct my_data
        {
                struct rcu_head rhead;
                .
                struct kref refcount;
                .
                .
        };

        static struct my_data *get_entry_rcu(void)
        {
                struct my_data *entry;

                rcu_read_lock();
                /* Read the list head once; writers may change it under us. */
                entry = list_first_or_null_rcu(&q, struct my_data, link);
                if (entry && !kref_get_unless_zero(&entry->refcount))
                        entry = NULL;
                rcu_read_unlock();
                return entry;
        }

        static void release_entry_rcu(struct kref *ref)
        {
                struct my_data *entry = container_of(ref, struct my_data, refcount);

                mutex_lock(&mutex);
                list_del_rcu(&entry->link);
                mutex_unlock(&mutex);
                kfree_rcu(entry, rhead);
        }

        static void put_entry(struct my_data *entry)
        {
                kref_put(&entry->refcount, release_entry_rcu);
        }

But note that the struct kref member needs to remain in valid memory for
an RCU grace period after release_entry_rcu() is called.  That can be
accomplished by using kfree_rcu(entry, rhead) as done above, or by calling
synchronize_rcu() before kfree().  Be aware that synchronize_rcu() may
sleep for a substantial amount of time.
0314         static void put_entry(struct my_data *entry)
0315         {
0316                 kref_put(&entry->refcount, release_entry_rcu);
0317         }
0318 
0319 But note that the struct kref member needs to remain in valid memory for a
0320 rcu grace period after release_entry_rcu was called. That can be accomplished
0321 by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
0322 before using kfree, but note that synchronize_rcu() may sleep for a
0323 substantial amount of time.