strdup/free vs atomic reference counting

As I was working on a little C project of mine, I realized that often times I would have functions returning strings, and couldn't simply return the pointer directly (especially in mutli-threaded app) as the caller needs to be safe using the string.

Typically, what is done is simply return a copy of the string, and the caller must free it once it's done. But this would be happening a lot on my application, and I wondered if it wouldn't be better to save those strdup/free calls, using reference counting.

So I implemented that, and then recently found myself reading about this again, where I'd find both that avoiding those malloc/memcpy calls by using reference counting is good, or that atomic operations (which I used to implement reference counting) could be slow and once shouldn't be afraid of using malloc/free.

And I started wondering...

Since I couldn't really find a definite answer, I went ahead and made a little test app, to see the difference between those two methods. For those interested, here are the results.

Test application

First, the code used for those little tests, compiled simply with: gcc pkg-config --cflags --libs glib-2.0 m.c

  1. #include <glib.h>
  2.  
  3. #define REFCOUNT    1
  4. #define ATOMIC      1
  5. #define NB_THREADS  5
  6.  
  7. #if REFCOUNT
  8. typedef struct
  9. {
  10.     const gchar *str;
  11.     guint ref;
  12. #if ATOMIC
  13. #else
  14.     GMutex mutex;
  15. #endif
  16. } ss_t;
  17.  
  18. static ss_t *
  19. ref (ss_t *ss)
  20. {
  21. #if ATOMIC
  22.     g_atomic_int_inc (&ss->ref);
  23. #else
  24.     g_mutex_lock (&ss->mutex);
  25.     ++ss->ref;
  26.     g_mutex_unlock (&ss->mutex);
  27. #endif
  28.     return ss;
  29. }
  30.  
  31. static void
  32. unref (ss_t *ss)
  33. {
  34. #if ATOMIC
  35.     if (g_atomic_int_dec_and_test (&ss->ref))
  36.         g_free (ss);
  37. #else
  38.     g_mutex_lock (&ss->mutex);
  39.     if (--ss->ref == 0)
  40.     {
  41.         g_mutex_clear (&ss->mutex);
  42.         g_free (ss);
  43.     }
  44.     else
  45.         g_mutex_unlock (&ss->mutex);
  46. #endif
  47. }
  48.  
  49. static ss_t *str;
  50. static ss_t *
  51. get_string (void)
  52. {
  53.     return ref (str);
  54. }
  55. #else
  56. static const gchar *str = "foobar";
  57. static gchar *
  58. get_string (void)
  59. {
  60.     return g_strdup (str);
  61. }
  62. #endif
  63.  
  64. static void
  65. worker (void)
  66. {
  67.     int i;
  68. #if REFCOUNT
  69.     ss_t *s;
  70. #else
  71.     gchar *s;
  72. #endif
  73.  
  74.     for (i = 0; i < 100000; ++i)
  75.     {
  76.         s = get_string ();
  77. #if REFCOUNT
  78.         unref (s);
  79. #else
  80.         g_free (s);
  81. #endif
  82.     }
  83. }
  84.  
  85. int
  86. main (void)
  87. {
  88.     int i;
  89.     GThread *thread[NB_THREADS];
  90.  
  91. #if REFCOUNT
  92.     str = g_new (ss_t, 1);
  93.     str->str = "foobar";
  94.     str->ref = 1;
  95. #if ATOMIC
  96. #else
  97.     g_mutex_init (&str->mutex);
  98. #endif
  99. #endif
  100.  
  101.     for (i = 0; i < NB_THREADS; ++i)
  102.         thread[i] = g_thread_new ("worker", (GThreadFunc) worker, NULL);
  103.  
  104.     for (i = 0; i < NB_THREADS; ++i)
  105.         g_thread_join (thread[i]);
  106.  
  107. #if REFCOUNT
  108.     unref (str);
  109. #endif
  110.  
  111.     return 0;
  112. }

Results

First I ran those with 8 threads :

Reference counting, using mutex

Binary was 8,652 bytes; Average time: 0.442

==17320== HEAP SUMMARY:
==17320==     in use at exit: 5,476 bytes in 12 blocks
==17320==   total heap usage: 38 allocs, 26 frees, 8,092 bytes allocated
==17320== 
==17320== LEAK SUMMARY:
==17320==    definitely lost: 0 bytes in 0 blocks
==17320==    indirectly lost: 0 bytes in 0 blocks
==17320==      possibly lost: 2,016 bytes in 2 blocks
==17320==    still reachable: 3,460 bytes in 10 blocks
==17320==         suppressed: 0 bytes in 0 blocks

./a.out  1.32s user 0.47s system 345% cpu 0.520 total
./a.out  1.29s user 0.37s system 335% cpu 0.495 total
./a.out  1.35s user 0.40s system 334% cpu 0.524 total
./a.out  1.11s user 0.38s system 334% cpu 0.444 total
./a.out  1.37s user 0.53s system 357% cpu 0.531 total

Reference counting, atomic operations

Binary was 8,053 bytes; Average time: 0.199

==17441== HEAP SUMMARY:
==17441==     in use at exit: 5,476 bytes in 12 blocks
==17441==   total heap usage: 37 allocs, 25 frees, 8,044 bytes allocated
==17441== 
==17441== LEAK SUMMARY:
==17441==    definitely lost: 0 bytes in 0 blocks
==17441==    indirectly lost: 0 bytes in 0 blocks
==17441==      possibly lost: 2,016 bytes in 2 blocks
==17441==    still reachable: 3,460 bytes in 10 blocks
==17441==         suppressed: 0 bytes in 0 blocks

./a.out  0.76s user 0.00s system 357% cpu 0.214 total
./a.out  0.65s user 0.00s system 356% cpu 0.183 total
./a.out  0.83s user 0.00s system 372% cpu 0.225 total
./a.out  0.72s user 0.00s system 365% cpu 0.197 total
./a.out  0.61s user 0.00s system 349% cpu 0.175 total

Duplicate strings

Binary was 7,761 bytes; Average time: 0.039

==17601== HEAP SUMMARY:
==17601==     in use at exit: 5,476 bytes in 12 blocks
==17601==   total heap usage: 800,036 allocs, 800,024 frees, 5,608,028 bytes allocated
==17601== 
==17601== LEAK SUMMARY:
==17601==    definitely lost: 0 bytes in 0 blocks
==17601==    indirectly lost: 0 bytes in 0 blocks
==17601==      possibly lost: 2,016 bytes in 2 blocks
==17601==    still reachable: 3,460 bytes in 10 blocks
==17601==         suppressed: 0 bytes in 0 blocks

./a.out  0.10s user 0.00s system 292% cpu 0.035 total
./a.out  0.11s user 0.00s system 252% cpu 0.042 total
./a.out  0.10s user 0.01s system 269% cpu 0.040 total
./a.out  0.10s user 0.01s system 273% cpu 0.039 total
./a.out  0.10s user 0.00s system 273% cpu 0.038 total

After looking at these results, I worried that implementing reference counting might have done more bad than good in the end, slowing things here by a factor of 5. Of course, this is a pretty extreme case (8 threads all doing 100,000 operations). And let's not forget the difference in memory usage, either.

Down to 5 threads

Atomic reference counting

Average time: 0.117

==19689== HEAP SUMMARY:
==19689==     in use at exit: 5,476 bytes in 12 blocks
==19689==   total heap usage: 28 allocs, 16 frees, 7,087 bytes allocated
==19689== 
==19689== LEAK SUMMARY:
==19689==    definitely lost: 0 bytes in 0 blocks
==19689==    indirectly lost: 0 bytes in 0 blocks
==19689==      possibly lost: 2,016 bytes in 2 blocks
==19689==    still reachable: 3,460 bytes in 10 blocks
==19689==         suppressed: 0 bytes in 0 blocks

./a.out  0.43s user 0.00s system 355% cpu 0.122 total
./a.out  0.40s user 0.00s system 346% cpu 0.117 total
./a.out  0.38s user 0.00s system 350% cpu 0.110 total
./a.out  0.37s user 0.00s system 338% cpu 0.108 total
./a.out  0.45s user 0.00s system 350% cpu 0.127 total

Duplicating strings

Average time: 0.028

==19584== HEAP SUMMARY:
==19584==     in use at exit: 5,476 bytes in 12 blocks
==19584==   total heap usage: 500,027 allocs, 500,015 frees, 3,507,071 bytes allocated
==19584== 
==19584== LEAK SUMMARY:
==19584==    definitely lost: 0 bytes in 0 blocks
==19584==    indirectly lost: 0 bytes in 0 blocks
==19584==      possibly lost: 2,016 bytes in 2 blocks
==19584==    still reachable: 3,460 bytes in 10 blocks
==19584==         suppressed: 0 bytes in 0 blocks

./a.out  0.07s user 0.00s system 253% cpu 0.026 total
./a.out  0.06s user 0.01s system 239% cpu 0.028 total
./a.out  0.06s user 0.00s system 250% cpu 0.027 total
./a.out  0.07s user 0.00s system 222% cpu 0.030 total
./a.out  0.07s user 0.00s system 214% cpu 0.031 total

Atomic operations are still slower, but only by a factor or 4.

Down to 2 threads

Things change quite a bit with only two threads, where - apart for memory usage obviously - results were pretty much the same :

Reference counting (atomic)

Average time: 0.018

==19473== HEAP SUMMARY:
==19473==     in use at exit: 5,476 bytes in 12 blocks
==19473==   total heap usage: 19 allocs, 7 frees, 6,130 bytes allocated
==19473== 
==19473== LEAK SUMMARY:
==19473==    definitely lost: 0 bytes in 0 blocks
==19473==    indirectly lost: 0 bytes in 0 blocks
==19473==      possibly lost: 2,016 bytes in 2 blocks
==19473==    still reachable: 3,460 bytes in 10 blocks
==19473==         suppressed: 0 bytes in 0 blocks

./a.out  0.02s user 0.00s system 149% cpu 0.016 total
./a.out  0.02s user 0.00s system 137% cpu 0.017 total
./a.out  0.03s user 0.00s system 170% cpu 0.018 total
./a.out  0.03s user 0.00s system 168% cpu 0.018 total
./a.out  0.04s user 0.00s system 162% cpu 0.022 total

Duplicating strings

Average time: 0.017

==19381== HEAP SUMMARY:
==19381==     in use at exit: 5,476 bytes in 12 blocks
==19381==   total heap usage: 200,018 allocs, 200,006 frees, 1,406,114 bytes allocated
==19381== 
==19381== LEAK SUMMARY:
==19381==    definitely lost: 0 bytes in 0 blocks
==19381==    indirectly lost: 0 bytes in 0 blocks
==19381==      possibly lost: 2,016 bytes in 2 blocks
==19381==    still reachable: 3,460 bytes in 10 blocks
==19381==         suppressed: 0 bytes in 0 blocks

./a.out  0.03s user 0.00s system 171% cpu 0.016 total
./a.out  0.03s user 0.00s system 160% cpu 0.017 total
./a.out  0.03s user 0.00s system 164% cpu 0.016 total
./a.out  0.03s user 0.00s system 156% cpu 0.019 total
./a.out  0.03s user 0.00s system 145% cpu 0.018 total

Reference counting might not be a bad thing after all. Yes, atomic operations might be "slow"/slower, but it remains pretty fast, is definitely better as far as memory usage goes, and will probably only slow things down in extreme cases, if at all.

As a last try, I ran those tests again but having each of the 8 threads only do 1,000 operations, instead of the 100,000 they did so far. In this case, again except for memory usage of course, the results are pretty much exactly the same, averaging at 0.003 both when duplicating strings, or using atomic reference counting.

Atomic reference counting

==20666== HEAP SUMMARY:
==20666==     in use at exit: 5,476 bytes in 12 blocks
==20666==   total heap usage: 37 allocs, 25 frees, 8,044 bytes allocated
==20666== 
==20666== LEAK SUMMARY:
==20666==    definitely lost: 0 bytes in 0 blocks
==20666==    indirectly lost: 0 bytes in 0 blocks
==20666==      possibly lost: 2,016 bytes in 2 blocks
==20666==    still reachable: 3,460 bytes in 10 blocks
==20666==         suppressed: 0 bytes in 0 blocks

./a.out  0.00s user 0.00s system 79% cpu 0.004 total
./a.out  0.00s user 0.00s system 0% cpu 0.002 total
./a.out  0.00s user 0.00s system 0% cpu 0.002 total
./a.out  0.00s user 0.00s system 0% cpu 0.002 total
./a.out  0.00s user 0.00s system 69% cpu 0.005 total

Duplicating strings

==20634== HEAP SUMMARY:
==20634==     in use at exit: 5,476 bytes in 12 blocks
==20634==   total heap usage: 8,036 allocs, 8,024 frees, 64,028 bytes allocated
==20634== 
==20634== LEAK SUMMARY:
==20634==    definitely lost: 0 bytes in 0 blocks
==20634==    indirectly lost: 0 bytes in 0 blocks
==20634==      possibly lost: 2,016 bytes in 2 blocks
==20634==    still reachable: 3,460 bytes in 10 blocks
==20634==         suppressed: 0 bytes in 0 blocks

./a.out  0.00s user 0.00s system 0% cpu 0.003 total
./a.out  0.00s user 0.00s system 0% cpu 0.003 total
./a.out  0.00s user 0.00s system 0% cpu 0.002 total
./a.out  0.00s user 0.00s system 0% cpu 0.003 total
./a.out  0.00s user 0.00s system 0% cpu 0.003 total

So in the end, I'm gonna stick with atomic reference counting in my app, it should not, or barely, have any impact on speed, but will definitely help as far as memory usage goes.

Top of Page