strdup/free vs atomic reference counting
As I was working on a little C project of mine, I realized that I often had functions returning strings and couldn't simply return the pointer directly (especially in a multi-threaded app), since the caller needs to be able to use the string safely.
Typically, what is done is to simply return a copy of the string, which the caller must free once it's done with it. But this would happen a lot in my application, and I wondered whether it wouldn't be better to save all those strdup/free calls by using reference counting instead.
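For illustration, here is a minimal sketch of that usual copy-and-free pattern; the get_name()/internal_name names are made up for the example, not code from my project:

#include <glib.h>

static const gchar *internal_name = "foobar";

/* usual approach: hand out a private copy; the caller owns it and must free it */
static gchar *
get_name (void)
{
    return g_strdup (internal_name);
}

int
main (void)
{
    gchar *name = get_name ();
    g_print ("%s\n", name);
    /* every call pays for a malloc + memcpy here, and a free there */
    g_free (name);
    return 0;
}

This is essentially what the non-refcounting branch of the test program below does.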
So I implemented the reference-counted version, and then recently found myself reading about the topic again, coming across claims both that avoiding those malloc/memcpy calls through reference counting is a good thing, and that atomic operations (which I used to implement reference counting) can be slow and one shouldn't be afraid of using malloc/free.
And I started wondering...
Since I couldn't really find a definitive answer, I went ahead and made a little test app to see the difference between the two methods. For those interested, here are the results.
Test application
First, the code used for those little tests, compiled simply with:
gcc m.c $(pkg-config --cflags --libs glib-2.0)
m.c
#include <glib.h>

/* toggle these to pick the variant being measured */
#define REFCOUNT    1
#define ATOMIC      1
#define NB_THREADS  5

#if REFCOUNT
typedef struct
{
    const gchar *str;
    guint ref;
#if ATOMIC
#else
    GMutex mutex;
#endif
} ss_t;

static ss_t *
ref (ss_t *ss)
{
#if ATOMIC
    g_atomic_int_inc (&ss->ref);
#else
    g_mutex_lock (&ss->mutex);
    ++ss->ref;
    g_mutex_unlock (&ss->mutex);
#endif
    return ss;
}

static void
unref (ss_t *ss)
{
#if ATOMIC
    if (g_atomic_int_dec_and_test (&ss->ref))
        g_free (ss);
#else
    g_mutex_lock (&ss->mutex);
    if (--ss->ref == 0)
    {
        g_mutex_clear (&ss->mutex);
        g_free (ss);
    }
    else
        g_mutex_unlock (&ss->mutex);
#endif
}

static ss_t *str;

/* refcounting: hand out a new reference to the shared string */
static ss_t *
get_string (void)
{
    return ref (str);
}

#else

static const gchar *str = "foobar";

/* strdup: hand out a copy the caller must free */
static gchar *
get_string (void)
{
    return g_strdup (str);
}
#endif

static gpointer
worker (gpointer data)
{
    int i;
#if REFCOUNT
    ss_t *s;
#else
    gchar *s;
#endif

    for (i = 0; i < 100000; ++i)
    {
        s = get_string ();
#if REFCOUNT
        unref (s);
#else
        g_free (s);
#endif
    }
    return NULL;
}

int
main (void)
{
    int i;
    GThread *thread[NB_THREADS];

#if REFCOUNT
    str = g_new (ss_t, 1);
    str->str = "foobar";
    str->ref = 1;
#if ATOMIC
#else
    g_mutex_init (&str->mutex);
#endif
#endif
    for (i = 0; i < NB_THREADS; ++i)
        thread[i] = g_thread_new ("worker", worker, NULL);
    for (i = 0; i < NB_THREADS; ++i)
        g_thread_join (thread[i]);
#if REFCOUNT
    unref (str);
#endif
    return 0;
}
Results
First, I ran these with 8 threads. For each variant I give the binary size and average time, valgrind's heap/leak summary, and the individual timings of five runs (times in seconds):
Reference counting, using mutex
Binary was 8,652 bytes; Average time: 0.442
==17320== HEAP SUMMARY:
==17320== in use at exit: 5,476 bytes in 12 blocks
==17320== total heap usage: 38 allocs, 26 frees, 8,092 bytes allocated
==17320==
==17320== LEAK SUMMARY:
==17320== definitely lost: 0 bytes in 0 blocks
==17320== indirectly lost: 0 bytes in 0 blocks
==17320== possibly lost: 2,016 bytes in 2 blocks
==17320== still reachable: 3,460 bytes in 10 blocks
==17320== suppressed: 0 bytes in 0 blocks
./a.out 1.32s user 0.47s system 345% cpu 0.520 total
./a.out 1.29s user 0.37s system 335% cpu 0.495 total
./a.out 1.35s user 0.40s system 334% cpu 0.524 total
./a.out 1.11s user 0.38s system 334% cpu 0.444 total
./a.out 1.37s user 0.53s system 357% cpu 0.531 total
Reference counting, atomic operations
Binary was 8,053 bytes; Average time: 0.199
==17441== HEAP SUMMARY:
==17441== in use at exit: 5,476 bytes in 12 blocks
==17441== total heap usage: 37 allocs, 25 frees, 8,044 bytes allocated
==17441==
==17441== LEAK SUMMARY:
==17441== definitely lost: 0 bytes in 0 blocks
==17441== indirectly lost: 0 bytes in 0 blocks
==17441== possibly lost: 2,016 bytes in 2 blocks
==17441== still reachable: 3,460 bytes in 10 blocks
==17441== suppressed: 0 bytes in 0 blocks
./a.out 0.76s user 0.00s system 357% cpu 0.214 total
./a.out 0.65s user 0.00s system 356% cpu 0.183 total
./a.out 0.83s user 0.00s system 372% cpu 0.225 total
./a.out 0.72s user 0.00s system 365% cpu 0.197 total
./a.out 0.61s user 0.00s system 349% cpu 0.175 total
Duplicating strings
Binary was 7,761 bytes; Average time: 0.039
==17601== HEAP SUMMARY:
==17601== in use at exit: 5,476 bytes in 12 blocks
==17601== total heap usage: 800,036 allocs, 800,024 frees, 5,608,028 bytes allocated
==17601==
==17601== LEAK SUMMARY:
==17601== definitely lost: 0 bytes in 0 blocks
==17601== indirectly lost: 0 bytes in 0 blocks
==17601== possibly lost: 2,016 bytes in 2 blocks
==17601== still reachable: 3,460 bytes in 10 blocks
==17601== suppressed: 0 bytes in 0 blocks
./a.out 0.10s user 0.00s system 292% cpu 0.035 total
./a.out 0.11s user 0.00s system 252% cpu 0.042 total
./a.out 0.10s user 0.01s system 269% cpu 0.040 total
./a.out 0.10s user 0.01s system 273% cpu 0.039 total
./a.out 0.10s user 0.00s system 273% cpu 0.038 total
After looking at these results, I worried that implementing reference counting might have done more harm than good in the end, slowing things down here by a factor of 5. Of course, this is a pretty extreme case (8 threads each doing 100,000 operations). And let's not forget the difference in memory usage either: over 800,000 allocations when duplicating strings, versus 37 with reference counting.
Down to 5 threads
Atomic reference counting
Average time: 0.117
==19689== HEAP SUMMARY:
==19689== in use at exit: 5,476 bytes in 12 blocks
==19689== total heap usage: 28 allocs, 16 frees, 7,087 bytes allocated
==19689==
==19689== LEAK SUMMARY:
==19689== definitely lost: 0 bytes in 0 blocks
==19689== indirectly lost: 0 bytes in 0 blocks
==19689== possibly lost: 2,016 bytes in 2 blocks
==19689== still reachable: 3,460 bytes in 10 blocks
==19689== suppressed: 0 bytes in 0 blocks
./a.out 0.43s user 0.00s system 355% cpu 0.122 total
./a.out 0.40s user 0.00s system 346% cpu 0.117 total
./a.out 0.38s user 0.00s system 350% cpu 0.110 total
./a.out 0.37s user 0.00s system 338% cpu 0.108 total
./a.out 0.45s user 0.00s system 350% cpu 0.127 total
Duplicating strings
Average time: 0.028
==19584== HEAP SUMMARY:
==19584== in use at exit: 5,476 bytes in 12 blocks
==19584== total heap usage: 500,027 allocs, 500,015 frees, 3,507,071 bytes allocated
==19584==
==19584== LEAK SUMMARY:
==19584== definitely lost: 0 bytes in 0 blocks
==19584== indirectly lost: 0 bytes in 0 blocks
==19584== possibly lost: 2,016 bytes in 2 blocks
==19584== still reachable: 3,460 bytes in 10 blocks
==19584== suppressed: 0 bytes in 0 blocks
./a.out 0.07s user 0.00s system 253% cpu 0.026 total
./a.out 0.06s user 0.01s system 239% cpu 0.028 total
./a.out 0.06s user 0.00s system 250% cpu 0.027 total
./a.out 0.07s user 0.00s system 222% cpu 0.030 total
./a.out 0.07s user 0.00s system 214% cpu 0.031 total
Atomic operations are still slower, but only by a factor of 4.
Down to 2 threads
Things change quite a bit with only two threads, where (apart from memory usage, obviously) the results were pretty much the same:
Reference counting (atomic)
Average time: 0.018
==19473== HEAP SUMMARY:
==19473== in use at exit: 5,476 bytes in 12 blocks
==19473== total heap usage: 19 allocs, 7 frees, 6,130 bytes allocated
==19473==
==19473== LEAK SUMMARY:
==19473== definitely lost: 0 bytes in 0 blocks
==19473== indirectly lost: 0 bytes in 0 blocks
==19473== possibly lost: 2,016 bytes in 2 blocks
==19473== still reachable: 3,460 bytes in 10 blocks
==19473== suppressed: 0 bytes in 0 blocks
./a.out 0.02s user 0.00s system 149% cpu 0.016 total
./a.out 0.02s user 0.00s system 137% cpu 0.017 total
./a.out 0.03s user 0.00s system 170% cpu 0.018 total
./a.out 0.03s user 0.00s system 168% cpu 0.018 total
./a.out 0.04s user 0.00s system 162% cpu 0.022 total
Duplicating strings
Average time: 0.017
==19381== HEAP SUMMARY:
==19381== in use at exit: 5,476 bytes in 12 blocks
==19381== total heap usage: 200,018 allocs, 200,006 frees, 1,406,114 bytes allocated
==19381==
==19381== LEAK SUMMARY:
==19381== definitely lost: 0 bytes in 0 blocks
==19381== indirectly lost: 0 bytes in 0 blocks
==19381== possibly lost: 2,016 bytes in 2 blocks
==19381== still reachable: 3,460 bytes in 10 blocks
==19381== suppressed: 0 bytes in 0 blocks
./a.out 0.03s user 0.00s system 171% cpu 0.016 total
./a.out 0.03s user 0.00s system 160% cpu 0.017 total
./a.out 0.03s user 0.00s system 164% cpu 0.016 total
./a.out 0.03s user 0.00s system 156% cpu 0.019 total
./a.out 0.03s user 0.00s system 145% cpu 0.018 total
Reference counting might not be a bad thing after all. Yes, atomic operations might be "slow", or at least slower, but they remain pretty fast; reference counting is definitely better as far as memory usage goes, and will probably only slow things down in extreme cases, if at all.
As a last try, I ran those tests again, but with each of the 8 threads doing only 1,000 operations instead of the 100,000 they did so far. In this case, again except for memory usage of course, the results are pretty much exactly the same, averaging 0.003 both when duplicating strings and when using atomic reference counting.
Atomic reference counting
==20666== HEAP SUMMARY:
==20666== in use at exit: 5,476 bytes in 12 blocks
==20666== total heap usage: 37 allocs, 25 frees, 8,044 bytes allocated
==20666==
==20666== LEAK SUMMARY:
==20666== definitely lost: 0 bytes in 0 blocks
==20666== indirectly lost: 0 bytes in 0 blocks
==20666== possibly lost: 2,016 bytes in 2 blocks
==20666== still reachable: 3,460 bytes in 10 blocks
==20666== suppressed: 0 bytes in 0 blocks
./a.out 0.00s user 0.00s system 79% cpu 0.004 total
./a.out 0.00s user 0.00s system 0% cpu 0.002 total
./a.out 0.00s user 0.00s system 0% cpu 0.002 total
./a.out 0.00s user 0.00s system 0% cpu 0.002 total
./a.out 0.00s user 0.00s system 69% cpu 0.005 total
Duplicating strings
==20634== HEAP SUMMARY:
==20634== in use at exit: 5,476 bytes in 12 blocks
==20634== total heap usage: 8,036 allocs, 8,024 frees, 64,028 bytes allocated
==20634==
==20634== LEAK SUMMARY:
==20634== definitely lost: 0 bytes in 0 blocks
==20634== indirectly lost: 0 bytes in 0 blocks
==20634== possibly lost: 2,016 bytes in 2 blocks
==20634== still reachable: 3,460 bytes in 10 blocks
==20634== suppressed: 0 bytes in 0 blocks
./a.out 0.00s user 0.00s system 0% cpu 0.003 total
./a.out 0.00s user 0.00s system 0% cpu 0.003 total
./a.out 0.00s user 0.00s system 0% cpu 0.002 total
./a.out 0.00s user 0.00s system 0% cpu 0.003 total
./a.out 0.00s user 0.00s system 0% cpu 0.003 total
So in the end, I'm gonna stick with atomic reference counting in my app: it should have little to no impact on speed, and will definitely help as far as memory usage goes.