The joys of C API design: stripping whitespace from C-strings

Suppose you are tasked with stripping leading and trailing whitespace from a string. In garbage-collected languages, whose standard libraries usually ship a function for this, you don’t even think about the implementation. In Python (whose strings are immutable), you’d simply call the string’s .strip() method and get a stripped string back. In Go, you’d call strings.TrimSpace() and receive a stripped slice of the original string.

Designing such a function in C (where you are required to manage memory manually) isn’t difficult, but it can be done in a myriad of ways. Do you modify the original string? Do you allocate a new string on the heap? If so, how do you return a pointer to it? Or do you fill in a provided buffer instead? What if the buffer isn’t large enough? And so on.

Below I’ll document the implementations I managed to come up with. All of them assume that we are working with traditional, null-terminated C-style strings.

strip_inplace, strip_inplace_1

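#include <ctype.h>   // isspace
#include <string.h>  // strlen

char *strip_inplace(char *string) {
    // isspace expects a value representable as unsigned char, hence the casts
    char *new_start = string;
    while(isspace((unsigned char) *new_start))
        new_start++;

    // End of string - the string was empty or composed entirely of whitespace
    if(*new_start == '\0') return new_start;

    char *e = string + strlen(string) - 1;
    while(isspace((unsigned char) *e))
        e--;

    // Terminate the string right after the last non-whitespace character
    e[1] = '\0';

    return new_start;
}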

strip_inplace strips the passed in string by modifying it, well, in-place. To strip leading whitespace, it returns a pointer to the string’s first non-whitespace character. To strip trailing whitespace, it replaces the first byte past the last non-whitespace character in the string with a null byte, indicating end of string.

Proper use of this function looks like so:

/* returns pointer to some heap-allocated, null-terminated string */
char *str = get_some_string();
char *str_stripped = strip_inplace(str);

If you are responsible for freeing the memory of the original string, be careful not to overwrite the pointer to the original string, as you will need to call free on it, not on the one returned by strip_inplace!
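To make the ownership rule concrete, here is a minimal sketch. The helper strips_to is hypothetical and exists only for illustration: it copies a string to the heap, strips the copy in place, and then frees the pointer that malloc returned rather than the one strip_inplace returned. The strip_inplace shown is the same logic as above, repeated so the snippet compiles on its own.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Same logic as strip_inplace above, repeated so this snippet is self-contained. */
char *strip_inplace(char *string) {
    char *new_start = string;
    while (isspace((unsigned char) *new_start))
        new_start++;

    if (*new_start == '\0')
        return new_start;

    char *e = new_start + strlen(new_start) - 1;
    while (isspace((unsigned char) *e))
        e--;

    e[1] = '\0';
    return new_start;
}

/* Hypothetical helper: copies `text` to the heap, strips the copy in place,
 * and reports whether the result equals `expected`. */
int strips_to(const char *text, const char *expected) {
    size_t size = strlen(text) + 1;
    char *original = malloc(size);
    if (!original)
        return 0;
    memcpy(original, text, size);

    /* `stripped` may point into the middle of the allocated block */
    char *stripped = strip_inplace(original);
    int ok = strcmp(stripped, expected) == 0;

    /* Free the pointer malloc returned, never `stripped` */
    free(original);
    return ok;
}
```

Calling free(stripped) instead would be undefined behavior whenever the string had leading whitespace, because stripped would then no longer be the address malloc handed out.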

void strip_inplace_1(char *string, char **ret) {
    char *new_start = string;
    while(isspace((unsigned char) *new_start))
        new_start++;

    // End of string - the string was empty or composed entirely of whitespace
    if(*new_start == '\0') {
        *ret = new_start;
        return;
    }

    char *e = string + strlen(string) - 1;
    while(isspace((unsigned char) *e))
        e--;

    e[1] = '\0';

    *ret = new_start;
}

strip_inplace_1 does the exact same thing, except that it yields the pointer to the start of the stripped string differently - instead of performing a regular return, it stores that pointer through a passed-in pointer-to-pointer. Thus, proper use of the function may look like so:

char *str = get_some_string();
char *str_stripped;

strip_inplace_1(str, &str_stripped);

strip_alloc

#include <stdlib.h>  // malloc

char *strip_alloc(const char *string) {
    size_t original_len = strlen(string);

    const char *new_start = string;
    while(isspace((unsigned char) *new_start)) {
        new_start++;
    }

    // Empty or all-whitespace string: return a freshly allocated empty string
    if(*new_start == '\0') {
        char *buf = malloc(1);
        if (!buf) return buf;
        *buf = '\0';
        return buf;
    }

    const char *e = string + original_len - 1;
    while(isspace((unsigned char) *e)) {
        e--;
    }

    // Nothing to strip: hand back the original string.
    // The cast drops const, which the char * return type requires.
    if(new_start == string && e == (string + original_len - 1))
        return (char *) string;

    size_t stripped_len = (size_t) (e - new_start) + 1;
    char *buf = malloc(stripped_len + 1);
    if(!buf) return buf;
    memcpy(buf, new_start, stripped_len);

    // null-terminate the buffer
    buf[stripped_len] = '\0';

    return buf;
}

strip_alloc strips the passed-in string and stores the result in a new, heap-allocated buffer, returning its address. The caller is then responsible for freeing the memory returned by this function. If the passed-in string was already stripped (i.e. it didn’t need any stripping), the function simply returns a pointer to the same passed-in string, and nothing is allocated. Thus, proper use of the function should look like so:

char *str = get_some_string();
char *str_stripped = strip_alloc(str);

/* ... */

if (str_stripped != str) {
    free(str_stripped);
}

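What to return when the string needs no stripping is a design decision that can be tweaked. We could, for example, choose to return a NULL pointer instead, although NULL already signals an allocation failure here, so the caller could no longer tell the two cases apart.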

As with the first implementation, the pointer could instead be returned by writing it through a passed-in pointer, but I’ll spare you the entire definition, as it is almost entirely the same and, after all, just a matter of taste.
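For the curious, one possible shape of such a variant is sketched below. The name strip_alloc_1 and the int status code are assumptions made for illustration, not part of the implementations above. Unlike strip_alloc, this sketch always allocates, even when nothing needs stripping. That is a deliberate deviation: the out-parameter frees up the return value for an error code, and always allocating lets the caller call free unconditionally.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: stores a newly allocated, stripped copy of `string`
 * in *ret. Returns 0 on success, or -1 (with *ret set to NULL) if malloc fails. */
int strip_alloc_1(const char *string, char **ret) {
    const char *start = string;
    while (isspace((unsigned char) *start))
        start++;

    /* `end` points one past the last character that is kept */
    const char *end = start + strlen(start);
    while (end > start && isspace((unsigned char) end[-1]))
        end--;

    size_t len = (size_t) (end - start);
    char *buf = malloc(len + 1);
    if (!buf) {
        *ret = NULL;
        return -1;
    }

    memcpy(buf, start, len);
    buf[len] = '\0';

    *ret = buf;
    return 0;
}
```

The price is one extra allocation and copy for strings that were already stripped, which the original strip_alloc avoids.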

strip_buf

char *strip_buf(const char *string, char *buffer, size_t bufsize) {
    // Nothing can be written, not even the terminator
    if(bufsize == 0)
        return buffer;

    const char *new_start = string;
    while(isspace((unsigned char) *new_start))
        new_start++;

    if(*new_start == '\0')
        return memset(buffer, 0, bufsize);

    const char *end = string + strlen(string) - 1;
    while(isspace((unsigned char) *end))
        end--;

    // Length of the stripped string, excluding the terminator
    size_t len = (size_t) (end - new_start) + 1;

    if(bufsize < len + 1) {
        // Truncate, leaving room for the terminator
        memcpy(buffer, new_start, bufsize - 1);
        buffer[bufsize - 1] = '\0';
    } else {
        memcpy(buffer, new_start, len);
        // Pad the rest of the buffer with null bytes
        memset(buffer + len, 0, bufsize - len);
    }

    return buffer;
}

strip_buf strips the passed-in string and stores the result in the user-provided buffer. It writes at most bufsize bytes, including the terminating null byte. If the buffer isn’t large enough to accommodate the entire stripped string, the result is truncated. If the buffer is larger than the stripped string, the rest of it is padded with null bytes. The function returns the pointer to the passed-in buffer.
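The truncation and padding rules are easier to see with concrete buffers. The sketch below repeats a condensed strip_buf so that it compiles on its own. The bufsize == 0 guard and the demo_strip_buf function are assumptions added for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Condensed strip_buf with the behavior described above. */
char *strip_buf(const char *string, char *buffer, size_t bufsize) {
    if (bufsize == 0)
        return buffer;   /* assumption of this sketch: write nothing at all */

    const char *new_start = string;
    while (isspace((unsigned char) *new_start))
        new_start++;

    if (*new_start == '\0')
        return memset(buffer, 0, bufsize);

    const char *end = new_start + strlen(new_start) - 1;
    while (isspace((unsigned char) *end))
        end--;

    size_t len = (size_t) (end - new_start) + 1;
    if (len >= bufsize)
        len = bufsize - 1;                    /* truncate, keep room for '\0' */

    memcpy(buffer, new_start, len);
    memset(buffer + len, 0, bufsize - len);   /* terminator plus padding */
    return buffer;
}

/* Hypothetical demo: returns 1 if both calls behave as the prose describes. */
int demo_strip_buf(void) {
    char small[4];
    char large[8];
    memset(large, 'x', sizeof large);

    strip_buf("  hello  ", small, sizeof small);  /* truncated to "hel" */
    strip_buf("  hi ", large, sizeof large);      /* "hi" plus six null bytes */

    return strcmp(small, "hel") == 0
        && strcmp(large, "hi") == 0
        && large[7] == '\0';
}
```

Note that the caller has no way of telling from the return value whether truncation happened. Reporting it would require a different return type, which is yet another design decision.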

Returning the pointer to the passed-in buffer is a design decision that at first may not seem that interesting or important. However, it enables the caller to do the following:

char *str = get_some_string();
char str_stripped[128];

process_string(strip_buf(str, str_stripped, 128));

If the strip_buf function weren’t returning any value, the above snippet of code would then have to look like:

char *str = get_some_string();
char str_stripped[128];

strip_buf(str, str_stripped, 128);
process_string(str_stripped);

Some might argue that the longer version is more explicit and therefore more readable, but when the stripping function returns the pointer back, it gives us the freedom to write the code either way.

strip_len

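size_t strip_len(const char *string, const char **ptr) {
    const char *start = string;
    while (isspace((unsigned char) *start)) {
        start++;
    }

    // Empty or all-whitespace string: the stripped length is zero.
    // Without this check, the backwards scan below would run past
    // the start of the string.
    if (*start == '\0') {
        *ptr = start;
        return 0;
    }

    const char *end = string + strlen(string) - 1;
    while (isspace((unsigned char) *end)) {
        end--;
    }

    *ptr = start;
    return (size_t) (end - start + 1);
}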

strip_len does not modify the passed-in string, does not fill any memory, and does no allocations. It stores the pointer to the first non-whitespace character of the string at *ptr and returns the count of characters from that character up to and including the last non-whitespace character of the string. Though neither value is useful on its own, together they represent a stripped version of the passed-in string. Because the original string is left untouched, the stripped region is not null-terminated at that length, so the length must accompany the pointer wherever it is used.
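One way to consume such a pointer and length pair is printf’s %.*s conversion, which prints at most the given number of bytes and therefore needs no terminator at the end of the stripped region. The sketch below repeats a compact strip_len so that it compiles on its own. The helper format_stripped is hypothetical, and the out-parameter is declared const char ** here so that const is not discarded.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Compact strip_len: stores the start of the stripped region in *ptr
 * and returns its length. Returns 0 for empty or all-whitespace input. */
size_t strip_len(const char *string, const char **ptr) {
    const char *start = string;
    while (isspace((unsigned char) *start))
        start++;

    /* `end` points one past the last character that is kept */
    const char *end = start + strlen(start);
    while (end > start && isspace((unsigned char) end[-1]))
        end--;

    *ptr = start;
    return (size_t) (end - start);
}

/* Hypothetical helper: writes the stripped string, wrapped in brackets,
 * into `out` and returns snprintf's result. */
int format_stripped(const char *string, char *out, size_t outsize) {
    const char *start;
    size_t len = strip_len(string, &start);

    /* The precision argument of %.*s must be an int */
    return snprintf(out, outsize, "[%.*s]", (int) len, start);
}
```

The same pair also works with memcpy or fwrite, or anything else that takes an explicit length. It does not work with functions such as strcmp or puts, which rely on the terminator and would see the trailing whitespace too.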