Why the C strlen function is considered unsafe

In the world of programming, writing secure and efficient code is a constant challenge, especially when working with low-level languages like C. C provides developers with powerful tools, but with great power comes great responsibility. Among these tools is the strlen function, a staple of string manipulation in C. While seemingly simple and straightforward, improper use of strlen can lead to unexpected behavior, performance issues, and even critical security vulnerabilities. This article explores the potential pitfalls of using strlen, why it’s considered unsafe in certain situations, and how developers can adopt safer practices to protect their applications.

Why can using strlen be risky?

In C, the strlen function is a part of the standard library (<string.h>) and is used to determine the length of a null-terminated string. It starts at the beginning of the string and iterates through each character until it encounters the null-terminator (\0). The total number of characters encountered before the null-terminator is the string’s length.

#include <string.h>
size_t strlen(const char *str);
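
To make the scanning behavior concrete, here is a minimal sketch of how such a function could be written. The name my_strlen is hypothetical and used only for illustration; real library implementations are usually far more optimized, but the contract is the same.

```c
#include <stddef.h>

/* Hypothetical re-implementation, for illustration only. */
size_t my_strlen(const char *str)
{
    const char *p = str;

    /* Walk forward until the terminating '\0' is found.
       There is no length limit: if the terminator is missing,
       the loop keeps reading past the end of the array. */
    while (*p != '\0') {
        p++;
    }

    /* Number of characters before the terminator. */
    return (size_t)(p - str);
}
```

Note that the function receives only a pointer, so it has no way of knowing how large the underlying buffer is.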

But what if the string does not have a null terminator? Let’s start with a simple example:

#include <stdio.h>
#include <string.h>

int main() {
    const char text[] = "Hello, World!";
    size_t length = strlen(text);

    printf("The length of the string is: %zu\n", length);
    printf("The size of the string is: %zu\n", sizeof(text));
    return 0;
}

When you run this example, you will see two values printed in the terminal. The first is the length of the "Hello, World!" string, and the second is the size of the array in memory.

The length of the string is: 13
The size of the string is: 14

We didn’t put a null character at the end of the string, but it works as expected. The length is correct and the size is 1 byte larger, because the compiler automatically appends a null terminator to a string literal. So where is the risk? Let’s make a copy using dynamic allocation.

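// continues main() from the previous example; needs #include <stdlib.h>
// create a buffer bigger than text
char *buffer = malloc(length * 2);
if (buffer == NULL) {
    return 1;
}
// fill it with non-zero data
memset(buffer, 0xFF, length * 2);
printf("%zu bytes of memory allocated\n", length * 2);

// copy the characters, but NOT the null terminator
memcpy(buffer, text, length);
// undefined behavior: buffer contains no null terminator
size_t length_copy = strlen(buffer);

printf("The length of the text copy in buffer: %zu\n", length_copy);
free(buffer);

The output may look like this:

26 bytes of memory allocated
The length of the text copy in buffer: 26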

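In this situation strlen iterates past the copied text, because memcpy copied only the 13 characters and not the null terminator. Every byte of the buffer is non-zero, so strlen keeps reading beyond the end of the allocation, which is undefined behavior. The reported value depends on whatever happens to lie in memory after the buffer, and the program could just as well crash. Either way, the length of the copy is wrong. We filled the buffer on purpose with memset, but the same thing happens in practice, because memory returned by malloc is uninitialized and may contain garbage values.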

What can you do about it?

The first and most common solution is to make sure that the buffer (or the allocated memory a pointer refers to) is zeroed, that is, filled with null bytes, and at least one byte larger than the string length to make room for the null terminator.
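
A minimal sketch of this first approach, assuming the copy lives in freshly allocated memory. The helper name copy_into_zeroed_buffer is hypothetical; calloc is used here because it returns zero-filled memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of text, or NULL on failure.
   The caller is responsible for calling free() on the result. */
char *copy_into_zeroed_buffer(const char *text)
{
    size_t length = strlen(text);

    /* calloc zero-fills the allocation; the extra byte is the terminator. */
    char *buffer = calloc(length + 1, 1);
    if (buffer == NULL) {
        return NULL;
    }

    /* Copy only the characters; buffer[length] is already '\0'. */
    memcpy(buffer, text, length);
    return buffer;
}
```

Because the last byte is never overwritten, a later strlen call on the returned buffer stops at the right place.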

The second option, which is useful when you can’t clear the buffer, for example because you don’t want to lose other data stored after the copy, is to write the null terminator explicitly right after the copied characters.
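
The second approach could look roughly like the sketch below. The helper copy_and_terminate and its parameters are hypothetical; it assumes src points to at least n readable bytes and that dst has room for dst_size bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most n bytes from src into dst
   (capacity dst_size) and always writes a null terminator.
   Returns the number of characters copied, excluding the terminator. */
size_t copy_and_terminate(char *dst, size_t dst_size,
                          const char *src, size_t n)
{
    if (dst_size == 0) {
        return 0;               /* no room even for the terminator */
    }
    if (n > dst_size - 1) {
        n = dst_size - 1;       /* truncate so the terminator still fits */
    }

    memcpy(dst, src, n);
    dst[n] = '\0';              /* explicit terminator right after the copy */
    return n;
}
```

Only the byte directly after the copied text is modified, so any data stored further along in the buffer is left untouched.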

The third option is to use the safer strnlen_s function. The strnlen_s function is a bounded version of the traditional strlen function in C. It is part of the bounds-checking interfaces described in Annex K of the C11 standard. It takes an additional argument, the maximum number of characters to examine, so it never reads beyond the specified buffer size. If no null terminator is found within that limit, it returns the limit itself, and if the pointer is null, it returns 0. Note that Annex K is optional, so not every C library provides strnlen_s; on POSIX systems, strnlen offers similar bounded behavior.

size_t strnlen_s( const char* str, size_t strsz );
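
Since strnlen_s may not be available in every C library, the following sketch shows how a similar bounded check could be built from the standard memchr function. The name bounded_strlen is hypothetical, and this is an approximation of the behavior described above, not the library's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bounded length check.
   Returns 0 for a null pointer, the string length if a terminator is
   found within the first max_size bytes, and max_size otherwise. */
size_t bounded_strlen(const char *str, size_t max_size)
{
    if (str == NULL) {
        return 0;
    }

    /* memchr examines at most max_size bytes. */
    const char *end = memchr(str, '\0', max_size);

    return end != NULL ? (size_t)(end - str) : max_size;
}
```

A return value equal to max_size signals that no terminator was found inside the buffer, which the caller can treat as an error instead of trusting the length.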

Summary

The strlen function in C is a widely used standard library function for determining the length of a null-terminated string. While it seems straightforward, its use can lead to subtle bugs and security vulnerabilities if not handled carefully. The primary issues stem from its lack of bounds checking, its reliance on proper null termination, and the potential for undefined behavior when used with invalid or untrusted input. Be careful when using it, especially when copying strings or working with strings of unknown origin. Where it is available, consider the safer strnlen_s function introduced in Annex K of the C11 standard.
