C Strings

There is no separate string type in C. Strings are treated as a character array, that is, a char type array. For example, the string “Hello” is treated as an array {'H', 'e', 'l', 'l', 'o'}.

The compiler allocates a contiguous block of memory for the array, so all characters are stored in adjacent memory cells. At the end of a string literal, the compiler automatically appends a null character, written \0, which marks the end of the string.

The character \0 is different from the character 0: the character code of the former is 0 (binary 00000000), while the ASCII code of the latter is 48 (binary 00110000). The array actually stored for the string "Hello" is therefore {'H', 'e', 'l', 'l', 'o', '\0'}.

The advantage is that C does not need to know the length of a string in advance to read it from memory: as soon as it reaches a \0 character, it knows that the string has ended.

char localString[10];

The above example declares a character array of 10 members that can be treated as a string. Since one position must be left for \0, it can only hold a string of at most 9 characters.

Writing strings as arrays can be cumbersome, and C provides a shortcut where the characters inside double quotes are automatically treated as character arrays.

{'H', 'e', 'l', 'l', 'o', '\0'}

// Equivalent to

"Hello"

The two forms above are equivalent and are stored in memory in the same way. For a string inside double quotes, you don't need to add the terminating \0 yourself; the C compiler adds it automatically.

Note that double quotes enclose strings and single quotes enclose single characters; the two are not interchangeable. If you put Hello inside single quotes, the compiler will report a warning or an error, because a character constant is meant to hold one character.

On the other hand, even if there is only one character inside double quotes (e.g. "a"), it is still a string (occupying 2 bytes: the character plus \0) rather than the single character 'a' (which occupies 1 byte when stored in a char).

If the string contains double quotes inside, the double quotes need to be escaped with a backslash.

"She replied, \"It does.\""

The backslash can also indicate other special characters, such as line break (\n), tab (\t), etc.

"Hello, world!\n"

If the string is too long, you can break a line into multiple lines by using a backslash (\) at the end where you need to break the line.

"hello \
world"

Declaration of String

A string variable can be declared as an array of characters, or as a pointer to the first character of a string.

char s[14] = "Hello, world!";

Or

char* s = "Hello, world!";

The length of the character array can be omitted from the declaration, because the compiler can calculate it from the initializer.

char s[] = "Hello, world!";

The character array can also be longer than the string it holds.

char s[50] = "hello";

The length of the character array s is 50, but the string "hello" needs only 6 elements (including the terminating \0), so the remaining 44 elements are initialized to \0.

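The character array must be large enough to hold the string together with its terminating \0.

char s[5] = "hello";

Here the array s has 5 elements, but the string "hello" needs 6 (including \0). C accepts this particular initialization, but no \0 is stored, so s is not a valid string and must not be passed to functions such as strlen() or printed with %s. If the array is even smaller than the number of characters (for example char s[4]), the compiler is required to diagnose it, typically with a warning or error saying that the initializer string is too long.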

Differences between the two string declaration methods

The two ways of declaring a string variable, as a character pointer and as a character array, are used in much the same way, but there are two differences.

The first difference is that a pointer initialized from a string literal points at the literal itself, which C treats as read-only: the string must not be modified through the pointer.

char* s = "Hello, world!";
s[0] = 'z'; // undefined behavior, often a crash at run time

If you use arrays to declare string variables, you won’t have this problem and can modify any member of the array.

char s[] = "Hello, world!";
s[0] = 'z';

Why can’t a string be modified when it is declared as a pointer, but can be modified when it is declared as an array?

The reason is that implementations typically store string literals in a read-only (constant) area of memory, and the C standard leaves the result of modifying a literal undefined. When the variable is declared as a pointer, the value stored in it is an address inside that area, so the string must not be modified through this address.

However, when the variable is declared as an array, the compiler allocates separate memory for the array and copies the characters of the literal into it one by one. This newly allocated memory belongs to the array and can be modified.

To remind the user that a string cannot be modified after being declared as a pointer, you can use the const specifier in the declaration to ensure that the string is read-only.

const char* s = "Hello, world!";

The second difference is that pointer variables can point to other strings.

char* s = "hello";
s = "world";

However, a character array variable cannot point to another string.

char s[] = "hello";
s = "world"; // error

In the above example, the array name s always refers to the memory allocated for the array at initialization, and it cannot be made to refer to anything else.

Why can’t an array variable be assigned to another array?

The reason is that the address of an array variable cannot be changed. Once the compiler assigns an address to an array variable, that address is permanently bound to it. The C language specifies that an array is not a modifiable lvalue, i.e., it cannot appear on the left side of the assignment operator.

If you want to give the array a new value, use the strcpy() function from the C standard library (declared in string.h), which assigns by copying the string. Afterwards the address of the array remains unchanged: strcpy() writes the new string at the original address instead of making the array variable point to a new address.

char s[10];
strcpy(s, "abc");

strlen()

The strlen() function returns the length of a string in bytes, excluding the null character \0 at the end. The prototype of this function is as follows:

// string.h
size_t strlen(const char* s);

Its argument is a string, and it returns an unsigned integer of type size_t. Unless the string is extremely long, the result is often simply stored in an int.

Here is an example:

char* str = "hello";
int len = strlen(str); // 5

The prototype of strlen() is declared in the standard library header string.h, which must be included before the function is used.

#include <stdio.h>
#include <string.h>

int main(void) {
  char* s = "Hello, world!";
  // %zu is the conversion specifier for size_t
  printf("The string is %zu characters long.\n", strlen(s));
}

Note that the length of a string (strlen()) and the size of a string variable (sizeof) are two different concepts.

char s[50] = "hello";
printf("%zu\n", strlen(s));  // 5
printf("%zu\n", sizeof(s));  // 50

If you don't use this function, you can calculate the length of a string yourself by scanning for the \0 at its end.

// Counts characters up to, but not including, the terminating \0
size_t my_strlen(const char *s) {
  size_t count = 0;
  while (s[count] != '\0')
    count++;
  return count;
}

strcpy()

Using the assignment operator to assign a string directly to a character array variable is not allowed.

char str1[10];
char str2[10];

str1 = "abc"; // error
str2 = str1;  // error

Both of the above methods of copying strings are wrong. This is because the variable name of an array is a fixed address and cannot be modified to make it point to another address.

In the case of character pointers, the assignment operator (=) simply copies the address held by one pointer into another; it does not copy the string.

char* s1;
char* s2;

s1 = "abc";
s2 = s1;

The above code is valid and results in two pointer variables s1 and s2 pointing to the same string, instead of copying the contents of string s1 to s2.

C provides the strcpy() function for copying the contents of one string to another, which is equivalent to string assignment. The prototype of this function is defined inside the string.h header file.

char* strcpy(char dest[], const char source[]);

strcpy() accepts two arguments: the first is the destination character array, and the second is the source string. Before copying, you must ensure that the destination array is large enough to hold the source string, including its terminating \0. Otherwise, although no error is reported, the copy writes past the end of the destination array and the result is unpredictable. The const specifier on the second parameter indicates that the function does not modify the source string.

#include <stdio.h>
#include <string.h>

int main(void) {
  char s[] = "Hello, world!";
  char t[100];

  strcpy(t, s);
  t[0] = 'z';

  printf("%s\n", s);  // "Hello, world!"
  printf("%s\n", t);  // "zello, world!"
}

The above example copies the contents of s into t, producing two separate strings, so modifying one does not affect the other. In addition, the array t is longer than s; the elements of t after the copied end marker \0 are left uninitialized and hold indeterminate values.

strcpy() can also be used to assign values to character arrays.

char str[10];
strcpy(str, "abcd");

The above example assigns the string “abcd” to a character array variable.

The return value of strcpy() is a pointer to the string of the first argument (i.e. char*).

The strcpy() function is a safety risk because it does not check whether the target string is long enough to hold a copy of the source string, which could lead to a write overflow. If there is no guarantee that an overflow will not occur, it is recommended that the strncpy() function be used instead.

strncpy()

strncpy() works like strcpy(), except that it takes a third argument specifying the maximum number of characters to copy, which prevents writing past the end of the destination array.

char* strncpy(
  char* dest,
  const char* src,
  size_t n
);

strncpy(str1, str2, sizeof(str1) - 1);
str1[sizeof(str1) - 1] = '\0';

In the above example, the string str2 is copied into str1, but at most sizeof(str1) - 1 characters are copied.

The last element of str1 is reserved for the end marker \0. This is necessary because strncpy() does not add \0 when the source string has n or more characters, so if the copied fragment does not include the terminator, you need to add it manually.

strcat()

The strcat() function is used to concatenate strings. It accepts two strings as arguments and adds a copy of the second string to the end of the first string. This function changes the first string, but leaves the second string unchanged.

The prototype of this function is defined in the string.h header file.

char* strcat(char* s1, const char* s2);

The return value of strcat() is a string pointer to the first argument.

char s1[12] = "hello";
char s2[6] = "world";

strcat(s1, s2);
puts(s1); // "helloworld"

In the above example, after calling strcat(), you can see that the value of the string s1 has changed.

Note that the first argument to strcat() must refer to an array large enough to hold the combined string, including its \0. Otherwise, the concatenated string overflows the bounds of the first array and is written into adjacent memory, which is dangerous. In that case it is recommended to use strncat(), described below, instead.

strncat()

strncat() is also used to concatenate two strings. It works like strcat(), but takes a third parameter that specifies the maximum number of characters to append. Appending stops once that number of characters has been added or the null character \0 is reached in the source string, whichever comes first. Its prototype is declared in the string.h header file.

char* strncat(
  char* dest,
  const char* src,
  size_t n
);

strncat() returns the first argument, which is a pointer to the destination string.

To ensure that the string being concatenated does not exceed the length of the target string, strncat() is usually written as follows.

strncat(
  str1,
  str2,
  sizeof(str1) - strlen(str1) - 1
);

strncat() always automatically adds the null character \0 at the end of the splice result, so the maximum value of the third argument should be the length of the variable str1 minus the length of the string str1 minus 1.

Here is an example:

char s1[10] = "Monday";
char s2[8] = "Tuesday";

strncat(s1, s2, 3);
puts(s1); // "MondayTue"

The size of the array s1 is 10 and the length of the string in it is 6. Subtracting one from the other and then subtracting 1 for the terminator gives 10 - 6 - 1 = 3. This means that s1 has room for at most three more characters, so the result is MondayTue.

strcmp()

The strcmp() function is used to compare the contents of two strings. The prototype of this function is as follows and is defined in the string.h header file.

int strcmp(const char* s1, const char* s2);

The strings are compared in dictionary order, character by character, using their character codes. If the two strings are the same, the return value is 0; if s1 is less than s2, the return value is less than 0; if s1 is greater than s2, the return value is greater than 0.

Here is an example:

char* s1 = "Happy New Year";
char* s2 = "Happy New Year";
char* s3 = "Happy Holidays";

strcmp(s1, s2) // 0
strcmp(s1, s3) // greater than 0
strcmp(s3, s1) // less than 0

Note that strcmp() is only used to compare strings, not single characters.

strncmp()

Since strcmp() compares entire strings, C provides another function, strncmp(), that compares only the first n characters.

This function adds a third argument specifying the number of characters to be compared. Its prototype is defined in the string.h header file.

int strncmp(
  const char* s1,
  const char* s2,
  size_t n
);

Its return value has the same meaning as that of strcmp(). If the compared portions are the same, the return value is 0; if s1 is less than s2, strncmp() returns a value less than 0; if s1 is greater than s2, the return value is greater than 0.

Here is an example:

char s1[12] = "hello world";
char s2[12] = "hello C";

if (strncmp(s1, s2, 5) == 0) {
  printf("They all have hello.\n");
}

The above example compares only the first 5 characters of two strings.

Array of Strings in C

If each member of an array is a string, a two-dimensional array of characters is needed to implement it. Each string is itself a character array, and then multiple strings are formed into one array.

char weekdays[7][10] = {
  "Monday",
  "Tuesday",
  "Wednesday",
  "Thursday",
  "Friday",
  "Saturday",
  "Sunday"
};

The above example is an array of strings containing a total of 7 strings, so the length of the first dimension is 7. The length of the longest string is 10 (including the terminator \0), so the length of the second dimension is set to 10 uniformly.
