The term string most often refers to a sequence of characters (a character string, or simply a string). It can also refer to a sequence of bytes (a byte string or binary string), which is the meaning used in some contexts, such as bencoding. In source code, string values are written as string literals.
Formal theory
A string over an alphabet A (a finite set) is a finite sequence of elements of A. The alphabet can be any finite set of symbols: 0 and 1 for bit strings, the letters A to Z for strings of Latin letters, or the four nucleotide letters A, T, G, and C for DNA sequences.
The length of a string c is the number of elements in it, usually denoted by |c|.
The null string or empty string, usually denoted by λ (or ε), is the unique string that contains no elements; its length is 0.
The concatenation of two strings c and d, denoted by cd, is a string consisting of the elements of c followed by the elements of d, in order.
Formally, a substring of a string a is a string d such that there exist strings c and e with a = cde. Informally, d is a substring of a when the elements of d appear in a consecutively and in the same order.
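The formal definition can be checked directly by searching for a decomposition a = cde. The following is a brute-force sketch in Python, meant only to mirror the definition; the function name `is_substring` is a hypothetical helper, and in practice Python's built-in `in` operator answers the same question.

```python
def is_substring(d: str, a: str) -> bool:
    """Return True if a = c + d + e for some strings c and e."""
    # Try every possible length for the prefix c.
    for i in range(len(a) - len(d) + 1):
        c = a[:i]
        e = a[i + len(d):]
        if c + d + e == a:
            return True
    return False


print(is_substring("ell", "hello"))   # True: c = "h", e = "o"
print(is_substring("", "hello"))      # True: c = "", e = "hello"
print(is_substring("leh", "hello"))   # False: no decomposition exists
```

Note that the empty string is a substring of every string, since c or e may themselves be empty.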
The set of all strings of length n over a finite set A is denoted by A^n. The set of all strings over A, of any length, is denoted by A* and is called the Kleene closure of A; it is countably infinite whenever A is non-empty. The set A* excluding the empty string is denoted by A+.
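As an illustration, the set of all strings of length n over a small alphabet can be enumerated with the standard library function `itertools.product`. This is a sketch only; the function name `strings_of_length` is a hypothetical helper, and the alphabet is passed as a Python string of distinct symbols.

```python
from itertools import product


def strings_of_length(alphabet: str, n: int) -> list:
    """Return every string of length n over the given alphabet."""
    return ["".join(symbols) for symbols in product(alphabet, repeat=n)]


bits = "01"
print(strings_of_length(bits, 2))          # ['00', '01', '10', '11']
print(strings_of_length(bits, 0))          # [''] (only the empty string)
print(len(strings_of_length("ACGT", 3)))   # 64, that is 4 ** 3
```

A set of this kind has |A| ** n elements, so it is finite for every n, while the union over all n (the set A*) is infinite.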
Representation as data
This section deals mainly with the representation of strings of text. There are many implementations of character strings. Characters are mapped to numeric character codes by a set of rules called a character encoding.
The most direct implementation is an array in which each element holds the character code of a single character.
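The array view can be sketched in Python, where `ord` gives the code of a character and `chr` goes back the other way. Python's own `str` type is not implemented as a plain list of integers; the list here only illustrates the idea.

```python
text = "Hi!"

# Encode: one integer character code per character.
codes = [ord(ch) for ch in text]
print(codes)                            # [72, 105, 33]

# Decode: turn each code back into its character and join them.
print("".join(chr(c) for c in codes))   # Hi!

# Indexing the array gives the code of the character at that position.
print(codes[1], chr(codes[1]))          # 105 i
```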
Another implementation marks the end of the string with a terminator, a reserved value that cannot occur inside the string. The most common form is the null-terminated string, also called a C string after the C language. The characters are stored in a fixed-size array and followed by the null character (the ASCII NUL control code, which has the value zero). This way the string can have any length up to the array size minus one, without the programmer having to resize the array whenever the string changes.
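The layout can be simulated in Python with a fixed-size `bytearray` standing in for the C array. This is a sketch of the idea rather than real C code, and the helper names `store_c_string` and `read_c_string` are hypothetical.

```python
def store_c_string(buffer: bytearray, text: bytes) -> None:
    """Copy text into buffer and terminate it with a NUL byte."""
    if len(text) > len(buffer) - 1:
        raise ValueError("text does not fit: one byte is reserved for NUL")
    if 0 in text:
        raise ValueError("text must not contain the NUL byte")
    buffer[:len(text)] = text
    buffer[len(text)] = 0


def read_c_string(buffer: bytearray) -> bytes:
    """Return the bytes before the first NUL byte."""
    end = buffer.index(0)   # scanning for the terminator is linear in the length
    return bytes(buffer[:end])


buf = bytearray(8)              # fixed array of 8 bytes, all zero initially
store_c_string(buf, b"cat")
print(read_c_string(buf))       # b'cat'
store_c_string(buf, b"horse")   # a longer string reuses the same array
print(read_c_string(buf))       # b'horse'
store_c_string(buf, b"ox")
print(read_c_string(buf))       # b'ox' (stale bytes after the NUL are ignored)
```

Two consequences are visible in the sketch: finding the length requires scanning for the terminator, and the string itself cannot contain the NUL value.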
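Strings can also store an explicit length. An older implementation of this idea is the Pascal string, named after the programming language Pascal, in which a length field at the front of the string (traditionally occupying the space of one character) records how many characters follow. With this method the string can contain only as many characters as the largest value the length field can hold, for example 255 when the field is a single byte.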
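A length-prefixed layout can be sketched in Python as follows, assuming a one-byte length field. The helper names `to_pascal_string` and `from_pascal_string` are hypothetical.

```python
def to_pascal_string(text: bytes) -> bytes:
    """Prefix text with a single length byte."""
    if len(text) > 255:
        raise ValueError("a one-byte length prefix cannot exceed 255")
    return bytes([len(text)]) + text


def from_pascal_string(data: bytes) -> bytes:
    """Read the length byte, then take exactly that many bytes."""
    length = data[0]
    return data[1:1 + length]


packed = to_pascal_string(b"hello")
print(packed)                       # b'\x05hello'
print(packed[0])                    # 5, the length is read without scanning
print(from_pascal_string(packed))   # b'hello'
```

Unlike the null-terminated layout, the length is available immediately and the string may contain any byte value, at the cost of a fixed upper bound on the length.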
Common operations
Common methods associated with string data are:
- Character at index - returns the character at a specified index. If the string is implemented as an array, or can be converted to one, this is the same as reading the array element at that index. Some libraries provide it through a general list interface.
- Compare - compares two strings. The function returns zero if the strings are equal; otherwise it typically returns a negative value if the first string comes first lexicographically and a positive value if the second does.
- Relational operators - compare two strings. The comparison returns a Boolean value that indicates whether the two strings satisfy the relation (equal to, not equal to, less than, greater than, less than or equal to, or greater than or equal to) lexicographically.
- Concatenate - concatenates two or more strings (how many depends on the implementation).
- Length - returns the length of the string.
- Left substring - returns a substring with a specified length, starting from the beginning of the string.
- Right substring - returns a substring with a specified length, taken from the end of the string.
- Substring - returns a substring with a specified length starting from the specified index.
- Find character in string - returns the index of the first character in a string that matches a specified character.
- Last find character in string - returns the index of the last character in a string that matches a specified character.
- Split - splits a string into a list of substrings at each occurrence of one of a given set of separator strings.
- Join - concatenates a list of strings into a single string, separated by a separator string.
- Convert to lowercase - converts all characters in the string to lowercase.
- Convert to uppercase - converts all characters in the string to uppercase.
- Replace - replaces every occurrence of a substring a in a string with another string b.
- Trim - removes whitespace, or characters from a specified set, from the beginning, the end, or both ends of a string.
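As an illustration, most of the operations listed above map onto built-in features of Python's `str` type. The sketch below is one possible mapping, not the only one. Python has no built-in three-way string comparison, so the `compare` function is a hypothetical helper, and Python orders strings by code point rather than by any locale-specific rule.

```python
def compare(a: str, b: str) -> int:
    """Three-way compare: negative, zero, or positive."""
    return (a > b) - (a < b)


s = "  Hello, World  "
t = s.strip()                       # Trim: "Hello, World"

print(t[4])                         # Character at index: o
print(compare("apple", "banana"))   # Compare: -1
print("apple" < "banana")           # Relational operator: True
print("foo" + "bar")                # Concatenate: foobar
print(len(t))                       # Length: 12
print(t[:5])                        # Left substring: Hello
print(t[-5:])                       # Right substring: World
print(t[7:7 + 3])                   # Substring (index 7, length 3): Wor
print(t.find("o"))                  # Find character: 4
print(t.rfind("o"))                 # Last find character: 8
print(t.split(", "))                # Split: ['Hello', 'World']
print("-".join(["a", "b", "c"]))    # Join: a-b-c
print(t.lower())                    # Lowercase: hello, world
print(t.upper())                    # Uppercase: HELLO, WORLD
print(t.replace("l", "L"))          # Replace: HeLLo, WorLd
```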
See also