Reserved word

In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is "reserved from use". This is a syntactic definition, and a reserved word may have no user-defined meaning.

A closely related and often conflated notion is a keyword, which is a word with special meaning in a particular context. This is a semantic definition. By contrast, names in a standard library but not built into a language are not considered reserved words or keywords. The terms "reserved word" and "keyword" are often used interchangeably – one may say that a reserved word is "reserved for use as a keyword" – and formal use varies from language to language. For this article, we distinguish as above.

In general, reserved words and keywords need not coincide, but in most modern languages keywords are a subset of reserved words, as this makes parsing easier, since keywords cannot be confused with identifiers. In some languages, like C or Python, reserved words and keywords coincide, while in other languages, like Java, all keywords are reserved words, but some reserved words are not keywords, being reserved for future use. In yet other languages, such as the older languages ALGOL, FORTRAN, and PL/I, there are keywords but no reserved words, with keywords being distinguished from identifiers by other means.

Distinction

The sets of reserved words and keywords in a language often coincide or are almost equal, and the distinction is subtle, so the terms are often used interchangeably. However, in careful use they are distinguished.

Making keywords be reserved words makes lexing easier, as a string of characters will unambiguously be either a keyword or an identifier, without depending on context; thus keywords are usually a subset of reserved words. However, reserved words need not be keywords. For example, in Java, goto is a reserved word, but has no meaning and does not appear in any production rules in the grammar. This is usually done for forward compatibility, so a reserved word may become a keyword in a future version without breaking existing programs.

Conversely, keywords need not be reserved words, with their role understood from context, or they may be distinguished in another manner, such as by stropping. For example, the phrase if = 1 is unambiguous in most grammars, since the clause following an if keyword cannot start with an =, and thus it is allowed in some languages, such as FORTRAN. Alternatively, in ALGOL 68, keywords must be stropped – marked in some way to distinguish them – in the strict language by being set in bold, and thus are not reserved words. Thus in the strict language the following expression is legal, as the bold keyword if does not conflict with the ordinary identifier if (the bold keywords are shown here in upper case):

IF if EQ 0 THEN 1 FI

However, in ALGOL 68 there is also a stropping regime in which keywords are reserved words, an example of how these distinct concepts often coincide; this is followed in many modern languages.

Syntax

A reserved word is one that "looks like" a normal word, but is not allowed to be used as a normal word. Formally this means that it satisfies the usual lexical syntax (syntax of words) of identifiers – for example, being a sequence of letters – but cannot be used where identifiers are used. For example, the word if is commonly a reserved word, while x generally is not, so x = 1 is a valid assignment, but if = 1 is not.
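As a rough illustration, Python (chosen here only as an example of a language in which if is reserved) accepts the first assignment and rejects the second. The helper name is_valid_statement is hypothetical; it simply wraps the built-in compile function.

```python
def is_valid_statement(source: str) -> bool:
    """Return True if `source` parses as Python code, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_statement("x = 1"))   # True
print(is_valid_statement("if = 1"))  # False
```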

Keywords have varied uses, but mainly fall into a few classes: part of the phrase grammar (specifically a production rule with nonterminal symbols), with various meanings, often being used for control flow, such as the word if in most procedural languages, which indicates a conditional and takes clauses (the nonterminal symbols); names of primitive types in a language that supports a type system, such as int; primitive literal values such as true for Boolean true; or sometimes special commands like exit. Other uses of keywords in phrases are for input/output, such as print.

The distinct definitions are clear when a language is analyzed by a combination of a lexer and a parser, and the syntax of the language is generated by a lexical grammar for the words, and a context-free grammar of production rules for the phrases. This is common in analyzing modern languages, and in this case keywords are a subset of reserved words, as they must be distinguished from identifiers at the word level (hence reserved words) to be syntactically analyzed differently at the phrase level (as keywords).

In this case reserved words are defined as part of the lexical grammar, and are each tokenized as a separate type, distinct from identifiers. In conventional notation, the reserved words if and then for example are tokenized as types IF and THEN, respectively, while x and y are both tokenized as type Identifier.
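The tokenization described above can be sketched as follows. This is a minimal toy lexer, not taken from any particular language: the reserved-word list, the token type names Number and Operator, and the function name tokenize are all assumptions made for illustration.

```python
import re

# Hypothetical reserved-word list for a toy language.
RESERVED = {"if", "then", "else"}

# Lexical syntax: words are runs of letters, numbers are runs of digits.
TOKEN_RE = re.compile(r"\s*(?:(?P<word>[A-Za-z]+)|(?P<number>\d+)|(?P<op>[=<>+\-*/]))")


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split `source` into (token_type, text) pairs."""
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("word"):
            word = match.group("word")
            # A reserved word gets its own token type; any other word is an Identifier.
            kind = word.upper() if word in RESERVED else "Identifier"
            tokens.append((kind, word))
        elif match.group("number"):
            tokens.append(("Number", match.group("number")))
        else:
            tokens.append(("Operator", match.group("op")))
    return tokens


print(tokenize("if x then y"))
# [('IF', 'if'), ('Identifier', 'x'), ('THEN', 'then'), ('Identifier', 'y')]
```

Because the decision is made purely from the spelling of the word, the lexer never needs to know where in a phrase the word occurs.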

Keywords, by contrast, syntactically appear in the phrase grammar, as terminal symbols. For example, the production rule for a conditional expression may be IF Expression THEN Expression. In this case IF and THEN are terminal symbols, meaning "a token of type IF or THEN, respectively" – and due to the lexical grammar, this means the string if or then in the original source. As an example of a primitive constant value, true may be a keyword representing the boolean value "true", in which case it should appear in the grammar as a possible expansion of the production BinaryExpression, for instance.
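A recursive-descent sketch may make the role of terminal symbols concrete. The toy grammar below (an Expression that is a conditional, the literal TRUE, or an Identifier) and the names parse_expression and Token are assumptions for illustration only; the input is a list of already-tokenized (type, text) pairs.

```python
# Tokens are (token_type, text) pairs, as a lexer might produce them.
Token = tuple[str, str]


def parse_expression(tokens: list[Token], pos: int = 0):
    """Parse one Expression starting at tokens[pos].

    Toy grammar (hypothetical):
        Expression -> IF Expression THEN Expression
                    | TRUE
                    | Identifier
    Returns (syntax_tree, next_position).
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[pos]
    if kind == "IF":
        condition, pos = parse_expression(tokens, pos + 1)
        # The terminal THEN must follow the condition.
        if pos >= len(tokens) or tokens[pos][0] != "THEN":
            raise SyntaxError("expected THEN")
        body, pos = parse_expression(tokens, pos + 1)
        return ("conditional", condition, body), pos
    if kind == "TRUE":
        return ("literal", True), pos + 1
    if kind == "Identifier":
        return ("name", text), pos + 1
    raise SyntaxError(f"unexpected token {text!r}")


tokens = [("IF", "if"), ("TRUE", "true"), ("THEN", "then"), ("Identifier", "x")]
tree, _ = parse_expression(tokens)
print(tree)  # ('conditional', ('literal', True), ('name', 'x'))
```

A token of type THEN where an Expression is expected is rejected, which is how a reserved word ends up unusable as an identifier at the phrase level.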

Reserved ranges

Beyond reserving specific lists of words, some languages reserve entire ranges of words, for use as private spaces for future language versions, different dialects, compiler vendor-specific extensions, or for internal use by a compiler, notably in name mangling.

This is most often done by using a prefix, often one or more underscores. C and C++ are notable in this respect: C99 reserves identifiers that start with two underscores or an underscore followed by an uppercase letter, and further reserves identifiers that start with a single underscore (in the ordinary and tag spaces) for use in file scope;[1] C++03 further reserves identifiers that contain a double underscore anywhere[2] – this allows an implementation to use a double underscore as a separator (to connect user identifiers), for instance.
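The prefix rules described above can be expressed as simple predicates. The sketch below encodes only the rules as stated in this section, restricted to ASCII identifiers; the function names are hypothetical, and the standards themselves remain the authority on details.

```python
import re

# Identifier syntax shared by C and C++ (ASCII subset only).
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def _check_identifier(name: str) -> None:
    if not _IDENTIFIER_RE.match(name):
        raise ValueError(f"not an identifier: {name!r}")


def is_reserved_everywhere_c99(name: str) -> bool:
    """True if `name` starts with `__` or with `_` plus an uppercase letter."""
    _check_identifier(name)
    return name.startswith("__") or (
        len(name) >= 2 and name[0] == "_" and name[1].isupper()
    )


def is_reserved_file_scope_c99(name: str) -> bool:
    """True if `name` starts with an underscore (reserved at file scope)."""
    _check_identifier(name)
    return name.startswith("_")


def is_reserved_cpp03(name: str) -> bool:
    """The always-reserved prefixes, plus a double underscore anywhere."""
    return is_reserved_everywhere_c99(name) or "__" in name


print(is_reserved_everywhere_c99("_Thread_local"))  # True
print(is_reserved_everywhere_c99("_count"))         # False
print(is_reserved_file_scope_c99("_count"))         # True
print(is_reserved_cpp03("my__name"))                # True
```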

The frequent use of double underscores in internal identifiers in Python gave rise to the abbreviation dunder; this was coined by Mark Jackson[3] and independently by Tim Hochberg,[4] within minutes of each other, both in reply to the same question in 2002.[5][6]

Specification

The lists of reserved words and keywords in a language are defined when a language is developed, and both form part of a language's formal specification. Generally one wishes to minimize the number of reserved words, to avoid restricting valid identifier names. Further, introducing a new reserved word breaks existing programs that use that word as an identifier (it is not backwards compatible), so this is avoided. To prevent this and provide forward compatibility, sometimes words are reserved without having a current use (a reserved word that is not a keyword), as this allows the word to be used in future without breaking existing programs. Alternatively, new language features can be implemented as predefineds, which can be overridden, thus not breaking existing programs.

Reasons for flexibility include allowing compiler vendors to extend the specification by including non-standard features, different standard dialects of the language to extend it, or future versions of the language to include additional features. For example, a procedural language may anticipate adding object-oriented capabilities in a future version or some dialect, at which point one might add keywords like class or object. To accommodate this possibility, the current specification may make these reserved words, even if they are not currently used.

A notable example is in Java, where const and goto are reserved words – they have no meaning in Java but they also cannot be used as identifiers. By reserving the terms, they can be implemented in future versions of Java, if desired, without breaking older Java source code. For example, there was a proposal in 1999 to add C++-like const to the language, which was possible using the word const, since it was reserved but currently unused; however, this proposal was rejected – notably because even though adding the feature would not break any existing programs, using it in the standard library (notably in collections) would break compatibility.[7] JavaScript also contains a number of reserved words without special functionality; the exact list varies by version and mode.[8]

Languages differ significantly in how frequently they introduce new reserved words or keywords and in how they name them. Some languages are very conservative and introduce new keywords rarely or never, to avoid breaking existing programs, while others introduce new keywords more freely, requiring existing programs to rename conflicting identifiers. A case study is given by new keywords in C11 compared with C++11, both from 2011 – recall that in C and C++, identifiers that begin with an underscore followed by an uppercase letter are reserved:[9]

The C committee prefers not to create new keywords in the user name space, as it is generally expected that each revision of C will avoid breaking older C programs. By comparison, the C++ committee (WG21) prefers to make new keywords as normal‐looking as the old keywords. For example, C++11 defines a new thread_local keyword to designate static storage local to one thread. C11 defines the new keyword as _Thread_local. In the new C11 header <threads.h>, there is a macro definition to provide the normal‐looking name:[10]

#define thread_local _Thread_local

That is, C11 introduced the keyword _Thread_local within an existing set of reserved words (those with a certain prefix), and then used a separate facility (macro processing) to allow its use as if it were a new keyword without any prefix, while C++11 introduced the keyword thread_local despite this not being an existing reserved word, breaking any programs that used it as an identifier, but without requiring macro processing.

Predefined names

A notion related to reserved words is that of predefined functions, methods, subroutines, types, or variables, particularly library routines from the standard library. These are similar in that they are part of the basic language, and may be used for similar purposes. However, they differ in that the name of one of these entities is typically categorized as an identifier instead of a reserved word, and is not treated specially in the syntactic analysis. Further, reserved words may not be redefined by the programmer, but predefineds can often be overridden for the extent of some scope.

Languages vary as to what is provided as a keyword and what is a predefined. Some languages, for instance, provide keywords for input/output operations whereas in others these are library routines. In Python (versions earlier than 3.0) and many BASIC dialects, print is a keyword. In contrast, the C, Lisp, and Python 3.0 equivalents printf, format, and print are functions in the standard library. Similarly, in Python prior to 3.0, None, True, and False were predefined variables, but not reserved words, but in Python 3.0 they were made into reserved words.[11]
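The difference between a keyword and a predefined name can be observed directly in Python 3 using the standard keyword and builtins modules. The function names shadow_len and can_assign_to below are hypothetical and serve only to demonstrate overriding a predefined name within one scope and attempting to rebind a name.

```python
import builtins
import keyword

# In Python 3, print is a predefined (builtin) function, not a keyword.
print(keyword.iskeyword("print"))  # False
print(hasattr(builtins, "print"))  # True

# None, True, and False are keywords in Python 3.
print([keyword.iskeyword(w) for w in ("None", "True", "False")])  # [True, True, True]


def shadow_len(items):
    """Override the predefined name len for the extent of this function only."""
    len = lambda _: -1  # local rebinding; the builtin is untouched elsewhere
    return len(items)


def can_assign_to(name: str) -> bool:
    """Return True if `name = 1` is accepted by the Python parser."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(shadow_len([1, 2, 3]))   # -1
print(len([1, 2, 3]))          # 3
print(can_assign_to("print"))  # True
print(can_assign_to("None"))   # False
```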

Definition

Some use the terms "keyword" and "reserved word" interchangeably, while others distinguish usage, say by using "keyword" to mean a word that is special only in certain contexts but "reserved word" to mean a special word that cannot be used as a user-defined name. The meaning of keywords, and the meaning of the notion of keyword, differs widely from language to language. Concretely, in ALGOL 68, keywords are stropped (in the strict language, written in bold) and are not reserved words – the unstropped word can be used as an ordinary identifier.

The "Java Language Specification" uses the term "keyword".[12] The ISO 9899 standard for the C language uses the term "keyword".[13]

In many languages, such as C and similar environments like C++, a keyword is a reserved word which identifies a syntactic form. Words used in control flow constructs, such as if, then, and else, are keywords. In these languages, keywords cannot also be used as the names of variables or functions.

In some languages, such as ALGOL and ALGOL 68, keywords cannot be written verbatim, but must be stropped. This means that keywords must be marked somehow, e.g. by quoting them or by prefixing them with a special character. As a consequence, keywords are not reserved words, and thus the same word can be used as a normal identifier. However, one stropping regime was to not strop the keywords, and instead have them simply be reserved words.

Some languages, such as PostScript, are extremely liberal in this approach, allowing core keywords to be redefined for specific purposes.

In Common Lisp, the term "keyword" (or "keyword symbol") is used for a special sort of symbol, or identifier. Unlike other symbols, which usually stand for variables or functions, keywords are self-quoting and self-evaluating[14]:98 and are interned in the KEYWORD package.[15] Keywords are usually used to label named arguments to functions, and to represent symbolic values. The symbols which name functions, variables, special forms and macros in the package named COMMON-LISP are basically reserved words. The effect of redefining them is undefined in ANSI Common Lisp.[16] Binding them is possible. For instance the expression (if if case or) is possible, when if is a local variable. The leftmost if refers to the if operator; the remaining symbols are interpreted as variable names. Since there is a separate namespace for functions and variables, if could be a local variable. In Common Lisp, however, there are two special symbols which are not in the keyword package: the symbols t and nil. When evaluated as expressions, they evaluate to themselves. They cannot be used as the names of functions or variables, so are de facto reserved. (let ((t 42))) is a well-formed expression, but the let operator will not permit the usage.

Typically, when a programmer attempts to use a keyword for a variable or function name, a compilation error will be triggered. In most modern editors, the keywords are automatically set to have a particular text colour to remind or inform the programmers that they are keywords.

In languages with macros or lazy evaluation, control flow constructs such as if can be implemented as macros or functions. In languages without these expressive features, they are generally keywords.
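Python has neither macros nor lazy evaluation, but the idea can be imitated by passing each branch as a zero-argument function, so that only the chosen branch is evaluated. The names if_ and safe_reciprocal are hypothetical, and this is only a sketch of the technique, not how any particular language implements it.

```python
def if_(condition, then_branch, else_branch):
    """A function-based conditional; branches are zero-argument callables.

    Python evaluates arguments eagerly, so each branch is wrapped in a callable
    to delay its evaluation, imitating what lazy evaluation provides directly.
    """
    # Index a tuple with the truth value instead of using the if keyword.
    chosen = (else_branch, then_branch)[bool(condition)]
    return chosen()


def safe_reciprocal(x):
    # The division is only performed when x is nonzero.
    return if_(x != 0, lambda: 1 / x, lambda: float("inf"))


print(safe_reciprocal(4))  # 0.25
print(safe_reciprocal(0))  # inf
```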

Comparison by languages

Different languages often have widely varying numbers of reserved words. For example, COBOL has about 400. Java, and other C derivatives, have a rather sparse set, about 50. Pure Prolog and PL/I have none.

Disadvantages

Defining reserved words in a language raises several problems. The language may be difficult for new users to learn because of a long list of reserved words to memorize, none of which can be used as identifiers. It may be difficult to extend the language because addition of reserved words for new features might invalidate existing programs or, conversely, "overloading" of existing reserved words with new meanings can be confusing. Porting programs can be problematic because a word not reserved by one system or compiler might be reserved by another.

Because reserved words cannot be used as identifiers, users may choose deliberate misspellings of reserved words as identifiers instead, such as clazz for Java variables of type Class.[17]

Reserved words and language independence

Microsoft's .NET Common Language Infrastructure (CLI) specification allows code written in 40+ different programming languages to be combined into a final product. Because of this, identifier/reserved word collisions can occur when code implemented in one language tries to execute code written in another language. For example, a Visual Basic (.NET) library may contain a class definition such as:

' Class Definition of This in Visual Basic.NET:

Public Class this
        ' This class does something...
End Class

If this is compiled and distributed as part of a toolbox, a C# programmer wishing to define a variable of type "this" would encounter a problem: "this" is a reserved word in C#. Thus, the following will not compile in C#:

// Using This Class in C#:

this x = new this();  // Won't compile!

A similar issue arises when accessing members, overriding virtual methods, and identifying namespaces.

This is resolved by stropping: the specification allows placing (in C#) an at-sign before the identifier, which forces the compiler to treat it as an identifier rather than a reserved word:

// Using This Class in C#:

@this x = new @this();  // Will compile!

For consistency, this use is also permitted in non-public settings such as local variables, parameter names, and private members.
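A tool that generates C# source from names defined in another language could apply this escaping automatically. The following Python sketch assumes a deliberately incomplete subset of C# reserved words, and the names escape_csharp_identifier and declare_instance are hypothetical.

```python
# A small, deliberately incomplete subset of C# reserved words, for illustration.
CSHARP_RESERVED_SUBSET = frozenset({"this", "class", "new", "if", "else", "namespace"})


def escape_csharp_identifier(name: str) -> str:
    """Prefix `name` with @ if it collides with a (listed) C# reserved word."""
    return "@" + name if name in CSHARP_RESERVED_SUBSET else name


def declare_instance(type_name: str, variable: str) -> str:
    """Emit a C# declaration such as `@this x = new @this();`."""
    escaped_type = escape_csharp_identifier(type_name)
    escaped_var = escape_csharp_identifier(variable)
    return f"{escaped_type} {escaped_var} = new {escaped_type}();"


print(declare_instance("this", "x"))       # @this x = new @this();
print(declare_instance("Widget", "item"))  # Widget item = new Widget();
```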

References

  1. C99 specification, 7.1.3 Reserved identifiers
  2. C++03 specification, 17.4.3.2.1 Global names [lib.global.names]
  3. Jackson, Mark (September 26, 2002). "How do you pronounce "__" (double underscore)?". python-list (Mailing list). Retrieved November 9, 2014.
  4. Hochberg, Tim (September 26, 2002). "How do you pronounce "__" (double underscore)?". python-list (Mailing list). Retrieved November 9, 2014.
  5. "DunderAlias - Python Wiki". wiki.python.org.
  6. Notz, Pat (September 26, 2002). "How do you pronounce "__" (double underscore)?". python-list (Mailing list). Retrieved November 9, 2014.
  7. "Bug ID: JDK-4211070 Java should support const parameters (like C++) for code maintainence [sic]". Bugs.sun.com. Retrieved November 4, 2014.
  8. "Lexical grammar - JavaScript | MDN". developer.mozilla.org. November 8, 2023.
  9. C99 specification, 7.1.3 Reserved identifiers: "All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use."
  10. Plum, Thomas. C11: The New C Standard, "A Note on Keywords".
  11. van Rossum, Guido (November 10, 2013). "The story of None, True and False (and an explanation of literals, keywords and builtins thrown in)". The History of Python.
  12. "The Java Language Specification, 3rd Edition, Section 3.9: Keywords". Sun Microsystems. 2000. Retrieved June 17, 2009. The following character sequences, formed from ASCII letters, are reserved for use as keywords and cannot be used as identifiers[...]
  13. "ISO/IEC 9899:TC3, Section 6.4.1: Keywords" (PDF). International Organization for Standardization JTC1/SC22/WG14. September 7, 2007. The above tokens (case sensitive) are reserved (in translation phases 7 and 8) for use as keywords, and shall not be used otherwise.
  14. Norvig, Peter (1991). Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp. Morgan Kaufmann. ISBN 1-55860-191-0.
  15. Type KEYWORD from the Common Lisp HyperSpec
  16. "CLHS: Section 11.1.2.1.2". www.lispworks.com.
  17. Zammetti, Frank (2007). Practical JavaScript, DOM Scripting and Ajax Projects. Apress. ISBN 9781430201977.