Programming languages


Programming languages

The different notations used to communicate algorithms to a computer. A computer executes a sequence of instructions (a program) in order to perform some task. In spite of much written about computers being electronic brains or having artificial intelligence, it is still necessary for humans to convey this sequence of instructions to the computer before the computer can perform the task. The set of instructions and the order in which they have to be performed is known as an algorithm. The result of expressing the algorithm in a programming language is called a program. The process of writing the algorithm using a programming language is called programming, and the person doing this is the programmer. See Algorithm

In order for a computer to execute the instructions indicated by a program, the program needs to be stored in the primary memory of the computer. Each instruction of the program may occupy one or more memory locations. Instructions are stored as a sequence of binary numbers (sequences of zeros and ones), where each number may indicate the instruction to be executed (the operator) or the pieces of data (operands) on which the instruction is carried out. Instructions that the computer can understand directly are said to be written in machine language. Programmers who design computer algorithms have difficulty in expressing the individual instructions of the algorithm as a sequence of binary numbers. To alleviate this problem, people who develop algorithms may choose a programming language. Since the language used by the programmer and the language understood by the computer are different, another computer program called a compiler translates the program written in a programming language into an equivalent sequence of instructions that the computer is able to understand and carry out. See Computer storage technology

Machine language

For the first machines in the 1940s, programmers had no choice but to write in the sequences of digits that the computer executed. For example, assume we want to compute the absolute value of A + B - C, where A is the value at machine address 3012, B is the value at address 3013, and C is the value at address 3014, and then store this value at address 3015.

It should be clear that programming in this manner is difficult and fraught with errors. Explicit memory locations must be written, and it is not always obvious if simple errors are present. For example, at location 02347, writing 101… instead of 111… would compute |A + B + C| rather than what was desired. This is not easy to detect.

Assembly language

Since each component of a program stands for an object that the programmer understands, using its name rather than numbers should make it easier to program. By naming all locations with easy-to-remember names, and by using symbolic names for machine instructions, some of the difficulties of machine programming can be eliminated. A relatively simple program called an assembler converts this symbolic notation into an equivalent machine language program.

The symbolic nature of assembly language greatly eased the programmer's burden, but programs were still very hard to write. Mistakes were still common. Programmers were forced to think in terms of the computer's architecture rather than in the domain of the problem being solved.

High-level language

The first programming languages were developed in the late 1950s. The concept was that if we want to compute |A + B - C|, and store the result in a memory location called D, all we had to do was write D = |A + B - C| and let a computer program, the compiler, convert that into the sequences of numbers that the computer could execute. FORTRAN (an acronym for Formula Translation) was the first major language in this period.

FORTRAN statements were patterned after mathematical notation. In mathematics the = symbol implies that both sides of the equation have the same value. However, in FORTRAN and some other languages, the equal sign is known as the assignment operator. The action carried out by the computer when it encounters this operator is, “Make the variable named on the left of the equal sign have the same value as the expression on the right.” Because of this, in some early languages the statement would have been written as -DD to imply movement or change, but the use of → as an assignment operator has all but disappeared.

The compiler for FORTRAN converts that arithmetic statement into an equivalent machine language sequence. In this case, we did not care what addresses the compiler used for the instructions or data, as long as we could associate the names A, B, C, and D with the data values we were interested in.

Structure of programming languages

Programs written in a programming language contain three basic components: (1) a mechanism for declaring data objects to contain the information used by the program; (2) data operations that provide for transforming one data object into another; (3) an execution sequence that determines how execution proceeds from start to finish.

Data declarations

Data objects can be constants or variables. A constant always has a specific value. Thus the constant 42 always has the integer value of forty-two and can never have another value. On the other hand, we can define variables with symbolic names. The declaration of variable A as an integer informs the compiler that A should be given a memory location much like the way the variable A in example (2) was given the machine address 03012. The program is given the option of changing the value stored at this memory location as the program executes.

Each data object is defined to be of a specific type. The type of a data object is the set of values the object may have. Types can generally be scalar or aggregate. An object declared to be a scalar object is not divisible into smaller components, and generally it represents the basic data types executable on the physical computer. In a data declaration, each data object is given a name and a type. The compiler will choose what machine location to assign for the declared name.

Data operations

Data operations provide for setting the values into the locations allocated for each declared data variable. In general this is accomplished by a three-step process: a set of operators is defined for transforming the value of each data object, an expression is written for performing several such operations, and an assignment is made to change the value of some data object.

For each data type, languages define a set of operations on objects of that type. For the arithmetic types, there are the usual operations of addition, subtraction, multiplication, and division. Other operations may include exponentiation (raising to a power), as well as various simple functions such as modula or remainder (when dividing one integer by another). There may be other binary operations involving the internal format of the data, such as binary and, or, exclusive or, and not functions. Usually there are relational operations (for example, equal, not equal, greater than, less than) whose result is a boolean value of true or false. There is no limit to the number of operations allowed, except that the programming language designer has to decide between the simplicity and smallness of the language definition versus the ease of using the language.

Execution sequence

The purpose of a program is to manipulate some data in order to produce an answer. While the data operations provide for this manipulation, there must be a mechanism for deciding which expressions to execute in order to generate the desired answer. That is, an algorithm must trace a path through a series of expressions in order to arrive at an answer. Programming languages have developed three forms of execution sequencing: (1) control structures for determining execution sequencing within a procedure; (2) interprocedural communication between procedures; and (3) inheritance, or the automatic passing of information between two procedures.

Corrado Böhm and Giuseppi Jacopini showed in 1966 that a programming language needs only three basic statements for control structures: an assignment statement, an IF statement, and a looping construct. Anything else can simplify programming a solution, but is not necessary. If we add an input and an output statement, we have all that we need for a programming language. Languages execute statements sequentially with the following variations to this rule.

IF statement. Most languages include the IF statement. In the IF-THEN statement, the expression is evaluated, and if the value is true, then Statement1 is executed next. If the value is false, then the statement after the IF statement is the next one to execute. The IF-THEN-ELSE statement is similar, except that specific true and false options are given to execute next. After executing either the THEN or ELSE part, the statement following the IF statement is the next one to execute.

The usual looping constructs are the WHILE statement and the REPEAT statement. Although only one is necessary, languages usually have both.

Inheritance is the third major form of execution sequencing. In this case, information is passed automatically between program segments. This is the basis for the models used in the object-oriented languages C++ and Java.

Inheritance involves the concept of a class object. There are integer class objects, string class objects, file class objects, and so forth. Data objects are instances of these class objects. Objects inherit the properties of the objects from which they were created. Thus, if an integer object were designed with the methods (that is, functions) of addition and subtraction, each instance of an integer object would inherit those same functions. One would only need to develop these operations once and then the functionality would pass on to the derived object.

All objects are derived from one master object called an Object. An Object is the parent class of objects such as magnitude, collection, and stream. Magnitude now is the parent of objects that have values, such as numbers, characters, and dates. Collections can be ordered collections such as an array or an unordered collection such as a set. Streams are the parent objects of files. From this structure an entire class hierarchy can be developed.

If we develop a method for one object (for example, print method for object), then this method gets inherited to all objects derived from that object. Therefore, there is not the necessity to always define new functionality. If we create a new class of integer that, for example, represents the number of days in a year (from 1 to 366), then this new integerlike object will inherit all of the properties of integers, including the methods to add, subtract, and print values. It is this concept that has been built into C++, Java, and current object-oriented languages.

Once we build concepts around a class definition, we have a separate package of functions that are self-contained. We are able to sell that package as a new functionality that users may be willing to pay for rather than develop themselves. This leads to an economic model where companies can build add-ons for existing software, each add-on consisting of a set of class definitions that becomes inherited by the parent class. See Object-oriented programming

Current programming language models

C was developed by AT&T Bell Laboratories during the early 1970s. At the time, Ken Thompson was developing the UNIX operating system. Rather than using machine or assembly language as in (2) or (3) to write the system, he wanted a high-level language. See Operating system

C has a structure like FORTRAN. A C program consists of several procedures, each consisting of several statements, that include the IF, WHILE, and FOR statements. However, since the goal was to develop operating systems, a primary focus of C was to include operations that allow the programmer access to the underlying hardware of the computer. C includes a large number of operators to manipulate machine language data in the computer, and includes a strong dependence on reference variables so that C programs are able to manipulate the addressing hardware of the machine.

C++ was developed in the early 1980s as an extension to C by Bjarne Stroustrup at AT&T Bell Labs. Each C++ class would include a record declaration as well as a set of associated functions. In addition, an inheritance mechanism was included in order to provide for a class hierarchy for any program.

By the early 1990s, the World Wide Web was becoming a significant force in the computing community, and web browsers were becoming ubiquitous. However, for security reasons, the browser was designed with the limitation that it could not affect the disk storage of the machine it was running on. All computations that a web page performed were carried out on the web server accessed by web address (its Uniform Resource Locator, or URL). That was to prevent web pages from installing viruses on user machines or inadvertently (or intentionally) destroying the disk storage of the user.

Java bears a strong similarity to C++, but has eliminated many of the problems of C++. The three major features addressed by Java are:

1. There are no reference variables, thus no way to explicitly reference specific memory locations. Storage is still allocated by creating new class objects, but this is implicit in the language, not explicit.

2. There is no procedure call statement; however, one can invoke a procedure using the member of class operation. A call to CreateAddress for class address would be encoded as address.CreateAddress( ).

3. A large class library exists for creating web-based objects.

The Java bytecodes (called applets) are transmitted from the web server to the client web site and then execute. This saves transmission time as the executing applet is on the user's machine once it is downloaded, and it frees machine time on the server so it can process more web “hits” effectively. See Client-server system

Visual Basic, first released in 1991, grew out of Microsoft's GW Basic product of the 1980s. The language was organized around a series of events. Each time an event happened (for example, mouse click, pulling down a menu), the program would respond with a procedure associated with that event. Execution happens in an asynchronous manner.

Although Prolog development began in 1970, its use did not spread until the 1980s. Prolog represents a very different model of program execution, and depends on the resolution principle and satisfaction of Horn clauses of Robert A. Kowalski at the University of Edinburgh. That is, a Prolog statement is of the form p:- q, r which means p is true if both q is true or r is true.

A Prolog program consists of a series Horn clauses, each being a sequence of relations concerning data in a database. Execution proceeds sequentially through these clauses. Each relation can invoke another Horn clause to be satisfied. Evaluation of a relation is similar to returning a procedure value in imperative languages such as C or C++.

Unlike the other languages mentioned, Prolog is not a complete language. That means there are algorithms that cannot be programmed in Prolog. However, for problems that are amenable for searching large databases, Prolog is an efficient mechanism for describing those algorithms. See Software, Software engineering