Variable/Type

From Jonathan Gardner's Tech Wiki

Static vs. Dynamic

Some languages are statically typed, while others are dynamically typed.

Common Types

Here is a list of the common types.

Type Mismatch

The discussion below should be read in the context of the Type Mismatch bug. This bug occurs when a value of the wrong type is passed to a piece of code. In human languages this isn't generally a problem (we try to make things fit, even going so far as to allow metaphor), but it is a problem for computers because they are too simple and have no imagination whatsoever.

There are two important aspects of the Type Mismatch bug in terms of programming:

  • When can we see that there is a type mismatch?
  • What can we do about it?

Some typing systems identify type mismatches earlier than others, some as early as compile time. Such languages even make it possible for plugins in your favorite text editor to identify a type mismatch before you compile the code. Other typing systems delay identification until the very moment in program execution when the mismatch occurs.

Concerning what to do about it, there are a few options. On the one hand, the code can throw an exception or log an error. On the other hand, it can try to convert types or make things work anyway.
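The two responses can be sketched in Python. The function names strict_add and lenient_add are hypothetical, chosen only for this illustration; the first refuses a mismatch, the second tries to make things work by converting.

```python
def strict_add(a, b):
    """Refuse mismatched types by raising an exception."""
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError(f"cannot add {type(a).__name__} and {type(b).__name__}")
    return a + b


def lenient_add(a, b):
    """Try to make things work by converting both arguments to float."""
    return float(a) + float(b)


print(lenient_add("2", 3))   # the string "2" is converted before adding
try:
    strict_add("2", 3)
except TypeError as exc:
    print("rejected:", exc)
```

Which response is right depends on the caller: the lenient version hides the mismatch, while the strict version forces someone to deal with it.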

Static Typing

Statically typed languages allow only certain types of values to be stored in certain variables. Once this has been determined, it can never change. (That's what 'static' means --- never changing.)

There are several reasons for this:

  • If the compiler knows what type every variable is, values do not need to carry their type with them at runtime, so the language can avoid strong typing and use a weak typing system.
  • There are certain optimizations the compiler can make if it knows what type the variable is. For instance, integer addition is different from float addition. If the compiler knows that two numbers are ints, it can simply insert an integer addition instruction rather than branching based on the type of the variables.
  • Certain kinds of bugs can be exposed more easily if type analysis is done. Type analysis is an examination of the code to see if the variables match the function signatures. For instance, if a function expects an integer, and you try to pass a string in, this would be a bug. The benefits of this are evident in large projects. Smaller projects, where the programmer can be familiar with all the components, do not benefit much from this.

If you look at languages historically, statically typed languages dominated the scene until computing power increased sufficiently that dynamically typed languages became possible. Programmers tend to prefer to program in dynamically typed languages because they do not need to work so hard to make everything work.

Explicit Static Typing

Some languages, such as Java and C, require that the type of each variable be declared explicitly. Every variable declaration must state what the type is. This is cumbersome and a source of bugs. I do not prefer this style of programming. I find it tedious, distracting, and ultimately harmful.

Implicit Static Typing

Languages such as Haskell can infer the type of each variable based on how it is used. Simple type analysis will reveal if the variable is being used in a wrong way. For instance, if a variable holds the result of a function that returns an int, and the variable is passed into a function that requires floats, then the compiler will see that there is a type mismatch.

Dynamic Typing

Dynamically typed languages allow any value for any variable. This is equivalent to declaring all variables as type "any" in a statically typed language.

The main benefit of dynamic typing is convenience. When we talk to each other in human languages, the things we refer to do not fit in any typing system well, and so our languages are not bound by a strict typing system. This is how we think and live our lives. Computers, of course, live a completely different kind of life, so there is an impedance mismatch when translating our languages into computer languages.

Dynamic typing rules out type analysis before the program runs. Some argue (myself included) that this is a good thing. After all, any thought that we can express that can be analyzed by a computer could've been written by a computer. We are not computers. There are always going to be areas where human languages beat computer languages, and we would do well to teach the computer to understand more and more of human language rather than confining ourselves to arbitrary limits.

Dynamic typing also always leaves open the possibility of a new type that the programmer has not considered. This is a form of future-proofing. For instance, if I write a math routine that uses basic math to compose things, even though I am completely ignorant of Complex Numbers or Quaternions or Octonions, the code should function when those values are passed in, provided, of course, that all of the operations are supported. Now, this could be a problem, particularly with values where the rules of algebra for real numbers do not apply, but the point is valid nonetheless. By leaving the typing system open, all code is inherently ready to adapt to types added in the future.
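As a small illustration in Python, here is a routine written with only ordinary numbers in mind. The function name square_plus_one is hypothetical; complex and Fraction are standard Python types used here as stand-ins for "types the author never considered".

```python
from fractions import Fraction


def square_plus_one(x):
    # Uses only multiplication and addition; never checks the type of x.
    return x * x + 1


print(square_plus_one(3))               # int
print(square_plus_one(0.5))             # float
print(square_plus_one(2 + 1j))          # complex
print(square_plus_one(Fraction(1, 2)))  # exact rational
```

The routine works for any value that supports multiplication and addition with an int, which is exactly the "provided the operations are supported" caveat.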

Of note, inheritance is a form of dynamic typing. In the statically typed language C++, for instance, every object of a class with virtual functions carries a pointer to a vtable that points the way to the right function depending on the runtime type of the object, which is a hallmark of a dynamic typing system.

Dynamic Dispatch

When you have a dynamically typed language, you need some form of dynamic dispatch to perform operations with the types. For instance, in a dynamic language, the addition operator could have the following behavior:

  • If it is a basic type (int, float, etc.), then perform the appropriate form of addition.
  • If it is not a basic type, then check to see if an "add" override method is defined. If so, call it.
  • If no override is defined, then attempt to convert the value into a basic type and then perform the operation.
  • Failing all of the above, throw an exception indicating that no addition operation exists for that type.

All of the above must occur at runtime. It is simply impossible for the program to know what the type is until it is confronted with the operation.
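The four steps can be sketched by hand in Python. This is only an illustration of the idea, not how any particular language implements its operators (Python itself dispatches through methods such as __add__). The names dynamic_add and Money, and the choice of float as the conversion target, are assumptions made for the example.

```python
class Money:
    """A non-basic type that supplies its own "add" override."""

    def __init__(self, cents):
        self.cents = cents

    def add(self, other):
        return Money(self.cents + other.cents)


def dynamic_add(a, b):
    basic = (int, float)
    # Step 1: basic types use the built-in addition.
    if isinstance(a, basic) and isinstance(b, basic):
        return a + b
    # Step 2: look for an "add" override on the left operand.
    override = getattr(a, "add", None)
    if callable(override):
        return override(b)
    # Step 3: try converting both values to a basic type.
    try:
        return float(a) + float(b)
    except (TypeError, ValueError):
        pass
    # Step 4: nothing worked, so report the failure.
    raise TypeError(f"no addition for {type(a).__name__} and {type(b).__name__}")


print(dynamic_add(1, 2))
print(dynamic_add(Money(100), Money(250)).cents)
print(dynamic_add("1.5", 2))
```

Every branch is decided while the program runs, by inspecting the values that actually arrived.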

Duck Typing

Dynamic dispatch naturally leads to a system called Duck Typing. In a nutshell:

  • If it looks like a duck
  • If it quacks like a duck
  • If it waddles like a duck
  • ... then for all intents and purposes, it's a duck.

The way you write code under duck typing is you don't check for the types of objects. You just assume that you were given something of the proper type and run with it. If you try to perform an operation on the object and it's not possible, then you can throw an exception reading something like, "This is not a duck because it doesn't quack".
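A minimal Python sketch of this style follows. The classes Duck, Robot, and Rock and the function make_it_quack are hypothetical names for the example.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    def quack(self):
        return "Beep-quack."


class Rock:
    pass


def make_it_quack(thing):
    # No type check up front: just try the operation.
    try:
        return thing.quack()
    except AttributeError:
        raise TypeError("This is not a duck because it doesn't quack") from None


print(make_it_quack(Duck()))
print(make_it_quack(Robot()))   # not a Duck, but it quacks, so it is accepted
```

Passing a Rock raises the TypeError, because the only thing the function ever asks of its argument is that it can quack.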

Thus, in dynamic typing, a lot of code is spent adapting one type to work as another.

Strong vs. Weak Typing

Languages may also have strong or weak typing. This refers to how well values retain their type.

Strong typing means that you cannot access the internals of the value nor treat the value as if it were a completely different type. These languages carry the type information as part of the value, and restrict how the value can be used based on that.

Weak typing means that you can easily access the internals of the value and treat the value as if it were a completely different type. These languages do not carry the type information as part of the value, and do not restrict how the value can be used.
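The contrast can be shown from Python, which is strongly typed. The first half shows the language refusing to mix types; the second half uses the standard struct module to imitate what a weakly typed language allows directly, namely reading the bytes of a float as if they were an int. The variable names are arbitrary.

```python
import struct

# Strong typing: the value carries its type, so mixing types is refused.
try:
    "1" + 1
except TypeError as exc:
    print("refused:", exc)

# Imitating weak typing: reinterpret the 4 bytes of a 32-bit float as a
# 32-bit int, roughly what a pointer cast would do in C.
raw = struct.pack("<f", 1.0)
as_int = struct.unpack("<i", raw)[0]
print(hex(as_int))   # 0x3f800000, the IEEE 754 bit pattern of 1.0
```

In Python the reinterpretation has to be requested explicitly through a library; in a weakly typed language nothing stands in the way.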

Strong typing comes at a cost in both memory and processing speed. Each variable or value must have associated with it the type of the object, and every access to the value requires a lookup and comparison of it. On the other hand, weak typing has a performance advantage, since the code can be written to assume data of a particular type.

However, weak typing comes at a cost in terms of robustness. Strongly typed languages are generally more fault-tolerant and don't present as many attack vectors as weakly typed ones. A common security bug in C, for instance, is to store a string that does not have the '\0' at the end. When C code goes to process that string, since it doesn't see the '\0', it continues to read well beyond the intended length of the string. To compensate for this, the standard C library includes functions that take a string and a length, the length being a sort of strong typing mechanism.

Python is a dynamically, strongly typed language. C is a statically, weakly typed language. Java is a statically, semi-strongly typed language. (Some values are strongly typed, while others are not.)

Analysis of Strong vs. Weak, Dynamic vs. Static

Weak, static typing is the hallmark of languages that were developed before memory and CPU cycles were cheap. Dynamic, strong typing languages came into being as computing resources increased.

As of today, I believe that dynamic, strong languages are more productive than the alternatives. However, Haskell is rocking the boat, showing that static languages can be useful, as long as the syntax allows a proper typing scheme and the compiler does the vast majority of the work. The Python community, under Guido's leadership, seems to be heading towards an optionally static typing system.

Type Conversion

No matter what system is used, type conversion is a universal practice. This is when one type of value is converted to another.

Lossless vs. Lossy Conversion

When converting from one type to another, you must remember that some types cannot contain all the possible values of other types. For instance, an integer cannot hold a float (unless the float has no fractional component and resides within the int range).
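A short Python sketch of the float-to-int case follows. The helper name converts_losslessly is hypothetical, and it assumes a finite float (Python's int has no fixed range, so only the fractional part matters here).

```python
def converts_losslessly(x):
    """Return True if the finite float x survives a round trip through int."""
    return float(int(x)) == x


print(int(3.7))                   # the fractional part is dropped
print(converts_losslessly(3.0))   # no fractional part, nothing lost
print(converts_losslessly(3.7))   # the .7 cannot be recovered
```

A round-trip check like this is one simple way to tell whether a particular conversion was lossy.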

Lossy conversions are not necessarily bad. For instance, I don't want to see "3.3000000001"; I'd much rather see "3.3" or even "3". This is because humans are generally not interested in the minute details of the system they are looking at. In other cases, the difference between one value and another is insignificant. For instance, the difference between a float and a double is basically nothing if you are drawing graphics on a screen.

Implicit Type Conversion

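Many languages offer implicit type conversions, especially when the conversion will not result in a loss of data. The most common example of this is the conversion of ints to floats. When the float type has enough precision (for example, a 32-bit int converted to a 64-bit double), every int can be represented exactly and there is no loss. When it does not (for example, a large 32-bit int converted to a 32-bit float), the conversion is lossy. Some languages also implicitly perform lossy type conversions.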
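Python shows both sides of this. Its float is a 64-bit double, which represents every integer exactly only up to 2**53, and its int is unbounded, so an implicit int-to-float conversion is lossless for small values and lossy for large ones. The variable names below are arbitrary.

```python
# The int 1 is implicitly converted to a float before the addition.
total = 1 + 2.5
print(type(total).__name__, total)

# A large int does not survive the implicit conversion to a 64-bit float.
big = 2**53 + 1
as_float = big + 0.0              # big is converted to float here
print(int(as_float) - big)        # -1: the last unit was rounded away
```

No conversion was requested in either case; the language performed it because the operands had different types.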