Data Structures

From Jonathan Gardner's Tech Wiki
Jump to: navigation, search

Introduction

Common and useful data structures are catalogued here.

What is a data structure?

A data structure is a way of structuring data. That is, taking bits and bytes and turning it into something meaningful to the programmer.

Basic Data Types

Null

The most basic data structure is the null data structure. This is usually represented with a pointer to nowhere or a special object.

The interpretation of Null is different in different applications. Some have several separate null values with different meanings, such as "no value", "invalid value", or "unknown value".

Boolean

A boolean is a special number that is either true or false. It can be no other value.

See boolean math for all the rules about this fascinating fellow.

Unsigned Integer

Signed Integer

Floating Point

Decimal

Character / Code Point

String

Pointer

Reference

1-Dimensional Data Types

Pair / Cons Cell

A pair (AKA cons cell) has only two "things"--- a pair of something.

Vector

Linked Lists

Chunky Lists (?)

A linked list of vectors. Each part may contain 1 or more items.

Iterator

  • Current index or last spot
  • Instructions on how to find the next spot.

Generator

This blurs the line between data structures and algorithms.

A generator is like an iterator, except behind it may not be anything at all. It is what you get if you simply store instructions on how to find the next item in the list.

Set

A list (vector or otherwise) or unordered but unique items.

Multiset

A set where we count the number of items. So it is like a set of pairs where each item is the actual item being stored and then an unsigned integer of how many items there are.

N-Dimensional Data Types

Graph

A graph is a data structure of nodes that point to zero or more other nodes.

A good example of a graph is the network of cities in the world. From each city, you can travel to other cities via roads, airplanes, boats, etc... This is how the nodes are connected to each other.

Graphs are extremely useful in their own right, especially for problems that involve exploring or finding optimum routes. If you think of the nodes as every possible position, and can see how you can go from one position to the next in small, finite steps, then you can describe the problem as finding how to go between two arbitrary points on an arbitrary graph.

Directed Graph

The directed graph is a type of graph where the nodes point to each other but in only one way. If you travel from node A to node B, you may not be able to get back the same way unless there is a reciprocating connection.

Acyclic Directed Graph / Tree

Acyclic directed graphs (trees) are graphs where you cannot travel and arrive back at the same node for every node in the graph. Determining whether a graph is acyclic isn't very hard to describe, but is a difficult thing to do in practice since it involves exploring the entire graph from each node.

Once you have a Tree, however, you can do a number of useful things with it.

Search Tree

A search tree is a tree that stores data in an organized way so that searches are very fast. The best a search tree can hope to do is give you logarithmic time. By comparison, Hash maps can find items in constant time.

Binary Tree

A binary tree is a tree where nodes have 0, 1, or 2 child nodes.

Map

A map is a set of pairs of keys and values. Each of the keys are unique. They are used to retrieve or store values.

There are two common ways to implement maps. The easiest way is to represent it as a list of pairs. If you want to find a key, look at all the pairs until you find the one you are interested in.

The other way is to use a hash table. (See below.)

You could also store a map with a search tree of key-value pairs.

Hash Table

A hash table requires a hashing function that will convert any key into a number. The way to setup the hash table is to create a vector of N entries. Each entry is a linked list of key-value pairs.

To store or retrieve a key in a hash table, look at the hash value of the key according to the hashing function. Then take the modulo of the number of items in the vector. Then you walk down the linked list of key-value pairs until you find the key you are looking for (or discover it isn't there at all.) This can be a very, very fast operation for large maps.

Hash tables can be resized by reallocating all of the keys.

Relational Databases Table

This data structure is so common because it is extremely useful and easy to optimize.

A table is:

  • A list of columns with associated types.
  • A list of rows that are lists of values for each of the columns.

This simple data type is used to represent almost everything you find in a database, especially sequences and indexes. In fact, the descriptions of tables and types are stored as a table in modern databases. It is trivial to invent a table that could store tables as well!

Objects

Attributes

Methods

Inheritance

One Data Structure to rule them all?

Lisp and related languages have had tremendous success by choosing the fundamental data type of the "pair". Most other languages start with the integer and build up from there, and then bolt on pointers to allow any data type. These other languages end up having a very complicated implementation compared to the Lisp languages.