From Jonathan Gardner's Tech Wiki


This is an idea I have, but I don't quite know how to implement it yet. The basic concept is that you can set up a website and then start adding things to it through the website itself. It uses some sort of persistence so that you can do things like parallelization and scaling. You can also load other people's code into your site, all without ever touching the command line.


There are three basic components:

  1. The web client. (Of course, but this is too often an afterthought.)
  2. The web server.
  3. The data persistence server.

Let's talk about the three parts and how they work.

The Client

The client can be a web browser (Firefox or IE), or it can be an application somewhere. A request may be something like "render a page for me," but it could be as simple as "give me this file I need to cache" or even "give me a tiny bit of data."

The Server

The server is a very simple piece of software. All it does is set up the absolute minimum and then load all the fun stuff from the persistence server. As new data structures and programs are added to the server, it really just shuffles them off to the persistence server.

My goal is to make this system so that the server can be anything.

The Persistence Server

This part's job is to remember the last state of the server. It keeps track not only of the things databases typically track, but also of the code that powers the server, as well as templates and files.

It could be a SQL database, like PostgreSQL or whatnot. Or it could be something like memcached. Or it could be a set of BDB files, or even flat files on the local hard drive. I would prefer it was something network-enabled so that it could be shared among multiple servers.
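Whatever the backend, the server only needs a small storage contract. A minimal sketch, assuming a simple key/value interface (the class and method names here are hypothetical, not from any existing library):

```python
class Store:
    """Hypothetical minimal persistence interface: everything the server
    keeps -- code, templates, files, data -- is a named value."""

    def get(self, key):
        raise NotImplementedError

    def put(self, key, value):
        raise NotImplementedError


class DictStore(Store):
    """In-memory backend for development; a real deployment would swap in
    PostgreSQL, memcached, BDB, or flat files behind the same interface."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value


store = DictStore()
store.put("templates/home.html", "<h1>Hello</h1>")
```

Because all backends sit behind the same two calls, a network-enabled backend shared among multiple servers is just another subclass.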

How it Works

This is the part I don't have quite figured out yet.

I imagine a request coming over the HTTP channel. The request flows through a pipeline of handlers, not unlike what Apache does. Each of these handlers is written down in the persistence store, so they have to be loaded and cached on the server.

Eventually it boils down to some file somewhere---or at least it would if this were Apache or Lighttpd. But this being what it is, it would boil down to yet another object on the persistence server that is translated and mogrified into the final result, which is passed back as the response.

For communication between the components, something like WSGI is preferred. I think Pylons has already proven that you can build cool webapps by having the glue between them be as simple as possible.
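The pipeline-of-handlers idea maps directly onto WSGI: each handler wraps the next, and the glue between them is a single calling convention. A minimal sketch (the middleware here is illustrative, not part of any framework):

```python
def app(environ, start_response):
    # Innermost handler: produce the final response body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def logging_middleware(inner):
    # Each handler wraps the next, Apache-pipeline style, and can
    # inspect or modify the request on the way through.
    def wrapped(environ, start_response):
        print("request for", environ.get("PATH_INFO", "/"))
        return inner(environ, start_response)
    return wrapped


pipeline = logging_middleware(app)
```

Handlers stored in the persistence server would be composed the same way at startup, outermost first.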


For the first server, I am going to try Python. I just know so much about Python and it is so flexible. It is also an easy language for people to learn.


Modules won't come from the hard drive. Instead, they will come from the persistence store. I can override the behavior of import with code like the following:

old_import = __builtins__.__import__
def new_import(name, *args, **kwargs):
    # Look for the module in the persistence store first, then fall back.
    return old_import(name, *args, **kwargs)
__builtins__.__import__ = new_import
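A less intrusive way to get the same effect is a meta-path finder through importlib, so module lookups consult the persistence store before the hard drive. A sketch, with a plain dict standing in for the real store:

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical persistence store mapping module names to source text.
STORE = {"greeting": "def hello():\n    return 'hi from the store'\n"}


class StoreFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        if name in STORE:
            return importlib.util.spec_from_loader(name, self)
        return None  # fall through to the normal file-based import

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Execute the stored source text in the new module's namespace.
        exec(STORE[module.__name__], module.__dict__)


sys.meta_path.insert(0, StoreFinder())

import greeting  # resolved from STORE, not from any file
```

Anything not in the store still imports normally, so the standard library keeps working.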


There are a number of ways to work this. Let's explore a few.

Do Nothing

In this instance, we build in as little logic as possible for storing the server state. What little we do store goes in a single dict called the configuration. Or better yet, we simply store a startup module that is used to start the server, and that startup module is tasked with restoring everything, including the configuration.


Pickle

We can also use Python's pickle module. This is a powerful method, but it isn't simple: if modules and classes change, then we have some problems. However, it is possible to use a version attribute in the object to do intelligent things.
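The version-attribute trick can be sketched with pickle's `__getstate__`/`__setstate__` hooks: stamp each pickle with a version number, and migrate old states forward on load. The class and fields here are made up for illustration:

```python
import pickle


class Config:
    VERSION = 2

    def __init__(self):
        self.host = "localhost"
        self.port = 8001  # field added in version 2

    def __getstate__(self):
        # Stamp every pickle with the current schema version.
        return dict(self.__dict__, version=self.VERSION)

    def __setstate__(self, state):
        # Migrate old pickles forward based on the stored version number.
        if state.pop("version", 1) < 2:
            state.setdefault("port", 8001)
        self.__dict__.update(state)


roundtrip = pickle.loads(pickle.dumps(Config()))

# Simulate restoring a version-1 object that predates the port field.
old = Config.__new__(Config)
old.__setstate__({"host": "example.com", "version": 1})
```

The migration logic lives next to the class, so changing a class means writing one small upgrade step rather than invalidating every stored object.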


Structured Storage

In this case, we store the objects in some sort of structured database. The data tells the application how to restore the objects.


Startup Module

This is one method I have imagined. Each server starts up with a name: the name of the module to load on startup. This module loads all the dependent modules and at the same time gets the server started. For instance, it can be something as simple as:

import BaseHTTPServer
BaseHTTPServer.HTTPServer(('', 8001), BaseHTTPServer.BaseHTTPRequestHandler).serve_forever()


Updates are performed by modifying the code and then restarting the server. A restart is simply a full shutdown followed by another startup. How is this done? Well, we really need to kill the entire Python process and then start it up again. There could be a wrapping script that starts the process. When the process exits, the script checks the return value. If it signals "start again", the script starts the server up again right away. If it is "kill", the server stays down. If it is "error", the script starts up the basic server with the error message so that someone can debug it.
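The wrapping script is just a loop over exit codes. A sketch, assuming a made-up exit-code protocol between server and wrapper:

```python
import subprocess
import sys

# Hypothetical exit-code protocol between the server and its wrapper.
RESTART, KILL, ERROR = 75, 0, 1


def supervise(argv):
    """Run the server, relaunching it whenever it asks for a restart."""
    while True:
        code = subprocess.call(argv)
        if code == RESTART:
            continue              # "start again": relaunch immediately
        if code == KILL:
            return "stopped"      # clean shutdown requested
        return "failsafe"         # "error": hand off to the failsafe server
```

A server update then becomes: write the new code to the persistence store, then exit with the restart code and let the wrapper bring the process back up.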

Sticking Points

These are the parts I am having trouble solving.


In Python, and in most languages, persistence of objects in memory is really two parts. The first part is the code or the libraries that are compiled and sit as files somewhere. The second part is the data that completes the code turning it into a full-fledged object. This data comes from the database.

I want to open people's minds to the fact that code is really data and data is really code. There has got to be some way to not only store the code in the persistence layer, but make it available to be manipulated during run-time.

This rules out languages like C and Java almost right off the bat. Python is more feasible, but you have to do some serious hacking. I think something like Scheme or Lisp is going to be the only possible solution for this. Since those languages already define code as data, storing the code in the DB and reading it back is trivial.

If I were to do something like Python, it would look like this.

During startup, the server would connect to the persistence server and retrieve the configuration file. This is code that details what is needed. The server then loads the necessary things and warms everything up.

When the user wants to change something, he has to load an existing file or create a new one and modify it. The modifications are stored in the database, perhaps with some kind of version control. Once he has it set up, he has to restart the server to load the files once again. I had imagined that it would be possible to change class definitions on the fly, but really, for this to work you have to define routines that migrate an existing object to the new object, as well as routines that start the object up from the data store. Rather than double the amount of work, I figured it's easier to just handle the start routine.

Versioning is important! I can't imagine someone feeling comfortable messing with a server without some way to test the changes and go back to the old version if something breaks. If something does break, there needs to be some sort of "emergency operation" mode where the user can fix stuff. But how can you tell when something is misbehaving enough that you have to switch back to "emergency operation" mode? Perhaps we allow the user to toggle a switch or something, or leave the emergency system available at all times without the opportunity to modify it.


How does one get inside to manipulate stuff? Either I hard-code the access mechanism, or I leave that as open as the rest of the system. I think the latter is the right solution. That is, the server will come with a default configuration that is open, but users can replace that with their own thing easily.

Use Cases

This is how I envision the app being used.

Jimmy isn't very technical, but he knows enough to get started. He downloads the server and gets it installed and running. Then he logs on with the password he set up. The first thing he sees is a welcome screen telling him what he can do next.

Jimmy decides to download some modules for his brand new machine, so he hands out the URLs to the modules. Because the modules are specified by URL, the machine can fetch whatever it needs and store the right versions of dependencies on its end. The machine will check, from time to time, whether new versions of compatible software are available and upgrade under Jimmy's direction.

Some of the software has conflicting requirements. Foobar-3.5 requires Baz-2.1, but Frundle-0.5 needs the newer Baz-2.5. This is fine, because having multiple versions of the same software running isn't the end of the world.

Each piece of software has a dizzying number of parameters and configuration options. Some configuration is shared across the entire system, but some is specific to the software. However, there is a common configuration system that makes Jimmy's life easy.

Once Jimmy gets his machine running, he opens up the server to incoming traffic. The server performs beautifully.

Jimmy decides he wants to tweak one of the modules. He makes a copy of it and starts modifying the files. Then he installs it at a location on the site and admires his hard work. When he gets it working, he opens up traffic to that location--the beta site. When he is pleased with it, he moves it to the production site.

Looking at this list of requirements, there are already people doing things like this. I just want to do it better, with less effort on the part of the user.

Is this different from a Linux box?

Is this really that different from what Linux can be, or is right now (depending on the skill level of the user)?

I think it is. First, we are drastically limiting what the server can do by keeping it in the realm of HTTP. Perhaps we can add different protocols such as SMTP or IMAP or POP. But really, at its core it is an HTTP server.

Second, we are really dictating what software is installed and how. We are guiding software developers to write their software in a way that will make it easy to upgrade and build on others' work.

Trying to Build a Universal Scheme

I think that I can divide the universe into distinct realms.

  1. Modules or packages. These contain code, a lot of it.
  2. Request handlers. These are special objects that are expected to be able to handle the simple request interface.
  3. Files. These are basically one giant string with some metadata attached, such as file type. These are request handlers.
  4. Folders. These organize files. These are request handlers.
  5. Configuration. This is the description of how the website is set up. It tells (a) what modules there are, and (b) what request handlers these create.
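The "simple request interface" shared by files and folders could be as small as a single method, with folders delegating by path segment. A sketch under that assumption (all names here are hypothetical):

```python
class FileHandler:
    """A file: one big string with metadata; it handles requests directly."""

    def __init__(self, content, content_type="text/plain"):
        self.content = content
        self.content_type = content_type

    def handle(self, path):
        return self.content


class FolderHandler:
    """A folder: organizes handlers and delegates by the next path segment."""

    def __init__(self, children):
        self.children = children

    def handle(self, path):
        name, _, rest = path.partition("/")
        return self.children[name].handle(rest)


root = FolderHandler({"docs": FolderHandler({"readme": FileHandler("hello")})})
```

Because files and folders speak the same interface, the routing logic never needs to know which kind of object it is talking to.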

But can we go simpler?

(a) Objects. (b) Code. (c) Data.

Or even easier?

(a) Objects.

This is probably where we should begin. Each object has a type and data. The data is a set of name:value pairs, where the names are strings and the values are objects. The type is an object as well, with its own type and data.

But we don't have to do class-based programming. Prototype OO is just as feasible. All objects are simply objects. They may have a parent, which acts like a class but is really just a "where to go looking if you don't find something here" thing.
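That parent-chain lookup can be sketched in a few lines: an object is a bag of named slots, and missing names are resolved by walking up the parent chain (the class and slot names are illustrative):

```python
class Obj:
    """A prototype-style object: slots plus an optional parent."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def get(self, name):
        # "Where to go looking if you don't find something here."
        if name in self.slots:
            return self.slots[name]
        if self.parent is not None:
            return self.parent.get(name)
        raise KeyError(name)


base = Obj(greeting="hello")
child = Obj(parent=base, name="child")
```

The parent is itself an ordinary object, so "classes" are just shared prototypes and can be edited at runtime like any other stored object.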

There needs to be some special objects defined.

  1. The configuration function. This is called when the server is started. It sets up the required universe of objects.
  2. The root request handler object. This is called for every request. What it does is entirely up to the object.
  3. The global namespace object. (Can we do without this? Probably not.) This is the top level of everything.

If this is the way we work, then loading new modules is quite easy. The modules load into the "module" object with their special name and version. Each module that has a dependency on another module simply defines that as a local variable within the module itself.

Modules are really files that are executed when they are looked up.
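That idea, a module as source text executed the first time it is looked up, can be sketched like this (the store contents and names are made up):

```python
class ModuleStore:
    """Modules are stored as source strings and executed lazily on lookup."""

    def __init__(self, sources):
        self.sources = sources
        self.loaded = {}

    def lookup(self, name):
        if name not in self.loaded:
            namespace = {}
            # Run the stored file on first lookup; cache the result.
            exec(self.sources[name], namespace)
            self.loaded[name] = namespace
        return self.loaded[name]


mods = ModuleStore({"math_helpers": "def double(x):\n    return 2 * x\n"})
helpers = mods.lookup("math_helpers")
```

Repeated lookups return the same cached namespace, which is exactly how Python's own `sys.modules` behaves.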

This appears to be what Javascript is, and most likely, we can simply use Javascript for the server language.

Building a Server That Can't Fail

This part is where I think about what is needed to make sure the user never has to go hunting around on the operating system to fix the web server.

The server operates in two modes: (1) failsafe and (2) normal. If the server encounters a serious error that causes it to stop working, it is started in failsafe mode.

The failsafe mode is given only three parameters: the normal startup file, the port, and the administration password.

It boots with enough features intact to allow the user to figure out what went wrong and fix it. For instance, it will keep track of the last stack trace and show it to the user in a way that helps them figure out what went wrong. From failsafe mode, they can bootstrap the server by executing the startup file. They can also execute the startup file in debug mode, so the server can be debugged when it encounters the catastrophic failure.
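The hand-off might look like this sketch: run the normal startup file, and on any failure keep the formatted traceback around for the failsafe server to display (the function and the protocol are hypothetical):

```python
import traceback


def boot(startup_source):
    """Execute the startup code; on failure, return the stored traceback
    so the failsafe server can show it to the administrator."""
    try:
        exec(startup_source, {})
        return ("normal", None)
    except Exception:
        return ("failsafe", traceback.format_exc())


mode, trace = boot("raise RuntimeError('bad config')")
```

The failsafe server then serves `trace` on its admin page, so the user never has to hunt through log files on the operating system.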

The failsafe mode is the default, initial state of the server. It is advanced enough that new modules can be added, commands can be executed, etc...

Giving Up

I am giving up on this. The reason is that there are times when you want to work with .so's on the local machine. Really, you have to live in a file-centric world.

The ideas I like from this:

  1. Configuration doesn't live locally. It is stored somewhere else where it can be easily manipulated.
  2. The code doesn't live locally. Again, it comes from somewhere else.
  3. A web server that is easy to hack on is a great idea. This is why Apache was great.

I'm going with a new idea: the Service-based Web Server.