TermKit
+++ -
Goal: next-gen terminal / command application
Addresses the following problems:
1) A monospace character grid with ANSI colors is not rich enough to display modern files / media / visualizations / metadata. It cannot effectively handle large output, long/wide tables or direct interaction.
2) Piping binary or text streams between apps is bad for everyone:
* Humans have to suffer syntax, cannot reflow/manipulate output in real-time
* Computers have to suffer ambiguities
3) Synchronous input/output makes you wait. SSH keystroke latency is frustrating.
4) The string-based command line requires arcane syntax, leading to mistakes, repeated attempts at escaping, etc.
5) Unix commands are "useless by default", and when asked, will only tell you raw data, not useful facts, e.g. "rwxr-xr-x" instead of "You can't edit this file."
+++ -
Programs / commands
* Output processor for common cli tools
* Custom implementation of ls and friends, with support for mimicking classic shell behaviour with a 2.0 twist
* SQL shell
Cool input scenarios:
* As you type, the command is visually tokenized and highlighted. Tokens can display autocomplete suggestions, icons and indicators inline.
* Instead of quoting and escaping, keys like " and > just trigger the creation of special tokens which are visually distinct and represent e.g. a quoted string, an argument, a regular expression. To type the special characters literally, just press them twice. The 'command' is just the concatenation of these tokens, interpreted the same way a shell interprets a command (see the sketch after this list).
* Man pages are consulted inline with autocomplete options for arguments and (later) required arguments
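A minimal JavaScript sketch of the special-token idea above, assuming a simplified TokenField with only plain and quoted-string tokens; none of this is TermKit's actual tokenfield code:

    // Typing `"` opens a quoted-string token; typing it twice inserts a literal quote instead.
    function TokenField() {
      this.tokens = [{ type: 'plain', text: '' }];
    }

    TokenField.prototype.press = function (key) {
      var last = this.tokens[this.tokens.length - 1];
      if (key == '"') {
        if (last.type == 'quote' && last.text === '') {
          // Double press: degrade the empty quote token back into a literal " character.
          this.tokens.pop();
          this.tokens[this.tokens.length - 1].text += '"';
        }
        else if (last.type == 'quote') {
          // Closing quote: continue with a plain token.
          this.tokens.push({ type: 'plain', text: '' });
        }
        else {
          // Opening quote: start a visually distinct quoted-string token.
          this.tokens.push({ type: 'quote', text: '' });
        }
      }
      else {
        last.text += key;
      }
    };

    // The command is just the concatenation of tokens, re-quoted per token type.
    TokenField.prototype.command = function () {
      return this.tokens.map(function (t) {
        return t.type == 'quote' ? '"' + t.text + '"' : t.text;
      }).join('');
    };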
Cool output scenarios:
* Listings of files, anywhere, show an icon with distinguished typography for filename vs meta. QuickLook integration on the icon.
* Can complete several tasks at once asynchronously, show real-time progress for all of them together in e.g. a task-list widget.
* Command output is interactive: has elements which can be focused, selected/unselected, opened, right clicked, dragged and dropped
Good desktop citizen:
* Dragging a file on the terminal window makes a folder icon appear on the side for dropping it on the CWD. Can also drag the file into the command line to reference it as an argument.
* Can drag files, snippets, JSON/CSV off the terminal
* Tear-off tabs/windows
+++ - Roadmap
[0.1] UI prototype - DONE
[X] simulated in safari/webkit.app
[X] functional input line with tokenization, cursor keys and backspace/delete
[X] functional autocomplete on tokens
[X] simulated commands
[X] simulated output with collapsible sections
[0.2] App prototype - DONE
[X] cocoa app
[X] webkit in single window view
[X] design back-end protocol
[X] node JS back-end, running separately
[X] connect socket
[X] run session worker
[X] passive command implementation
[X] JS module template, integrated both runtimes.
[X] wrap unix executable
[X] interactive output
0.3: Command suite
[X] Redesign message protocol
[X] Viewstream integration
[X] 5-pipe command execution
[X] fix tokenfield
[X] filesystem autocomplete
[X] OS X icon loading
[X] inline image display
[X] json pretty printer
[X] json grep
[X] pipelined commands
[X] unix command execution
[X] code syntax highlighting
[ ] http get/post data piping
[ ] command decoration
[ ] interactive execution
[ ] inline man-pages tips
[ ] version control
[ ] interactive quicklook
[ ] wildcard handling
[ ] regexp hinter
0.4: Network & Modularity
[ ] SSH tunneling
[ ] Stand-alone daemon
[ ] network preview / local edit
[ ] tabs
[ ] split off command processor rules / autocomplete handlers into separable blocks
[ ] server-side hook system
[ ] add plug-in mechanism with drop-in functionality
0.5: Interactive & Unix Upgrades
[ ] widget system
[ ] graphing suite
[ ] alt view for LS with usable permissions, dates, etc
[ ] active 'top' monitor
[ ] gnu parallel integration/clone
0.6: Theming
[ ] server-side push css/js / integrated web
[ ] themes / stylesheets
+++ Components
Node.js daemon = 'NodeKit'.
+ Fast enough for server work, concurrency/scaling included
+ New JS language features
+ Cross-platform on Unix
+ Process / IO / everything integration
+ Self-contained binary, can be included in .app
- Separate from UI / front-end; the forced layer of indirection makes it unlike web programming
- No Mac-specific APIs or low-level C access
=> back-end platform, runs locally, can run remotely, or perhaps tunnel its interaction over SSH
WebKit/Cocoa front-end
+ Rich, stylable display + jQuery
+ New JS language features
+ Intimate OS X access, Obj-C, bridgeable with JS
The split:
Front-end = display, formatting, interaction. Always local. Runs in an (enhanced) webview/browser with a websocket to back-end.
Back-end:
1) Local NodeKit: Start the node daemon on startup, connect using a direct websocket to ws://localhost:2222 (sketched below).
2) Remote NodeKit SSH: Daemon is running; use ssh to set up a tunnel between a random local port and remote port 2222. Connect the local websocket to the tunnel.
3) Remote NodeKit WSS: Daemon is running; use WSS to connect directly. Must authenticate? We don't want to replicate OpenSSH, but rudimentary auth could be useful.
4) Basic remote shell: No NodeKit daemon. Only literal commands supported. Enough to execute e.g. "apt-get install termkit".
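A rough browser-side JavaScript sketch of case 1; the handshake and session.open messages mirror the stream structure section further down, while the routing here is only illustrative:

    // Front-end: connect to the local NodeKit daemon and perform the protocol handshake.
    var socket = new WebSocket('ws://localhost:2222');

    socket.onopen = function () {
      socket.send(JSON.stringify({ termkit: '1' }));                           // version handshake
      socket.send(JSON.stringify({ query: 1, method: 'session.open', args: {} }));
    };

    socket.onmessage = function (event) {
      var message = JSON.parse(event.data);
      // Route answers to their pending queries, and view.* methods to the right view by id.
      console.log('received', message);
    };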
+++ UNIX execution model
In the traditional model of computing, a program has an input, an output and does some processing in between. A standard UNIX program follows this model closely, with the following pipes: standard in, out and error. Data flows in through standard in, and output flows either through the out or error pipes. This simple model allows us to chain programs together, redirect specific portions of output into log files and run commands in batch.
<img>
But however powerful, this model ignores an important factor in how programs are used, namely whether there is a person involved.
When run directly by a person, standard in is mainly hooked up to the keyboard while standard out will go directly to the display. The input will be irregular and error prone. The program will be interactive. Output will be unpredictable and formatted mainly for display purposes.
But if the program is part of a processing chain, hooked up to a data source and/or sink, the situation changes dramatically. The data is now structured, the operation of each step is predictable and certain guarantees have to be made about the formatting of the output. Programs might run for a long time, chugging away at invisible streams and passing around massive amounts of data.
These two contexts are radically different, but both are forced through the same standard pipes. Oddly enough, this multiplexing seems universally accepted and is woven throughout computing. And yet, it has some noticeable effects.
For instance, programs don't communicate anything about the data they're sending through: it's just an anonymous, binary stream. As a result, many programs have options to toggle their input/output between various formats, some for people, some for programs. Tools like git examine their environment and disable interactive mode if nobody's there to answer. Machine-friendly data may be requested in several formats. As a result, scripting together a command chain is a careful dance of matching impedances at each step.
However, because most tools are launched by people for people, the default interchange format of choice is still "somewhat parseable text". As a result, we still have to be careful with things like spaces or Unicode in filenames, in case someone puts text where it doesn't belong. It still matters sometimes whether there is a newline at the end of a file. A misplaced keystroke, easily missed in the cryptic bash wash, can wreak havoc. Relying on text makes us prone to errors, and has led to a culture where new recruits have to pass a constant trial by fire: to not destroy their system at one of the many opportunities it gives them.
In fact, where it seems to matter most is the interaction. Text has literally locked down our ability to evolve the UI, by forcing us to use the same monospace character terminals the previous generations used. Even worse, we are still often limited to a headache-inducing, fluorescent EGA color palette from 1984. A terminal can show you the load on your system, but it can't show you a historic graph. It can make little progress bars out of ASCII characters, but it can't show LaTeX math inline. Even the command-line itself, with its clunky tab-complete, lack of text selection and standard copy/paste, bears little resemblance to the standard text field widget we use every day in modern UIs.
In the past decade, user interaction has made astounding leaps. We consume and produce vast quantities of information daily. The web has changed from blobs of markup into full apps, with rich typography, complex infographics, inline interactivity, dynamic layout, etc. UIs like OS X have raised the bar on how we can present and interact with information naturally. Sure, fancy icons and smooth graphs are great in iLife, but they can also be used in technical applications. You are almost certainly sitting in front of a display with more than a million pixels. Why are you telling it to draw a dinky character terminal from the 80s?
<h2>A new model</h2>
Criticism is easy of course, but what can we do about it? Well, after giving it some thought, I think we need to rebuild the model from the ground up.
First of all, we need a way to separate out the data between people and programs. The easiest way is to split our pipes into five:
* Data In, Data Out
* View In, View Out
* Error Out
When chaining together programs, we make a chain out of the Data pipes like usual. The View pipes however are hooked up to a global controller which proxies to the user's terminal.
Processing of data proceeds normally, but with the additional property that meta-data is explicitly attached to the data pipes, indicating content type, format, name, etc. as available.
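A rough Node.js sketch of what spawning a command with this extra channel could look like; only the view-out direction is wired up here, and the command and message handling are purely illustrative:

    // Spawn a command with five pipes: the classic fds 0-2, plus fd 3 (view in) and fd 4 (view out).
    var spawn = require('child_process').spawn;

    var child = spawn('ls', ['-l'], {
      stdio: ['pipe', 'pipe', 'pipe', 'pipe', 'pipe'],  // data in, data out, error out, view in, view out
    });

    // The data pipes are chained to the next command (or the output formatter), as in a classic shell.
    child.stdout.on('data', function (chunk) {
      process.stdout.write(chunk);
    });

    // View out bypasses the data pipeline and is proxied straight to the user's terminal as UI packets;
    // view in (fd 3) would carry callback events back to the command.
    child.stdio[4].on('data', function (packet) {
      console.log('view packet:', packet.toString());
    });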
Next, we need a way for programs to output a rich and dynamic UI through the View pipes and receive callback events. This is where things get tricky, because it essentially comes down to network transparency of UI. This is a complex problem that has only been solved for a couple of things, namely X11 and the Web, and each comes with a huge set of baggage. We need something simpler that can be implemented as a thin library in a hundred lines of code. It should still feel like a command-line terminal, but be flexible enough to allow all the things we expect from modern applications.
Finally, we need a better way to enter commands. We need a modern textfield widget that is aware of strings, regular expressions, pipes, and can do tricks like inline icons, visual tokens, autocomplete dropdowns, etc.
After all this pontificating, it should be no surprise I'm building something to try and solve this problem. I call it TermKit, a next-gen Terminal and systems tool platform. There's even a shiny icon:
<img>
It consists of a desktop WebKit app acting as the front-end, with the UI built out of HTML/CSS and JavaScript. OS services like QuickLook are integrated through Cocoa.
On the flip side is a Node.js daemon, connected through Socket.IO, which maintains shells, runs programs and streams the visible output back to the front-end.
The front-end and back-end exchange messages, which are routed to views on the front, and to worker processes on the back.
Combining the oil and water of command-line Unix and web isn't an easy job, but it's better than trying to make WebKit sing Unix, or making a rich front-end from scratch. After working on this on and off for a couple of months, I managed to get a prototype of the UI and daemon working, with QuickLook icons as icing on the cake:
<img>
It actually works, and the platform is pretty compelling. With Node.js you get a fast, asynchronous, scriptable back-end with excellent Unix libraries and modules. And using WebKit means you can display anything you can make in HTML5/CSS3, and millions know how to do it already.
However, despite the clear power of HTML5, it would be a mistake to make HTML the de facto output format of Unix: we'd be swapping one parseable text soup for a much bigger one.
<h2>Smart views</h2>
Instead, I wrote a custom View layer. The display is built out of simple building blocks like "item list", "sortable table", "image", "line graph" or "file with icon". The idea is to have a wide range of UNIX concepts easily expressed, so it's easier to make nice UIs quickly. The View is interactive and can be updated asynchronously. Making a streaming dashboard is child's play, for example.
On the back end, a bridge makes it easy to interface by instantiating objects and streaming them to the front-end.
At the same time, you have access to HTML/CSS if you need it, or you can skip it altogether and just print plain text as well.
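A sketch of what a back-end worker using that bridge might look like; the `view` module and all of its constructors below are hypothetical stand-ins, not the actual bridge API:

    // Back-end worker: compose a view out of simple building blocks and stream it to the front-end.
    var view = require('./view');                        // hypothetical view bridge module

    exports.main = function (tokens, pipes) {
      var list = new view.list([
        new view.file('README.md'),                      // file reference: icon, name, QuickLook hook
        new view.image('screenshot.png'),                // inline image
        new view.progress({ label: 'Download', value: 0 }),
      ]);

      pipes.viewOut.add(null, list);                     // becomes a view.add message (see stream structure)

      // Views stay live: updating an object later just streams a view.update for it.
      pipes.viewOut.update(list.children[2], { value: 100 });
    };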
But what about the actual data? While it may sometimes be piped into a file, usually the data has to be displayed at the end. In the new model, data doesn't go to the terminal anymore, which complicates matters.
Imagine the case of "ls | grep": we're filtering items in a directory, which is displayed as a listing of files and icons. When the listing is output by "ls", sending it on the View out would send it directly to the terminal, preventing grep from filtering the items. If we instead pipe the items through grep, we lose the ability to view the data at all since it will remain in its raw form.
To solve this, I use the meta-data added to each data pipe, with another unholy web ingredient: MIME. This allows me to keep the data pure, and identify it through its Content-Type and other values. In the case of "ls", we keep piping around plain-text filenames separated by newlines like before, but we annotate it so we can format it at the end of the command chain.
The formatter reads the final data and turns it into a View graph, and can be extended to display files, images, JSON, source code, diffs, etc. This can be enabled for existing tools by deriving the Content-Type based on the command. It also makes TermKit web-transparent: you can hook up the data pipes to HTTP GET and POST, and the header information will be correctly interpreted.
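For instance, an 'ls'-style worker could keep its output byte-for-byte compatible while annotating it; the pipes/setHeader API below is illustrative, only the Content-Type idea is the point:

    // Emit classic newline-separated filenames on data out, annotated so the formatter at the end
    // of the chain knows how to display them.
    var fs = require('fs');

    exports.main = function (tokens, pipes) {
      var path = tokens[1] || '.';
      pipes.dataOut.setHeader('Content-Type', 'text/plain; charset=utf-8');

      fs.readdir(path, function (err, files) {
        if (err) return pipes.errorOut.write(err.message + '\n');
        files.forEach(function (name) {
          pipes.dataOut.write(name + '\n');              // the same bytes a classic `ls -1` would produce
        });
        pipes.dataOut.end();
      });
    };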
<h2>Possibilities</h2>
In practice, the Data/View split has been there all along. It comes down to people vs programs, and the line is not hard to draw. "wget" outputs a progress bar while streaming data into a file. "top" displays an interactive dashboard that updates continuously. In this model, each application can do both interactive tasks and data processing tasks at the same time, and do each in the most natural format for the job.
Additionally, routing view commands this way allows parallel or background processes to update the view independently in the scrollback, instead of colliding with each other and the prompt.
The separation between front-end and back-end also brings other benefits: the back-end can be run remotely, tunneling the websocket over SSH to a local front-end. You get latency-free interaction for remote operations. Even more, we can replace the consoles-within-terminals of SQL and SFTP, and elevate them to first-class shells with all the benefits of the new interaction model.
Doesn't this mean rewriting all our tools? Not necessarily. Traditional Unix tools can be slotted in transparently; they just don't get the benefits of asynchronous output. Wrapper scripts can add additional functionality, e.g. to give Git colored, side-by-side diffs or show version control status in an 'ls' listing.
<h2>Status</h2>
I've been slowly working on TermKit for about a year now, but it's still mostly vaporware. When I started, I didn't know what the right solution would look like, I just knew I wanted something more modern and usable. There's been a lot of writing and rewriting of code just to get to this stage. It also doesn't help that I'm a code perfectionist in unfamiliar territory.
The reason I wanted to blog about it is because all of the above is starting to sound like a really compelling argument to me. The idea is to create a flexible platform for making better tools, built out of cross-platform parts and with enough useful assumptions built-in to make a difference.
Feedback is welcome, and I invite you to browse the GitHub repo which has mockups and some code diagrams.
+++ Protocol considerations
The output of a termkit command is split into data and view. The data is the raw information that is piped from one process to the next.
The view is a stream of DOM-like objects and operators on them.
View and data are fundamentally different:
* Data is a raw binary stream with meta-data annotation, from one process' stdout to another's stdin
* View is a packetized stream of UI updates and callback events, going directly to the terminal.
+++ Command architecture
The WebKit front-end is intended to be a view first and foremost. It is an active view that maintains its own contents based on messages to and from the back-end.
Problem: if the front-end is agnostic, how do we make commands smarter?
> shell-OS interface is server-side
> server-side only executes processes/commands, routes streams and provides output.
> separate data in/out from UI in/out. datastream vs viewstream
data in/out:
content-type: unix/pipe
application/jsonstream
...
    [ Front-End ] ----------> [ Back-end ]
    [ WebKit    ]  websocket  [ Node.js  ]
                                   |
                                   |
                                   v  *
                              [ Shell     ]
                              [ Worker.js ]
Worker sets up process chain:
  view in  - callbacks / command stream
  view out - view stream
  stdin    = data in (mime typed)
  stdout   = data out (mime typed)

              | callbacks        ^ view stream: view updates
      stdin   v                  |
      ----->  [ "command [args...]" ]  ----->
                                        stdout
View streams are multiplexed and sent to the front-end.
Front-end callbacks are routed back to the right process.
Each process has its own view, referenced by ID number.
Processes cannot break out of their view.
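A rough sketch of that multiplexing and routing; the worker's viewIn/viewOut streams and the bookkeeping are illustrative, while the message shapes match the stream structure below:

    // Back-end: tag each worker's view packets with its view id before forwarding them to the
    // front-end, and route view.callback messages back to the owning worker.
    var views = {};                                      // view id -> worker process

    function attach(id, worker, socket) {
      views[id] = worker;
      worker.viewOut.on('data', function (objects) {
        socket.send(JSON.stringify({ method: 'view.add', args: { view: id, objects: objects } }));
      });
    }

    function onFrontEndMessage(message) {
      if (message.method == 'view.callback') {
        var worker = views[message.args.view];           // a worker can only ever reach its own view
        if (worker) worker.viewIn.write(JSON.stringify(message.args.message) + '\n');
      }
    }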
+++ Data vs UI glue
e.g.
get http://.../rss.xml
content-type: application/xml
-> process streams out xml data
get | grep
Data Stream: .txt|html|... > .txt|html|...
View Stream: file metadata, download progress, grep stats (# hits)
Output Formatting: turn application/xml into dynamic XML tree.
get | ungzip | untar
Data Stream: .gz > .tar > nothing
View Stream: file metadata, download progress, unzip progress (# files)
Output Formatting: no data output.
The output formatter takes a mime-typed blob and turns it into a display.
- Plain text or HTML
- Images
- Binary data (hex or escaped)
- Url/file reference with icon
...
Existing tools are wrapped to provide richer output, either by parsing
old school output or by interacting directly with the raw library.
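A sketch of the dispatch at the heart of such a formatter, reusing the hypothetical view constructors from the earlier sketch; only the Content-Type driven branching is the point:

    // Output formatter: pick a viewer for the final, mime-typed blob. Viewer names are illustrative.
    function format(contentType, data) {
      var type = (contentType || 'application/octet-stream').split(';')[0];

      if (/^image\//.test(type))       return new view.image(data);          // inline image display
      if (type == 'application/json')  return new view.tree(JSON.parse(data));
      if (type == 'application/xml')   return new view.tree(data);           // dynamic XML tree
      if (/^text\//.test(type))        return new view.text(data);           // plain text / source code
      return new view.hex(data);                                             // unknown binary: hex dump
    }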
+++ stream structure
><{
  termkit: '1'
}

>{
  query: 1
  method: session.open
  args:
}
<{
  answer: 1
  success: true
  args: {
    session: 1
  }
}

>{
  query: 2
  method: shell.environment
  session: 1
}
<{
  answer: 2,
  success: true
  args: { ... }
}

>{
  query: 3
  method: shell.run
  session: 1
  args: {
    command: ...
    id: 0
  }
}
<{
  method: view.allocate,
  session: 1
  args: {
    id: 0
    views: [0, 1]
  }
}
<{
  method: view.add,
  session: 1
  args: {
    view: 0
    target: null
    objects: [ ... ]
  }
}
<{
  method: view.add,
  session: 1
  args: {
    view: 0
    target: [ 0 ]
    objects: [ ... ]
  }
}
>{
  method: view.callback,
  session: 1
  args: {
    view: 0,
    message: { .. }
  }
}
<{
  method: view.update
  args: { ... }
}
<{
  answer: 3,
  success: true,
  args: {
    code: 0,
  }
}
viewstream:
> shell-specific interaction rules are server-side.
> rich widget lib for display, extensible
> widgets are streamed to the client like termkit-ML. Objects are smartly typed and have callback commands defined for them. Callbacks can be stateful or stateless, though stateful is only intended for interactive commands.
tableview / listcontainer -> generic, scales from simple list to tabled headers w/ simple syntax
object references for files and other things are multi-typed and annotated on the server-side.
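As a purely illustrative example, one streamed widget object inside a view.add packet's objects array might look something like this; none of these field names are confirmed anywhere in this document:

    {
      type: 'list',
      id: 7,
      children: [
        { type: 'file', name: 'notes.txt', meta: { mode: 'rw-r--r--', size: 2048 } },
        { type: 'file', name: 'photo.png' },
      ],
      callbacks: ['select', 'open'],   // stateless callbacks, delivered back as view.callback messages
    }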
+++ --
references:
TextMate HTML output features
http://blog.macromates.com/2005/html-output-for-commands/
bcat: browser cat tool
Terminator: Java-based terminal++
protocol:
fastcgi, json-rpc
command output:
* rich output, consisting of an element tree
* simple container types (list, table) and various viewers to represent common objects and data types
current widget model:
* use a viewcontroller for each widget
* each viewcontroller owns its own markup
* simple object instantiation, widgets can create child widgets
more ideas:
* autocorrection of mistyped commands
* unified progress indicator during command
* autocompleted/predicted history access
* guard if command is dangerous / doesn't make sense
+++ -
Code base on May 9th 2011
front
1200 tokenfield + autocomplete
690 output view
393 command handling
184 shell client
181 misc
465 CSS
55 HTML
= 3168
back
204 builtin commands
214 autocomplete
255 output formatter
250 command handling
103 router
166 misc
162 view bridge
160 processor
83 shell
= 1597
tests = 162
4927 lines of code
+ jquery + socket.io + syntax highlighter + node + node-mime