ThingFish: Developer Notes

This is a scratchpad for jotting down notes as we develop so we can keep them in a single place.

Request/Response/Filter Stuff

The following relies on a patch to Mongrel... either a fairly large and unwieldy monkeypatch or an official update to the Mongrel trunk.

  • Encapsulate Mongrel's request/response stuff into our own request/response objects that can be passed through filters to the handlers, then back up the chain to be unwrapped mongrel-style inside ThingFish::Handler.
  • The response object contains:
    • the HTTP status
    • the data to be returned (as either a String or an IO)
    • bindings to the handler methods of all of the responding handlers, which will allow things like ERb to eval back into them after the content-type filter knows the client wants HTML.
  • The request object contains:
    • A more-sane headers hash, keyed by the actual header name
      • request.headers['Accept']
      • request.headers[:accept]
      • request.headers.accept
    • The Mongrel-style 'params' hash, for backward compatibility
    • An IO for uploaded data, if any
    • Convenience methods for multipart parsing and Accept-Header decision makin'
  • Incoming filters are applied to the request via the interface: Filter#filter_request( request )
  • Outgoing filters are applied to the response via the interface: Filter#filter_response( response )
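
A rough sketch of how the filter chain above might hang together. Everything here except the filter_request/filter_response method names is hypothetical: the FilterSketch module, the Struct-based request/response stand-ins, TraceFilter, and the dispatch method are placeholders, and running the outgoing filters in reverse order is only our reading of "back up the chain".

```ruby
module FilterSketch
	# Hypothetical stand-ins for the wrapped request and response objects.
	Request  = Struct.new( :headers, :params, :body )
	Response = Struct.new( :status, :headers, :body )

	# Base filter: both hooks are no-ops, so subclasses override only
	# the direction they care about.
	class Filter
		def filter_request( request ); end
		def filter_response( response ); end
	end

	# Example filter that just records when each hook is called.
	class TraceFilter < Filter
		def initialize( name, log )
			@name = name
			@log  = log
		end

		def filter_request( request )
			@log << "#{@name}: request"
		end

		def filter_response( response )
			@log << "#{@name}: response"
		end
	end

	# Run the incoming filters in order, yield to the handler, then run the
	# outgoing filters in reverse order (an assumption, see above).
	def self.dispatch( request, filters )
		response = Response.new( 200, {}, '' )
		filters.each {|filter| filter.filter_request(request) }
		yield( request, response )
		filters.reverse_each {|filter| filter.filter_response(response) }
		response
	end
end
```

The block passed to dispatch plays the part of the handler; unwrapping the response Mongrel-style would happen after dispatch returns.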

Another argument for the header-method addition to ThingFish::Table:

response.headers.cookie << a_cookie
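
To make the three access styles concrete, here is a minimal sketch of a header table. HeaderTable is a hypothetical name, not the real ThingFish::Table, and normalizing keys to lowercase with dashes is an assumption about how the lookup might work.

```ruby
# Hypothetical header table; not the actual ThingFish::Table.
class HeaderTable
	def initialize( headers={} )
		@headers = {}
		headers.each {|name, value| self[ name ] = value }
	end

	# 'Accept', :accept and 'content_type' all map onto one normalized key.
	def []( name )
		@headers[ normalize(name) ]
	end

	def []=( name, value )
		@headers[ normalize(name) ] = value
	end

	# headers.accept reads a header that is set; headers.accept = x writes one.
	# Reading an unset header by method falls through to NoMethodError.
	def method_missing( sym, *args )
		name = sym.to_s
		if name =~ /\A(\w+)=\z/ && args.length == 1
			self[ $1 ] = args.first
		elsif args.empty? && @headers.key?( normalize(name) )
			self[ name ]
		else
			super
		end
	end

	def respond_to_missing?( sym, include_private=false )
		name = sym.to_s
		name =~ /\A\w+=\z/ || @headers.key?( normalize(name) ) || super
	end

	private

	def normalize( name )
		name.to_s.downcase.tr( '_', '-' )
	end
end
```

With this sketch, the cookie line above works as long as the cookie header holds an Array (for example after headers.cookie = []).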

Benchmark goals

Here are some stream-of-consciousness thoughts around what we'd like to get out of our benchmarking tasks.

A benchmark task inherits from Rake::Task, and should look something like this:

benchmark :barebones => [TESTIMAGE.to_s] do
	config = ThingFish::Config.new do |config|
		...
	end

	with_config( config, :count => 500, :concurrency => 5 ) do

		datapoint 'GET /',	 :get,  "/"
		datapoint 'GET /[uuid]', :get,  "/#{resource.uuid}"
		datapoint 'POST /',	 :post, '/', :entity_body => TESTIMAGE

		...
	end
end

Each datapoint fires up a ThingFish::Daemon with the supplied configuration options, then runs ab against it. This will generate two things under 'benchmarks/r#':

  • A Ruby marshalled dump file, with ThingFish::Config specifics, each datapoint's dtime/ctime/ttime/wait, and ab's statistical output
  • ab-generated tab-separated files, one for each datapoint

Then separate tasks should be able to load the marshalled file(s) and do reports/graphs/differences/whatever.
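
As a sketch of what one of those separate tasks could do with the dump files: the hash layout, the helper names (sample_run, save_run, load_run, datapoint_diff), and the timing numbers below are all made up for illustration, since the real dump format isn't settled yet.

```ruby
# Hypothetical layout for one benchmark run. The timings are placeholder
# values, not real measurements.
def sample_run( revision, ttime )
	{
		:revision   => revision,
		:datapoints => {
			'GET /' => { :ctime => 1, :dtime => 4, :ttime => ttime, :wait => 3 },
		},
	}
end

# Write a run to disk with Marshal.
def save_run( run, path )
	File.open( path, 'wb' ) {|io| Marshal.dump(run, io) }
end

# Read a run back from a marshalled dump file.
def load_run( path )
	File.open( path, 'rb' ) {|io| Marshal.load(io) }
end

# Per-field difference (new minus old) for a single datapoint.
def datapoint_diff( old_run, new_run, name )
	old_times = old_run[:datapoints][ name ]
	new_times = new_run[:datapoints][ name ]
	old_times.keys.each_with_object( {} ) do |field, diff|
		diff[ field ] = new_times[ field ] - old_times[ field ]
	end
end
```

A report task would load two dumps with load_run and feed datapoint_diff( old, new, 'GET /' ) into whatever does the graphing.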

Graphs we'd like to have generated for us:

Datapoint collections:

  • GET /
  • GET /uuid
  • POST /

Single datapoint "diffs" across separate benchmarks

  • GET / [r517] vs
  • GET / [r515]

Focused graphs for a single datapoint

  • GET / (ctime, ttime, dtime, wait)

Content-Disposition

The default handler on UUID GETs should set a Content-Disposition header, with the original filename (if any) and the modification date of the file requested.

http://www.ietf.org/rfc/rfc2183.txt
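
A minimal sketch of building that header value. The content_disposition helper is a hypothetical name, and using the "attachment" disposition type is an assumption (the notes above don't say whether it should be attachment or inline). It only handles ASCII filenames.

```ruby
require 'time'

# Build a Content-Disposition value with an optional filename and an
# RFC 822-style modification date, following the parameters in RFC 2183.
# The 'attachment' disposition type is an assumption.
def content_disposition( filename, mtime )
	parts = [ 'attachment' ]
	if filename
		# Backslash-escape quotes and backslashes inside the quoted string.
		escaped = filename.gsub( /["\\]/ ) {|char| "\\" + char }
		parts << %{filename="#{escaped}"}
	end
	parts << %{modification-date="#{mtime.rfc2822}"}
	parts.join( '; ')
end
```

The handler would then do something like response.headers['Content-Disposition'] = content_disposition( name, mtime ), where name and mtime come from the resource's metadata.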

RSpec Annoyances

This is a list of things that RSpec does that make testing harder, give unexpected results, or just generally get in the way:

  • Implicit .once
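  • .and_return(arg1, arg2) doesn't return both values from a single call; it returns them one at a time on consecutive calls. To return both at once you have to use: .and_return([ arg1, arg2 ]).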
  • Can't .and_yield( something ).and_return( something_else ) -- apparently you'd never want to yield to a block and return anything other than what the block produces.

Mongrel misc

What works effectively with Mongrel

  • HTTP header parsing
  • URI handler registrations

What makes Mongrel hard to work with

  • Large methods that are hard to monkeypatch
  • Makes assumptions about handler usage (too much control per handler)
  • "Headers" has CGI environment variables mixed with real headers, inconsistent prefixing
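
To illustrate the last point, here is a sketch of pulling just the real headers out of a CGI-style params hash. It assumes the usual CGI convention (request headers prefixed with HTTP_, while CONTENT_TYPE and CONTENT_LENGTH carry no prefix); headers_from_params is a hypothetical helper, and the exact keys Mongrel supplies aren't checked here.

```ruby
# Extract real HTTP headers from a CGI-style params hash, assuming the
# standard CGI naming convention. Everything else (REQUEST_METHOD and
# friends) is treated as environment and skipped.
def headers_from_params( params )
	headers = {}
	params.each do |key, value|
		case key
		when /\AHTTP_(.+)\z/
			name = $1
		when 'CONTENT_TYPE', 'CONTENT_LENGTH'
			name = key
		else
			next
		end

		# HTTP_USER_AGENT => User-Agent
		pretty = name.split( '_' ).map {|part| part.capitalize }.join( '-' )
		headers[ pretty ] = value
	end
	headers
end
```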

Metadata API issues

  • Multi-update: REST seems to suggest that requests should not aggregate data, but I think multi-update is necessary for atomic updates of multiple items. How should we handle the case of multi-PUT, though? Include an X-ThingFish-UUID header or something?

  • Multipart-POST has its own problems too. Since the Location header of a `201 CREATED` response "consists of a single absolute URI" (per RFC 2616, http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.30), what do you set the Location to when you have multiple locations? The only thing we can think of that is still correct according to the RFC is to reply with a multipart (multipart/related, maybe?) response, set the Location header of each part to the URL for one of the new resources, and then set the Location of the container response to the URL of a metadata container that references all of the uploaded resources.

Search/Metadata Structure

The kinds of stuff we want to support with the REST search interface:

/search?tag=preview;filename=~logo;created:before+1/12/2007;owner=(mahlon|michael|borat)

would result in a query like:

tag = 'preview'
AND filename LIKE '%logo%'
AND created < date '1/12/2007'
AND owner IN ('mahlon','michael','borat')
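
A sketch of how the query string might be turned into those conditions. The search_conditions helper is hypothetical and handles only the four term shapes in the example, with the owner names quoted as SQL string literals. Values are interpolated straight into the output for readability only; real code would want bound parameters.

```ruby
# Translate the ';'-separated search terms into SQL-like conditions.
# Illustration only: values are not escaped, so don't feed the result
# to a database as-is.
def search_conditions( query )
	query.split( ';' ).map do |term|
		case term
		when /\A(\w+)=~(.+)\z/              # filename=~logo
			"#{$1} LIKE '%#{$2}%'"
		when /\A(\w+)=\((.+)\)\z/           # owner=(a|b|c)
			key, list = $1, $2
			names = list.split( '|' ).map {|name| "'#{name}'" }
			"#{key} IN (#{names.join(',')})"
		when /\A(\w+):before\+(.+)\z/       # created:before+1/12/2007
			"#{$1} < date '#{$2}'"
		when /\A(\w+)=(.+)\z/               # tag=preview
			"#{$1} = '#{$2}'"
		else
			raise ArgumentError, "unparseable search term: #{term.inspect}"
		end
	end
end
```

Joining the result with "\nAND " gives the WHERE clause body for the example query.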

Crazy/Fun Ideas

DAV interface

  • Implement enough of the DAV methods that we can mount a ThingFish as a trophy on our MacOS desktop.
    From RFC 4709:
  1. Client retrieves representation of WebDAV collection "/user42/inbox/".
       GET /user42/inbox/ HTTP/1.1
       Host: www.example.com
    
  2. Server returns representation.
       HTTP/1.1 200 OK
       Content-Type: text/html
       Content-Length: xxx
    
       ..
       <a href="?action=davmount">View this collection in your
       WebDAV client</a>
       ..
    

(note that the example shows only that part of the HTML page that contains the relevant link)

  3. Client follows link to "davmount" document
       GET /user42/inbox/?action=davmount HTTP/1.1
       Host: www.example.com
    
  4. Server returns "davmount" document
       HTTP/1.1 200 OK
       Content-Type: application/davmount+xml
       Content-Length: xxx
       Cache-Control: private
    
       <dm:mount xmlns:dm="http://purl.org/NET/webdav/mount">
         <dm:url>http://www.example.com/user42/</dm:url>
         <dm:open>inbox/</dm:open>
       </dm:mount>
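
On our side, generating that "davmount" document could be as small as the sketch below. The davmount_document and xml_escape helpers are hypothetical names, and treating the dm:open element as optional is an assumption.

```ruby
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }

# Escape the characters that are special in XML text.
def xml_escape( text )
	text.gsub( /[&<>"]/, XML_ESCAPES )
end

# Build the body of an application/davmount+xml response.
# The dm:open element is left out when no path is given (an assumption).
def davmount_document( base_url, open_path=nil )
	lines = []
	lines << '<dm:mount xmlns:dm="http://purl.org/NET/webdav/mount">'
	lines << "  <dm:url>#{xml_escape( base_url )}</dm:url>"
	lines << "  <dm:open>#{xml_escape( open_path )}</dm:open>" if open_path
	lines << '</dm:mount>'
	lines.join( "\n" )
end
```

Calling davmount_document( 'http://www.example.com/user42/', 'inbox/' ) reproduces the document in the example above, minus the indentation.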
    

Remote 'require'

  • The index response for an Accept header of text/x-ruby should be the source code for the corresponding client library. This would work great with require-uri.
