tomkralidis edited this page Jun 21, 2012 · 2 revisions

(n.b. all descriptions below are intended to provide a high level overview of how pycsw is implemented. For full details, please refer to the codebase)

Overview

pycsw is a CGI-based application written in Python that accepts HTTP GET and POST requests as per OGC:CSW 2.0.2. The basic flow of events is:

client request --> pycsw (handle request, produce response) --> server response

pycsw is always invoked through csw.py, which instantiates a server.Csw object and calls its dispatch() method to handle the request and generate a response.

The server.Csw class prepares the server to handle OGC:CSW requests:

  • set up configuration (default.cfg)
  • initialize the underlying repository (database) connection and queryables model
  • set default HTTP properties (gzip compression)
  • generate GetDomain model
  • load any profile code (e.g. apiso)
  • set up transactions (if specified)
  • set up distributed search (if specified)
  • set up logging (if specified)

At this point, pycsw is ready to handle the request, using server.Csw.dispatch(), which does the following:

  • parse request (GET or POST or SOAP)
  • do basic parameter checking (service, version, request)
  • process the request accordingly
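
The parsing and basic parameter checking steps above can be sketched roughly as follows for the GET case. This is a minimal illustration, not pycsw's actual code: the helper names parse_kvp() and check_basic_params() are hypothetical, and requiring all three parameters on every request is a simplification.

```python
from urllib.parse import parse_qs

# Parameters checked before any operation-specific processing
# (simplified: a real server may relax this for some operations)
REQUIRED = ('service', 'version', 'request')


def parse_kvp(query_string):
    """Parse a GET query string into a dict with lower-cased keys.

    Hypothetical helper; only the first value of a repeated key is kept.
    """
    parsed = parse_qs(query_string, keep_blank_values=True)
    return {key.lower(): values[0] for key, values in parsed.items()}


def check_basic_params(kvp):
    """Return a list of error messages; an empty list means the
    request may be passed on to the operation handler."""
    errors = ['Missing parameter: %s' % name
              for name in REQUIRED if name not in kvp]
    if 'service' in kvp and kvp['service'] != 'CSW':
        errors.append('Invalid service: %s' % kvp['service'])
    if 'version' in kvp and kvp['version'] != '2.0.2':
        errors.append('Invalid version: %s' % kvp['version'])
    return errors
```

Lower-casing the keys reflects the fact that KVP parameter names are case-insensitive in OGC services, so SERVICE=CSW and service=CSW are treated alike.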

(Whenever pycsw encounters an error, it calls server.Csw.exceptionreport(), which returns an OGC ExceptionReport.)

All server.Csw methods return lxml.etree.Element objects, which are then processed by server.Csw.write_response() and returned to the client as XML.
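
As a rough illustration of this "build an Element, then serialize it" pattern, the sketch below builds a minimal OGC ExceptionReport and writes it out as XML. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree so that it runs without extra dependencies. The function names echo the methods named above, but their signatures, the version attribute value and the example exception code are assumptions, not pycsw's actual implementation.

```python
import xml.etree.ElementTree as etree

# OWS Common namespace used by OGC exception reports
OWS_NS = 'http://www.opengis.net/ows'
etree.register_namespace('ows', OWS_NS)


def exceptionreport(code, locator, text):
    """Build an ows:ExceptionReport element (assumed signature)."""
    root = etree.Element('{%s}ExceptionReport' % OWS_NS, version='1.2.0')
    exception = etree.SubElement(root, '{%s}Exception' % OWS_NS,
                                 exceptionCode=code, locator=locator)
    etree.SubElement(exception, '{%s}ExceptionText' % OWS_NS).text = text
    return root


def write_response(element):
    """Serialize an Element to an XML string for the HTTP response body."""
    return etree.tostring(element, encoding='unicode')


report = exceptionreport('MissingParameterValue', 'service',
                         'Missing keyword: service')
xml = write_response(report)
```

Because every operation returns an Element rather than a string, serialization concerns (encoding, compression, SOAP wrapping) can stay in one place.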

server.Csw.getcapabilities()

  • handle SECTIONS parameter if specified
  • handle extra profile parameters if specified
  • set / process updatesequence
  • return response XML as lxml.etree.Element

server.Csw.describerecord()

  • perform GET validation
  • process the output of schemas as csw:SchemaComponent elements
  • return response XML as lxml.etree.Element

server.Csw.getdomain()

  • perform GET validation
  • process parameter name
    • validate against internal domain model
  • process property name
    • validate existence of property against self.repository.queryables['all']
    • query repository (SQL distinct query against XPath of queryable in records.xml)
  • return response XML as lxml.etree.Element
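
The "SQL distinct query against XPath" step in getdomain() can be sketched as below. This is a simplified illustration under several assumptions: the repository is an in-memory SQLite database, each record's XML is stored in an xml column, and the query_xpath() shown here is a hypothetical stand-in (using the standard library's limited XPath support) rather than pycsw's own function.

```python
import sqlite3
import xml.etree.ElementTree as etree


def query_xpath(xml_text, path):
    """Return the text of the first element matching path, or None."""
    return etree.fromstring(xml_text).findtext(path)


def open_repository(records):
    """Create a demo repository (assumed layout: identifier, xml)."""
    conn = sqlite3.connect(':memory:')
    # Expose the Python function to SQL under the same name
    conn.create_function('query_xpath', 2, query_xpath)
    conn.execute('CREATE TABLE records (identifier TEXT PRIMARY KEY, xml TEXT)')
    conn.executemany('INSERT INTO records VALUES (?, ?)', records)
    return conn


def distinct_values(conn, path):
    """List the distinct values found at path across all records."""
    rows = conn.execute(
        'SELECT DISTINCT query_xpath(xml, ?) FROM records ORDER BY 1',
        (path,))
    # Records lacking the element yield NULL, which is skipped here
    return [row[0] for row in rows if row[0] is not None]
```

For example, distinct_values(conn, 'type') would list each distinct type value once, which is the shape of data a GetDomain response reports for a property name.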

server.Csw.getrecords()

  • perform GET validation
  • query repository. SQL query, one of:
    • spatial (util.query_spatial())
    • aspatial (util.query_xpath())
    • spatial + aspatial
  • sorting (if specified)
  • do distributed searching (if specified)
  • write out results (based on outputschema)
    • distributed search results are returned verbatim
  • return response XML as lxml.etree.Element

server.Csw.getrecordbyid()

  • perform GET validation
  • query repository. SQL query by id (against records.identifier)
  • write out results (based on outputschema)
  • return response XML as lxml.etree.Element

server.Csw.getrepositoryitem()

  • wrapper around server.Csw.getrecordbyid()
  • gets raw XML record
  • return response XML as lxml.etree.Element

server.Csw.transaction()

  • validate XML document
  • insert mode
  • update mode
  • delete mode

server.Csw.harvest()

  • fetch XML from URL
  • insert into repository, or update if identifier exists
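
The "insert, or update if identifier exists" rule can be sketched as follows. The URL fetch is left out so the example stays self-contained; the function takes the already-fetched XML as a string. The table layout, the function name upsert_record() and the use of a top-level dc:identifier element as the record key are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as etree

# Dublin Core element namespace
DC_NS = 'http://purl.org/dc/elements/1.1/'


def upsert_record(conn, xml_text):
    """Insert a harvested record, or update it if its identifier exists.

    Returns 'inserted' or 'updated'. Hypothetical helper.
    """
    root = etree.fromstring(xml_text)
    identifier = root.findtext('{%s}identifier' % DC_NS)
    if identifier is None:
        raise ValueError('record has no dc:identifier')

    exists = conn.execute(
        'SELECT 1 FROM records WHERE identifier = ?',
        (identifier,)).fetchone()
    if exists is None:
        conn.execute('INSERT INTO records (identifier, xml) VALUES (?, ?)',
                     (identifier, xml_text))
        return 'inserted'
    conn.execute('UPDATE records SET xml = ? WHERE identifier = ?',
                 (xml_text, identifier))
    return 'updated'
```

Harvesting the same URL twice then refreshes the stored record instead of creating a duplicate.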

Other Notes

  • server.Csw._gen_soap_wrapper() is the generic SOAP wrapper private method
  • server/config.py sets the server's operation model in config.MODEL. Any modifications are then made by calling code (e.g. to add more queryables, typenames, etc.)
  • spatial queries are implemented with Shapely in server/util.py:query_spatial(), which is called via a SQL function bound back to this method
  • full text (e.g. '*:AnyText') style queries are handled by server/util.py:query_anytext(), which is called via a SQL function bound back to this method
  • XPath style queries are handled by server/util.py:query_xpath(), which is called via a SQL function bound back to this method
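
The "SQL function bound back to this method" idea can be sketched with the standard library's sqlite3 module, which lets a Python callable be registered as a SQL function. The example below is an assumption-laden illustration: it uses a pure-Python bounding-box overlap test named bbox_intersects() (hypothetical) in place of Shapely, stores each bounding box as a 'minx,miny,maxx,maxy' string, and assumes an SQLite repository.

```python
import sqlite3


def bbox_intersects(bbox_a, bbox_b):
    """Return 1 if two 'minx,miny,maxx,maxy' boxes overlap, else 0.

    Pure-Python stand-in for a Shapely-based geometry test.
    """
    ax1, ay1, ax2, ay2 = (float(v) for v in bbox_a.split(','))
    bx1, by1, bx2, by2 = (float(v) for v in bbox_b.split(','))
    overlaps = ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2
    return int(overlaps)


def open_repository(records):
    """Create a demo repository (assumed layout: identifier, bbox)."""
    conn = sqlite3.connect(':memory:')
    # Bind the Python function so SQL statements can call it per row
    conn.create_function('bbox_intersects', 2, bbox_intersects)
    conn.execute('CREATE TABLE records (identifier TEXT, bbox TEXT)')
    conn.executemany('INSERT INTO records VALUES (?, ?)', records)
    return conn


def find_intersecting(conn, bbox):
    """Return identifiers of records whose bbox overlaps the given one."""
    rows = conn.execute(
        'SELECT identifier FROM records '
        'WHERE bbox_intersects(bbox, ?) = 1 ORDER BY identifier',
        (bbox,))
    return [row[0] for row in rows]
```

With this binding the filtering logic lives in Python while the database still drives the row iteration, so spatial, full text and XPath predicates can all appear in the same WHERE clause.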