API, or Application Programming Interface, is a means for computer applications or code libraries to talk to each other and exchange information consistently. A programmer writes code that uses APIs to, for example, show Yelp reviews in another application or perform geocoding, among many other things.
Data Custodians are individuals who assist with the technical implementation of individual databases, datasets, or information systems. Not all systems or data sources will have a data custodian. General responsibilities likely include:
- Implementing technical changes requested by the data steward
- Administration and maintenance for the database or system
- Coordinating any technical work necessary for automated open data publishing. You can learn more about this in the automation services guidance.
Data Stewards are individuals in charge of individual databases, datasets, or information systems. In general, a data steward has business knowledge of the data and can answer questions about the data itself. General responsibilities likely include:
- Managing the dataset or source and authorizing changes to it
- Managing access to and use of the data, including documentation
- Managing accuracy, quality and completeness of the data
- Answering questions related to data as needed
Note: You should not be publishing a dataset without input from the data steward.
ETL stands for Extract-Transform-Load, a process for moving data from its source to another data storage system. Extraction involves connecting to a source system and pulling data (sometimes validating it on extraction); transformation involves reformatting the data to match an expected output; and loading involves pushing the data to the target system. The primary target system for DataSF is the open data portal.
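The three ETL stages can be illustrated with a minimal Python sketch. This is not DataSF's actual pipeline: the sample CSV, the field names, and the `extract`, `transform`, and `load` helpers are all hypothetical, and a plain list stands in for the target system.

```python
import csv
import io

# Hypothetical export from a source system (made-up sample data).
SOURCE_CSV = """id,name,opened
1, Mission Branch ,2015-03-01
2,Richmond Branch,2016-07-15
,Missing ID,2017-01-01
"""


def extract(raw):
    """Pull rows from the source, validating on extraction (drop rows with no id)."""
    rows = csv.DictReader(io.StringIO(raw))
    return [row for row in rows if row["id"].strip()]


def transform(rows):
    """Reformat each row to match the output shape the target expects."""
    return [
        {
            "id": int(row["id"]),
            "name": row["name"].strip(),
            "opened_year": int(row["opened"][:4]),
        }
        for row in rows
    ]


def load(rows, target):
    """Push rows to the target; here the target is just an in-memory list."""
    target.extend(rows)
    return len(rows)


target_table = []
loaded = load(transform(extract(SOURCE_CSV)), target_table)
```

Keeping each stage in its own function makes it easier to swap the source or the target without touching the transformation rules.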
A unique ID is assigned to every record in the City's dataset and systems inventory. It is composed of a 3-letter department code plus 4 digits. This ID is used to:
- ensure departments and DataSF staff are referring to the same thing
- tie records together across systems and processes
- ease the discovery of related information about a particular dataset and its collateral
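A check for the ID format described above might be sketched as follows. The exact layout is an assumption: the sketch treats the ID as three uppercase letters immediately followed by four digits, with no separator. The `parse_inventory_id` helper and the sample value `ABC0001` are hypothetical.

```python
import re

# Assumed layout: three letters immediately followed by four digits, e.g. "ABC0001".
# The lack of a separator and the uppercase letters are assumptions for illustration.
INVENTORY_ID = re.compile(r"[A-Z]{3}[0-9]{4}")


def parse_inventory_id(value):
    """Return (department_code, number) for a well-formed ID, else None."""
    candidate = value.strip().upper()
    if INVENTORY_ID.fullmatch(candidate) is None:
        return None
    return candidate[:3], int(candidate[3:])


print(parse_inventory_id("ABC0001"))
print(parse_inventory_id("AB0001"))
```

Splitting the ID into its department code and number makes it straightforward to group or join records from different systems on the same key.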