New staff UI for Oral History (Phase 2)
This is the new staff user interface for the Oral History project. It allows staff to create & maintain metadata, and upload & process media files, for UCLA's Oral History site.
This is the second phase of development. The first was a temporary solution which enabled staff to upload files with the old DLCS application. That functionality will be integrated into this new application.
Related repositories:
The development environment requires:
- git (at least version 2)
- docker (current version recommended: 20.10.12)
- docker-compose (at least version 1.25.0; current recommended: 1.29.2)
The development database is a Docker container running PostgreSQL 12, which matches our deployment environment.
This uses Django 4.1, in a Debian 11 (Bullseye) container running Python 3.11. All code runs in the container, so local version of Python does not matter.
The container runs via docker_scripts/entrypoint.sh
, which
- Updates container with any new requirements, if the image hasn't been rebuilt (DEV environment only).
- Waits for the database to be completely available. This can take 5-10 seconds, depending on your hardware.
- Applies any pending migrations (DEV environment only).
- Creates a generic Django superuser, if one does not already exist (DEV environment only).
- Loads fixtures to populate lookup tables and to add a few sample records.
- Starts the Django application server.
-
Clone the repository.
$ git clone [email protected]:UCLALibrary/oral-history-staff-ui.git
-
Change directory into the project.
$ cd oral-history-staff-ui
-
Build using docker-compose.
$ docker-compose build
-
Bring the system up, with containers running in the background.
$ docker-compose up -d
-
Logs can be viewed, if needed (
-f
to tail logs).$ docker-compose logs -f db $ docker-compose logs -f django
-
Run commands in the containers, if needed.
# Open psql client in the dev database container $ docker-compose exec db psql -d oral_history -U oral_history # Open a shell in the django container $ docker-compose exec django bash # Django-aware Python shell $ docker-compose exec django python manage.py shell # Apply new migrations without a restart $ docker-compose exec django python manage.py migrate # Populate database with seed data (once it exists...) $ docker-compose exec django python manage.py loaddata --app oh_staff_ui seed-data # # Load full set of item data $ docker-compose exec django python manage.py import_projectitems export_scripts/project-items-export.tsv # Load full set of name data $ docker-compose exec django python manage.py import_names export_scripts/Name.tsv
-
Connect to the running application via browser
- Application and Admin
- Username:
dev_admin
- Password:
dev_admin
- Username:
- Application and Admin
-
Edit code locally. All changes are immediately available in the running container, but if a restart is needed:
$ docker-compose restart django
-
Shut down the system when done.
$ docker-compose down
The local docker-based development database can be reset as needed. This is helpful when switching between branches which involve migrations, or when testing data loading and transformation.
To do this, run:
docker_scripts/nuke_dev_db.sh
It does the following:
- Shuts down the application
- Drops the docker volume for the database
- Starts the application
- Reloads data files via management commands.
- This currently is just
ProjectItem
andName
files; others will be added as loaders are finished.
- This currently is just
The script warns users, and requires them to confirm their decision.
Developers can opt out of reloading data by adding a specific parameter:
docker_scripts/nuke_dev_db.sh NOLOAD
This will not normally need to be done during development, but may be needed for data migration. It requires the ability to establish a tunneled SSH connection to the production database server, and a file with production database credentials.
-
In a separate terminal window, create a tunneled SSH connection through the
jump
server to the production database server. This will forward all local connections on port 5433:$ ssh -NT -L 0.0.0.0:5433:p-d-postgres.library.ucla.edu:5432 jump
-
If you don't already have it, get
.docker-compose_secrets.env
from a teammate and add it to the top level of your project (same directory asdocker-compose_REMOTE.yml
). -
Shut down your local application, if running:
$ docker-compose down
-
Start your local application, using the specially-configured
docker-compose_REMOTE.yml
. This will start just the Django container, skipping the usual local database container and instead connecting to the remote database via information in.docker-compose_secrets.env
:$ docker-compose -f docker-compose_REMOTE.yml up -d
-
Be very careful - your local application is now connected to the production database!
-
When finished, shut down your local application as usual. Switch to your other terminal window and
CTRL-C
orCMD-C
to stop the tunnel.
For data migration, it's necessary to clear out existing data from the production database before doing the final load. Doing this via a tunneled connection is possible, but very slow (8+ hours for a full load), so it's better to do this while connected directly to the production container, which only DevSupport staff can currently do.
Once connected (either directly, or via tunnel as documented above), run either:
# If connected directly
$ docker_scripts/reset_prod_db.sh
# If connected from local system via tunnel
$ docker-compose exec django docker_scripts/reset_prod_db.sh
Basic logging is available, with logs captured in logs/application.log
. At present, logs from both the custom application code and Django itself are captured.
Logging level is set to INFO
via .docker-compose_django.env
. If there's a regular need/desire for DEBUG level, we can discuss that.
Logging can be used in any Python file in the project. For example, in views.py
:
# Include the module with other imports
import logging
# Instantiate a logger, generally before any functions in the file
logger = logging.getLogger(__name__)
def project_table():
logger.info('This is a log message from project_table')
query_results = ProjectItems.objects.all()
for r in query_results:
logger.info(f'{r.node_title} === {r.item_ark}')
try:
1/0
except Exception as e:
logger.exception('Example exception')
logger.debug('This DEBUG message only appears if DJANGO_LOG_LEVEL=DEBUG')
The current log format includes:
- Level: DEBUG, INFO, WARNING, ERROR, or CRITICAL
- Timestamp via
asctime
- Logger name: to distinguish between sources of messages (
django
vsoral_history
application) - Module: somewhat redundant with logger name
- Message: The main thing being logged
Local development environment: view logs/application.log
.
In deployed container:
/logs/
: see latest 200 lines of the log/logs/nnn
: see latestnnn
lines of the log
Tests focus on code which has significant side effects, like creating & changing files.
Run tests in the container:
$ docker-compose exec django python manage.py test
In the local development environment, samples files for testing are in the samples
directory. This directory and its files are under version control, so only add small files, and be sure any real files have no copyright restrictions.
As files are processed locally, they go into subdirectories of MEDIA_ROOT
- for developers, /tmp/media_dev
in the container.
The top-level subdirectories are defined via environment variables:
DJANGO_OH_MASTERS
DJANGO_OH_WOWZA
DJANGO_OH_STATIC
Processed files go into various subdirectories below these top-level ones, depending on file use (master, submaster, thumbnail) and content type (audio, image, pdf, text). Subdirectories are created automatically by Django as needed.
To check the contents of /tmp/media_dev
, which is only in the container:
# Easiest, but harder to read
$ docker-compose exec django bash -c "ls -lR /tmp/media_dev"
# Extra step needed after container rebuilds, but easier to read
# Install tree utility as root (in container):
$ docker-compose exec -u root django bash -c "apt-get install tree"
# Then run tree (in container) as normal django user
$ docker-compose exec django bash -c "tree /tmp/media_dev"
Example, after uploading 1 of each type of file, showing masters and derivatives:
/tmp/media_dev
├── oh_masters
│ ├── audio
│ │ └── masters
│ │ └── fake-bdef357512-1-master.wav
│ ├── masters
│ │ └── fake-bdef357512-3-master.tif
│ ├── pdf
│ │ └── masters
│ │ └── fake-bdef357512-4-master.pdf
│ └── text
│ └── masters
│ └── fake-bdef357512-2-master.xml
├── oh_static
│ ├── nails
│ │ └── fake-bdef357512-3-thumbnail.jpg
│ ├── pdf
│ │ └── submasters
│ │ └── fake-bdef357512-4-submaster.pdf
│ ├── submasters
│ │ └── fake-bdef357512-3-submaster.jpg
│ └── text
│ └── submasters
│ └── fake-bdef357512-2-submaster.xml
└── oh_wowza
└── audio
└── submasters
└── fake-bdef357512-1-submaster.mp3
18 directories, 9 files
Currently, there is no support in the application for deleting files - a feature to be added later, once specs are agreed on.
To delete files locally, use the Django shell:
$ docker-exec exec django python manage.py shell
>>> from oh_staff_ui.models import MediaFile
>>> for mf in MediaFile.objects.all():
... mf.file.delete()
... mf.delete()
...
(1, {'oh_staff_ui.MediaFile': 1})
(1, {'oh_staff_ui.MediaFile': 1})
(1, {'oh_staff_ui.MediaFile': 1})
(1, {'oh_staff_ui.MediaFile': 1})
Deleting a MediaFile
object does not currently delete any FileField
object associated with it, so it's necessary to do a file.delete()
first.
A barebones OAI Provider is publically available at /oai. Only MODS output is supported, Dublin Core is not available.
Two verbs are supported:
GetRecord
- Given an ark, will return information about the item related to that ark, a single record.ListRecords
- No arguments are required, all qualified records will be returned, without pagination.
The implementation details are located in the oh_staff_ui\classes\OralHistoryMods.py
class.
The populate_fields()
method contains the methods called for each element in the MODS record.
If an item contains the following subjects, the subject value is not included in the MODS subject output:
Arts, Literature, Music, and Film
Donated Oral Histories
Latinas and Latinos in Music
Latinas and Latinos in Politics
Mexican American Civil Rights
Descriptions of following type are not included in the MODS Description element:
adminnote
tableOfContents
If a Description is of type abstract
an Abstract
MODS element is created for this data, not a Description element.
The human readable label for the Description field can be different than the internal Description type, the mapping is as follows:
Description Type | Display Label |
---|---|
biographicalnote | Biographical Information |
interviewerhistory | Interviewer Background and Preparation |
personpresent | Persons Present |
place | Place Conducted |
processinterview | Processing of Interview |
supportingdocuments | Supporting Documents |
The Location element contains information associated with the file location (url) of files associated with an item.
This also has some label mapping to be aware of based on the internal file_code
:
File Code | Display Label |
---|---|
pdf_master | Interview Full Transcript (PDF) |
text_master_transcript (file ends with .html ) |
Interview Full Transcript (Printable Version) |
text_master_transcript (all other cases) | Interview Full Transcript (TEI/P5 XML) |
text_master_biography | Interviewee Biography |
text_master_interview_history | Interview History |
pdf_master_appendix | Appendix to Interview |
text_master_appendix | Appendix to Interview |
pdf_master_resume | Narrator's Resume |
Our deployment system is triggered by changes to the Helm chart. Typically, this is done by incrementing image:tag
(on or near line 9) in charts/prod-ohstaff-values.yaml
. We use a simple semantic versioning system:
- Bug fixes: update patch level (e.g.,
v1.0.1
tov1.0.2
) - Backward compatible functionality changes: update minor level (e.g.,
v1.0.1
tov1.1.0
) - Breaking changes: update major level (e.g.,
v1.0.1
tov2.0.0
)
In addition to updating version in the Helm chart, update the Release Notes in oh_staff_ui/templates/oh_staff_ui/release_notes.html
. Put the latest changes first, following the established format.