- Introduction
- Running the Google Colab script
- Installation of raga_pose_estimation Python library
- Running the command line script
- Post-processing options
- CSV format
- Other details
- License
## Introduction

Raga Pose Estimation comprises a set of tools to facilitate the use of pose estimation software for the analysis of human movement in music performance. It was developed with the study of Indian classical music in mind – hence the name – but can be applied to any other type of small musical ensemble. Raga Pose Estimation was developed by staff in the Music Department and Advanced Research Computing at Durham University, as part of the EU Horizon 2020 FET project EnTimeMent – Entrainment & synchronization at multiple TIME scales in the MENTal foundations of expressive gesture.
The code, which is available as a library and Colab script, utilizes the OpenPose tool to facilitate the analysis of musical performance. The Colab script does this by enabling the user to carry out pose estimation online, avoiding the need for a suitable local GPU. Post-processing functions turn the large numbers of JSON files (one per video frame) output by OpenPose into a single CSV file per performer. Cropping and trimming of videos, and applying confidence thresholds and smoothing functions to the output, are among the other functions built into a single process.
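As a rough illustration of that aggregation step, here is a minimal sketch (not the library's actual implementation; the function and file names are hypothetical, although `pose_keypoints_2d` is the key OpenPose really uses in its JSON output):

```python
import json
from pathlib import Path

import pandas as pd

def collect_openpose_json(json_dir, output_dir, num_people=1):
    """Aggregate per-frame OpenPose JSON files into one CSV per person.

    Simplified sketch: assumes each person keeps the same index in every
    frame; the real pipeline also handles cropping, confidence thresholds
    and smoothing.
    """
    rows = {p: [] for p in range(num_people)}
    for frame_file in sorted(Path(json_dir).glob("*.json")):
        with open(frame_file) as f:
            people = json.load(f)["people"]
        for p in range(num_people):
            # pose_keypoints_2d is a flat list of x, y, confidence triples
            rows[p].append(people[p]["pose_keypoints_2d"] if p < len(people) else [])
    for p, frames in rows.items():
        pd.DataFrame(frames).to_csv(Path(output_dir) / f"person{p}.csv")
```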
We include some example_files to use with the system. Other suitable video examples can be found on the Open Science Framework; see, for example, North Indian Raga Performance. OpenPose proves effective for extracting movement information from static videos of music performance, although its performance can suffer in cases of poor lighting, occlusion (where body parts are obscured by other objects), when limbs of different individuals overlap, or when tracked body parts leave the video frame altogether. The system does not depend on any particular video resolution, aspect ratio or frame rate.
This README describes how to use the scripts, whether on Colab or by installing the library locally. Even if you use the Colab script, the sections below on using the library may be useful, as they contain more detailed information about the functions.
## Running the Google Colab script

Open RagaPoseEstimationColab.ipynb and click 'Run in Colab'.
Run the first four cells in the Colab. This will take approximately half an hour.
Once the first cells have run, you will see a form for entering the parameters, with fields as described below.
The inputs required depend on what type of files you are using.
There are additional arguments you can pass to OpenPose. See the OpenPose documentation for a full list of options.
(Performance of OpenPose can sometimes be improved by cropping a video to include only the person or persons of interest. Cropping can be carried out as part of the Colab script by entering the parameters here.)
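For reference, the same w h x y crop convention can be reproduced outside the Colab; here is a minimal sketch using ffmpeg-python (a listed dependency of the library), with hypothetical file names:

```python
import ffmpeg  # provided by the ffmpeg-python package

# Crop input.mp4 to a 720x800 region whose top-left corner is at
# (600, 50); FFmpeg's crop filter takes w:h:x:y in the same order.
ffmpeg.input("input.mp4").filter("crop", 720, 800, 600, 50).output(
    "cropped.mp4"
).run()
```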
Type the performers' names, separated by commas.
You can create a model video (just showing the OpenPose skeleton), and/or an overlay video (the skeleton from the model video added to the input video). We recommend writing the output directly to a Google Drive folder: if you choose not to do this, you can click the ‘Download results’ box here.
If you have already run the system and want to re-run the post-processing with different parameters, you can start with the JSON folder from the previous run. Input its URL or path here.
If output videos are desired, input a folder containing subfolders in the same structure as example_files, including the videos.
OpenPose outputs a confidence level for each estimate of the x- and y-position of a body part. It may sometimes be useful to set a confidence threshold to eliminate poorly estimated data (which can then be interpolated). Trial and error is necessary to establish a suitable confidence level. (Set to 0 by default.)
To reduce jitter in the model, a smoothing function (Savitzky-Golay filter) can be applied. You can choose the Smoothing window, which determines the number of previous frames used in the smoothing calculation, and the Smoothing polynomial order, which is the order of the polynomial used in the fitting function. The smoother is set to off by default.
You can select which body parts to detect by choosing from options such as all body parts, upper body parts only (which is best for Indian classical music), lower body parts only, or a custom selection of specific body parts.
Select whether you want the CSV data in the output to be flattened (a flattened CSV file includes only one header row, while the unflattened file includes two header rows).
Enter a name for the trial, which will be included in the file names.
This cell will execute the OpenPose/post-processing pipeline using the specified parameters. The duration will depend on the size of the video, and the run may take anywhere from 5 minutes to several hours to complete.
## Installation of raga_pose_estimation Python library

The dependencies, as listed in environment.yml, are:

```yaml
- pandas=1.3.5
- numpy
- scipy
- python=3.7
- opencv
- click
- pytest
- pytest-cov
- coverage[toml]
- black
- pre-commit
- ffmpeg-python
- pymediainfo=6.0.1
```
Install the dependencies of raga_pose_estimation using conda or miniconda:

```sh
conda env create -f environment.yml
```

Alternatively, use pip to install the packages listed in environment.yml.
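For example, something like the following should install the runtime packages (an untested sketch: the conda package opencv corresponds to opencv-python on PyPI, the environment expects Python 3.7, and the development tools pytest, pytest-cov, coverage[toml], black and pre-commit are only needed if you work on the library itself):

```sh
pip install pandas==1.3.5 numpy scipy opencv-python click ffmpeg-python pymediainfo==6.0.1
```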
## Running the command line script

```
$ python run_pose_estimation.py --help
Usage: run_pose_estimation.py [OPTIONS]

  Runs openpose on the video, does post-processing, and outputs CSV files.
  See cli docs for parameter details.

Options:
  -v, --input-video TEXT          Path to the video file on which to run
                                  openpose
  -j, --input-json TEXT           Path to a directory of previously generated
                                  openpose json files
  -bf, --batch-folder TEXT        Path to directory of folders with
                                  subfolders with JSON files and optional
                                  videos
  -o, --output-dir TEXT           Path to the directory in which to output
                                  CSV files (and videos if required).
  -r, --crop-rectangle INTEGER...
                                  Coordinates of rectangle to crop the video
                                  before processing, in the form w h x y
                                  where w and h are the width and height of
                                  the cropped rectangle and (x,y) is the
                                  top-left of the rectangle, as measured in
                                  pixels from the top-left corner.
  -n, --number-of-people INTEGER  Number of people to include in output.
  -O, --openpose-dir TEXT         Path to the directory in which openpose is
                                  installed.
  -a, --openpose-args TEXT        Additional arguments to pass to OpenPose.
                                  See https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/include/openpose/flags.hpp
                                  for a full list of options.
  -m, --create-model-video        Whether to create a video showing the poses
                                  on a blank background
  -V, --create-overlay-video      Whether to create a video showing the poses
                                  as an oVerlay
  -w, --width INTEGER             Width of original video (mandatory for
                                  creating video if not providing
                                  input-video)
  -h, --height INTEGER            Height of original video (mandatory for
                                  creating video if not providing
                                  input-video)
  -c, --confidence-threshold FLOAT
                                  Confidence threshold. Items with a
                                  confidence lower than the threshold will be
                                  replaced by values from a previous frame.
  -s, --smoothing-parameters <INTEGER INTEGER>...
                                  Window and polynomial order for smoother.
                                  See README for details.
  -b, --body-parts TEXT           Body parts to include in output. Should be
                                  a comma-separated list of strings as in the
                                  list at https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md#keypoint-ordering-in-cpython,
                                  e.g. "LEye,RElbow". Overrides
                                  --upper-body-parts and --lower-body-parts.
  -u, --upper-body-parts          Output upper body parts only
  -l, --lower-body-parts          Output lower body parts only
  -f, --flatten TEXT              Export CSV in flattened format, i.e. with a
                                  single header row (see README)
  -tn, --trial_number INTEGER     Trial number of run.
  -p, --performer_names           Names of performers from left to right.
  --help                          Show this message and exit.
```
Run OpenPose on a video to produce output CSVs (one per person) in the output directory and an overlay video, cropping the video to width 720, height 800 from (600, 50):

```sh
python run_pose_estimation.py --input-video=example_files/example_1person/short_video.mp4 --openpose-dir=../openpose --output-dir=output --create-overlay-video --crop-rectangle 720 800 600 50
```
Parse existing JSON files created by OpenPose to produce one CSV per person in the output folder, showing only upper body parts, outputting up to 3 people, and using the confidence threshold and smoothing to improve the output (using the short form of the arguments):

```sh
python run_pose_estimation.py -j example_files/example_3people/output_json -o output -u -n 3 -s 21 2 -c 0.7
```
The script run_samples.sh runs a sensible set of default options on the two examples in example_files, producing both overlay and model videos:

```sh
./run_samples.sh
```
## Post-processing options

There are two ways to reduce the jitter from the OpenPose output: using a confidence threshold and using the smoother. Both are illustrated in the code sketch below.
Applying a confidence threshold removes values with a confidence level below the threshold and replaces them with the value from the previous frame.
Setting the smoothing parameters applies a smoothing function to reduce the jitter between frames. The parameters are:
- smoothing window: the length of the smoothing window, which must be odd. The higher it is, the more frames are taken into account for smoothing; values between about 11 and 33 seem to work well. If it is too large it will dampen or delay genuine movements.
- polyorder: the order of the polynomial used in the fitting function. It must be smaller than the smoothing window (2 seems to work well; 1 connects points with straight lines, etc.).
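To illustrate both mechanisms, here is a minimal sketch (not the library's internal implementation) that applies a confidence threshold of 0.7 and then scipy's Savitzky-Golay filter with window 21 and polynomial order 2, matching the `-c 0.7 -s 21 2` example above; the file name and the restriction to one body part are hypothetical simplifications:

```python
import pandas as pd
from scipy.signal import savgol_filter

df = pd.read_csv("person0.csv", index_col=0)  # hypothetical flattened CSV

# Confidence threshold: where confidence is below 0.7, discard the value
# and carry the last accepted value forward (cf. -c 0.7).
low = df["LEye_c"] < 0.7
df.loc[low, ["LEye_x", "LEye_y"]] = None
df[["LEye_x", "LEye_y"]] = df[["LEye_x", "LEye_y"]].ffill()

# Smoothing: Savitzky-Golay filter with an odd window of 21 frames and a
# polynomial of order 2 (cf. -s 21 2).
df["LEye_x"] = savgol_filter(df["LEye_x"], window_length=21, polyorder=2)
df["LEye_y"] = savgol_filter(df["LEye_y"], window_length=21, polyorder=2)
```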
## CSV format

The CSVs which are output give the positions for a single person. They contain one row per frame, and columns showing the x, y and confidence (c) values for each body part.
With flatten=False (the default), the CSV output looks like this (note that this is a minimal example showing only the eyes):

```
Body Part,LEye,LEye,LEye,REye,REye,REye
Variable,x,y,c,x,y,c
0,977.627,285.955,0.907646,910.046,271.252,0.92357
1,974.83,286.009,0.910094,909.925,271.277,0.925763
...
```
which in table form looks like this:

| Body Part | LEye | LEye | LEye | REye | REye | REye |
|---|---|---|---|---|---|---|
| Variable | x | y | c | x | y | c |
| 0 | 977.627 | 285.955 | 0.907646 | 910.046 | 271.252 | 0.92357 |
| 1 | 974.83 | 286.009 | 0.910094 | 909.925 | 271.277 | 0.925763 |
If you export CSVs with flatten=False, they can be read back into a pandas.DataFrame as follows:

```python
import pandas as pd

df = pd.read_csv(csv_path, header=[0, 1], index_col=0)
```

The DataFrame will have MultiIndex columns to allow easier access to the values.
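Assuming the column layout shown above, individual series can then be selected with tuples, for example:

```python
left_eye_x = df[("LEye", "x")]  # x positions of the left eye
left_eye = df["LEye"]           # x, y and c columns for the left eye
```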
With flatten=True, the CSV output looks like this:

```
,LEye_x,LEye_y,LEye_c,REye_x,REye_y,REye_c
0,977.627,285.955,0.907646,910.046,271.252,0.92357
1,974.83,286.009,0.910094,909.925,271.277,0.925763
...
```
which in table form looks like:

| | LEye_x | LEye_y | LEye_c | REye_x | REye_y | REye_c |
|---|---|---|---|---|---|---|
| 0 | 977.627 | 285.955 | 0.907646 | 910.046 | 271.252 | 0.92357 |
| 1 | 974.83 | 286.009 | 0.910094 | 909.925 | 271.277 | 0.925763 |
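Flattened CSVs need no MultiIndex handling and can be read back with a plain header (a brief sketch):

```python
import pandas as pd

df = pd.read_csv(csv_path, index_col=0)  # columns like "LEye_x", "LEye_y", "LEye_c"
```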
## Other details

Files with suffixes such as '_3d' and '_adaptive' correspond to the processing of specific pose data. The 'utils' folder includes some useful tools for processing the data; see 'utils/README.md' for more details.
## License

The code in this repository is licensed under the MIT License.
Note that in order to run OpenPose you must read and accept the OpenPose License. OpenPose is free for non-commercial use only; see the OpenPose README for further details.