[TTAHUB-3061] Generate processed dataset to remove PII in CI #2299

Draft
wants to merge 148 commits into
base: main
Changes from 127 commits
46b30fd
first pass
GarrettEHill Aug 3, 2024
67716f7
Update config.yml
GarrettEHill Aug 5, 2024
fb1c806
Update config.yml
GarrettEHill Aug 5, 2024
8a63615
Update manifest-process.yml
GarrettEHill Aug 5, 2024
57ad06a
Update config.yml
GarrettEHill Aug 5, 2024
6141c05
limit to current test
GarrettEHill Aug 5, 2024
11132c3
force unbind to limit cross job access risk
GarrettEHill Aug 5, 2024
44f0152
relocate the unbind when stopping
GarrettEHill Aug 5, 2024
8b89158
Update cf_lambda.sh
GarrettEHill Aug 5, 2024
9321960
Update db_restore.sh
GarrettEHill Aug 5, 2024
5b3b808
match logic in latest_backup.sh
GarrettEHill Aug 5, 2024
553a5a5
Update db_restore.sh
GarrettEHill Aug 5, 2024
f2f12ae
Update db_restore.sh
GarrettEHill Aug 5, 2024
2bdc889
more magic
GarrettEHill Aug 6, 2024
a206588
Update db_restore.sh
GarrettEHill Aug 6, 2024
ce04813
switch to zenc
GarrettEHill Aug 6, 2024
92ef784
Update cf_lambda.sh
GarrettEHill Aug 6, 2024
c72f0f3
tweak to remove warning
GarrettEHill Aug 6, 2024
81f7b68
fix for content being printed to terminal
GarrettEHill Aug 6, 2024
7241b52
extra char on extension
GarrettEHill Aug 6, 2024
ab07803
Update db_restore.sh
GarrettEHill Aug 6, 2024
b5d7df7
Update cf_lambda.sh
GarrettEHill Aug 7, 2024
19216d3
Update db_restore.sh
GarrettEHill Aug 7, 2024
57f5bb5
Update cf_lambda.sh
GarrettEHill Aug 7, 2024
5ca09ec
add configurable timeout and extend the time for the restore
GarrettEHill Aug 7, 2024
4cbd741
Update db_restore.sh
GarrettEHill Aug 7, 2024
b64f0a8
Update config.yml
GarrettEHill Aug 7, 2024
8ff92b8
Update cf_lambda.sh
GarrettEHill Aug 7, 2024
02fe773
Update cf_lambda.sh
GarrettEHill Aug 7, 2024
c559431
Update db_restore.sh
GarrettEHill Aug 7, 2024
0d78ca3
Update cf_lambda.sh
GarrettEHill Aug 7, 2024
caf8a38
Update manifest-restore.yml
GarrettEHill Aug 7, 2024
2ac2164
reduce resources as more are not needed
GarrettEHill Aug 7, 2024
10cc8f3
Update config.yml
GarrettEHill Aug 7, 2024
7222079
Update config.yml
GarrettEHill Aug 7, 2024
44da07b
debug
GarrettEHill Aug 9, 2024
0fe788e
more debugging
GarrettEHill Aug 9, 2024
4798895
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Aug 9, 2024
434add5
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Aug 23, 2024
c09a7e6
refactor
GarrettEHill Aug 23, 2024
269f4fe
Update yarn-audit-known-issues
GarrettEHill Aug 23, 2024
a48ed6d
Update yarn-audit-known-issues
GarrettEHill Aug 23, 2024
15caf4f
Update yarn-audit-known-issues
GarrettEHill Aug 23, 2024
7c4f329
fix formating
GarrettEHill Aug 23, 2024
f7c8a1c
try to get the manifest file to work
GarrettEHill Aug 23, 2024
027629c
Update cf_lambda.sh
GarrettEHill Aug 23, 2024
9eec2f8
try a different method for BOUND_SERVICES
GarrettEHill Aug 23, 2024
48c15fe
switch to var_files model
GarrettEHill Aug 26, 2024
ca352a7
Update cf_lambda.sh
GarrettEHill Aug 26, 2024
4d09a3a
missing values
GarrettEHill Aug 26, 2024
8a82bb3
Update config.yml
GarrettEHill Aug 26, 2024
457ab01
old package no longer available
GarrettEHill Aug 26, 2024
ad8aa57
Update config.yml
GarrettEHill Aug 26, 2024
da46894
make a method to run commands within the lifecycle shell as a task
GarrettEHill Aug 26, 2024
7f27f6a
another test
GarrettEHill Aug 27, 2024
2c8f781
Update config.yml
GarrettEHill Aug 27, 2024
e19a38a
Update run.sh
GarrettEHill Aug 27, 2024
793373d
Update run.sh
GarrettEHill Aug 27, 2024
ec8df6f
Update config.yml
GarrettEHill Aug 27, 2024
d86366f
another test
GarrettEHill Aug 27, 2024
fecc958
Update run.sh
GarrettEHill Aug 27, 2024
af989ec
Update run.sh
GarrettEHill Aug 27, 2024
8f8dbba
Update process.yml
GarrettEHill Aug 27, 2024
615fac7
Update process.yml
GarrettEHill Aug 27, 2024
3f778cd
change needed to not require redis if no redis service is in env
GarrettEHill Aug 27, 2024
37f3c24
change to not require s3 if there is no s3 in the env
GarrettEHill Aug 27, 2024
2e98cb2
remove unneeded services
GarrettEHill Aug 27, 2024
fe112c6
process reports in batches to reduce the memory load
GarrettEHill Aug 27, 2024
9a34d05
better check for s3 existing
GarrettEHill Aug 27, 2024
c15f0fe
second attempt to remove the requirement for redis when using the cod…
GarrettEHill Aug 27, 2024
972dfb9
Update run.sh
GarrettEHill Aug 27, 2024
5026424
give it more memory and make the node limit dynamic
GarrettEHill Aug 27, 2024
4f8d0ab
fix memory format check, add force garbage collection
GarrettEHill Aug 27, 2024
c63becd
Update run.sh
GarrettEHill Aug 28, 2024
6f2f4a7
Update run.sh
GarrettEHill Aug 28, 2024
10803d7
Update run.sh
GarrettEHill Aug 28, 2024
adaf7db
Update process.yml
GarrettEHill Aug 28, 2024
2672172
env var is not correct
GarrettEHill Aug 28, 2024
b671c02
bc not available in buildpack
GarrettEHill Aug 28, 2024
6ad94c0
Update process.yml
GarrettEHill Aug 28, 2024
45d7cc9
more ram
GarrettEHill Aug 28, 2024
912d05b
try different manifest structure
GarrettEHill Aug 28, 2024
ef51f18
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Aug 28, 2024
df93fd3
Update yarn-audit-known-issues
GarrettEHill Aug 28, 2024
06c47b5
Update s3.test.js
GarrettEHill Aug 28, 2024
3cd6fd6
Update s3.test.js
GarrettEHill Aug 28, 2024
538109f
Update s3.test.js
GarrettEHill Aug 28, 2024
9dd4219
updates to support running without s3
GarrettEHill Aug 28, 2024
0547f12
test changes to support running without s3
GarrettEHill Aug 28, 2024
c44f86c
try to get the right configuration to get enough memory
GarrettEHill Aug 28, 2024
932051f
memory needs to be passed into run-task for it to not use the default
GarrettEHill Aug 28, 2024
676b902
yq not available
GarrettEHill Aug 28, 2024
a44ccff
Update cf_lambda.sh
GarrettEHill Aug 29, 2024
3cf44fa
Update cf_lambda.sh
GarrettEHill Aug 29, 2024
3cc3921
try to clean up the app more for each use
GarrettEHill Aug 29, 2024
f6ae629
move where the memory is parsed out
GarrettEHill Aug 29, 2024
c2573ef
refresh log monitoring if it gets disconnected
GarrettEHill Aug 29, 2024
6089249
correct the path
GarrettEHill Aug 29, 2024
ed3806b
revert that last change
GarrettEHill Aug 29, 2024
4b336df
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Aug 29, 2024
2ed3b65
adjust ram
GarrettEHill Aug 29, 2024
6183180
increase memory
GarrettEHill Aug 29, 2024
b5bdaa3
Refactor Process data script to perform the operation more on the pos…
GarrettEHill Aug 30, 2024
e8ecdc6
corrections for refactor
GarrettEHill Aug 30, 2024
972fc60
add comments
GarrettEHill Aug 30, 2024
d8952b2
name change required
GarrettEHill Aug 30, 2024
699dbda
Update latest_backup.sh
GarrettEHill Aug 30, 2024
cc20c78
Update latest_backup.sh
GarrettEHill Aug 30, 2024
1cccdc1
set up daily job
GarrettEHill Aug 30, 2024
dac4772
lint
GarrettEHill Aug 31, 2024
ef20090
lint
GarrettEHill Aug 31, 2024
22b08a4
clean up
GarrettEHill Sep 3, 2024
64dc065
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Sep 3, 2024
751c026
Update processData.test.js
GarrettEHill Sep 3, 2024
1044171
Update s3.test.js
GarrettEHill Sep 3, 2024
2fbb9f9
Update processData.js
GarrettEHill Sep 3, 2024
a91c0b3
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Oct 2, 2024
a28e242
refactor tests
GarrettEHill Oct 3, 2024
54d5e05
add obfuscation to training report data
GarrettEHill Oct 8, 2024
d2782e3
process all og the granteenames on activity reports
GarrettEHill Oct 8, 2024
80e0449
lint
GarrettEHill Oct 8, 2024
42364c9
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Oct 8, 2024
8213711
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Oct 9, 2024
419c127
Merge branch 'main' into TTAHUB-3061/process-data
GarrettEHill Oct 10, 2024
969eb01
refactor to have less branches
GarrettEHill Oct 11, 2024
e333c39
Fix one test, cleanup
thewatermethod Oct 11, 2024
012ce71
Merge remote-tracking branch 'origin/TTAHUB-3061/process-data' into T…
thewatermethod Oct 11, 2024
4e2ceb7
Update s3.test.js
GarrettEHill Oct 11, 2024
3389c6c
Update s3.test.js
GarrettEHill Oct 11, 2024
e4277f0
Update processData.test.js
GarrettEHill Oct 11, 2024
e963e04
only allow mock for tests
GarrettEHill Oct 11, 2024
12cfde6
Update s3.test.js
GarrettEHill Oct 11, 2024
c9b7385
Update s3.test.js
GarrettEHill Oct 12, 2024
8269b49
Update s3.test.js
GarrettEHill Oct 12, 2024
30b1109
try this
GarrettEHill Oct 14, 2024
32ec01e
Update s3.js
GarrettEHill Oct 14, 2024
f5598e3
Update s3.test.js
GarrettEHill Oct 14, 2024
586cdca
Update s3.test.js
GarrettEHill Oct 14, 2024
ca911b0
Update s3.test.js
GarrettEHill Oct 14, 2024
f61ce34
Update s3.test.js
GarrettEHill Oct 14, 2024
44c724a
Update s3.test.js
GarrettEHill Oct 14, 2024
af0ba2d
Update s3.js
GarrettEHill Oct 15, 2024
73ec01b
Update cf_lambda.sh
GarrettEHill Oct 15, 2024
e4f604f
e2e
GarrettEHill Oct 15, 2024
0190c3d
Update config.yml
GarrettEHill Oct 15, 2024
09979ba
Update process.yml
GarrettEHill Oct 15, 2024
695de26
Update dynamic-manifest.yml
GarrettEHill Oct 16, 2024
cebfade
Update cf_lambda.sh
GarrettEHill Oct 16, 2024
252 changes: 214 additions & 38 deletions .circleci/config.yml
@@ -167,7 +167,6 @@ commands:
else
echo "Slack notification sent successfully"
fi

notify_slack_deploy:
parameters:
slack_bot_token:
@@ -378,34 +377,44 @@ commands:
# name: Push maintenance application
# command: |
# cd maintenance_page && cf push -s cflinuxfs4 --vars-file ../<<parameters.deploy_config_file >>
cf_backup:
description: "Login to cloud foundry space with service account credentials, Connect to DB & S3, backup DB to S3"
cf_automation_task:
description: "Login to Cloud Foundry space, run automation task, and send notification"
parameters:
auth_client_secret:
description: "Name of CircleCi project environment variable that
holds authentication client secret, a required application variable"
description: "Name of CircleCi project environment variable that holds authentication client secret"
type: env_var_name
cloudgov_username:
description: "Name of CircleCi project environment variable that
holds deployer username for cloudgov space"
description: "Name of CircleCi project environment variable that holds deployer username for Cloud Foundry space"
type: env_var_name
cloudgov_password:
description: "Name of CircleCi project environment variable that
holds deployer password for cloudgov space"
description: "Name of CircleCi project environment variable that holds deployer password for Cloud Foundry space"
type: env_var_name
cloudgov_space:
description: "Name of CircleCi project environment variable that
holds name of cloudgov space to target for application deployment"
description: "Name of CircleCi project environment variable that holds name of Cloud Foundry space to target for application deployment"
type: env_var_name
rds_service_name:
description: "Name of the rds service to backup"
task_name:
description: "Name of the automation task to run"
type: string
s3_service_name:
description: "Name of the s3 service access"
task_command:
description: "Command to run for the automation task"
type: string
backup_prefix:
description: "prefix name to use for backups"
task_args:
description: "Arguments for the automation task"
type: string
config:
description: "Config prefix for the automation task"
type: string
success_message:
description: "Success message for Slack notification"
type: string
timeout:
description: "Max duration allowed for task"
type: string
default: "300"
directory:
description: 'directory to root to push'
type: string
default: "./automation"
steps:
- run:
name: Install Dependencies
@@ -456,57 +465,68 @@ commands:
name: Start Log Monitoring
command: |
#!/bin/bash

CONTROL_FILE="/tmp/stop_tail"
rm -f $CONTROL_FILE

# Start tailing logs
cf logs tta-automation &
# Function to start tailing logs
start_log_tailing() {
echo "Starting cf logs for tta-automation..."
cf logs tta-automation &
TAIL_PID=$!
}

# Get the PID of the cf logs command
TAIL_PID=$!
# Start tailing logs for the first time
start_log_tailing

# Wait for the control file to be created
# Monitor the cf logs process
while [ ! -f $CONTROL_FILE ]; do
sleep 1
# Check if the cf logs process is still running
if ! kill -0 $TAIL_PID 2>/dev/null; then
echo "cf logs command has stopped unexpectedly. Restarting..."
start_log_tailing
fi
sleep 1
done

# Kill the cf logs command
kill -9 $TAIL_PID
echo "cf logs command for tta-automation has been terminated."
background: true
- run:
name: cf_lambda - script to trigger backup
name: cf_lambda - script to trigger task
command: |
set -x
json_data=$(jq -n \
--arg automation_dir "./automation" \
--arg manifest "manifest.yml" \
--arg task_name "backup" \
--arg command "cd /home/vcap/app/db-backup/scripts; bash ./db_backup.sh" \
--argjson args '["<< parameters.backup_prefix >>", "<< parameters.rds_service_name >>", "<< parameters.s3_service_name >>"]' \
--arg directory "<< parameters.directory >>" \
--arg config "<< parameters.config >>" \
--arg task_name "<< parameters.task_name >>" \
--arg command "<< parameters.task_command >>" \
--arg timeout_active_tasks "<< parameters.timeout >>" \
--arg timeout_ensure_app_stopped "<< parameters.timeout >>" \
--argjson args '<< parameters.task_args >>' \
'{
automation_dir: $automation_dir,
manifest: $manifest,
directory: $directory,
config: $config,
task_name: $task_name,
command: $command,
timeout_active_tasks: $timeout_active_tasks,
timeout_ensure_app_stopped: $timeout_ensure_app_stopped,
args: $args
}')

# Set execute permission
find ./automation -name "*.sh" -exec chmod +x {} \;

./automation/ci/scripts/cf_lambda.sh "$json_data"
environment:
CF_RDS_SERVICE_NAME: ttahub-prod
CF_S3_SERVICE_NAME: ttahub-db-backups
- run:
name: Generate Message
command: |
if [ ! -z "$CIRCLE_PULL_REQUEST" ]; then
PR_NUMBER=${CIRCLE_PULL_REQUEST##*/}
echo ":download::database: Production backup before PR <$CIRCLE_PULL_REQUEST|$PR_NUMBER> successful!" > /tmp/message_file
echo "<< parameters.success_message >> before PR <$CIRCLE_PULL_REQUEST|$PR_NUMBER> successful!" > /tmp/message_file
else
echo ":download::database: Production backup successful!" > /tmp/message_file
echo "<< parameters.success_message >> successful!" > /tmp/message_file
fi
- notify_slack:
slack_bot_token: $SLACK_BOT_TOKEN
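
Note on the "Start Log Monitoring" step in the hunk above: the new loop restarts `cf logs` whenever the background process exits before the control file appears. The same pattern is sketched below as a standalone function so it can be tried without a Cloud Foundry login. This is a minimal sketch, not the PR's code: the function name `monitor_with_restart` and the restart counter are assumptions added for illustration, and the sketch stops the child with the default SIGTERM where the step in the diff uses `kill -9`.

```shell
#!/bin/bash
# Sketch of the restart-on-exit pattern from the "Start Log Monitoring" step.
# monitor_with_restart is a hypothetical name; the command passed after the
# control file stands in for `cf logs tta-automation`.

monitor_with_restart() {
  local control_file="$1"
  shift
  local restarts=0
  local pid

  # Same starting condition as the CI step: clear any stale control file.
  rm -f "$control_file"

  # Start the monitored command in the background and remember its PID.
  "$@" &
  pid=$!

  # Poll once per second until another step creates the control file.
  while [ ! -f "$control_file" ]; do
    # kill -0 sends no signal; it only checks that the process still exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      restarts=$((restarts + 1))
      "$@" &
      pid=$!
    fi
    sleep 1
  done

  # Stop the monitored command and reap it so no zombie is left behind.
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  echo "restarts=$restarts"
}
```

Under these assumptions, the CI step would correspond to `monitor_with_restart /tmp/stop_tail cf logs tta-automation`, with a later step running `touch /tmp/stop_tail` to end the loop.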
@@ -524,7 +544,76 @@ commands:

# Logout from Cloud Foundry
cf logout

cf_backup:
description: "Backup database to S3"
parameters:
auth_client_secret: { type: env_var_name }
cloudgov_username: { type: env_var_name }
cloudgov_password: { type: env_var_name }
cloudgov_space: { type: env_var_name }
rds_service_name: { type: string }
s3_service_name: { type: string }
backup_prefix: { type: string }
steps:
- cf_automation_task:
auth_client_secret: << parameters.auth_client_secret >>
cloudgov_username: << parameters.cloudgov_username >>
cloudgov_password: << parameters.cloudgov_password >>
cloudgov_space: << parameters.cloudgov_space >>
task_name: "backup"
task_command: "cd /home/vcap/app/db-backup/scripts; bash ./db_backup.sh"
task_args: '["<< parameters.backup_prefix >>", "<< parameters.rds_service_name >>", "<< parameters.s3_service_name >>"]'
config: "<< parameters.backup_prefix >>-backup"
success_message: ':download::database: "<< parameters.backup_prefix >>" backup'
cf_restore:
description: "Restore backup database from S3"
parameters:
auth_client_secret: { type: env_var_name }
cloudgov_username: { type: env_var_name }
cloudgov_password: { type: env_var_name }
cloudgov_space: { type: env_var_name }
rds_service_name: { type: string }
s3_service_name: { type: string }
backup_prefix: { type: string }
steps:
- run:
name: Validate Parameters
command: |
if [ "<< parameters.rds_service_name >>" = "ttahub-prod" ]; then
echo "Error: rds_service_name cannot be 'ttahub-prod'"
exit 1
fi
- cf_automation_task:
auth_client_secret: << parameters.auth_client_secret >>
cloudgov_username: << parameters.cloudgov_username >>
cloudgov_password: << parameters.cloudgov_password >>
cloudgov_space: << parameters.cloudgov_space >>
task_name: "restore"
task_command: "cd /home/vcap/app/db-backup/scripts; bash ./db_restore.sh"
task_args: '["<< parameters.backup_prefix >>", "<< parameters.rds_service_name >>", "<< parameters.s3_service_name >>"]'
config: "<< parameters.backup_prefix >>-restore"
success_message: ':database: "<< parameters.backup_prefix >>" Restored to "<< parameters.rds_service_name >>"'
timeout: "900"
cf_process:
description: "Process database from S3"
parameters:
auth_client_secret: { type: env_var_name }
cloudgov_username: { type: env_var_name }
cloudgov_password: { type: env_var_name }
cloudgov_space: { type: env_var_name }
steps:
- cf_automation_task:
auth_client_secret: << parameters.auth_client_secret >>
cloudgov_username: << parameters.cloudgov_username >>
cloudgov_password: << parameters.cloudgov_password >>
cloudgov_space: << parameters.cloudgov_space >>
task_name: "process"
task_command: "cd /home/vcap/app/automation/nodejs/scripts; bash ./run.sh"
task_args: '["/home/vcap/app/build/server/src/tools/processDataCLI.js"]'
config: "process"
success_message: ':database: Restored data processed'
directory: "./"
timeout: "1200"
parameters:
cg_org:
description: "Cloud Foundry cloud.gov organization name"
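
Note on the "Validate Parameters" step in `cf_restore` above: the guard rejects a restore whose target is `ttahub-prod`. A hedged sketch of the same check as a reusable function follows. The function name `validate_restore_target`, the `PROTECTED_SERVICES` variable, and the empty-value check are assumptions for illustration; the PR itself only compares against the single literal `ttahub-prod`.

```shell
#!/bin/bash
# Hypothetical helper mirroring the "Validate Parameters" guard in cf_restore.
# PROTECTED_SERVICES is an assumption: a space-separated list of service names
# that must never be used as a restore target. The PR checks only "ttahub-prod".
PROTECTED_SERVICES="ttahub-prod"

validate_restore_target() {
  local target="$1"
  local protected

  # An empty target is rejected (an extra check not present in the PR).
  if [ -z "$target" ]; then
    echo "Error: rds_service_name is required" >&2
    return 1
  fi

  # Refuse any target that matches a protected service name exactly.
  for protected in $PROTECTED_SERVICES; do
    if [ "$target" = "$protected" ]; then
      echo "Error: rds_service_name cannot be '$protected'" >&2
      return 1
    fi
  done

  return 0
}
```

For example, `validate_restore_target ttahub-process` returns success, while `validate_restore_target ttahub-prod` prints the error to stderr and returns a non-zero status, which would fail the CircleCI step in the same way as the inline `exit 1`.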
@@ -580,6 +669,15 @@ parameters:
manual-trigger:
type: boolean
default: false
manual-restore:
type: boolean
default: false
manual-process:
type: boolean
default: false
manual-backup:
type: boolean
default: false
jobs:
build_and_lint:
executor: docker-executor
@@ -1249,10 +1347,64 @@ jobs:
rds_service_name: ttahub-prod
s3_service_name: ttahub-db-backups
backup_prefix: production
restore_production_for_processing:
docker:
- image: cimg/base:2024.05
steps:
- sparse_checkout:
directories: 'automation'
branch: << pipeline.git.branch >>
- cf_restore:
auth_client_secret: PROD_AUTH_CLIENT_SECRET
cloudgov_username: CLOUDGOV_PROD_USERNAME
cloudgov_password: CLOUDGOV_PROD_PASSWORD
cloudgov_space: CLOUDGOV_PROD_SPACE
rds_service_name: ttahub-process
s3_service_name: ttahub-db-backups
backup_prefix: production
process_production:
executor: docker-executor
steps:
- checkout
- create_combined_yarnlock
- restore_cache:
keys:
# To manually bust the cache, increment the version e.g. v7-yarn...
- v14-yarn-deps-{{ checksum "combined-yarnlock.txt" }}
# If checksum is new, restore partial cache
- v14-yarn-deps-
- run: yarn deps
- run:
name: Build backend assets
command: yarn build
- cf_process:
auth_client_secret: PROD_AUTH_CLIENT_SECRET
cloudgov_username: CLOUDGOV_PROD_USERNAME
cloudgov_password: CLOUDGOV_PROD_PASSWORD
cloudgov_space: CLOUDGOV_PROD_SPACE
process_backup:
docker:
- image: cimg/base:2024.05
steps:
- sparse_checkout:
directories: 'automation'
branch: << pipeline.git.branch >>
- cf_backup:
auth_client_secret: PROD_AUTH_CLIENT_SECRET
cloudgov_username: CLOUDGOV_PROD_USERNAME
cloudgov_password: CLOUDGOV_PROD_PASSWORD
cloudgov_space: CLOUDGOV_PROD_SPACE
rds_service_name: ttahub-process
s3_service_name: ttahub-db-backups
backup_prefix: processed
workflows:
build_test_deploy:
when:
equal: [false, << pipeline.parameters.manual-trigger >>]
and:
- equal: [false, << pipeline.parameters.manual-trigger >>]
- equal: [false, << pipeline.parameters.manual-restore >>]
- equal: [false, << pipeline.parameters.manual-process >>]
- equal: [false, << pipeline.parameters.manual-backup >>]
jobs:
- build_and_lint
- build_and_lint_similarity_api
@@ -1355,8 +1507,32 @@ workflows:
- << pipeline.parameters.prod_git_branch >>
jobs:
- backup_upload_production
- restore_production_for_processing:
requires:
- backup_upload_production
- process_production:
requires:
- restore_production_for_processing
- process_backup:
requires:
- process_production
manual_backup_upload_production:
when:
equal: [true, << pipeline.parameters.manual-trigger >>]
jobs:
- backup_upload_production
manual_restore_production:
when:
equal: [true, << pipeline.parameters.manual-restore >>]
jobs:
- restore_production_for_processing
manual_process_production:
when:
equal: [true, << pipeline.parameters.manual-process >>]
jobs:
- process_production
manual_process_backup:
when:
equal: [true, << pipeline.parameters.manual-backup >>]
jobs:
- process_backup
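
Note on the manual workflows above: each one is gated on a boolean pipeline parameter (`manual-trigger`, `manual-restore`, `manual-process`, `manual-backup`), and `build_test_deploy` now runs only when all four are false. One way to start a single workflow is the CircleCI API v2 "trigger a new pipeline" endpoint. The sketch below is an assumption about how a maintainer might call it, not something this PR contains: the function names and the `PROJECT_SLUG` and `CIRCLE_TOKEN` variables are hypothetical, and the payload builder does no JSON escaping, so it is only suitable for plain branch names.

```shell
#!/bin/bash
# Hypothetical helpers for triggering one of the manual workflows.
# Only the four parameter names come from the diff; everything else is assumed.

build_trigger_payload() {
  local branch="$1"
  local param="$2"

  # Accept only the pipeline parameters declared in .circleci/config.yml.
  case "$param" in
    manual-trigger|manual-restore|manual-process|manual-backup) ;;
    *)
      echo "Unknown pipeline parameter: $param" >&2
      return 1
      ;;
  esac

  # No JSON escaping is done here; branch must not contain quotes or backslashes.
  printf '{"branch":"%s","parameters":{"%s":true}}\n' "$branch" "$param"
}

trigger_pipeline() {
  # PROJECT_SLUG (for example "gh/<org>/<repo>") and CIRCLE_TOKEN are
  # assumptions that the caller must export before using this function.
  local payload
  payload="$(build_trigger_payload "$1" "$2")" || return 1
  curl -sS -X POST "https://circleci.com/api/v2/project/${PROJECT_SLUG}/pipeline" \
    -H "Circle-Token: ${CIRCLE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```

With those variables set, `trigger_pipeline main manual-restore` would request a pipeline in which only `manual_restore_production` passes its `when` condition.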