Service Catalogue Production Database Restore to Test environment

The Service Catalogue schema is stored in a Postgres database. Every day at 23:00 the test database is refreshed from a production database export dump.
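Under the hood the refresh is a dump-and-restore of the schema. The sketch below shows the idea only; the host, user and database variables are illustrative assumptions, and the actual commands live in helm_deploy/hmpps-service-catalogue/db_backup_restore.sh.

    # Illustrative sketch only – see db_backup_restore.sh for the real logic
    pg_dump -Fc -h "$PROD_DB_HOST" -U "$PROD_DB_USER" "$PROD_DB_NAME" > /tmp/service-catalogue.dump
    pg_restore --clean --if-exists -h "$TEST_DB_HOST" -U "$TEST_DB_USER" -d "$TEST_DB_NAME" /tmp/service-catalogue.dump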

The Service Catalogue GitHub repository contains the refresh script helm_deploy/hmpps-service-catalogue/db_backup_restore.sh, and the job schedule is configured in helm_deploy/hmpps-service-catalogue/templates/db-backup-restore-cronjob.yaml.
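For orientation, the cron job manifest has roughly the following shape. This is an abridged, illustrative sketch: the schedule and image match the cluster output shown later on this page, but the container name is an assumption and anything else should be checked against the template itself.

    # Abridged, illustrative sketch of db-backup-restore-cronjob.yaml
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: sc-database-backup-restore
    spec:
      schedule: "0 23 * 1-12 *"   # daily at 23:00
      jobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: db-backup-restore   # assumed container name
                  image: ghcr.io/ministryofjustice/hmpps-devops-tools
              restartPolicy: Never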

The status of the job can be checked with kubectl describe jobs -n hmpps-portfolio-management-prod. Jobs whose names start with sc-database-backup-restore handle the Service Catalogue schema refresh. After a successful run the pod status shows Completed and the describe output shows 0 Failed for the relevant sc-database-backup-restore job:
Pods Statuses: 0 Active (0 Ready) / 1 Succeeded / 0 Failed
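For a quick overview before reaching for describe, you can also list the jobs and pods and filter on the name prefix:

    kubectl -n hmpps-portfolio-management-prod get jobs | grep sc-database-backup-restore
    kubectl -n hmpps-portfolio-management-prod get pods | grep sc-database-backup-restore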

If the job fails, the likely cause is a mismatch between the test and production schemas. To investigate and debug, perform the following steps.

  1. Create a job based on the failed one:
 kubectl -n hmpps-portfolio-management-prod get jobs sc-database-backup-restore-28847460 -o yaml > sc-database-backup-restore-debug.yaml
  2. Edit the output file: remove all status and other runtime metadata, rename the job, and change the command/args parameters as below (a fuller sketch of the edited manifest follows this list).
    - command:
        - /usr/bin/sleep
      args:
        - "10000"
  3. Apply the config changes. This runs the job and keeps the pod running:
kubectl -n hmpps-portfolio-management-prod apply -f sc-database-backup-restore-debug.yaml
  4. Connect to the pod with an interactive terminal:
kubectl -n hmpps-portfolio-management-prod exec -it sc-database-backup-restore-debug-lsz58 -- /bin/bash
  5. On the connected pod the script is /tmp/entrypoint.sh; it contains both the backup and restore commands. Debugging it is the same as debugging any other bash script and is not covered in this document.

  6. Once the issue is sorted, delete the debug job:

      kubectl delete job [job name] -n hmpps-portfolio-management-prod
    
  7. Alerts will not clear even after a successful job completion, because the previous failed job will still exist. Manually delete any failed jobs to clear the alert.
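For reference, after step 2 the edited debug manifest typically ends up looking roughly like this. It is an abridged, illustrative sketch rather than the exact file: the container name is an assumption, the image is taken from the events output shown later on this page, and everything not shown is kept from the exported job.

    # sc-database-backup-restore-debug.yaml (abridged, illustrative)
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: sc-database-backup-restore-debug   # renamed so it does not clash with the failed job
      namespace: hmpps-portfolio-management-prod
    spec:
      # Note: the exported YAML usually also contains spec.selector and
      # controller-uid labels added by Kubernetes; remove these before apply.
      template:
        spec:
          containers:
            - name: db-backup-restore          # assumption: keep whatever name the exported job uses
              image: ghcr.io/ministryofjustice/hmpps-devops-tools
              command:
                - /usr/bin/sleep
              args:
                - "10000"
          restartPolicy: Never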

If you need to refresh the development database from production on demand:

  1. Get the cron job details:
$ kubectl get cronjob -n hmpps-portfolio-management-prod
NAME                         SCHEDULE        SUSPEND   ACTIVE   LAST SCHEDULE   AGE
sc-database-backup-restore   0 23 * 1-12 *   False     0        12h             190d
update-dependency-info       0 */6 * * *     False     0        5h45m           209d
  2. Create a new restore job from the cron job:
$  kubectl create job --from=cronjob/sc-database-backup-restore manual-sc-database-backup-restore -n hmpps-portfolio-management-prod
job.batch/manual-sc-database-backup-restore created
$ kubectl get pods -n hmpps-portfolio-management-prod | grep restore
manual-sc-database-backup-restore-t7bwd      0/1     ContainerCreating   0              16s
sc-database-backup-restore-28856100-42kfz    0/1     Completed           0              36h
$ kubectl get events -n hmpps-portfolio-management-prod
LAST SEEN   TYPE      REASON             OBJECT                                        MESSAGE
46s         Normal    Scheduled          pod/manual-sc-database-backup-restore-t7bwd   Successfully assigned hmpps-portfolio-management-prod/manual-sc-database-backup-restore-t7bwd to ip-172-20-149-165.eu-west-2.compute.internal
43s         Normal    Pulling            pod/manual-sc-database-backup-restore-t7bwd   Pulling image "ghcr.io/ministryofjustice/hmpps-devops-tools"
46s         Normal    SuccessfulCreate   job/manual-sc-database-backup-restore         Created pod: manual-sc-database-backup-restore-t7bwd
46s         Warning   UnexpectedJob      cronjob/sc-database-backup-restore            Saw a job that the controller did not create or forgot: manual-sc-database-backup-restore

The pod status will remain at ContainerCreating for a few minutes; once the restore finishes, the job shows as completed:

$ kubectl get jobs -n hmpps-portfolio-management-prod
NAME                                  COMPLETIONS   DURATION   AGE
manual-sc-database-backup-restore     1/1           6m1s       7m32s
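If you want to follow the restore as it runs rather than polling the job status, you can tail the logs of the manual job (assuming the script writes its progress to stdout):

    kubectl -n hmpps-portfolio-management-prod logs -f job/manual-sc-database-backup-restore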