Step: ocm-e2e-clusterpool-cluster-deploy

This step deploys ACM on the selected clusters. All name matching is done after stripping the suffix from the cluster claim name; for example, the cluster claim name hub-1-abc12 is reduced to hub-1 before being matched. Unless CLUSTER_NAMES is set to "none", at least one cluster must be selected for deployment. The latest snapshot of ACM for the given pipeline stage (edge or integration) is deployed first. Once that version of ACM is running, the CSV CR for the multiclusterhub-operator is modified to use the component image that was built for the PR and pushed to the CI registry.
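
For reference, the suffix stripping is a simple sed substitution. This sketch (the claim name is illustrative) mirrors the command used in the Source Code below:

    # hub-1-abc12 -> hub-1: remove the trailing "-<alnum>" suffix
    sed -e 's/-[[:alnum:]]\+$//' <<< "hub-1-abc12"   # prints: hub-1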

Container image used for this step: open-cluster-management/builder:go1.16-linux

The open-cluster-management/builder:go1.16-linux image resolves to an image imported from the corresponding imagestream tag on the build farm.

Environment

In addition to the default environment, the step exposes the following:

COMPONENT_IMAGE_REF (Dependency): Pull specification for the bin image.
MAKEFILE (Parameter): Location of the build harness Makefile for use on OSCI. (default: /opt/build-harness/Makefile.prow)
CLUSTER_CLAIM_FILE (Parameter): File name that has the cluster claim names. (default: cluster-claims)
CLUSTER_INCLUSION_FILTER (Parameter): A grep filter used to pick the set of cluster claims from CLUSTER_CLAIM_FILE to deploy to. If empty, all clusters are included. The default deploys ACM only to hub clusters. (default: ^hub-)
CLUSTER_EXCLUSION_FILTER (Parameter): A grep filter used to eliminate cluster claims from CLUSTER_CLAIM_FILE. This filter is applied after CLUSTER_INCLUSION_FILTER. If empty, no clusters are excluded.
CLUSTER_NAMES (Parameter): A comma-separated list of cluster claims to deploy to. The names are matched after stripping the suffix from the cluster claim name. If a kubeconfig file for one of the names is not found, this step will fail. If set, this overrides CLUSTER_CLAIM_FILE, CLUSTER_INCLUSION_FILTER, and CLUSTER_EXCLUSION_FILTER. The special value "none" indicates that no deployment should take place.
COMPONENT_NAME (Parameter): The name of the component used in manifest files. If not given, the value stored in the COMPONENT_NAME file of the component repo is used. It can be overridden here if you are using another component from your repo, such as an end-to-end test image.
DEPLOY_TIMEOUT (Parameter): The timeout, in seconds, to wait for an ACM deployment to complete before cancelling it. (default: 1800)
DEPLOY_HUB_ADDITIONAL_YAML (Parameter): A base64-encoded YAML file to apply to the hub cluster during deployment. Multiple files can be included by separating them with three dashes on a line of their own before base64 encoding them (see the sketch after this list).
GITHUB_USER (Parameter): The GitHub user name. (default: acm-cicd-prow-bot)
GITHUB_TOKEN_FILE (Parameter): The file that stores the GitHub token. Should match the credentials stanza mount path. (default: /etc/acm-cicd-github/token)
PIPELINE_REPO (Parameter): The GitHub repo where CICD pipeline data is stored. Do not include the "https://" prefix or the ".git" suffix. (default: github.com/open-cluster-management/pipeline)
RELEASE_REPO (Parameter): The GitHub repo where ACM release data is stored. Do not include the "https://" prefix or the ".git" suffix. (default: github.com/open-cluster-management/release)
PIPELINE_STAGE (Parameter): The pipeline stage to use as the base deployment of ACM. Value is either "edge" or "integration"; any other value causes an error. (default: edge)
DEPLOY_REPO (Parameter): The GitHub repo where the ACM deployment code is stored. Do not include the "https://" prefix or the ".git" suffix. (default: github.com/open-cluster-management/deploy)
QUAY_TOKEN_FILE (Parameter): The file that stores the Quay token. Should match the credentials stanza mount path. (default: /etc/acm-cicd-quay-pull/token)
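
The DEPLOY_HUB_ADDITIONAL_YAML payload can be produced as in the following sketch (file names are hypothetical). Each YAML document is separated by three dashes on a line of its own, and the combined stream is base64 encoded as a whole:

    # Hypothetical example: combine two manifests into one payload for
    # DEPLOY_HUB_ADDITIONAL_YAML.
    {
        cat pull-secret.yaml
        echo "---"
        cat extra-config.yaml
    } > combined.yaml
    # -w0 (GNU coreutils) disables line wrapping in the base64 output.
    DEPLOY_HUB_ADDITIONAL_YAML=$(base64 -w0 combined.yaml)
    export DEPLOY_HUB_ADDITIONAL_YAML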

Source Code

#!/bin/bash

shopt -s extglob

ocm_dir=$(mktemp -d -t ocm-XXXXX)
cd "$ocm_dir" || exit 1
export HOME="$ocm_dir"

logf() {
    local logfile="$1" ; shift
    local ts
    ts=$(date --iso-8601=seconds)
    echo "$ts" "$@" | tee -a "$logfile"
}

log_file="${ARTIFACT_DIR}/deploy.log"
log() {
    local ts
    ts=$(date --iso-8601=seconds)
    echo "$ts" "$@" | tee -a "$log_file"
}

# No deployment if CLUSTER_NAMES is "none".
log "Checking for CLUSTER_NAMES=none flag."
if [[ "$CLUSTER_NAMES" == "none" ]]; then
    log "CLUSTER_NAMES is set to none. Exiting."
    exit 0
fi

# Early validation of PIPELINE_STAGE
case "${PIPELINE_STAGE}" in
    edge)
        ;;
    integration)
        ;;
    *)
        log "ERROR Invalid PIPELINE_STAGE $PIPELINE_STAGE must be either edge or integration."
        exit 1
        ;;
esac

if [[ -z "$COMPONENT_IMAGE_REF" ]]; then
    log "ERROR COMPONENT_IMAGE_REF is empty"
    exit 1
fi

log "Using COMPONENT_IMAGE_REF: $COMPONENT_IMAGE_REF"

cp "$MAKEFILE" ./Makefile || {
    log "ERROR Could not find make file: $MAKEFILE"
    exit 1
}

log "Using MAKEFILE: $MAKEFILE"

# The actual clusters to deploy to.
clusters=()

# If CLUSTER_NAMES is set, convert it to an array.
if [[ -n "$CLUSTER_NAMES" ]]; then
    log "CLUSTER_NAMES is set: $CLUSTER_NAMES"
    IFS="," read -r -a clusters <<< "$CLUSTER_NAMES"
else
    log "CLUSTER_NAMES is not set. Using CLUSTER_CLAIM_FILE '$CLUSTER_CLAIM_FILE'"
    # If CLUSTER_NAMES is not provided, build the cluster list from the
    # CLUSTER_CLAIM_FILE, CLUSTER_INCLUSION_FILTER, and
    # CLUSTER_EXCLUSION_FILTER variables.
    
    # Strip the suffix; claims are in the form hub-1-abcde.
    log "Strip suffix from cluster claim names."
    while IFS= read -r claim; do
        # strip off the -abcde suffix
        cluster=$( sed -e "s/-[[:alnum:]]\+$//" <<<"$claim" )
        echo "$cluster" >> deployments
    done < "${SHARED_DIR}/${CLUSTER_CLAIM_FILE}"

    # apply inclusion filter
    if [[ -n "$CLUSTER_INCLUSION_FILTER" ]]; then
        log "Applying CLUSTER_INCLUSION_FILTER /$CLUSTER_INCLUSION_FILTER/"
        grep "$CLUSTER_INCLUSION_FILTER" deployments > deployments.bak

        if [[ $(wc -l < deployments.bak) -eq 0 ]]; then
            log "ERROR No clusters left after applying inclusion filter."
            log "Inclusion filter: $CLUSTER_INCLUSION_FILTER"
            log "Original clusters:"
            cat deployments > >(tee -a "$log_file")
            exit 1
        fi

        mv deployments.bak deployments
    fi

    # apply exclusion filter
    if [[ -n "$CLUSTER_EXCLUSION_FILTER" ]]; then
        log "Applying CLUSTER_EXCLUSION_FILTER /$CLUSTER_INCLUSION_FILTER/"
        grep -v "$CLUSTER_EXCLUSION_FILTER" > deployments.bak

        if [[ $(wc -l < deployments.bak) -eq 0 ]]; then
            log "ERROR No clusters left after applying exclusion filter."
            log "Exclusion filter: $CLUSTER_EXCLUSION_FILTER"
            log "Original clusters:"
            cat deployments > >(tee -a "$log_file")
            exit 1
        fi

        mv deployments.bak deployments
    fi

    # Read cluster names into the array, one name per line.
    mapfile -t clusters < deployments
fi

# Verify all clusters have kubeconfig files.
log "Verify that all clusters have kubeconfig files."
for cluster in "${clusters[@]}"; do
    kc_file="${SHARED_DIR}/${cluster}.kc"
    if [[ ! -f "$kc_file" ]]; then
        log "ERROR kubeconfig file not found for $cluster: $kc_file"
        log "Contents of shared directory ${SHARED_DIR}"
        ls "${SHARED_DIR}" > >(tee -a "$log_file")
        exit 1
    fi
done

# Set up git credentials.
log "Setting up git credentials."
if [[ ! -r "${GITHUB_TOKEN_FILE}" ]]; then
    log "ERROR GitHub token file missing or not readable: $GITHUB_TOKEN_FILE"
    exit 1
fi
GITHUB_TOKEN=$(cat "$GITHUB_TOKEN_FILE")
COMPONENT_REPO="github.com/${REPO_OWNER}/${REPO_NAME}"
{
    echo "https://${GITHUB_USER}:${GITHUB_TOKEN}@${PIPELINE_REPO}.git"
    echo "https://${GITHUB_USER}:${GITHUB_TOKEN}@${RELEASE_REPO}.git"
    echo "https://${GITHUB_USER}:${GITHUB_TOKEN}@${DEPLOY_REPO}.git"
    echo "https://${GITHUB_USER}:${GITHUB_TOKEN}@${COMPONENT_REPO}.git"
} >> ghcreds
git config --global credential.helper 'store --file=ghcreds' 

# Set up repo URLs.
pipeline_url="https://${PIPELINE_REPO}.git"
release_url="https://${RELEASE_REPO}.git"
deploy_url="https://${DEPLOY_REPO}.git"
component_url="https://${COMPONENT_REPO}.git"

# Get release branch. This is a Prow variable as described here:
# https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md#job-environment-variables
release="${PULL_BASE_REF}"
log "INFO This PR's base branch is $release"

# See if we need to get release from the release repo.
if [[ "$release" == "main" || "$release" == "master" ]]; then
    log "INFO Current PR is against the $release branch."
    log "INFO Need to get current release version from release repo at $release_url"
    release_dir="${ocm_dir}/release"
    git clone "$release_url" "$release_dir" || {
        log "ERROR Could not clone release repo $release_url"
        exit 1
    }
    release=$(cat "${release_dir}/CURRENT_RELEASE")
    log "INFO Branch from CURRENT_RELEASE is $release"
fi

# Validate release branch. We can only run on release-x.y branches.
if [[ ! "$release" =~ ^release-[0-9]+\.[0-9]+$ ]]; then
    log "ERROR Branch ($release) is not a release branch."
    log "Base branch of PR must match release-x.y"
    exit 1
fi

# Trim "release-" prefix.
release=${release#release-}

# Get pipeline branch.
pipeline_branch="${release}-${PIPELINE_STAGE}"

# Clone pipeline repo.
log "Cloning pipeline repo at branch $pipeline_branch"
pipeline_dir="${ocm_dir}/pipeline"
git clone -b "$pipeline_branch" "$pipeline_url" "$pipeline_dir" || {
    log "ERROR Could not clone branch $pipeline_branch from pipeline repo $pipeline_url"
    exit 1
}

# Get latest snapshot.
log "Getting latest snapshot for $pipeline_branch"
snapshot_dir="$pipeline_dir/snapshots"
cd "$snapshot_dir" || exit 1
manifest_file=$(find . -maxdepth 1 -name 'manifest-*' | sort | tail -n 1)
manifest_file="${manifest_file#./}"
if [[ -z "$manifest_file" ]]; then
    log "ERROR no manifest file found in pipeline/snapshots"
    log "Contents of pipeline/snapshots"
    ls "$snapshot_dir" > >(tee -a "$log_file")
    exit 1
fi

log "Using manifest file name: $manifest_file"

# Trim manifest file name
manifest=${manifest_file#manifest-}
manifest=${manifest%.json}
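# Example (assuming the usual naming convention): a manifest file named
# manifest-2021-06-08-18-24-42-2.3.0.json yields
# manifest=2021-06-08-18-24-42-2.3.0, timestamp=2021-06-08-18-24-42,
# version=2.3.0, and snapshot=2.3.0-SNAPSHOT-2021-06-08-18-24-42.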

# Get timestamp from manifest name.
timestamp=$(sed -E 's/-[[:digit:].]+$//' <<< "$manifest")
log "Using timestamp: $timestamp"

# Get version from manifest file name.
version=$(sed -E 's/^[[:digit:]]{4}(-[[:digit:]]{2}){5}-//' <<< "$manifest")
log "Using version: $version"

# Get snapshot.
snapshot="${version}-SNAPSHOT-${timestamp}"
log "Using snapshot: $snapshot"

# Return to work directory.
cd "$ocm_dir" || exit 1

# See if COMPONENT_NAME was provided.
log "Checking COMPONENT_NAME"
if [[ -z "$COMPONENT_NAME" ]]; then
    # It wasn't, so get it from the COMPONENT_NAME file in the COMPONENT_REPO

    # Clone the COMPONENT_REPO
    log "COMPONENT_NAME not provided. Getting it from $COMPONENT_REPO"
    component_dir="${ocm_dir}/component"
    git clone -b "${PULL_BASE_REF}" "$component_url" "$component_dir" || {
        log "ERROR Could not clone branch ${PULL_BASE_REF} of component repo $component_url"
        exit 1
    }

    # Verify COMPONENT_NAME file exists
    component_name_file="${component_dir}/COMPONENT_NAME"
    if [[ ! -r "$component_name_file" ]]; then
        log "ERROR COMPONENT_NAME file does not exist in branch ${PULL_BASE_REF} of component repo $component_url"
        exit 1
    fi

    # Get COMPONENT_NAME
    COMPONENT_NAME=$(cat "$component_name_file")
    if [[ -z "$COMPONENT_NAME" ]]; then
        log "ERROR COMPONENT_NAME file was empty in branch ${PULL_BASE_REF} of component repo $component_url"
        exit 1
    fi
fi

log "Using COMPONENT_NAME: $COMPONENT_NAME"

# Verify COMPONENT_NAME is in the manifest file
image_name_query=".[] | select(.[\"image-name\"]==\"${COMPONENT_NAME}\")"
IMAGE_NAME=$(jq -r "$image_name_query" "$snapshot_dir/$manifest_file" 2> >(tee -a "$log_file"))
if [[ -z "$IMAGE_NAME" ]]; then
    log "ERROR Could not find image $COMPONENT_NAME in manifest $manifest_file"
    log "Contents of manifest $manifest_file"
    cat "$manifest_file" > >(tee -a "$log_file")
    exit 1
fi
IMAGE_NAME="$COMPONENT_NAME"
log "Using IMAGE_NAME: $IMAGE_NAME"
IMAGE_QUERY="quay.io/open-cluster-management/${IMAGE_NAME}@sha256:[[:alnum:]]+"
log "Using IMAGE_QUERY: $IMAGE_QUERY"

# Set up Quay credentials.
log "Setting up Quay credentials."
if [[ ! -r "${QUAY_TOKEN_FILE}" ]]; then
    log "ERROR Quay token file missing or not readable: $QUAY_TOKEN_FILE"
    exit 1
fi
QUAY_TOKEN=$(cat "$QUAY_TOKEN_FILE")

# Set up additional deploy variables
NAMESPACE=open-cluster-management
OPERATOR_DIR=acm-operator

# Function to deploy ACM to a cluster.
# The first parameter is the cluster name without the suffix.
deploy() {
    local _cluster="$1"
    local _log="${ARTIFACT_DIR}/deploy-${_cluster}.log"
    local _status="${ARTIFACT_DIR}/deploy-${_cluster}.status"
    local _kc="${SHARED_DIR}/${_cluster}.kc"

    # Cloning deploy repo
    logf "$_log" "Deploy $_cluster: Cloning deploy repo"
    echo "CLONE" > "${_status}"
    local _deploy_dir="${ocm_dir}/deploy-$_cluster"
    git clone "$deploy_url" "$_deploy_dir" > >(tee -a "$_log") 2>&1 || {
        logf "$_log" "ERROR Deploy $_cluster: Could not clone deploy repo"
        echo "ERROR CLONE" > "${_status}"
        return 1
    }

    echo "CHANGE_DIR" > "${_status}"
    cd "$_deploy_dir" || return 1

    # Save snapshot version
    logf "$_log" "Deploy $_cluster: Using snapshot $snapshot"
    echo "SNAPSHOT" > "${_status}"
    echo "$snapshot" > snapshot.ver

    # Test cluster connection
    logf "$_log" "Deploy $_cluster: Waiting up to 2 minutes to connect to cluster"
    echo "WAIT_CONNECT" > "${_status}"
    local _timeout=120 _elapsed='' _step=10
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        KUBECONFIG="$_kc" oc project > >(tee -a "$_log") 2>&1 && {
            logf "$_log" "Deploy $_cluster: Connected to cluster after ${_elapsed}s"
            break
        }

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) connecting to cluster"
                echo "ERROR WAIT_CONNECT" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Could not connect to cluster. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Generate YAML files
    logf "$_log" "Deploy $_cluster: Waiting up to 2 minutes for start.sh to generate YAML files"
    echo "WAIT_YAML" > "${_status}"
    local _timeout=120 _elapsed='' _step=10
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        KUBECONFIG="$_kc" QUAY_TOKEN="$QUAY_TOKEN" ./start.sh --silent -t \
            > >(tee -a "$_log") 2>&1 && {
            logf "$_log" "Deploy $_cluster: start.sh generated YAML files after ${_elapsed}s"
            break
        }

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting for start.sh to generate YAML files"
                echo "ERROR WAIT_YAML" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Could not create YAML files. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Create namespace
    logf "$_log" "Deploy $_cluster: Creating namespace $NAMESPACE"
    echo "NAMESPACE" > "${_status}"
    KUBECONFIG="$_kc" oc create ns $NAMESPACE \
        > >(tee -a "$_log") 2>&1 || {
        logf "$_log" "ERROR Deploy $_cluster: Error creating namespace $NAMESPACE"
        echo "ERROR NAMESPACE" > "${_status}"
        return 1
    }

    # Wait for namespace 
    logf "$_log" "Deploy $_cluster: Waiting up to 2 minutes for namespace $NAMESPACE"
    echo "WAIT_NAMESPACE" > "${_status}"
    local _timeout=120 _elapsed='' _step=10
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        # Check for namespace to be created.
        KUBECONFIG="$_kc" oc get ns $NAMESPACE -o name > >(tee -a "$_log") 2>&1 && {
            logf "$_log" "Deploy $_cluster: Namespace $NAMESPACE created after ${_elapsed}s"
            break
        }

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting for namespace $NAMESPACE to be created."
                echo "ERROR WAIT_NAMESPACE" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Namespace not yet created. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Apply YAML files in prereqs directory
    logf "$_log" "Deploy $_cluster: Waiting up to 5 minutes to apply YAML files from prereqs directory"
    echo "WAIT_APPLY_PREREQS" > "${_status}"
    local _timeout=300 _elapsed='' _step=15
    local _mch_name='' _mch_status=''
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        KUBECONFIG="$_kc" oc -n $NAMESPACE apply --openapi-patch=true -k prereqs/ \
            > >(tee -a "$_log") 2>&1 && {
            logf "$_log" "Deploy $_cluster: YAML files from prereqs directory applied after ${_elapsed}s"
            break
        }

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting to apply prereq YAML files"
                echo "ERROR WAIT_APPLY_PREREQS" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Unable to apply YAML files from prereqs directory. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Apply YAML files in multicluster hub operator directory
    logf "$_log" "Deploy $_cluster: Waiting up to 5 minutes to apply YAML files from MCH operator directory"
    echo "WAIT_APPLY_MCHO" > "${_status}"
    local _timeout=300 _elapsed='' _step=15
    local _mch_name='' _mch_status=''
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        KUBECONFIG="$_kc" oc -n $NAMESPACE apply -k "${OPERATOR_DIR}/" \
            > >(tee -a "$_log") 2>&1 && {
            logf "$_log" "Deploy $_cluster: YAML files from MCH operator directory applied after ${_elapsed}s"
            break
        }

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting to apply MCH operator YAML files"
                echo "ERROR WAIT_APPLY_MCHO" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Unable to apply YAML files from MCH operator directory. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Wait for MCH pod
    logf "$_log" "Deploy $_cluster: Waiting up to 5 minutes for multiclusterhub-operator pod"
    echo "WAIT_MCHO" > "${_status}"
    local _timeout=300 _elapsed='' _step=15
    local _mcho_name='' _path='' _total='' _ready=''
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            logf "$_log" "INFO Deploy $_cluster: Setting elapsed time to 0"
            _elapsed=0
        else
            logf "$_log" "INFO Deploy $_cluster: Sleeping for $_step s"
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi
        logf "$_log" "INFO Deploy $_cluster: Elapsed time is ${_elapsed}/${_timeout}s"

        # Get pod names
        logf "$_log" "INFO Deploy $_cluster: Getting pod names."
        KUBECONFIG="$_kc" oc -n $NAMESPACE get pods -o name > pod_names 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Failed to get pod names. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }
        logf "$_log" "INFO Deploy $_cluster: Current pod names:"
        cat pod_names > >(tee -a "$_log") 2>&1

        # Check for multiclusterhub-operator pod name
        logf "$_log" "INFO Deploy $_cluster: Checking for multiclusterhub-operator pod."
        if ! grep -E --max-count=1 "^pod/multiclusterhub-operator(-[[:alnum:]]+)+$" pod_names > mcho_name 2> /dev/null ; then
            logf "$_log" "WARN Deploy $_cluster: multiclusterhub-operator pod not created yet. Will retry (${_elapsed}/${_timeout}s)"
            continue
        fi
        logf "$_log" "INFO Deploy $_cluster: MCHO pod name:"
        cat mcho_name > >(tee -a "$_log") 2>&1

        _mcho_name=$(cat mcho_name 2> /dev/null)
        logf "$_log" "INFO Deploy $_cluster: Found MCHO pod: '$_mcho_name'"

        # Get IDs of all containers in MCH pod.
        logf "$_log" "INFO Deploy $_cluster: Getting IDs of all containers in MCH-O pod $_mcho_name"
        _path='{range .status.containerStatuses[*]}{@.containerID}{"\n"}{end}'
        KUBECONFIG="$_kc" oc -n $NAMESPACE get "$_mcho_name" \
            -o jsonpath="$_path" > total_containers 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Failed to get all container IDs. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Get IDs of all ready containers in MCH pod.
        logf "$_log" "INFO Deploy $_cluster: Getting IDs of all ready containers in MCH-O pod $_mcho_name"
        _path='{range .status.containerStatuses[?(@.ready==true)]}{@.containerID}{"\n"}{end}'
        KUBECONFIG="$_kc" oc -n $NAMESPACE get "$_mcho_name" \
            -o jsonpath="$_path" > ready_containers 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Failed to get all ready container IDs. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check if all containers are ready.
        logf "$_log" "INFO Deploy $_cluster: Checking if all containers are ready in MCH-O pod."
        _total=$(wc -l < total_containers) # redirect into wc so it doesn't print file name as well
        _ready=$(wc -l < ready_containers)
        if (( _total > 0 && _ready == _total )); then
            logf "$_log" "Deploy $_cluster: multiclusterhub-operator pod is ready after ${_elapsed}s"
            break
        fi

        # Check timeout
        logf "$_log" "INFO Deploy $_cluster: Checking for timeout."
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting for multiclusterhub-operator pod"
                echo "ERROR WAIT_MCHO" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Not all containers ready ($_ready/$_total). Will retry (${_elapsed}/${_timeout}s)"
    done

    # Apply YAML files in DEPLOY_HUB_ADDITIONAL_YAML environment variable
    logf "$_log" "Deploy $_cluster: Checking DEPLOY_HUB_ADDITIONAL_YAML environment variable"
    echo "CHECK_ADDITIONAL_YAML" > "${_status}"
    if [[ -z "$DEPLOY_HUB_ADDITIONAL_YAML" ]]; then
        logf "$_log" "Deploy $_cluster: .... DEPLOY_HUB_ADDITIONAL_YAML is empty."
    else
        logf "$_log" "Deploy $_cluster: .... decoding DEPLOY_HUB_ADDITIONAL_YAML"
        echo "DECODE_ADDITIONAL_YAML" > "${_status}"
        cat <<<"$DEPLOY_HUB_ADDITIONAL_YAML" | base64 -d > additional.yaml 2> >(tee -a "$_log") || {
            logf "$_log" "ERROR Deploy $_cluster: Unable to decode contents of DEPLOY_HUB_ADDITIONAL_YAML variable"
            echo "ERROR DECODE_ADDITIONAL_YAML" > "${_status}"
            return 1
        }
        logf "$_log" "Deploy $_cluster: Wait up to 5 minutes to apply YAML files from DEPLOY_HUB_ADDITIONAL_YAML environment variable"
        echo "WAIT_APPLY_ADDITIONAL_YAML" > "${_status}"
        local _timeout=300 _elapsed='' _step=15
        local _mch_name='' _mch_status=''
        while true; do
            # Wait for _step seconds, except for first iteration.
            if [[ -z "$_elapsed" ]]; then
                _elapsed=0
            else
                sleep $_step
                _elapsed=$(( _elapsed + _step ))
            fi
    
            KUBECONFIG="$_kc" oc -n $NAMESPACE apply -f additional.yaml \
                > >(tee -a "$_log") 2>&1 && {
                logf "$_log" "Deploy $_cluster: Additional YAML files applied after ${_elapsed}s"
                break
            }
    
            # Check timeout
            if (( _elapsed > _timeout )); then
                    logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting to apply additional YAML files"
                    echo "ERROR WAIT_APPLY_ADDITIONAL_YAML" > "${_status}"
                    return 1
            fi
    
            logf "$_log" "WARN Deploy $_cluster: Unable to apply additional YAML files. Will retry (${_elapsed}/${_timeout}s)"
        done
    fi

    # Wait for ClusterServiceVersion
    logf "$_log" "Deploy $_cluster: Waiting up to 10 minutes for CSV"
    echo "WAIT_CSV_1" > "${_status}"
    local _timeout=600 _elapsed='' _step=15
    local _csv_name='' _csv_status=''
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        # Get CSV name
        KUBECONFIG="$_kc" oc -n $NAMESPACE get csv -o name > csv_name 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Error getting CSV name. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check that CSV name isn't empty
        _csv_name=$(cat csv_name)
        if [[ -z "$_csv_name" ]]; then
            logf "$_log" "WARN Deploy $_cluster: CSV not created yet. Will retry (${_elapsed}/${_timeout}s)"
            continue
        fi

        # Get CSV status
        KUBECONFIG="$_kc" oc -n $NAMESPACE get "$_csv_name" \
            -o json > csv.json 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Error getting CSV status. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check CSV status
        _csv_status=$(jq -r .status.phase csv.json 2> >(tee -a "$_log"))
        case "$_csv_status" in
            Failed)
                logf "$_log" "ERROR Deploy $_cluster: Error CSV install failed after ${_elapsed}s"
                local _msg
                _msg=$(jq -r .status.message csv.json 2> >(tee -a "$_log"))
                logf "$_log" "ERROR Deploy $_cluster: Error message: $_msg"
                logf "$_log" "ERROR Deploy $_cluster: Full CSV"
                jq . csv.json > >(tee -a "$_log") 2>&1
                echo "ERROR WAIT_CSV_1" > "$_status"
                return 1
                ;;
            Succeeded)
                logf "$_log" "Deploy $_cluster: CSV is ready after ${_elapsed}s"
                break
                ;;
        esac

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting for CSV"
                echo "ERROR WAIT_CSV_1" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Current CSV status is $_csv_status. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Update CSV
    logf "$_log" "Deploy $_cluster: Updating CSV"
    echo "UPDATE_CSV" > "${_status}"
    # Rewrite CSV. CSV contents are in csv.json
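    # IMAGE_QUERY matches the digest-pinned pull spec of the component image
    # (quay.io/open-cluster-management/<image>@sha256:<digest>) and is replaced
    # with COMPONENT_IMAGE_REF. uid and resourceVersion are dropped so the
    # edited CSV can be pushed back with "oc replace".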
    sed -E "s,$IMAGE_QUERY,$COMPONENT_IMAGE_REF," csv.json > csv_update.json 2> >(tee -a "$_log")
    jq 'del(.metadata.uid) | del(.metadata.resourceVersion)' csv_update.json > csv_clean.json 2> >(tee -a "$_log")
    # Replace CSV on cluster
    KUBECONFIG="$_kc" oc -n $NAMESPACE replace -f csv_clean.json > >(tee -a "$_log") 2>&1 || {
        logf "$_log" "ERROR Deploy $_cluster: Failed to update CSV."
        logf "$_log" "ERROR Deploy $_cluster: New CSV contents"
        jq . csv_clean.json > >(tee -a "$_log") 2>&1
        echo "ERROR_UPDATE_CSV" > "$_status"
        return 1
    }

    # Wait for ClusterServiceVersion
    logf "$_log" "Deploy $_cluster: Waiting up to 10 minutes for CSV"
    echo "WAIT_CSV_2" > "${_status}"
    local _timeout=600 _elapsed=0 _step=15
    local _csv_name='' _csv_status=''
    while true; do
        # Wait for _step seconds, including first iteration
        sleep $_step
        _elapsed=$(( _elapsed + _step ))

        # Get CSV name
        KUBECONFIG="$_kc" oc -n $NAMESPACE get csv -o name > csv_name 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Error getting CSV name. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check that CSV name isn't empty
        _csv_name=$(cat csv_name)
        if [[ -z "$_csv_name" ]]; then
            logf "$_log" "WARN Deploy $_cluster: CSV not created yet. Will retry (${_elapsed}/${_timeout}s)"
            continue
        fi

        # Get CSV status
        KUBECONFIG="$_kc" oc -n $NAMESPACE get "$_csv_name" \
            -o json > csv.json 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Error getting CSV status. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check CSV status
        _csv_status=$(jq -r .status.phase csv.json 2> >(tee -a "$_log"))
        case "$_csv_status" in
            Failed)
                logf "$_log" "ERROR Deploy $_cluster: Error CSV install failed after ${_elapsed}s"
                local _msg
                _msg=$(jq -r .status.message csv.json 2> >(tee -a "$_log"))
                logf "$_log" "ERROR Deploy $_cluster: Error message: $_msg"
                logf "$_log" "ERROR Deploy $_cluster: Full CSV"
                jq . csv.json > >(tee -a "$_log") 2>&1
                echo "ERROR WAIT_CSV_2" > "$_status"
                return 1
                ;;
            Succeeded)
                logf "$_log" "Deploy $_cluster: CSV is ready after ${_elapsed}s"
                break
                ;;
        esac

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting for CSV"
                echo "ERROR WAIT_CSV_2" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Current CSV status is $_csv_status. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Apply YAML files in multicluster hub directory
    logf "$_log" "Deploy $_cluster: Wait up to 5 minutes to apply YAML files from MCH directory"
    echo "WAIT_APPLY_MCH" > "${_status}"
    local _timeout=300 _elapsed='' _step=15
    local _mch_name='' _mch_status=''
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        KUBECONFIG="$_kc" oc -n $NAMESPACE apply -k applied-mch/ \
            > >(tee -a "$_log") 2>&1 && {
            logf "$_log" "Deploy $_cluster: MCH YAML files applied after ${_elapsed}s"
            break
        }

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting to apply MCH YAML files"
                echo "ERROR WAIT_APPLY_MCH" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Unable to apply YAML files from MCH directory. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Wait for MultiClusterHub CR to be ready
    logf "$_log" "Deploy $_cluster: Waiting up to 15 minutes for MCH CR"
    echo "WAIT_MCH" > "${_status}"
    local _timeout=900 _elapsed='' _step=15
    local _mch_name='' _mch_status=''
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        # Get MCH name
        KUBECONFIG="$_kc" oc -n $NAMESPACE get multiclusterhub -o name > mch_name 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Error getting MCH name. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check that MCH name isn't empty
        _mch_name=$(cat mch_name)
        if [[ -z "$_mch_name" ]]; then
            logf "$_log" "WARN Deploy $_cluster: MCH not created yet. Will retry (${_elapsed}/${_timeout}s)"
            continue
        fi

        # Get MCH status
        KUBECONFIG="$_kc" oc -n $NAMESPACE get "$_mch_name" \
            -o json > mch.json 2> >(tee -a "$_log") || {
            logf "$_log" "WARN Deploy $_cluster: Error getting MCH status. Will retry (${_elapsed}/${_timeout}s)"
            continue
        }

        # Check MCH status
        _mch_status=$(jq -r .status.phase mch.json 2> >(tee -a "$_log"))
        if [[ "$_mch_status" == "Running" ]]; then
            logf "$_log" "Deploy $_cluster: MCH CR is ready after ${_elapsed}s"
            break
        fi

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Timeout (${_timeout}s) waiting for MCH CR"
                echo "ERROR WAIT_MCH" > "${_status}"
                return 1
        fi

        logf "$_log" "WARN Deploy $_cluster: Current MCH status is $_mch_status. Will retry (${_elapsed}/${_timeout}s)"
    done

    # Done
    logf "$_log" "Deploy $_cluster: Deployment complete."
    echo "OK" > "${_status}"
    return 0
}

# Function to start an ACM deployment in parallel. The first argument is the
# timeout in seconds to wait. The second parameter is the name of the cluster.
#
# This function uses mostly shell built-ins to minimize forking processes.
# Based on this Stack Overflow comment:
# https://stackoverflow.com/a/50436152/1437822
#
deploy_with_timeout() {
    # Store the timeout and cluster name.
    local _timeout=$1
    local _cluster=$2
    local _step=5
    # Execute the command in the background.
    deploy "$_cluster" &
    # Store the PID of the command in the background.
    local _pid=$!
    # Start the elapsed count.
    local _elapsed=0
    # Check if the command is still running.
    # kill -0 $_pid returns
    #   exit code 0 if the command is running, but does not affect the command
    #   exit code 1 if the command is not running
    while kill -0 $_pid >/dev/null 2>&1 ; do
        # command is still running. wait _step seconds
        sleep $_step
        # increment elapsed time
        _elapsed=$(( _elapsed + _step ))
        # Check if timeout has been reached.
        if (( _elapsed >= _timeout )); then 
            log "Deploy $_cluster: Killing pid $_pid due to timeout (${_elapsed}/${_timeout}s)"
            # Update status
            echo "TIMEOUT at $(date --iso-8601=seconds)" > "${_cluster}.status"
            # Kill deployment
            kill $_pid >/dev/null 2>&1
            break
        fi
    done
}

# Function to gracefully terminate deployments if main script exits
_exit() {
    log "TERMINATE Main script caught an exit signal."
    log "Stopping all deployments."
    kill "$(pgrep -P $$)" >/dev/null 2>&1
}

# Array to store PIDs of deploy processes.
waitgroup=()

# Start a deployment for each cluster
for cluster in "${clusters[@]}"; do
    log "Deploy $cluster: Starting deployment."
    deploy_with_timeout "$DEPLOY_TIMEOUT" "$cluster" &
    pid=$!
    waitgroup+=("$pid")
    log "Deploy $cluster: Started with pid $pid"
done

# Enable trap on EXIT to stop deployments.
trap _exit EXIT

# Wait for deployments to finish.
log "Waiting for ${#waitgroup[@]} deployment(s)."
wait "${waitgroup[@]}"

# Done waiting. Disable EXIT trap.
trap - EXIT

# Check status of all deployments.
log "Deployments done. Checking status."
err=0
for cluster in "${clusters[@]}"; do
    status="${ARTIFACT_DIR}/deploy-$cluster.status"
    if [[ ! -r "$status" ]]; then
        log "Cluster $cluster: ERROR No status file: $status"
        log "Cluster $cluster: See cluster deploy log file (deploy-$cluster.log) for more details."
        err=$(( err + 1 ))
        continue
    fi
    
    status=$(cat "$status")
    if [[ "$status" != OK ]]; then
        log "Cluster $cluster: ERROR Failed with status: $status"
        log "Cluster $cluster: See cluster deploy log file (deploy-$cluster.log) for more details."
        err=$(( err + 1 ))
    fi
done

# Throw error if any deployments failed.
if [[ $err -gt 0 ]]; then
    log "ERROR One or more failed deployments."
    exit 1
fi

log "Deployments complete."

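# NOTE: donotuse() is never invoked. It appears to be kept as a template for
# adding new deploy stages to the deploy() function above.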
donotuse() {
    # Do X
    logf "$_log" "Deploy $_cluster: Doing X"
    echo "X" > "${_status}"
    KUBECONFIG="$_kc" oc -n $NAMESPACE \
        > >(tee -a "$_log") 2>&1 || {
        logf "$_log" "ERROR Deploy $_cluster: Error doing X"
        echo "ERROR X" > "${_status}"
        return 1
    }

    # Wait for X
    logf "$_log" "Deploy $_cluster: Waiting up to N minutes for X"
    echo "WAIT_X" > "${_status}"
    local _timeout=600 _elapsed='' _step=15
    while true; do
        # Wait for _step seconds, except for first iteration.
        if [[ -z "$_elapsed" ]]; then
            _elapsed=0
        else
            sleep $_step
            _elapsed=$(( _elapsed + _step ))
        fi

        # Check timeout
        if (( _elapsed > _timeout )); then
                logf "$_log" "ERROR Deploy $_cluster: Error waiting for X"
                echo "ERROR WAIT_X" > "${_status}"
                return 1
        fi

        # Do X
        KUBECONFIG="$_kc" oc -n open-cluster-management 2> >(tee -a "$_log") && break

        logf "$_log" "WARN Deploy $_cluster: Failed to do X. Will retry."
    done
}

Properties

Termination grace period: 10m0s. Period of time until the SIGKILL signal is sent to the test pod (after the SIGTERM signal is sent).
Resource requests (cpu): 100m. Used in .resources.requests of the pod running this step.
Resource requests (memory): 100Mi. Used in .resources.requests of the pod running this step.

GitHub Link:

https://github.com/openshift/release/blob/master/ci-operator/step-registry/ocm/e2e/clusterpool/cluster/deploy/ocm-e2e-clusterpool-cluster-deploy-ref.yaml
