
Installing Kubeflow 1.9.0

by kyeongseo.oh 2024. 8. 26.
kubernetes version v1.29.8
kubeflow version 1.9.0
kustomize version v5.4.3

1. Install kustomize.

curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh"  | bash
sudo install kustomize /usr/bin/kustomize
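The post was tested with kustomize v5.4.3, while the install script downloads whichever release it resolves to. The following sketch checks the installed binary against that version. Treating v5.4.3 as a minimum is an assumption, not a documented requirement, and `version_at_least` is a hypothetical helper that relies on GNU `sort -V`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is >= version $2.
# A leading "v" is stripped, then GNU sort -V (version sort) orders the two.
version_at_least() {
  local have="${1#v}" want="${2#v}"
  # The smaller version sorts first; if that is $want, then $have >= $want.
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n1)" = "$want" ]
}

# Usage (assumes `kustomize version` prints a bare tag such as v5.4.3):
# version_at_least "$(kustomize version)" v5.4.3 && echo "kustomize is recent enough"
```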


2. Clone the manifests repository and check out the v1.9 branch.

git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.9-branch


3. Edit apps/jupyter/jupyter-web-app/upstream/base/params.env

Set JWA_APP_SECURE_COOKIES to false so that the web app can be accessed over plain HTTP.

JWA_UI=default
JWA_PREFIX=/jupyter
JWA_CLUSTER_DOMAIN=cluster.local
JWA_USERID_HEADER=kubeflow-userid
JWA_USERID_PREFIX=
JWA_APP_SECURE_COOKIES=false


4. Edit apps/tensorboard/tensorboards-web-app/upstream/base/params.env

Set TWA_APP_SECURE_COOKIES to false so that the web app can be accessed over plain HTTP.

TWA_CLUSTER_DOMAIN=cluster.local
TWA_USERID_HEADER=kubeflow-userid
TWA_USERID_PREFIX=
TWA_PREFIX=/tensorboards
TWA_APP_SECURE_COOKIES=false


5. Edit apps/volumes-web-app/upstream/base/params.env

Set VWA_APP_SECURE_COOKIES to false so that the web app can be accessed over plain HTTP.

VWA_CLUSTER_DOMAIN=cluster.local
VWA_USERID_HEADER=kubeflow-userid
VWA_USERID_PREFIX=
VWA_PREFIX=/volumes
VWA_APP_SECURE_COOKIES=false
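Steps 3 to 5 change the same kind of setting in three files, so the edits can also be scripted. A minimal sketch with a hypothetical `set_param` helper, assuming GNU sed (`sed -i` behaves differently on macOS) and that it is run from the root of the manifests checkout:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a params.env-style file, replacing an
# existing "KEY=" line or appending a new one if the key is missing.
set_param() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # GNU sed in-place edit; "|" as delimiter avoids clashes with "/" in values.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (paths from steps 3 to 5):
# set_param apps/jupyter/jupyter-web-app/upstream/base/params.env JWA_APP_SECURE_COOKIES false
# set_param apps/tensorboard/tensorboards-web-app/upstream/base/params.env TWA_APP_SECURE_COOKIES false
# set_param apps/volumes-web-app/upstream/base/params.env VWA_APP_SECURE_COOKIES false
```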


6. Edit common/oidc-client/oauth2-proxy/base/deployment.yaml

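Set the ndots DNS option to 4 by adding a dnsConfig block to the pod spec (spec.template.spec). The rest of the file stays unchanged.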

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy
  labels:
    app: oauth2-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: oauth2-proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: oauth2-proxy
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "4"
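For context on what this option changes: per resolv.conf(5), a name containing at least `ndots` dots is first looked up as an absolute name, otherwise the search domains are appended first, and pods using the default ClusterFirst DNS policy get `ndots:5`. The sketch below only illustrates that rule. `first_lookup` is hypothetical, and `dex.auth.svc.cluster.local` is used as an example of a four-dot in-cluster name, not as a diagnosis of which lookup fails here.

```shell
#!/usr/bin/env bash
# Illustrates the resolv.conf "ndots" rule: a name with at least ndots dots is
# tried as an absolute name first; otherwise search domains are tried first.
first_lookup() {
  local name="$1" ndots="$2"
  local only_dots="${name//[^.]/}"   # delete every character except "."
  if [ "${#only_dots}" -ge "$ndots" ]; then
    echo "absolute-first"
  else
    echo "search-first"
  fi
}

# Usage:
# first_lookup dex.auth.svc.cluster.local 5   # Kubernetes default
# first_lookup dex.auth.svc.cluster.local 4   # value used in this post
```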


7. Install Kubeflow with kustomize

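This takes about 5 to 10 minutes. The command runs in a retry loop because some resources can fail to apply on the first pass (for example, custom resources whose CRDs are not ready yet). Wait until all pods are Running.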

while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done
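To see which pods are still holding up the installation, the `kubectl get pods -A` output can be filtered. A minimal sketch with a hypothetical `not_ready_pods` function, assuming the default column layout (NAMESPACE NAME READY STATUS ...) and that Completed pods count as done:

```shell
#!/usr/bin/env bash
# Reads `kubectl get pods -A --no-headers` output on stdin and prints every pod
# that is not fully ready, as "namespace/name READY STATUS".
not_ready_pods() {
  awk '{
    split($3, ready, "/")            # READY column, e.g. "0/2"
    if ($4 == "Completed") next      # finished job pods are fine
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2 " " $3 " " $4
  }'
}

# Usage: poll until no pod is left (the 30 s interval is arbitrary).
# while [ -n "$(kubectl get pods -A --no-headers | not_ready_pods)" ]; do sleep 30; done
```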


8. Modify istiod

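Set ndots to 4 on istiod as well. Without this change, JWT tokens cannot be validated. Edit the Deployment and add the dnsConfig block under spec.template.spec: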

kubectl edit deployment -n istio-system istiod

    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "4"
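If an interactive editor is inconvenient, the same change can be applied with `kubectl patch`. A sketch in which `ndots_patch` is a hypothetical helper that only builds the patch body. With `--type merge`, the `dnsConfig.options` list is replaced as a whole, and changing the pod template rolls out new istiod pods, just as `kubectl edit` does.

```shell
#!/usr/bin/env bash
# Builds a JSON merge patch that sets the ndots option on a Deployment's pod template.
ndots_patch() {
  printf '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"%s"}]}}}}}' "$1"
}

# Non-interactive alternative to `kubectl edit`:
# kubectl patch deployment istiod -n istio-system --type merge -p "$(ndots_patch 4)"
```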


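9. Configure the Ingress

Route HTTP traffic for kubeflow.dd.io through the NGINX Ingress controller (ingressClassName: nginx) to the istio-ingressgateway Service: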

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeflow-ingress
  namespace: istio-system
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: istio-ingressgateway
      port:
        number: 80
  rules:
    - host: kubeflow.dd.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: istio-ingressgateway
                port:
                  number: 80


10. Verify access
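kubeflow.dd.io is not a public domain, so the client machine has to resolve it to the ingress controller, for example through its hosts file. A sketch with a hypothetical `add_host_entry` helper; 192.0.2.10 is a documentation placeholder for the ingress controller's address, not a value from this setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOST" to a hosts file unless HOST is already listed.
add_host_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  # awk exits 0 if any field after the first equals the host name.
  if ! awk -v h="$host" '{for (i = 2; i <= NF; i++) if ($i == h) found = 1} END {exit !found}' "$file"; then
    printf '%s %s\n' "$ip" "$host" >> "$file"
  fi
}

# Usage (run as root to modify /etc/hosts), then check the endpoint:
# add_host_entry 192.0.2.10 kubeflow.dd.io
# curl -I http://kubeflow.dd.io/
```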


Troubleshooting

Issue

# Pods fail during initialization.

kubeflow           metadata-envoy-deployment-9c7db86d8-rhzxz                1/1     Running                 0             2m12s
kubeflow           metadata-grpc-deployment-d94cc8676-zjdf2                 0/2     Init:CrashLoopBackOff   4 (36s ago)   2m12s
kubeflow           metadata-writer-d9bc4bb89-lflvv                          0/2     Init:CrashLoopBackOff   4 (39s ago)   2m12s
kubeflow           minio-5dc6ff5b96-z6wrh                                   0/2     Init:CrashLoopBackOff   4 (51s ago)   2m11s
kubeflow           ml-pipeline-5846b5b56d-fdsfx                             0/2     Init:CrashLoopBackOff   4 (39s ago)   2m11s
kubeflow           ml-pipeline-persistenceagent-7655ddbcfb-9xckf            0/2     Init:CrashLoopBackOff   4 (50s ago)   2m11s
kubeflow           ml-pipeline-scheduledworkflow-658b675548-xdm9p           0/2     Init:CrashLoopBackOff   4 (42s ago)   2m10s
kubeflow           ml-pipeline-ui-5c66bf88b5-95thg                          0/2     Init:CrashLoopBackOff   4 (35s ago)   2m10s
kubeflow           ml-pipeline-viewer-crd-c4d866c85-49rvf                   0/2     Init:CrashLoopBackOff   4 (37s ago)   2m10s
kubeflow           ml-pipeline-visualizationserver-7c678699d-bgwlm          0/2     Init:CrashLoopBackOff   4 (32s ago)   2m10s
kubeflow           mysql-5b446b5744-v4n6q                                   0/2     Init:CrashLoopBackOff   4 (32s ago)   2m10s
kubeflow           notebook-controller-deployment-5458bf988b-4p9rz          0/2     Init:CrashLoopBackOff   4 (35s ago)   2m9s
kubeflow           profiles-deployment-79f5cf977d-vkgr9                     0/3     Init:CrashLoopBackOff   4 (32s ago)   2m9s
kubeflow           pvcviewer-controller-manager-7979499b66-7rnqz            0/3     Init:CrashLoopBackOff   3 (23s ago)   2m9s
kubeflow           tensorboard-controller-deployment-78f5598f4b-hv6mj       0/3     Init:CrashLoopBackOff   4 (32s ago)   2m9s
kubeflow           tensorboards-web-app-deployment-6dc87f944-qkb6h          0/2     Init:CrashLoopBackOff   4 (34s ago)


# kubectl describe shows that the istio-init container is failing.
Warning  BackOff    90s (x10 over 3m18s)  kubelet            Back-off restarting failed istio-init in pod volumes-web-app-deployment-db79f546d-4xlc6_kubeflow(999424ac-cfe6-468e-b982-ebbb


# The logs show an iptables-related error.
# kubectl logs -n kubeflow volumes-web-app-deployment-db79f546d-4xlc6 -c istio-init
2024-08-26T00:38:17.887642Z     info    Running command (with wait lock): iptables-restore --noflush --wait=30
2024-08-26T00:38:17.889031Z     error   Command error output: xtables parameter problem: iptables-restore: unable to initialize table 'nat'
Error occurred at line: 1
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
2024-08-26T00:38:17.889062Z     info    Running command (without lock): iptables-save
2024-08-26T00:38:17.890025Z     error   exit status 2
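Since many pods failed the same way, it can help to collect the istio-init logs of all of them at once. A sketch with a hypothetical `init_failing_pods` filter, assuming the column layout of `kubectl get pods -n kubeflow` (NAME READY STATUS ...):

```shell
#!/usr/bin/env bash
# Reads `kubectl get pods -n kubeflow --no-headers` output on stdin and prints
# the names of pods whose STATUS is Init:CrashLoopBackOff or Init:Error.
init_failing_pods() {
  awk '$3 ~ /^Init:(CrashLoopBackOff|Error)$/ {print $1}'
}

# Usage: show the last istio-init log lines of each failing pod.
# for pod in $(kubectl get pods -n kubeflow --no-headers | init_failing_pods); do
#   echo "== $pod"
#   kubectl logs -n kubeflow "$pod" -c istio-init --tail=10
# done
```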


Solution

# Check whether the required kernel modules are loaded.
lsmod | grep -E 'ip_tables|iptable_nat'

# The modules were not loaded, so load them with the following commands and reinstall Kubeflow.
sudo modprobe ip_tables
sudo modprobe iptable_nat
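Modules loaded with `modprobe` are not reloaded after a reboot. A sketch for checking which modules are still missing and, on systemd-based distributions, persisting them through /etc/modules-load.d. `missing_modules` and the file name `kubeflow-iptables.conf` are hypothetical choices.

```shell
#!/usr/bin/env bash
# Reads `lsmod` output on stdin and prints each requested module that is absent.
missing_modules() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " ") }
    NR > 1 { have[$1] = 1 }                      # skip the lsmod header line
    END { for (i = 1; i <= n; i++) if (!(w[i] in have)) print w[i] }'
}

# Usage:
# lsmod | missing_modules ip_tables iptable_nat
# Load the modules at every boot via systemd-modules-load (one name per line):
# printf '%s\n' ip_tables iptable_nat | sudo tee /etc/modules-load.d/kubeflow-iptables.conf
```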
