Kubernetes version | v1.29.8 |
Kubeflow version | 1.9.0 |
kustomize version | v5.4.3 |
1. Install kustomize.
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
install kustomize /usr/bin/kustomize
2. Clone the manifests Git repository.
git clone https://github.com/kubeflow/manifests.git
cd manifests
git checkout v1.9-branch
3. Edit apps/jupyter/jupyter-web-app/upstream/base/params.env
Set JWA_APP_SECURE_COOKIES to false so that the app can be reached over plain HTTP.
JWA_UI=default
JWA_PREFIX=/jupyter
JWA_CLUSTER_DOMAIN=cluster.local
JWA_USERID_HEADER=kubeflow-userid
JWA_USERID_PREFIX=
JWA_APP_SECURE_COOKIES=false
4. Edit apps/tensorboard/tensorboards-web-app/upstream/base/params.env
Set TWA_APP_SECURE_COOKIES to false so that the app can be reached over plain HTTP.
TWA_CLUSTER_DOMAIN=cluster.local
TWA_USERID_HEADER=kubeflow-userid
TWA_USERID_PREFIX=
TWA_PREFIX=/tensorboards
TWA_APP_SECURE_COOKIES=false
5. Edit apps/volumes-web-app/upstream/base/params.env
Set VWA_APP_SECURE_COOKIES to false so that the app can be reached over plain HTTP.
VWA_CLUSTER_DOMAIN=cluster.local
VWA_USERID_HEADER=kubeflow-userid
VWA_USERID_PREFIX=
VWA_PREFIX=/volumes
VWA_APP_SECURE_COOKIES=false
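Steps 3 to 5 make the same one-line change in three files, so it can be scripted. The following is a minimal sketch rather than part of the official procedure: the function name set_secure_cookies_false is made up for illustration, and the loop assumes it is run from the root of the cloned manifests directory.

```shell
# Rewrite any *_APP_SECURE_COOKIES line in a params.env file to "false".
# Uses a temporary file instead of "sed -i" so it behaves the same on GNU and BSD sed.
set_secure_cookies_false() {
  file=$1
  sed 's/^\([A-Z]*_APP_SECURE_COOKIES\)=.*/\1=false/' "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Apply to the three params.env files from steps 3 to 5 (skipped if a file is absent).
for f in \
  apps/jupyter/jupyter-web-app/upstream/base/params.env \
  apps/tensorboard/tensorboards-web-app/upstream/base/params.env \
  apps/volumes-web-app/upstream/base/params.env
do
  if [ -f "$f" ]; then
    set_secure_cookies_false "$f"
  fi
done
```

All other keys in each file are left untouched, so the result should match the listings above.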
6. Edit common/oidc-client/oauth2-proxy/base/deployment.yaml
Set ndots to 4 in the pod's dnsConfig.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy
  labels:
    app: oauth2-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: oauth2-proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: oauth2-proxy
    spec:
      dnsConfig:
        options:
        - name: ndots
          value: "4"
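For context on what this option controls: according to resolv.conf(5), ndots is the number of dots a name must contain before the resolver tries it as an absolute name first; names with fewer dots are tried against the search list first. The sketch below only illustrates that rule. The function names and the service name a.b.svc.cluster.local are hypothetical, and it does not reproduce everything a real resolver does.

```shell
# Count the dots in a hostname.
count_dots() {
  # Delete every character except "." and measure what remains.
  dots=$(printf '%s' "$1" | tr -cd '.')
  printf '%s\n' "${#dots}"
}

# Print which lookup is attempted first for a name under a given ndots value:
# "absolute" if the name has at least ndots dots, otherwise "search".
tried_first() {
  name=$1
  ndots=$2
  if [ "$(count_dots "$name")" -ge "$ndots" ]; then
    echo absolute
  else
    echo search
  fi
}

# A name with four dots, under ndots 5 and under ndots 4.
tried_first a.b.svc.cluster.local 5
tried_first a.b.svc.cluster.local 4
```

With ndots set to 4, a fully qualified in-cluster name with four dots is looked up as-is first instead of being expanded through the search domains.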
7. Install Kubeflow with kustomize
This takes about 5 to 10 minutes. Wait until every pod is Running.
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done
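To check whether everything has come up, one option is to count the pods that are not yet ready. This is a rough sketch: count_unready_pods is a made-up helper that reads the output of kubectl get pods -A --no-headers on stdin, and it assumes the usual column order (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Count pods that are not fully ready.
# A pod counts as ready when STATUS is Running and READY is n/n;
# pods whose STATUS is Completed are ignored.
count_unready_pods() {
  awk '
    $4 == "Completed" { next }
    {
      split($3, ready, "/")
      if ($4 != "Running" || ready[1] != ready[2]) n++
    }
    END { print n + 0 }
  '
}

# Usage against a live cluster (needs kubectl access):
#   while [ "$(kubectl get pods -A --no-headers | count_unready_pods)" -gt 0 ]; do
#     sleep 10
#   done
```

kubectl wait --for=condition=Ready can do a similar job per namespace; the helper above is only convenient for a quick cluster-wide count.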
8. Edit istiod
Change ndots to 4 in the pod template spec (spec.template.spec). Without this change, JWT tokens cannot be validated.
kubectl edit deployment -n istio-system istiod
# inside the pod template (spec.template.spec)
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "4"
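If you would rather not edit the deployment interactively, the same change can probably be made with kubectl patch. This is a sketch under assumptions, not what I actually ran: ndots_patch and patch_istiod_ndots are made-up helper names, and a JSON merge patch replaces the whole dnsConfig.options list, so any other DNS options already set there would be dropped.

```shell
# Build a JSON merge patch that sets ndots in the pod template's dnsConfig.
ndots_patch() {
  printf '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"%s"}]}}}}}\n' "$1"
}

# Hypothetical wrapper around kubectl patch; defined here but not called.
# Run it only against a cluster you intend to change.
patch_istiod_ndots() {
  kubectl patch deployment istiod -n istio-system --type merge -p "$(ndots_patch "$1")"
}

# Print the patch that would be sent for ndots 4.
ndots_patch 4
```

Calling patch_istiod_ndots 4 triggers a rollout of istiod, the same as saving the change in kubectl edit.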
9. Configure the Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeflow-ingress
  namespace: istio-system
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
spec:
  ingressClassName: nginx
  defaultBackend:
    service:
      name: istio-ingressgateway
      port:
        number: 80
  rules:
  - host: kubeflow.dd.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: istio-ingressgateway
            port:
              number: 80
10. Verify access
Troubleshooting
Issue
# Pods fail during init.
kubeflow metadata-envoy-deployment-9c7db86d8-rhzxz 1/1 Running 0 2m12s
kubeflow metadata-grpc-deployment-d94cc8676-zjdf2 0/2 Init:CrashLoopBackOff 4 (36s ago) 2m12s
kubeflow metadata-writer-d9bc4bb89-lflvv 0/2 Init:CrashLoopBackOff 4 (39s ago) 2m12s
kubeflow minio-5dc6ff5b96-z6wrh 0/2 Init:CrashLoopBackOff 4 (51s ago) 2m11s
kubeflow ml-pipeline-5846b5b56d-fdsfx 0/2 Init:CrashLoopBackOff 4 (39s ago) 2m11s
kubeflow ml-pipeline-persistenceagent-7655ddbcfb-9xckf 0/2 Init:CrashLoopBackOff 4 (50s ago) 2m11s
kubeflow ml-pipeline-scheduledworkflow-658b675548-xdm9p 0/2 Init:CrashLoopBackOff 4 (42s ago) 2m10s
kubeflow ml-pipeline-ui-5c66bf88b5-95thg 0/2 Init:CrashLoopBackOff 4 (35s ago) 2m10s
kubeflow ml-pipeline-viewer-crd-c4d866c85-49rvf 0/2 Init:CrashLoopBackOff 4 (37s ago) 2m10s
kubeflow ml-pipeline-visualizationserver-7c678699d-bgwlm 0/2 Init:CrashLoopBackOff 4 (32s ago) 2m10s
kubeflow mysql-5b446b5744-v4n6q 0/2 Init:CrashLoopBackOff 4 (32s ago) 2m10s
kubeflow notebook-controller-deployment-5458bf988b-4p9rz 0/2 Init:CrashLoopBackOff 4 (35s ago) 2m9s
kubeflow profiles-deployment-79f5cf977d-vkgr9 0/3 Init:CrashLoopBackOff 4 (32s ago) 2m9s
kubeflow pvcviewer-controller-manager-7979499b66-7rnqz 0/3 Init:CrashLoopBackOff 3 (23s ago) 2m9s
kubeflow tensorboard-controller-deployment-78f5598f4b-hv6mj 0/3 Init:CrashLoopBackOff 4 (32s ago) 2m9s
kubeflow tensorboards-web-app-deployment-6dc87f944-qkb6h 0/2 Init:CrashLoopBackOff 4 (34s ago)
# kubectl describe shows that the istio-init container is failing
Warning BackOff 90s (x10 over 3m18s) kubelet Back-off restarting failed istio-init in pod volumes-web-app-deployment-db79f546d-4xlc6_kubeflow(999424ac-cfe6-468e-b982-ebbb
# The logs show an iptables-related error
# kubectl logs -n kubeflow volumes-web-app-deployment-db79f546d-4xlc6 -c istio-init
2024-08-26T00:38:17.887642Z info Running command (with wait lock): iptables-restore --noflush --wait=30
2024-08-26T00:38:17.889031Z error Command error output: xtables parameter problem: iptables-restore: unable to initialize table 'nat'
Error occurred at line: 1
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
2024-08-26T00:38:17.889062Z info Running command (without lock): iptables-save
2024-08-26T00:38:17.890025Z error exit status 2
Solution
# Check whether the required kernel modules are loaded.
lsmod | grep -E 'ip_tables|iptable_nat'
# The modules were missing, so I loaded them with the following commands and then reinstalled Kubeflow.
sudo modprobe ip_tables
sudo modprobe iptable_nat
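The check above can be wrapped in a small helper that reports which modules are missing, which is handy when checking several nodes. This is a minimal sketch; missing_modules is a made-up name, and it takes the lsmod output as its first argument so it can also be run against saved output.

```shell
# Print each requested module that does not appear in the given lsmod output.
# $1: output of lsmod; remaining arguments: module names to look for.
missing_modules() {
  loaded=$1
  shift
  for m in "$@"; do
    # awk exits 0 only if some line has the module name in its first column.
    printf '%s\n' "$loaded" |
      awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' ||
      printf '%s\n' "$m"
  done
}

# Usage on a node:
#   missing_modules "$(lsmod)" ip_tables iptable_nat
```

Modules loaded with modprobe do not survive a reboot. On distributions that use systemd, listing them in a file under /etc/modules-load.d/ is the usual way to load them at boot, though I have not verified that on this cluster.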