Use Kubeflow Pipelines
Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The KFP SDK allows you to define and manipulate pipelines and components using Python.
TOC
- Prerequisites
  - Install KFP SDK
  - Configure KFP to Run with your Object Storage
- Quick Start Example
- Manage Pipelines in the UI
  - Access the Pipelines Dashboard
  - Upload a Pipeline
  - Create a Run
  - Inspect Run Details
  - Recurring Runs

Prerequisites
Install KFP SDK
Start a Jupyter Notebook (or Workbench) in your namespace and install the KFP SDK:
Configure KFP to Run with your Object Storage
When you install Kubeflow with external Object Storage, add a KFP Launcher ConfigMap to configure the storage used by the current namespace or user. See the Kubeflow documentation for details: https://www.kubeflow.org/docs/components/pipelines/operator-guides/configure-object-store/#s3-and-s3-compatible-provider. If no configuration is set, pipeline runs may still try to access a default in-cluster object storage endpoint, which may not match your deployment.
Starting with Kubeflow 1.11.0, the default bundled object storage changes to SeaweedFS. To keep your setup portable across versions, configure Kubeflow Pipelines to use your own Object Storage endpoint instead of depending on a built-in storage service name.
Below is a simple sample to get you started:
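The following is a minimal sketch of a `kfp-launcher` ConfigMap, following the structure described in the linked Kubeflow documentation. The namespace, bucket, endpoint, region, Secret name, and Secret keys are placeholders — substitute the values for your own Object Storage:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-launcher
  namespace: <your-profile-namespace>   # the namespace whose pipeline runs should use this storage
data:
  # Where pipeline intermediate data and artifacts are stored.
  defaultPipelineRoot: "s3://your-bucket/pipeline-root"
  providers: |
    s3:
      default:
        # Endpoint without the http:// or https:// scheme.
        endpoint: s3.example.com
        disableSSL: false
        region: us-east-1
        credentials:
          fromEnv: false
          secretRef:
            secretName: your-object-store-secret
            accessKeyKey: AWS_ACCESS_KEY_ID
            secretKeyKey: AWS_SECRET_ACCESS_KEY
```

Apply it to your profile namespace (e.g., `kubectl apply -f kfp-launcher.yaml`); consult the linked documentation for the exact schema supported by your Kubeflow version.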
For example, set the following values in this ConfigMap to point to your own Object Storage:
- defaultPipelineRoot: where pipeline intermediate data and artifacts are stored
- endpoint: the Object Storage service endpoint, without the http:// or https:// scheme
- disableSSL: whether to disable HTTPS when accessing the endpoint
- region: the region required by your Object Storage provider
- credentials: the access key and secret key stored in the referenced Secret
After you add this ConfigMap, newly started Kubeflow Pipeline runs automatically read the configuration and store pipeline data in the configured Object Storage.
Quick Start Example
A pipeline is a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph.
Below is a simple example of defining a pipeline that prints "Hello, World!" using the KFP SDK.
For more details about how to define and run pipelines, please refer to the official KFP documentation: https://www.kubeflow.org/docs/components/pipelines/user-guides/
Manage Pipelines in the UI
You can also manage pipelines, experiments, and runs directly from the Kubeflow Dashboard.
Access the Pipelines Dashboard
- Log in to the Kubeflow central dashboard.
- Click Pipelines in the sidebar menu.
Upload a Pipeline
If you have compiled your pipeline to a YAML file (e.g., pipeline.yaml from the example above), you can upload it:
- Click Pipelines -> Upload Pipeline.
- Upload a file: Select your pipeline.yaml.
- Pipeline Name: Give it a name (e.g., Hello World Pipeline).
- Click Create.
Create a Run
To execute the pipeline you just uploaded:
- Click on the pipeline name to open its details.
- Click Create Run.
- Run Name: Enter a descriptive name.
- Experiment: Select an existing experiment or create a new one. Experiments help group related runs.
- Run Parameters: Enter values for any pipeline arguments (e.g., recipient: World).
- Click Start.
Inspect Run Details
Once the run starts, you will be redirected to the Run Details page.
- Graph: Visualize the steps (components) of your pipeline and their status (Running, Succeeded, Failed).
- Logs: Click on a specific step in the graph to view its container logs in the side panel. This is crucial for debugging.
- Inputs/Outputs: View the artifacts passed between steps or produced as final outputs.
- Visualizations: If your pipeline generates metrics or plots, they will appear in the Run Output or Visualizations tab.
Recurring Runs
You can schedule pipelines to run automatically at specific intervals:
- In the Pipelines list, identify your pipeline.
- Click Create Run but choose Recurring Run as the run type (or navigate to Experiments (KFP) -> Create Recurring Run).
- Trigger: Set the schedule (e.g., Periodic, Cron).
- Parameters: Configure the inputs that will be used for every scheduled execution.
- Click Start.