BDA5 Machine Learning
This node covers foundational and advanced skills for building, training, evaluating, and optimizing machine learning models in HPC environments. It combines concepts from classical ML, deep learning, software frameworks, system optimization, and hyperparameter tuning.
Learning Outcomes
- Explain the fundamental differences between supervised, unsupervised, and deep learning approaches.
- Interpret evaluation metrics for different model types and assess performance quality.
- Describe how neural networks are structured and how architecture choice affects modeling tasks.
- Use major ML/DL frameworks such as PyTorch and TensorFlow to implement models.
- Apply distributed training techniques to scale model development across HPC resources.
- Tune performance using batch sizing, mixed precision, and checkpointing for long-running jobs.
- Design and execute hyperparameter search experiments efficiently at scale.
- Implement resource-aware tuning strategies that consider runtime, memory, and energy trade-offs.
- Integrate optimization, monitoring, and reproducibility practices for ML workflows on HPC platforms.
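As a small illustration of two of the outcomes above (checkpointing for long-running jobs and hyperparameter search), the following is a minimal sketch of a resumable random search using only the Python standard library. The function names `objective` and `random_search`, the JSON state file, and the search ranges are all assumptions made for this example; `objective` is a hypothetical stand-in for a real training and validation run, not part of any framework.

```python
import json
import os
import random


def objective(lr, batch_size):
    """Hypothetical stand-in for a validation loss; lower is better."""
    return (lr - 0.01) ** 2 + 1e-6 * abs(batch_size - 64)


def random_search(n_trials, state_path, seed=0):
    """Run random search, checkpointing after every trial so a job can resume."""
    # Resume from an earlier checkpoint if one exists.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"trials": []}

    for trial_id in range(len(state["trials"]), n_trials):
        # Seed per trial so a resumed run samples the same configurations
        # as an uninterrupted run would (a simple reproducibility measure).
        rng = random.Random(seed * 100003 + trial_id)
        lr = 10 ** rng.uniform(-4, -1)  # log-uniform learning rate
        batch_size = rng.choice([32, 64, 128, 256])
        loss = objective(lr, batch_size)
        state["trials"].append({"lr": lr, "batch_size": batch_size, "loss": loss})

        # Write to a temporary file, then rename it over the checkpoint,
        # so an interrupted write cannot leave a half-written file behind.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, state_path)

    best = min(state["trials"], key=lambda t: t["loss"])
    return best, state["trials"]
```

If a job is killed after some trials, calling `random_search` again with the same `state_path` and a larger `n_trials` continues from the last completed trial. On a real HPC system the objective would launch a training run, and the state file would typically live on a shared file system; both details are outside the scope of this sketch.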
Subskills
skill-tree/bda/5/b.txt · Last modified: 2025/11/05 11:30 by 127.0.0.1
