
Model Performance Parameter Experiment

· 2 min read
Ross Bulat
Full Stack Engineer

This post summarises a model performance experiment using held-out test data to compare classification and regression parameter choices.

Google Colab notebook: Open in Google Colab

Method and Findings

I ran the supplied notebook and added two reproducible parameter experiments, each using a fixed random seed and a held-out test set. For classification, I used the breast-cancer dataset with a radial-basis-function (RBF) support vector classifier trained on standardised features. I varied C (0.1, 1, 10, 100) and gamma (0.001, 0.01, 0.1, and the "scale" setting) and measured ROC AUC. The best test AUC was 0.9987 at C=10 and gamma=0.01. Increasing model flexibility did not always improve generalisation: at C=100 and gamma=0.1, training AUC reached 1.0000 but test AUC fell to 0.9901, a gap consistent with overfitting.
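The classification sweep described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the seed value, the 75/25 stratified split, and the names SEED, svc_results and best are assumptions, so the numbers it prints will not necessarily match the figures quoted in this post.

```python
from itertools import product

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

SEED = 42  # assumed value; the post does not state which seed was used

X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 25% held out, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)

svc_results = []
for C, gamma in product([0.1, 1, 10, 100], [0.001, 0.01, 0.1, "scale"]):
    # The scaler sits inside the pipeline so it is fitted on training data only.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))
    model.fit(X_train, y_train)

    # ROC AUC needs continuous scores, so use decision_function, not predict.
    svc_results.append({
        "C": C,
        "gamma": gamma,
        "train_auc": roc_auc_score(y_train, model.decision_function(X_train)),
        "test_auc": roc_auc_score(y_test, model.decision_function(X_test)),
    })

# Rank by test AUC; the train/test gap is what reveals overfitting.
best = max(svc_results, key=lambda r: r["test_auc"])
print(best)
```

Keeping the scaler inside the pipeline matters here: fitting it on the full dataset before splitting would leak test-set statistics into training.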

For regression, I used the diabetes dataset with a random-forest regressor. I varied n_estimators (25, 100, 300) and max_depth (2, 4, 8, and unlimited). The best test R² was 0.4995, with 300 trees and max_depth=4; its test RMSE was 52.61. Deeper trees achieved much higher training R² but weaker test performance. For example, 25 trees with depth 8 produced a training R² of 0.8662 but a test R² of only 0.4201. This shows that parameters should be chosen using data the model was not trained on, not training scores. When reading these figures, note that R² is a score, so higher values are better, whereas RMSE is an error measure, so lower values are better.
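A comparable sketch for the regression sweep is shown below. As before, it is an illustration under assumptions rather than the notebook's code: the seed, the 75/25 split, and the names SEED, rf_results and best are hypothetical, and max_depth=None is scikit-learn's way of expressing unlimited depth.

```python
from itertools import product

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

SEED = 42  # assumed value; the post does not state which seed was used

X, y = load_diabetes(return_X_y=True)

# Assumed split: 25% of rows held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=SEED
)

rf_results = []
for n_estimators, max_depth in product([25, 100, 300], [2, 4, 8, None]):
    model = RandomForestRegressor(
        n_estimators=n_estimators, max_depth=max_depth, random_state=SEED
    )
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)

    rf_results.append({
        "n_estimators": n_estimators,
        "max_depth": max_depth,
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, test_pred),
        # RMSE is the square root of the mean squared error (lower is better).
        "test_rmse": float(np.sqrt(mean_squared_error(y_test, test_pred))),
    })

# Higher R² is better, so select the configuration with the largest test R².
best = max(rf_results, key=lambda r: r["test_r2"])
print(best)
```

Recording training and test R² side by side makes the pattern described above easy to spot: the deepest forests tend to score highest on the training data without a matching gain on the held-out rows.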

A strong overall metric does not prove that a model is safe or fair. AUC can hide performance differences between demographic groups and does not select a decision threshold; false positives and false negatives may have unequal consequences, especially in health-related contexts.
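To illustrate why AUC alone does not settle the threshold question, the sketch below counts false positives and false negatives at three thresholds for one fixed set of scores. The labels and scores are made-up toy values, not output from the experiments above, and confusion_counts is a hypothetical helper written for this example.

```python
def confusion_counts(y_true, scores, threshold):
    """Count outcomes when scores >= threshold are predicted positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Toy data (invented for illustration): four negatives, then four positives.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.45, 0.60, 0.40, 0.55, 0.80, 0.90]

# The ranking of the scores, and therefore the AUC, is identical in every
# row below; only the trade-off between the two error types changes.
for threshold in (0.35, 0.50, 0.70):
    print(threshold, confusion_counts(y_true, scores, threshold))
```

With these toy values, the lowest threshold misses no positives but raises two false alarms, while the highest raises no false alarms but misses two positives. Which trade-off is acceptable depends on the cost of each error in the intended setting, which AUC does not capture.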

Machine-learning professionals should use data collected with appropriate permission, protect confidential information, document limitations, and avoid presenting a classroom model as clinically valid. They should also test subgroup performance, consider accessibility and unequal social impacts, maintain reproducible records, and communicate uncertainty honestly. Professional accountability therefore requires combining metrics with governance, human review, security, monitoring, and a clearly defined purpose.