I've written the following toy example after seeing weird behavior on a real-life dataset.
I was looking at the model residuals from various objectives (l1, l2, gamma, huber, fair, ...).
The gamma objective in particular had a weird fit -- I thought "wow, my data really isn't gamma distributed!"
But later I used XGBoost to fit the data with a gamma objective, and it worked fine.
I'm not sure if this is a bug, or a use of lower-precision floats, or a difference in how the learning_rate parameter is interpreted, or something else I haven't thought of. Please let me know if you have a good explanation.
(I'm not sure if this example even captures exactly what's going wrong in my real-life dataset, but it seems interesting in its own right.)
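One possible mechanism (my assumption, not verified against LightGBM's source): GBM gamma objectives typically use a log link, so for a shape-1 gamma the per-sample negative log-likelihood is `y / mu + raw_score` (up to constants) with `mu = exp(raw_score)`. If the raw scores start far below `log(mean(y))`, the initial gradients scale with `y`, which could destabilize early boosting rounds on large-scale targets:

```python
import numpy as np

# Sketch of a log-link gamma objective's gradient/Hessian in raw_score
# (an assumption about the typical implementation, not LightGBM's actual code):
#   NLL(y, raw)  = y * exp(-raw) + raw   (shape = 1, up to constants)
#   grad         = 1 - y * exp(-raw)
#   hess         = y * exp(-raw)
def gamma_grad_hess(y, raw_score):
    inv_mu = np.exp(-raw_score)  # 1 / mu
    grad = 1.0 - y * inv_mu
    hess = y * inv_mu
    return grad, hess

# With large-scale targets and raw scores starting near 0,
# the initial gradients are on the order of -y:
y = np.exp(np.array([1.0, 5.0, 10.0]))
grad, hess = gamma_grad_hess(y, np.zeros_like(y))
```

If this sketch is close to what the library does, a poor initial score (or a float32 `exp`) would matter much more for `beta = [1, 1]` than for `beta = [0.1, 0.1]`, since the targets span a far wider range.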
```python
import numpy as np
import lightgbm as lgb
import xgboost as xgb

param = {
    "objective": "gamma",
    "n_estimators": 120,
    "learning_rate": 0.1,
    "n_jobs": 12,
}
lgbmr = lgb.LGBMRegressor(**param)

rng = np.random.default_rng(seed=0)
n, d = 10_000, 2
X = rng.standard_normal(size=(n, d))

print("Fitting LGB with smaller-scale data.")
beta = np.array([0.1, 0.1])
scale = np.exp(X @ beta)
y = rng.gamma(shape=1, scale=scale, size=n)
lgbmr.fit(X, y, eval_set=[(X, y)])
print()

print("Fitting LGB with larger-scale data -- blows up, gives no warning.")
beta = np.array([1, 1])
scale = np.exp(X @ beta)
y = rng.gamma(shape=1, scale=scale, size=n)
lgbmr.fit(X, y, eval_set=[(X, y)])
print()

print("Fitting XGB with larger-scale data -- seems to work fine.")
param["objective"] = "reg:gamma"
xgbr = xgb.XGBRegressor(**param)
xgbr.fit(X, y, eval_set=[(X, y)])
```
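To compare the two fits independently of each library's own eval logs, the mean gamma deviance can be computed by hand. This uses the standard deviance formula (scikit-learn exposes the same quantity as `sklearn.metrics.mean_gamma_deviance`); the sanity check below is on synthetic data, not the script's output:

```python
import numpy as np

# Mean gamma deviance: 2 * mean(log(pred / y) + y / pred - 1).
# Standard formula (unit dispersion); lower is better, and it is
# what "gamma deviance" eval metrics typically report.
def mean_gamma_deviance(y, pred):
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return 2.0 * np.mean(np.log(pred / y) + y / pred - 1.0)

# Sanity check: predicting the sample mean should score far better
# than a blown-up constant prediction.
rng = np.random.default_rng(0)
y = rng.gamma(shape=1.0, scale=5.0, size=1000)
dev_mean = mean_gamma_deviance(y, np.full_like(y, y.mean()))
dev_blown = mean_gamma_deviance(y, np.full_like(y, 1e12))
```

Running this metric on `lgbmr.predict(X)` and `xgbr.predict(X)` from the script above would make the "blows up" claim concrete as a number rather than an eyeballed eval log.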
The output was as follows: