quri_parts.algo.optimizer.adam module#
- class OptimizerStateAdam(params, cost=0.0, status=OptimizerStatus.SUCCESS, niter=0, funcalls=0, gradcalls=0, m=<factory>, v=<factory>)#
Bases: OptimizerState
Optimizer state for Adam.
- Parameters:
params (algo.optimizer.interface.Params) –
cost (float) –
status (OptimizerStatus) –
niter (int) –
funcalls (int) –
gradcalls (int) –
m (algo.optimizer.interface.Params) –
v (algo.optimizer.interface.Params) –
- m: Params#
- v: Params#
- class Adam(lr=0.05, betas=(0.9, 0.999), eps=1e-09, ftol=1e-05)#
Bases: Optimizer
Adam optimization algorithm proposed in [1].
- Parameters:
lr (float) – learning rate.
betas (Sequence[float]) – coefficients used in the update rules of the moving averages of the gradient and its magnitude. betas represents the robustness of the optimizer; hence, when using sampling, higher values for betas are recommended.
eps (float) – a small scalar number used to avoid division by zero.
ftol (Optional[float]) – If not None, convergence is judged by the cost function tolerance; see ftol() for details.
- Ref:
[1] Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba (2014). https://arxiv.org/abs/1412.6980.
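For reference, a minimal NumPy sketch of the standard Adam update from [1], illustrating how the m and v fields of OptimizerStateAdam track moving averages of the gradient and its squared magnitude. This is an illustrative transcription of the paper's rule, not the library's internal code, which may differ in details.

```python
import numpy as np

def adam_update(params, grad, m, v, t, lr=0.05, betas=(0.9, 0.999), eps=1e-9):
    """One Adam step as described in [1] (illustrative sketch, not library code)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad        # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of its squared magnitude
    m_hat = m / (1 - beta1 ** t)              # bias correction (t is the 1-based step count)
    v_hat = v / (1 - beta2 ** t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```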
- get_init_state(init_params)#
Returns an initial state for optimization.
- Parameters:
init_params (algo.optimizer.interface.Params) –
- Return type:
OptimizerStateAdam
- step(state, cost_function, grad_function=None)#
Runs a single optimization step and returns a new state.
- Parameters:
state (OptimizerState) –
cost_function (algo.optimizer.interface.CostFunction) –
grad_function (algo.optimizer.interface.GradientFunction | None) –
- Return type:
OptimizerStateAdam
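A minimal usage sketch of the get_init_state()/step() loop documented above, minimizing a simple quadratic cost. It assumes Adam and OptimizerStatus are importable from quri_parts.algo.optimizer; treat the details as illustrative rather than canonical.

```python
# Illustrative sketch: minimize a convex quadratic cost with Adam.
# Assumes Adam and OptimizerStatus are exported from quri_parts.algo.optimizer.
import numpy as np
from quri_parts.algo.optimizer import Adam, OptimizerStatus

def cost_fn(params):
    # f(x) = ||x - 1||^2, minimized at params == 1
    return float(np.sum((params - 1.0) ** 2))

def grad_fn(params):
    # analytic gradient of cost_fn
    return 2.0 * (params - 1.0)

optimizer = Adam(lr=0.05, ftol=1e-5)
state = optimizer.get_init_state(np.zeros(3))

for _ in range(1000):
    state = optimizer.step(state, cost_fn, grad_fn)
    if state.status in (OptimizerStatus.CONVERGED, OptimizerStatus.FAILED):
        break

print(state.niter, state.cost, state.params)
```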
- class AdaBelief(lr=0.001, betas=(0.9, 0.99), eps=1e-16, ftol=1e-05)#
Bases: Adam
AdaBelief optimization algorithm proposed in [1].
- Parameters:
lr (float) – learning rate.
betas (Sequence[float]) – coefficients used in the update rules of the moving averages of the gradient and its magnitude. betas represents the robustness of the optimizer; hence, when using sampling, higher values for betas are recommended.
eps (float) – a small scalar number used to avoid division by zero.
ftol (Optional[float]) – If not None, convergence is judged by the cost function tolerance; see ftol() for details.
- Ref:
[1] AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients, Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan (2020). https://arxiv.org/abs/2010.07468.
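For comparison with Adam, a sketch of the second-moment update that distinguishes AdaBelief in [1]: instead of averaging the squared gradient, it averages the squared deviation of the gradient from its moving average (the "belief"), while the rest of the step mirrors Adam. This follows the paper's rule and is not a transcription of the library's internals.

```python
import numpy as np

def adabelief_second_moment(grad, m, s, beta2=0.99, eps=1e-16):
    """AdaBelief's s_t update from [1] (illustrative sketch, not library code)."""
    # Adam:      v_t = beta2 * v_{t-1} + (1 - beta2) * grad**2
    # AdaBelief: s_t = beta2 * s_{t-1} + (1 - beta2) * (grad - m)**2 + eps
    return beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
```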