quri_parts.algo.optimizer.adam module#
- class quri_parts.algo.optimizer.adam.OptimizerStateAdam(params: npt.NDArray[np.float_], cost: float = 0.0, status: OptimizerStatus = OptimizerStatus.SUCCESS, niter: int = 0, funcalls: int = 0, gradcalls: int = 0, m: npt.NDArray[np.float_] = <factory>, v: npt.NDArray[np.float_] = <factory>)#
Bases:
OptimizerState
Optimizer state for Adam.
- m: npt.NDArray[np.float_]#
- v: npt.NDArray[np.float_]#
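The m and v fields hold the moving averages of the gradient and of its magnitude maintained by the Adam optimizer documented below. A minimal sketch of inspecting them on a state returned by the optimizer; the quadratic cost and gradient here are illustrative assumptions, not part of the API:

```python
import numpy as np

from quri_parts.algo.optimizer.adam import Adam

# Illustrative cost/gradient pair; any differentiable cost function works here.
def cost(params: np.ndarray) -> float:
    return float(np.sum(params ** 2))

def grad(params: np.ndarray) -> np.ndarray:
    return 2.0 * params

adam = Adam()
state = adam.get_init_state(np.array([1.0, -1.0]))
state = adam.step(state, cost, grad)

# m and v track the first- and second-moment estimates of the gradient.
print(state.m, state.v)
```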
- class quri_parts.algo.optimizer.adam.Adam(lr: float = 0.05, betas: Sequence[float] = (0.9, 0.999), eps: float = 1e-09, ftol: float | None = 1e-05)#
Bases:
Optimizer
Adam optimization algorithm proposed in [1].
- Parameters:
lr – learning rate.
betas – coefficients used in the update rules of the moving averages of the gradient and its magnitude. betas represents the robustness of the optimizer; hence, higher values for betas are recommended when using sampling.
eps – a small scalar number used for avoiding zero division.
ftol – If not None, judge convergence by cost function tolerance. See ftol() for details.
- Ref:
[1] Adam: A Method for Stochastic Optimization, Diederik P. Kingma, Jimmy Ba (2014). https://arxiv.org/abs/1412.6980.
- get_init_state(init_params: npt.NDArray[np.float_]) OptimizerStateAdam #
Returns an initial state for optimization.
- step(state: OptimizerState, cost_function: Callable[[npt.NDArray[np.float_]], float], grad_function: Callable[[npt.NDArray[np.float_]], npt.NDArray[np.float_]] | None = None) OptimizerStateAdam #
Runs a single optimization step and returns a new state.
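A minimal usage sketch of the interface above, iterating step() until the optimizer reports a status other than SUCCESS; the quadratic cost function and its analytic gradient are illustrative placeholders, and OptimizerStatus is imported from quri_parts.algo.optimizer.interface as referenced in the state signature above:

```python
import numpy as np

from quri_parts.algo.optimizer.adam import Adam
from quri_parts.algo.optimizer.interface import OptimizerStatus

TARGET = np.array([1.0, -2.0])

def cost(params: np.ndarray) -> float:
    # Quadratic bowl with its minimum at TARGET (illustrative only).
    return float(np.sum((params - TARGET) ** 2))

def grad(params: np.ndarray) -> np.ndarray:
    # Analytic gradient of the quadratic cost.
    return 2.0 * (params - TARGET)

optimizer = Adam(lr=0.05, ftol=1e-5)
state = optimizer.get_init_state(np.zeros(2))

# Step until the status changes, e.g. when convergence is judged by ftol.
while state.status == OptimizerStatus.SUCCESS:
    state = optimizer.step(state, cost, grad)

print(state.params, state.cost, state.niter)
```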
- class quri_parts.algo.optimizer.adam.AdaBelief(lr: float = 0.001, betas: Sequence[float] = (0.9, 0.99), eps: float = 1e-16, ftol: float | None = 1e-05)#
Bases:
Adam
AdaBelief optimization algorithm proposed in [1].
- Parameters:
lr – learning rate.
betas – coefficients used in the update rules of the moving averages of the gradient and its magnitude. betas represents the robustness of the optimizer; hence, higher values for betas are recommended when using sampling.
eps – a small scalar number used for avoiding zero division.
ftol – If not None, judge convergence by cost function tolerance. See ftol() for details.
- Ref:
[1] AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients, Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan (2020). https://arxiv.org/abs/2010.07468.
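AdaBelief subclasses Adam and keeps the same get_init_state()/step() interface, so it can be swapped into the loop above; a sketch reusing the illustrative cost and grad functions from the Adam example:

```python
from quri_parts.algo.optimizer.adam import AdaBelief
from quri_parts.algo.optimizer.interface import OptimizerStatus

optimizer = AdaBelief(lr=0.001, betas=(0.9, 0.99), eps=1e-16)
state = optimizer.get_init_state(np.zeros(2))  # np, cost, grad as defined in the Adam sketch

while state.status == OptimizerStatus.SUCCESS:
    state = optimizer.step(state, cost, grad)

print(state.params, state.cost)
```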