machina.optims package

Submodules

machina.optims.adamw module

class machina.optims.adamw.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)[source]

Bases: torch.optim.optimizer.Optimizer

Implements the AdamW algorithm.

It has been proposed in Fixing Weight Decay Regularization in Adam (later published as Decoupled Weight Decay Regularization).
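
For reference, this is the decoupled update rule AdamW is generally understood to perform (notation follows Loshchilov & Hutter; the machina source may differ in detail):

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t),  \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \mathrm{lr} \, ( \hat{m}_t / (\sqrt{\hat{v}_t} + \mathrm{eps}) + \lambda \, \theta_{t-1} )

where g_t is the gradient and \lambda is weight_decay. The decay term \lambda \theta_{t-1} sits outside the adaptive rescaling, which is why weight_decay below is described as decoupled decay rather than an L2 penalty.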

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • lr (float, optional) – learning rate (default: 1e-3)
  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
  • weight_decay (float, optional) – weight decay coefficient, applied directly to the parameters as decoupled decay rather than as an L2 penalty added to the gradient (default: 0)
step(closure=None)[source]

Performs a single optimization step.

Parameters: closure (callable, optional) – A closure that reevaluates the model and returns the loss.
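
A minimal usage sketch (the import path follows the module layout on this page; the model, data, and hyperparameters are placeholders):

    import torch
    import torch.nn as nn
    from machina.optims.adamw import AdamW

    model = nn.Linear(10, 2)
    # weight_decay here is decoupled decay, per the parameter note above
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

    inputs, targets = torch.randn(8, 10), torch.randn(8, 2)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()  # one AdamW update of model's parameters

As with other torch.optim optimizers, step() may instead be given a closure that recomputes and returns the loss.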

machina.optims.distributed_adamw module

class machina.optims.distributed_adamw.DistributedAdamW(params, local_rank, world_size, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)[source]

Bases: torch.optim.optimizer.Optimizer

Implements the AdamW algorithm in a distributed setting.

It has been proposed in Fixing Weight Decay Regularization in Adam (later published as Decoupled Weight Decay Regularization).

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • local_rank (int) – rank of the current process in the distributed group
  • world_size (int) – total number of processes participating in training
  • lr (float, optional) – learning rate (default: 1e-3)
  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
  • weight_decay (float, optional) – weight decay coefficient, applied directly to the parameters as decoupled decay rather than as an L2 penalty added to the gradient (default: 0)
step(closure=None)[source]

Performs a single optimization step.

Parameters: closure (callable, optional) – A closure that reevaluates the model and returns the loss.
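
A hedged construction sketch under a torch.distributed process group. The synchronization step() performs is not documented on this page, and the backend choice, env:// initialization, and passing the global rank as local_rank (they coincide in single-node runs) are all assumptions:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from machina.optims.distributed_adamw import DistributedAdamW

    # Assumes MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are set,
    # e.g. by a launcher such as torchrun
    dist.init_process_group(backend='gloo')
    rank, world_size = dist.get_rank(), dist.get_world_size()

    model = nn.Linear(10, 2)
    optimizer = DistributedAdamW(model.parameters(), rank, world_size,
                                 lr=1e-3, weight_decay=1e-2)

    # Same step interface as AdamW above
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(torch.randn(8, 10)), torch.randn(8, 2))
    loss.backward()
    optimizer.step()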

machina.optims.distributed_sgd module