machina.pols package

Submodules

machina.pols.argmax_qf_pol module

class machina.pols.argmax_qf_pol.ArgmaxQfPol(ob_space, ac_space, qfunc, rnn=False, normalize_ac=True, data_parallel=False, parallel_dim=0, eps=0.2)[source]

Bases: machina.pols.base.BasePol

Policy with a continuous Q-function.

Parameters:
  • ob_space (gym.Space) – observation space
  • ac_space (gym.Space) – action space. This should be gym.spaces.Box.
  • qfunc (SAVfunc) –
  • rnn (bool) –
  • normalize_ac (bool) – If True, the network output is rescaled to the range of ac_space. In this case the network output is expected to lie in [-1, 1].
  • data_parallel (bool) – If True, network computation is executed in parallel.
  • parallel_dim (int) – Dimension along which the input is split in data-parallel mode.
  • eps (float) – Probability of taking a random action.
forward(obs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
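The eps parameter of ArgmaxQfPol reads like an epsilon-greedy exploration rate. The following is a minimal sketch of that idea, not the library's implementation: q_fn, the scalar action bounds low and high, and the grid of n_candidates actions are all hypothetical stand-ins, and the grid search only approximates a maximization over a continuous Box space.

```python
import random

def eps_greedy(q_fn, ob, low, high, eps, rng, n_candidates=64):
    """Pick a scalar action: uniform random with probability eps, else the
    candidate that maximizes the (hypothetical) Q-function q_fn(ob, ac)."""
    if rng.random() < eps:
        # Exploration: sample uniformly inside the action bounds.
        return rng.uniform(low, high)
    # Exploitation: evaluate q_fn on an evenly spaced grid of candidates.
    step = (high - low) / (n_candidates - 1)
    candidates = [low + step * i for i in range(n_candidates)]
    return max(candidates, key=lambda ac: q_fn(ob, ac))

# Hypothetical Q-function whose maximum sits at ac == ob.
q_fn = lambda ob, ac: -(ac - ob) ** 2
rng = random.Random(0)
greedy_ac = eps_greedy(q_fn, 0.5, -1.0, 1.0, eps=0.0, rng=rng, n_candidates=5)
random_ac = eps_greedy(q_fn, 0.5, -1.0, 1.0, eps=1.0, rng=rng)
```

With eps=0.0 the sketch always returns the best grid candidate; with eps=1.0 it always returns a uniformly sampled action.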

machina.pols.base module

class machina.pols.base.BasePol(ob_space, ac_space, net, rnn=False, normalize_ac=True, data_parallel=False, parallel_dim=0)[source]

Bases: torch.nn.modules.module.Module

Base class of Policy.

Parameters:
  • ob_space (gym.Space) – observation space
  • ac_space (gym.Space) – action space
  • net (torch.nn.Module) –
  • rnn (bool) –
  • normalize_ac (bool) – If True, the network output is rescaled to the range of ac_space. In this case the network output is expected to lie in [-1, 1].
  • data_parallel (bool) – If True, network computation is executed in parallel.
  • parallel_dim (int) – Dimension along which the input is split in data-parallel mode.
convert_ac_for_real(x)[source]

Converts an action output by the network into a real-world value.

reset()[source]

Resets the RNN hidden state.
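The normalize_ac description above implies a mapping from a network output in [-1, 1] to the bounds of ac_space. A minimal sketch of such a mapping follows. It assumes a linear rescaling followed by clipping to the bounds; the function name rescale_action and the plain-list representation of low and high are hypothetical, and the actual behavior of convert_ac_for_real may differ.

```python
def rescale_action(x, low, high, normalize_ac=True):
    """Map each component of x onto [low, high].

    If normalize_ac is True, x is assumed to lie in [-1, 1] and is rescaled
    linearly; in either case the result is clipped to the bounds (assumption).
    """
    out = []
    for xi, lo, hi in zip(x, low, high):
        if normalize_ac:
            # -1 maps to lo, +1 maps to hi, 0 maps to the midpoint.
            xi = lo + (xi + 1.0) * 0.5 * (hi - lo)
        out.append(min(max(xi, lo), hi))
    return out

scaled = rescale_action([-1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [10.0, 10.0, 10.0])
clipped = rescale_action([5.0, -3.0], [-2.0, -2.0], [2.0, 2.0], normalize_ac=False)
```

Here scaled spans the full range of the hypothetical bounds, while clipped shows that out-of-range values are only clamped when normalize_ac is False.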

machina.pols.categorical_pol module

class machina.pols.categorical_pol.CategoricalPol(ob_space, ac_space, net, rnn=False, normalize_ac=True, data_parallel=False, parallel_dim=0)[source]

Bases: machina.pols.base.BasePol

Policy with Categorical distribution.

Parameters:
  • ob_space (gym.Space) – observation space
  • ac_space (gym.Space) – action space. This should be gym.spaces.Discrete.
  • net (torch.nn.Module) –
  • rnn (bool) –
  • normalize_ac (bool) – If True, the network output is rescaled to the range of ac_space. In this case the network output is expected to lie in [-1, 1].
  • data_parallel (bool) – If True, network computation is executed in parallel.
  • parallel_dim (int) – Dimension along which the input is split in data-parallel mode.
deterministic_ac_real(obs, hs=None, h_masks=None)[source]

Returns the deterministic action used for deployment.

forward(obs, hs=None, h_masks=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
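CategoricalPol above exposes both forward and deterministic_ac_real. The difference between a sampled and a deterministic action over a Discrete space can be sketched as below. This assumes the network emits unnormalized logits and that the deterministic action is the most probable one; the helper names are hypothetical and not part of machina.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng):
    """Stochastic action: draw an index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def deterministic_action(probs):
    """Deterministic action (assumed): index of the largest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])

probs = softmax([1.0, 3.0, 2.0])  # hypothetical network output
sampled = sample_action(probs, random.Random(0))
best = deterministic_action(probs)
```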

machina.pols.deterministic_action_noise_pol module

class machina.pols.deterministic_action_noise_pol.DeterministicActionNoisePol(ob_space, ac_space, net, noise=None, rnn=False, normalize_ac=True, data_parallel=False, parallel_dim=0)[source]

Bases: machina.pols.base.BasePol

Deterministic policy with optional action noise.

Parameters:
  • ob_space (gym.Space) – observation space
  • ac_space (gym.Space) – action space. This should be gym.spaces.Box.
  • net (torch.nn.Module) –
  • noise (Noise) –
  • rnn (bool) –
  • normalize_ac (bool) – If True, the network output is rescaled to the range of ac_space. In this case the network output is expected to lie in [-1, 1].
  • data_parallel (bool) – If True, network computation is executed in parallel.
  • parallel_dim (int) – Dimension along which the input is split in data-parallel mode.
deterministic_ac_real(obs)[source]

Returns the deterministic action used for deployment.

forward(obs, no_noise=False)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

reset()[source]

Resets the RNN hidden state.
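The no_noise flag of forward and the noise constructor argument suggest that exploration noise is added to an otherwise deterministic action. A minimal sketch under that assumption follows. It uses independent Gaussian noise with a hypothetical noise_std parameter; the Noise type named above is not specified here, so the noise process the library uses may be different.

```python
import random

def act(mean_ac, noise_std, rng, no_noise=False):
    """Return the deterministic action, optionally perturbed by Gaussian noise.

    noise_std=None stands in for noise=None (no noise configured).
    """
    if no_noise or noise_std is None:
        return list(mean_ac)
    # One independent zero-mean Gaussian perturbation per action dimension.
    return [a + rng.gauss(0.0, noise_std) for a in mean_ac]

rng = random.Random(0)
clean = act([0.2, -0.4], noise_std=0.1, rng=rng, no_noise=True)
noisy = act([0.2, -0.4], noise_std=0.1, rng=rng)
```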

machina.pols.gaussian_pol module

class machina.pols.gaussian_pol.GaussianPol(ob_space, ac_space, net, rnn=False, normalize_ac=True, data_parallel=False, parallel_dim=0)[source]

Bases: machina.pols.base.BasePol

Policy with Gaussian distribution.

Parameters:
  • ob_space (gym.Space) – observation space
  • ac_space (gym.Space) – action space. This should be gym.spaces.Box.
  • net (torch.nn.Module) –
  • rnn (bool) –
  • normalize_ac (bool) – If True, the network output is rescaled to the range of ac_space. In this case the network output is expected to lie in [-1, 1].
  • data_parallel (bool) – If True, network computation is executed in parallel.
  • parallel_dim (int) – Dimension along which the input is split in data-parallel mode.
deterministic_ac_real(obs, hs=None, h_masks=None)[source]

Returns the deterministic action used for deployment.

forward(obs, hs=None, h_masks=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
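For a Gaussian policy such as GaussianPol above, sampling and log-probability evaluation can be sketched as follows. This assumes a diagonal Gaussian parameterized by a mean and a log standard deviation per action dimension, and that the deterministic action is the mean; these function names and that parameterization are illustrative assumptions, not the library's API.

```python
import math
import random

def gaussian_sample(mean, log_std, rng):
    """Draw one action from a diagonal Gaussian: mean + std * standard normal."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]

def gaussian_log_prob(ac, mean, log_std):
    """Log density of ac under the diagonal Gaussian, summed over dimensions."""
    total = 0.0
    for a, m, ls in zip(ac, mean, log_std):
        z = (a - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total

mean, log_std = [0.0, 1.0], [0.0, -1.0]  # hypothetical network outputs
sampled = gaussian_sample(mean, log_std, random.Random(0))
deterministic = list(mean)  # assumed deployment action: the mean
logp_at_mean = gaussian_log_prob([0.0], [0.0], [0.0])
```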

machina.pols.mixture_gaussian_pol module

class machina.pols.mixture_gaussian_pol.MixtureGaussianPol(ob_space, ac_space, net, normalize_ac=True)[source]

Bases: machina.pols.base.BasePol

deterministic_ac_real(obs)[source]

Returns the deterministic action used for deployment.

forward(obs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

machina.pols.mpc_pol module

machina.pols.multi_categorical_pol module

class machina.pols.multi_categorical_pol.MultiCategoricalPol(ob_space, ac_space, net, rnn=False, normalize_ac=True, data_parallel=False, parallel_dim=0)[source]

Bases: machina.pols.base.BasePol

Policy with Categorical distributions over a MultiDiscrete action space.

Parameters:
  • ob_space (gym.Space) – observation space
  • ac_space (gym.Space) – action space. This should be gym.spaces.MultiDiscrete.
  • net (torch.nn.Module) –
  • rnn (bool) –
  • normalize_ac (bool) – If True, the network output is rescaled to the range of ac_space. In this case the network output is expected to lie in [-1, 1].
  • data_parallel (bool) – If True, network computation is executed in parallel.
  • parallel_dim (int) – Dimension along which the input is split in data-parallel mode.
deterministic_ac_real(obs, hs=None, h_masks=None)[source]

Returns the deterministic action used for deployment.

forward(obs, hs=None, h_masks=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
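Because MultiCategoricalPol above expects a gym.spaces.MultiDiscrete action space, one plausible reading is that each action dimension gets its own categorical distribution. The sketch below illustrates that reading only. It assumes one list of logits per dimension and independent dimensions; the helper names are hypothetical and the library may organize its outputs differently.

```python
import math
import random

def multi_sample(logits_per_dim, rng):
    """Sample one index per action dimension from independent categoricals."""
    action = []
    for logits in logits_per_dim:
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]  # unnormalized softmax
        action.append(rng.choices(range(len(logits)), weights=weights, k=1)[0])
    return action

def multi_deterministic(logits_per_dim):
    """Assumed deployment action: the most probable index in each dimension."""
    return [max(range(len(l)), key=lambda i: l[i]) for l in logits_per_dim]

# Hypothetical logits for a MultiDiscrete space with sizes 3 and 2.
logits_per_dim = [[0.1, 2.0, -1.0], [0.5, 0.2]]
sampled = multi_sample(logits_per_dim, random.Random(0))
best = multi_deterministic(logits_per_dim)
```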

machina.pols.random_pol module