machina.samplers package

Submodules

machina.samplers.epi_sampler module

Sampler class

class machina.samplers.epi_sampler.EpiSampler(env, pol, num_parallel=8, prepro=None, seed=256)

Bases: object

A sampler that collects episodes using num_parallel worker processes.

Parameters:
  • env (gym.Env) –
  • pol (Pol) –
  • num_parallel (int) – Number of processes
  • prepro (Prepro) –
  • seed (int) –
sample(pol, max_episodes=None, max_steps=None, deterministic=False)

Switch on sampling processes.

Parameters:
  • pol (Pol) –
  • max_episodes (int or None) – Maximum number of episodes to sample. If None, this value is ignored.
  • max_steps (int or None) – Maximum total number of steps to sample. If None, this value is ignored.
  • deterministic (bool) –
Returns:

epis – Sampled episodes.

Return type:

list of dict

Raises:

ValueError – If max_steps and max_episodes are both None.
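
The stopping rule described above can be sketched in plain Python, without machina or gym. This is an illustration only: collect_episodes and fixed_episode are hypothetical names, not part of the library, and the sketch assumes that whole episodes are counted, so the step limit is checked between episodes rather than inside them.

```python
def collect_episodes(sample_one, max_episodes=None, max_steps=None):
    """Hypothetical helper: gather episodes until either limit is reached."""
    if max_episodes is None and max_steps is None:
        # Mirrors the documented error: at least one limit is required.
        raise ValueError("Either max_episodes or max_steps must be given.")
    epis, n_steps = [], 0
    while True:
        if max_episodes is not None and len(epis) >= max_episodes:
            break
        if max_steps is not None and n_steps >= max_steps:
            break
        # sample_one returns (epi_length, epi), like one_epi in this module.
        length, epi = sample_one()
        epis.append(epi)
        n_steps += length
    return epis


def fixed_episode():
    """Hypothetical stand-in for a real rollout: always 3 steps long."""
    return 3, {"rews": [1.0, 1.0, 1.0]}


by_count = collect_episodes(fixed_episode, max_episodes=4)
by_steps = collect_episodes(fixed_episode, max_steps=7)
```

Because the step limit is only checked between episodes in this sketch, the total number of collected steps may exceed max_steps by up to one episode.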

machina.samplers.epi_sampler.mp_sample(pol, env, max_steps, max_episodes, n_steps_global, n_episodes_global, epis, exec_flags, deterministic_flag, process_id, prepro=None, seed=256)

Multiprocess sample. Samples episodes until max_steps or max_episodes is reached.

Parameters:
  • pol (Pol) –
  • env (gym.Env) –
  • max_steps (int) – Maximum total number of steps to sample.
  • max_episodes (int) – Maximum number of episodes to sample.
  • n_steps_global (torch.Tensor) – shared Tensor
  • n_episodes_global (torch.Tensor) – shared Tensor
  • epis (list) – multiprocessing’s list for sharing episodes between processes.
  • exec_flags (list of torch.Tensor) – execution flag
  • deterministic_flag (torch.Tensor) –
  • process_id (int) –
  • prepro (Prepro) –
  • seed (int) –
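
The idea of several workers advancing shared global counters can be sketched as follows. This is not the library's implementation: it uses threads and a lock from the standard library in place of processes and shared tensors so that it runs anywhere, and worker, run_workers, and two_step_episode are hypothetical names.

```python
import threading


def worker(sample_one, max_steps, max_episodes, counters, lock, epis):
    """Hypothetical worker: sample until a shared limit is reached."""
    while True:
        with lock:
            if (counters["steps"] >= max_steps
                    or counters["episodes"] >= max_episodes):
                return
        # Sampling happens outside the lock so workers run concurrently.
        length, epi = sample_one()
        with lock:
            counters["steps"] += length
            counters["episodes"] += 1
            epis.append(epi)


def run_workers(sample_one, max_steps, max_episodes, num_workers=4):
    counters = {"steps": 0, "episodes": 0}  # stand-in for the shared tensors
    lock = threading.Lock()
    epis = []  # stand-in for the shared episode list
    threads = [
        threading.Thread(
            target=worker,
            args=(sample_one, max_steps, max_episodes, counters, lock, epis),
        )
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return epis, counters


def two_step_episode():
    """Hypothetical stand-in for a real rollout: always 2 steps long."""
    return 2, {"rews": [1.0, 1.0]}


# An unused limit is passed as infinity so that it never triggers.
epis, counters = run_workers(two_step_episode, float("inf"), 5, num_workers=4)
```

In this sketch each worker checks the limits before starting an episode, so the workers can overshoot the limit together by up to num_workers - 1 episodes.
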
machina.samplers.epi_sampler.one_epi(env, pol, deterministic=False, prepro=None)

Samples one episode.

Parameters:
  • env (gym.Env) –
  • pol (Pol) –
  • deterministic (bool) – If True, policy is deterministic.
  • prepro (Prepro) –
Returns:

epi_length, epi

Return type:

int, dict
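
A single-episode rollout that returns a length and a dict can be sketched as below. The environment, the policy, and the dict keys (obs, acs, rews, dones) are assumptions made for illustration; the sketch follows the classic gym convention in which step returns (observation, reward, done, info), and it does not use machina's Pol or Prepro classes.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, ac):
        self.t += 1
        ob = float(self.t)
        rew = 1.0
        done = self.t >= self.horizon
        return ob, rew, done, {}


def rollout(env, pol):
    """Hypothetical rollout: run one episode and return (epi_length, epi)."""
    obs, acs, rews, dones = [], [], [], []
    ob = env.reset()
    done = False
    while not done:
        ac = pol(ob)  # here the policy is a plain callable on observations
        next_ob, rew, done, _ = env.step(ac)
        obs.append(ob)
        acs.append(ac)
        rews.append(rew)
        dones.append(done)
        ob = next_ob
    epi = dict(obs=obs, acs=acs, rews=rews, dones=dones)
    return len(rews), epi


epi_length, epi = rollout(CountdownEnv(horizon=3), lambda ob: 0)
```

Returning the length alongside the episode lets a caller update step counters without inspecting the dict.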