machina.traj package

Submodules

machina.traj.epi_functional module

These are functions that are applied to episodes.

machina.traj.epi_functional.add_next_obs(data)[source]

Adding next observations to episodes.

Parameters: data (Traj) –
Returns: data
Return type: Traj
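
As a rough illustration of what adding next observations involves, the sketch below shifts one episode's observations by a single step. The dict layout with an `obs` key, the helper name `add_next_obs_sketch`, and the choice to repeat the final observation at the last step are all assumptions made for this example; they are not taken from the library.

```python
def add_next_obs_sketch(episode):
    """Return a copy of `episode` with a `next_obs` list added (hypothetical helper)."""
    obs = episode["obs"]
    # Shift by one step. The last step has no successor, so this sketch
    # repeats the final observation there (an assumption, not library behavior).
    next_obs = obs[1:] + obs[-1:]
    return {**episode, "next_obs": next_obs}


episode = {"obs": [[0.0], [1.0], [2.0]]}
shifted = add_next_obs_sketch(episode)
```

The result keeps one next observation per step, so `obs` and `next_obs` stay aligned index by index.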
machina.traj.epi_functional.centerize_advs(data, eps=1e-06)[source]

Centering the advantage estimates.

Parameters:
  • data (Traj) –
  • eps (float) – Small value for preventing 0 division.
Returns:

data

Return type:

Traj
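
One common reading of centering advantages is to subtract the mean and divide by the standard deviation, with `eps` added to the denominator so that constant advantages do not cause a division by zero. The sketch below assumes that reading. Whether the library divides by the standard deviation, and over which steps the statistics are taken, is not stated in this reference; `centerize_sketch` is a hypothetical name.

```python
import math


def centerize_sketch(advs, eps=1e-6):
    """Standardize a list of advantages (assumed behavior, not the library's code)."""
    mean = sum(advs) / len(advs)
    var = sum((a - mean) ** 2 for a in advs) / len(advs)
    std = math.sqrt(var)
    # eps keeps the denominator non-zero when all advantages are equal.
    return [(a - mean) / (std + eps) for a in advs]


normalized = centerize_sketch([1.0, 2.0, 3.0])
constant = centerize_sketch([5.0, 5.0])
```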

machina.traj.epi_functional.compute_advs(data, gamma, lam)[source]

Computing Advantage Function.

Parameters:
  • data (Traj) –
  • gamma (float) – Discount rate
  • lam (float) – Bias-Variance trade-off parameter
Returns:

data

Return type:

Traj
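
The `gamma` and `lam` parameters match those of generalized advantage estimation (GAE). Assuming that is the estimator intended here, the sketch below computes it for a single episode from per-step rewards and value estimates. The function name, the list-based inputs, and the `last_v` bootstrap argument are hypothetical.

```python
def gae_sketch(rews, vs, gamma, lam, last_v=0.0):
    """GAE for one episode; `last_v` is the value assumed after the final step."""
    advs = [0.0] * len(rews)
    next_v = last_v
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rews))):
        delta = rews[t] + gamma * next_v - vs[t]  # one-step TD residual
        running = delta + gamma * lam * running   # discounted sum of residuals
        advs[t] = running
        next_v = vs[t]
    return advs


advs_td = gae_sketch([1.0, 1.0], [0.5, 0.5], gamma=0.5, lam=0.0)
advs_mc = gae_sketch([1.0, 1.0], [0.0, 0.0], gamma=1.0, lam=1.0)
```

With `lam=0` the estimate reduces to the one-step TD residual (lower variance, more bias); with `lam=1` it becomes the discounted return minus the value estimate (higher variance, less bias).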

machina.traj.epi_functional.compute_h_masks(data)[source]

Computing masks for hidden states. The mask is 1 at the beginning of an episode.

Parameters: data (Traj) –
Returns: data
Return type: Traj
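
To make the mask layout concrete, the sketch below builds a flat mask from a list of episode lengths, marking the first step of each episode with 1. It assumes all other steps are 0, which the description above does not state explicitly; `h_masks_sketch` is a hypothetical helper.

```python
def h_masks_sketch(episode_lengths):
    """Return a flat 0/1 mask with 1 at the first step of every episode."""
    masks = []
    for length in episode_lengths:
        if length > 0:
            # 1 marks the episode start; remaining steps are assumed to be 0.
            masks.extend([1] + [0] * (length - 1))
    return masks


masks = h_masks_sketch([3, 2])
```

A recurrent model can use such a mask to decide where its hidden state should be reset.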
machina.traj.epi_functional.compute_pris(data, qf, targ_qf, targ_pol, gamma, continuous=True, deterministic=True, sampling=1, alpha=0.6, epsilon=1e-06)[source]
machina.traj.epi_functional.compute_rets(data, gamma)[source]

Computing discounted cumulative returns.

Parameters:
  • data (Traj) –
  • gamma (float) – Discount rate
Returns:

data

Return type:

Traj
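
For reference, a discounted cumulative return at step t is the sum of rewards from t onward, each weighted by a power of `gamma`. The sketch below computes it for one episode with a backward pass; the function name and list-based input are illustrative and not the library's internals.

```python
def discounted_returns(rews, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    rets = [0.0] * len(rews)
    running = 0.0
    # Accumulate from the last step backwards.
    for t in reversed(range(len(rews))):
        running = rews[t] + gamma * running
        rets[t] = running
    return rets


rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```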

machina.traj.epi_functional.compute_vs(data, vf)[source]

Computing Value Function.

Parameters:
  • data (Traj) –
  • vf (SVFunction) –
Returns:

data

Return type:

Traj

machina.traj.epi_functional.set_all_pris(data, pri)[source]

machina.traj.traj module

Trajectory class.

class machina.traj.traj.Traj(max_steps=None)[source]

Bases: object

Trajectory class. A Trajectory is a sequence of episodes. An episode is a sequence of steps.

This class provides batch methods.

add_epis(epis)[source]
add_traj(traj)[source]
full_batch(epoch=1, return_indices=False)[source]

Providing the whole trajectory as a single batch.

Parameters:
  • epoch (int) –
  • return_indices (bool) – If True, indices are also returned.
Returns:

data_map

Return type:

dict of torch.Tensor

get_max_pri()[source]
iterate(batch_size, epoch=1, indices=None, shuffle=True)[source]

Iterating over the full trajectory epoch times.

Parameters:
  • batch_size (int) –
  • epoch (int) –
  • indices (ndarray or torch.Tensor or None) – Selected indices for iteration. If None, whole trajectory is selected.
  • shuffle (bool) –
Returns:

data_map

Return type:

dict of torch.Tensor

iterate_epi(shuffle=True)[source]

Iterating over episodes.

Parameters: shuffle (bool) –
Returns: epis
Return type: dict of torch.Tensor
iterate_once(batch_size, indices=None, shuffle=True)[source]

Iterating over the full trajectory once.

Parameters:
  • batch_size (int) –
  • indices (ndarray or torch.Tensor or None) – Selected indices for iteration. If None, whole trajectory is selected.
  • shuffle (bool) –
Returns:

data_map

Return type:

dict of torch.Tensor
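
The general pattern behind a single pass over the data can be sketched in plain Python: pick the indices, optionally shuffle them, then slice them into mini-batches. The sketch uses lists instead of `torch.Tensor`, adds a hypothetical `seed` argument for reproducibility, and keeps a smaller final batch; how the library handles a remainder is not stated here.

```python
import random


def iterate_once_sketch(data_map, batch_size, indices=None, shuffle=True, seed=None):
    """Yield mini-batches (dicts of lists) covering the selected steps once."""
    num_step = len(next(iter(data_map.values())))
    # If no indices are given, the whole trajectory is selected.
    order = list(range(num_step)) if indices is None else list(indices)
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        yield {key: [values[i] for i in chunk] for key, values in data_map.items()}


data = {"obs": [10, 11, 12, 13, 14], "rews": [0, 1, 2, 3, 4]}
ordered = list(iterate_once_sketch(data, batch_size=2, shuffle=False))
shuffled = list(iterate_once_sketch(data, batch_size=2, shuffle=True, seed=0))
```

Every key is indexed with the same chunk, so observations and rewards in a batch always refer to the same steps.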

iterate_rnn(batch_size, num_epi_per_seq=1, epoch=1)[source]

Iterating over batches for an RNN. The batch shape is (max_seq, batch_size, *).

Parameters:
  • batch_size (int) –
  • num_epi_per_seq (int) – Number of episodes in one sequence for rnn.
  • epoch (int) –
Returns:

batch

Return type:

dict of torch.Tensor
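
The (max_seq, batch_size, *) layout is time-major: the first axis is the step within a sequence and the second is the sequence within the batch. The sketch below arranges variable-length sequences of scalars into that layout with nested lists. Padding short sequences with `pad_value` is an assumption made for this example, and `pad_to_time_major` is a hypothetical helper.

```python
def pad_to_time_major(sequences, pad_value=0.0):
    """Arrange sequences as rows of time steps: result[t][b] is step t of sequence b."""
    max_seq = max(len(seq) for seq in sequences)
    batch = []
    for t in range(max_seq):
        # Sequences shorter than max_seq are filled with pad_value (assumed).
        row = [seq[t] if t < len(seq) else pad_value for seq in sequences]
        batch.append(row)
    return batch


batch = pad_to_time_major([[1.0, 2.0, 3.0], [4.0]])
```

Here the outer length is max_seq (3) and each row has batch_size (2) entries.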

iterate_step(batch_size, step=1, indices=None, shuffle=True)[source]
num_epi
num_step
prioritized_random_batch(batch_size, epoch=1, return_indices=False)[source]
prioritized_random_batch_once(batch_size, return_indices=False, mode='proportional', alpha=0.6, init_beta=0.4, beta_step=6.25e-05)[source]
random_batch(batch_size, epoch=1, indices=None, return_indices=False)[source]

Providing batches that are randomly sampled from the trajectory.

Parameters:
  • batch_size (int) –
  • epoch (int) –
  • indices (ndarray or torch.Tensor or None) – Selected indices for iteration. If None, whole trajectory is selected.
  • return_indices (bool) – If True, indices are also returned.
Returns:

data_map

Return type:

dict of torch.Tensor

random_batch_once(batch_size, indices=None, return_indices=False)[source]

Providing a batch that is randomly sampled from the trajectory.

Parameters:
  • batch_size (int) –
  • indices (ndarray or torch.Tensor or None) – Selected indices for iteration. If None, whole trajectory is selected.
  • return_indices (bool) – If True, indices are also returned.
Returns:

data_map

Return type:

dict of torch.Tensor
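
A minimal sketch of drawing one random batch, with the sampled indices optionally returned alongside it, might look like the following. It uses plain lists rather than `torch.Tensor`, samples with replacement, and takes a hypothetical `seed` argument; none of these choices are confirmed by this reference.

```python
import random


def random_batch_once_sketch(data_map, batch_size, indices=None,
                             return_indices=False, seed=None):
    """Draw one batch of `batch_size` steps uniformly at random."""
    num_step = len(next(iter(data_map.values())))
    # If no indices are given, sample from the whole trajectory.
    pool = list(range(num_step)) if indices is None else list(indices)
    rng = random.Random(seed)
    chosen = [rng.choice(pool) for _ in range(batch_size)]  # with replacement (assumed)
    batch = {key: [values[i] for i in chosen] for key, values in data_map.items()}
    if return_indices:
        return batch, chosen
    return batch


data = {"obs": [10, 11, 12, 13], "rews": [0, 1, 2, 3]}
batch, idx = random_batch_once_sketch(data, batch_size=3, return_indices=True, seed=0)
restricted = random_batch_once_sketch(data, batch_size=5, indices=[1, 3], seed=1)
```

Returning the indices is useful when a later step needs to refer back to the sampled positions, for example to update per-step priorities.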

register_epis()[source]

machina.traj.traj_functional module

These are functions that are applied to a trajectory.

machina.traj.traj_functional.update_pris(traj, td_loss, indices, alpha=0.6, epsilon=1e-06)[source]