
Could you explain the decision to duplicate action_mimic in the low level policy input? #25

@luizmgoncw


Hi, this is an architectural question (not a code bug).

In RealTimePolicyController, action_mimic is fed to the policy twice: once inside obs_full and again as future_obs:

obs_full = np.concatenate([action_mimic, obs_proprio])

obs_hist = np.array(self.proprio_history_buf).flatten()
self.proprio_history_buf.append(obs_full)

# action_mimic appears a second time here, as the "future" observation.
future_obs = action_mimic.copy()

obs_buf = np.concatenate([obs_full, obs_hist, future_obs])
assert obs_buf.shape[0] == self.total_obs_size, f"Expected {self.total_obs_size} obs, got {obs_buf.shape[0]}"

obs_tensor = torch.from_numpy(obs_buf).float().unsqueeze(0).to(self.device)
with torch.no_grad():
    raw_action = self.policy(obs_tensor).cpu().numpy().squeeze()

What is the intended role of future_obs?

If this was meant as privileged information during training, shouldn’t the final inference / deployment policy remove this input entirely, rather than using mocked or duplicated data?
In a teacher–student setup, I would expect the student (deployment) model not to take this channel at all.
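
For concreteness, here is a minimal sketch of the deployment-time variant I would expect (hypothetical; total_obs_size and the policy's input layer would have to shrink to match):

obs_full = np.concatenate([action_mimic, obs_proprio])

obs_hist = np.array(self.proprio_history_buf).flatten()
self.proprio_history_buf.append(obs_full)

# No future_obs channel at inference; the student policy would be trained
# (or distilled) without this input entirely.
obs_buf = np.concatenate([obs_full, obs_hist])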

Is this duplication essential, or is it a legacy design choice?

Thanks!
