
Could you explain the decision to duplicate action_mimic in the low level policy input? #25

@luizmgoncw


Hi, this is an architectural question (not a code bug).

In RealTimePolicyController, action_mimic is fed to the policy twice: once inside obs_full and again as future_obs:

obs_full = np.concatenate([action_mimic, obs_proprio])

obs_hist = np.array(self.proprio_history_buf).flatten()
self.proprio_history_buf.append(obs_full)

# action_mimic appears a second time here, as the "future" observation.
future_obs = action_mimic.copy()

obs_buf = np.concatenate([obs_full, obs_hist, future_obs])
assert obs_buf.shape[0] == self.total_obs_size, f"Expected {self.total_obs_size} obs, got {obs_buf.shape[0]}"

obs_tensor = torch.from_numpy(obs_buf).float().unsqueeze(0).to(self.device)
with torch.no_grad():
    raw_action = self.policy(obs_tensor).cpu().numpy().squeeze()

What is the intended role of future_obs?

If this was meant as privileged information during training, shouldn’t the final inference / deployment policy remove this input entirely, rather than using mocked or duplicated data?
In a teacher–student setup, I would expect the student (deployment) model not to take this channel at all.
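
For concreteness, here is a minimal sketch of the deployment-time variant I would expect (hypothetical; total_obs_size and the policy's input layer would have to shrink to match):

obs_full = np.concatenate([action_mimic, obs_proprio])

obs_hist = np.array(self.proprio_history_buf).flatten()
self.proprio_history_buf.append(obs_full)

# No future_obs channel at inference; the student policy would be trained
# (or distilled) without this input entirely.
obs_buf = np.concatenate([obs_full, obs_hist])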

Is this duplication essential, or is it a legacy design choice?

Thanks!
