Hi, this is an architectural question (not a code bug).
In RealTimePolicyController, action_mimic is fed twice to the policy: once in obs_full and again as future_obs:
```python
obs_full = np.concatenate([action_mimic, obs_proprio])
obs_hist = np.array(self.proprio_history_buf).flatten()
self.proprio_history_buf.append(obs_full)
future_obs = action_mimic.copy()
obs_buf = np.concatenate([obs_full, obs_hist, future_obs])
assert obs_buf.shape[0] == self.total_obs_size, f"Expected {self.total_obs_size} obs, got {obs_buf.shape[0]}"
obs_tensor = torch.from_numpy(obs_buf).float().unsqueeze(0).to(self.device)
with torch.no_grad():
    raw_action = self.policy(obs_tensor).cpu().numpy().squeeze()
```
What is the intended role of future_obs?
If this was meant as privileged information during training, shouldn’t the final inference / deployment policy remove this input entirely, rather than using mocked or duplicated data?
In a teacher–student setup, I would expect the student (deployment) model not to take this channel at all, roughly as in the sketch below.
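For concreteness, here is a minimal sketch of what I would have expected the deployment-side observation construction to look like. This is purely illustrative, not code from the repo: it assumes the same method context and variable names as the snippet above, and that `self.total_obs_size` is reduced by the size of `action_mimic` once the privileged channel is dropped.

```python
# Hypothetical sketch only, not the repo's actual code.
# Assumes the same variable names as the snippet above, and that
# self.total_obs_size no longer counts the future_obs / action_mimic channel.
obs_full = np.concatenate([action_mimic, obs_proprio])
obs_hist = np.array(self.proprio_history_buf).flatten()
self.proprio_history_buf.append(obs_full)

# No future_obs term: the student policy only sees current + historical obs.
obs_buf = np.concatenate([obs_full, obs_hist])
assert obs_buf.shape[0] == self.total_obs_size, \
    f"Expected {self.total_obs_size} obs, got {obs_buf.shape[0]}"

obs_tensor = torch.from_numpy(obs_buf).float().unsqueeze(0).to(self.device)
with torch.no_grad():
    raw_action = self.policy(obs_tensor).cpu().numpy().squeeze()
```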
Is this duplication essential, or a legacy design choice?
Thanks!