Reinforcement Learning Module, Part 2 #22

gabriel-trigo · 2025-03-27T14:47:55Z

Fixes #20 (along with other changes)

This PR should be merged AFTER #18 and #19, as it builds on top of them. Consequently, it includes the commits from those PRs, plus new commits on top.

The new commits add:

eval.py script to evaluate policies
generate_gin_config_files.py script to generate variations of gin environment config files
Visualization module that saves plots of evaluation runs to the eval results
Implementations of the TD3 and DDPG agents
Other quality-of-life changes (see commit descriptions)
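As a rough illustration of what generating config variations can look like, here is a minimal sketch. It is not the code in generate_gin_config_files.py: the function names and the `$`-style placeholders in the base text are assumptions made for this example, and only the file name pattern is taken from a generated config referenced elsewhere in this PR.

```python
import itertools
from pathlib import Path
from string import Template


def variant_filename(time_step_sec: int, num_days: int, start_date: str) -> str:
  """Names a variant after its parameters, mirroring the generated file names."""
  return (
      f"config_timestepsec-{time_step_sec}"
      f"_numdaysinepisode-{num_days}"
      f"_starttimestamp-{start_date}.gin"
  )


def generate_variants(base_text, output_dir, time_steps_sec, num_days_options, start_dates):
  """Writes one config per parameter combination and returns the written paths."""
  output_dir = Path(output_dir)
  output_dir.mkdir(parents=True, exist_ok=True)
  template = Template(base_text)
  written = []
  for ts, nd, sd in itertools.product(time_steps_sec, num_days_options, start_dates):
    # Assumption: the base text marks the tunable values with $-placeholders.
    text = template.safe_substitute(time_step_sec=ts, num_days=nd, start_date=sd)
    path = output_dir / variant_filename(ts, nd, sd)
    path.write_text(text)
    written.append(path)
  return written
```

Each combination of time step length, episode length, and start date yields one file, so two time steps and one value for each of the other parameters produce two configs.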

Tests pass


smart_control/reinforcement_learning/utils/config.py

smart_control/reinforcement_learning/agents/sac_agent.py

s2t2 · 2025-05-30T19:27:11Z

smart_control/reinforcement_learning/agents/networks/td3_networks.py

If this file is empty, we can delete it.

smart_control/environment/environment.py

s2t2 · 2025-05-30T19:29:26Z

smart_control/environment/environment.py

+logger = log.getLogger(__name__)

-def all_actions_accepted(action_response: ActionResponse) -> bool:
+logger = log.getLogger(__name__)


Is this a duplicate of the logger from line 84?

s2t2 · 2025-05-30T19:30:29Z

smart_control/reinforcement_learning/agents/td3_agent.py

+        train_step_counter=train_step_counter
+    )
+
+    return td3_agent_obj


If you save the file in VS Code with the suggested formatting extension, it should automatically add the newline at the end of the file.

smart_control/reinforcement_learning/experiment_scripts/eval_experiments.txt

smart_control/reinforcement_learning/observers/composite_observer.py

smart_control/reinforcement_learning/observers/print_status_observer.py

smart_control/reinforcement_learning/observers/rendering_observer.py

smart_control/reinforcement_learning/observers/trajectory_recorder_observer.py

s2t2 · 2025-05-30T19:34:16Z

smart_control/reinforcement_learning/utils/MultiEpisodeWrapper.py

Let's use snake case for the file name (e.g. multi_episode_wrapper.py).

s2t2 · 2025-06-06T19:57:17Z

smart_control/reinforcement_learning/scripts/generate_gin_config_files.bash

@@ -0,0 +1,8 @@
+#!/bin/bash
+
+# Run the generator script


Actually, let's have the Python code use default directories that exist inside the repo and are git-ignored. Let's also move the commands in these bash files into markdown in the docs. Hopefully we then won't have to pass any directory paths, because the git-ignored default directories will be used.
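One possible way to set up such defaults, as a minimal sketch: find the repository root by walking upward to the directory that contains pyproject.toml, then derive the default directories from it. The helper names and the directory names below (`data`, `generated_configs`, `eval_results`) are hypothetical and are not the paths defined in utils/config.py.

```python
from pathlib import Path


def find_repo_root(start: Path) -> Path:
  """Returns the closest ancestor of `start` (or `start`) holding pyproject.toml."""
  for candidate in [start, *start.parents]:
    if (candidate / "pyproject.toml").is_file():
      return candidate
  raise FileNotFoundError(f"No pyproject.toml found above {start}")


def default_dirs(root: Path) -> dict:
  """Builds default output directories relative to the repository root."""
  # Hypothetical layout; these directories would be listed in .gitignore.
  data_dir = root / "smart_control" / "reinforcement_learning" / "data"
  return {
      "generated_configs": data_dir / "generated_configs",
      "eval_results": data_dir / "eval_results",
  }
```

Scripts could then call `find_repo_root(Path(__file__).resolve())` once and use the resulting paths as default argument values, so no user-specific paths need to be passed on the command line.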

s2t2 · 2025-05-30T19:46:29Z

smart_control/reinforcement_learning/utils/MultiEpisodeWrapper.py

+      first environment loaded.
+    """
+    def __init__(self, scenario_config_paths: collections.abc.Sequence[str], create_env_fn: collections.abc.Callable):
+        """


Looks like all these files are using four spaces instead of two. If you touch them in VS Code after the suggested extensions are installed, they should be re-formatted automatically.

Now that the Pyink tool has been fixed (#95), touching and saving the files in VS Code should work.

s2t2 · 2025-05-30T19:48:00Z

smart_control/reinforcement_learning/visualization/trajectory_plotter.py

+
+logger = logging.getLogger(__name__)
+
+class TrajectoryPlotter:


It would be cool to add a screenshot of the plots produced by each of the plotting utilities introduced in this PR. We can then include the screenshots in the docs.

smart_control/reinforcement_learning/utils/config.py

s2t2

General comments from initial review:

Let's fix all the merge conflicts.
Let's update the formatting of all files. This should happen automatically when using the recommended VS Code extensions.
Let's see if we can convert the bash files to README commands. We will probably want to tweak various parameters (including file paths as necessary), so checking in these scripts would lead to frequent updates.
For import statements, let's import a given function on a single line (the line can be very long; the updated formatting rules should allow that now).
Also, some tests would be nice to have, even if they are very simple.

Let me know when you would like me to review the next iteration.

Thank you for all your contributions!

.gitignore

…This commit fixes that

gabriel-trigo · 2025-05-31T22:48:29Z

Hey, my bad, a pretty silly error here: I forgot to save the files after solving the merge conflicts.

The latest commits should address that. I also ran isort to improve the import styling, so hopefully it now follows the Google standards, @s2t2.

s2t2

Hi @gabriel-trigo thanks for fixing the merge conflicts.

The biggest changes needed right now are to:

Fix the formatting in all files (indent with two spaces instead of four). You should be able to leverage the linting tools; it may help to install the pre-commit hooks.
Remove the bash files, and move the Python commands in them to new or existing markdown files in the docs/ directory.
Change all directory path references to use default directory paths that are relative to the repository root directory. We can leverage existing paths in "smart_control/reinforcement_learning/utils/config.py" and/or create new paths there as applicable.

Ping me again when these changes are implemented and I can take another look.

Thank you!!

s2t2 · 2025-06-04T23:44:17Z

pyproject.toml

 known_first_party = ["smart_control"]
 skip_glob = ['smart_control/proto/*']

+


We might need to re-add tqdm to the toml.

s2t2 · 2025-06-04T23:45:24Z

smart_control/reinforcement_learning/agents/networks/td3_networks.py

This td3 networks file looks empty. Let's move the network classes here, or delete the file.

s2t2 · 2025-06-04T23:47:14Z

smart_control/reinforcement_learning/agents/td3_agent.py

+
+    def __init__(
+        self,
+        input_tensor_spec,


It might be helpful to add type hints

s2t2 · 2025-06-04T23:50:05Z

smart_control/reinforcement_learning/agents/td3_agent.py

+        )
+
+    def call(self, observations, step_type=None, network_state=(), training=False):
+        del step_type  # Unused.


Should we remove this unused variable from the function signature?

s2t2 · 2025-06-05T02:08:04Z

smart_control/reinforcement_learning/agents/td3_agent.py

+        )
+
+    # Create critic networks if not provided
+    # Create critic networks if not provided


Looks like a duplicate comment

s2t2 · 2025-06-06T19:58:47Z

smart_control/reinforcement_learning/scripts/eval.py

+)
+logger = logging.getLogger(__name__)
+
+def find_latest_checkpoint(policy_dir):


Can we use a default directory here that is inside the repo and git-ignored?
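A minimal sketch of how a default directory could be combined with picking the most recent checkpoint. This is not the code in eval.py: `DEFAULT_POLICY_DIR` is a hypothetical git-ignored location, and the assumption that checkpoint directories end in a step number (for example `policy_checkpoint_0000002000`) may not match the actual layout.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical default location, relative to the repository root.
DEFAULT_POLICY_DIR = Path("experiment_results") / "policies"


def find_latest_checkpoint(policy_dir: Path = DEFAULT_POLICY_DIR) -> Optional[Path]:
  """Returns the checkpoint directory with the highest trailing step number."""
  checkpoints_dir = Path(policy_dir) / "checkpoints"
  if not checkpoints_dir.is_dir():
    return None
  numbered = []
  for entry in checkpoints_dir.iterdir():
    # Assumption: each checkpoint directory name ends with its step count.
    match = re.search(r"(\d+)$", entry.name)
    if entry.is_dir() and match:
      numbered.append((int(match.group(1)), entry))
  if not numbered:
    # Either there is no checkpoints dir or there are no checkpoints in it.
    return None
  return max(numbered, key=lambda pair: pair[0])[1]
```

Comparing the parsed integers rather than the raw names avoids returning the first (untrained) checkpoint when names are not zero-padded consistently.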

s2t2 · 2025-06-06T19:59:32Z

smart_control/reinforcement_learning/scripts/eval.py

+    # If we're here, either there's no checkpoints dir or no checkpoints in it
+    return None
+
+def create_merged_saved_model(policy_dir):


Echoing the earlier comments: let's use a default directory as the default parameter value here.

s2t2 · 2025-06-06T20:01:13Z

smart_control/reinforcement_learning/scripts/eval.py

+    return temp_dir
+
+def evaluate_policy(
+    policy_dir,


Echoing the earlier comments about using a default directory.

s2t2 · 2025-06-06T20:02:50Z

smart_control/reinforcement_learning/scripts/eval.py

+    parser = argparse.ArgumentParser(description='Evaluate a trained reinforcement learning policy')
+    parser.add_argument('--policy-dir', type=str, required=True, help='Path to the directory containing the saved policy. To \
+                                                                       use schedule policy, just type `schedule`')
+    parser.add_argument('--gin-config', type=str, default="/home/gabriel-user/projects/sbsim/smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-7_starttimestamp-2023-07-06.gin", help='Path to the .gin config file')


Let's avoid using a user's personal path, and use a relative reference to a place in this repo instead.
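For illustration, a minimal sketch of the same arguments with a repository-relative default. The `build_parser` helper and its `repo_root` parameter are assumptions made for this example; the relative path is the repo-internal portion of the default shown in the diff above.

```python
import argparse
from pathlib import Path

# Repo-relative portion of the default config path from the original diff.
DEFAULT_GIN_CONFIG = Path(
    "smart_control/configs/resources/sb1/generated_configs/"
    "config_timestepsec-900_numdaysinepisode-7_starttimestamp-2023-07-06.gin"
)


def build_parser(repo_root: Path) -> argparse.ArgumentParser:
  """Builds the eval argument parser with defaults anchored at `repo_root`."""
  parser = argparse.ArgumentParser(
      description="Evaluate a trained reinforcement learning policy"
  )
  parser.add_argument(
      "--policy-dir",
      type=str,
      required=True,
      help="Path to the directory containing the saved policy. "
      "To use the schedule policy, just type `schedule`",
  )
  parser.add_argument(
      "--gin-config",
      type=Path,
      default=repo_root / DEFAULT_GIN_CONFIG,
      help="Path to the .gin config file",
  )
  return parser
```

With this shape, the default resolves under whatever checkout the script runs from, and users can still override it with `--gin-config`.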

s2t2 · 2025-06-06T20:03:03Z

smart_control/reinforcement_learning/scripts/eval.py

+        gin_config_path=gin_config_path,
+        experiment_name=args.experiment_name,
+        num_eval_episodes=args.num_eval_episodes
+    )


Needs a newline at the end of the file.

s2t2 · 2025-06-06T21:02:40Z

smart_control/reinforcement_learning/notebooks/test.ipynb

Looks like this file has some runtime errors.

s2t2 · 2025-06-12T17:37:46Z

Superseded by #98.

gabriel-trigo added 16 commits March 9, 2025 14:30

feat: add reinforcement learning directory. The purpose of this direc…

bb23237

…tory is to have all reinforcement learning related code, and scripts to train and evaluate agents

Merge branch 'reinforcement_learning-gabriel'

f8ac4de

chore: update .gitignore to ignore experiment results, replay buffer…

72241eb

… data

build: add tqdm to project dependencies. Will be used to better monit…

7ec2f9d

…or progress of RL experiments

feat: add generate_gin_config_files.py script, which takes in a base …

989a423

…gin config file, and generates variations changing important parameters like time step length, start date and number of days in the episode. Also added .bash script to illustrate how to use it

feat: add ddpg agent implementation to agents directory

5c619d2

feat: add td3 implementation to agents directory

e55bd0a

feat: improve train.py script. Added support for td3 and ddpg impleme…

75232f6

…ntations that were added in previous commits. Changed some default argument values, same for populate_starter_buffer.py script

feat: add observer that records and saves trajectories. Also added a …

812565f

…visualization module, which this observer uses to plot the graphs

fix: minor bug in print_status_observer.py. Was using the total numbe…

5694dcb

…r of steps instead of episode steps to calculate percentage, leading to > 100% values

feat: add eval.py script, which is used to evaluate a trained policy.…

6a92d8d

… Added saved_model_policy.py file to policies directory -- this file has a class which is used to load and interact with policies saved during training

docs: add example bash script to run the populate_starter_buffer script

c0b5679

chore: update .gitignore

7f8d416

fix: get rid of step_interval parameter in environment.py (this param…

111b6e2

…eter was redundant with the base_building's time_step_sec

fix: make eval.py script use the latest learned policy checkpoint (be…

4e0dd09

…fore, it was wrongly using the first checkpoint, which gave the untrained agent performance)

tests: fix environment.py tests that were failing to conform to the c…

c15c274

…hanges from 111b6e2
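The percentage bug described in commit 5694dcb (using the total number of steps instead of episode steps) can be illustrated with a minimal sketch. The function and parameter names are hypothetical, and this is not the code in print_status_observer.py.

```python
def episode_progress_percent(total_steps: int, steps_per_episode: int) -> float:
  """Returns progress through the current episode as a percentage (0 to 100)."""
  if steps_per_episode <= 0:
    raise ValueError("steps_per_episode must be positive")
  # Dividing total_steps by steps_per_episode directly would exceed 100%
  # after the first episode, so reduce to steps within the current episode.
  steps_in_episode = total_steps % steps_per_episode
  if total_steps > 0 and steps_in_episode == 0:
    # An exact multiple means the episode just finished, not that it restarted.
    steps_in_episode = steps_per_episode
  return 100.0 * steps_in_episode / steps_per_episode
```

For example, 150 total steps with 100 steps per episode reports 50% rather than 150%.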

s2t2 reviewed Apr 8, 2025

smart_control/reinforcement_learning/utils/config.py

s2t2 reviewed Apr 8, 2025

smart_control/reinforcement_learning/utils/config.py (outdated)

gabriel-trigo added 2 commits April 11, 2025 12:15

fix: change learning rate of td3 algorithm

f831a1b

Revert "fix: change learning rate of td3 algorithm"

e698c6d

This reverts commit f831a1b.

s2t2 force-pushed the copybara_push branch from cfad6e2 to da325d2 Compare May 16, 2025 16:27

gabriel-trigo added 2 commits May 30, 2025 14:26

reinforcement learning 2, merged with changes to main

0bf3927

fix: fix merge conflicts I forgot

c1b4322