Skip to content

Conversation

@benoit-nexthop
Copy link
Contributor

@benoit-nexthop benoit-nexthop commented Nov 15, 2025

Pre-submission checklist

  • I've ran the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • pre-commit run

Summary

This is an early prototype implementation of the fboss2 config commands, with very rudimentary support for config sessions and 2 config commands for interfaces.

Note: the first 2 commits in this stack are actually #628

Test Plan

New unit tests added.

@meta-cla meta-cla bot added the CLA Signed label Nov 15, 2025
@benoit-nexthop benoit-nexthop force-pushed the fboss2-config-prototype branch 4 times, most recently from 0f14820 to 0ef3b1d Compare November 22, 2025 02:05
@benoit-nexthop
Copy link
Contributor Author

This is still WIP, I fixed some small bugs and I'm still working on the unit tests.

@benoit-nexthop benoit-nexthop force-pushed the fboss2-config-prototype branch from 0ef3b1d to 2e88716 Compare November 24, 2025 02:17
@benoit-nexthop
Copy link
Contributor Author

Fixed some concurrency bugs found by adding unit tests, and enhanced test coverage further.

Copy link
Contributor

@joseph5wu joseph5wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a huge PR with lots of changes, I might have a lot of comments all at once.
Overall I think the code structures and basic idea of how the prototype should be look good to me.
However, there're some clarification questions that I would like to hear your thought:

  1. How many writes/reads thrift object inside one session? My expectation is we only need one read to get the thrift object from the config file at session start and then all the config commands will only need to update the memory thrift object; finally the session commit to actually write the thrift object to a new config file for the services to reload. I know it might be tricky to meet this expectation for a cli binary. But let's make sure your implementation can actually handle multiple commands but also avoid too many read/write
  2. revision control. I'm fine with the tool automatically generates the revision id. However, I'm not sure it will be easy to identify such revision id in the history. Besides, I remember in your design doc, there's some config metadata to map the revision with the command list. I don't see such logic in this prototype.
  3. We don't recommend vendors to change our BUCK files as you won't be able to check the buck build signals. Besides I would also recommend to leave those existing unit tests change out of this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually don't recommend vendors to touch the BUCK file as they won't be able to verify the changes using buck command.

That's why once we finish landing the PRs, we will adjust the buck file in another diff or the same diff if our on-diff signal complains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to save you time 😁


namespace facebook::fboss {

const CommandTree& kConfigCommandTree() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this function will become massive once we keep adding more and more sub-commands.
Maybe we can adjust the returned CommandTree as:

root = {
  // Config related subcommands, like applied-info/ history/ reload/ rollback
  // Session control subcommands
}
// make the sub class to manage the sub CommandTree so we don't have to update here every time whether there's a new sub command
root.add(CmdConfigInterface.getCommandTree()); 
root.add(CmdConfigXXX.getCommandTree());
return root;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest refactoring this in a separate change. We already have another change internally that is going to avoid mixing everything together like that.

case utils::ObjectArgTypeId::OBJECT_ARG_TYPE_FAN_PWM:
subCmd->add_option("pwm", args, "Fan PWM (0..100) or 'disable'");
break;
case utils::ObjectArgTypeId::OBJECT_ARG_TYPE_MTU:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about how to better manage the sub-commands and arg types of the new config.
We don't wanna blow up the base class with all the different sub-commands

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest refactoring this in a separate change. We already have another change that is going to avoid mixing everything together like that.

Comment on lines +33 to +97
// Get the path to the session config file (~/.fboss2/agent.conf)
std::string getSessionConfigPath() const;

// Get the path to the system config file (/etc/coop/agent.conf)
std::string getSystemConfigPath() const;

// Get the path to the CLI config directory (/etc/coop/cli)
std::string getCliConfigDir() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead defining your own path/directory const function,
let's using https://github.com/facebook/fboss/blob/main/fboss/agent/AgentDirectoryUtil.h

Copy link
Contributor Author

@benoit-nexthop benoit-nexthop Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the code to this:

ConfigSession::ConfigSession() {
  username_ = getUsername();
  std::string homeDir = getHomeDirectory();

  // Use AgentDirectoryUtil to get the config directory path
  // getConfigDirectory() returns /etc/coop/agent, so we get the parent to get
  // /etc/coop
  AgentDirectoryUtil dirUtil;
  fs::path configDir = fs::path(dirUtil.getConfigDirectory()).parent_path();

  sessionConfigPath_ = homeDir + "/.fboss2/agent.conf";
  systemConfigPath_ = (configDir / "agent.conf").string();
  cliConfigDir_ = (configDir / "cli").string();

  initializeSession();
}

however the CLI still needs some more paths of its own, do you want those paths to be defined in AgentDirectoryUtil.h as well even though they are specific to the CLI?

Comment on lines 138 to 148
#ifdef IS_OSS
class TBgpServiceSvIf {
// Stub because the Thrift model for BgpService is not open source yet.
public:
virtual ~TBgpServiceSvIf() {}
virtual void getRunningConfig(std::string&) = 0;
};
#endif

class MockFbossBgpService : public TBgpServiceSvIf {
public:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just put the definition of class MockFbossBgpService behind #ifdef IS_OSS?
And we can adjust CmdHandlerTestBase to only define bgp related class members when it's not IS_OSS?
I worry by using a stub TBgpServiceSvIf, you might miss some expected functions that the caller might try to call and have an unexpected behavior. While with the #ifdef IS_OSS, we can quickly identify whether the caller is trying to verify any bgp expectation and then we can adjust the test accordingly.

Eventually once we have BGP ready for OSS, we can just easily remove the #ifdef IS_OSS

agentConfig_);
std::string prettyJson =
folly::toPrettyJson(folly::parseJson(std::move(json)));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's why some of the test is using EXPECT_THRIFT_EQ from nettools/netrisk/thrift/ThriftUtils.h.
We can introduce a helper function to convert thrift obj to pretty json with the same logic here.

Comment on lines +33 to +34
// Save the updated config
ConfigSession::getInstance().saveConfig();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think if we only saveConfig() before commit() to avoid multiple write in the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to call saveConfig() at the end of each config command otherwise we'll lose the current change.

// Specific revision specified
newRevision = session.rollback(hostInfo, revisions[0]);
}
if (newRevision) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the rollback() return 0 to represent no new version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollback() should never return 0 because a rollback always creates a new revision that is identical to the revision you rolled back to, so if it returns 0 something is really messed up.


CmdConfigSessionCommitTraits::RetType CmdConfigSessionCommit::queryClient(
const HostInfo& hostInfo) {
auto& session = ConfigSession::getInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this factory function work here in the cli binary?
I'm assuming the implementation will be like:

  • User types fboss2-dev config session start. Then the fboss2-dev binary will try to get this factory member instance, and no such session exists, so it create a new one. Then the binary exits?
  • Then user type fboss2-dev config intf XXX description YYY. So the fboss2-dev binary will run again but because it exited previously, so the getInstance() will still think there's no instance, so it create another copy. But this time, the new copy will actually change the json content and serialize it on the device
  • Now if user type fboss2-dev config intf XXX mtu ZZZ. So the fboss2-dev binary will run for the third time from scratch, IIUC, the getInstance() still think there's no instance and hence load the config again, does that mean the previous changing description is wiped?

My expectation of the config session is every time

  • when a user run session start, the fboss2-dev binary will load the thrift object from current config into memory;
  • and then a user can type as many as config cli they want to change such thrift object but only in memory to avoid too many unnecessary writes;
  • finally when a user types session commit, we actually write the thrift object from memory to this agent-rABC.config and then update the symlink before calling agent reloadConfig() thrift function.

I'm feeling there're some differences b.w your implementations and my expectation. Let me know what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I clarified that during our discussion earlier but getInstance() will automatically clone the config the first time, subsequent calls will returned the already cloned config, hence why when you do config intf XXX description YYY it will change the cloned config and then when you do config intf XXX mtu ZZZ it will further change the cloned config.

@benoit-nexthop benoit-nexthop force-pushed the fboss2-config-prototype branch from 857d77d to 1ce685e Compare December 18, 2025 00:23
This command just calls `reloadConfig()` on the wedge_agent.
Sample output:
```
Config Applied Information:
===========================
Last Applied Time: 2025-10-11 09:29:36.589
Last Coldboot Applied Time: 2025-10-11 06:44:36.741
```
This can handle only one session per user, the current live config is
copied into `~/.fboss2/agent.conf` and when committing an atomic symlink
update is performed on /etc/coop/agent.conf, which is the only config
file supported by this basic prototype.
Add some helper code to process interface-list arguments.
@benoit-nexthop benoit-nexthop force-pushed the fboss2-config-prototype branch from 1ce685e to 79865dc Compare December 23, 2025 20:29
@benoit-nexthop
Copy link
Contributor Author

This PR has been broken out into separate PRs, 1 per commit:
#753
#754
#755
#756
#757
#758
#759
#760
#761

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants