How to do IO args for parallel cmdstan? #1002

@SteveBronder

Summary:

I think stan-dev/stan#3033 should hopefully go in within the next week or so, and I'd like to figure out the command line arguments, and how we process them, for running multiple chains in a single Stan program.

We need to figure out:

  1. The actual arg to specify multiple parallel chains
  2. How to handle args of initial values of parameters, inverse mass matrices, and output files

The docs also need to be updated to describe the behavior of the RNG with multiple chains, as well as any new arguments.

New args

There are a few ways to specify the number of chains. What I have in the prototype is

STAN_NUM_THREADS=20 examples/dawid/dawid sample num_samples=500 \
  num_warmup=500 data file=examples/dawid/caries_dump.R chains n_chains=8

But I don't think I like this, because what other args would go under chains? I think it might be better to have a parallel "meta arg", so users would write

STAN_NUM_THREADS=20 examples/dawid/dawid sample num_samples=500 \
  num_warmup=500 data file=examples/dawid/caries_dump.R parallel n_chains=8

We want to add this meta arg for specifying the number of threads and GPU info anyway, so I think this is a good opportunity to add it. Users could then write things like

examples/dawid/dawid sample num_samples=500 \
  num_warmup=500 data file=examples/dawid/caries_dump.R parallel n_chains=8 n_threads=20

Does anyone disagree with this?

Multiple IO

How to handle multiple init and output files is kind of tricky. The definition for init right now is

init=<string>
  Initialization method: "x" initializes randomly between [-x, x], "0" initializes to 0, anything else 
    identifies a file of values
  Valid values: All
  Defaults to "2"

So the only issue is the file of values. I think what we could do here is a simple rule like

init=<string>
  Initialization method: "x" initializes randomly between [-x, x], "0" initializes to 0, anything else
   identifies a file of values. If `n_chains` is greater than one the program will first search
   for "path/to/{init_filename}_{1:n_chains}.{file_ending}" and if not found will then search for
   a file "path/to/{init_filename}.{file_ending}"
  Valid values: All
  Defaults to "2"

So if the user specifies their init file as "my_init.json" we will first look for "my_init_1.json", "my_init_2.json", etc. and if that fails then we look for "my_init.json".
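To make the lookup rule concrete, here is a minimal Python sketch of it. This is not CmdStan code: `resolve_init_files` is a hypothetical helper, and it assumes the fallback happens per chain (a chain without its own numbered file uses the shared file), which the proposal above does not pin down.

```python
from pathlib import Path


def resolve_init_files(init, n_chains):
    """Return one init file path per chain (hypothetical helper, not CmdStan code)."""
    base = Path(init)
    resolved = []
    for chain in range(1, n_chains + 1):
        # "my_init.json" -> "my_init_1.json", "my_init_2.json", ...
        per_chain = base.with_name(f"{base.stem}_{chain}{base.suffix}")
        if n_chains > 1 and per_chain.is_file():
            # A chain-specific file takes priority when running multiple chains.
            resolved.append(per_chain)
        elif base.is_file():
            # Assumption: fall back to the shared file chain by chain.
            resolved.append(base)
        else:
            raise FileNotFoundError(
                f"no init file for chain {chain}: tried {per_chain} and {base}"
            )
    return resolved
```

With `n_chains=1` the numbered files are never consulted, so existing single-chain invocations would behave as they do today. An all-or-nothing variant (use the numbered files only if every one of them exists) would be an equally reasonable reading of the rule.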

I think we can do the same thing for the initial mass matrix. For the output files, we just take the output file the user specifies and make a new set of output files with _1 through _{n_chains} inserted before the file extension. If anyone has better ideas I'm all ears!
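As a rough illustration of the output naming, here is a small Python sketch. `chain_output_files` is a hypothetical name, and leaving the file name unchanged for a single chain is an assumption on my part rather than something settled above.

```python
from pathlib import Path


def chain_output_files(output, n_chains):
    """Return one output path per chain (hypothetical helper, not CmdStan code)."""
    base = Path(output)
    if n_chains == 1:
        # Assumption: a single chain keeps the name the user gave.
        return [base]
    # Insert the chain id just before the file extension.
    return [
        base.with_name(f"{base.stem}_{chain}{base.suffix}")
        for chain in range(1, n_chains + 1)
    ]


names = [p.name for p in chain_output_files("output.csv", 3)]
print(names)  # ['output_1.csv', 'output_2.csv', 'output_3.csv']
```

Only the final path component is rewritten, so any directory the user passes in front of the file name is preserved.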

Current Version:

v2.26.1
