@keighrim commented Nov 30, 2025

Overview

This release introduces three new fields to the app metadata to improve the orchestration and handling of large models within apps.

Additions

  • analyzer_versions: A new app metadata field (complementing the existing analyzer_version) designed to store version strings for multiple models wrapped in a single CLAMS app (see the comment on #251, "analyzer_version when using HF model family", for discussion).
  • est_gpu_mem_min and est_gpu_mem_typ: New app metadata fields that allow developers to specify estimated GPU memory usage (minimum and typical) for better resource management.

Changes

  • Updated to the latest mmif-python SDK (1.2.1)
  • Profiling storage: Runtime profiling results (CPU/CUDA architecture, running time) are now stored under a new appProfiling field in the output view metadata (#261, Refactor metadata structure to nest profiling and runtime metrics).
    • Note: This is an experimental field intended primarily for human inspection; its internal structure may change without notice.
  • View timestamps: The timestamp property of each view is now set when the _annotate() call completes, so that all views from a single app execution share a consistent "marker" for easier identification (#269, Implement "run ID" property for views to distinguish app executions).
  • Running an app that declares an est_gpu_mem_min value in production mode (--production) will now automatically limit the number of Gunicorn workers spawned to prevent GPU out-of-memory (OOM) errors.
  • VRAM usage record: Apps running on CUDA with torch now store VRAM usage statistics in a local on-disk cache (under $XDG_CACHE_HOME/clams, or ~/.cache/clams when that variable is unset) (#243, gunicorn, torch, and cuda). Currently, this information is used for logging only; we plan to add a retrieval API in the future.
    • See the current private implementation:
      def _get_profile_path(self, param_hash: str) -> pathlib.Path:
          """
          Get filesystem path for memory profile file.
          Profile files are stored in a per-app directory under user's cache.
          :param param_hash: Hash of parameters from :func:`mmif.utils.cli.describe.generate_param_hash`
          :return: Path to the profile file
          """
          # Sanitize app identifier for filesystem use
          app_id = self.metadata.identifier.replace('/', '-').replace(':', '-')
          cache_base = pathlib.Path(os.environ.get('XDG_CACHE_HOME', pathlib.Path.home() / '.cache'))
          cache_dir = cache_base / 'clams' / 'memory_profiles' / app_id
          return cache_dir / f"memory_{param_hash}.json"
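
The worker limiting described above for production mode can be illustrated with a small sketch. This is not the SDK's actual formula, which this description does not show. The helper name `cap_workers`, its parameters, and the choice of megabytes as the unit are all assumptions made for illustration. The idea is simply that each worker loads its own copy of the model, so the worker count should not exceed what the declared minimum VRAM estimate allows.

```python
def cap_workers(est_gpu_mem_min_mb: int, total_vram_mb: int, cpu_workers: int) -> int:
    """Hypothetical helper: cap the worker count by available VRAM.

    est_gpu_mem_min_mb: the app's declared minimum GPU memory (unit assumed to be MB)
    total_vram_mb:      total VRAM on the device (same assumed unit)
    cpu_workers:        the worker count that would be used without a GPU limit
    """
    if est_gpu_mem_min_mb <= 0:
        # No GPU estimate declared: keep the CPU-based default.
        return cpu_workers
    # How many model copies fit into VRAM at the declared minimum.
    fit = total_vram_mb // est_gpu_mem_min_mb
    # Never exceed the CPU-based default, and always keep at least one worker.
    return max(1, min(cpu_workers, fit))
```

For example, under these assumptions an app declaring 6000 MB on a 24000 MB device would be limited to four workers even if the CPU-based default were nine.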

keighrim and others added 16 commits August 10, 2025 22:12
…etadata

added VRAM profile for CUDA device in view metadata
Refactored appRunningTime and appRunningHardware to be nested under
a new appProfiling parent field for better metadata organization.

- appRunningTime -> appProfiling.runningTime
- appRunningHardware -> appProfiling.hardware

Closes #261
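
For downstream code that reads view metadata produced both before and after this refactoring, a small compatibility reader may help. The sketch below is an illustration only: the function names are hypothetical, view metadata is assumed to be available as a plain dict, and, since appProfiling is experimental, the nested layout may change again.

```python
from typing import Any, Optional


def _read_profiling(view_metadata: dict, new_key: str, old_key: str) -> Optional[Any]:
    """Hypothetical helper: prefer appProfiling.<new_key>, fall back to the flat <old_key>."""
    profiling = view_metadata.get("appProfiling")
    if isinstance(profiling, dict) and new_key in profiling:
        return profiling[new_key]
    return view_metadata.get(old_key)


def get_running_time(view_metadata: dict) -> Optional[Any]:
    # appRunningTime -> appProfiling.runningTime
    return _read_profiling(view_metadata, "runningTime", "appRunningTime")


def get_running_hardware(view_metadata: dict) -> Optional[Any]:
    # appRunningHardware -> appProfiling.hardware
    return _read_profiling(view_metadata, "hardware", "appRunningHardware")
```

Both readers return None when neither layout is present, so callers can treat profiling data as optional.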
* all gunicorn workers are now discarded after processing one request
    * this prevents VRAM leaks caused by zombie workers that hold VRAM while idle
    * at the cost of some loading time at each worker initialization (e.g. model loading)
    * can be overridden via the `max_requests` argument passed to `serve_production()` in app.py
* the number of gunicorn workers is set reasonably low when an app declares GPU usage
* app metadata now has two new fields that can be used to declare GPU usage
* for GPU apps, first-time VRAM usage for each parameter combination is now recorded in a local cache directory (see `$XDG_CACHE_HOME/clams`)
updated cuda memory handling and related documentation
overwriting timestamp values of all "current" app's views after `_annotate()`
…rsions

added a new metadata field for multi-model/family app
@keighrim keighrim merged commit 234ae6f into main Nov 30, 2025
10 checks passed