@adeebshihadeh adeebshihadeh commented Dec 20, 2025

closes #36460

  • single process (except ffmpeg...)
  • faster than realtime
  • same options as old clips
  • toggle between comma 3X or comma four UI
  • small CI test
  • progress bar

Separate issues hit while working on this:

@github-actions github-actions bot added the tools label Dec 20, 2025
github-actions bot commented Dec 22, 2025

mici raylib UI Preview

✅ Videos are identical! View Diff Report

github-actions bot commented Dec 22, 2025

raylib UI Preview

All Screenshots

@adeebshihadeh adeebshihadeh marked this pull request as ready for review December 22, 2025 03:40
lordphone commented Dec 23, 2025

Hey Adeeb!

I'm glad to see you got clips to work. After my finals I was struggling a bit to see how RECORD=1 could really help me simplify the approach, but after comparing the code in this PR to mine I learnt a lot.

Anyways, since I was here already, I also profiled clipping with PROFILE_RENDER=50. The biggest bottlenecks for me are file downloading and possibly GOP decoding.

Our FrameReader decodes one GOP at a time, sequentially.

I have a decent PC, and with FILEREADER_CACHE=1, generating a high-quality 2 min clip from 3 segments, I can get:

  • 3.4x realtime (35s)
  • 4x render speed (77 fps)

However, with 8 parallel GOP decoding workers that pre-decode all GOPs before rendering, I can get:

  • 4.7x realtime (25s)
  • 13x render speed (255 fps), but really it's 5.3x (110 fps) once the decoding time is factored in.
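
For anyone skimming, the pre-decode idea can be sketched roughly as below. This is a minimal illustration, not the code from the branch: `decode_gop` and `predecode_all` are hypothetical names, and the decode step is a stub standing in for the real ffmpeg-backed call. Threads are assumed to be enough because the real work blocks on ffmpeg rather than on Python code.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_gop(gop_index: int) -> list[str]:
    """Hypothetical stand-in for decoding one GOP (e.g. one blocking ffmpeg call)."""
    frames_per_gop = 3  # made-up size, only for the demo
    return [f"gop{gop_index}-frame{i}" for i in range(frames_per_gop)]


def predecode_all(gop_indices: list[int], workers: int = 8) -> list[str]:
    """Decode every GOP in parallel, then return the frames in display order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, no matter which worker finishes first
        decoded = list(pool.map(decode_gop, gop_indices))
    # Flatten the per-GOP lists into one frame list for the render loop
    return [frame for gop in decoded for frame in gop]


frames = predecode_all(list(range(4)))
```

The render loop then only indexes into `frames`, so it never waits on a decode. The cost is that every GOP has to be available and decoded before the first frame is drawn.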

After vibing the parallel GOP decoding in run.py, profiling shows that decompress_video_data from FrameReader no longer takes 58% of the render loop time (3.85s blocking on ffmpeg); the biggest element in the loop is now frame capturing, at only 9%.

I feel like this is more of a patch than a fix tho, and a more permanent and efficient fix probably belongs somewhere down in FrameReader. Also, instead of FrameReader working while segments are downloading, it now has to download the whole clip first, then decode, then render. That would slow this approach down a lot on a slow connection.

I hope there's some helpful information in here to achieve faster-than-realtime. I'd be happy to work on anything related as well!

Compare against upstream directclips:
directclips → directclips-parallel-GOP-decoding

Side note: I also found a potential bug in tools/lib/url_file.py when using FILEREADER_CACHE=1 with clip.py. The seek() and read() methods can receive numpy integers from FrameReader's index array, causing overflow in the slice calculations. I fixed it and removed the cache folder, and now it runs normally with the clip tool, but I'm not sure how deeply this affects other files.
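
To illustrate the kind of overflow meant here, a small sketch follows. It is not the actual patch to url_file.py: `byte_range` and `wrapped_int32_add` are hypothetical helpers, and the 32-bit width is an assumption for the demo (emulated with `ctypes` so the example needs only the standard library). `operator.index()` accepts both Python ints and numpy integer scalars and always returns a plain Python int.

```python
import ctypes
import operator


def wrapped_int32_add(a: int, b: int) -> int:
    """Emulate a 32-bit signed add that wraps on overflow (stand-in for fixed-width integer math)."""
    return ctypes.c_int32(a + b).value


def byte_range(pos, length) -> tuple[int, int]:
    """Return (start, end) as Python ints, which have arbitrary precision and cannot overflow."""
    start = operator.index(pos)  # works for Python ints and numpy integer scalars
    end = start + operator.index(length)
    return start, end


pos, length = 2_000_000_000, 500_000_000
wrapped_end = wrapped_int32_add(pos, length)  # negative: the sum does not fit in 32 bits
start, end = byte_range(pos, length)          # safe: both values are plain Python ints
```

Normalizing at the top of seek() and read() keeps the fix in one place, whatever integer type the caller passes in.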

@lordphone

Added streaming. With streaming + parallel GOP decoding, decoding a 2 min clip is about 2.3x faster without cache and about 1.8x faster with cache. Just experimenting : )
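
A rough sketch of what streaming combined with parallel decoding can look like is below. It is an illustration under assumptions, not the code from the branch: `stream_decoded`, `fake_decode`, and the `lookahead` parameter are hypothetical. The idea is to keep a bounded number of GOPs decoding in the background and hand frames to the renderer as soon as the next GOP in order is ready.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def stream_decoded(gop_indices: Iterable[int],
                   decode: Callable[[int], list],
                   workers: int = 8,
                   lookahead: int = 8) -> Iterator:
    """Yield frames in order while up to `lookahead` GOPs decode in the background."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        for idx in gop_indices:
            pending.append(pool.submit(decode, idx))
            if len(pending) >= lookahead:
                # Block only on the oldest GOP, which is the next one to display
                yield from pending.popleft().result()
        # Drain whatever is still in flight once all GOPs have been submitted
        while pending:
            yield from pending.popleft().result()


def fake_decode(gop_index: int) -> list:
    """Stub decoder for the demo: two (gop, frame) tuples per GOP."""
    return [(gop_index, i) for i in range(2)]


frames = list(stream_decoded(range(5), fake_decode, lookahead=2))
```

Compared with pre-decoding everything, rendering can start after the first GOP is ready, and memory use is bounded by `lookahead` rather than by the clip length.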

@adeebshihadeh
Contributor Author

Oh nice, feel free to take this over. I probably won’t get back to it for a while.

Development

Successfully merging this pull request may close these issues.

clips: directly render frames
