Transcripts missing error on v3.0.0 #590

@arslanashraf7

We recently deployed an instance built from the edx-platform master branch and started seeing missing-file errors, specifically for transcripts. The errors mainly blocked the import/export functionality of the CMS, along with a few other features on video units in the learning MFE.

The logged error:

ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
  File "storages/backends/s3.py", line 532, in _open
    f = S3File(name, mode, self)
  File "storages/backends/s3.py", line 132, in __init__
    self.obj.load(**params)
  File "boto3/resources/factory.py", line 565, in do_action
    response = action(self, *args, **kwargs)
  File "boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "botocore/client.py", line 598, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "botocore/context.py", line 123, in wrapper
    return func(*args, **kwargs)
  File "botocore/client.py", line 1061, in _make_api_call
    raise error_class(parsed_response, operation_name)
FileNotFoundError: File does not exist: media/video-transcripts/redacted.sjson
  File "edxval/api.py", line 293, in get_video_transcript_data
    return dict(file_name=video_transcript.filename, content=video_transcript.transcript.file.read())
  File "django/db/models/fields/files.py", line 48, in _get_file
    self._file = self.storage.open(self.name, "rb")
  File "django/core/files/storage/base.py", line 22, in open
    return self._open(name, mode)
  File "storages/backends/s3.py", line 535, in _open
    raise FileNotFoundError("File does not exist: %s" % name)

We believe the issue is related to file paths: the edxval package appears to look for these files at a different path than the one where they are stored. The files were present in storage before our upgrade. Downgrading the edxval package to <3.0.0 resolved the issue for us.
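One way to check the path-mismatch hypothesis is to compare the keys the application requests against the keys that actually exist in the bucket. The sketch below is a minimal, storage-agnostic illustration: `find_prefix_mismatches` is a hypothetical helper, and the "existing" key in the example is an assumed value for illustration only, not our real bucket layout. In practice the existing keys would come from a bucket listing and the requested keys from the logged errors.

```python
import posixpath
from collections import defaultdict


def find_prefix_mismatches(requested_keys, existing_keys):
    """Return {missing_key: [existing keys with the same file name]}.

    A missing key with one or more candidates suggests that the file is
    present but stored under a different prefix than the one requested.
    """
    existing_keys = list(existing_keys)
    existing = set(existing_keys)

    # Index the existing keys by file name so prefixes can be compared.
    by_basename = defaultdict(list)
    for key in existing_keys:
        by_basename[posixpath.basename(key)].append(key)

    mismatches = {}
    for key in requested_keys:
        if key in existing:
            continue  # the requested key resolves correctly
        mismatches[key] = sorted(by_basename.get(posixpath.basename(key), []))
    return mismatches


if __name__ == "__main__":
    # The requested key is taken from the logged FileNotFoundError.
    requested = ["media/video-transcripts/redacted.sjson"]
    # Hypothetical bucket contents, assumed for illustration only.
    existing = ["video-transcripts/redacted.sjson"]
    for missing, candidates in find_prefix_mismatches(requested, existing).items():
        print(missing, "->", candidates)
```

A missing key that maps to an empty list would point to a file that is truly absent rather than one stored under a different prefix.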

We suspect the cause is the Django 5 upgrade (#577), which was released as part of v3.0.0 and pulled into edx-platform via openedx/edx-platform#36751. Our best guess is that the problem stems from how storages are handled in that upgrade.

Moreover, we assume the Django 5 upgrades are meant to be backward compatible, so ideally v3.0.0 of edxval should not break any storage paths when running on Django 4, which the current edx-platform master still uses.

NOTE: We are using S3 storage for the transcripts.
