
Update the embed_content in RecommendationsAdapter #4455

@akolson

Overview

This task updates embed_content in RecommendationsAdapter so that, for each node, it collects the URLs of all files from which textual content will be extracted.

Description and outcomes

  • Update the embed_content in RecommendationsAdapter
    • embed_content accepts a list of nodes (ContentNode) as a parameter.
    • For each node, find all file URLs to be extracted.
    • Use the kind field on ContentNode and the preset field on File to determine which file URLs to extract.
    • Currently, all Studio files are stored in this bucket.
  • Finding file URLs
    • Audio files (mp3)
      • Return the corresponding URL(s)
    • Video files (mp4, webm)
      • Return the corresponding subtitle URL(s) if any exist; otherwise return the URL(s) of the video files themselves
    • HTML files (html5)
      • HTML files are uploaded as zip files and extracted into this bucket
      • Return the corresponding URL(s) of the extracted zip location.
    • H5P files (h5p)
      • Return the corresponding URL(s)
    • ZIM files (zim)
      • Return the corresponding URL(s)
    • Document files (pdf, epub)
      • Return the corresponding URL(s)
    • Exercise files (Perseus)
      • Return the corresponding URL(s)
  • Making a request
    • Make a request to the recommendations backend. For example:
      body = {
          'resources': resources,
          'metadata': {},
      }
      embed_content_request = EmbedContentRequest(
          headers={},  # Kept so that headers can be passed to the external API
          params={},  # Kept so that query parameters can be passed as well
          body=body,
      )
      return self.backend.make_request(embed_content_request)
      
      Where resources is the updated embed_content_request.json
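
The URL-selection rules listed under "Finding file URLs" could be sketched roughly as follows. This is a minimal illustration and not the Studio implementation: the File and ContentNode dataclasses are stand-ins for the real models, the kind and preset strings, the base URLs, and the helper names (storage_url, extracted_zip_url, file_urls_for_node, collect_file_urls) are all assumptions made for the example, and the URL layout is a placeholder for the buckets linked above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Placeholder locations (assumption): stand-ins for the buckets linked in the issue.
STORAGE_BASE = "https://storage.example.invalid/content"
EXTRACTED_ZIP_BASE = "https://storage.example.invalid/extracted"

# Hypothetical preset and kind names, used only for this sketch.
SUBTITLE_PRESET = "video_subtitle"
PRIMARY_PRESETS = {
    "audio": {"audio"},
    "video": {"high_res_video", "low_res_video"},
    "html5": {"html5_zip"},
    "h5p": {"h5p"},
    "zim": {"zim"},
    "document": {"document", "epub"},
    "exercise": {"exercise"},
}


@dataclass
class File:
    """Stand-in for the File model; only the fields this sketch needs."""
    checksum: str
    file_format: str
    preset: str


@dataclass
class ContentNode:
    """Stand-in for the ContentNode model."""
    id: str
    kind: str
    files: List[File] = field(default_factory=list)


def storage_url(f: File) -> str:
    # Assumed layout: one object per file, named by checksum and extension.
    return f"{STORAGE_BASE}/{f.checksum}.{f.file_format}"


def extracted_zip_url(f: File) -> str:
    # Assumed layout: each HTML5 zip is extracted under its checksum.
    return f"{EXTRACTED_ZIP_BASE}/{f.checksum}/"


def file_urls_for_node(node: ContentNode) -> List[str]:
    """Return the URLs to extract text from, following the rules in the list."""
    primary = [f for f in node.files if f.preset in PRIMARY_PRESETS.get(node.kind, set())]
    if node.kind == "video":
        # Prefer subtitles; fall back to the video files themselves.
        subtitles = [f for f in node.files if f.preset == SUBTITLE_PRESET]
        return [storage_url(f) for f in (subtitles or primary)]
    if node.kind == "html5":
        # HTML5 content is read from the extracted zip location.
        return [extracted_zip_url(f) for f in primary]
    return [storage_url(f) for f in primary]


def collect_file_urls(nodes: List[ContentNode]) -> Dict[str, List[str]]:
    """Map each node id to the list of URLs selected for it."""
    return {node.id: file_urls_for_node(node) for node in nodes}
```

Keeping the kind-to-preset mapping in one table means that supporting a new kind only requires a new entry, and files with unrelated presets (thumbnails, for instance) are skipped without special handling. How the resulting URLs are arranged into resources depends on embed_content_request.json and is not shown here.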

Acceptance Criteria

  • All file URLs associated with the passed nodes in embed_content are extracted correctly

Assumptions and Dependencies

Scope

The scope of this task is limited to:

  • Updating embed_content to gather all file URLs required for content extraction.

Accessibility Requirements

NA

Resources
