Collection Readers

Sean Finan edited this page Dec 19, 2025 · 1 revision

A Collection Reader is the first component in a pipeline. It reads documents and hands them to the rest of the pipeline for processing. A reader can read documents from files on disk, cells in a database, lines in a file, content on the internet, text typed at a prompt, and so forth.
The table below lists the readers that come with cTAKES.

| Name | Description |
| --- | --- |
| Database Reader | Reads documents from a database. |
| Dependency File Reader | Reads in dependency tree training/test data in a tab-delimited format. |
| FhirJsonFileReader | Reads FHIR information from JSON. |
| FhirXmlFileReader | Reads FHIR information from XML. |
| File Tree Reader | Reads document texts from text files in a directory tree. |
| Files in Dir Cycle Reader | Reads document texts from text files in a directory, repeating for a number of iterations. |
| Files in Dir Reader | Reads document texts from text files in a directory. |
| JDBC Note Table Reader | Reads document texts from a database table's fields. |
| JDBC Reader | Reads document texts from database text fields. |
| Lines in File Reader | Reads document texts from a single text file, treating each line as a document. |
| Lucene Field Reader | Reads document texts from Lucene text fields. |
| NegEx Corpus Reader | Reads lines from the file named by AssertionConst.NEGEX_CORPUS. |
| OpenNLP POS Reader | Reads in part-of-speech training/test data in the OpenNLP format. |
| Text Files Reader | Reads document texts from text files specified in a provided list. |
| XMI in Dir Reader (1) | Reads document texts and annotations from XMI files in a directory. |
| XMI in Dir Reader (2) | Reads document texts and annotations from XMI files in a directory. |
| XMI Reader (1) | Reads document texts and annotations from XMI files specified in a provided list. |
| XMI Reader (2) | Reads document texts and annotations from XMI files specified in a provided list. |
| XMI Reader (3) | Reads document texts and annotations from XMI files specified in a provided list. |
| XMI Reader (4) | Reads document texts and annotations from XMI files specified in a provided list. |
| XMI Tree Reader | Reads document texts and annotations from XMI files in a directory tree. |
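
To make the reader concept concrete, the following is a minimal, self-contained Java sketch of the idea behind a reader that treats each line of its input as one document, as described for the Lines in File Reader above. It is an illustration only: `LineReaderSketch`, `Document`, `readAll`, and the `line-N` document IDs are hypothetical names invented here, not cTAKES or UIMA classes, and the actual cTAKES reader may handle IDs, blank lines, and encodings differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: hands out one document per input line. Not a cTAKES class. */
public class LineReaderSketch {

    /** Hypothetical stand-in for a document passed to the rest of a pipeline. */
    public static final class Document {
        public final String id;
        public final String text;

        public Document(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    private final BufferedReader reader;
    private String nextLine;   // line read ahead; null once the input is exhausted
    private int count = 0;     // number of documents produced so far

    public LineReaderSketch(Reader source) {
        this.reader = new BufferedReader(source);
        this.nextLine = readLine();
    }

    /** True while at least one more line (document) is available. */
    public boolean hasNext() {
        return nextLine != null;
    }

    /** Returns the next line as a document; blank lines are kept (an assumption). */
    public Document next() {
        if (nextLine == null) {
            throw new NoSuchElementException("No more lines to read");
        }
        count++;
        Document doc = new Document("line-" + count, nextLine);
        nextLine = readLine();
        return doc;
    }

    // Wraps the checked IOException so callers need no throws clause.
    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience helper: reads every line of the given text into a list. */
    public static List<Document> readAll(String content) {
        LineReaderSketch r = new LineReaderSketch(new StringReader(content));
        List<Document> docs = new ArrayList<>();
        while (r.hasNext()) {
            docs.add(r.next());
        }
        return docs;
    }

    public static void main(String[] args) {
        for (Document d : readAll("Patient denies fever.\nNo acute distress.")) {
            System.out.println(d.id + ": " + d.text);
        }
    }
}
```

The `hasNext`/`next` pair is what lets the rest of a pipeline pull documents one at a time without knowing where they come from. Swapping the `StringReader` for a `java.io.FileReader` would read from a file on disk instead.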