PatExtractor is an advanced CMSSW EDAnalyzer which transforms PAT tuples (usually produced with PF2PAT) into plain ROOT trees, using modules called extractors. Each extractor is independent and extracts information about only one type of object (muons, electrons, tracks, …).
On top of that, there are analyses. An analysis is a simple module that is run after all extractors and whose purpose is left to the end user. Usually, it performs one step of the physics analysis (typically the second step). An analysis has access to the data of all the extractors and can produce, for example, a tree.
PatExtractor uses a plugin system to manage analyses. You don’t have to modify the PatExtractor source in order to add your own analysis: just register your new plugin in the PatExtractorPluginFactory and you’re done.
There are two operating modes available in PatExtractor:
- The first mode is the default one and is called "extractors + analysis" mode. As its name indicates, in this mode the extractors and the analyses are run, one by one. Input files are expected to be PAT tuples, and the analyses have access to the whole CMSSW framework.
- The second mode is called "analysis" mode. In this mode, no extractors are run, because the input files are expected to be extracted files. Only the analyses are run, and they have access only to the data previously extracted by the extractors.
Suppose you want to run the same analysis twice on the same dataset. Here’s the best way to do it:
- First, run PatExtractor in "extractors + analysis" mode. As input, specify the PAT dataset as you would in a CMSSW python configuration (with the PoolSource module). This will produce an extracted output file.
- Next, run PatExtractor in "analysis" mode only. As input, specify the previously extracted ROOT file (using the inputRootFile python attribute, not the PoolSource module). Don’t forget to switch the fillTree flag to false! This way, no extractors will be run, which saves a noticeable amount of time.
Caution
When using
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(1)
)
process.source = cms.Source("EmptySource")
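As a minimal sketch of the second step, the fragment below configures "analysis" mode using only options named in this document (fillTree, inputRootFile, extractedRootFile, n_events, plugins). It assumes that the EmptySource and maxEvents fragment in the caution applies to this mode, and that the module is exposed as process.PATextraction, as in the full configuration example further down. The file names are placeholders.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("PATextractor")
process.load("Extractors.PatExtractor.PAT_extractor_cff")

# No PAT input is read in "analysis" mode: use an empty source and a single
# framework event (assumption: this is what the caution refers to).
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(1)
)
process.source = cms.Source("EmptySource")

# Switch to "analysis" mode: no extractors are run
process.PATextraction.fillTree = False

# Previously extracted file used as input (placeholder name)
process.PATextraction.inputRootFile = cms.string("extracted_mc.root")

# Output file for the analysis objects (placeholder name)
process.PATextraction.extractedRootFile = cms.string("analysis_mc.root")

# Number of events to process in "analysis" mode
process.PATextraction.n_events = 10000

# Plugins are declared the same way in both modes; MyAnalysis is the
# example plugin used later in this document.
process.PATextraction.plugins = cms.PSet(
    MyAnalysis = cms.PSet(
        an_option = cms.untracked.int32(42)
    )
)
```

Since the number of processed events is driven by n_events in this mode, process.maxEvents is left at 1 here.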
There are currently 10 extractors available:
- EventExtractor: extracts information related to the event, such as the event id, run number, lumi section, the number of true interactions, …
- ElectronExtractor, MuonExtractor, PhotonExtractor: extract information about electrons, muons and photons.
- JetMETExtractor: extracts information about jets and MET.
- MCExtractor: extracts information about the generated events.
- HLTExtractor: extracts information about the HLT.
- PFpartExtractor: extracts information about PF particles.
- TrackExtractor, VertexExtractor: extract information about tracks and vertices.
Below is more information about specific extractors. If an extractor is not listed, there’s nothing special about its behaviour.
- Output trees:
  - electron_PF, muon_PF
  - electron_loose_PF, muon_loose_PF
- extractor names:
  - electrons, muons
  - electrons_loose, muons_loose
These extractors are run twice, once on
Caution
Beware: there will be
- Output trees:
  - jet_PF, MET_PF
- extractor name:
  - JetMET
This extractor must be configured in the CMSSW python configuration file. It expects to read a cms.PSet named jet_PF for the jet extraction configuration, and another cms.PSet named MET_PF for the MET extraction. Possible options are listed below.
- Jets extraction:
  - input (cms.InputTag): the input tag of the jet collection to extract.
  - redoJetCorrection (cms.untracked.bool, false): whether this extractor should redo the jet energy corrections. If true, a valid global tag must be set.
  - jetCorrectorLabel (cms.string): the corrector label to use if redoJetCorrection is true. Use something like ak5PFchsL1FastL2L3Residual for data and ak5PFchsL1FastL2L3 for MC.
  - doJER (cms.untracked.bool, true): if true, the jet resolution is smeared. Automatically set to false when running on data.
  - jerSign (cms.untracked.int32, 0): for JER systematic evaluation. Set to 1 for the 1-sigma up variation, or to -1 for the 1-sigma down variation.
  - jesSign (cms.untracked.int32, 0): for JES systematic evaluation. Set to 1 for the 1-sigma up variation, or to -1 for the 1-sigma down variation.
- MET extraction:
  - input (cms.InputTag): the input tag of the MET collection to extract.
  - redoMetPhiCorrection (cms.untracked.bool, false): if true, perform the MET phi correction. Useful if the jet energy corrections are redone and you still want the MET phi correction.
  - redoMetTypeICorrection (cms.untracked.bool, false): if true, recompute the Type-I correction (JEC propagation to MET). Automatically true if redoJetCorrection is true.
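The options above could be set as in the following sketch, which configures an MC job with the JES shifted up by one sigma. The parameter names and types come from the lists above. The two InputTag values (selectedPatJetsPFlow, patMETsPFlow) are assumed collection names used for illustration only, and process.PATextraction is the module name used in the full configuration example further down.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("PATextractor")
process.load("Extractors.PatExtractor.PAT_extractor_cff")

process.PATextraction.isMC = True
process.PATextraction.doJet = True
process.PATextraction.doMET = True

# Jet extraction options
process.PATextraction.jet_PF = cms.PSet(
    input = cms.InputTag("selectedPatJetsPFlow"),  # assumed collection name
    # Redoing the corrections also requires a valid global tag (not shown)
    redoJetCorrection = cms.untracked.bool(True),
    jetCorrectorLabel = cms.string("ak5PFchsL1FastL2L3"),  # MC label
    doJER = cms.untracked.bool(True),
    jerSign = cms.untracked.int32(0),  # nominal JER
    jesSign = cms.untracked.int32(1)   # 1-sigma up JES variation
)

# MET extraction options
process.PATextraction.MET_PF = cms.PSet(
    input = cms.InputTag("patMETsPFlow"),  # assumed collection name
    redoMetPhiCorrection = cms.untracked.bool(True),
    # Listed for completeness: automatically true when redoJetCorrection is true
    redoMetTypeICorrection = cms.untracked.bool(True)
)
```

For a full systematic evaluation you would typically run one job per variation, changing only jesSign or jerSign between jobs.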
- Output tree:
  - MC
- extractor name:
  - MC
This module extracts information on generator particles with status 3 only, and is only compatible with MADGRAPH samples. It’s useful if you want to match jets to partons.
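To make the jet-parton matching use case concrete, here is a small, generic sketch of a ΔR matching. It is not code from PatExtractor: the function names, the (eta, phi) tuple layout and the 0.3 cone size are all assumptions chosen for illustration.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def match_jets_to_partons(jets, partons, max_dr=0.3):
    """Return {jet index: parton index} for the closest parton within max_dr.

    jets and partons are lists of (eta, phi) tuples (hypothetical layout).
    """
    matches = {}
    for i, (jet_eta, jet_phi) in enumerate(jets):
        best_index, best_dr = None, max_dr
        for j, (parton_eta, parton_phi) in enumerate(partons):
            dr = delta_r(jet_eta, jet_phi, parton_eta, parton_phi)
            if dr < best_dr:
                best_index, best_dr = j, dr
        if best_index is not None:
            matches[i] = best_index
    return matches

jets = [(0.5, 3.1), (-1.2, 0.4), (2.0, 1.0)]
partons = [(-1.25, 0.45), (0.45, -3.1)]
print(match_jets_to_partons(jets, partons))
```

Note that this simple version does not enforce a one-to-one matching: two jets may end up matched to the same parton.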
- Output tree:
  - HLT
- extractor name:
  - HLT
This module extracts HLT information from the event and stores only the triggers which fired. It also provides a way to flag events which pass a pre-selected trigger (this allows the user to select only events passing a dedicated trigger).
- triggersXML (cms.untracked.string, ""): a string containing the content of an XML document describing the triggers to flag.
The XML document must have the following structure (this is a real document used for a \(t\bar{t}\) analysis):
<?xml version="1.0" encoding="UTF-8"?>
<triggers>
  <runs from="0" to="193621">
    <path>
      <name>HLT_IsoMu17_eta2p1_TriCentralPFJet30_v.*</name>
    </path>
  </runs>
  <runs from="193834" to="194225">
    <path>
      <name>HLT_IsoMu17_eta2p1_TriCentralPFNoPUJet30_v.*</name>
    </path>
  </runs>
  <runs from="194270" to="199608">
    <path>
      <name>HLT_IsoMu17_eta2p1_TriCentralPFNoPUJet30_30_20_v.*</name>
    </path>
  </runs>
  <runs from="199698" to="500000">
    <path>
      <name>HLT_IsoMu17_eta2p1_TriCentralPFNoPUJet45_35_25_v.*</name>
    </path>
  </runs>
</triggers>
Run ranges are inclusive (i.e., a run \(r\) belongs to a range if \(from \leq r \leq to\)). Each path name must be a valid regex.
Note
No event is discarded if no trigger is matched. Only a flag is set.
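To illustrate how such a document can be interpreted, the sketch below parses a shortened version of the XML with the Python standard library and checks whether a fired path matches in a given run. It is only an illustration of the format, not the HLTExtractor implementation. The helper names are hypothetical, and the use of a full-string regex match is an assumption.

```python
import re
import xml.etree.ElementTree as ET

TRIGGERS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<triggers>
  <runs from="0" to="193621">
    <path>
      <name>HLT_IsoMu17_eta2p1_TriCentralPFJet30_v.*</name>
    </path>
  </runs>
  <runs from="193834" to="194225">
    <path>
      <name>HLT_IsoMu17_eta2p1_TriCentralPFNoPUJet30_v.*</name>
    </path>
  </runs>
</triggers>"""

def load_trigger_ranges(xml_text):
    """Parse the XML into a list of (first_run, last_run, compiled regexes)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ranges = []
    for runs in root.findall("runs"):
        patterns = [re.compile(name.text.strip())
                    for name in runs.findall("path/name")]
        ranges.append((int(runs.get("from")), int(runs.get("to")), patterns))
    return ranges

def passes_trigger(run, fired_paths, ranges):
    """True if a fired path matches a pattern whose run range contains run."""
    for first_run, last_run, patterns in ranges:
        if first_run <= run <= last_run:  # both bounds are inclusive
            # Assumption: the whole path name must match the regex
            if any(p.fullmatch(path) for p in patterns for path in fired_paths):
                return True
    return False

ranges = load_trigger_ranges(TRIGGERS_XML)
print(passes_trigger(193621, ["HLT_IsoMu17_eta2p1_TriCentralPFJet30_v2"], ranges))
```

Runs that fall between two declared ranges (193700 in this shortened document, for instance) never match, whatever paths fired.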
The default python configuration of PatExtractor can be found in the file python/PAT_extractor_cfi.py. Below is a description of all options:
- extractedRootFile (cms.string): the output file produced by PatExtractor, where all the extracted trees and analysis objects are stored.
- fillTree (cms.untracked.bool, true): sets the mode of PatExtractor. If true, "extractors + analysis" mode is used; otherwise, "analysis" mode is used. See the description of the operating modes above for more details.
- inputRootFile (cms.string): when running in "analysis" mode, the input file to use.
- isMC (cms.untracked.bool, true): whether or not the input file is MC.
- doHLT (cms.untracked.bool, false): if true, run HLTExtractor.
- doMC (cms.untracked.bool, false): if true, run MCExtractor.
- doPhoton (cms.untracked.bool, false): if true, run PhotonExtractor.
- photon_tag (cms.InputTag, selectedPatPhotons): the input tag of the photons collection.
- doElectron (cms.untracked.bool, false): if true, run ElectronExtractor.
- electron_tag (cms.InputTag, selectedPatElectronsPFlow): the input tag of the electrons collection.
- doMuon (cms.untracked.bool, false): if true, run MuonExtractor.
- muon_tag (cms.InputTag, selectedPatMuonsPFlow): the input tag of the muons collection.
- doJet (cms.untracked.bool, false): if true, run the jet part of JetMETExtractor.
- jet_PF (cms.PSet): see the JetMETExtractor options above for more details.
- doMET (cms.untracked.bool, false): if true, run the MET part of JetMETExtractor.
- MET_PF (cms.PSet): see the JetMETExtractor options above for more details.
- doVertex (cms.untracked.bool, false): if true, run VertexExtractor.
- vtx_tag (cms.InputTag, offlinePrimaryVertices): the input tag of the vertices collection.
- doTrack (cms.untracked.bool, false): if true, run TrackExtractor.
- trk_tag (cms.InputTag, generalTracks): the input tag of the tracks collection.
- doPF (cms.untracked.bool, false): if true, run PFpartExtractor.
- pf_tag (cms.InputTag, particleFlow): the input tag of the PF particles collection.
- n_events (cms.untracked.int32, 10000): when operating in "analysis" mode, the number of events to process.
- plugins (cms.PSet): the list of plugins (analyses) to run. The expected format is pluginname = cms.PSet($parameters$).
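As a short sketch of how these options combine, the fragment below sets up a data job that runs the vertex, track and HLT extractors. It uses only the option names and default tags listed above. The module name process.PATextraction is taken from the full example further down, and the output file name is a placeholder.

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("PATextractor")
process.load("Extractors.PatExtractor.PAT_extractor_cff")

# Running on data: no generator information is available
process.PATextraction.isMC = False
process.PATextraction.doMC = False

# Vertex and track extractors, spelled out with their default input tags
process.PATextraction.doVertex = True
process.PATextraction.vtx_tag = cms.InputTag("offlinePrimaryVertices")
process.PATextraction.doTrack = True
process.PATextraction.trk_tag = cms.InputTag("generalTracks")

# HLT extractor
process.PATextraction.doHLT = True

# Output file (placeholder name)
process.PATextraction.extractedRootFile = cms.string("extracted_data.root")
```

Every do* flag except the ones set here keeps its default value of false, so only the requested extractors are enabled by this fragment.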
Warning
Do not create your analysis in the PatExtractor folders! Create your own CMSSW package for that. For example, create your own GitHub repository and store your analysis there. See https://github.com/IPNL-CMS/MttExtractorAnalysis for a real-life example.
Adding your own analysis to PatExtractor is easy. Here’s the list of steps to follow:
- Each new analysis (or plugin) must be a class inheriting from patextractor::Plugin (you can find its declaration in interface/ExtractorPlugin.h).
- patextractor::Plugin has one pure virtual function that you must override in your class: virtual void analyze(const edm::Event&, const edm::EventSetup&, PatExtractor&). This function is called for each event.
- You then need to register your plugin in the PatExtractorPluginFactory, using the DEFINE_EDM_PLUGIN($factory$, $class$, $name$) macro.
- Finally, you need to add your plugin to the python configuration.
Let’s see an example:
MyAnalysis.h
#include <Extractors/PatExtractor/interface/ExtractorPlugin.h>

// Plugins must inherit publicly from patextractor::Plugin
class MyAnalysis: public patextractor::Plugin {
  public:
    MyAnalysis(const edm::ParameterSet& iConfig);
    // Called once for each event
    virtual void analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup, PatExtractor& extractor);
};
MyAnalysis.cpp
#include "MyAnalysis.h"

MyAnalysis::MyAnalysis(const edm::ParameterSet& iConfig): Plugin(iConfig)
{
  // Initialize the analysis parameters using the ParameterSet iConfig
  // (kept in a local variable here; a real plugin would store it in a data member)
  int an_option = iConfig.getUntrackedParameter<int>("an_option", 0);
}

void MyAnalysis::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup, PatExtractor& extractor)
{
  // Do the analysis
}

// Register the plugin inside the factory
DEFINE_EDM_PLUGIN(PatExtractorPluginFactory, MyAnalysis, "MyAnalysis");
In the example above, we created a new analysis called MyAnalysis and registered it in the PatExtractorPluginFactory. We now just need to state in the python configuration file that we want to use this analysis.
import FWCore.ParameterSet.Config as cms
# Create process
process = cms.Process("PATextractor")
# Load various configurations
process.load('Configuration/StandardSequences/Services_cff')
process.load('Configuration/StandardSequences/GeometryIdeal_cff')
process.load('Configuration/StandardSequences/MagneticField_38T_cff')
process.load('Configuration/StandardSequences/EndOfProcess_cff')
process.load('Configuration/StandardSequences/FrontierConditions_GlobalTag_cff')
process.load("FWCore.MessageLogger.MessageLogger_cfi")
process.load("Extractors.PatExtractor.PAT_extractor_cff")
# Set the number of events we want to process
process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(10)
)
# Input PAT file to extract
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("myfilename.root"),
    duplicateCheckMode = cms.untracked.string('noDuplicateCheck')
)
# Run on MC
process.PATextraction.isMC = True
process.PATextraction.doMC = True
# Set the output file name
process.PATextraction.extractedRootFile = cms.string('extracted_mc.root')
# Turn on some extractors
process.PATextraction.doMuon = True
process.PATextraction.doElectron = True
process.PATextraction.doJet = True
# And finally, load our analysis
process.PATextraction.plugins = cms.PSet( # (1)
    MyAnalysis = cms.PSet(
        an_option = cms.untracked.int32(42)
    )
)
(1) This tells PatExtractor to load a plugin named MyAnalysis (case sensitive!). The associated cms.PSet() is passed as an argument to the class constructor. It contains only one option, an_option, an integer with value 42.
In order to access the extractors inside your analysis, use the extractor reference passed to the analyze function, and more precisely the method
std::shared_ptr<SuperBaseExtractor> PatExtractor::getExtractor(const std::string& name);
This method takes as its first argument the name of the extractor you want to access (see the extractors section for the list of all extractor names), and returns a pointer to the extractor.
For the list of methods of each extractor, please refer to the class declaration in its header file (in interface/).