1
Broker, Monitor, and Alert
  • How to Execute and Supervise Workflows using Agent Workflows
2
Sound Familiar?
  • “Our end-to-end ETL process depends on embedded touch files that are complicating our project workflows and making restarts difficult.”
  • “It would have been nice to know before I left on Friday afternoon that the session was running slow due to database contention.”
  • “We have hundreds of workflows running out of 30 project folders across three different repositories...I have no idea if everything was properly restarted after the servers were bounced.”
  • “We have workflows running at specific times during the day, when certain files appear, and when other processes finish.  How is it possible to build an execution solution to tie all these workflows together?”
  • “How can we delete touch files when our workflows are not scheduled to start looking for touch files?”
  • “We are using a third-party scheduling tool because Informatica can’t do everything we need.”


3
Agenda
  • Execution Broker – Example 1
  • Execution Broker – Example 2
  • Workflow Monitor
  • Questions
4
Agent Workflow:
The Execution Broker
5
What is an Agent Workflow?
  •    An Agent Workflow provides execution control, monitoring, and alert notification services for project workflows.


  • Agent Workflows typically do not perform data manipulation or transformation duties.


6
What is an Execution Broker?
  • A type of Agent Workflow.


  • Controls the execution order and dependencies of many other workflows.


  • Provides the ability to determine the status of the entire process based on the Broker’s status.


  • Can suspend the entire process by stopping itself.
7
Why Build an Execution Broker?
  • Your scheduling requirements are not simple: they include atypical scheduling operations and are subject to “if then” scenarios.


  • You want to limit (or eliminate) the work involved in maintaining several custom Schedulers.


  • You want to ensure that DTM delays do not result in hard-scheduled workflows stepping on each other.


  • Your execution plan includes many workflows running in several folders that all need to play nicely together or bad things happen.
8
The Scheduling Challenge
Example 1
  • Project: 4 workflows in 2 folders
  • Desired Execution:
    • Run once per day around 11 AM.
    • Two workflows (A → B) depend on a go file from Oracle CDC.
    • Two workflows (C → D) depend on two go files: the same Oracle CDC file plus another go file from an ancillary Project.
    • Do not delay workflows A & B if C & D are delayed by the ancillary Project.
    • Consume the Oracle CDC go file even when it is not time to run, so that a fresh go file is in place when it is time to run.
    • Execution frequency could change to multiple times per day.  Must scale quickly and easily and not affect other workflows using the same Scheduler.
9
The Scheduling Requirements
Example 1 – Page 1
  • Limit the number of execution controls inside the project workflows.


  • Execute the workflow stream once per day, but make it easily scalable to n times per day when required without creating a new scheduler or modifying an existing one.


  • When the workflow is not running, stale go / touch files from Oracle CDC must be consumed every hour.
10
The Scheduling Requirements
Example 1 – Page 2
  • When it is time to run, the workflow must wait for a fresh go file from Oracle CDC.


  • Do not delay the execution of all workflows just because one workflow is delayed by a delinquent ancillary project system go file (wait for nothing).


  • The execution solution must have one code base that is deployable to any project folder in any repository without modification (meaning no hard-coding of file paths or database connections).
11
The Scheduling Requirements
Example 1 – Page 3
  • No scripting outside of Informatica, no stand-alone shells, no stored procedures, no Windows scheduled tasks, no cron, no complexity…no problem!
12
Execution Broker Solution
  • Decision Task – determines if it is time to do something.
  • Dummy Sessions – accomplish the dirty work.
  • Event Wait – just because it is time to do something doesn’t mean the time is right.
  • Timers – let’s not get carried away.
13
The Decision Task
  • True or False:  Is it between 11 and 11:15:59 AM?
  • Easily scales to multiple execution times with additional code.
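  • A minimal sketch of what the Decision Task condition could look like, assuming the standard date functions are available in the workflow expression language (the actual condition is not shown here):

      TO_INTEGER(TO_CHAR(SYSDATE, 'HH24')) = 11
      AND TO_INTEGER(TO_CHAR(SYSDATE, 'MI')) <= 15

  • Scaling to n execution times per day is just a matter of OR-ing in another hour / minute test for each additional window.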
14
Decision: FALSE
  • Navigate the upper thread when the decision resolves to false.
15
FALSE: Perform the Dirty Work
  • Consume the stale go file.
  • Clean up your mess.
  • Notice $BadFileDir1.
16
Project Folder Transparency
  • Use an Informatica parameter file to facilitate project folder transparency.
    • All spawned Execution Brokers and Project Workflows (per server) reference the same parameter file residing in a single location.
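    • A minimal sketch of such a parameter file; the folder, workflow, and path names are hypothetical, and the actual file is not shown:

        [Global]
        $PMRootDir=/infa/shared

        [FOLDER_A.WF:wf_Execution_Broker]
        $$GoFile=/infa/shared/go/oracle_cdc.go
        $BadFileDir1=/infa/shared/badfiles

    • Each project folder gets its own [folder.WF:workflow] heading in the same file, so one deployed code base resolves its file paths per project at run time.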
17
FALSE: Take a Short Break
  • Wait 14 minutes from when this all began before restarting.
18
QUESTION
  • If the allowed execution window is 16 minutes, why wait 14 minutes to restart?



  • Insurance and Workflow Log Overload.
    • We want to guarantee the Decision Task resolves to TRUE at least once between 11 and 11:15:59 AM no matter when it checks the time.  Second, waiting 14 minutes runs the Broker about 4 times per hour instead of 40, which keeps the workflow log manageable.
19
Decision: TRUE
  • Navigate the lower thread when the decision task resolves to true.
20
TRUE:  Hurry up and Wait!
  • It’s time, but not really.  Wait for a fresh Oracle CDC go file.
  • Note the use of $$GoFile – a Workflow Variable whose value is maintained in the Informatica parameter file.
21
TRUE: Perform the Dirty Work
  • Place the go files that open the flood gates.
  • Clean up your mess.
  • Notice $BadFileDir1.
22
TRUE: Take a Longer Break
  • Wait 20 minutes from when this all began before restarting.
23
QUESTION
  • If the allowed execution window is 16 minutes, why wait 20 minutes to restart?


  • Execution Frequency Requirement.
    • We want to guarantee the Decision Task resolves to TRUE only once between 11 and 11:15:59 AM no matter how quickly the brokered Workflows run to completion.
24
Broker in Action
  • Current time is 7:53 AM, so Broker is navigating the FALSE thread.


  • The two primary broker-dependent workflows (A & C) wait patiently to start.
25
The Broker waits for nothing!
  • Current time is 12:49 PM on December 18 (FALSE thread).
  • The bottom workflow was restarted at 11:28 AM, so it was successfully triggered by the Broker sometime after 11 AM on the same day.
  • The middle workflow was last restarted at 8:27 AM on the day before, so it is still waiting for an ancillary go file, yet the bottom workflow is unaffected by this delay.
26
The Scheduling Challenge
Example 2
  • Project: 10 workflows in 3 folders (A, B, and C)
  • Desired Execution:
    • Folder A workflows depend on a master go file from Oracle CDC.  Run these workflows every time a master CDC go file is presented (usually hourly).
    • Remove the ancillary Project’s go file if it was not consumed.  Place the ancillary Project’s go file after Folder A workflows finish.
    • Folder B and C workflows depend on Folder A finishing and the appearance of a secondary CDC go file.
    • If the secondary CDC go file does not appear, capture and sort records that were deleted by Folder A’s execution into date-time stamped text files for later processing by folders B and C.
    • If the secondary CDC go file does appear, consume it, then create list files for each group of date-time stamped text files that are read by workflows in folders B and C.
    • After folders B and C finish, delete the list files, remove any empty text files, then archive and compress the remaining text files into a single date-time stamped .tar file.
    • Do not restart Folder A workflows until Folder B and C workflows finish and the archival and compression process is successful, even if CDC presents a master go file before this process completes.
27
The Scheduling Requirements
Example 2 – Page 1
  • Execute the first workflow stream every time a master go file is presented from the CDC (usually every hour).


  • Execute the second and third workflow streams after the first stream finishes, but only if a secondary go file (different name) is placed by the CDC.


  • If the secondary go file is not presented, accumulate data captured from the first workflow stream into text files for eventual use by the two secondary workflow streams.
28
The Scheduling Requirements
Example 2 – Page 2
  • Do not trigger the initial workflow stream for re-execution until the two secondary workflow streams have completed or are bypassed, even if a master go file is presented again by the CDC.


  • Archive and compress accumulated data files generated by the initial workflow stream after the data has been read and applied by the two secondary workflow streams.


  • No scripting outside of Informatica, no stand-alone shells, no stored procedures, no Windows scheduled tasks, no cron, no complexity.
29
Solution: Execution Broker
  • Event Wait – wait for the primary go file.
  • Dummy Session – trigger the first workflow stream.
  • Event Wait – wait for the first workflow stream to complete.
  • Dummy Session – check for the secondary go file.
  • Event Wait – prohibit restart until the entire process is complete (whether that be the first stream only or all three streams).
30
Fork in the Road
  • If the secondary go file is present, trigger the secondary workflow streams.
  • If the secondary go file is not present, alert the Broker that everything is complete (bypass the archival and compression of the captured data).
31
Non-Broker Tasks
  • Use post-session success commands to date-time stamp generated text files.


  • Not a Broker task since this operation is necessary whenever text files are created (so keep this work in the project workflow).
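  • One possible post-session success command for this step; the file name and the use of the $PMTargetFileDir service variable are assumptions, since the actual command is not shown:

      mv $PMTargetFileDir/deleted_records.txt $PMTargetFileDir/deleted_records_`date +%Y%m%d_%H%M%S`.txt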
32
Generate List Files
  • Use post-session success commands to generate list files after secondary go files are presented.
  • Avoid “know it all / do it all” Brokers.  List files should be created every time the secondary workflows are triggered, so keep this work in the secondary project workflows.
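  • The list-file generation could be a post-session success command along these lines (file and directory names are hypothetical):

      ls $PMTargetFileDir/deleted_records_*.txt > $PMSourceFileDir/deleted_records.lst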
33
Manage Generated Files
  • Use dummy sessions in the Broker workflow to perform file management operations that do not belong in the Project workflows.


  • END BROKER EXAMPLE 2
34
Agent Workflow:
The Workflow Monitor
35
What is a Workflow Monitor?
  • A type of Agent Workflow and a delivered INFA module.


  • Monitors any number of workflows residing in multiple folders across any number of repositories.


  • Activates or inactivates workflows to monitor from a single control table.


  • Allows virtually any execution rule to be applied.
    • If the rule can be coded, it can be enforced.

  • Provides custom alert messages with specific limits on alert message frequency.
36
Why Build a Workflow Monitor?
  • Failure notifications are not enough.  You need to know when workflows are delayed, unscheduled, suspended, or not restarted after failures and manual stoppages.


  • You have a large number of workflows in several folders across multiple repositories to manage. You do not have time to navigate the Informatica Workflow Monitor looking for execution problems / delays.


  • You prefer a proactive approach to Workflow management.  You want to intercept and correct potential problems before it’s too late.
    • It’s too late when your customers call you about the problem.

  • You want error messages that are more detailed, more usable, and customized to your preferences to enable more efficient problem investigation and correction.


  • You want to hire a Junior Informatica Support Analyst but your IT budget will not allow it.
37
Monitoring Requirements
Page 1
  • Unlike the Broker (unique for each project), the Monitor should be project independent.


  • All run status codes should be supported, including stopped, failed, aborted, unknown, suspended, unscheduled, and terminated.


  • Alert text should include server, folder, workflow, workflow start time, workflow execution status, and rule violated.
38
Monitoring Requirements
Page 2
  • Stop sending alerts after the problem is corrected or a user-defined amount of time has passed.


  • Multiple execution periods must be supported (hourly, n per day, daily, once weekly, etc.)


  • Avoid sending outdated alerts by considering only the latest execution instance of each monitored workflow.


  • Easy administration.  Must have the ability to initiate or suspend monitoring on specific workflows without stopping the Monitor or making code changes in Designer.
39
The Workflow Monitor
40
Repository Views
  • Read two Informatica views to gather workflow execution statistics:
    • REP_WORKFLOWS
    • REP_WFLOW_RUN
41
Repository View SQL
  • Join views together at the server, folder, workflow level to ensure uniqueness across environments.


  • Use a correlated sub-query to limit the amount of execution history returned (the SYSDATE part) and to ensure only the latest execution is returned (the MAX WORKFLOW_RUN_ID part).
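  • A sketch of the query being described; the key columns (SERVER_NAME, SUBJECT_AREA, WORKFLOW_NAME, WORKFLOW_RUN_ID) follow the standard MX views but should be verified against your repository version, and the two-day history window is an assumption:

      SELECT w.SERVER_NAME,
             w.SUBJECT_AREA,
             w.WORKFLOW_NAME,
             r.START_TIME,
             r.END_TIME,
             r.RUN_STATUS_CODE
      FROM   REP_WORKFLOWS w,
             REP_WFLOW_RUN r
      WHERE  w.SERVER_NAME    = r.SERVER_NAME
      AND    w.SUBJECT_AREA   = r.SUBJECT_AREA
      AND    w.WORKFLOW_NAME  = r.WORKFLOW_NAME
      AND    r.WORKFLOW_RUN_ID =
             (SELECT MAX(r2.WORKFLOW_RUN_ID)          -- latest execution only
              FROM   REP_WFLOW_RUN r2
              WHERE  r2.SERVER_NAME   = r.SERVER_NAME
              AND    r2.SUBJECT_AREA  = r.SUBJECT_AREA
              AND    r2.WORKFLOW_NAME = r.WORKFLOW_NAME
              AND    r2.START_TIME    > SYSDATE - 2)  -- limit execution history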
42
Merge Execution Records
  • Merge execution records from the first two repositories into horizontal records using the Joiner transformation (full outer join).
43
Merge More Execution Records
  • Bring in another repository and join those records to the first two repositories with another Joiner transformation (full outer join).
44
Normalize Execution Records
  • Use a Normalizer transformation to convert the horizontal records into vertical records at the server, folder, workflow level.
45
Normalize Execution Records
  • Normalizer transformation configuration.
46
Lookup and Filter
  • Lookup against the Monitor Control Table to capture the frequency rule and the alert team assigned to each Workflow.


  • Filter workflows without a defined frequency rule or alert team.
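  • With the lookup return ports in hand, the Filter condition can be as simple as the following (port names are assumptions):

      NOT ISNULL(FREQUENCY_RULE) AND NOT ISNULL(ALERT_TEAM)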
47
Monitor Control Table
  • A single, centralized control table drives all monitoring operations.


  • Define the workflows to monitor based on the server and Informatica repository folder they reside in.


  • Easily modify the frequency rule, active flag, or alert team without changing ETL code.
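  • A possible shape for the control table, inferred from the attributes listed above; the table name, column names, and types are assumptions:

      CREATE TABLE WF_MONITOR_CONTROL
      ( SERVER_NAME     VARCHAR2(50)   NOT NULL
      , FOLDER_NAME     VARCHAR2(50)   NOT NULL
      , WORKFLOW_NAME   VARCHAR2(100)  NOT NULL
      , FREQUENCY_RULE  VARCHAR2(20)   NOT NULL   -- e.g. HOURLY, DAILY, WEEKLY
      , ACTIVE_FLAG     CHAR(1)        DEFAULT 'Y'
      , ALERT_TEAM      VARCHAR2(30)   NOT NULL   -- routes alerts to a subject-area file
      , CONSTRAINT WF_MONITOR_CONTROL_PK
          PRIMARY KEY (SERVER_NAME, FOLDER_NAME, WORKFLOW_NAME)
      );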
48
Apply Execution Rules
  • Armed with frequency rules, determine which workflows are in violation.
49
Determine and Generate Alert
  • v_ALERT_REQUIRED determines if the workflow is in violation of its execution requirement.
  • o_ALERT_MESSAGE generates the alert message if warranted.
50
Alert Required?
  • v_ALERT_REQUIRED determines if the workflow is in violation of its execution requirement while still avoiding stale alerts (false positives).
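  • A simplified sketch of the kind of logic v_ALERT_REQUIRED might hold for an hourly workflow; the run status codes, thresholds, and port names are assumptions, not the actual expression:

      IIF ( DATE_DIFF(SYSDATE, START_TIME, 'HH') > 24, 'N',       -- suppress stale alerts
      IIF ( FREQUENCY_RULE = 'HOURLY'
            AND DATE_DIFF(SYSDATE, START_TIME, 'HH') > 1, 'Y',    -- latest run is overdue
      IIF ( RUN_STATUS_CODE <> 1, 'Y',                            -- 1 = Succeeded (verify for your version)
            'N' )))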
51
Generate Alert Message
  • o_ALERT_MESSAGE generates the alert message if warranted.
  • If it is in the repository, it can be in your alert message.
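  • A sketch of how o_ALERT_MESSAGE could be assembled by concatenation (port names are assumptions):

      IIF ( v_ALERT_REQUIRED = 'Y',
            'Server: '     || SERVER_NAME   ||
            '  Folder: '   || SUBJECT_AREA  ||
            '  Workflow: ' || WORKFLOW_NAME ||
            '  Started: '  || TO_CHAR(START_TIME, 'MM/DD/YYYY HH24:MI:SS') ||
            '  Status: '   || TO_CHAR(RUN_STATUS_CODE) ||
            '  Rule: '     || FREQUENCY_RULE,
            NULL )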
52
Filter and Route Alert Messages
  • Capture all generated alerts and route them to subject-area specific files for email / PDA / pager distribution.
53
Workflow Monitor Sessions
  • Step 1: Capture all alerts into individual subject-area files.
  • Step 2: Determine which subject areas have alerts.
  • Step 3: Distribute the alert messages.
  • Why not use email tasks to send alert messages?
54
“Capture” Session Configuration
  • Repository connections are defined for each Source Qualifier.
  • Control table resides in a single database.
55
Filter Then Determine
  • Not all subject areas will have alerts.
  • A check of success rows at the target level is not available.
56
Determine Then Distribute
  • Determine which subject areas have alerts.
  • Distribute alert messages to subject-area support personnel.  Why not use an email task?
57
Determine and Distribute Map
  • The same map is run twice: once to determine whether there are alerts, then again to distribute the alert messages.


  • Why not run the map once (the determine step) then use an email task to distribute alert messages to subject-area support personnel?
58
Use an Email Task?
59
Configure the Email Task
  • Why won’t this work?
60
Configure the Email Task
  • Email tasks do not support file attachments.
61
Workflow Monitor
  • Capture Session is run once.
  • Determine and Distribute Sessions are run twice (when alerts exist) for each subject area being monitored.
62
Distribute via Post-Session
  • If alerts are present in the determination step, the distribution step sends the alerts.
63
Workflow Monitor in Action
  • At 7:45 and 8:45 AM, no alerts were found, so only the capture step ran.
  • At 9:45 AM, alerts were captured, so all determination steps ran.
  • Only one distribution step ran because all of the alerts belonged to a single subject area.
64
Session Statistics
  • Statistics on the capture session confirm that only the Benefits subject area generated an alert.
65
Alert Message Distributed
  • Alert message includes the server, folder, and workflow that violated an execution rule.  Supporting information includes the workflow start time, its current status, and the execution rule that was applied.


66
Confirm the Alert is Warranted
  • The top workflow caused the alert message to be generated and distributed.
  • The Informatica Workflow Monitor confirms the delay.
67
Workflow Monitor Considerations
  • The Monitor cannot monitor itself, so configure failure email notifications in each session.  Do not check “fail parent if this task fails” so the Monitor can still reschedule itself after a failure.


  • There are many ways to perform the alert determination and distribution steps.  For example, you could add a single command task after the determination step to check for populated alert files and then initiate a UNIX email command to send the alerts.  The solution demonstrated here uses Informatica in its purest form without the need to maintain UNIX scripting.
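  • If you did go that route, the command task might be a single line like this (the path, subject, and address are hypothetical, and mailx availability depends on your server):

      [ -s /infa/shared/alerts/benefits_alerts.txt ] && mailx -s "Workflow Monitor Alert: Benefits" benefits_support@example.com < /infa/shared/alerts/benefits_alerts.txt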
68
Questions?