1
|
- How to Execute and Supervise Workflows using Agent Workflows
|
2
|
- “Our end-to-end ETL process depends on embedded touch files that are
complicating our project workflows and making restarts difficult.”
- “It would have been nice to know before I left on Friday afternoon that
the session was running slow due to database contention.”
- “We have hundreds of workflows running out of 30 project folders across
three different repositories...I have no idea if everything was properly
restarted after the servers were bounced.”
- “We have workflows running at specific times during the day, when
certain files appear, and when other processes finish. How is it possible to build an
execution solution to tie all these workflows together?”
- “How can we delete touch files when our workflows are not scheduled to
start looking for touch files?”
- “We are using a third-party scheduling tool because Informatica can’t do
everything we need.”
|
3
|
- Execution Broker – Example 1
- Execution Broker – Example 2
- Workflow Monitor
- Questions
|
4
|
|
5
|
- An Agent Workflow provides execution
control, monitoring, and alert notification services for project
workflows.
- Agent Workflows typically do not perform data manipulation or
transformation duties.
|
6
|
- A type of Agent Workflow.
- Controls the execution order and dependencies of many other workflows.
- Provides the ability to determine the status of the entire process based
on the Broker’s status.
- Can suspend the entire process by stopping itself.
|
7
|
- Your scheduling requirements are not simple: they include atypical
scheduling operations and are subject to “if-then” scenarios.
- You want to limit (or eliminate) the work involved in maintaining
several custom Schedulers.
- You want to ensure that DTM delays do not result in hard-scheduled
workflows stepping on each other.
- Your execution plan includes many workflows running in several folders
that all need to play nicely together or bad things happen.
|
8
|
- Project: 4 workflows in 2 folders
- Desired Execution:
- Run once per day around 11 AM.
- Two workflows (A → B)
depend on a go file from Oracle CDC.
- Two workflows (C → D)
depend on two go files: the same Oracle CDC file plus another go file
from an ancillary Project.
- Do not delay workflows A & B if C & D are delayed by the
ancillary Project.
- Consume the Oracle CDC go file even if it is not time to run to ensure
a fresh go file is placed when it is time to run.
- Execution frequency could change to multiple times per day. Must scale quickly and easily and not
affect other workflows using the same Scheduler.
|
9
|
- Limit the number of execution controls inside the project workflows.
- Execute the workflow stream once per day, but make it easily scalable to
n times per day when required without creating a new scheduler or
modifying an existing one.
- When the workflow is not running, stale go / touch files from Oracle CDC
must be consumed every hour.
|
10
|
- When it is time to run, the workflow must wait for a fresh go file from
Oracle CDC.
- Do not delay the execution of all workflows just because one workflow is
delayed by a delinquent ancillary project system go file (wait for
nothing).
- The execution solution must have one code base that is deployable to any
project folder in any repository without modification (meaning no
hard-coding of file paths or database connections).
|
11
|
- No scripting outside of Informatica, no stand-alone shells, no stored
procedures, no Windows scheduled tasks, no cron, no complexity…no
problem!
|
12
|
- Decision Task – determines if it is time to do something.
- Dummy Sessions – accomplish the dirty work.
- Event Wait – just because it is time to do something doesn’t mean the
time is right.
- Timers – let’s not get carried away.
|
13
|
- True or False: Is it between 11
and 11:15:59 AM?
- Easily scales to multiple execution times with additional code.
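Outside Informatica, the Decision Task's time test amounts to a simple window predicate. A minimal Python sketch (the window bounds come from the slide; the function names are ours, not the Broker's):

```python
from datetime import datetime, time

def in_execution_window(now: datetime) -> bool:
    """The Decision Task's question: is it between 11:00:00
    and 11:15:59 AM right now?"""
    return time(11, 0, 0) <= now.time() <= time(11, 15, 59)

# Scaling to n executions per day is just more windows to test:
WINDOWS = [(time(11, 0, 0), time(11, 15, 59))]

def in_any_window(now: datetime) -> bool:
    return any(start <= now.time() <= end for start, end in WINDOWS)
```

Adding an evening run means appending one more (start, end) pair, which is all the "additional code" the scaling claim requires.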
|
14
|
- Navigate the upper thread when the decision resolves to false.
|
15
|
- Consume the stale go file.
- Clean up your mess.
- Notice $BadFileDir1.
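The dummy session's dirty work on the FALSE thread boils down to a file operation. A minimal sketch, assuming the go file is simply removed from disk (the function and path handling are ours):

```python
import os

def consume_stale_go_file(path: str) -> bool:
    """FALSE-thread housekeeping: remove a stale go file if one is
    present, so only a fresh file can satisfy the Event Wait when it
    is actually time to run.  Returns True if a file was consumed."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```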
|
16
|
- Use an Informatica parameter file to facilitate project folder
transparency.
- All spawned Execution Brokers and Project Workflows (per server)
reference the same parameter file residing in a single location.
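A parameter file along these lines supports that folder transparency; the section names, paths, and folder/workflow names below are illustrative (only $$GoFile is named in the slides):

```
[Global]
$$GoFile=/infa/shared/cdc/oracle_cdc.go

[ANY_PROJECT_FOLDER.WF:wf_Execution_Broker]
$$GoFile=/infa/shared/cdc/oracle_cdc.go
```

Because every spawned Broker and project workflow reads the same [Global] section, one code base deploys to any folder in any repository without hard-coded paths.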
|
17
|
- Wait 14 minutes from when this all began before restarting.
|
18
|
- If the allowed execution window is 16 minutes, why wait 14 minutes to
restart?
- Insurance and Workflow Log Overload.
- We want to guarantee the Decision Task resolves to TRUE once between 11
and 11:15:59 AM no matter when it checks the time. Secondly, let’s run the Broker 4
times per hour instead of 40.
|
19
|
- Navigate the lower thread when the decision task resolves to true.
|
20
|
- It’s time, but not really. Wait
for a fresh Oracle CDC go file.
- Note the use of $$GoFile – a Workflow Variable whose value is maintained
in the Informatica parameter file.
|
21
|
- Place the go files that open the flood gates.
- Clean up your mess.
- Notice $BadFileDir1.
|
22
|
- Wait 20 minutes from when this all began before restarting.
|
23
|
- If the allowed execution window is 16 minutes, why wait 20 minutes to
restart?
- Execution Frequency Requirement.
- We want to guarantee the Decision Task resolves to TRUE only once
between 11 and 11:15:59 AM no matter how quickly the brokered Workflows
run to completion.
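The two Timer values can be sanity-checked with a simulation: re-check every 14 minutes after a FALSE decision, every 20 minutes after a TRUE one, and count TRUE resolutions inside the window. This sketch assumes zero session run time, which the real Broker does not:

```python
from datetime import datetime, timedelta

def broker_true_count(first_check: datetime) -> int:
    """Simulate one day of Broker cycles and count how often the
    Decision Task resolves TRUE in the 11:00:00-11:15:59 window."""
    day = first_check.replace(hour=0, minute=0, second=0, microsecond=0)
    window_start = day.replace(hour=11)
    window_end = day.replace(hour=11, minute=15, second=59)
    now, trues = first_check, 0
    while now < day + timedelta(days=1):
        if window_start <= now <= window_end:
            trues += 1
            now += timedelta(minutes=20)   # TRUE thread Timer
        else:
            now += timedelta(minutes=14)   # FALSE thread Timer
    return trues
```

The 14-minute wait keeps checks frequent enough that one always lands inside the 16-minute window; the 20-minute wait jumps past the window's end so the task cannot resolve TRUE twice.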
|
24
|
- Current time is 7:53 AM, so the Broker is navigating the FALSE thread.
- The two primary broker-dependent workflows (A & C) wait patiently to
start.
|
25
|
- Current time is 12:49 PM on December 18 (FALSE thread).
- The bottom workflow was restarted at 11:28 AM, so it was successfully
triggered by the Broker sometime after 11 AM on the same day.
- The middle workflow was last restarted at 8:27 AM on the day before, so
it is still waiting for an ancillary go file, yet the bottom workflow is
unaffected by this delay.
|
26
|
- Project: 10 workflows in 3 folders (A, B, and C)
- Desired Execution:
- Folder A workflows depend on a master go file from Oracle CDC. Run these workflows every time a master CDC go
file is presented (usually hourly).
- Remove the ancillary Project’s go file if it was not consumed. Place the ancillary Project’s go file
after Folder A workflows finish.
- Folder B and C workflows depend on Folder A finishing and appearance of
a secondary CDC go file.
- If the secondary CDC go file does not appear, capture and sort records
that were deleted by Folder A’s execution into date-time stamped text
files for later processing by folders B and C.
- If the secondary CDC go file does appear, consume it, then create list
files for each group of date-time stamped text files that are read by
workflows in folders B and C.
- After folders B and C finish, delete the list files, remove any empty
text files, then archive and compress the remaining text files into a
single date-time stamped .tar file.
- Do not restart Folder A workflows until Folder B and C workflows finish
and the archival and compression process is successful, even if CDC
presents a master go file before this process completes.
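The final cleanup-and-archive step above can be sketched as follows; the .lst/.txt extensions, directory layout, and gzip compression are assumptions (the slides say only "list files", "text files", and a date-time stamped .tar file):

```python
import tarfile
from datetime import datetime
from pathlib import Path

def archive_capture_files(capture_dir: str):
    """After folders B and C finish: delete list files, drop empty
    text files, then archive the rest into one date-time stamped,
    compressed tar file.  Returns the archive path, or None if
    there was nothing to archive."""
    d = Path(capture_dir)
    for lst in d.glob("*.lst"):            # delete the list files
        lst.unlink()
    keep = []
    for txt in d.glob("*.txt"):
        if txt.stat().st_size == 0:        # remove empty text files
            txt.unlink()
        else:
            keep.append(txt)
    if not keep:
        return None
    tar_path = d / f"capture_{datetime.now():%Y%m%d%H%M%S}.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        for txt in keep:
            tar.add(txt, arcname=txt.name)
            txt.unlink()                   # archived, so remove source
    return str(tar_path)
```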
|
27
|
- Execute the first workflow stream every time a master go file is
presented from the CDC (usually every hour).
- Execute the second and third workflow streams after the first stream
finishes, but only if a secondary go file (different name) is placed by
the CDC.
- If the secondary go file is not presented, accumulate data captured from
the first workflow stream into text files for eventual use by the two
secondary workflow streams.
|
28
|
- Do not trigger the initial workflow stream for re-execution until the
two secondary workflow streams have completed or are bypassed, even if a
master go file is presented again by the CDC.
- Archive and compress accumulated data files generated by the initial
workflow stream after the data has been read and applied by the two
secondary workflow streams.
- No scripting outside of Informatica, no stand-alone shells, no stored
procedures, no Windows scheduled tasks, no cron, no complexity.
|
29
|
- Event Wait – wait for the primary go file.
- Dummy Session – trigger the first workflow stream.
- Event Wait – wait for the first workflow stream to complete.
- Dummy Session – check for the secondary go file.
- Event Wait – prohibit restart until the entire process is complete
(whether that be the first stream only or all three streams).
|
30
|
- If the secondary go file is present, trigger the secondary workflow
streams.
- If the secondary go file is not present, alert the Broker that
everything is complete (bypass the archival and compression of the
captured data).
|
31
|
- Use post-session success commands to date-time stamp generated text
files.
- Not a Broker task since this operation is necessary whenever text files
are created (so keep this work in the project workflow).
|
32
|
- Use post-session success commands to generate list files after secondary
go files are presented.
- Avoid “know it all / do it all” Brokers.
List files should be created every time the secondary workflows
are triggered, so keep this work in the secondary project workflows.
|
33
|
- Use dummy sessions in the Broker workflow to perform file management
operations that do not belong in the Project workflows.
- END BROKER EXAMPLE 2
|
34
|
|
35
|
- A type of Agent Workflow and a delivered INFA module.
- Monitors any number of workflows residing in multiple folders across any
number of repositories.
- Activates or inactivates workflows to monitor from a single control
table.
- Allows virtually any execution rule to be applied.
- If the rule can be coded, it can be enforced.
- Provides custom alert messages with specific limits on alert message
frequency.
|
36
|
- Failure notifications are not enough.
You need to know when workflows are delayed, unscheduled,
suspended, or not restarted after failures and manual stoppages.
- You have a large number of workflows in several folders across multiple
repositories to manage. You do not have time to navigate the Informatica
Workflow Monitor looking for execution problems / delays.
- You prefer a proactive approach to Workflow management. You want to intercept and correct
potential problems before it’s too late.
- It’s too late when your customers call you about the problem.
- You want error messages that are more detailed, more usable, and
customized to your preferences to enable more efficient problem
investigation and correction.
- You want to hire a Junior Informatica Support Analyst but your IT budget
will not allow it.
|
37
|
- Unlike the Broker (unique for each project), the Monitor should be
project independent.
- All run status codes should be supported, including stopped, failed,
aborted, unknown, suspended, unscheduled, and terminated.
- Alert text should include server, folder, workflow, workflow start time,
workflow execution status, and rule violated.
|
38
|
- Stop sending alerts after the problem is corrected or a user-defined
amount of time has passed.
- Multiple execution periods must be supported (hourly, n per day, daily,
once weekly, etc.)
- Avoid sending outdated alerts by considering only the latest execution
instance of each monitored workflow.
- Easy administration. Must have
the ability to initiate or suspend monitoring on specific workflows
without stopping the Monitor or making code changes in Designer.
|
39
|
|
40
|
- Read two Informatica views to gather workflow execution statistics:
- REP_WORKFLOWS
- REP_WFLOW_RUN
|
41
|
- Join views together at the server, folder, workflow level to ensure
uniqueness across environments.
- Correlate a sub-query to limit the amount of execution history returned
(the SYSDATE part) and ensure only the latest execution is returned (the
MAX WORKFLOW_RUN_ID part).
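Against mock tables, the join-plus-correlated-sub-query reads as below. Here sqlite stands in for the repository database, the column set is trimmed to the essentials, and the SYSDATE history-limiting predicate is omitted; verify the column names against the MX views in your PowerCenter version:

```python
import sqlite3

# Mock the two MX views with a minimal column set (the real views
# carry many more columns).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE REP_WORKFLOWS (
    SERVER_NAME TEXT, SUBJECT_AREA TEXT, WORKFLOW_NAME TEXT);
CREATE TABLE REP_WFLOW_RUN (
    SERVER_NAME TEXT, SUBJECT_AREA TEXT, WORKFLOW_NAME TEXT,
    WORKFLOW_RUN_ID INTEGER, START_TIME TEXT, RUN_STATUS_CODE INTEGER);
INSERT INTO REP_WORKFLOWS VALUES ('node01', 'HR', 'wf_LOAD_EMP');
INSERT INTO REP_WFLOW_RUN VALUES
    ('node01', 'HR', 'wf_LOAD_EMP', 101, '2010-06-01 07:00', 1),
    ('node01', 'HR', 'wf_LOAD_EMP', 102, '2010-06-01 08:00', 3);
""")

# Join at the server / folder / workflow level, then keep only the
# latest execution via a correlated MAX(WORKFLOW_RUN_ID) sub-query.
latest = con.execute("""
SELECT w.SERVER_NAME, w.SUBJECT_AREA, w.WORKFLOW_NAME,
       r.WORKFLOW_RUN_ID, r.START_TIME, r.RUN_STATUS_CODE
FROM REP_WORKFLOWS w
JOIN REP_WFLOW_RUN r
  ON  r.SERVER_NAME   = w.SERVER_NAME
  AND r.SUBJECT_AREA  = w.SUBJECT_AREA
  AND r.WORKFLOW_NAME = w.WORKFLOW_NAME
WHERE r.WORKFLOW_RUN_ID = (
    SELECT MAX(r2.WORKFLOW_RUN_ID) FROM REP_WFLOW_RUN r2
    WHERE r2.SERVER_NAME   = r.SERVER_NAME
      AND r2.SUBJECT_AREA  = r.SUBJECT_AREA
      AND r2.WORKFLOW_NAME = r.WORKFLOW_NAME)
""").fetchall()
```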
|
42
|
- Merge execution records from the first two repositories into horizontal
records using the Joiner transformation (full outer join).
|
43
|
- Bring in another repository and join those records to the first two
repositories with another Joiner transformation (full-outer join).
|
44
|
- Use a Normalizer transformation to convert the horizontal records into
vertical records at the server, folder, workflow level.
|
45
|
- Normalizer transformation configuration.
|
46
|
- Lookup against the Monitor Control Table to capture the frequency rule
and the alert team assigned to each Workflow.
- Filter workflows without a defined frequency rule or alert team.
|
47
|
- A single, centralized control table drives all monitoring operations.
- Define the workflows to monitor based on the server and Informatica
repository folder they reside in.
- Easily modify the frequency rule, active flag, or alert team without
changing ETL code.
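A control table covering the attributes named above might look like the following Oracle-style DDL; the table and column names are illustrative, not the delivered module's:

```sql
CREATE TABLE WF_MONITOR_CONTROL (
    SERVER_NAME     VARCHAR2(100) NOT NULL,
    FOLDER_NAME     VARCHAR2(100) NOT NULL,
    WORKFLOW_NAME   VARCHAR2(100) NOT NULL,
    FREQUENCY_RULE  VARCHAR2(30)  NOT NULL,  -- e.g. HOURLY, DAILY, WEEKLY
    ACTIVE_FLAG     CHAR(1)       DEFAULT 'Y',
    ALERT_TEAM      VARCHAR2(50)  NOT NULL,
    CONSTRAINT pk_wf_monitor_control PRIMARY KEY
        (SERVER_NAME, FOLDER_NAME, WORKFLOW_NAME)
);
```

Flipping ACTIVE_FLAG or changing FREQUENCY_RULE here initiates or suspends monitoring without touching ETL code.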
|
48
|
- Armed with frequency rules, determine which workflows are in violation.
|
49
|
- v_ALERT_REQUIRED determines if the workflow is in violation of its
execution requirement.
- o_ALERT_MESSAGE generates the alert message if warranted.
|
50
|
- v_ALERT_REQUIRED determines if the workflow is in violation of its
execution requirement while still avoiding stale alerts (false
positives).
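A hypothetical Python reimplementation of the two ports; the real logic lives in an Expression transformation, and every threshold below is an illustrative assumption:

```python
from datetime import datetime, timedelta

# Illustrative frequency rules: how long a workflow may go without
# starting before it is considered in violation.
RULE_LIMITS = {"HOURLY": timedelta(hours=2),
               "DAILY": timedelta(hours=26)}

MAX_ALERT_AGE = timedelta(hours=24)  # user-defined alert cut-off

def alert_required(rule: str, last_start: datetime, now: datetime) -> bool:
    """v_ALERT_REQUIRED: the workflow violated its frequency rule,
    but not so long ago that the alert would be stale."""
    age = now - last_start
    if age > RULE_LIMITS[rule] + MAX_ALERT_AGE:
        return False                  # stale: stop sending alerts
    return age > RULE_LIMITS[rule]

def alert_message(server, folder, workflow, start, status, rule):
    """o_ALERT_MESSAGE: if it is in the repository, it can be here."""
    return (f"{server}/{folder}/{workflow} started {start:%Y-%m-%d %H:%M}, "
            f"status {status}, violated rule {rule}")
```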
|
51
|
- o_ALERT_MESSAGE generates the alert message if warranted.
- If it is in the repository, it can be in your alert message.
|
52
|
- Capture all generated alerts and route them to subject-area specific
files for email / PDA / pager distribution.
|
53
|
- Step 1: Capture all alerts into individual subject-area files.
- Step 2: Determine which subject areas have alerts.
- Step 3: Distribute the alert messages.
- Why not use email tasks to send alert messages?
|
54
|
- Repository connections are defined for each Source Qualifier.
- Control table resides in a single database.
|
55
|
- Not all subject areas will have alerts.
- A target-level success-rows check is not available.
|
56
|
- Determine which subject areas have alerts.
- Distribute alert messages to subject-area support personnel. Why not use an email task?
|
57
|
- Same map is run twice; once to determine if there are alerts, then again
to distribute the alert messages.
- Why not run the map once (the determine step) then use an email task to
distribute alert messages to subject-area support personnel?
|
58
|
|
59
|
|
60
|
- Email tasks do not support file attachments.
|
61
|
- Capture Session is run once.
- Determine and Distribute Sessions are run twice (when alerts exist) for
each subject area being monitored.
|
62
|
- If alerts are present in the determination step, the distribution step
sends the alerts.
|
63
|
- At 7:45 and 8:45 AM, no alerts were found. Only the capture step runs.
- At 9:45 AM, some alerts were captured.
All determination steps run.
- Just one distribution step is executed since all alerts belong to one
subject area.
|
64
|
- Statistics on the capture session confirm that only the Benefits subject
area generated an alert.
|
65
|
- Alert message includes the server, folder, and workflow that violated an
execution rule. Supporting
information includes the workflow start time, its current status, and
the execution rule that was applied.
|
66
|
- The top workflow caused the alert message to be generated and
distributed.
- The Informatica Workflow Monitor confirms the delay.
|
67
|
- The Monitor cannot monitor itself.
If the Monitor fails, configure failure email notifications in
each session. Do not check “fail
parent if this task fails” so the Monitor will still reschedule itself
after failure.
- There are many ways to perform the alert determination and distribution
steps. For example, you could add
a single command task after the determination step to check for
populated alert files and then initiate a UNIX email command to send the
alerts. The solution demonstrated
here uses Informatica in its purest form without the need to maintain
UNIX scripting.
|
68
|
|