Showing posts with label fault handling.

Friday, June 14, 2013

SOA Black Belt Workshop, Day 3: Architecture Internals

The topic of today was architecture internals.

Adapters: added value 

We were supposed to start with fault handling, but the adapter session by Niall was moved forward. He is a very good presenter, but unfortunately part of the slide deck was the usual marketing mumbo jumbo about how easy it is to integrate with any system. Since this day was about architecture internals, I would have preferred a discussion on whether you want to use an adapter or build a 'proper' web service in Java. The lab that accompanied this session was fun: we needed to fix the adapters that had been configured beforehand. It was very much like what happens in real life.

Fault handling: expect the unexpected

The session on fault handling did not cover anything new as far as I am concerned. There is a lot of material out there that covers this topic. A lot of emphasis was placed on transaction management. This makes sense in the context of fault handling, but it made the whole thing a bit repetitive. I would have preferred a shorter presentation about the different fault types, followed by a lab in which we would actually build complicated fault handling scenarios (catchAll, fault policies, rolling back transactions).

Security: we all have our role

Flavius explained the OPSS framework, the virtual directory and how you can manage application roles in Enterprise Manager. This is particularly relevant in a human task service. A lot of people I know use groups in LDAP for that, which makes the groups way too fine-grained to be manageable. The other feature I was not aware of was using a lightweight OVD by turning on 'virtual' in the WebLogic console.


Performance tuning essentials: a journey through the database 

Niall then traced all the database inserts and updates for a simple BPM process. It was interesting, but the title was misleading. The important part to remember is that every design decision you make in your composite as a developer will result in a write to the database. Performance is not always the key requirement, but it is something to take into account, of course.

Fusion Apps: OER is the new iRep

Niall showed us the Oracle Enterprise Repository for Fusion Apps. It contains all the business objects, Web Services and other artefacts that you need to integrate with Fusion Apps. A lot of the integration is still based on an API model, rather than services. From a product vendor perspective it makes sense, but as an implementer you need to make sure that you don't expose all these fine-grained services on your Enterprise Service Bus.

Anti-patterns are the new patterns

Ravi Sankaran did the last session over the phone. He presented a number of anti-patterns, the reasons why they occur, and recommendations on how to avoid them or apply them only when needed. The content was interesting, but listening to somebody over the phone in a hot room (27 degrees and humid outside) after three days of training is not the best condition for knowledge transfer. I will have to take a look at the slides as soon as they are available.

The evening

Jürgen organized a boat tour and a really really nice dinner at a very interesting restaurant and handed everyone their blackbelt. It was a great way to talk to some new people and to see something of the city at the same time. 

Tomorrow is the last day, then it is back to reality again...

Thursday, May 9, 2013

Article published | Fault Handling and Prevention (II)


Oracle Technology Network (OTN) published the article Fault Handling and Prevention - Part 2 (Fault Handling and Prevention for Services in Oracle Service Bus) by Ronald van Luttikhuizen and Guido Schmutz.


Part 1 of this article series on Fault Handling and Prevention discussed what fault handling is and why it is important. It also addressed the specific challenges in handling faults in a service-oriented landscape as compared to traditional systems. Part 1 concluded by presenting a sample scenario, an Order process implemented in a BPM and SOA environment, discussed potential pitfalls, and described generic fault prevention and recovery patterns.

Part 2 concentrates on concrete fault handling and prevention measures in the integration layer that are realized through Oracle Service Bus (OSB). The integration layer covers typical elements and integration functionality, such as Adapters for connectivity to back-end systems, Routing, Transformation, and Filtering.


More resources



Stay tuned for parts III and IV of the article! 

About the authors


Ronald van Luttikhuizen is Managing Partner and Architect with Vennster and an Oracle ACE Director
Guido Schmutz is Technology Manager for SOA and Emerging Trends with Trivadis and an Oracle ACE Director

Wednesday, November 28, 2012

Article published | Fault Handling and Prevention (I)


Oracle Technology Network (OTN) published the article Fault Handling and Prevention - Part 1 (An Introduction to Fault Handling in a Service-Oriented Environment) by Ronald van Luttikhuizen and Guido Schmutz.

It is one thing to design and code the "happy flow" of your automated business processes and services. It is another thing to deal with unwanted, unexpected situations that might occur in your processes and services. The article, the first in a four-part series, will dive into fault handling and prevention in an environment based on Service-Oriented Architecture (SOA) and Business Process Management (BPM) principles. You will learn about the different types of faults that can occur and how fault handling in an SOA environment differs from fault handling in traditional systems. We will investigate what can go wrong in such environments based on a case study of an Order-to-Cash business process. For each of these problems you will learn about the out-of-the-box capabilities in Oracle Service Bus and Oracle SOA Suite that can be applied to prevent faults from happening and to deal with them when they do occur.

More resources



Stay tuned for parts II, III, and IV of the article! 


About the authors


Ronald van Luttikhuizen is Managing Partner and Architect with Vennster and an Oracle ACE Director
Guido Schmutz is Technology Manager for SOA and Emerging Trends with Trivadis and an Oracle ACE Director

Friday, November 23, 2012

DOAG 2012

This week the German Oracle User Group, or DOAG as it is called in German, held its yearly conference. As in previous years, the location was the conference center in Nuremberg, a beautiful city in the south.

Vennster was well represented in the SOA/BPM space. We presented the following sessions:
  • SOA Made Simple: service design (Ronald van Luttikhuizen)
  • SOA Made Simple: creating a roadmap for your SOA (Lonneke Dikmans) 
  • Effective Fault Handling in Oracle SOA Suite 11g (Ronald van Luttikhuizen)
  • Introduction in Eventing in SOA Suite 11g (Ronald van Luttikhuizen)
  • Using the B2B Adapter in a Dutch government project (Ronald van Luttikhuizen)
  • Securing heterogeneous systems using Oracle WebServices Manager (Ronald van Luttikhuizen and Jens Peters)
  • Deployment in Oracle SOA Suite and Oracle BPM Suite (Lonneke Dikmans)
  • Stop generating your User Interface! Start designing it (Lonneke Dikmans)
You can find the slides by Ronald and me on SlideShare.

Of course there were also other presentations by other presenters ;) DOAG is a big conference, with over 400 presentations. Most of them cover cases, others explain the latest developments. There are a number of tracks that are of interest if you are working in the 'middleware space': BPM, Middleware & SOA, Development, Java, and Strategy and Business. The English-language sessions are not as popular as the German-language sessions, but both are well attended.

I visited three sessions: a case study titled "Dynamische Benutzer-Workflows mit SOA und BPM-Suite" by Arne Brüning, a session about the new developments in EclipseLink called "The Evolution of Java Persistence" by Doug Clarke, and a session titled "NoSQL and SQL: Blending the Best of Both Worlds" by Andrew Morgan. All three happened to be presented by Oracle. They were very different in nature. The workflow session discussed a customer case. It was interesting from that point of view. I would have preferred more technical depth, but the presenter was well prepared and had an interesting story to tell. The session by Doug about EclipseLink gave a nice overview of the latest developments and put them into the perspective of the history of TopLink and EclipseLink. I think that this is a good strategy: it shows that EclipseLink is both proven and modern. It has been around for years and part of the original team is still working on it, and they have solutions for new developments like JSON, REST services, NoSQL and multi-tenancy. The final presentation was an example of how not to do that. The presenter put NoSQL in the title in an attempt to attract a crowd, but the session was really about MySQL clusters. A lot of people left the session while he was talking, because it was completely off topic. The presentation itself was not bad, but the title was misleading.

Unfortunately I did not have time to see more sessions, because of all the presentations we were doing ourselves. There certainly was a lot more I would have liked to listen to and I hope we will be back next year!



Wednesday, October 3, 2012

Presentations at OpenWorld 2012


This post contains a wrap-up of our presentations at OpenWorld 2012.

Oracle Fusion Middleware Live Application Development (UGF10464)

In this three-hour show moderated by Duncan Mills and Chris Muir, the audience could experience the dynamics between three different teams that are building an application based on Oracle Fusion Middleware:

  1. User Interface in ADF;
  2. Services in SOA Suite and Oracle Database; 
  3. Business processes in BPM Suite.

Behind the scenes at the presentation

It's the fourth time Vennster participated in the Live FMW Development sessions after appearances at ODTUG Kaleidoscope, UKOUG, and OBUG. For OpenWorld 2012 the team that prepared the application consisted of Lucas Jellema, Luc Bors, Aino Andriessen, Guido Schmutz, Lonneke Dikmans, and Ronald van Luttikhuizen. This time we tried a different approach in which we pre-built the application and focused on explaining and demoing it in the first part of the session. After that we made several changes and deployed the improved software components.




Some best-practices the team discussed:

  • Use Business Rules to allow for runtime modification of fast changing business logic instead of design time modifications and redeployment of services. Encapsulate useful Business Rules as separate services instead of adding them to existing SCA composites.
  • Include a heartbeat operation for every Web Service (e.g. by using the Mediator's Echo activity) so you can verify that all technical layers of the Web Service work without triggering a functional side effect. 
  • Invoke PL/SQL from DB Adapters instead of directly executing CRUD operations for additional decoupling.
  • Decouple components and introduce additional reliability and robustness by using events.

Effective fault handling in SOA Suite 11g (CON4832)

In this co-presentation with Guido Schmutz we explored how the out-of-the-box frameworks, patterns, and tools that are available in Oracle Service Bus and Oracle SOA Suite can help you to implement fault prevention and handling capabilities.

360 view during the Fault Handling presentation

The session was pretty well attended, with an extended Q&A session at the end. I never had so many questions after a presentation; I don't know if that's a compliment or not ;-) Some of the questions raised:

  • Wrapping asynchronous message exchanges as synchronous exchanges and vice versa. 
  • Where to execute long-running and stateful processes: SOA Suite rather than OSB.
  • Fault handling in fire-and-forget message exchanges: if there's no callback, implement fault handling in the service that is being called.
  • Can the Fault Management Framework of SOA Suite be used to catch internal BPEL faults? Use catch activities for that purpose.
  • Transaction boundaries, dehydration points, and global transaction timeouts.
  • Compensation versus rollbacks.
  • Chaining exception policies using the Fault Management Framework.

The slides are available from Slideshare and answer some of these questions. A series of articles is underway that dives deeper into these subjects!


Thursday, September 20, 2012

Fault Handling Slides and Q&A


AMIS organized its annual Oracle OpenWorld and JavaOne preview event last Tuesday. The event is organized for people that don't attend OpenWorld and for presenters to rehearse their sessions and get feedback from the audience. Vennster gave two presentations:

  • Using Eclipse DBWS to Interface with Legacy Applications - Lonneke Dikmans
  • Effective Fault Handling in Oracle Service Bus and SOA Suite 11g - Ronald van Luttikhuizen; co-presented at OpenWorld with Guido Schmutz, Technology manager at Trivadis

This blog includes the updated slides of the Fault Handling presentation and contains a Q&A section that answers questions from the audience at the preview event. Lonneke will post the slides of her presentation in a follow-up blog. Last but not least, thanks to AMIS for hosting the event!

Fault Handling 

It is one thing to architect, design, and code the “happy flow” of your automated business processes and services. It is another thing to deal with situations you do not want or expect to occur in your processes and services. This session dives into fault handling in Oracle Service Bus 11g and Oracle SOA Suite 11g, based on an order-to-cash business process. Faults can be divided into business faults, technical faults, programming errors, and faulty user input. Each type of fault needs a different approach to prevent it from occurring or to deal with it. For example, apply User Experience (UX) techniques to improve the quality of your application so that faulty user input can be prevented as much as possible. This session shows and demos what patterns and techniques can be used in Oracle Service Bus and Oracle SOA Suite to prevent and handle technical faults as well as business faults.


Q&A

This section lists answers to the questions that were raised during the preview event.

Q: Where can retries be configured in Oracle Service Bus?
The retry mechanism is used to prevent faults caused by temporary glitches such as short network interruptions. A faulted message is resent (retried) and might succeed this time because the glitch has passed. Retries are an out-of-the-box feature in Oracle Service Bus, and in Oracle SOA Suite through the Fault Policy framework. By default, retries are disabled in Oracle Service Bus.

In Oracle Service Bus retries can be configured for several artifacts, among others the following:

  • Retries can be configured on Business Services as part of their Transport Configuration. 
  • Retries can be configured for inbound JCA-based Proxy Services as endpoint properties: jca.retry.count, jca.retry.interval, jca.retry.backoff, and jca.retry.maxInterval.
  • Retries can be configured for outbound JCA-based Business Services using the same endpoint properties.
  • Retries can be configured for JMS Proxy Services and (S)FTP Proxy Services as part of the Advanced Settings.
See Oracle's Developer’s Guide for Oracle Service Bus 11g for all possible retry configurations.
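
As a rough illustration of how the four JCA retry properties listed above interact, the sketch below computes the wait time before each retry. It assumes that the interval is multiplied by the backoff factor after every attempt and capped at the maximum interval; check the adapter documentation for your version before relying on that. The class and method names are made up for this example and are not part of any Oracle API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an Oracle API. Assumption: each wait is the previous
// wait multiplied by the backoff factor, capped at the maximum interval.
final class JcaRetrySchedule {

    // Returns the wait time (in seconds) before each of the `count` retries.
    static List<Long> delaysInSeconds(int count, long interval, int backoff, long maxInterval) {
        List<Long> delays = new ArrayList<>();
        long delay = interval;
        for (int attempt = 0; attempt < count; attempt++) {
            delays.add(Math.min(delay, maxInterval));
            delay = Math.min(delay * backoff, maxInterval);
        }
        return delays;
    }

    public static void main(String[] args) {
        // Models jca.retry.count=4, jca.retry.interval=5,
        // jca.retry.backoff=2, jca.retry.maxInterval=30.
        System.out.println(delaysInSeconds(4, 5, 2, 30));
    }
}
```

With a backoff factor of 1 the schedule degenerates to a fixed interval, which is the simplest setting to reason about when transaction boundaries are involved.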

Mind transaction boundaries and write operations when configuring retries. For example, consider a flow that is composed of two actions in which the first action inserts a record in the database and then commits, while the second action invokes a Web Service. When the second action fails and the entire flow is retried, you might end up with two new, identical records.
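
One common way to make such a flow safe to retry is to make the write idempotent, for example by keying the insert on a business identifier. The sketch below only simulates the idea: a Map stands in for the database table, the "order id" key is an assumption, and none of the names come from Oracle Service Bus.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation only: a Map stands in for the database table and orderId is an
// assumed business key. No Oracle Service Bus API is involved.
final class IdempotentInsertDemo {

    private final Map<String, String> orders = new LinkedHashMap<>();
    private int serviceCalls = 0;

    // Insert keyed on the business identifier; a repeated insert is a no-op.
    void insertOrder(String orderId, String payload) {
        orders.putIfAbsent(orderId, payload);
    }

    // Fails on the first call to mimic a temporary glitch, succeeds afterwards.
    void invokeWebService() {
        serviceCalls++;
        if (serviceCalls == 1) {
            throw new IllegalStateException("temporary network glitch");
        }
    }

    // The whole flow is retried, so the insert runs again on every attempt.
    // Returns the number of rows in the simulated table afterwards.
    int runFlowWithRetries(String orderId, String payload, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                insertOrder(orderId, payload);
                invokeWebService();
                break; // both actions succeeded
            } catch (IllegalStateException e) {
                // second action failed: fall through to the next attempt
            }
        }
        return orders.size();
    }

    public static void main(String[] args) {
        int rows = new IdempotentInsertDemo().runFlowWithRetries("order-42", "<order/>", 3);
        System.out.println("rows after retry: " + rows);
    }
}
```

The alternative is to move the commit so that both actions share one transaction, which avoids the duplicate without needing a natural key.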

Q: What other load-balancing algorithms can be used besides round-robin?
When there are multiple instances of a service that are invoked from Oracle Service Bus, load-balancing can be configured on Business Services. Load-balancing can prevent faults by retrying messages that were sent to unavailable endpoints to other, hopefully active endpoints. Load-balancing is disabled by default. When enabled, the default algorithm that is used for load-balancing is round-robin.

Oracle's Developer’s Guide for Oracle Service Bus 11g lists all possible load-balancing algorithms:

"Specify the load balancing algorithm as any one of the following values:
  • Round-robin - This algorithm dynamically orders the URLs that you enter in the Endpoint URI field for this business service. If the first one fails, it tries the next one, and so on until the retry count is exhausted. For every new message, there is a new order of URLs.
  • Random - This algorithm randomly orders the list of URLs that you enter in the Endpoint URI field for this business service. If the first one fails, it tries the next one, and so on until the retry count is exhausted.
  • Random-weighted - This algorithm randomly orders the list of URLs that you enter in the Endpoint URI field for this business service, but some are retried more than others based on the value you enter in the Weight field.
  • None - This algorithm orders the list of URLs that you enter in the Endpoint URI field for this business service from top to bottom."
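
To make the round-robin description above more concrete, here is a minimal sketch of "a new order of URLs for every message" combined with failover to the next URL. It is an illustration of the quoted behaviour, not Oracle Service Bus code; the class name, the availability check and the way the retry count bounds the attempts are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustration only, not Oracle Service Bus code: rotates the endpoint list per
// message (round-robin) and fails over to the next URL when a call fails.
final class RoundRobinFailover {

    private final List<String> endpoints;
    private int nextStart = 0;

    RoundRobinFailover(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        this.endpoints = new ArrayList<>(endpoints);
    }

    // Returns the endpoints in the order they would be tried for the next message.
    List<String> orderForNextMessage() {
        List<String> order = new ArrayList<>();
        for (int i = 0; i < endpoints.size(); i++) {
            order.add(endpoints.get((nextStart + i) % endpoints.size()));
        }
        nextStart = (nextStart + 1) % endpoints.size();
        return order;
    }

    // Tries the endpoints in order: one initial attempt plus at most retryCount
    // retries. Returns the URL that accepted the message, or null if all failed.
    String send(Predicate<String> isAvailable, int retryCount) {
        List<String> order = orderForNextMessage();
        for (int i = 0; i < order.size() && i <= retryCount; i++) {
            if (isAvailable.test(order.get(i))) {
                return order.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RoundRobinFailover lb =
            new RoundRobinFailover(Arrays.asList("http://a", "http://b", "http://c"));
        // Endpoint "http://a" is treated as unavailable in this demo.
        System.out.println(lb.send(url -> !url.equals("http://a"), 2));
        System.out.println(lb.send(url -> !url.equals("http://a"), 2));
    }
}
```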
Q: How is throttling implemented in Oracle Service Bus (in-memory, JMS, etc.)? Are messages in the throttling queue lost when the server fails?
Throttling introduces a message queue between a Proxy Service and a Business Service that temporarily stores messages to prevent backend systems from overloading. Whenever the number of messages exceeds the thresholds as configured in the throttling settings, the message is stored in a queue. When the number of messages that need to be processed decreases, messages are removed from the queue and sent to the backend system via the Business Service.

The Oracle Administrator’s Guide for Oracle Service Bus 11g discusses throttling and its configuration:

"A throttling queue is an in-memory queue. Messages that are placed in this queue are not recoverable when a server fails or when you restart a server. When you delete or rename a business service, all the messages in the throttling queue are discarded."
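
The mechanism can be pictured with a small conceptual model: a concurrency limit in front of the backend and a bounded, purely in-memory queue for the overflow. This is not Oracle Service Bus code; the class, the parameter names and the choice to reject messages when the queue is full are assumptions made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Conceptual model only, not Oracle Service Bus code. The names and the
// "reject when the queue is full" behaviour are assumptions for this sketch.
final class ThrottlingSketch {

    private final int maxConcurrency;
    private final int maxQueueLength;
    // Held in memory only: the contents are gone after a crash or restart.
    private final Queue<String> waiting = new ArrayDeque<>();
    private int inFlight = 0;

    ThrottlingSketch(int maxConcurrency, int maxQueueLength) {
        this.maxConcurrency = maxConcurrency;
        this.maxQueueLength = maxQueueLength;
    }

    // Returns "SENT", "QUEUED" or "REJECTED" for an incoming message.
    String accept(String message) {
        if (inFlight < maxConcurrency) {
            inFlight++;
            return "SENT";
        }
        if (waiting.size() < maxQueueLength) {
            waiting.add(message);
            return "QUEUED";
        }
        return "REJECTED";
    }

    // Called when the backend finishes a message. Returns the queued message
    // that takes the freed slot, or null if nothing was waiting.
    String complete() {
        if (inFlight == 0) {
            return null; // nothing was in flight
        }
        String next = waiting.poll();
        if (next == null) {
            inFlight--; // slot stays free
        }
        return next;
    }

    public static void main(String[] args) {
        ThrottlingSketch throttle = new ThrottlingSketch(1, 1);
        System.out.println(throttle.accept("m1"));
        System.out.println(throttle.accept("m2"));
        System.out.println(throttle.accept("m3"));
        System.out.println(throttle.complete());
    }
}
```

Because the queue lives in memory, messages that must survive a restart need a durable hop (for example JMS) in front of the throttled Business Service.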

Q: What values can be used as key for the Result Cache feature?
The result cache in Oracle Service Bus can be used to store data in memory. When data is queried that is present in the cache, the data is fetched from memory instead of the query being sent to, and executed by, the backend system. This decreases the load on the backend system. Note that this feature is only suitable for data that is static, meaning that it seldom changes. Otherwise the cache will return stale, out-of-date information. As part of the Result Cache feature you need to configure the key that is used to determine if the associated data is in the cache.

Oracle's Developer’s Guide for Oracle Service Bus 11g says:

"Cache Token Expression – Oracle Service Bus uses a cache key to identify cached results for retrieval or population, and the cache token portion of the cache key provides the unique identifier. You can use an expression—the Cache Token Expression—to generate the cache token part of the cache key to uniquely identify a cached result for the business service.

The Cache Token Expression must resolve to a String or the value of simple content, such as an attribute or an element with no child elements. If the expression evaluates to null or causes an error, results are not cached."
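
The role of the cache token can be sketched as follows: results are stored under a token derived from the request, and a missing token means the call bypasses the cache. This is a conceptual model in plain Java, not the Oracle Service Bus implementation; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Conceptual model only, not Oracle Service Bus code: results are cached under
// a token derived from the request; a null token disables caching for that call.
final class ResultCacheSketch {

    private final Map<String, String> cache = new HashMap<>();
    private int backendCalls = 0;

    // cacheToken plays the role of the evaluated Cache Token Expression.
    String call(String cacheToken, String request, Function<String, String> backend) {
        if (cacheToken == null) {
            backendCalls++;
            return backend.apply(request); // no token: result is not cached
        }
        String cached = cache.get(cacheToken);
        if (cached != null) {
            return cached; // cache hit: backend is not invoked
        }
        backendCalls++;
        String result = backend.apply(request);
        cache.put(cacheToken, result);
        return result;
    }

    int backendCalls() {
        return backendCalls;
    }

    public static void main(String[] args) {
        ResultCacheSketch service = new ResultCacheSketch();
        // The customer id from the request is used as the token in this demo.
        service.call("customer-1", "<getCustomer id='1'/>", request -> "Alice");
        service.call("customer-1", "<getCustomer id='1'/>", request -> "Alice");
        System.out.println("backend calls: " + service.backendCalls());
    }
}
```

A good token is therefore a value that uniquely identifies the answer, such as a customer or product identifier taken from the request.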

Friday, March 9, 2012

Enterprise Manager showing use of custom fault handler

In a recent project I explained the fault handling mechanism used in our Oracle SOA Suite 11g composites to a new team member. That particular project used a custom fault handler as part of the overall fault handling strategy. Custom fault handlers are "hooks" in SOA Suite 11g that you can use to plug in your own fault-handling code. In some situations this can be useful as a substitute for, or an addition to, the fault handlers that are provided out of the box, such as retry and human intervention.

The new team member wanted to know whether faults were handled by the default actions provided by Oracle SOA Suite, or handed over to the custom Java class that implemented the custom fault handler. Besides adding logging to the custom fault handler Java class itself, you can also use Enterprise Manager to find out.

In Enterprise Manager you can inspect which fault handlers are executed, including the custom handlers (and their implementation class), which improves traceability. The figure below shows the instance trail of an errored process instance for which SOA Suite has invoked a custom fault handler com.odtug.soa.tooling.FaultHandlerJavaAction. (This image was taken from my fault handling presentation at Kaleidoscope.)



Note that fault handling in Oracle SOA Suite 11g itself is covered in some previous blog posts; see part I, part II, part III, and part IV of Fault handling in Oracle SOA Suite 11g. These posts also discuss the use of custom fault handlers. There are some steps involved in creating and deploying a custom fault handler, and the first time around this can be a non-trivial task. The steps required to deploy a custom fault handler are documented in the Oracle Fusion Middleware Developer's Guide for Oracle SOA Suite.

Monday, September 20, 2010

Fault handling in Oracle SOA Suite 11g - Part IV

See part I, part II, and part III of this blog for more information on fault handling. The last component of our fault handling framework is the SCA composite that acts as generic fault handler. An example of such a composite would roughly do the following:

  • Dequeue an event from the fault queue causing an instance of this composite to be created;
  • Retrieve the fault information using the event payload and Oracle SOA Suite APIs;
  • Initiate a Human Task to notify administrators a fault has occurred in some composite instance.

You could either choose to pass the fault information to the Human Task itself or leave this to the application displaying the Human Task and the relevant information to deal with this task. In this case the fault information.

Since most of the above is straightforward we will focus on retrieving the fault information using the Oracle SOA Suite APIs.

Retrieving fault information
Here are some snippets from a Java class that retrieves the fault information. This Java class could be exposed as a Web Service, an EJB Session Bean, or through some other technology so it can be invoked from SCA composites.

// Look up all faults that were recorded for the given ECID.
Locator locator = LocatorFactory.createLocator();
FaultFilter faultFilter = new FaultFilter();
faultFilter.setECID(ecid);
List<Fault> faults = locator.getFaults(faultFilter);

You could extend this example and use the Locator API to retrieve additional information such as the composite sensor data belonging to the composite instance that faulted. That way the administrators will have more information on the SCA composite instance.

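Locator locator = LocatorFactory.createLocator();
Composite composite = locator.lookupComposite(compositeDN);
// Find the instance(s) of this composite that belong to the given ECID.
CompositeInstanceFilter compositeInstanceFilter = new CompositeInstanceFilter();
compositeInstanceFilter.setECID(ecid);
List<CompositeInstance> instances = composite.getInstances(compositeInstanceFilter);
// Only read the sensor data if an instance was actually found.
List sensors = instances.isEmpty() ? null : instances.get(0).getSensorData();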

And that concludes the final component of our generic fault handler!

Some notes gathered during the further implementation of this fault handler:

Faults that occur in BPEL flows, other than in Invoke activities, will not be caught by the fault handling framework. An example would be an incorrect XPath expression in an Assign activity. You will need to use some other mechanism such as the Catch and CatchAll activities for that. These handlers could then enqueue an event on the same fault queue as our fault handler does. Or you could test your SCA composites using SOA Suite’s out-of-the-box test framework to minimize the chance of errors in the BPEL flow itself. Usually there is a higher occurrence of runtime or unexpected faults when invoking external components such as Web Services than in your own BPEL components (given of course that you test your software).

It seems that not all faults are registered in the SOAINFRA database when “ora-terminate” is used as fault action. This applies especially to faults that occur in Invoke activities of BPEL flows (compared to faults in Mediators and Adapters). When switching to “ora-retry” instead, faults and their information are stored in the COMPOSITE_INSTANCE_FAULT table. Switching from terminate to retry as outcome means that the SOA composite in which the fault occurred will remain in “RUNNING” state according to Enterprise Manager and will not be terminated.

Wednesday, September 15, 2010

Fault handling in Oracle SOA Suite 11g - Part III

Let's pick up the previous posts on fault handling (part I and part II) from where we left off: our custom Java class that handles faults. Remember that our class is supposed to enqueue an event containing the fault’s identifier and return the action to be executed by Oracle SOA Suite’s fault handling framework. Retrieval of the fault information itself is done by the SCA composite that acts as generic fault handler. This composite will be initiated based on the fault event and retrieve the fault information based on its identifier in the event payload using the Oracle SOA Suite APIs. The composite will then initiate a Human Task to notify administrators that there was a fault in one of our composite instances.

Note that we enqueue a fault identifier (its ECID) instead of the fault information itself. When executing the Java class, Oracle SOA Suite is still in the middle of the fault handling mechanism. That means the fault and its corresponding information are not yet fully stored and accessible. After the event is published and control is returned from the Java fault handler class, Oracle SOA Suite will complete the fault handling mechanism and all fault information will be accessible using, for instance, the SOA Suite APIs.

The Java code
The Java class needs to implement the “IFaultRecoveryJavaClass” interface and its handleFault method. This method receives the fault context. The method enqueues an event containing the fault identifier on an AQ queue and returns “ora-terminate” to the fault handling framework. Alternatively you can use JMS or EDN as queuing infrastructure. The choice depends on requirements, durability, personal preference, and so on. For an example of publishing events on the Event Delivery Network using Spring you can read this blog by Guido Schmutz.

You will need to import the following libraries and JAR files to make the class compile:

  • SOA Runtime
  • Oracle JDBC
  • Java EE 1.5 API
  • Oracle XML Parser v2
  • SOA Designtime
  • SOA Workflow
  • WebLogic 10.3 Remote-Client


The class roughly looks like this:

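package nl.vennster;

import java.util.UUID;
import oracle.integration.platform.faultpolicy.IFaultRecoveryContext;
import oracle.integration.platform.faultpolicy.IFaultRecoveryJavaClass;

public class MyFaultPolicyJavaAction implements IFaultRecoveryJavaClass {

    // Required by the interface; nothing to do here for this handler.
    public void handleRetrySuccess(IFaultRecoveryContext ctx) {
    }

    public String handleFault(IFaultRecoveryContext ctx) {
        UUID uuid = UUID.randomUUID();
        try {
            enqueueAqEvent(createEventPayload(ctx), uuid);
        } catch (Exception e) {
            // Enqueueing failed; report it and still return the fault action.
            e.printStackTrace();
        }
        return "ora-terminate";
    }

}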

The helper method to create the AQ event looks like the following:

private String createEventPayload(IFaultRecoveryContext context) {
    String eventPayload = " UNKNOWN_ECID";
    if (context instanceof RejectedMsgRecoveryContext) {
        RejectedMsgRecoveryContext rejectedMessageContext = (RejectedMsgRecoveryContext) context;
        String ecid = null;
        // Prefer the ECID of the rejected message, fall back to the ECID of the fault.
        if (rejectedMessageContext.getRejectedMessage() != null &&
            rejectedMessageContext.getRejectedMessage().getEcid() != null) {
            ecid = rejectedMessageContext.getRejectedMessage().getEcid();
        }
        else if (rejectedMessageContext.getFault() != null &&
            rejectedMessageContext.getFault().getECID() != null) {
            ecid = rejectedMessageContext.getFault().getECID();
        }
        if (ecid != null) {
            eventPayload = eventPayload.replace("UNKNOWN_ECID", ecid);
        }
    }
    else if (context instanceof BPELFaultRecoveryContextImpl) {
        BPELFaultRecoveryContextImpl bpelFaultRecoveryContextImpl = (BPELFaultRecoveryContextImpl) context;
        eventPayload = eventPayload.replace("UNKNOWN_ECID", bpelFaultRecoveryContextImpl.getECID());
    }

    return eventPayload;
}

Finally, the helper method to enqueue the event on AQ:

public void enqueueAqEvent(String input, UUID uuid) throws JMSException, NamingException, IOException {
    Context context = new InitialContext();
    // Load the JNDI names of the AQ connection factory and queue from the classpath.
    Properties properties = new Properties();
    InputStream is = this.getClass().getClassLoader().getResourceAsStream("aq.datasource.properties");
    properties.load(is);
    is.close();
    QueueConnectionFactory connectionFactory = (QueueConnectionFactory) context.lookup(properties.getProperty("aq.queueconnectionfactory"));
    Queue queue = (Queue) context.lookup(properties.getProperty("aq.queue"));
    javax.jms.Connection connection = connectionFactory.createConnection();
    try {
        // Transacted session without an explicit commit: the send relies on
        // the surrounding XA transaction to commit.
        Session session = connection.createSession(true, 0);
        MessageProducer publisher = session.createProducer(queue);
        TextMessage message = session.createTextMessage(input);
        message.setJMSCorrelationID(uuid.toString());
        publisher.send(message);
    } finally {
        // Always release the JMS connection, also when sending fails.
        connection.close();
    }
}

I used the following properties file that defines the AQ connection factory and queue itself. You need to make sure these JNDI destinations exist on the Oracle WebLogic Server on which Oracle SOA Suite runs:

aq.queueconnectionfactory = aqjms/XAQueueConnectionFactory
aq.queue = eis/aqjms/ALG_ADMIN_QUEUE

Deploying the Java Fault Handler
You cannot just deploy the resulting JAR file containing the above Java class to Oracle SOA Suite. The Oracle Fusion Middleware Developer’s Guide for Oracle SOA Suite 11g documents what you need to do:

"You can add custom classes and JAR files to an SOA composite application. A SOA extension library for adding extension classes and JARs to an SOA composite application is available in the $ORACLE_HOME/soa/modules/oracle.soa.ext_11.1.1 directory.
To add custom JARs:
  1. Copy the JAR files to this directory or its subdirectory.
  2. Run ant.
  3. Restart Oracle WebLogic Server."

This is required because of library classloading, among other things.

Read more on fault handling in part IV of this blog series.

Sunday, August 1, 2010

Fault handling in Oracle SOA Suite 11g - Part II

The previous post explained why it is a good idea to address, and handle, business faults separately from technical errors. It also introduced a mechanism used in real-life Oracle SOA Suite 11g projects to deal with technical errors in a generic way without having to add this functionality to all our SCA composites again and again. Now it is time to dive into the technical implementation of that mechanism and some nitty-gritty details.

First things first: How do we get a hold of these technical errors and how can we determine what to do with them?

Oracle SOA Suite 11g offers a unified fault handling framework for SCA composites and their references, service adapters and components such as BPEL and Mediator components. The framework provides hooks you can use to configure fault handling and possibly call out to your own fault handling code. The unified framework is an improvement compared to the SOA Suite 10g stack that consisted of less integrated components (ESB, BPEL) that had their own fault handling mechanisms. The framework is heavily based on BPEL PM’s 10g fault handling framework.

In SOA Suite 11g you configure the fault handling framework on the level of SCA composites using two files: fault-policies.xml and fault-bindings.xml. By default these files need to be in the same directory as the composite.xml file.

Note that you can place these files somewhere else and have multiple SCA composites point to the same fault handling configuration. MDS is a nice candidate since it is a repository for shared artefacts such as reusable XSDs, DVMs, and so on. To do this you need to set the “oracle.composite.faultPolicyFile” and “oracle.composite.faultBindingFile” properties in the composite.xml files and point them to fault binding and policy files in the central MDS location. Whether you use this feature mostly depends on how unique the fault handling of each SCA composite will be. For now, we will continue with the basic scenario in which we define fault policies per SCA composite.

First of all we will configure the fault-bindings.xml file. This file defines what elements are bound to what fault policy. Elements can be components, references, service adapters or an entire composite. The actual fault policy that is referred to will be defined later on in the fault-policies.xml file. Since business faults can be dealt with using BPEL activities such as Throw and Catch activities we want to have all remaining faults (all unexpected faults) in the entire composite to be handled the same way.

Let’s say we have a simple SCA composite with an inbound file adapter called “MyInboundFileService” and some other components such as BPEL and Mediator components. Our fault-bindings.xml file could look like the following:


<?xml version="1.0" encoding="UTF-8"?>
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <composite faultPolicy="MyCompositeFaultPolicy"/>
</faultPolicyBindings>


In this example we bind fault handling for the entire composite to the -yet to be defined- policy “MyCompositeFaultPolicy”. Instead of the “composite” element you can use the “component” or “reference” elements to apply fault handling on a more granular level.
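
A sketch of such a more granular binding could look as follows. The component name “MyBpelProcess” and the reference name “MyOutboundDatabaseService” are assumptions for illustration and need to match the names used in your own composite.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- fault-bindings.xml: bind the policy to named elements instead of the whole composite.
     "MyBpelProcess" and "MyOutboundDatabaseService" are hypothetical names. -->
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <component faultPolicy="MyCompositeFaultPolicy">
        <name>MyBpelProcess</name>
    </component>
    <reference faultPolicy="MyCompositeFaultPolicy">
        <name>MyOutboundDatabaseService</name>
    </reference>
</faultPolicyBindings>
```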

Next we need to define the fault-policies.xml file. This file defines the actual policies and the conditions when these policies should be executed.

Following the example we will define a single policy, namely “MyCompositeFaultPolicy”.

In the policy below we first define the criteria that determine when the policy should be executed. In this case we want it to be executed for any technical error, more specifically when the error is of type “mediatorFault”, “bindingFault” or “runtimeFault”. Note that we can define more intelligent conditions that are content-based (e.g. based on process instance variables).


<?xml version="1.0"?>
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="MyCompositeFaultPolicy">
    <Conditions>
      <faultName xmlns:medns="http://schemas.oracle.com/mediator/faults"
 name="medns:mediatorFault">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
 name="bpelx:bindingFault">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
 name="bpelx:runtimeFault">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <Action id="ora-terminate">
        <abort/>
      </Action>     
      <Action id="MyFaultPolicyJavaAction">
        <javaAction className="nl.vennster.MyFaultPolicyJavaAction"
                    defaultAction="ora-terminate">
          <returnValue value="ora-terminate" ref="ora-terminate"/>
        </javaAction>
      </Action>
    </Actions>
  </faultPolicy>
</faultPolicies>


When the error meets any of these criteria, the action referenced by the matching condition is executed; the actions themselves are defined within the “Actions” element. Instead of configuring default actions such as abort, retry or rethrow we redirect the fault to our own Java class called “MyFaultPolicyJavaAction”. This is allowed as long as such a class implements the “IFaultRecoveryJavaClass” interface, which contains the methods “handleFault” and “handleRetrySuccess”. Since the fault may occur within synchronous processes the fault handling framework needs to know what to do after it delegates the fault to some external piece of code. To that end the “handleFault” method needs to return the outcome as a String. This outcome should map to a predefined fault action. In our example we abort the process instance after our custom Java class has been executed by returning “ora-terminate”, which is mapped to the default abort action. In addition, Java actions need to define a “defaultAction” attribute that is used when the outcome cannot be mapped to a predefined fault action.
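
For comparison, this is roughly what the default actions look like when you configure them directly instead of delegating to Java. It is a sketch only: the action ids follow the commonly used “ora-” naming, and the retry count and interval are arbitrary example values rather than recommendations.

```xml
<!-- Fragment of a fault-policies.xml "Actions" section with standard actions.
     The retry count and interval (in seconds) are example values. -->
<Actions>
  <Action id="ora-retry">
    <retry>
      <retryCount>3</retryCount>
      <retryInterval>2</retryInterval>
      <!-- action to take when all retries have failed -->
      <retryFailureAction ref="ora-terminate"/>
    </retry>
  </Action>
  <Action id="ora-rethrow-fault">
    <rethrowFault/>
  </Action>
  <Action id="ora-terminate">
    <abort/>
  </Action>
</Actions>
```

A condition selects one of these actions through its “ref” attribute, in the same way the conditions above refer to “MyFaultPolicyJavaAction”.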

For some reason rejected messages need to be defined separately. In other words, such faults remain uncaught when using the above fault handling configuration. An example of a rejected message is an inbound file that cannot be parsed correctly by a File Adapter. To have rejected messages handled we need to include them explicitly using the exact name of the adapter service or reference. In our case the inbound file adapter is named “MyInboundFileService”. Our fault-bindings.xml file now looks like this:


<?xml version="1.0"?>
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <composite faultPolicy="MyCompositeFaultPolicy"/>
    <service faultPolicy="RejectedMessages">
        <name>MyInboundFileService</name>
    </service>
</faultPolicyBindings>


Note that you can add more than one adapter name to the “service” element. That way all rejected messages of all adapters can be handled the same way. So for instance you can add “MyOutboundDatabaseService” to the “RejectedMessages” policy too.


<?xml version="1.0" encoding="UTF-8"?>
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy">
    <composite faultPolicy="MyCompositeFaultPolicy"/>
    <service faultPolicy="RejectedMessages">
        <name>MyInboundFileService</name>
        <name>MyOutboundDatabaseService</name>
    </service>
</faultPolicyBindings>


We need to add a fault policy to the fault-policies.xml file so our Java class is executed:


<?xml version="1.0"?>
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="MyCompositeFaultPolicy">
    <Conditions>
      <faultName xmlns:medns="http://schemas.oracle.com/mediator/faults"
 name="medns:mediatorFault">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
 name="bpelx:bindingFault">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
 name="bpelx:runtimeFault">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <Action id="ora-terminate">
        <abort/>
      </Action>     
      <Action id="MyFaultPolicyJavaAction">
        <javaAction className="nl.vennster.MyFaultPolicyJavaAction"
                    defaultAction="ora-terminate">
          <returnValue value="ora-terminate" ref="ora-terminate"/>
        </javaAction>
      </Action>
    </Actions>
  </faultPolicy>
  <faultPolicy version="2.0.1" id="RejectedMessages">
    <Conditions>
      <faultName xmlns:rjm="http://schemas.oracle.com/sca/rejectedmessages">
        <condition>
          <action ref="MyFaultPolicyJavaAction"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <Action id="ora-terminate">
        <abort/>
      </Action>
      <Action id="MyFaultPolicyJavaAction">
        <javaAction className="nl.vennster.MyFaultPolicyJavaAction"
                    defaultAction="ora-terminate">
          <returnValue value="ora-terminate" ref="ora-terminate"/>
        </javaAction>
      </Action>
    </Actions>
  </faultPolicy>  
</faultPolicies>

Read more on fault handling in part III and part IV of this blog series.

Thursday, July 1, 2010

Fault handling in Oracle SOA Suite 11g - Part I

You generally want to differentiate between technical errors and functional faults within your processes and services. Functional faults are those that have meaning to the business and might be expected. Functional faults and handling these faults can be part of a process. Consider the example of electronic invoice handling in which an invoice is processed that has a total amount of $2000 while an organization only approved an amount of $1500. In this scenario we can use a human task to halt this particular process instance and assign it to the finance department. An employee of the finance department acquires the task and investigates the issue. He or she may conclude that the client sending the invoice was mistaken, that the invoice approval was not entered correctly in our backend IT-systems or that someone put a coffee mug on the invoice and hence the amount was wrongly interpreted by our scanning and OCR software. In any case, after this human intervention the process may continue again and follow the “happy flow” in our BPEL or BPM processes.

When it comes to technical faults you probably do not want to design error handling in the process itself. If you do, your processes and services will end up being cluttered with all kinds of additional process logic such as while loops, gotos, catches, event handling, and so on to try to recover from technical errors. Technical errors might not be recoverable at all; think of an invoice file that is incorrectly formatted, an invoice file that contains negative numbers while your service or process only accepts positive values, or an invoice file that is mangled during transport. Besides, trying to handle these errors makes your SCA composites look like a mix of spaghetti and circuit boards. Not exactly flexible, agile and manageable: the things we wanted to achieve with service- and process-orientation in the first place.

This blog series describes a possible mechanism to generically handle technical errors in your processes and services -that are wrapped as SCA composites- in Oracle SOA Suite 11g.

In one of our projects we came across a scenario in which administrators need to be notified in case of technical errors in any of the SCA composites. Next to the notification they want the corresponding composite to be terminated. Administrators then investigate the cause of the problem and possibly restart the process instances that are involved. Since every employee uses a task-driven portal, administrators want the error to be presented as a human task in this portal instead of receiving a bunch of e-mails. This needed to be implemented with a minimum of additional (business or process) logic.

To achieve this the following mechanism is used:

  • Use Oracle SOA Suite’s Fault Management Framework to redirect (technical) errors to a custom Java class;
  • Have the Java class fire an event containing the unique id of the instance using the Event Delivery Network (EDN) or Advanced Queuing (AQ);
  • Terminate the composite instance by using the Fault Management Framework and the outcome of the custom Java class;
  • Create a single SCA composite to handle all technical errors. This composite subscribes to the event, gathers information on the faulted composite instance, and presents this information as a human task that is assigned to administrators.


Read more on fault handling in part II, part III and part IV of this blog series.