What are the limitations on using a semi-join in a batch apex query locator?

 
I have a relatively complex batch Apex process that needs to pull in a large number of master objects based on a field on the master OR the detail object. However, my Apex logic needs to ensure that each master object is processed fully within a single execute method.

This means that I can't really use the detail object as my query locator: if I do, there's a chance that detail objects for the same master will land in different execute chunks.

So this brings me to using a semi-join in the query locator my batch job starts with. Something like: SELECT id FROM Master__c WHERE Id IN (SELECT master__c FROM Detail__c WHERE master__r.theFieldICareAbout__c = true OR theFieldICareAbout__c = true)
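For reference, a minimal sketch of how that query might sit in a batch class. The class name MasterSemiJoinBatch is hypothetical, and the object and field names are the ones from the query above. Whether the platform accepts a semi-join of this size is exactly the open question, so this only illustrates the shape of the job.

```apex
// Sketch only: MasterSemiJoinBatch is a made-up name; Master__c, Detail__c and
// theFieldICareAbout__c are the placeholder names used in the question.
public class MasterSemiJoinBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // The semi-join selects each qualifying master once, so a given
        // Master__c record lands in exactly one execute() chunk.
        return Database.getQueryLocator([
            SELECT Id
            FROM Master__c
            WHERE Id IN (
                SELECT master__c
                FROM Detail__c
                WHERE master__r.theFieldICareAbout__c = true
                   OR theFieldICareAbout__c = true
            )
        ]);
    }

    public void execute(Database.BatchableContext bc, List<Master__c> scope) {
        // Per-master processing would go here.
    }

    public void finish(Database.BatchableContext bc) {
        // Nothing to clean up in this sketch.
    }
}
```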

However, the data sets this needs to work on are poorly defined at best. In some cases I could be working on as few as 20,000 records, but in others I could be working on something close to 1.5 million detail objects.

I don't have any orgs available for testing at the higher end of this spectrum, so I have to ask, will I hit any governors trying to handle a colossal semi-join like this? Will it just fizzle out and die?


Attribution to: ca_peterson

Possible Suggestion/Solution #1

If the Detail__c part of this query exceeds 100,000 rows, you will receive an exception before you even try to retrieve the Master__c records. I'm not sure exactly what you're going for, but here are a couple of approaches:

  • Query on the Master object, with a subquery on the related Details within the execute() method. You might run with a small batch size if that is a safe way to limit how many Detail rows you'll receive. I have run this kind of process with as few as 1 record per batch.
  • Query on the Detail object but sort by Master Id, so you can finish and update objects as you go along. Implementing Database.Stateful will allow you to keep a running tally of any incomplete objects between execute runs. This will eliminate the requirement that one Master object fit within the governor limits of one execute() method.
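The first approach might look roughly like the sketch below. The class name MasterWithDetailsBatch and the batch size in the usage line are assumptions for illustration, the object and field names are the placeholders from the question, and a separate Detail__c query is used in execute() instead of a child subquery because the child relationship name is not given.

```apex
// Sketch only: MasterWithDetailsBatch is a hypothetical name; the object and
// field names are the placeholders from the question.
public class MasterWithDetailsBatch implements Database.Batchable<SObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // No semi-join here: walk every master and filter inside execute().
        return Database.getQueryLocator(
            [SELECT Id, theFieldICareAbout__c FROM Master__c]
        );
    }

    public void execute(Database.BatchableContext bc, List<Master__c> scope) {
        // Load the details for just this chunk of masters and group them.
        Map<Id, List<Detail__c>> detailsByMaster = new Map<Id, List<Detail__c>>();
        for (Detail__c d : [SELECT Id, Master__c, theFieldICareAbout__c
                            FROM Detail__c
                            WHERE Master__c IN :scope]) {
            if (!detailsByMaster.containsKey(d.Master__c)) {
                detailsByMaster.put(d.Master__c, new List<Detail__c>());
            }
            detailsByMaster.get(d.Master__c).add(d);
        }

        for (Master__c m : scope) {
            List<Detail__c> details = detailsByMaster.get(m.Id);
            if (details == null) {
                continue; // The original semi-join only matches masters that have details.
            }
            // Same condition as the question: flag set on the master OR on any detail.
            Boolean qualifies = (m.theFieldICareAbout__c == true);
            for (Detail__c d : details) {
                if (d.theFieldICareAbout__c == true) {
                    qualifies = true;
                    break;
                }
            }
            if (!qualifies) {
                continue;
            }
            // All details for this master are available here, in one execute() call.
        }
    }

    public void finish(Database.BatchableContext bc) {
        // Nothing to clean up in this sketch.
    }
}
```

It could then be started with a small scope size, for example Database.executeBatch(new MasterWithDetailsBatch(), 50); the value 50 is arbitrary and would need tuning to how many Detail rows each master carries.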

Attribution to: Jeremy Nottingham

Possible Suggestion/Solution #2

theFieldICareAbout__c in the WHERE clause should be an External ID so that it is indexed and available for optimised querying.


Attribution to: techtrekker
This content is remixed from stackoverflow or stackexchange. Please visit https://salesforce.stackexchange.com/questions/1675
