
Screen-scrape Salesforce with a REST GET call from Apex


Consider this code snippet from my answer to rao's question:

String requestUrl = '/services/data/v26.0/sobjects/User/describe';
Http http = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint(URL.getSalesforceBaseUrl().toExternalForm() + requestUrl);
req.setMethod('GET');
req.setHeader('Authorization', 'Bearer ' + UserInfo.getSessionId());

HTTPResponse res = http.send(req);
String output = res.getBody();
System.debug(output);

Works like a charm (provided you've added the instance to "Remote Site Settings" of course).

Now try it with a standard Salesforce page, for example:

String requestUrl = '/001/o'; // or '/home/home.jsp' or anything else

And you get back this weird "placeholder", regardless of which page you call:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <meta HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<script>
var escapedHash = '';
var url = 'https://test.salesforce.com/?ec=302&startURL=%2F001%2Fo';
if (window.location.hash) {
   escapedHash = '%23' + window.location.hash.slice(1);
}
if (window.location.replace){ 
window.location.replace(url + escapedHash);
} else {;
window.location.href = url + escapedHash;
} 
</script>
</head>
</html>
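The placeholder is a client-side redirect to the login page, with the originally requested path carried in the startURL query parameter, which suggests the session was not accepted for that page. Before parsing any scraped body it can help to detect this case explicitly. The sketch below is in JavaScript (the same check could be done in Apex on res.getBody()); the markers it looks for, a `var url = '...'` assignment and a `startURL` parameter, are assumptions drawn from this single sample response, not a documented format. The function name `parseLoginRedirect` is hypothetical.

```javascript
// Hypothetical helper: recognise the login-redirect placeholder and recover
// the page that was originally requested. The patterns matched here are
// assumptions based on one sample response, not a documented contract.
function parseLoginRedirect(body) {
  // Pull the redirect target out of the inline <script>.
  const match = /var url = '([^']+)';/.exec(body);
  if (match === null) {
    return null; // no redirect script: probably real page content
  }
  const target = new URL(match[1]);
  // URLSearchParams decodes %2F back to '/'.
  const startURL = target.searchParams.get('startURL');
  if (startURL === null) {
    return null; // some other redirect, not the login placeholder
  }
  return { loginHost: target.host, startURL };
}

// Usage with the script line from the response above:
const sample =
  "var url = 'https://test.salesforce.com/?ec=302&startURL=%2F001%2Fo';";
console.log(parseLoginRedirect(sample));
// { loginHost: 'test.salesforce.com', startURL: '/001/o' }
```

A non-null result means the request reached the login redirect instead of the page, so there is nothing worth scraping in that body.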

Any idea what to do to get to the actual page?

Pass some undocumented header? Go through frontdoor.jsp as suggested on SO? Or maybe I should expose PageReference.getContent() as a REST service :D


Attribution to: eyescream

Possible Suggestion/Solution #1

Amazing help, Pat (@metadaddy)! This has allowed me to create some marvelous personal developer toys, like a recursive Show All Dependencies.

However, as Simon (@superfell) said, this is not only horribly unsupported but also incredibly hard to maintain. The implementation will most likely rely on heart-sinking Pattern and Matcher use or unwieldy String search methods, since the page probably won't load into a Dom.Document out of the box.
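To illustrate what the "Pattern and Matcher" style looks like, here is a minimal sketch in JavaScript that pulls one value out of raw HTML with a regular expression instead of a DOM. In Apex the same expression would go through Pattern.compile(...).matcher(body), then find() and group(1). The `<title>` tag is only an example target and `extractTitle` is a hypothetical name.

```javascript
// Hypothetical helper: extract the page title from raw HTML with a regex,
// the way a server-side scraper would without a usable DOM.
function extractTitle(html) {
  // [^<]* stops at the next tag; the i flag also accepts <TITLE>.
  const match = /<title>([^<]*)<\/title>/i.exec(html);
  return match === null ? null : match[1].trim();
}

const page =
  '<html class="ext-strict"><head>' +
  '<title>Accounts: Home ~ salesforce.com - Developer Edition</title>';
console.log(extractTitle(page));
// prints: Accounts: Home ~ salesforce.com - Developer Edition
```

Note how brittle even this is: if Salesforce adds a single attribute to the tag, for example `<title id="x">`, the pattern stops matching and the function returns null.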

However, if you can live without this being a headless process (and you're really a glutton for punishment), may I suggest approaching it inside a Visualforce page using JavaScript? DOM access will make whatever you're going to do SO much easier, and once you conquer the hurdle of learning Salesforce's page-rendering patterns and naming conventions, you'll be in much better shape than with server-side code.
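The browser-side approach can be sketched as follows: fetch the page, hand the HTML to DOMParser, and query it with CSS selectors instead of regular expressions. This rests on assumptions worth stating loudly. It only works if the script runs on the same origin as the standard pages; Visualforce is typically served from a separate domain, in which case the browser's same-origin policy will block the request. The `selector` argument is hypothetical and would have to be found by inspecting the page yourself. `scrapeWithDom` is an illustrative name, and `fetchImpl`/`parseHtml` are injectable only so the function can be exercised outside a browser.

```javascript
// Hypothetical browser-side scraper using the DOM instead of regexes.
// Assumes same-origin access to the standard pages (see lead-in).
async function scrapeWithDom(
  path,
  selector, // hypothetical CSS selector, discovered by inspecting the page
  fetchImpl = fetch,
  parseHtml = (html) => new DOMParser().parseFromString(html, 'text/html')
) {
  // For same-origin requests the browser sends the session cookie itself.
  const response = await fetchImpl(path, { credentials: 'same-origin' });
  if (!response.ok) {
    throw new Error(`GET ${path} failed with HTTP ${response.status}`);
  }
  const doc = parseHtml(await response.text());
  // Collect the trimmed text of every element matching the selector.
  return Array.from(doc.querySelectorAll(selector), (el) => el.textContent.trim());
}
```

From a page on the right origin, `scrapeWithDom('/001/o', 'title')` should resolve to an array holding the page title, provided the response is the real page rather than the login placeholder.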


Attribution to: MayTheSForceBeWithYou

Possible Suggestion/Solution #2

Please don't screen-scrape; it's just about the most fragile integration you can imagine. With the release of the Analytics API, it's also now largely unnecessary.

Having said that, the Authorization HTTP header only works with API resources. For web pages like /001/o or /home/home.jsp you need to set the sid cookie instead. For example:

String requestUrl = '/001/o';
Http http = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint(URL.getSalesforceBaseUrl().toExternalForm() + requestUrl);
req.setMethod('GET');
req.setHeader('Cookie', 'sid=' + UserInfo.getSessionId()); // UI pages read the session from the sid cookie

HTTPResponse res = http.send(req);
String output = res.getBody();
System.debug(output);

This yields:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html class="ext-strict"><head><script type="text/javascript" src="/jslibrary/1351189248000/sfdc/JiffyStubs.js"></script>
<title>Accounts: Home ~ salesforce.com - Developer Edition</title>
<!-- LOTS MORE VALID PAGE DATA -->
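The difference between the two Apex snippets comes down to where the session ID travels: as a bearer token for API resources, or in the sid cookie for standard UI pages. The sketch below captures just that rule in JavaScript, building the request without sending it. `buildSalesforceGet` is a hypothetical name, `baseUrl` and `sessionId` stand in for URL.getSalesforceBaseUrl().toExternalForm() and UserInfo.getSessionId(), and treating every path under /services/ as an API resource is a simplifying assumption.

```javascript
// Hypothetical helper: choose how to pass the session ID based on whether
// the target is an API resource or a classic UI page.
function buildSalesforceGet(baseUrl, path, sessionId) {
  // Simplifying assumption: API resources live under /services/.
  const isApi = path.startsWith('/services/');
  const headers = isApi
    ? { Authorization: `Bearer ${sessionId}` } // REST API resource
    : { Cookie: `sid=${sessionId}` }; // standard UI page
  return { method: 'GET', url: baseUrl + path, headers };
}

const base = 'https://na1.salesforce.com'; // hypothetical instance URL
console.log(buildSalesforceGet(base, '/001/o', 'SESSION_ID').headers);
// { Cookie: 'sid=SESSION_ID' }
```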

Attribution to: metadaddy
This content is remixed from stackoverflow or stackexchange. Please visit https://salesforce.stackexchange.com/questions/4692
