Calling .NET code from XQuery

As I mentioned in a blog post several weeks ago, I have been working on a way to call .NET code directly from MarkLogic.

My goal is to be able to access the functionality of existing .NET assemblies so that I don’t need to spend time re-implementing any logic in XQuery. Some typical use cases might include authenticating against a proprietary access control system written in .NET, or connecting to S3 buckets using Amazon’s .NET SDK.

I’m calling my code library “ML.NET” and am pleased to report that it now has just about all the features needed to provide real value to my projects, including:

  • Dynamically compiling .NET code for invocation
  • Caching compiled assemblies so that code does not need to be re-compiled
  • Support for data types such as strings, booleans, integers, floats, XML documents, and binaries, as well as arrays of all these types
  • Authentication to prevent unauthorized users from executing .NET code
  • Automatic system cleanup to remove unused cached assemblies

To make this a bit more concrete, I’ll step through a simple example that uses ML.NET to integrate Amazon’s Simple Storage Service (S3) with data stored in MarkLogic.

A Simple Example Using ML.NET

In this example, I’ve stored image metadata in MarkLogic while the images themselves are kept in an S3 bucket. Although Amazon provides a RESTful API to retrieve objects from buckets, signing the requests and parsing the responses can be tedious in pure XQuery. To simplify the process, I’ll use ML.NET to access Amazon’s .NET SDK from within XQuery.
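For a sense of why this is tedious in pure XQuery: at the time, each S3 REST request had to carry a Signature Version 2 header, an HMAC-SHA1 over a canonical string built from the HTTP verb, certain headers, and the resource path. A rough Python sketch of just the signing step (the function name and inputs are my own, for illustration):

```python
import base64
import hashlib
import hmac

def sign_s3_get(secret_key, bucket, key, date_str):
    # AWS Signature Version 2: HMAC-SHA1 over a canonical string.
    # The two empty lines stand in for the (empty) Content-MD5 and
    # Content-Type header slots, which must still be present.
    string_to_sign = "GET\n\n\n{}\n/{}/{}".format(date_str, bucket, key)
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

Getting that canonical string byte-for-byte right (empty header slots included) is exactly the fiddly part the SDK hides, which is why calling the SDK from XQuery is attractive.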

The first step is to embed C# code in my XQuery by storing the text in a variable named $class-code. This code defines a static class that retrieves an object from an S3 bucket and converts it to a byte array. Note that the “real” work is done by the AmazonS3, GetObjectRequest, and GetObjectResponse classes, which are all part of the Amazon SDK.

let $class-code := '
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;
using System.IO;
using System;

public static class S3Retriever{
   public static byte[] Retrieve(AmazonS3 client, string bucket, string objectKey){
      // Build and issue the GET request for the object
      GetObjectRequest request = new GetObjectRequest(){
         BucketName = bucket, Key = objectKey
      };
      GetObjectResponse response = client.GetObject(request);
      // Read the full response stream into a byte array
      using (BinaryReader reader = new BinaryReader(response.ResponseStream))
      { return reader.ReadBytes( (int)response.ContentLength ); }
   }
}'
   

Next, I create “executable” code (stored inside $execute-code) that calls the static class, passing in an S3 client built from our Amazon credentials, the bucket name, and the image identifier.

let $execute-code := '
   // Build an S3 client from the supplied credentials, then retrieve the object
   AmazonS3 s3Client = AWSClientFactory.CreateAmazonS3Client(key, secret);
   bytes = S3Retriever.Retrieve(s3Client, bucket, assetId);
'

Finally, in pure XQuery, I compile and invoke the C# code to retrieve 10 images from Amazon and save them to my local filesystem.

let $_ := (
   (: Connect to the ML.NET web service over SSL :)
   mlnet:start("https://localhost/mlnetservice/mlnet.ashx", $username, $password),
   (: Reference the Amazon SDK assembly and register the class code :)
   mlnet:custom-assemblies("AWSSDK.dll"),
   mlnet:classes($class-code),
   (: Declare an output parameter to receive the image bytes :)
   mlnet:set-outparam("bytes", mlnet:bytearray())
)
let $execution-id := ""
let $_ := 
   for $img at $index 
      in (//asset[mime-type eq 'image/jpeg' and byte-size lt 1024 * 500])[1 to 10]
   let $path := fn:concat("c:\images\", fn:string($img/filename))
   return (  
      mlnet:set-param("key", mlnet:string($key)),
      mlnet:set-param("secret", mlnet:string($secret)),
      mlnet:set-param("bucket", mlnet:string($bucket)),
      mlnet:set-param("assetId", mlnet:string($img/asset-id)),
      (
         (: Compile on the first iteration; re-use the compiled assembly afterwards :)
         if ($index eq 1) then xdmp:set($execution-id, mlnet:execute($execute-code))
         else mlnet:re-execute($execution-id)
      ),
      xdmp:save($path, mlnet:get("bytes"))   
   )
   )

return mlnet:end()

Note that at the beginning of the XQuery, I call mlnet:start() with three parameters: the location of the ML.NET web service, a username, and a password. The web service is part of the ML.NET library and is responsible for compiling and invoking all the .NET code. In this case, I am accessing the service over SSL, which ensures that the username and password are encrypted. The service will check these user credentials against the MarkLogic Security database, thus ensuring that only authorized users are able to compile and execute code on the server.

After calling mlnet:start(), I reference the Amazon SDK assembly, set the class code so that it is available for use, and define an output parameter named “bytes” that I will use to retrieve the image as a byte array.

In the next block of code, I query MarkLogic for ten small JPEG files and then iterate over the sequence. In the first iteration, the code calls mlnet:execute(), which compiles and invokes the C# stored in $execute-code. Calling mlnet:execute() also returns an id that I can pass to mlnet:re-execute() in all subsequent iterations (in this example, I use xdmp:set() to assign the id to $execution-id). mlnet:re-execute() re-uses the previously compiled assembly, thus avoiding the overhead of compiling .NET code in each iteration.
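The compile-once/re-execute pattern is easy to see in miniature. Here is a hypothetical Python sketch of the idea, not ML.NET's actual implementation; Python's compile()/exec() stand in for .NET compilation and invocation, and all names are illustrative:

```python
import uuid

class ExecutionService:
    """Sketch of an execute/re-execute service: compile once,
    cache the result under an id, and reuse it on later calls."""

    def __init__(self):
        self._cache = {}  # execution id -> compiled code object

    def execute(self, code):
        # First call: compile, cache, and hand back an id for re-use.
        execution_id = str(uuid.uuid4())
        self._cache[execution_id] = compile(code, "<mlnet>", "exec")
        return execution_id

    def re_execute(self, execution_id, params=None):
        # Later calls: skip compilation entirely and run the cached
        # unit with fresh parameter values.
        scope = dict(params or {})
        exec(self._cache[execution_id], scope)
        return scope

service = ExecutionService()
eid = service.execute("total = a + b")          # compiles once
result = service.re_execute(eid, {"a": 2, "b": 3})  # no compile cost
```

Only the parameter values change between iterations, so the compiled artifact can be shared across all of them.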

How long are compiled assemblies kept by the web service? In this case, the assemblies are deleted as soon as the XQuery calls mlnet:end(). Calling mlnet:start() creates a compilation and execution context. Calling mlnet:end() deletes all the assemblies that were created within that context.

If you forget to call mlnet:end(), the system will eventually delete old assemblies after an “expiration period”. You can specify the expiration period when you first set up the ML.NET web service; I generally set the duration to 20 minutes.
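That expiration behavior amounts to a periodic sweep over a cache of timestamped entries. A minimal sketch, assuming a cache keyed by assembly id (all names here are illustrative, and time is passed in explicitly to keep the example deterministic):

```python
class AssemblyCache:
    """Sketch of expiration cleanup: entries idle longer than the
    configured expiration period are deleted on each sweep."""

    def __init__(self, expiration_seconds):
        self.expiration = expiration_seconds
        self._entries = {}  # assembly id -> (assembly, last-used time)

    def put(self, assembly_id, assembly, now):
        self._entries[assembly_id] = (assembly, now)

    def sweep(self, now):
        # Remove every entry whose idle time exceeds the expiration period.
        expired = [aid for aid, (_, t) in self._entries.items()
                   if now - t > self.expiration]
        for aid in expired:
            del self._entries[aid]
        return expired

cache = AssemblyCache(20 * 60)          # 20-minute expiration
cache.put("asm-1", b"dll-bytes", now=0)
```

Calling mlnet:end() is still the polite option; the sweep is just the safety net.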

Getting More Information

I plan to have a poster on ML.NET at the upcoming MarkLogic World conference in Washington, DC (May 1-3, 2012). Stop by Avalon Consulting at the Diamond Booth and I’ll be happy to demo the ML.NET library and discuss the technical implementation in greater detail.

About Demian Hess

Demian Hess is Avalon Consulting, LLC's Director of Digital Asset Management and Publishing Systems. Demian has worked in online publishing since 2000, specializing in XML transformations and content management solutions. He has worked at Elsevier, SAGE Publications, Inc., and PubMed Central. After studying American Civilization and Computer Science at Brown University, he went on to complete a Master's in English at Oregon State University, as well as a Master's in Information Systems at Drexel University.
