From XQuery to JavaScript – MarkLogic’s Bold Platform Play

MarkLogic's JavaScript Strategy

MarkLogic will incorporate JavaScript as a native authoring language in MarkLogic Server 8.0, out later this year.

San Francisco Airport would seem an odd place to change the world, though no doubt it has been the hub of any number of game changers over the years. Still, the MarkLogic World conference, held at the Waterfront Marriott just south of the iconic airport, may very well have presaged a radical change for the company and perhaps far beyond.

Over the last couple of years, MarkLogic World has become leaner but also more focused, with announcements of changes to their eponymous data server that were both unexpected and have, in general, proven very successful. This year's announcement was, in that light, about par for the course: for MarkLogic 8, due out later this year, the MarkLogic team is taking the ambitious step of compiling Google's V8 JavaScript engine directly into the core of the database.

In essence, the erstwhile XML database is becoming a fully functional JavaScript/JSON database. JavaScript programmers will be able to use what amounts to the engine at the heart of Node.js to access JSON, XML, binaries, and RDF from the database; run full-text, geospatial, and SPARQL queries; use the robust application server and performance-monitoring capabilities; and ultimately do everything within MarkLogic that XQuery developers have been able to do from nearly the first days of the product's existence.
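
To make that concrete, here is a rough and purely speculative sketch of a server-side JavaScript search. The cts.search-style calls below are patterned on the existing XQuery cts:search() and cts:word-query() functions; the actual JavaScript names and signatures MarkLogic ships may well differ.

// Speculative sketch only: names are modeled on the XQuery cts:search()
// and cts:word-query() builtins, not on any announced JavaScript API.
var hits = cts.search(cts.wordQuery("quarterly revenue"));
for (var i = 0; i < hits.length; i++) {
  // each hit could be a JSON document, an XML document, or a binary
  xdmp.log(hits[i]);
}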

This move was driven by a few hard realities. One of the biggest gating factors MarkLogic has faced has been the need to program its application interfaces in XQuery. The language is very expressive, but it has really only established itself within a few XML-oriented databases, even as JavaScript and JSON databases have seen an explosion of developers, implementations, and libraries. Organizations that purchased MarkLogic found themselves struggling to find the talent to code for it, and that in turn meant that while MarkLogic has had some very impressive successes, it was beginning to gain a reputation for being too hard to program.

MarkLogic 8 will likely reverse that trend. People will be able to write query functions and modules in JavaScript, invoke JavaScript from XQuery (and vice versa), use JavaScript dot notation as well as XPath notation, import and use community (possibly Node.js-compatible) JavaScript modules, and mix XML and JSON as native types. This may well add rocket fuel to the MarkLogic server, as it becomes one of the first to effectively manage the trifecta of XML, JSON, and RDF (and their respective languages) within the same polyglot environment.
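
As a sketch of what that duality might look like (the document-access and xpath() calls here are guesses on my part, patterned after the current XQuery builtins, not announced API):

// Hypothetical: the same logical record stored once as JSON and once as XML.
// cts.doc() and xpath() are assumptions modeled on the existing XQuery API.
var sales    = cts.doc("/sales/2014-q1.json");              // JSON document
var total    = sales.root.sectors.consulting.total;         // JavaScript dot notation

var salesXml = cts.doc("/sales/2014-q1.xml");               // XML sibling document
var totalXml = salesXml.xpath("/sales/sectors/consulting/total"); // XPath notation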

MarkLogic CEO Gary Bloom deserves a lot of the credit for this, if he can pull it off. A couple of years ago the company was somewhat dispirited, there had been a number of high-profile departures, and the organization had just gone through three CEOs in two years. Bloom managed to turn morale around: adding semantic support last year (significantly enhanced in MarkLogic 8, see below), cutting licensing prices by two-thirds, refocusing the development team, and significantly expanding the sales teams. That has paid significant dividends – there were a number of new customers in attendance at the SF event, which is one stop of a six-stop "world tour" that will see key management and technical gurus reach out to clients in Washington, DC, Baltimore, New York, London, and Chicago.

In addition to the JavaScript news, MarkLogic also announced that it would take the next step in completing the semantics layer of the product. This includes completing its SPARQL 1.1 support (including the rest of the property paths portion of the specification and the aggregate operations), adopting SPARQL 1.1 Update, and adding inference support. While the JavaScript/JSON announcement tended to overshadow this, there is no question that MarkLogic sees semantics as a key part of its data strategy over the next few years. This version represents the second year of a three-year effort to create an industry-leading semantic triple store, and it is very likely that most of what lands in MarkLogic 9 will be ontology-modeling tools, admin capabilities, and advanced analytics tools.
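
In practice, that means queries like the following should become possible, here wrapped in a JavaScript call for consistency with the rest of this post. The sem.sparql() name is my assumption, patterned on the existing XQuery sem:sparql() function, and the ex: vocabulary is invented purely for illustration:

// SPARQL 1.1 property path (skos:broader+) plus an aggregate (COUNT/GROUP BY),
// run from hypothetical server-side JavaScript. The sem.sparql() binding is an
// assumption; the example vocabulary (ex:about) is made up for illustration.
var topicCounts = sem.sparql(
  'PREFIX skos: <http://www.w3.org/2004/02/skos/core#> ' +
  'PREFIX ex:   <http://example.org/ns#> ' +
  'SELECT ?topic (COUNT(?article) AS ?articleCount) ' +
  'WHERE { ' +
  '  ?article ex:about ?narrowTopic . ' +
  '  ?narrowTopic skos:broader+ ?topic . ' +
  '} GROUP BY ?topic ORDER BY DESC(?articleCount)'
);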

The inferencing support is, in its own way, as much of a gamble as the JavaScript effort: an attempt to consolidate the semantics usage of publishers, news organizations, media companies, government agencies, and others that see semantic triple stores as analytics tools. It becomes even more complex given that such inferencing needs to be done quickly, within the context of dynamic updates, the MarkLogic security model, and similar constraints. If they pull it off (and there's a fair amount of evidence to indicate they will), not only will MarkLogic vault to the top of the semantics market, but this may also dramatically increase RDF/SPARQL adoption in the general development community, especially given that the semantics capabilities (including SPARQL) will be as available to JavaScript developers as they are to XQuery developers.

The final announcement from MarkLogic was the introduction of a bitemporal index. Bitemporality is probably not that big of an issue in most development circles, but in financial institutions, especially those that need to deal with regulatory issues, it is a very big deal. The idea behind bitemporality is that the time at which a document's information was true in the real world may differ from the time at which that document was actually recorded in the database. This distinction can make a big difference for financial transactions, and may have an impact on regulatory restrictions. Bitemporality makes it possible for a document to maintain multiple date stamps, which in turn can be used to ascertain which documents were "in effect" at a given time. In a way, this makes it possible to use MarkLogic as a "time machine", rolling the database back to see what resources were or weren't active at a particular point.
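
A purely illustrative way to picture it (none of these field names were announced; they simply make the two timelines concrete):

// One record, two timelines. "Valid" time is when the fact was true in the
// world; "system" time is when the database actually learned about it.
// Field names here are invented for illustration only.
var trade = {
  instrument: "XYZ",
  price:      104.25,
  validFrom:  "2014-03-01T09:30:00Z",   // when the trade actually happened
  validTo:    "2014-03-01T16:00:00Z",
  systemFrom: "2014-03-03T11:15:00Z",   // when it was recorded in the database
  systemTo:   null                      // still the current system version
};
// A regulator's question then spans both axes: "what did we believe on
// March 2nd about trades that were in effect on March 1st?"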

Will this mean that you'll see applications being developed where all of the tools of MarkLogic – from XQuery to JavaScript, semantics to SQL and XSLT – are used together, as MarkLogic Chief Architect Jason Hunter challenged me at lunch one day of the session? Is there a use case where that even makes sense? After a lot of thought, I have to throw in the towel. There are definitely places where you may end up using SPARQL and SQL together – if you had slurped up a relational table that you wanted to preserve in its original form while working with RDF data, the case is definitely there. And any time you work with XML content, there are often good reasons to use XSLT, whether for formatting complex XML output or doing specialized tree-walking processing (you can do that in XQuery, but XSLT is usually more intuitive there). The challenge comes in directly using XQuery and JavaScript together.

The reason for this difficulty is that XQuery and JavaScript fulfill very similar roles. For instance, suppose that you have RDF that describes an organization's sales revenue, and you want to compare sector sales across the quarters of 2014. This can actually be handled by a single SPARQL query that looks something like the following (salesReport.sp):

select (?sectorLabel as ?Sector)
       (?quarterLabel as ?Quarter)
       (?salesAgentLabel as ?Agent)
       (?revenue as ?Revenue)
    where {
         ?company company:identifier ?companyID.
         ?sector sector:company ?company.
         ?quarter quarter:company ?company.
         ?sector
               rdf:type ?sectorType;
               sector:establishedDate ?sectorStartDate;
               sector:reorgDate ?sectorReorgDate;
               rdfs:label ?sectorLabel.
            filter(year(?sectorStartDate) <= ?year)
            filter(year(?sectorReorgDate) > ?year)
         ?salesAgent
               rdf:type ?salesAgentType;
               salesAgent:sale ?sale;
               rdfs:label ?salesAgentLabel.
         ?sale sale:salesQuarter ?quarter;
               sale:salesSector ?sector;
               sale:revenue ?revenue.
         ?quarter rdfs:label ?quarterLabel;
               rdf:type ?quarterType.
         } order by ?sectorLabel ?quarterLabel ?salesAgentLabel desc(?revenue)

 

Now, in XQuery, the report for this is fairly simple to generate:

let $companyID := "AVALON"
let $year := 2014
return
<report>{
 sparql:invoke('salesReport.sp', map:new((map:entry("companyID", $companyID), map:entry("year", $year)))) !
 <record>
      <sector>{map:get(., "Sector")}</sector>
      <quarter>{map:get(., "Quarter")}</quarter>
      <agent>{map:get(., "Agent")}</agent>
      <revenue>{map:get(., "Revenue")}</revenue>
 </record>
}</report>

In a (currently hypothetical) JavaScript version, it may very well end up being about as simple:

(function(){
  var companyID = "MARKLOGIC";
  var year = 2014;
  return {report: sparql.invoke('salesReport.sp', {companyID: companyID, year: year},
    function(obj, index){
      return {record: {
        sector: obj.Sector,
        quarter: obj.Quarter,
        agent: obj.Agent,
        revenue: obj.Revenue
      }};
    })
  };
})();

Note that in both cases, I’ve explicitly broken out the mapping to make it obvious what was happening, but the four assignments could also have been replaced by

<record>{
     let $map := .
     for $key in map:keys($map)
     return element { fn:lower-case($key) } { map:get($map, $key) }
}</record>

and

function(obj, index){
    var newObj = {};
    Object.keys(obj).forEach(function(key){
        newObj[key.toLowerCase()] = obj[key];
    });
    return {record: newObj};
 })

respectively.

The principal difference between the two implementations is the use of functional callbacks in JavaScript as opposed to the somewhat more purely declarative model of XQuery, but these differences aren't significant in practice … and that is the crux of Jason's (correct) assertion: it's possible that you may end up wanting to invoke XQuery inline directly from JavaScript (or vice versa), but it's unlikely, because there is so much overlap.
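
If inline cross-calling does show up, I would expect it to look something like an eval bridge; the xdmp.xqueryEval() name below is my guess, modeled on the existing xdmp:eval() family, not an announced API:

// Hypothetical bridge from JavaScript into XQuery. The function name and the
// shape of the external-variable map are assumptions on my part.
var greeting = xdmp.xqueryEval(
  'declare variable $year external; ' +
  'fn:concat("Report year: ", $year)',
  {year: 2014}
);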

On the other hand, what I do expect to see are situations where, within a development team, some people work in XQuery and others work in JavaScript, but each breaks their efforts into modules of functions that can be imported. For instance, you may have an advanced mathematics library (say, giving you "R"-like functionality, for those familiar with the hot statistical analysis language) written in JavaScript. XQuery should be able to use those functions:

import module namespace rlite = "http://www.marklogic.com/packages/r-lite" at "/MarkLogic/Packages/R-Lite.js";
let $b := rlite:bei(2,20)
return "Order 2 Bessel at x = 20 is " || $b

Similarly, JavaScript should be able to use existing libraries as well as any that are engineered in XQuery (here, an admin package):

var Admin = importPackage("http://www.marklogic.com/packages/admin", "/MarkLogic/Packages/Admin.xqm");

"There are " + fn.count(Admin.servers()) + " servers currently in operation";

The package variable Admin holds the module's functions as methods. It may be that this gets invoked as Admin::servers() instead, depending upon the degree to which MarkLogic alters the native V8 implementation to facilitate such packages (and to provide support for inline XML, among a host of other issues).

Ironically, one frustrating problem for MarkLogic may be its best-practice use of the dash ("-") in variable and function names. My guess is that xdmp:get-request-field() may end up being rendered as xdmp.get_request_field() in JavaScript, but until the review betas roll out, it is difficult to say for sure.

However, the biggest takeaway is this: if you're a MarkLogic XQuery developer, XQuery will continue to be supported, while if you're a JavaScript developer looking to get into MarkLogic, the ML8 implementation is definitely something you should explore. For that matter, if you are looking for a good NoSQL data server, MarkLogic 8 should be on your radar, whether you are a JavaScript developer or a CIO wanting to tap the vast pool of JavaScript developers to help build your enterprise-ready data applications.

Subscribe to this blog (add your name to our mailing list) to get notified about my updates as I evaluate MarkLogic 8 through Avalon's Early Adopter participation. And reach out to me directly (caglek@avalonconsult.com), or comment here, if there is anything specific you'd like me to explore in the EA version on your behalf.

 

About Kurt Cagle

Kurt Cagle is the Principal Evangelist for Semantic Technology with Avalon Consulting, LLC, and has designed information strategies for Fortune 500 companies, universities and Federal and State Agencies. He is currently completing a book on HTML5 Scalable Vector Graphics for O'Reilly Media.

Comments

  1. Zdeno Tubel says:

    Hi Kurt,

    Thank you for sharing news from SF.

Do you know anything more about that bitemporal index? I was able to eval/invoke XQueries at a past point in time in previous versions of ML as well, with the "only" limitation that you could not join them together. Will this index allow me to do that? Will I be able to write a single cts:query that returns documents that satisfy my query now but did not five minutes ago?

• Given the structure I saw in the demos, you should be able to do that; when enabled, it provides a way to get the state not just of a single document but of essentially all bitemporal-enabled documents, without the need to use the eval/invoke mechanism and without the limitations involved in making queries at specific merge points.
