In my first post, I detailed the history that led up to my working on the Apache Solr JDBC driver and becoming an Apache Lucene/Solr committer. This post will describe the Solr JDBC driver and its usage. The next few posts in the series will be detailed guides on how to use the Solr JDBC driver with SQL clients and database visualization tools.
The earliest reference I could find to Apache Solr and JDBC is SOLR-373, which dates back to 2008. Almost 8 years later, the Solr JDBC driver arrives as a new feature of Solr 6 that enables JDBC connectivity to a Solr Cloud cluster. Opening Solr up to SQL queries lets more developers use the power of a full-text search engine for analytics without learning a new query language. JDBC lets not only Java applications query Solr with SQL, but also a variety of business intelligence (BI) tools.
The Solr JDBC driver builds on Solr Parallel SQL, which Joel Bernstein introduced in SOLR-7560 and which added the ability to handle SQL queries through the /sql handler. Joel also developed the initial Solr JDBC driver in SOLR-7986. I improved the driver to support some BI tools with SOLR-8502. Although the Solr JDBC driver isn’t complete, it now supports tools like DbVisualizer, and work to support more is already under way in SOLR-8659. In this first release of the driver, only a subset of the SQL language is supported, as detailed on the Parallel SQL reference guide page. A SQL optimizer and join capabilities are also planned for future releases.
The Solr JDBC driver is easy to get started with, requiring a Solr Cloud cluster and a few jars on the client classpath. Currently the setup of the Solr JDBC driver requires either a Maven dependency on org.apache.solr:solr-solrj or copying the following jars from the extracted solr-6.0.0.tgz to the client classpath:
Note: SOLR-8680 was created to try to reduce this to a single jar.
Once these jars are on the client classpath, one can connect over JDBC with the following connection string format using the driver “org.apache.solr.client.solrj.io.sql.DriverImpl”:
An example of a connection string could be:
The latest documentation for connecting over the Solr JDBC driver is available on the Apache Solr Reference Guide Parallel SQL page under Sending Queries JDBC and SQL Clients and Database Visualization Tools. Additionally, Sematext published a blog post that describes in detail how to use Solr JDBC with Java. In some cases, the Solr collection will need to be configured as detailed here to work with the Solr JDBC driver and Parallel SQL.
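As a rough illustration of what connecting from Java might look like, here is a minimal sketch. It assumes the connection string takes the form jdbc:solr://<ZooKeeper connection string>?collection=<collection name>, as described in the Solr Reference Guide. The ZooKeeper address localhost:9983, the collection name test, and the id field are all placeholders chosen for this example, so adjust them to match your own cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SolrJdbcExample {

    // Builds a connection string of the assumed form
    // jdbc:solr://<zkHost>?collection=<collection>
    public static String buildUrl(String zkHost, String collection) {
        return "jdbc:solr://" + zkHost + "?collection=" + collection;
    }

    public static void main(String[] args) {
        // Placeholder values: a local ZooKeeper and a collection named "test".
        String url = buildUrl("localhost:9983", "test");
        String sql = "SELECT id FROM test LIMIT 10";

        try {
            // Load the driver class named earlier in this post.
            Class.forName("org.apache.solr.client.solrj.io.sql.DriverImpl");
        } catch (ClassNotFoundException e) {
            System.err.println("Solr JDBC driver not found on the classpath: " + e.getMessage());
            return;
        }

        // try-with-resources closes the result set, statement, and connection.
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString("id"));
            }
        } catch (SQLException e) {
            System.err.println("Could not query Solr: " + e.getMessage());
        }
    }
}
```

Only standard java.sql interfaces are used here, so the same pattern should apply to any tool or application that speaks JDBC. The query sticks to a simple SELECT with LIMIT because only a subset of SQL is supported in this first release.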
SQL Clients and Database Visualization Tools
A few SQL clients and database visualization tools have been tested to work with the Solr JDBC driver. There are continuing efforts to expand the JDBC support to enable more clients under SOLR-8659. Below are a few screenshots of SQL tools connecting to Solr over the JDBC driver.
Apache Zeppelin (incubating)
What is next?
Over the next couple of weeks, I’ll be posting a series of step-by-step guides on how to configure a few SQL clients and database visualization tools to connect to Solr over the JDBC driver. The official documentation, the Apache Solr Reference Guide, will be updated to include more detail about connection parameters and some information about specific clients. If you have any questions about the Solr JDBC driver or want to help contribute, the Solr website has a Community section that explains how to use the solr-user mailing list.
Additionally, my team at Avalon and I are available to provide expert help with Solr, search on Hadoop (primarily Hortonworks or Cloudera), or general big data projects. If you’ve got an initiative that could use some expertise to guide you down the right path, you can reach us at info <at> avalonconsult.com. You can also follow our LinkedIn page or sign up for our mailing list to be alerted to future installments of my blog series on Apache Solr and the JDBC driver.
The next post in my series, Apache Solr JDBC – DbVisualizer, is now posted.