Modernizing the Online HR System

In the mid-1990s, one of my first assignments as an IT Consultant was to build an online HR system.  The client wanted an extranet portal system enabling searching capabilities on personnel by departments or names or location, accessible to partners as well as internally.  During this time, all companies were rushing to move applications to the Internet and make everything accessible “online”.  The first HTML page was an overall organization chart with drill-down capability.  The report should start at the President and progress to department executives, then drilldown to managers and so forth.  Employees were grouped under their managers and departments.

This report vexed the IT department for some time.  At first glance, you would think this was a simple SQL query that would sort by department then manager then position and name etc.  The IT department was convinced that this issue was a “reporting” software issue.  They had purchased a couple enterprise reporting programs in the past but never succeeded in representing the data in the exact manner desired.  All of the reporting software applications could execute a query, group the results and provide aggregations for summaries etc.  They believed it was the reporting application’s functionality to blame for failing to represent the data as a Tree-like-Hierarchy, which they wanted with drilldowns.

The company purchased an enterprise reporting system called Actuate, in the hopes that this application could solve the initial high priority requirement.  I started to work on this problem, like others who came before me; I started to work with the data.  I tried different Order By statements and Group By statements, and never could get the data to “look” just right.  Then, I researched data buffers internally available to the reporting software.  I wrote some code to buffer the data, then iterate through the data to find Manager -> Employee relationships and build the Tree-Hierarchy internally.  Eventually, I came up with a solution but it was not optimal.  The code I wrote buffered a LOT of data, and I had to loop through this data to find relationships and hierarchy on every employee.  You see, this is essentially a looping issue.  A SQL statement returns an employee, and then you have to decide if this employee is a manager based upon other employee data.  The data was stored in an Oracle database similar to this snapshot:

hr_images

Note the Manager ID field and the Employee ID field above and you will see that the data is nicely formatted to represent a Tree-like-Hierarchy.  Employee ID 103 is followed by another Emp ID of 105 who reports to 103, and so forth.  In the above simple example, you may think a solution would be to Sort By Employee ID, but this is not the case because actual employee IDs are random and could be any combination of alphanumeric characters.

In order to solve the initial client request, I wrote a SQL statement that is built-in to Oracle databases.  It is called a “Connect By Prior” statement and orders the data appropriately in a Tree-like-Hierarchy.  The final solution had nothing to do with the reporting software.  The “Connect By Prior” statement orders the data by doing an internal loop to find manager and employee relationships.  The report simply displayed the data as returned from the SQL.

hr_report

I think back on that project and wondered why someone structured the data in this manner.  Why not create a separate composite table with employee to manager relationships?  A composite table would have enabled an easier approach to querying the data in several different manners.  During this time, space was an issue and storing the relationship in the same table saved space.  Today, we don’t really have the same consideration concerning disk space or even processing.  We have huge amounts of disk space available for relatively inexpensive costs.  We also have Amazon (among several others) offering immense amounts of processing capabilities on demand in the “cloud”.  I still believe we must be conservative and practical when dealing with processing or space, but we can take on costs that were unthinkable just a few years ago.  Additionally, new database technologies have emerged to facilitate new approaches to solutions (more on that a bit later in this post).

This story exemplifies the common misunderstandings many of us have when assigning blame in IT.  The client was convinced that a reporting software application should solve this issue internally.  Although I did that initially with a data buffer, it wasn’t the most appropriate solution.  The real issue was a lack of knowledge about the structure of the data in one table and the functionality of Oracle to represent the data for reporting.

Today’s Perspective

How would you solve this requirement today?  Imagine the same scenario and think about building an extranet application accessible to hundreds of thousands of people, tens of thousands of concurrent users, secured, and flexible enough to adapt with data format changes and scale.

The system could be similar to LinkedIn; including user profiles, resumes, background information, contact information, addresses, and user interests or hobbies along with a running dashboard of posts and news.  LinkedIn obviously has many features that we don’t necessarily require, but it could serve as an example of an online company directory.  Our scale would be smaller but if the system is used at GM for example, it could support 200,000 employees and potentially millions of suppliers.

Since my days working as a web app developer in the 1990s, not much has changed in the architecture of web based systems.  We have multiple tiers with a database server, an application server like WebLogic and an Internet server like Apache.  We put in a big-ip load balancer and start to cluster the tiers based on estimates for use.  For database load, we might scale up with more powerful and expensive servers.  Then, we test performance and response rates using something like jMeter.  Typically, we end-up over estimating use cases and therefore end-up setting up too many servers at a bloated expense.  This also means that some servers are sitting idle or only using a small percentage of their processing capabilities.  This model is changing today and becoming much more efficient.  Web based information systems are becoming faster, better and cheaper for two essential reasons.

  • Companies like Amazon, Microsoft and RackSpace offer solid on-demand cloud services.  You pay for what you use, instead of having idle processing and too much space invested into an internal server farm locally administrated.  (Of course many companies are using these services but still somewhat hesitant because of data concerns.)
  • The promise of Open Source software has matured and become stable enough, such as Apache, that many companies are using it for enterprise grade systems.  Or companies are prototyping systems while using Open Source applications for free, then purchasing an enterprise supported license after successful proof-of-concept.  This model is similar to that of Couchbase.
  • For reference see these two recent stories:  OSS and Cloud which feature companies IBM and Netflix.

An interesting change in web application architecture is the adaptation of different data storage technology.  This really started at large Internet companies like Google, Facebook, and Yahoo.  These companies work with large datasets, which are expanding incredibly fast with an exponential growth rate.  They were running into limitations with traditional relational databases and pioneered alternatives.  Their work ignited an Open Source effort to create NoSQL (Not only SQL) database projects and now there are commercial efforts based on these projects.

Couchbase is one of these companies and as I learn more about their technology, I was convinced that a modern HR Information System could leverage this technology and improve the original system we built back in the 1990s.

With a NoSQL database technology like Couchbase, we could build a system that stores the employee -> manager relationship along with department path within an employee JSON document.  This simplifies Tree-like-Hierarchy reporting.  There is no schema rules enforced with NoSQL databases unlike relational databases.  This provides flexibility to developers and requires the developer to carefully design JSON documents for integrity and performance.  More importantly, it provides the application flexibility to evolve through time and adapt quickly to changes.  The technology is easy to scale with off the shelf commodity servers, so clustering is optimized with new servers added or removed on the fly to handle usage requirements.  Finally, there are internal tools for administration and solid APIs for development.

The original system we developed was based upon an internal HR desktop application.  We reused the schema and data from that system to create an online extranet web application.  Today, we could approach this differently and store the data in a more flexible manner using JSON documents.  As the landscape changes in user generated data, we can change with it, and eliminate the necessity of handling data changes that affect multiple tables in a relational database.

The role of relational databases is still required today for many types of applications.  This is just one example for an HR application where NoSQL is more effective and adaptive while users generate more data in posts or hobbies or creating groups etc.  It also would be flexible enough to handle reporting and analytic requirements today and in the future.

In order to test my ideas about developing such a system based on NoSQL, I prototyped an example.  First, I downloaded the Couchbase server (for free) and worked on developing the organization report like I did in Oracle back in the 1990s.  The first step was to get the data into Couchbase after installing the software.  There is a simple setup executable and I had no problem getting a local instance running.  In order to copy the data from Oracle to Couchbase, I created a Java script to query Oracle for employees.  Then, I placed the query results into JSON documents similar to the example below:

hr_json

This is a simple JSON document and as the development matures with multiple attributes for an employee, the document will also evolve.  For instance, you could create another JSON document describing “Hobbies” and relate that interest back to an employee by adding key-values “Hobbies”: [{ “id”: “100” },{ “id”: “101” }].

After creating my employee JSON documents, I uploaded them to Couchbase using the “.upsert” API command.  Once that was completed, I created a N1QL statement.  N1QL is Couchbase’s ability to query JSON documents in a language similar to SQL.  It was really helpful for me, since I have a long background in relational databases and PL/SQL.  Here is the simple statement that returned a Tree-like-Hierarchy:

“select emp_id, manager_id, level, dept_name, job_title, fname, lname, phone, email from default order by org_path”

Finally, I put the result set into an HTML page and used the jQuery Accordian Widget to create the “drill-down” functionality like the previous example image included earlier in this post.  All of this work took about a week of my time.  The original project took several weeks’ worth of development.  This is an extremely powerful and easy process to get something running and ready to show potential clients.  In the future, I may continue this development and add more functionality and complexity (stay posted!).  The above N1QL query is simple because the “from” clause hits one data source “default”.  You could combine data buckets and JOIN them similarly to SQL relational based databases.

At Avalon, we have a history of leveraging new information technology systems which transforms data investments into actionable business results.  Our Big Data practice has years of experience in Hadoop along with other nontraditional and trending technologies.  A significant advantage for NoSQL databases is the ease for building out web applications in the cloud using commodity servers.  This attractive opportunity enables web applications to flexibly evolve as user generated data formats change.  It also enables scale flexibility for adding or removing servers as usage demands change.  Couchbase is a great example of a NoSQL database with several compelling case studies.

Will Thayer About Will Thayer

Will Thayer is a Principal Consultant Technologist at Avalon Consulting, LLC. Will has 18 years’ experience in planning, strategy, development, and training. His expertise includes web application development, management information systems, and system development life cycle practices. As an Adjunct Professor at the University of Denver, Will taught graduate and undergraduate students for 5 years. His research in Open EDI and XML EDI has appeared in books as well as trade periodicals. Will lives in Evergreen, Colorado where he enjoys skiing, hiking, biking, and camping in the Rocky Mountains with his wife Robyn and daughter Sophia.

Leave a Comment

*