Apache Lucene/Solr Committer

Needham Software LLC is proud to announce that Founder and Chief Architect Gus Heck has been promoted to Committer status on the Lucene/Solr project. This is a great honor, and further invigorates his plans to continue enhancing Solr.

With respect to the services offered by Needham Software LLC, this ensures that any updates to Solr required by our clients can be contributed in a timely manner. It will no longer be necessary to wait for a committer on the project to find the time and interest to adopt and sponsor proposed functionality. Of course, any changes to Solr are still subject to community review, and the inclusion of features can never be guaranteed. All submissions must still benefit the Lucene/Solr community, and if the community does not concur on the value of a contribution, it cannot be committed to the project. What can be guaranteed is that a contribution will not be delayed indefinitely because no committer is interested or because the interested committers are swamped with work.

Next Steps

In terms of planned contributions, not much changes other than perhaps the timeline/turnaround. Features currently on Gus’s short list are:

  • Streaming Expressions – Enhancements to improve usability, functionality and safety 
  • Unit tests – There is a big push to improve the unit tests in the project right now, and contributing to this effort will be a priority
  • SOLR-3191 – This issue has been lingering for a very long time, and it provides the ability to succinctly and intuitively exclude fields from the response
  • Time Routed Aliases – More enhancements are needed to make this feature easier to use effectively

JesterJ 1.0 Beta2 Released

Several new features are now available in JesterJ, which is slowly but surely moving towards full 1.0. Preparing for my talk at Activate 2018 inspired me to fix a few things and make these improvements available in a release.

New Features in 1.0 Beta 2

Plan Visualization

This is the key feature that really drove the production of the Beta 2 release. I added this so I could more easily show what is going on in JesterJ for my talk, and I’ve found it’s tremendously helpful and gratifying to be able to see the connections/layout of a JesterJ ingestion DAG. I just had to share it! Here’s an example from my talk:

Blue ovals represent implementations of Scanner.java producing data, and arrows indicate document flow. This image nicely demonstrates the DAG (Directed Acyclic Graph) feature. Three data sources feed four flows into three terminal steps that are all putting data into Solr. The bottom right oval is a step configured with a SendToSolrCloudProcessor.java instance that adds user click-through data to a TRA (Time Routed Alias). The middle oval is sending to a standard Solr collection (non-time series), and the bottom left is invoking a streaming expression update() to build up partial calculations for an NPMR (non-parametric multiplicative regression) model in Solr. The second and third from the bottom on the right are examples of custom implementations of Processor.java that I hope to detail in a future post.

Other Enhancements

  • Java Config Jar command line parameter – When I first considered using Java Config, I was unsure I would keep it long term, so I added a Java property to identify the jar file containing the config. Since this has turned out to be convenient and is the supported mode of operation, I’ve added a (required) positional command line parameter for it. I also intend to further simplify the command line before 1.0 final.
  • Defaults for some methods in Scanner.java – Another user-friendliness enhancement. The initial implementation of Scanner.java was very forward looking. Two methods have been repeatedly implemented as no-ops, so the default keyword has been used to make these no-op implementations the standard unless overridden.
  • RouteByStepName warns if documents are dropped. This router was dropping documents silently, which could be useful at times, but in combination with the Router/Scanner bug (see below) it led me into a very long debugging session. I decided it’s better to warn about dropped documents.
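To illustrate the default-method change in the second item above, here is a minimal sketch of the Java language feature involved. The interface and class names (SketchScanner, FixedScanner, CountingScanner) and the hook names are made up for this example and are not the actual Scanner.java API; only the use of the default keyword mirrors what is described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scanner-style interface; NOT JesterJ's Scanner.java.
interface SketchScanner {
    // The one method every implementation must supply.
    List<String> scan();

    // Optional hooks: the default keyword supplies a no-op body,
    // so implementations only override these when they need to.
    default void beforeScan() { }

    default void afterScan() { }
}

// Minimal implementation: relies on both no-op defaults.
class FixedScanner implements SketchScanner {
    @Override
    public List<String> scan() {
        List<String> docs = new ArrayList<>();
        docs.add("doc-1");
        docs.add("doc-2");
        return docs;
    }
}

// Overrides one hook and inherits the other no-op default.
class CountingScanner extends FixedScanner {
    int scansStarted = 0;

    @Override
    public void beforeScan() {
        scansStarted++;
    }
}
```

With this shape, an implementation that has nothing to do in a hook simply omits it, which is the boilerplate reduction the item describes.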

Bug Fix: IP Address changes

One of the more irritating bugs I ran into cropped up in a prior presentation where I had intended to show JesterJ working in real time. I had everything all set, verified it multiple times, shut my laptop and went to the Meetup. When it came to my part of the Meetup, I went to start JesterJ, only to get a stack trace.

I can hardly blame my audience if they were unimpressed 🙁

The underlying problem is that the embedded Cassandra within JesterJ writes the IP address into a config file (~/.jj/cassandra/cassandra.yml). Moving the laptop changes the address, so Cassandra fails to bind to the old one and JesterJ won’t start. This is mostly not a problem for production instances, where IP addresses shouldn’t change frequently, but if things do need to be moved, it would become a major irritation. I can also see someone excited to use JesterJ taking their laptop to a meeting to show their team/boss, and the switch from docking station to wireless changes their IP… boom!
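This post does not show how the fix was implemented, so the following is only a sketch of the failure mode: checking whether an address recorded in a config file is still assigned to any local network interface. The class and method names are hypothetical and none of this is JesterJ or Cassandra code; it uses only the standard java.net API.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
class StaleAddressCheck {

    // Collects the textual form of every address currently assigned to a
    // local interface. IPv6 entries may carry a scope suffix such as "%eth0".
    static Set<String> currentLocalAddresses() throws SocketException {
        Set<String> result = new HashSet<>();
        Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
        if (ifaces == null) {
            return result; // no interfaces reported by the platform
        }
        for (NetworkInterface nic : Collections.list(ifaces)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                result.add(addr.getHostAddress());
            }
        }
        return result;
    }

    // True when the recorded address is no longer present on this machine,
    // which is the situation in which a bind to it would fail.
    static boolean isStale(String recordedAddress, Set<String> currentAddresses) {
        return !currentAddresses.contains(recordedAddress);
    }
}
```

A startup routine could call isStale(recorded, StaleAddressCheck.currentLocalAddresses()) before launching the embedded database and react however it sees fit; what the reaction should be is a design decision this sketch deliberately leaves open.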

Other Bug Fixes

  • Routers on Scanners work again. Some time ago a bug crept in that caused the router added to a scanner to never be built from its builder. The system was defaulting to RouteByStepName, which was silently dropping documents.
  • Log4J incremented to 2.6.1 – This is to avoid LOG4J2-1409, but to minimize changes in project dependencies (and the associated licensing documentation work), this was a minimal upgrade rather than a move to the latest/greatest.
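The difference between silently dropping documents and warning about them, as discussed for RouteByStepName, can be sketched roughly as follows. This is an illustration under assumptions, not the JesterJ implementation: the class WarnOnDropRouter, its methods, and the step names are hypothetical, and java.util.logging stands in for the project's logging so the example is self-contained.

```java
import java.util.Optional;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical router sketch; NOT JesterJ's RouteByStepName.
class WarnOnDropRouter {
    private static final Logger LOG =
            Logger.getLogger(WarnOnDropRouter.class.getName());

    private final Set<String> downstreamSteps;
    private int dropped = 0;

    WarnOnDropRouter(Set<String> downstreamSteps) {
        this.downstreamSteps = downstreamSteps;
    }

    // Returns the step the document should go to, or empty if no step matches.
    // A miss is logged and counted instead of disappearing without a trace.
    Optional<String> route(String docId, String requestedStep) {
        if (requestedStep != null && downstreamSteps.contains(requestedStep)) {
            return Optional.of(requestedStep);
        }
        dropped++;
        LOG.warning("Dropping document " + docId
                + ": no downstream step named '" + requestedStep + "'");
        return Optional.empty();
    }

    int droppedCount() {
        return dropped;
    }
}
```

The warning costs one log line per dropped document, which is a small price compared with a long debugging session spent wondering where the documents went.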

There were a few other changes to the build and issues relating to our use (and misconfiguration) of Artifactory, but those probably aren’t worth blogging about. As always, you can stay abreast of all changes by following the issues on GitHub.
