TechAscent - Delightful Software Solutions
2018-10-22

Native Pointers: Playing Well with Others

The JVM ecosystem is enormous and includes a lot of good code. But everyone eventually reaches the point where it is time to venture out into the wider world and explore. Of course, the wider world is a scary place with dangers that do not exist within the padded walls of the JVM. Pointers can equal zero (eek!).

For many specialized or performance-sensitive tasks, consuming libraries from other ecosystems provides huge leverage. Historically, Java has not done an exceptional job of bridging to code with a C-level interface, and there is lots and lots of great code with such an interface. JNI is clumsy, especially compared to Haskell or C#, where consuming C code is largely a matter of declaring the functions you want, and off you go.

One effort to improve the situation for Java is JavaCPP, which provides a structured bridge between Java and a limited subset of C++. At its core is a single raw pointer type, on top of which JavaCPP builds generic operations on those pointers (e.g., getting and setting the base address). JavaCPP's overall strategic direction is sound: rather than baking in the details of any particular library binding, it builds a common platform on which, in principle, a binding to any library can be built.

We use JavaCPP, and our lowest-level bindings can be found in tech.javacpp-datatype.

Generally speaking, we aim to improve the experience of using C libraries in Clojure. Doing so will enable us to leverage C libraries to solve challenging and interesting problems.

What could possibly go wrong?

user> (require '[tech.datatype :as dtype])
nil
user> (require '[tech.datatype.javacpp :as jcpp-dtype])
nil
user> ;; Now we import an actual JavaCPP library (opencv) so that the native
user> ;; code stubs that aren't in the base library get loaded.
user> (import '[org.bytedeco.javacpp opencv_core])
#<Class@33ef751b org.bytedeco.javacpp.opencv_core>
user> (println opencv_core/ACCESS_FAST)
67108864
nil
user> (def crashy (-> (jcpp-dtype/make-empty-pointer-of-type :float32)
                      (jcpp-dtype/offset-pointer 100)
                      (jcpp-dtype/set-pointer-limit-and-capacity 100)
                      (dtype/copy! (float-array 100))))
[nREPL] Connection closed unexpectedly (connection broken by remote peer)

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fbf994c7c97, pid=5932, tid=6499
#
# JRE version: OpenJDK Runtime Environment (10.0.2+13) (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.2)
# Java VM: OpenJDK 64-Bit Server VM (10.0.2+13-Ubuntu-1ubuntu0.18.04.2, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xcecc97]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P" (or dumping to /home/chrisn/dev/blog/native-pointers/core.5932)
#
# An error report file with more information is saved as:
# /home/chrisn/dev/blog/native-pointers/hs_err_pid5932.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Subprocess failed

Ahh, that was great. Moving on...
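Why did that crash? Reading the transcript: the empty pointer starts at address 0, and assuming offset-pointer counts in elements, offsetting by 100 float32 values (4 bytes each) moves it to byte 400. The copy then writes 400 bytes from there, into low memory that a normal process has no mapping for, hence the SIGSEGV. The arithmetic is easy to model; the `NativeRange` record and its helpers below are hypothetical and are not JavaCPP's API:

```clojure
;; Hypothetical model of a typed native pointer: a base address, a datatype
;; and an element count. This only mirrors the arithmetic, not JavaCPP.
(def datatype-size {:int8 1 :int16 2 :int32 4 :int64 8 :float32 4 :float64 8})

(defrecord NativeRange [address datatype elem-count])

(defn offset-range
  "Advance the base address by n elements of the range's datatype."
  [{:keys [address datatype] :as r} n]
  (assoc r :address (+ address (* n (datatype-size datatype)))))

(defn byte-span
  "Half-open [start end) byte interval that copying elem-count values touches."
  [{:keys [address datatype elem-count]}]
  [address (+ address (* elem-count (datatype-size datatype)))])

;; The crashing example: an empty (null) pointer, offset by 100 floats,
;; then told it holds 100 elements.
(def crashy (-> (->NativeRange 0 :float32 0)
                (offset-range 100)
                (assoc :elem-count 100)))

(byte-span crashy) ;; => [400 800]
```

Every byte in that span lies well inside the first 4 KiB of the address space, so the very first write faults.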

Something Useful

We hope the libraries we build are useful for more than dumping core. Here is an example of moving fluidly between integer and floating-point data, a common operation when working with varying image formats:

user> (require '[tech.datatype :as dtype])
nil
user> (require '[tech.datatype.javacpp :as jcpp-dtype])
nil
user> (import '[org.bytedeco.javacpp opencv_core])
#<Class@33ef751b org.bytedeco.javacpp.opencv_core>
user> (println opencv_core/ACCESS_FAST)
67108864
nil
user> (def float-ptr (jcpp-dtype/make-pointer-of-type :float32 20))
#'user/float-ptr
user> (def long-data (dtype/make-array-of-type :int64 (range 20)))
#'user/long-data
user> (dtype/copy! long-data float-ptr)
#<org.bytedeco.javacpp.FloatPointer@6f3e23a1 org.bytedeco.javacpp.FloatPointer[address=0x7fb9780d2d10,position=0,limit=20,capacity=20,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x7fb9780d2d10,deallocatorAddress=0x7fb969d3ece0]]>
user> (def byte-data (dtype/make-array-of-type :int8 20))
#'user/byte-data
user> (-> (dtype/copy! float-ptr byte-data)
          vec)
[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

Data interchange between the JVM and native pointers is now fast and easy. We covered why it is fast in our post about our datatype library.
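To make the idea concrete without JavaCPP, here is a naive sketch using only java.nio: a direct (off-heap) ByteBuffer stands in for native memory, a FloatBuffer view gives typed access, and values are converted element by element on the way in and out. The helper names are hypothetical, and this loop is far simpler than what tech.datatype's copy! actually does:

```clojure
(import '[java.nio ByteBuffer ByteOrder FloatBuffer])

(defn make-native-floats
  "Allocate n float32 slots outside the JVM heap (a direct ByteBuffer)."
  [n]
  (-> (ByteBuffer/allocateDirect (int (* 4 n)))
      (.order (ByteOrder/nativeOrder))
      (.asFloatBuffer)))

(defn copy-longs->floats!
  "Convert each long to float32 while writing it into the off-heap buffer."
  [^longs src ^FloatBuffer dst]
  (dotimes [i (alength src)]
    (.put dst (int i) (float (aget src i))))
  dst)

(defn copy-floats->bytes!
  "Read float32 values back and narrow them to int8. Clojure's `byte`
  throws if a value does not fit, so this narrowing is checked."
  [^FloatBuffer src ^bytes dst]
  (dotimes [i (alength dst)]
    (aset dst i (byte (.get src (int i)))))
  dst)

(def float-buf (copy-longs->floats! (long-array (range 20)) (make-native-floats 20)))
(vec (copy-floats->bytes! float-buf (byte-array 20)))
;; => [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
```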

Moreover, typed pointers provide access to unsigned datatypes (also often seen when working with images):

user> (range 255 235 -1)
(255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236)
user> (jcpp-dtype/make-typed-pointer :uint8 (range 255 235 -1))
#tech.datatype.javacpp.TypedPointer
{:datatype :uint8,
 :ptr #<org.bytedeco.javacpp.BytePointer@623899d7 org.bytedeco.javacpp.BytePointer[address=0x7f6fd0be7bd0,position=0,limit=20,capacity=20,deallocator=org.bytedeco.javacpp.Pointer$NativeDeallocator[ownerAddress=0x7f6fd0be7bd0,deallocatorAddress=0x7f6fbb258d60]]>}
user> (def typed-ptr *1)
#'user/typed-ptr
user> (def result-data (short-array 20))
#'user/result-data
user> (dtype/copy! typed-ptr result-data)
#<[S@31ed8324>
user> (dtype/->vector typed-ptr)
[255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236]
user> (dtype/->vector result-data)
[255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236]
user> (dtype/->vector (jcpp-dtype/->ptr-backing-store typed-ptr))
[-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20]

Typed pointers store data at its native size with no overhead, yet still provide mechanisms (see tech.datatype.java-unsigned) for automatically and correctly converting between datatypes. By default, conversion is safe (checked); unchecked conversions are available via an optional argument.
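The trick behind unsigned types on the JVM is that the bits are stored in signed primitives and reinterpreted on the way in and out. A minimal sketch of that idea for uint8, with a checked default and an opt-in unchecked path, follows; the function names and the `:unchecked?` option are hypothetical and are not tech.datatype.java-unsigned's actual API:

```clojure
(defn uint8->byte
  "Store an unsigned value 0..255 in a signed JVM byte. Checked by default;
  pass {:unchecked? true} to keep only the low 8 bits instead of throwing."
  ([value] (uint8->byte value {}))
  ([value {:keys [unchecked?]}]
   (let [v (long value)]
     (when (and (not unchecked?) (not (<= 0 v 255)))
       (throw (ex-info "value out of uint8 range" {:value v})))
     (unchecked-byte v))))

(defn byte->uint8
  "Reinterpret a signed byte's bits as an unsigned value 0..255."
  [b]
  (bit-and (long b) 0xFF))

;; 255..236 stored in signed bytes looks like -1..-20 (compare the
;; ->ptr-backing-store output above), yet reads back as 255..236.
(def stored (mapv uint8->byte (range 255 235 -1)))
stored                    ;; => [-1 -2 -3 ... -20]
(mapv byte->uint8 stored) ;; => [255 254 253 ... 236]
```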

Limitations of This Approach

The JVM garbage collector does not even try to track the native memory behind these pointers. Cavalier use of native datatypes can lead to hard crashes (as seen above). Crashes also happen when the JVM believes it has more memory available (-Xmx) than the machine is able or willing to provide. This has happened to us in production: if a process dies with exit status 137 (128 + 9, i.e., killed by SIGKILL), it may have been OOM killed. At the very least, set -Xmx and -Xms explicitly for more assurance.
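Decoding a status like 137 is simple arithmetic: shells report a process terminated by signal N as 128 + N, and signal 9 is SIGKILL, which is what the Linux OOM killer sends. A tiny, purely illustrative helper (the names are hypothetical):

```clojure
;; Linux signal numbers for a few signals relevant to this post.
(def signal-names {9 "SIGKILL" 11 "SIGSEGV" 15 "SIGTERM"})

(defn describe-exit-status
  "Interpret a shell-style exit status. Values above 128 conventionally mean
  the process was terminated by signal (status - 128)."
  [status]
  (if (> status 128)
    (let [sig (- status 128)]
      {:signal sig :name (get signal-names sig "unknown")})
    {:exit-code status}))

(describe-exit-status 137) ;; => {:signal 9, :name "SIGKILL"}
(describe-exit-status 0)   ;; => {:exit-code 0}
```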

Now, in the wider world, there is a resource management system that is both battle-tested and built for exactly this: scope-based resource management, or RAII (Resource Acquisition Is Initialization). We knew we needed something like it in Clojure in order to do serious interop with native code.

So, we built tech.resource. It works much like with-open, but because it is built on protocols it can manage anything, not just objects that implement .close. The design is simple: a resource context is a list of resources, and unwinding it calls release on each one, in the reverse of the order in which they were added.
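The idea fits in a few lines. The sketch below is our own simplified illustration, not tech.resource's source: a dynamic var holds an atom with the current context's resources, `track` pushes onto it, and the macro releases everything in reverse order inside a `finally`. The names `PRelease`, `with-context`, and `track` are hypothetical:

```clojure
(defprotocol PRelease
  (release! [item] "Free whatever this item holds."))

;; Anything AutoCloseable (what with-open handles) can be released by closing it.
(extend-protocol PRelease
  java.lang.AutoCloseable
  (release! [item] (.close item)))

(def ^:dynamic *resources* nil)

(defn track
  "Register item with the current context and return it unchanged."
  [item]
  (when-not *resources*
    (throw (ex-info "track called outside of with-context" {})))
  (swap! *resources* conj item)
  item)

(defmacro with-context
  "Run body, then release every tracked item, newest first."
  [& body]
  `(binding [*resources* (atom '())]
     (try
       ~@body
       (finally
         ;; conj on a list adds to the front, so walking the list
         ;; visits items in the reverse of the order they were tracked
         (doseq [item# @*resources*]
           (release! item#))))))

;; Usage: record the order in which items are released.
(def released (atom []))

(defrecord Noisy [id]
  PRelease
  (release! [_] (swap! released conj id)))

(with-context
  (track (->Noisy 1))
  (track (->Noisy 2))
  :done)
;; => :done, and @released is now [2 1]
```

Release errors are not handled here; a real implementation would want to keep unwinding even if one release throws.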

user> (require '[tech.resource :as resource])
nil
user> (resource/with-resource-context
        (let [flt-data (jcpp-dtype/make-pointer-of-type :float32 (range 10))
              int-ary (int-array (dtype/ecount flt-data))]
          (dtype/copy! flt-data int-ary)))
#<[I@185d8bdb>
user> (vec *1)
[0 1 2 3 4 5 6 7 8 9]

user> (resource/with-resource-context
        (let [flt-data (float-array (range 10))
              int-data (jcpp-dtype/make-typed-pointer :int32 10)]
          (dtype/copy! flt-data int-data)))
#tech.datatype.javacpp.TypedPointer
{:datatype :int32,
 :ptr #<org.bytedeco.javacpp.IntPointer@675a2901 org.bytedeco.javacpp.IntPointer[address=0x0,position=0,limit=10,capacity=10,deallocator=null]>}
user> (dtype/->vector *1)
NullPointerException   org.bytedeco.javacpp.IntPointer.asBuffer (IntPointer.java:195)

user> (defrecord MyResource []
        resource/PResource
        (release-resource [me]
          (println "Released!!")))
#<Class@4303b490 user.MyResource>
user> (resource/with-resource-context
        (resource/track (->MyResource)))
Released!!
#user.MyResource {}

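What do you suppose would happen if a pointer created inside a resource context somehow escaped that context after it unwound? Sometimes nothing much: in the second example above, the escaped IntPointer has already been released (note address=0x0), so access fails with a comparatively polite NullPointerException. Other escapes may end in heap corruption, JVM class table corruption, and/or an instantaneous and unceremonious program halt.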

Because of this, we use the resource system to build safe Clojure and JVM datatype pathways into and out of the parts of our system that depend on native interop. Efficient bulk conversions to and from JVM storage (arrays and nio buffers) minimize any reason to expose the 'native-ness' of those parts outside well-tested internal contexts.
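The pattern is the one in the first resource-context example: do the native work in scratch memory, copy the answer into ordinary JVM data, and release the scratch memory before returning. Here is a self-contained sketch of it, assuming a hypothetical `FakeNative` handle (a float array plus a released flag) in place of a real JavaCPP pointer; accessing a released handle throws instead of touching freed memory:

```clojure
;; Hypothetical stand-in for native memory: a float array plus a flag that
;; flips when the memory is released.
(defrecord FakeNative [storage released?])

(defn fake-native [n] (->FakeNative (float-array n) (atom false)))

(defn release-fake! [h] (reset! (:released? h) true) nil)

(defn checked-storage
  "Return the backing array, or throw if the handle was released."
  [h]
  (when @(:released? h)
    (throw (ex-info "native memory used after release" {})))
  (:storage h))

(defn compute-squares
  "Do the 'native' work in scratch memory, copy the answer into a plain JVM
  array, and release the scratch memory before anything can escape."
  [n]
  (let [scratch (fake-native n)]
    (try
      (let [^floats data (checked-storage scratch)]
        (dotimes [i n] (aset data i (float (* i i))))
        ;; the clone is ordinary JVM data, so it is safe to return
        (aclone data))
      (finally (release-fake! scratch)))))

(vec (compute-squares 5)) ;; => [0.0 1.0 4.0 9.0 16.0]

;; Returning the handle itself instead would let it escape:
(def leaked (let [h (fake-native 3)] (release-fake! h) h))
(try (checked-storage leaked)
     (catch clojure.lang.ExceptionInfo e (.getMessage e)))
;; => "native memory used after release"
```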

Some Hidden Truths

Perhaps less than expected

There is not much code in the JavaCPP binding of our datatype library. Happily, JavaCPP pointers are already convertible to nio buffers, and much of the datatype library is already built for nio buffers, so we mostly need a few utility functions for pointer manipulation plus protocol implementations that delegate to the buffer versions:

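;; Teach JavaCPP's base Pointer type to participate in the resource and
;; datatype protocols. Most operations delegate to the pointer's nio buffer view.
(extend-type Pointer
  ;; Free the native memory when the enclosing resource context unwinds.
  resource/PResource
  (release-resource [ptr] (release-pointer ptr))
  ;; Element access goes through the nio buffer view.
  dtype-base/PAccess
  (set-value! [ptr ^long offset value] (dtype-base/set-value! (ptr->buffer ptr)
                                                              offset value))
  (set-constant! [ptr offset value elem-count]
    (dtype-base/set-constant! (ptr->buffer ptr) offset value elem-count))
  (get-value [ptr ^long offset] (dtype-base/get-value (ptr->buffer ptr) offset))
  mp/PElementCount
  (element-count [ptr] (.capacity ptr))
  dtype-base/PContainerType
  (container-type [ptr] :typed-buffer)
  ;; Bulk copies out of the pointer also go through the buffer view.
  dtype-base/PCopyRawData
  (copy-raw->item! [raw-data ary-target target-offset options]
    (dtype-base/copy-raw->item! (ptr->buffer raw-data) ary-target
                                target-offset options))

  PToPtr
  (->ptr-backing-store [item] item)

  primitive/PToBuffer
  (->buffer-backing-store [src]
    (ptr->buffer src))

  ;; Native memory has no backing JVM array, so ->array returns nil;
  ;; ->array-copy copies through a typed (unsigned-aware) buffer instead.
  primitive/PToArray
  (->array [src] nil)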
  (->array-copy [src] (primitive/->array-copy (unsigned/->typed-buffer src))))
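The same pattern works for any existing Java type. As a self-contained sketch, using a hypothetical `PElements` protocol rather than the library's real ones, here is how direct nio buffers (which also live outside the JVM heap) could be taught to report their size and produce a JVM array copy via extend-type:

```clojure
(import '[java.nio ByteBuffer ByteOrder FloatBuffer])

;; Hypothetical protocol, loosely mirroring element-count and ->array-copy.
(defprotocol PElements
  (elem-count [item] "Number of elements the item can hold.")
  (->float-array-copy [item] "A fresh JVM float array with the contents."))

(extend-type FloatBuffer
  PElements
  (elem-count [buf] (.capacity buf))
  (->float-array-copy [buf]
    ;; duplicate() has its own position, so the caller's buffer position
    ;; is left untouched while we read from the start
    (let [dup (doto (.duplicate buf) (.clear))
          out (float-array (.capacity buf))]
      (.get dup out)
      out)))

(def buf (-> (ByteBuffer/allocateDirect 12)
             (.order (ByteOrder/nativeOrder))
             (.asFloatBuffer)))
(.put buf (float-array [1.5 2.5 3.5]))

(elem-count buf)               ;; => 3
(vec (->float-array-copy buf)) ;; => [1.5 2.5 3.5]
```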

Perhaps way more than expected

JavaCPP has an extensive and impressive set of bindings (the opencv_core class used above is one example), and it provides a powerful way to bind the JVM ecosystem to C code. Connecting it to our datatype library further simplifies working with native numeric data.

These connections help in three common cases:

  1. No bindings, only C source or shared library: Use JavaCPP directly.

  2. Bindings exist but do not use JavaCPP: Implementing protocols enables participation in datatype and resource management. Construct JavaCPP pointers from (address, length, datatype) tuples.

  3. Bindings exist and do use JavaCPP: This may just work, or you may have to implement a few protocols (in a future post we will explore binding to opencv).
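For case 2, an (address, length, datatype) tuple already determines exactly which bytes a pointer would cover, which makes it easy to reject obviously bad tuples before any native code runs. A minimal sketch with hypothetical names; turning a validated tuple into an actual JavaCPP pointer is deliberately left out:

```clojure
;; Element widths in bytes for the datatypes used in this post.
(def datatype->bytes
  {:int8 1 :uint8 1 :int16 2 :uint16 2 :int32 4 :uint32 4
   :int64 8 :float32 4 :float64 8})

(defn tuple->byte-range
  "Validate an [address length datatype] tuple and return the half-open
  byte range [start end) it describes."
  [[address length datatype]]
  (let [width (datatype->bytes datatype)]
    (cond
      (nil? width)    (throw (ex-info "unknown datatype" {:datatype datatype}))
      (zero? address) (throw (ex-info "null address" {}))
      (neg? length)   (throw (ex-info "negative length" {:length length}))
      :else           [address (+ address (* length width))])))

(tuple->byte-range [0x10000 20 :float32]) ;; => [65536 65616]
```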


At TechAscent, we dig leverage.
