TechAscent - Delightful Software Solutions

Our Datatype Library

Numeric Transmogrification

What could be more humble than copying a sequence of numbers from one place to another? Who in their right mind would invest any energy or thought into such a thing? Why is it often such a pain in the ass?

This post is about about contiguous collections of primitive datatypes only, so no structs, just contiguous collections of numbers. These arise often when doing image processing, interfacing between various systems, or especially in neural net toolkits such as Cortex. Typical native interfaces for numeric computing expose functionality via pointers to packed buffers of a single primitive datatype.

We often must efficiently convert between a buffer of one type of data into another. For example, byte into float or perhaps an image of unsigned bytes into unsigned shorts. Doing this naively, in a type-unaware fashion, can inordinately increase program runtime, while creating special cases for each pair of source and destination type can result in a combinatorial explosion of potential operations.

Moreover, the Java virtual machine natively supports a paltry number of datatypes. One often needs additional ones. Image processing requires unsigned types. Contemporary graphics and machine learning (GPU tasks) derive dramatic benefit from the use of half-float and (sometimes) byte-float types. We want to get and set values in collections of these types, as well as bulk-copy contiguous regions.

Finally, Java has 2 incongruent ways to store a packed buffer of primitives; arrays and nio buffers. There are potentially surprising inconsistencies. For example, If you create a nio buffer from an array then it is convertible back to the original array, but of you create a nio buffer from a native pointer then it is no longer convertible to an array.

Also, stuff like this:

user> (import '[java.nio ByteBuffer])
#<Class@1fb3ebeb java.nio.ByteBuffer>
user> (def test-array (byte-array (range 5)))
user> (def test-buffer (ByteBuffer/wrap test-array))
user> (.put test-buffer 0 (byte 100))
#<java.nio.HeapByteBuffer@6141b603 java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]>
user> (vec test-array)
[100 1 2 3 4]
user> ;;creating a view requires mutation of the original object
user> (.position test-buffer 4)
#<java.nio.HeapByteBuffer@6141b603 java.nio.HeapByteBuffer[pos=4 lim=5 cap=5]>
user> (def slice-buffer (.slice test-buffer))
user> (.put slice-buffer 0 (byte 20))
#<java.nio.HeapByteBuffer@225dd267 java.nio.HeapByteBuffer[pos=0 lim=1 cap=1]>
user> (vec test-array)
[100 1 2 3 20]
user> (vec (.array slice-buffer))
[100 1 2 3 20]
user> ;;WTF?  Array always points to the original array; no way to make a sub-array
user> ;;Even with nio buffer magic

The tech.datatype library provides a platform to manage the combinatorial explosion of operations required for efficient conversion between storage types and datatypes, along the ability to make a true view of any storage buffer. Clojure's metaprogramming facilities make it an ideal language for this, expressing these sorts of ideas in Clojure is relatively simple. That being said, it is still metaprogramming, which is not for the feint of heart.

The conversion must also be correct for arbitrary datatypes and thus it cannot rely entirely on built-in functionality. Clojure itself adds in the notion of checked and unchecked conversions between numeric types. Better would be an extensible mechanism for storing and either safely or efficiently converting between arbitrary datatypes. Ultimately, best would be to enable external consumers of the library to extend this system, including efficient conversions where available.

The Core of the Problem

user> (require '[tech.datatype :as dtype])

user> (def byte-data (dtype/make-array-of-type :int8 [1 2 3 4 5]))
user> (dtype/->vector byte-data)
[1 2 3 4 5]
user> (def float-buffer (dtype/make-buffer-of-type :float32 5))
user> (dtype/->vector float-buffer)
[0.0 0.0 0.0 0.0 0.0]
user> (dtype/copy! byte-data float-buffer)
#<java.nio.HeapFloatBuffer@4c1c9fd java.nio.HeapFloatBuffer[pos=0 lim=5 cap=5]>
user> (dtype/->vector float-buffer)
[1.0 2.0 3.0 4.0 5.0]
user> (dtype/copy-raw->item! [5 6 6] byte-data 0 {})
[#<[B@66bc77f8> 3]
user> (dtype/->vector (first *1))
[5 6 6 4 5]
user> (dtype/->vector byte-data)
[5 6 6 4 5]

Believe it or not, all of these operations hit fast paths. So at each stage, we only run fully-typed fast for loops using aget/aset for arrays, typecasts where necessary, and .put, .get for the nio buffers.

This makes the data a lot more fluid. There is are anything->packed and (dtype/copy-raw->item! anything item offset) operations, so you can get data into the packed buffers quickly and easily, and you can always use vec on an array to get Clojure data back. One place this is immediately useful is getting data into core.matrix and then back out into whatever packed buffer you may need.

What about safety?

user> (def byte-data (dtype/make-array-of-type :int8 [128 129 130]))
CompilerException java.lang.IllegalArgumentException: Value out of range for byte: 128, compiling:(form-init10868090203833027614.clj:69:22)
user> (def byte-data (dtype/make-array-of-type :int8 [128 129 130] {:unchecked? true}))
user> (dtype/->vector byte-data)
[-128 -127 -126]
user> (def short-data (short-array [1000 2000 3000]))
user> (dtype/copy! short-data byte-data)
IllegalArgumentException Value out of range for byte: 1000  clojure.lang.RT.byteCast (
user> (dtype/copy! short-data 0 byte-data 0 (dtype/ecount short-data) {:unchecked? true})
user> (dtype/->vector byte-data)
[-24 -48 -72]


And what about the compute cost of those checks? And how optimized is it over a generic pathway that calls get-value/set-value protocol functions?

chrisn@chrisn-lt-2:~/dev/tech.datatype$ lein test
WARNING: cast already refers to: #'clojure.core/cast in namespace: tech.datatype.base, being replaced by: #'tech.datatype/base/cast
WARNING: cast already refers to: #'clojure.core/cast in namespace: tech.datatype, being replaced by: #'tech.datatype/cast

lein test tech.datatype-test
{:array-copy "Elapsed time: 89.46547 msecs"
 :buffer-copy "Elapsed time: 287.270626 msecs"
 :dtype-copy "Elapsed time: 285.719587 msecs"
 :unchecked-dtype-copy "Elapsed time: 107.709031 msecs"
 :raw-copy "Elapsed time: 107.892818 msecs"
 :generic-copy "Elapsed time: 2102.287243 msecs"}

lein test tech.datatype.unsigned-test

Ran 7 tests containing 154 assertions.
0 failures, 0 errors.

The Un-Cola

The library also contains support for unsigned types via typed-buffer in the namespace. Unsigned types are exposed in the public interface via typed buffers:

(dtype/make-typed-buffer :uint8 [255 254 253])
{:buffer #<java.nio.HeapByteBuffer@b7aaba1 java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]>,
 :dtype :uint8}
user> (def ubyte-buf *1)
user> (dtype/->vector ubyte-buf)
[255 254 253]
user> (dtype/copy! ubyte-buf byte-data)
IllegalArgumentException Value out of range for byte: 255  clojure.lang.RT.byteCast (
user> (dtype/copy! ubyte-buf 0 byte-data 0 3 {:unchecked? true})
user> (dtype/->vector byte-data)
[-1 -2 -3]
user> (def short-data (short-array [256 254 253]))
user> (dtype/copy! short-data ubyte-buf)
ExceptionInfo Value out of range  clojure.core/ex-info (core.clj:4739)
user> (dtype/copy! short-data 0 ubyte-buf 0 3 {:unchecked? true})
{:buffer #<java.nio.HeapByteBuffer@b7aaba1 java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]>,
 :dtype :uint8}
user> (dtype/->vector ubyte-buf)
[0 254 253]

Many Bothans died to bring us this information. Cue dramatic music.

Note that the ubyte data is stored as a bytebuffer nio buffer. This allows interchange with native libraries (like opencv, for instance) that store data with unsigned datatypes.

Abandon Hope All Ye Who Enter Here

Going a bit further, and considering implementation details, say we want to generate a map of typed functions. A good starting point would be a map of type keywords to functions that will cast anything to that type. The current implementation goes about it in this way:

  1. Define an array that you will iterate.
  2. Define a compile time expression that will produce the cast.
  3. Define a macro that iterates the array, calls the expression and produces your map.
  4. Def something to that macro:

user> (def datatypes [:int8 :int16 :int32])
user> (defmacro dtype->cast
        [dtype item]
        (condp = dtype
          :int8 `(unchecked-byte ~item)
          :int16 `(unchecked-short ~item)
          :int32 `(unchecked-int ~item)))
user> (def test-val 10)
user> (dtype->cast :int8 test-val)
user> (type *1)
#<Class@2437c6dc java.lang.Byte>
user> (defmacro cast-map
        (->> (for [dtype datatypes]
               [dtype `(fn [item#]
                         (dtype->cast ~dtype item#))])
             (into {})))

user> (def casters (cast-map))
user> casters
{:int16 #<Fn@79444d1f user/fn__23468>,
 :int32 #<Fn@7eddde6c user/fn__23470>,
 :int8 #<Fn@ad44a6 user/fn__23466>}

Such macros are interesting because some of their code is definitely evaluated at compile time but they produce runtime values. In this way, we produce conversion functions built specifically for the conversion between these types:

user> (defmacro fancy-cast-map
        ;;Use for to do cartesian join
        (->> (for [dtype datatypes
                   dest-dtype datatypes]
               [[dtype dest-dtype]
                `(fn [item#]
                   (->> (dtype->cast ~dtype item#)
                        (dtype->cast ~dest-dtype)))])
             (into {})))

user> (def fancy-cast (fancy-cast-map))
user> fancy-cast
{[:int16 :int16] #<Fn@4a03d0cc user/fn__23533>,
 [:int16 :int32] #<Fn@23b20082 user/fn__23531>,
 [:int16 :int8] #<Fn@799ada4c user/fn__23537>,
 [:int32 :int16] #<Fn@444855df user/fn__23539>,
 [:int32 :int32] #<Fn@1099782a user/fn__23523>,
 [:int32 :int8] #<Fn@239cccab user/fn__23525>,
 [:int8 :int16] #<Fn@c7b3a2e user/fn__23529>,
 [:int8 :int32] #<Fn@a1b60d3 user/fn__23535>,
 [:int8 :int8] #<Fn@51181192 user/fn__23527>}
user> ((get fancy-cast [:int32 :int16]) 4)
#<Class@1f89ab83 java.lang.Short>

Now imagine those stand for array types; you may begin to see how to build specialized conversion functions between array types. Similar abstraction accounts for the additional dimension of container types. Moreover, a dynamic lookup mechanism enables other types to participate if they can be efficiently converted to a known container type. In that case you do not have to add any conversion functions but only the conversion of the container itself.

As far as creating type-specific, optimized code, this is about the best you can do without involving a separate compilation process. Much later in this series of posts we will bind to a compiler and generate extremely fast relatively arbitrary code.

Clojure imposes an interesting limitation here. Each one of those functions gets its own Java class. So it may be possible to build out a large nested case comparison and do the lookup at runtime, to all inlined code, in order to minimize jar size. Doing so may hit the limit on the size of a function in java. An interesting trade-off bubbling up from Clojure's Java symbiosis for sure.

At TechAscent, we do this kind of thing exactly once, and then the computer does it for us, automatically, forever.

Contact us and we can do the same for your business.

Make software work for you.

Get In Touch