Tutorial

JZarr provides classes and functions for handling N-dimensional arrays whose data can be divided into chunks, where each chunk can be compressed.

General Information

In the JZarr API, data inputs and outputs are always one-dimensional arrays of the primitive Java types double, float, long, int, short, or byte. Users specify the N-dimensionality of the data through a shape parameter, which many of the JZarr API operations require.

To read or write a portion of the array, you need a shape describing the portion and an offset locating it within the array. Zarr array offsets are zero-based.

For example, to write data to the upper left corner of a 2-dimensional zarr array, use an offset of new int[]{0, 0}.
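Because data is always passed as a flat one-dimensional array plus a shape, it can help to see how N-D coordinates map to a position in that flat array. The following is a minimal sketch in plain Java, assuming row-major ("C") ordering. RowMajorIndex and flatIndex are hypothetical names used for illustration and are not part of the JZarr API.

```java
// Hypothetical helper for illustration only; not part of the JZarr API.
class RowMajorIndex {

    // Returns the position in a flat, row-major 1-D array of the element
    // at the given zero-based N-D coordinates.
    static int flatIndex(int[] shape, int[] coords) {
        if (shape.length != coords.length) {
            throw new IllegalArgumentException("shape and coords must have the same rank");
        }
        int index = 0;
        for (int d = 0; d < shape.length; d++) {
            if (coords[d] < 0 || coords[d] >= shape[d]) {
                throw new IndexOutOfBoundsException(
                        "coordinate " + coords[d] + " is outside dimension " + d);
            }
            // Each step scales the running index by the size of the current dimension.
            index = index * shape[d] + coords[d];
        }
        return index;
    }

    public static void main(String[] args) {
        int[] shape = {3, 5};
        System.out.println(flatIndex(shape, new int[]{0, 0})); // upper left corner: 0
        System.out.println(flatIndex(shape, new int[]{1, 2})); // 1 * 5 + 2 = 7
        System.out.println(flatIndex(shape, new int[]{2, 4})); // last element: 14
    }
}
```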

Note

All data persisted using this API can be read in with the Python zarr API without limitations.

If you are already familiar with the Python zarr package, JZarr provides similar functionality, but without NumPy array behavior.

If you need array objects which behave almost like NumPy arrays you can wrap the data using ND4J INDArray from deeplearning4j.org. You can find examples in the data writing and reading examples below.

Alternatively you can use ucar.ma2.Array from netcdf-java Common Data Model to wrap the data.

Creating an array

JZarr has several functions for creating arrays. For example:

ZarrArray jZarray = ZarrArray.create(new ArrayParams()
        .shape(10000, 10000)
        .chunks(1000, 1000)
        .dataType(DataType.i4)
);

A System.out.println(jZarray); then produces the following output:

com.bc.zarr.ZarrArray{shape=[10000, 10000], chunks=[1000, 1000], dataType=i4, fillValue=0, compressor=zlib/level=1, store=InMemoryStore, byteOrder=BIG_ENDIAN}

The code above creates a 2-dimensional array of 32-bit integers with 10000 rows and 10000 columns, divided into chunks where each chunk has 1000 rows and 1000 columns (and so there will be 100 chunks in total).
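The chunk count quoted above follows from dividing each dimension by its chunk size and multiplying the results. The sketch below reproduces that arithmetic in plain Java. It assumes that a dimension which is not an exact multiple of the chunk size still needs one extra, partially filled chunk (ceiling division). ChunkCount is a hypothetical name, not a JZarr class.

```java
// Hypothetical helper for illustration only; not part of the JZarr API.
class ChunkCount {

    // Number of chunks needed to cover an array of the given shape.
    static long chunkCount(int[] shape, int[] chunks) {
        if (shape.length != chunks.length) {
            throw new IllegalArgumentException("shape and chunks must have the same rank");
        }
        long total = 1;
        for (int d = 0; d < shape.length; d++) {
            // Ceiling division: a partially filled edge chunk still counts as one chunk.
            long perDimension = (shape[d] + chunks[d] - 1) / chunks[d];
            total *= perDimension;
        }
        return total;
    }

    public static void main(String[] args) {
        // 10 chunks along each of the two dimensions
        System.out.println(chunkCount(new int[]{10000, 10000}, new int[]{1000, 1000})); // 100
    }
}
```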

For a complete list of array creation routines see the array creation module documentation.

Writing and reading data

This example shows how to write a region to an array and read the data back.

Create an array with a shape of [5 rows, 7 columns], the data type int, and a fill value of -9999.

ZarrArray array = ZarrArray.create(new ArrayParams()
        .shape(5, 7)
        .dataType(DataType.i4) // integer data type
        .fillValue(-9999)
);

Prepare the data to be written to the array, with a shape of [3, 5] and an offset of [2, 0].

// define the data which should be written
int[] data = {
        11, 12, 13, 14, 15,
        21, 22, 23, 24, 25,
        31, 32, 33, 34, 35
};
int[] shape = {3, 5}; // the actual N-D shape of the data
int[] offset = {2, 0}; // and the offset into the original array

Write the prepared data.

array.write(data, shape, offset);

Read the entire data from the array.

int[] entireData = (int[]) array.read();

Print out the data read.

OutputHelper.Writer writer = out -> {
    DataBuffer buffer = Nd4j.createBuffer(entireData);
    out.println(Nd4j.create(buffer).reshape('c', array.getShape()));
};

This produces the following output:

[[     -9999,     -9999,     -9999,     -9999,     -9999,     -9999,     -9999], 
 [     -9999,     -9999,     -9999,     -9999,     -9999,     -9999,     -9999], 
 [        11,        12,        13,        14,        15,     -9999,     -9999], 
 [        21,        22,        23,        24,        25,     -9999,     -9999], 
 [        31,        32,        33,        34,        35,     -9999,     -9999]]

The output shows that the data written before (with an offset of [2, 0]) occupies rows 2 to 4 and columns 0 to 4, while all remaining cells hold the fill value -9999.
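To make the role of shape and offset concrete, the sketch below imitates the same region write on a plain Java int[] without using JZarr at all. It assumes a 2-dimensional, row-major layout. RegionWriteSketch and writeRegion are hypothetical names, and the real library additionally deals with chunking and compression.

```java
import java.util.Arrays;

// Plain Java illustration of a 2-D region write; not the JZarr implementation.
class RegionWriteSketch {

    // Copies a row-major block of the given shape into a row-major target at the given offset.
    static void writeRegion(int[] target, int[] targetShape, int[] data, int[] shape, int[] offset) {
        if (offset[0] < 0 || offset[1] < 0
                || offset[0] + shape[0] > targetShape[0]
                || offset[1] + shape[1] > targetShape[1]) {
            throw new IllegalArgumentException("region does not fit into the target array");
        }
        for (int row = 0; row < shape[0]; row++) {
            int sourcePos = row * shape[1];
            int targetPos = (offset[0] + row) * targetShape[1] + offset[1];
            System.arraycopy(data, sourcePos, target, targetPos, shape[1]);
        }
    }

    // Rebuilds the tutorial example: a 5 x 7 array filled with -9999, then a 3 x 5 block written at [2, 0].
    static int[] example() {
        int[] targetShape = {5, 7};
        int[] target = new int[targetShape[0] * targetShape[1]];
        Arrays.fill(target, -9999);
        int[] data = {
                11, 12, 13, 14, 15,
                21, 22, 23, 24, 25,
                31, 32, 33, 34, 35
        };
        writeRegion(target, targetShape, data, new int[]{3, 5}, new int[]{2, 0});
        return target;
    }

    public static void main(String[] args) {
        int[] result = example();
        for (int row = 0; row < 5; row++) {
            System.out.println(Arrays.toString(Arrays.copyOfRange(result, row * 7, (row + 1) * 7)));
        }
    }
}
```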

Note

Nd4j is not part of the JZarr library. It is only used in this showcase to demonstrate how the data can be used.

Persistent arrays

In the examples above, compressed data (default compressor) for each chunk of the array was stored in main memory. JZarr arrays can also be stored on a file system, enabling persistence of data between sessions. For example:

ZarrArray created = ZarrArray.create("docs/examples/output/example_3.zarr", new ArrayParams()
        .shape(1000, 1000).chunks(250, 250).dataType(DataType.i4).fillValue(-9999)
);

The array above will store its configuration metadata (zarr header .zarray) and all compressed chunk data in a directory called ‘docs/examples/output/example_3.zarr’ relative to the current working directory.

The created zarr header file .zarray is written in JSON format:

{
  "chunks": [
    250,
    250
  ],
  "compressor": {
    "id": "zlib",
    "level": 1
  },
  "dtype": ">i4",
  "fill_value": -9999,
  "filters": null,
  "order": "C",
  "shape": [
    1000,
    1000
  ],
  "zarr_format": 2
}
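With a shape of [1000, 1000] and chunks of [250, 250], the header above implies a 4 x 4 grid of chunks. The sketch below shows how an element position maps to a chunk on that grid. It assumes the Zarr version 2 convention of naming chunks by their grid indices joined with a dot (for example "0.0"), and that JZarr follows this convention on disk. ChunkKey is a hypothetical helper, not a JZarr class.

```java
// Hypothetical helper for illustration only; not part of the JZarr API.
class ChunkKey {

    // Returns the chunk grid indices of the chunk containing the element at coords,
    // joined with '.' as in the Zarr version 2 naming convention (assumed here).
    static String chunkKey(int[] coords, int[] chunks) {
        if (coords.length != chunks.length) {
            throw new IllegalArgumentException("coords and chunks must have the same rank");
        }
        StringBuilder key = new StringBuilder();
        for (int d = 0; d < coords.length; d++) {
            if (d > 0) {
                key.append('.');
            }
            // Integer division gives the zero-based chunk index along this dimension.
            key.append(coords[d] / chunks[d]);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        int[] chunks = {250, 250};
        System.out.println(chunkKey(new int[]{21, 22}, chunks));   // 0.0
        System.out.println(chunkKey(new int[]{999, 250}, chunks)); // 3.1
    }
}
```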

Write some data to the created persistent array.

created.write(42, new int[]{3, 4}, new int[]{21, 22});

Note

There is no need to close an array. Data are automatically flushed to disk, and files are automatically closed whenever an array is modified.

Then we can reopen the array and read the data:

ZarrArray opened = ZarrArray.open("docs/examples/output/example_3.zarr");
int[] readShape = {5, 6}; // shape of the region to read
final int[] data = (int[]) opened.read(readShape, new int[]{20, 21});

Which now looks like:

[[     -9999,     -9999,     -9999,     -9999,     -9999,     -9999], 
 [     -9999,        42,        42,        42,        42,     -9999], 
 [     -9999,        42,        42,        42,        42,     -9999], 
 [     -9999,        42,        42,        42,        42,     -9999], 
 [     -9999,     -9999,     -9999,     -9999,     -9999,     -9999]]

Resizing and appending

Currently not implemented.

Compressors

JZarr is designed to support a number of different compressors. A compressor is selected by passing it to the compressor(...) method of ArrayParams when creating an array. For example:

ZarrArray jZarray = ZarrArray.create(new ArrayParams()
        .shape(243, 324, 742)  // three or more dimensions
        .compressor(CompressorFactory.create("zlib", 8)) // 8 : compression level
);
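To give a feel for what a zlib compression level does to one chunk of data, the sketch below compresses a buffer of big-endian 32-bit integers with java.util.zip and restores it again. It uses only the JDK and is not how JZarr is implemented internally. ZlibLevelSketch and its methods are hypothetical names, and the chunk content is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// JDK-only illustration of zlib compression levels; not the JZarr implementation.
class ZlibLevelSketch {

    // Encodes int values as big-endian bytes, matching the ">i4" dtype shown in the tutorial.
    static byte[] encodeBigEndian(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES).order(ByteOrder.BIG_ENDIAN);
        for (int value : values) {
            buffer.putInt(value);
        }
        return buffer.array();
    }

    // Compresses raw bytes with zlib at the given level (0 to 9).
    static byte[] compress(byte[] raw, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int count = deflater.deflate(buffer);
            out.write(buffer, 0, count);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes; rawLength must be the uncompressed size.
    static byte[] decompress(byte[] compressed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] raw = new byte[rawLength];
        int filled = 0;
        try {
            while (filled < rawLength && !inflater.finished()) {
                int count = inflater.inflate(raw, filled, rawLength - filled);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                filled += count;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return raw;
    }

    public static void main(String[] args) {
        int[] chunk = new int[1000];
        Arrays.fill(chunk, -9999); // a chunk that holds only the fill value
        byte[] raw = encodeBigEndian(chunk);
        byte[] level1 = compress(raw, 1);
        byte[] level8 = compress(raw, 8);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("level 1 bytes: " + level1.length);
        System.out.println("level 8 bytes: " + level8.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(level8, raw.length)));
    }
}
```

Higher levels generally trade more CPU time for smaller output; how much is gained depends on the data.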

Note

At this early stage, only the zlib compressor is implemented. More compressors will be added in the future.

Additionally, in the future, developers should be able to register their own Compressors in the CompressorFactory. A compressor must extend the abstract Compressor class.

Filters

Currently not implemented.

Groups

JZarr supports hierarchical organization of arrays via groups. As with arrays, groups can be stored in memory, on disk, or via other storage systems that support a similar interface.

To create a group, use the zarr.group() function: