*Karta* provides a set of tools for analysing geographical data. The organization of *Karta* is around a set of classes for representing vector and raster data. These classes contain built-in methods for common tasks, and are easily extended for more specialized processing. This tutorial provides a brief introduction to the elements of *Karta*.

Should you come across any mistakes, please file a bug or submit a pull request on Github!

The following examples are shown using Python 3, however *Karta* is supported on Python 2.7+ and Python 3.4+.

**Vector data** are data that can be treated as a set of connected or disconnected vertices. Examples might be road networks, a set of borders, geophysical survey lines, or the path taken by a bottle floating in an ocean current. In *Karta*, these data are classified as belonging to *Point*, *Line* and *Polygon* classes, and their *Multipart* equivalents *Multipoint*, *Multiline*, and *Multipolygon*. Some questions that might be asked
of vector data include

- which of these points are contained in this Polygon?
- do these Lines intersect, and where?
- what is the average distance travelled by a particle?
- what municipalities does this river flow through?

**Raster data** is data that are typically thought of in terms of pixels or a grid of values covering a surface. Examples might be an elevation model, a satellite image, or an upstream area map. Depending on what the data represents, one might

- compute slope, aspect, and hillshades on an elevation model
- resample or interpolate a grid
- mask a land cover map according to management boundaries
- apply a pansharpening algorithm to multispectral satellite imagery
- extract an elevation profile along a path

The term **coordinate reference system** refers to a system of relating numerical coordinates to actual positions on Earth. *Karta* includes methods for geodetic calculations and basic support of projected and geographical coordinates, as well as coordinate system classes backed by *pyproj*.

This section demonstrates the creation and manipulation of vector data.

In [1]:

```
from karta import Point, Line, Polygon, Multipoint, Multiline, Multipolygon
```

The `Point`

, `Line`

, and `Polygon`

classes can all be instantiated by providing vertices, and optionally, associated properties. The *Multipart* `Multipoint`

, `Multiline`

, and `Multipolygon`

classes are similar, and can additionally include part-specific tabular metadata.

In [2]:

```
pt = Point((-123.1, 49.25))
print(pt)
```

In [3]:

```
mpt = Multipoint([(-122.93, 48.62),
(-123.10, 48.54),
(-122.90, 48.49),
(-122.81, 48.56)],
data={"color": ["red", "blue", "green", "yellow"],
"value": [2, 1, 3, 5]})
print(mpt)
print(mpt.data)
```

In [4]:

```
line = Line([(-124.35713, 49.31437),
(-124.37857, 49.31720),
(-124.39442, 49.31833),
(-124.40311, 49.31942),
(-124.41052, 49.32203),
(-124.41681, 49.32477),
(-124.42278, 49.32588)])
print(line)
```

In [5]:

```
poly = Polygon([(-25.41, 67.03),
(-24.83, 62.92),
(-12.76, 63.15),
(-11.44, 66.82)])
print(poly)
```

Each geometrical object now contains a vertex/vertices in a cartesian plane.

We may be interested in determining whether the `Point`

created about is within the `Polygon`

:

In [6]:

```
print(poly.contains(pt)) # False
```

In [7]:

```
# but this one is:
pt2 = Point((-25, 65))
print(poly.contains(pt2)) # True
```

We also can test whether the `Line`

from above crosses the `Polygon`

:

In [8]:

```
print(line.intersects(poly)) # False
```

Or compute the shortest distance between the `Point`

and the `Line`

:

In [9]:

```
print(line.shortest_distance_to(pt))
```

There are methods for computing the nearest vertex to an external point, or the nearest point on an edge to an external point:

In [10]:

```
pt = Point((0.0, 60.0))
print(poly.nearest_vertex_to(pt))
print(poly.nearest_on_boundary(pt))
```

The positions of objects with multiple vertices can be sliced and iterated through:

In [11]:

```
subline = line[2:-2]
print(subline)
for pt in subline:
print(pt)
```

A slice that takes part of a polygon returns a line.

In [12]:

```
print(poly[:2])
```

Points have a `distance()`

method that calculates the distance to another point.

In [13]:

```
pt = Point((-123.1, 49.25))
pt2 = Point((-70.66, 41.52))
print(pt.distance(pt2))
```

By default, geometries in Karta use a planar cartesian coordinate system. If our positions are meant to be geographical coordinates, then we can provide the `crs`

argument to each geometry at creation, as in

In [14]:

```
from karta.crs import LonLatWGS84
pt = Point((-123.1, 49.25), crs=LonLatWGS84)
pt2 = Point((-70.66, 41.52), crs=LonLatWGS84)
pt.distance(pt2)
```

Out[14]:

which now gives the great circle distance between point on the Earth, in meters. We can mix coordinate systems to some degree, with *Karta* performing the necessary transformations in the background:

In [15]:

```
from karta.crs import WebMercator
pt_web = Point((-14000000, 6300000), crs=WebMercator)
print(pt.distance(pt_web)) # distance in coordinate system units of *pt*
```

When the coordinate system is specified, all geometrical methods obey that coordinate system. We can use this to perform queries, such which American state capitols are within 2000 km of Mexico City?

In [16]:

```
from karta.examples import us_capitols
mexico_city = Point((-99.13, 19.43), crs=LonLatWGS84)
# Filter those within 2000 km of Mexico City
nearby = list(filter(lambda pt: pt.distance(mexico_city) < 2000e3, us_capitols))
for capitol in nearby:
print("{0:4.0f} km {1}".format(mexico_city.distance(capitol)/1e3, capitol.properties["n"]))
```

In [17]:

```
# Or, list capitols from nearest to furthest from Mexico City
distances = map(lambda pt: mexico_city.distance(pt), us_capitols)
distances_capitols = sorted(zip(distances, us_capitols))
for d, pt in distances_capitols:
print("{km:.0f} km {name}".format(km=d/1e3, name=pt.properties["n"]))
```

All of the above calculations are performed on a geoid. The `LonLatWGS84`

coordinate system means to use geographical (longitude and latitude) coordinates on the WGS 84 ellipsoid.

By using the `data`

keyword argument, additional data can be associated with a *Multipart* vector geometry. The data can be a list or a dictionary of lists.

In [18]:

```
mp = Multipoint([(1, 1), (3, 1), (4, 3), (2, 2)],
data={"species": ["T. officianale", "C. tectorum",
"M. alba", "V. cracca"]})
```

These data live in the `.data`

attribute, which is a `Table`

instance. For convenience, the data can also be accessed via the `.d`

attribute, which provides a streamlined syntax supporting key-lookups, indexing, and slicing.

In [19]:

```
mp.d
```

Out[19]:

In [20]:

```
mp.d["species"]
```

Out[20]:

In [21]:

```
mp.d[1:3]
```

Out[21]:

The data are propagated through indexing operations on their parent geometry:

In [22]:

```
pt = mp[2]
print(pt, "-", pt.properties["species"])
```

Geometry-level metadata at the geometry level can be provided using the `properties`

keyword argument, which accepts a dictionary. Derived geometries carry the properties of their parent geometry.

In [23]:

```
poly = Polygon([(-25.41, 67.03),
(-24.83, 62.92),
(-12.76, 63.15),
(-11.44, 66.82)],
properties={"geology": "volcanic",
"alcohol": "brennivin"})
print(poly[0:3].properties)
```

The `get_coordinate_lists`

method and `coordinates`

attribute provide lists of coordinates for plotting or data export.

Higher-level plotting operations are provided by the separate `karta.plotting`

submodule, not described here.

In [24]:

```
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(*line.coords())
```

Out[24]:

Data can be read from and written to several common formats, including ESRI shapefiles (through bindings to the *pyshp* module), GeoJSON, and GPX. Convenience functions are kept in the `karta.vector.read`

namespace.

Each geometry has appropriate methods to write data:

In [25]:

```
line.to_shapefile("line.shp")
pt_web.to_geojson("point.geojson")
```

Raster data are primarily represented by the `karta.RegularGrid`

class. `RegularGrid`

instances have a CRS, a Null-data value, a geotransform, and one or more *bands*, which containing the actual data.

To provide flexibility, different band classes are provided by `karta.raster.bands`

using different strategies for data storage.

- The simplest case,
`SimpleBand`

, uses a numpy array to store all data. This makes it reasonably fast, but can be memory-hungry with large rasters. - The default case,
`CompressedBand`

, uses chunking and compression via the*blosc*library to reduce the memory footprint of the raster data at a small speed cost. `GdalFileBand`

reads data directly from a valid GDAL datasource, using the least memory but performing the slowest.

Note:

`GdalFileBand`

doesn't currently handle all raster operations supported by the other band types. Many operations implicitly convert to in-memory`CompressedBand`

representation.

In [26]:

```
import numpy as np
from karta.raster import RegularGrid, SimpleBand, CompressedBand, read_gtiff
ls8 = read_gtiff("LC08_L1TP_011031_20180930_20181010_01_T1_B8.TIF")
print(ls8.bands) # list of one CompressedBand instance
```

In [27]:

```
# Print grid dimensions
print(ls8.size)
```

In [28]:

```
# Print grid extent
print(ls8.extent())
```

In [30]:

```
# Visualize data
plt.imshow(ls8[::10,::10, 0], origin="bottom", extent=ls8.extent(), cmap=plt.cm.binary, vmin=3e3, vmax=10e3)
plt.colorbar()
```

Out[30]:

When opening or creating a `RegularGrid`

, a non-default band type can be specified as a keyword argument. The following code re-opens the same grid as a `SimpleBand`

and verifies that all data are the same.

In [31]:

```
ls8_numpy = read_gtiff("LC08_L1TP_011031_20180930_20181010_01_T1_B8.TIF", bandclass=SimpleBand)
np.all(ls8[:,:] == ls8_numpy[:,:]) # True
```

Out[31]:

In the above, the slice syntax `[:,:]`

is used to get an array of all grid data. Because the grid `ls8`

has only a single band in this case, the data array has two dimensions. The normal simple slicing rules apply, i.e. one can do things like:

In [32]:

```
subgrid = ls8[2000:3000, 4000:4500]
print(subgrid.shape)
```

In [33]:

```
every_other = ls8[::2, ::2]
print(every_other.shape)
```

In [34]:

```
ls8.transform
```

Out[34]:

Grid geolocation is based on an affine matrix transformation represented by the `.transform`

attribute, as well as an associated coordinate system under the `.crs`

attribute. The extent and bounding box of the grid can be retrieved using the respective properties:

In [35]:

```
print(ls8.bbox())
print(ls8.extent())
```

The `extent`

is of the form *(xmin, xmax, ymin, xmax)*, and refers to grid centers. The `bbox`

is of the form *(xmin, ymin, xmax, ymax)*, and refers to grid edges.

To get swaths of grid coordinates conveniently and in arbitrary coordinate systems, used the `.coordinates()`

method, which returns a `CoordinateGenerator`

instance that can be indexed to generate coordinate pairs.

In [36]:

```
coords = ls8.coordinates(crs=LonLatWGS84)
print(coords[0,0])
print(coords[-1,-1])
print(coords[:5,:5])
```

*Nearest-neighbour* and *bilinear* sampling of grid points is supported via the `.sample()`

method. It is also possible to resample the full grid at a new resolution, or to sample along a profile.

In [37]:

```
val = ls8.sample(Point((500000, 8750000), crs=ls8.crs)) # sample a point using the grid CRS
print(val)
```

In [38]:

```
# Resample grid at 100 m postings:
ls8_coarse = ls8.resample(100, 100)
print("Original resolution:", ls8.resolution)
print("Resampled resolution:", ls8_coarse.resolution)
```

In [39]:

```
# Generate a line and sample at intervals along it
transit = Line(zip(np.linspace(350000, 450000), np.linspace(4550000, 4700000)), crs=ls8.crs)
mp, val = ls8.profile(transit)
plt.subplot(2, 1, 1)
x, y = mp.coords()
plt.scatter(x, y, c=val.flatten(), edgecolor="none", cmap=plt.cm.binary)
plt.subplot(2, 1, 2)
dist = Line(mp).cumulength() # convert sample points to a line and extract
# the cumulative distance along it
plt.plot(dist, val[0])
```

Out[39]:

Grids can be trimmed or expanded using the `.resize()`

method, which takes a new bounding box as an argument.

**Note**

When getting raster data, the array provided by slicing is not necessarily a view of the underlying data, and may be a copy instead. Modifying the array is not guaranteed to modify the raster. When the raster data must be replaced by an element-wise computation, use the `Grid.apply(func)`

method, which operates in-place. The `apply`

method may be chained.

```
# Example
grid.apply(lambda x: x**2) \
.apply(np.sin) \
.apply(lambda x: np.where(x < 0.5, grid.nodata, x))
```

This handles nodata pixels automatically. If the raster data must be replaced by arbitrary data, set it explicitly with `Grid[:,:] = ...`

.

```
# Example
grid[:,:] = np.convolve(np.ones([3,3])/9.0, grid[:,:], mode='same')
```

In [40]:

```
ls8_small = ls8.resize((350000, 4600000, 450000, 4650000))
plt.imshow(ls8_small[:,:,0], origin="bottom", extent=ls8_small.extent(), cmap=plt.cm.binary, vmin=3e3, vmax=12e3)
```

Out[40]:

`RegularGrid`

instances¶New `RegularGrid`

instances are created by specifying a geotransform. The geotransform is represented by a tuple of the form

`transform = (xll, yll, dx, dy, sx, sy)`

where `xll`

and `yll`

are the coordinates of the lower left grid corner, `dx`

and `dy`

specify resolution, and `sx`

and `sy`

specify grid skew and rotation.

The following creates an empty global grid with 5 degree resolution, oriented "north-up" and "east-right", and then plots the pixel centers:

In [41]:

```
from karta.crs import LonLatWGS84, GallPetersEqualArea
newgrid = RegularGrid((-180, -80, 5, 5, 0, 0), values=np.zeros((160//5, 360//5)), crs=LonLatWGS84)
# visualize the coordinate positions on a Gall-Peters projection
coords = newgrid.coordinates(crs=GallPetersEqualArea)
x, y = coords[:,:]
_ = plt.plot(x, y, ".k", ms=2)
```

TODO