Skip to content
Snippets Groups Projects
Commit 854e67d8 authored by Aiko Voigt's avatar Aiko Voigt
Browse files

Major restructuring of codei, updates of examples, added benchmarks

parent 378ddcc9
No related branches found
No related tags found
No related merge requests found
Showing with 87876 additions and 158 deletions
dist/
tricco/__pycache__
LICENSE 0 → 100644
This diff is collapsed.
......@@ -4,7 +4,7 @@ A library for computing connected components on an unstructured triangular grid.
Authors:
* Aiko Voigt (University of Vienna)
* Petra Schwer (Otto von Guericke University of Magdeburg)
* Noam von Rotberg (Otto von Guericke University of Magdeburg)
* Nicole Knopf (Karlsruhe Institute of Technology)
* Petra Schwer (Otto von Guericke University of Magdeburg)
%% Cell type:markdown id: tags:
# Notebook for benchmarking runtime of cubulation
Tests are performed on an exclusive compute node of the DKRZ supercomputer Mistral in Hamburg, Germany.
A compute node has the following specs (https://www.dkrz.de/up/systems/mistral/configuration):
* 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5GHz
* 24 cores (48 logical CPUs)
* 64 GB main memory
As for the grids we consider limited-area ICON grids that cover a large part of the North Atlantic. They were for example used for the NAWDEX simulations described Senf, F., A. Voigt et al, 2020: Increasing Resolution and Resolving Convection Improve the Simulation of Cloud‐Radiative Effects Over the North Atlantic, JGR Atmospheres. https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020JD032667.
%% Cell type:markdown id: tags:
Load required packages. Note: adding cc3d to system path is needed because it is required by tricco.
%% Cell type:code id: tags:
``` python
import timeit
import sys
sys.path.append('/pf/b/b380459/connected-components-3d/')
sys.path.append('/pf/b/b380459/BigDataClouds/tricco/')
import tricco
```
%% Cell type:markdown id: tags:
Define the start triangle and radius of outward search as a function of grid resolution.
%% Cell type:code id: tags:
``` python
dict_start = {'80000m': 5783, '40000m': 18538, '20000m': 69309, '10000m': 264792}
dict_radius = {'80000m': 102 , '40000m': 230 , '20000m': 460 , '10000m': 2000 }
```
%% Cell type:code id: tags:
``` python
# path to grid files
gridpath = '/work/bb1018/b380459/NAWDEX/grids/'
# model resolutions under investigation
for res in ['80000m', '40000m', '20000m', '10000m']:
print('Working on resolution of', res)
gridfile = 'icon-grid_nawdex_78w40e23n80n_R'+res+'.nc'
print('Time to read the grid:')
%timeit -r 1 -n 1 tricco.grid_functions.grid = tricco.prepare_grid(model='ICON',path=gridpath, file=gridfile)
print('Time to compute cubulation:')
%timeit -r 1 -n 1 tricco.compute_cubulation(start_triangle=dict_start[res], radius=dict_radius[res], print_progress=False)
```
%% Output
Working on resolution of 80000m
Time to read the grid:
310 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
11.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Working on resolution of 40000m
Time to read the grid:
163 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
1min 13s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Working on resolution of 20000m
Time to read the grid:
144 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
11min 34s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Working on resolution of 10000m
Time to read the grid:
312 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
%% Cell type:markdown id: tags:
# Notebook for benchmarking runtime of cubulation
Tests are performed on an exclusive compute node of the DKRZ supercomputer Mistral in Hamburg, Germany.
A compute node has the following specs (https://www.dkrz.de/up/systems/mistral/configuration):
* 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5GHz
* 24 cores (48 logical CPUs)
* 64 GB main memory
As for the grids we consider limited-area ICON grids that cover a large part of the North Atlantic. They were for example used for the NAWDEX simulations described Senf, F., A. Voigt et al, 2020: Increasing Resolution and Resolving Convection Improve the Simulation of Cloud‐Radiative Effects Over the North Atlantic, JGR Atmospheres. https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020JD032667.
%% Cell type:markdown id: tags:
Load required packages. Note: adding cc3d to system path is needed because it is required by tricco.
%% Cell type:code id: tags:
``` python
import timeit
import sys
sys.path.append('/pf/b/b380459/connected-components-3d/')
sys.path.append('/pf/b/b380459/BigDataClouds/tricco/')
import tricco
```
%% Cell type:markdown id: tags:
Define the start triangle and radius of outward search as a function of grid resolution.
%% Cell type:code id: tags:
``` python
dict_start = {'80000m': 5783, '40000m': 18538, '20000m': 69309, '10000m': 264792}
dict_radius = {'80000m': 102 , '40000m': 230 , '20000m': 460 , '10000m': 2000 }
```
%% Cell type:code id: tags:
``` python
# path to grid files
gridpath = '/work/bb1018/b380459/NAWDEX/grids/'
# model resolutions under investigation
for res in ['80000m', '40000m', '20000m', '10000m']:
print('Working on resolution of', res)
gridfile = 'icon-grid_nawdex_78w40e23n80n_R'+res+'.nc'
print('Time to read the grid:')
%timeit -r 1 -n 1 tricco.grid_functions.grid = tricco.prepare_grid(model='ICON',path=gridpath, file=gridfile)
print('Time to compute cubulation:')
%timeit -r 1 -n 1 tricco.compute_cubulation(start_triangle=dict_start[res], radius=dict_radius[res], print_progress=False)
```
%% Output
Working on resolution of 80000m
Time to read the grid:
310 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
11.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Working on resolution of 40000m
Time to read the grid:
163 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
1min 13s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Working on resolution of 20000m
Time to read the grid:
144 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
11min 34s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Working on resolution of 10000m
Time to read the grid:
312 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Time to compute cubulation:
Source diff could not be displayed: it is too large. Options to address this: view the blob.
File added
File added
File added
This diff is collapsed.
[build-system]
requires = [
"setuptools>=42",
"wheel"
]
build-backend = "setuptools.build_meta"
setup.py 0 → 100644
from setuptools import setup, find_packages
VERSION = '0.0.4'
DESCRIPTION = 'TriCCo'
LONG_DESCRIPTION = 'TriCCo: a python package for connected component labeling on triangular grids'
setup(
name="tricco",
version=VERSION,
author="Aiko Voigt",
author_email="<aiko.voigt@univie.ac.at>",
description=DESCRIPTION,
long_description=LONG_DESCRIPTION,
url = "https://gitlab.phaidra.org/voigta80/tricco",
install_requires=['numpy', 'xarray', 'connected-components-3d', 'networkx'],
keywords=['python', 'connected components', 'triangular grids'],
classifiers= [
"Intended Audience :: Science/Research",
"Programming Language :: Python :: 3",
"Operating System :: OS Independent",
],
packages=find_packages(exclude=["benchmarks","examples"]),
python_requires=">=3.8",
)
# add higher-level user functions
from tricco.prepare import prepare_grid, prepare_field, prepare_field_lev
from tricco.compute import compute_cubulation, compute_connected_components_2d, compute_connected_components_3d
import numpy as np
import grid_functions as gridf
import cubulation_functions as cubulf
from .grid_functions import get_edges_of_cell
from .cubulation_functions import EnhancedTri, cubing_next_round, shift_coordinates, compute_connected_components, compute_list_of_components
# make_cubulation
# compute_cubulation
# Assign to each triangle a 3d-coordinate called cube coordinate.
# INPUT:
# start_triangle int the triangle index where the cubulatiion algorithm starts
......@@ -14,33 +14,33 @@ import cubulation_functions as cubulf
# np.array([<triangle index>, <cube array>])
# where <cube array> is an array with three entries that encode the cube coordinate of the triangle with <triangle index>.
def make_cubulation(start_triangle, radius, print_progress=False):
def compute_cubulation(start_triangle, radius, print_progress=False):
# initialization
init = cubulf.EnhancedTri(start_triangle, np.array([0,0,0])) # assign coordinates (0,0,0) to start_triangle
init = EnhancedTri(start_triangle, np.array([0,0,0])) # assign coordinates (0,0,0) to start_triangle
cube_coordinates = [np.array([init.triangle, init.cube], dtype=object)]
visited_triangles = [init.triangle]
outmost = [init] # list for EnhancedTri that were added last round and thus lie "at the boundary of discovery"
# colour the edges of start_triangle with 0,1,2
edge_colours = { gridf.get_edges_of_cell(start_triangle)[x] : x for x in (0,1,2) }
edge_colours = { get_edges_of_cell(start_triangle)[x] : x for x in (0,1,2) }
# expand outwards
for n in range(radius):
# update outmost triangles
outmost = cubulf.cubing_next_round(cube_coordinates, visited_triangles, outmost, edge_colours)
outmost = cubing_next_round(cube_coordinates, visited_triangles, outmost, edge_colours)
# optional: show progress
if print_progress: print('Round ', n+1, 'finished. Visited ', len(cube_coordinates), \
'triangles, thereof ', len(outmost), 'new.')
# return coordinates after shifting them to positivity
return cubulf.shift_coordinates(cube_coordinates, radius)
return shift_coordinates(cube_coordinates, radius)
# make_connected_2d_components:
# compute_connected_2d_components:
# From given cubuluation, compute connected components and output them as a list of lists
# INPUT:
# cubulation list of arrays output of make_cubulation
# cubulation list of arrays output of compute_cubulation
# field_cube numpy array input field in cubulated 3d-coordinates
# connectivity string use 'vertex' (default) or 'edge' connectivity
# OUTPUT:
......@@ -48,24 +48,24 @@ def make_cubulation(start_triangle, radius, print_progress=False):
# - Each inner list corresponds to a connected component and contains
# all triangle indices that belong to this connected component
def make_connected_components_2d(cubulation, field_cube, connectivity = 'vertex'):
def compute_connected_components_2d(cubulation, field_cube, connectivity = 'vertex'):
# compute connected components on field_cube
# components_cube is a 3-dimensional array that contains
# the connected component index of a triangle with cube coordinates (x,y,z) at position component_array[x][y][z]
components_cube = cubulf.make_connected_components(field_cube, connectivity)
components_cube = compute_connected_components(field_cube, connectivity)
# a component list of triangle indices is generated for each connected component:
# returns a list containing all component lists
components = cubulf.make_list_of_components (cubulation, components_cube)
components = compute_list_of_components(cubulation, components_cube)
return components
# make_connected_3d_components:
# compute_connected_3d_components:
# From given cubuluation, compute connected components and output them as a list of lists
# INPUT:
# cubulation list of arrays output of make_cubulation
# cubulation list of arrays output of compute_cubulation
# field_cube numpy array input field in dimensions of (lev x cubulated coordinates)
# connectivity string use 'vertex' (default) or 'edge' connectivity
# OUTPUT:
......@@ -73,7 +73,7 @@ def make_connected_components_2d(cubulation, field_cube, connectivity = 'vertex'
# - Each inner list corresponds to a connected component and contains all tuples of (level, triangle index) that
# belong to this connected component
def make_connected_components_3d(cubulation, field_cube, connectivity = 'vertex'):
def compute_connected_components_3d(cubulation, field_cube, connectivity = 'vertex'):
nlev = np.shape(field_cube)[0] # number of vertical levels
......@@ -81,7 +81,7 @@ def make_connected_components_3d(cubulation, field_cube, connectivity = 'vertex'
# in the following these are refered to as 2d connected components
components2d = []
for lev in range(nlev):
aux = make_connected_components_2d(cubulation, field_cube[lev], connectivity)
aux = compute_connected_components_2d(cubulation, field_cube[lev], connectivity)
for items in aux:
components2d.append([lev]+sorted(items))
del aux
......
import numpy as np
import cc3d # external library for connected component labeling on cubic grids
import grid_functions as gridf
from .grid_functions import get_vertices_of_edge, get_edges_of_cell, get_neighbors_of_cell
# class to enhance a triangle with its cube coordinate
# for a given triangle create an object consisting of the triangle index and the triangle's cube coordinate
......@@ -79,8 +78,8 @@ def colour_new_edges(old_triangle_edges, new_triangle_edges, joint_edge, edge_co
second_comparison_edge = comparison_edges.pop()
# get vertices of first_edge and first_comparison_edge
first_vertices = gridf.get_vertices_of_edge( first_edge )
first_comparison_vertices = gridf.get_vertices_of_edge( first_comparison_edge )
first_vertices = get_vertices_of_edge( first_edge )
first_comparison_vertices = get_vertices_of_edge( first_comparison_edge )
# if first_edge and first_comparison_edge have no shared vertices:
# the two edges must be parallel and thus, have the same colour
......@@ -145,7 +144,7 @@ def cubing_next_round(cube_coordinates, visited_triangles, outmost, edge_colours
new_outmost = []
for old in outmost: # consider all EnhancedTri's at the border of visited
for neigh in gridf.get_neighbors_of_cell(old.triangle ): # consider all neighbours
for neigh in get_neighbors_of_cell(old.triangle ): # consider all neighbours
new = EnhancedTri(neigh, 0)
if new.triangle == -9999: # use feature: triangles at the grid border claim to be adjacent to -9999
break
......@@ -157,8 +156,8 @@ def cubing_next_round(cube_coordinates, visited_triangles, outmost, edge_colours
visited_triangles.append ( new.triangle )
# preparation: obtain edges of new_triangle and old_triangle and derive their joint_edge
old_triangle_edges = set(gridf.get_edges_of_cell(old.triangle)) # this is a set for python reasons...
new_triangle_edges = set(gridf.get_edges_of_cell(new.triangle))
old_triangle_edges = set(get_edges_of_cell(old.triangle)) # this is a set for python reasons...
new_triangle_edges = set(get_edges_of_cell(new.triangle))
joint_edge = ( new_triangle_edges & old_triangle_edges ).pop()
# colour the edges of new.triangle
......@@ -174,7 +173,7 @@ def cubing_next_round(cube_coordinates, visited_triangles, outmost, edge_colours
# shift cube coordinates such that they are all positive
# INPUT:
# cube_coordinates array stores triangle - cube_coordinate pairs
# radius int same as in make_cube_coordiantes
# radius int same as in compute_cube_coordiantes
# OUTPUT:
# shifted cube_coordinates
def shift_coordinates (cube_coordinates, radius):
......@@ -197,7 +196,7 @@ def shift_coordinates (cube_coordinates, radius):
# connectivity string use 'vertex' (default) or 'edge' connectivity
# OUTPUT:
# component_cube, 3d array containing the component indices
def make_connected_components(field_cube, connectivity = 'vertex'):
def compute_connected_components(field_cube, connectivity = 'vertex'):
import sys
......@@ -217,13 +216,13 @@ def make_connected_components(field_cube, connectivity = 'vertex'):
# which triangle belongs to which component?
# INPUT:
# cubulation array output of make_cube_coordinates
# component_cube 3d arry output of make_connected_components
# cubulation array output of compute_cube_coordinates
# component_cube 3d arry output of compute_connected_components
# OUTPUT:
# Returns a list of lists.
# Each inner list corresponds to one connected component and contains the indices
# of all triangles that belong to it
def make_list_of_components (cubulation, component_cube):
def compute_list_of_components(cubulation, component_cube):
# number of found components on cubical grid
ncomp = component_cube.max()
......
# need to get access to the grid dataset
import config
ds_grid = config.ds_grid
# xarray dataset that holds the grid
grid = None
# get_neighbors_of_cell:
# returns edge neighbors of a given triangle
def get_neighbors_of_cell(triangle):
return ds_grid.neighbor_cell_index[:,triangle].values
return grid.neighbor_cell_index[:,triangle].values
# get_edges_of_cell:
# returns the three edges of a given triangle
def get_edges_of_cell(triangle):
return ds_grid.edge_of_cell[:, triangle].values
return edges.values
return grid.edge_of_cell[:, triangle].values
# get_vertices_of_edge:
# returns the two vertices of a given edge
def get_vertices_of_edge(edge):
return ds_grid.edge_vertices[:, edge].values
\ No newline at end of file
return grid.edge_vertices[:, edge].values
......@@ -13,11 +13,17 @@ import xarray as xr
# - the three edges of a given triangle
# - the two vertices of a given edge
def prepare_grid(model, path, file):
if model == 'ICON':
_ds_grid = __grid_icon(path+file)
return _ds_grid
# implementation for ICON model
def grid_icon(path, file):
ds_grid = xr.open_dataset(path + file)
def __grid_icon(_gridfile):
ds_grid = xr.open_dataset(_gridfile)
ds_grid = ds_grid[['neighbor_cell_index', 'edge_of_cell', 'edge_vertices']]
# in the ICON model grid, the indexing of the triangle cells and vertices
......@@ -33,6 +39,7 @@ def grid_icon(path, file):
return ds_grid
# prepare_field
# returns the data field as a numpy array, also applies threshold-based mask
# INPUT:
......@@ -42,10 +49,22 @@ def grid_icon(path, file):
# OUTPUT:
# field both on the triangular grid as well as on the cubulated grid
def prepare_field(model, path, file, var, threshold, cubulation):
if model == 'ICON':
_field, _field_cube = __field_icon(path, file, var, threshold, cubulation)
return _field, _field_cube
def prepare_field_lev(model, path, file, var, threshold, cubulation):
if model == 'ICON':
_field, _field_cube = __field_icon_lev(path, file, var, threshold, cubulation)
return _field, _field_cube
# implementation for ICON model
# single-level data
def field_icon(path, file, var, threshold, cubulation):
def __field_icon(path, file, var, threshold, cubulation):
# field on triangular grid
field = xr.open_dataset(path + file)[var].squeeze().values
......@@ -71,7 +90,7 @@ def field_icon(path, file, var, threshold, cubulation):
return field, field_cube
# data on multiple levels
def field_icon_lev(path, file, var, threshold, cubulation):
def __field_icon_lev(path, file, var, threshold, cubulation):
# field on triangular grid
field = xr.open_dataset(path + file)[var].squeeze().values
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment