Module `tikz.extended_wilkinson`

Extended-Wilkinson algorithm for tick values and labels

Implementation of J. Talbot, S. Lin, & P. Hanrahan, An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes, IEEE Trans. Vis. Comput. Graph., 16(6), 1036-1043, 2010. doi:10.1109/TVCG.2010.130

Based on the R implementation and additional information by Justin Talbot.

Unlike the R code, this implementation includes the legibility score, and the result provides detailed information about the optimal presentation of tick values as labels. Accordingly, the target number of ticks m has been replaced as a parameter by a target physical tick density (ρ_t in the paper).
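The relation between the tick count m of the R code and the physical density can be checked numerically. The following is a minimal sketch, not part of the module: the function names and the example numbers are made up for illustration, and the formulas are taken from the comments in the `ticks` method.

```python
def density_ratio_from_count(k, m, dmin, dmax, lmin, lmax):
    """Ratio r / rt as used with a target number of ticks m."""
    r = (k - 1) / (lmax - lmin)
    rt = (m - 1) / (max(lmax, dmax) - min(dmin, lmin))
    return r / rt


def density_ratio_physical(k, density, axis_length, dmin, dmax, lmin, lmax):
    """Ratio of the physical tick density (ticks per cm) to the target."""
    axis_span = max(lmax, dmax) - min(dmin, lmin)
    # physical length covered by the ticks, in cm
    tick_length = axis_length * (lmax - lmin) / axis_span
    r = (k - 1) / tick_length
    return r / density


density = 1.0       # assumed target density, in ticks per cm
axis_length = 8.0   # assumed axis length, in cm
m = density * axis_length + 1   # equivalent target number of ticks

# six ticks from 0 to 10 for data between 0.3 and 9.7
a = density_ratio_from_count(6, m, 0.3, 9.7, 0.0, 10.0)
b = density_ratio_physical(6, density, axis_length, 0.3, 9.7, 0.0, 10.0)
print(a, b)   # both ratios agree
```

Under these assumptions both functions return the same ratio, which is why the rest of the algorithm can keep working in terms of m.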

Compared to the paper, there are two limitations: of the eight label formats, only 'Decimal' and 'Factored scientific' are implemented; and tick labels are always '0-extended', i.e. if trailing zeros should be stripped, the user must do so.
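Stripping trailing zeros from '0-extended' labels can be done with plain string operations. The helper below is a hypothetical sketch and not part of the module; the example labels are made up.

```python
def strip_trailing_zeros(label):
    """Remove trailing zeros after the decimal point of a label string."""
    if '.' in label:
        # drop zeros at the end, then a decimal point left dangling
        label = label.rstrip('0').rstrip('.')
    return label


labels = ['0.00', '0.50', '1.00', '1.50']   # hypothetical 0-extended labels
stripped = [strip_trailing_zeros(label) for label in labels]
print(stripped)   # → ['0', '0.5', '1', '1.5']
```

Labels without a decimal point, such as `'100'`, are returned unchanged, so integer tick values keep their zeros.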

"""
Extended-Wilkinson algorithm for tick values and labels

Implementation of J. Talbot, S. Lin, & P. Hanrahan, An Extension of
Wilkinson’s Algorithm for Positioning Tick Labels on Axes, *IEEE Trans.
Vis. Comput. Graph.*, 16(6), 1036-1043, 2010.
[doi:10.1109/TVCG.2010.130](https://doi.org/10.1109/tvcg.2010.130)

Based on the
[R implementation](https://rdrr.io/rforge/labeling/src/R/labeling.R)
and [additional information](https://github.com/jtalbot/Labeling/issues/1)
by Justin Talbot.

Unlike the R code, this implementation includes the *legibility* score,
and the result provides detailed information about the optimal presentation of
tick values as labels. Accordingly, the target number of ticks `m` has been
replaced as a parameter by a target physical tick density (ρ<sub>t</sub> in
the paper).

Compared to the paper, there are two limitations: of the eight label formats,
only 'Decimal' and 'Factored scientific' are implemented; and tick labels are
always '0-extended', i.e. if trailing zeros should be stripped, the user must
do so.
"""

# Copyright (C) 2020 Carsten Allefeld

from math import log10, ceil, floor
from itertools import count
from decimal import Decimal as D


class cfg:
    "`tikz.extended_wilkinson` configuration variables"

    Q = [D(1), D(5), D(2), D('2.5'), D(4), D(3)]
    """
    preference-ordered list of nice step sizes

    Values must be of type `decimal.Decimal`. The default step sizes are 1, 5,
    2, 2.5, 4, and 3.
    """

    w = [0.25, 0.2, 0.5, 0.05]
    """
    list of weights of score components

    The default weights are 0.25 for *simplicity*, 0.2 for *coverage*, 0.5 for
    *density*, and 0.05 for *legibility*.
    """

    font_metrics = {'offset': 0.1, '-': 0.678, '1': 0.5, '2': 0.5, '3': 0.5,
                    '4': 0.5, '5': 0.5, '6': 0.5, '7': 0.5, '8': 0.5, '9': 0.5,
                    '0': 0.5, '.': 0.278, 'height': 0.728}
    """
    default font metrics

    Font metrics are used to calculate the width and height of tick labels.
    They are specified as a `dict`, where each character that can occur in a
    tick label (`'-'`, `'0'`–`'9'`, `'.'`) is a key associated with the
    character's width; it also contains an `'offset'` to be added to the total
    width, and a `'height'`. All numbers are in units of the font size.

    The default values are for La/TeX's standard math font (Computer Modern
    Roman).
    """


class TicksGenerator:
    """
    generator of tick values and labels

    This class stores parameters of tick generation that are likely to stay the
    same across several axes:

    `font_sizes`
    :   admissible font sizes, in TeX pt (2.54 cm / 72.27)

    `density`
    :   target density of ticks, in cm<sup>–1</sup>

    `font_metrics`
    :   used to calculate the width and height of tick labels,
        see `cfg.font_metrics` (default)

    `only_loose`
    :   whether the range of tick values is forced to encompass the
        range of data values

    Ticks for a specific axis are generated by calling the
    [`ticks()`](#tikz.extended_wilkinson.TicksGenerator.ticks) method on an
    instance.
    """

    def __init__(self, font_sizes, density,
                 font_metrics=None, only_loose=True):
        if font_metrics is None:
            font_metrics = cfg.font_metrics
        self.font_sizes = sorted(font_sizes)
        self.rt = density
        self.font_metrics = font_metrics
        self.only_loose = only_loose

    # scoring functions, including the approximations for limiting the search

    def _simplicity(self, i, start, j, k):
        # v: is zero included in the ticks?
        # modifications
        # - (lmin % lstep < eps or lstep - (lmin % lstep) < eps),
        #   means lmin / lstep = start / j is an integer
        # - lmin <= 0 means start <=0
        # - lmax >= 0 means start + j * (k - 1) >= 0
        v = (start % j == 0 and start <= 0 and start + j * (k - 1) >= 0) * 1
        return 1 - (i - 1) / (len(cfg.Q) - 1) - j + v

    def _simplicity_max(self, i, j):
        # upper bound on _simplicity w.r.t. k, z, start
        # = w.r.t. v
        return 1 - (i - 1) / (len(cfg.Q) - 1) - j + 1

    def _coverage(self, dmin, dmax, lmin, lmax):
        return (1 - 0.5 * ((dmax - lmax)**2 + (dmin - lmin)**2)
                / (0.1 * (dmax - dmin))**2)

    def _coverage_max(self, dmin, dmax, span):
        # upper bound on _coverage w.r.t. start
        drange = dmax - dmin
        # The original code returns 1 if span <= drange. That branch is not
        # needed: coverage is maximal when the ticks are centered on the data,
        # leaving a gap of |span - drange| / 2 at each end, so the expression
        # below is a valid bound in both cases.
        half = (span - drange) / 2
        return 1 - 0.5 * (2 * half ** 2) / (0.1 * drange)**2

    def _density(self, k, m, dmin, dmax, lmin, lmax):
        r = (k - 1) / (lmax - lmin)
        rt = (m - 1) / (max(lmax, dmax) - min(dmin, lmin))
        return 2 - max((r / rt, rt / r))

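    def _density_max(self, k, m):
        # upper bound on _density w.r.t. z, start (from the original code)
        # The tick span `lmax - lmin` never exceeds the axis span, so
        # r / rt >= (k - 1) / (m - 1), which bounds the score for k >= m.
        if k >= m:
            return 2 - (k - 1) / (m - 1)
        else:
            # trivial upper bound
            return 1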

    def _score(self, s, c, d, l):
        # combined score
        return cfg.w[0] * s + cfg.w[1] * c + cfg.w[2] * d + cfg.w[3] * l

    # optimization algorithm

    def ticks(self, dmin, dmax, axis_length, axis_horizontal):
        """
        generate tick values and labels for a specific axis

        `dmin`
        :   data values lower bound

        `dmax`
        :   data values upper bound

        `axis_length`
        :   physical length of the axis, in cm

        `axis_horizontal`
        :   whether the axis is oriented horizontally (rather than vertically)

        Returns a `Ticks` object.
        """

        # The implementation here is based on the R code, which is defined
        # in terms of `m`, the target number of ticks. It optimizes w.r.t.
        # the ratio between the two quantities
        #   r = (k - 1) / (lmax - lmin)
        #   rt = (m - 1) / (max(lmax, dmax) - min(dmin, lmin))
        # We instead want to specify the physical density (e.g. in 1/cm),
        # stored as the instance attribute `self.rt`, and the parameter
        # `axis_length` (e.g. in cm). Assuming that the axis spans
        # `min(dmin, lmin)` to `max(lmax, dmax)`, while the ticks span `lmin`
        # to `lmax`, the optimization should use the ratio of
        #   r = (k - 1) / (axis_length * (lmax - lmin))
        #       * (max(lmax, dmax) - min(dmin, lmin))
        # to `self.rt`.
        # It turns out that the two ratios are equivalent if one sets
        m = self.rt * axis_length + 1

        if dmin > dmax:
            dmin, dmax = dmax, dmin

        # threshold for optimization
        ticks = None
        best_score = -2

        # We combine the j and q loops into one to enable breaking out of both
        # simultaneously, by iterating over a generator; and we create an
        # index i corresponding to q at the same time. i is `match(q, Q)[1]`
        # and replaces `q, Q` in function calls.
        JIQ = ((j, i, q)
               for j in count(start=1)
               for i, q in enumerate(cfg.Q, start=1))
        for j, i, q in JIQ:
            sm = self._simplicity_max(i, j)

            if self._score(sm, 1, 1, 1) < best_score:
                break

            for k in count(start=2):      # loop over tick counts
                dm = self._density_max(k, m)

                if self._score(sm, 1, dm, 1) < best_score:
                    break

                delta = (dmax - dmin) / (k + 1) / (j * float(q))

                for z in count(start=ceil(log10(delta))):
                    step = float(q) * j * 10**z

                    cm = self._coverage_max(dmin, dmax, step * (k - 1))

                    if self._score(sm, cm, dm, 1) < best_score:
                        break

                    min_start = floor(dmax / step) * j - (k - 1) * j
                    max_start = ceil(dmin / step) * j

                    if min_start > max_start:
                        continue

                    for start in range(min_start, max_start + 1):
                        lmin = start * step / j
                        lmax = lmin + step * (k - 1)
                        # lstep = step

                        if self.only_loose:
                            if lmin > dmin or lmax < dmax:
                                continue

                        s = self._simplicity(i, start, j, k)
                        c = self._coverage(dmin, dmax, lmin, lmax)
                        d = self._density(k, m, dmin, dmax, lmin, lmax)

                        score = self._score(s, c, d, 1)

                        if score < best_score:
                            continue

                        # Exact tick values in terms of loop variables:
                        #   lmin = q * start * 10**z
                        #   lmax = q * (start + j * (k - 1)) * 10 ** z
                        #   lstep = float(q) * j * 10**z
                        decimal_values = [q * (start + j * ind)
                                          * D('1E1') ** z
                                          for ind in range(k)]

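                        # Create candidate `Ticks` object
                        candidate = Ticks(
                            amin=min(lmin, dmin),
                            amax=max(lmax, dmax),
                            decimal_values=decimal_values)
                        # and initiate internal optimization for label
                        # legibility.
                        candidate._optimize(
                            self.font_sizes,
                            self.font_metrics,
                            axis_length,
                            axis_horizontal)

                        l = candidate.opt_legibility    # noqa E741

                        score = self._score(s, c, d, l)

                        # Keep the candidate only if its full score improves
                        # on the best one; otherwise a worse candidate that
                        # passed the preliminary check would be returned.
                        if score > best_score:
                            best_score = score
                            ticks = candidate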

        if ticks is None:
            # no solution found: no ticks
            print('Warning: Could not determine ticks.')
            ticks = Ticks(
                amin=dmin,
                amax=dmax,
                decimal_values=[],
                labels=[])
        return ticks


class Ticks:
    """
    represents tick values and labels

    This class is not intended to be instantiated by the user; instead,
    `Ticks` objects are obtained via `TicksGenerator.ticks`.
    """

    def __init__(self, amin, amax, decimal_values,
                 labels=None, plabel=None, font_size=None, horizontal=None):
        self.amin = amin
        "axis lower bound"

        self.amax = amax
        "axis upper bound"

        self.values = [float(dv) for dv in decimal_values]
        "list of tick values"

        self.decimal_values = decimal_values
        "list of exact tick values, as `decimal.Decimal`s"

        self.labels = labels
        "list of tick label strings"

        self.plabel = plabel
        """
        power label string

        If `plabel` is not `None`, it represents a decadic power factored from
        the tick values. It is intended to be displayed in the form
        10<sup>`plabel`</sup> either at the side of the axis or as/with a unit
        in the axis label.
        """

        self.font_size = font_size
        "tick label font size, in TeX pt (2.54 cm / 72.27)"

        self.horizontal = horizontal
        """
        whether the tick label is to be displayed in horizontal orientation
        (rather than vertical)
        """

    def _optimize(self, font_sizes, font_metrics,
                  axis_length, axis_horizontal):
        """
        optimize label legibility in terms of format, font size, and
        orientation
        """

        # tick values
        values = self.values
        # minimum font size
        fs_min = min(font_sizes)
        # target font size
        fs_t = max(font_sizes)

        # optimization
        self.opt_legibility = float('-inf')
        # format
        for f in range(2):
            # legibility score for format
            if f == 0:
                # format 'Decimal'
                vls = [(1e-4 < abs(v) < 1e6) * 1 for v in values]
                leg_f = sum(vls) / len(vls)
            else:
                # format 'Factored scientific'
                leg_f = 0.3

            # tick labels
            if f == 0:
                # format 'Decimal'
                labels = self._labels_Decimal()
                plabel = None
            else:
                # format 'Factored scientific'
                labels, plabel = self._labels_Scientific()

            # widths and heights of tick labels, in units of font size
            widths = [self._label_width(l, font_metrics) for l in labels]
            heights = [self._label_height(l, font_metrics) for l in labels]

            # font size
            for fs in font_sizes:
                # legibility score for font size
                if fs == fs_t:
                    leg_fs = 1
                else:
                    leg_fs = 0.2 * (fs - fs_min + 1) / (fs_t - fs_min)

                # distance between ticks, in units of font size
                step = (
                    (values[1] - values[0])         # numerical
                    / (self.amax - self.amin)       # relative to axis
                    * axis_length                   # physical, in cm
                    /
                    (fs / 72.27 * 2.54)             # font size, in cm
                    )

                # orientation
                for o in range(2):
                    # legibility score for orientation
                    if o == 0:              # horizontal orientation
                        leg_or = 1
                    else:                   # vertical orientation
                        leg_or = -0.5

                    # legibility score for overlap
                    # extents of labels along the axis, in units of font size
                    if (o == 0) == axis_horizontal:
                        # label and axis have the same orientation
                        extents = widths
                    else:
                        # label and axis have different orientations
                        extents = heights
                    # minimum distance between neighboring labels
                    # We can apply the minimum here, since overlap legibility
                    # is an increasing function of distance.
                    dist = min(step - (extents[i] + extents[i + 1]) / 2
                               for i in range(len(extents) - 1))
                    # score; we interpret em as font size
                    if dist >= 1.5:
                        leg_ov = 1
                    elif dist > 0:
                        leg_ov = 2 - 1.5 / dist
                    else:
                        leg_ov = float('-inf')

                    # total legibility score
                    leg = (leg_f + leg_fs + leg_or + leg_ov) / 4

                    # aggregate
                    if leg > self.opt_legibility:
                        self.opt_legibility = leg
                        self.labels = labels
                        self.plabel = plabel
                        self.font_size = fs
                        self.horizontal = (o == 0)

    def _label_width(self, label, font_metrics):
        "get width of tick label"

        w = sum(map(font_metrics.get, label)) + font_metrics['offset']
        return w

    def _label_height(self, label, font_metrics):
        "get height of tick label"

        h = font_metrics['height']
        return h

    def _labels_Decimal(self):
        "get tick labels in 'Decimal' format"

        # get values
        dvs = self.decimal_values
        # create labels
        labels = ['{:f}'.format(dv) for dv in dvs]
        return labels

    def _labels_Scientific(self):
        "get tick labels in 'Factored scientific' format"

        # get values
        dvs = self.decimal_values
        # get largest power of 10 that can be factored out
        z0 = min([floor(log10(abs(dv))) for dv in dvs if dv != 0])
        # get values adjusted to that power
        dvs = [dv * D('1E1') ** (-z0) for dv in dvs]
        # create labels
        labels = ['{:f}'.format(dv) for dv in dvs]
        plabel = '{:d}'.format(z0)
        return labels, plabel
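The overlap part of the legibility score in `_optimize` can be isolated as follows. This is a minimal sketch with hypothetical function names and made-up label extents, following the piecewise rule in the source: all distances are in units of the font size.

```python
def min_label_distance(step, extents):
    """Smallest gap between neighboring labels along the axis."""
    return min(step - (extents[n] + extents[n + 1]) / 2
               for n in range(len(extents) - 1))


def overlap_legibility(dist):
    """Score the smallest gap between neighboring labels."""
    if dist >= 1.5:
        return 1.0                  # labels are comfortably separated
    elif dist > 0:
        return 2 - 1.5 / dist       # labels are close but do not touch
    else:
        return float('-inf')        # labels overlap: candidate is rejected


extents = [1.1, 1.1, 1.6]           # hypothetical label widths
wide = overlap_legibility(min_label_distance(3.0, extents))
narrow = overlap_legibility(min_label_distance(2.0, extents))
print(wide, narrow)
```

With a tick distance of 3 font sizes the labels are far enough apart to score 1, while at 2 font sizes the score already turns negative.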

Classes

class cfg (*args, **kwargs)

tikz.extended_wilkinson configuration variables

Class variables

var Q: preference-ordered list of nice step sizes

Values must be of type decimal.Decimal. The default step sizes are 1, 5, 2, 2.5, 4, and 3.

var w: list of weights of score components

The default weights are 0.25 for simplicity, 0.2 for coverage, 0.5 for density, and 0.05 for legibility.

var font_metrics: default font metrics

Font metrics are used to calculate the width and height of tick labels. They are specified as a dict, where each character that can occur in a tick label ('-', '0'–'9', '.') is a key associated with the character's width; it also contains an 'offset' to be added to the total width, and a 'height'. All numbers are in units of the font size.

The default values are for La/TeX's standard math font (Computer Modern Roman).
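The way font metrics translate into label sizes can be sketched as below. The dictionary repeats the default values; the function names `label_width` and `to_cm` and the example label are illustrative assumptions, not part of the module.

```python
font_metrics = {'offset': 0.1, '-': 0.678, '1': 0.5, '2': 0.5, '3': 0.5,
                '4': 0.5, '5': 0.5, '6': 0.5, '7': 0.5, '8': 0.5, '9': 0.5,
                '0': 0.5, '.': 0.278, 'height': 0.728}


def label_width(label, font_metrics):
    """Width of a label in units of the font size."""
    return sum(font_metrics[ch] for ch in label) + font_metrics['offset']


def to_cm(size, font_size_pt):
    """Convert a size in units of the font size to cm (TeX pt)."""
    return size * font_size_pt / 72.27 * 2.54


# '-2.5' consists of a minus sign, two digits, and a decimal point
width = label_width('-2.5', font_metrics)   # 0.678 + 0.5 + 0.278 + 0.5 + 0.1
width_cm = to_cm(width, 10)                 # at an assumed font size of 10 pt
print(width, width_cm)
```

A custom `font_metrics` dictionary for another font would need the same keys, since a label character missing from the dictionary cannot be measured.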

class TicksGenerator (font_sizes, density, font_metrics=None, only_loose=True)

generator of tick values and labels

This class stores parameters of tick generation that are likely to stay the same across several axes:

font_sizes: admissible font sizes, in TeX pt (2.54 cm / 72.27)
density: target density of ticks, in cm^–1
font_metrics: used to calculate the width and height of tick labels, see cfg.font_metrics (default)
only_loose: whether the range of tick values is forced to encompass the range of data values

Ticks for a specific axis are generated by calling the ticks() method on an instance.
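The effect of `only_loose` can be illustrated in isolation. The following sketch uses a hypothetical helper and made-up candidate tick ranges; it mirrors the check applied to each candidate in the source, which discards candidates whose ticks do not encompass the data.

```python
def covers_data(dmin, dmax, lmin, lmax):
    """True if the tick range [lmin, lmax] encompasses [dmin, dmax]."""
    return lmin <= dmin and lmax >= dmax


dmin, dmax = 0.3, 9.7
# hypothetical candidate tick ranges (lmin, lmax)
candidates = [(0.0, 10.0), (2.0, 8.0), (0.0, 8.0)]

# with only_loose=True, only candidates covering the data are considered
loose = [c for c in candidates if covers_data(dmin, dmax, *c)]
print(loose)   # → [(0.0, 10.0)]
```

With `only_loose=False` all three candidates would remain in the search and compete on their scores alone.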

Methods

def ticks(self, dmin, dmax, axis_length, axis_horizontal)

generate tick values and labels for a specific axis

dmin: data values lower bound
dmax: data values upper bound
axis_length: physical length of the axis, in cm
axis_horizontal: whether the axis is oriented horizontally (rather than vertically)

Returns a Ticks object.
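To make the optimization criterion more concrete, the score of a single candidate can be computed by hand. This is a minimal standalone sketch that re-implements the scoring formulas from the source as plain functions; the candidate (ticks 0, 2, ..., 10 for data between 0.3 and 9.7), the target count `m = 6`, and the assumed legibility of 1 are made-up example values.

```python
Q = [1, 5, 2, 2.5, 4, 3]        # default nice step sizes
W = [0.25, 0.2, 0.5, 0.05]      # default weights


def simplicity(i, start, j, k):
    # bonus if zero is one of the ticks
    zero = start % j == 0 and start <= 0 <= start + j * (k - 1)
    return 1 - (i - 1) / (len(Q) - 1) - j + int(zero)


def coverage(dmin, dmax, lmin, lmax):
    return (1 - 0.5 * ((dmax - lmax)**2 + (dmin - lmin)**2)
            / (0.1 * (dmax - dmin))**2)


def density(k, m, dmin, dmax, lmin, lmax):
    r = (k - 1) / (lmax - lmin)
    rt = (m - 1) / (max(lmax, dmax) - min(dmin, lmin))
    return 2 - max(r / rt, rt / r)


def score(s, c, d, leg):
    return W[0] * s + W[1] * c + W[2] * d + W[3] * leg


dmin, dmax, m = 0.3, 9.7, 6
q, j, k, start = 2, 1, 6, 0     # step 2, every tick, six ticks from 0
i = Q.index(q) + 1              # 1-based position of q in Q
lmin = q * start
lmax = q * (start + j * (k - 1))

s = simplicity(i, start, j, k)
c = coverage(dmin, dmax, lmin, lmax)
d = density(k, m, dmin, dmax, lmin, lmax)
total = score(s, c, d, 1)       # legibility assumed to be perfect
print(s, c, d, total)
```

The candidate loses some simplicity because 2 is only the third choice in `Q`, and some coverage because the ticks extend slightly beyond the data, while the density matches the target exactly.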

Expand source code

def ticks(self, dmin, dmax, axis_length, axis_horizontal):
    """
    generate tick values and labels for a specific axis

    `dmin`
    :   data values lower bound

    `dmax`
    :   data values upper bound

    `axis_length`
    :   physical length of the axis, in cm

    `axis_horizontal`
    :   whether the axis is oriented horizontally (rather than vertically)

    Returns a `Ticks` object.
    """

    # The implementation here is based on the R code, which is defined
    # in terms of `m`, the target number of ticks. It optimizes w.r.t.
    # the ratio between the two quantities
    #   r = (k - 1) / (lmax - lmin)
    #   rt = (m - 1) / (max(lmax, dmax) - min(dmin, lmin))
    # We want to instead specify the physical density (e.g. in 1/cm),
    # stored as a class attribute `self.rt`, and the parameter `length`
    # (e.g. in cm). Assuming that the axis spans `min(dmin, lmin)` to
    # `max(lmax, dmax)`, while the ticks span lmin to lmax, the
    # optimization should use the ratio of
    #   r = (k - 1) / (length * (lmax - lmin))
    #       * (max(lmax, dmax) - min(dmin, lmin))
    # to `self.rt`.
    # It turns out that the two ratios are equivalent if one sets
    m = self.rt * axis_length + 1

    if dmin > dmax:
        dmin, dmax = dmax, dmin

    # threshold for optimization
    ticks = None
    best_score = -2

    # We combine the j and q loops into one to enable breaking out of both
    # simultaneously, by iterating over a generator; and we create an
    # index i corresponding to q at the same time. i is `match(q, Q)[1]`
    # and replaces `q, Q` in function calls.
    JIQ = ((j, i, q)
           for j in count(start=1)
           for i, q in enumerate(cfg.Q, start=1))
    for j, i, q in JIQ:
        sm = self._simplicity_max(i, j)

        if self._score(sm, 1, 1, 1) < best_score:
            break

        for k in count(start=2):      # loop over tick counts
            dm = self._density_max(k, m)

            if self._score(sm, 1, dm, 1) < best_score:
                break

            delta = (dmax - dmin) / (k + 1) / (j * float(q))

            for z in count(start=ceil(log10(delta))):
                step = float(q) * j * 10**z

                cm = self._coverage_max(dmin, dmax, step * (k - 1))

                if self._score(sm, cm, dm, 1) < best_score:
                    break

                min_start = floor(dmax / step) * j - (k - 1) * j
                max_start = ceil(dmin / step) * j

                if min_start > max_start:
                    continue

                for start in range(min_start, max_start + 1):
                    lmin = start * step / j
                    lmax = lmin + step * (k - 1)
                    # lstep = step

                    if self.only_loose:
                        if lmin > dmin or lmax < dmax:
                            continue

                    s = self._simplicity(i, start, j, k)
                    c = self._coverage(dmin, dmax, lmin, lmax)
                    d = self._density(k, m, dmin, dmax, lmin, lmax)

                    score = self._score(s, c, d, 1)

                    if score < best_score:
                        continue

                    # Exact tick values in terms of loop variables:
                    #   lmin = q * start * 10**z
                    #   lmax = q * (start + j * (k - 1)) * 10 ** z
                    #   lstep = float(q) * j * 10**z
                    decimal_values = [q * (start + j * ind)
                                      * D('1E1') ** z
                                      for ind in range(k)]

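                    # Create candidate `Ticks` object
                    candidate = Ticks(
                        amin=min(lmin, dmin),
                        amax=max(lmax, dmax),
                        decimal_values=decimal_values)
                    # and initiate internal optimization for label
                    # legibility.
                    candidate._optimize(
                        self.font_sizes,
                        self.font_metrics,
                        axis_length,
                        axis_horizontal)

                    leg = candidate.opt_legibility

                    score = self._score(s, c, d, leg)

                    # Keep the candidate only if it beats the best so far, so
                    # that `ticks` ends up holding the best-scoring solution
                    # rather than the last one evaluated.
                    if score > best_score:
                        best_score = score
                        ticks = candidate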

    if ticks is None:
        # no solution found: no ticks
        print('Warning: Could not determine ticks.')
        ticks = Ticks(
            amin=dmin,
            amax=dmax,
            decimal_values=[],
            labels=[])
    return ticks
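
The loop-flattening idiom used in the search above can be seen in isolation in the following minimal sketch. The list `Q`, the function name `first_products_below`, and the stopping condition are made up for illustration; only the generator pattern mirrors the source.

```python
from itertools import count

# Illustrative step bases; in the source these come from `cfg.Q`.
Q = [1, 5, 2]


def first_products_below(limit):
    """Collect (j, i, q * j) in loop order until a product exceeds `limit`."""
    # One flat generator stands in for two nested loops, so a single
    # `break` leaves both the unbounded j loop and the q loop.
    jiq = ((j, i, q)
           for j in count(start=1)
           for i, q in enumerate(Q, start=1))
    seen = []
    for j, i, q in jiq:
        if q * j > limit:
            break
        seen.append((j, i, q * j))
    return seen


print(first_products_below(6))
```

With nested `for` statements, the `break` would only leave the inner loop over `Q`, and the outer `count` loop would never terminate.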

class Ticks (amin, amax, decimal_values, labels=None, plabel=None, font_size=None, horizontal=None)

represents tick values and labels

This class is not intended to be instantiated by the user; Ticks objects are obtained via TicksGenerator.ticks().

class Ticks:
    """
    represents tick values and labels

    This class is not intended to be instantiated by the user; `Ticks`
    objects are obtained via `TicksGenerator.ticks`.
    """

    def __init__(self, amin, amax, decimal_values,
                 labels=None, plabel=None, font_size=None, horizontal=None):
        self.amin = amin
        "axis lower bound"

        self.amax = amax
        "axis upper bound"

        self.values = [float(dv) for dv in decimal_values]
        "list of tick values"

        self.decimal_values = decimal_values
        "list of exact tick values, as `decimal.Decimal`s"

        self.labels = labels
        "list of tick label strings"

        self.plabel = plabel
        """
        power label string

        If `plabel` is not `None`, it represents a decadic power factored from
        the tick values. It is intended to be displayed in the form
        10<sup>`plabel`</sup> either at the side of the axis or as/with a unit
        in the axis label.
        """

        self.font_size = font_size
        "tick label font size, in TeX pt (2.54 cm / 72.27)"

        self.horizontal = horizontal
        """
        whether the tick label is to be displayed in horizontal orientation
        (rather than vertical)
        """

    def _optimize(self, font_sizes, font_metrics,
                  axis_length, axis_horizontal):
        """
        optimize label legibility in terms of format, font size, and
        orientation
        """

        # tick values
        values = self.values
        # minimum font size
        fs_min = min(font_sizes)
        # target font size
        fs_t = max(font_sizes)

        # optimization
        self.opt_legibility = float('-inf')
        # format
        for f in range(2):
            # legibility score for format
            if f == 0:
                # format 'Decimal'
                vls = [(1e-4 < abs(v) < 1e6) * 1 for v in values]
                leg_f = sum(vls) / len(vls)
            else:
                # format 'Factored scientific'
                leg_f = 0.3

            # tick labels
            if f == 0:
                # format 'Decimal'
                labels = self._labels_Decimal()
                plabel = None
            else:
                # format 'Factored scientific'
                labels, plabel = self._labels_Scientific()

            # widths and heights of tick labels, in units of font size
            widths = [self._label_width(label, font_metrics)
                      for label in labels]
            heights = [self._label_height(label, font_metrics)
                       for label in labels]

            # font size
            for fs in font_sizes:
                # legibility score for font size
                if fs == fs_t:
                    leg_fs = 1
                else:
                    leg_fs = 0.2 * (fs - fs_min + 1) / (fs_t - fs_min)

                # distance between ticks, in units of font size
                step = (
                    (values[1] - values[0])         # numerical
                    / (self.amax - self.amin)       # relative to axis
                    * axis_length                   # physical, in cm
                    /
                    (fs / 72.27 * 2.54)             # font size, in cm
                    )

                # orientation
                for o in range(2):
                    # legibility score for orientation
                    if o == 0:              # horizontal orientation
                        leg_or = 1
                    else:                   # vertical orientation
                        leg_or = -0.5

                    # legibility score for overlap
                    # extents of labels along the axis, in units of font size
                    if (o == 0) == axis_horizontal:
                        # label and axis have the same orientation
                        extents = widths
                    else:
                        # label and axis have different orientations
                        extents = heights
                    # minimum distance between neighboring labels
                    # We can apply the minimum here, since overlap legibility
                    # is an increasing function of distance.
                    dist = min(step - (extents[i] + extents[i + 1]) / 2
                               for i in range(len(extents) - 1))
                    # score; we interpret em as font size
                    if dist >= 1.5:
                        leg_ov = 1
                    elif dist > 0:
                        leg_ov = 2 - 1.5 / dist
                    else:
                        leg_ov = float('-inf')

                    # total legibility score
                    leg = (leg_f + leg_fs + leg_or + leg_ov) / 4

                    # aggregate
                    if leg > self.opt_legibility:
                        self.opt_legibility = leg
                        self.labels = labels
                        self.plabel = plabel
                        self.font_size = fs
                        self.horizontal = (o == 0)

    def _label_width(self, label, font_metrics):
        "get width of tick label, in units of font size"

        w = sum(map(font_metrics.get, label)) + font_metrics['offset']
        return w

    def _label_height(self, label, font_metrics):
        "get height of tick label, in units of font size"

        h = font_metrics['height']
        return h

    def _labels_Decimal(self):
        "get tick labels in 'Decimal' format"

        # get values
        dvs = self.decimal_values
        # create labels
        labels = ['{:f}'.format(dv) for dv in dvs]
        return labels

    def _labels_Scientific(self):
        "get tick labels in 'Factored scientific' format"

        # get values
        dvs = self.decimal_values
        # get largest power of 10 that can be factored out
        z0 = min([floor(log10(abs(dv))) for dv in dvs if dv != 0])
        # get values adjusted to that power
        dvs = [dv * D('1E1') ** (-z0) for dv in dvs]
        # create labels
        labels = ['{:f}'.format(dv) for dv in dvs]
        plabel = '{:d}'.format(z0)
        return labels, plabel
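
The overlap part of the legibility score in `_optimize` is a piecewise function of the smallest gap between neighboring labels. The standalone sketch below restates that part outside the class; the function names `min_label_gap` and `overlap_legibility` are hypothetical and not part of the module.

```python
def min_label_gap(step, extents):
    """Smallest gap between neighboring labels, in units of font size.

    `step` is the distance between tick centers and `extents` holds the
    size of each label along the axis, both in units of font size.
    """
    return min(step - (a + b) / 2
               for a, b in zip(extents, extents[1:]))


def overlap_legibility(dist):
    """Overlap score for the smallest gap `dist`, in units of font size."""
    if dist >= 1.5:
        return 1.0                  # enough room, full score
    if dist > 0:
        return 2 - 1.5 / dist       # crowded, score falls off quickly
    return float('-inf')            # labels touch or overlap: rejected


gap = min_label_gap(3.0, [1.0, 2.0, 2.0])
print(gap, overlap_legibility(gap))
```

Because the score increases with the gap, evaluating it at the smallest gap gives the worst case over all pairs of neighbors, which is why the source takes the minimum before scoring.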

Instance variables

var amin: axis lower bound
var amax: axis upper bound
var values: list of tick values
var decimal_values: list of exact tick values, as decimal.Decimals
var labels: list of tick label strings
var plabel: power label string

If plabel is not None, it represents a decadic power factored from the tick values. It is intended to be displayed in the form 10^plabel either at the side of the axis or as/with a unit in the axis label.
var font_size: tick label font size, in TeX pt (2.54 cm / 72.27)
var horizontal: whether the tick label is to be displayed in horizontal orientation (rather than vertical)
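
Since tick labels are always '0-extended', a caller who prefers shorter labels has to post-process them. The following is a minimal sketch of one way to do that, together with a plain-text rendering of the power label. The helper names `strip_trailing_zeros` and `power_text` are hypothetical, and the sample values merely stand in for the contents of `labels` and `plabel`.

```python
def strip_trailing_zeros(label):
    """Drop zeros after the decimal point that carry no information."""
    # Labels without a decimal point are left alone, so '100' stays '100'.
    if '.' in label:
        label = label.rstrip('0').rstrip('.')
    return label


def power_text(plabel):
    """Plain-text rendering of the factored power of ten, '' if none."""
    return '' if plabel is None else '10^' + plabel


# Stand-in values of the kind stored in `labels` and `plabel`.
labels = ['0.00', '0.50', '1.00', '1.50']
plabel = '3'

print([strip_trailing_zeros(label) for label in labels])
print(power_text(plabel))
```

Stripping each label separately yields labels of differing precision (here '1' next to '0.5'); whether that is preferable to uniform precision is a presentation choice left to the caller.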