Formatting Options
sciform provides a variety of options for converting numbers into
formatted strings.
These options include control over the rounding strategy, scientific
notation formatting, separation characters and more.
Exponent Mode
sciform supports a variety of exponent modes.
To display numbers across a wide range of magnitudes, scientific
formatting presents numbers in the form:
num = mantissa * 10**exp
Where exp is an integer.
The different exponent modes control how mantissa and exp are
chosen for a given input number num.
Fixed Point
Fixed point notation is standard notation in which a number is displayed directly with no extra exponent.
>>> from sciform import Formatter
>>> formatter = Formatter(exp_mode="fixed_point")
>>> print(formatter(123.456))
123.456
>>> print(formatter(123.456, 0.001))
123.456 ± 0.001
Percent
Percent mode is similar to fixed point mode.
For percent mode, the number is multiplied by 100 and a % symbol is
appended to the end of the formatted string.
>>> formatter = Formatter(exp_mode="percent")
>>> print(formatter(0.12345))
12.345%
>>> print(formatter(0.12345, 0.001))
(12.3 ± 0.1)%
Scientific Notation
Scientific notation expresses a number as a mantissa multiplied by a power of ten.
In scientific notation, the exponent is uniquely chosen so that the
mantissa m satisfies 1 <= m < 10.
>>> formatter = Formatter(exp_mode="scientific")
>>> print(formatter(123.456))
1.23456e+02
>>> print(formatter(123.456, 0.001))
(1.23456 ± 0.00001)e+02
By default the exponent is expressed using ASCII characters, e.g.
e+02.
The sign symbol is always included and the exponent value is left padded
so that it is at least two digits wide.
These behaviors for ASCII exponents cannot be modified.
However, the Superscript Exponent Format mode can be used to represent the
exponent using Unicode superscript characters.
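The exponent choice and the ASCII exponent string described above can be sketched with Python's decimal module. This is a minimal illustration that assumes a nonzero, finite Decimal input. The helper names scientific_parts and ascii_exp are hypothetical and are not part of sciform's API.

```python
from decimal import Decimal


def scientific_parts(num: Decimal) -> tuple[Decimal, int]:
    """Hypothetical helper: split a nonzero, finite Decimal into (mantissa, exp)
    such that num == mantissa * 10**exp and 1 <= abs(mantissa) < 10."""
    # adjusted() gives the digit place of the most significant digit,
    # e.g. Decimal("123.456").adjusted() == 2.
    exp = num.adjusted()
    # scaleb(-exp) divides by 10**exp by shifting the Decimal's internal exponent.
    mantissa = num.scaleb(-exp)
    return mantissa, exp


def ascii_exp(exp: int) -> str:
    """Hypothetical helper: sign always shown, exponent zero-padded to two digits."""
    return f"e{exp:+03d}"


mantissa, exp = scientific_parts(Decimal("123.456"))
print(f"{mantissa}{ascii_exp(exp)}")  # 1.23456e+02
print(ascii_exp(-5), ascii_exp(123))  # e-05 e+123
```

The `+03d` format spec both forces the sign and pads to a minimum width of three characters (sign plus two digits). Wider exponents are not truncated.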
Engineering Notation
Engineering notation is similar to scientific notation except the
exponent is chosen to be an integer multiple of 3.
In standard engineering notation, the mantissa m satisfies
1 <= m < 1000.
Engineering notation is compatible with the SI prefixes, since SI
prefixes are defined for exponents that are integer multiples of 3, e.g.:
369,000,000 Hz = 369e+06 Hz = 369 MHz
Here it may be more convenient to display the number with an exponent of 6 for rapid identification of the “Mega” order of magnitude, as opposed to an exponent of 8 which would be chosen for standard scientific notation.
>>> formatter = Formatter(exp_mode="engineering")
>>> print(formatter(123.456))
123.456e+00
>>> print(formatter(123.456, 0.001))
(123.456 ± 0.001)e+00
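Choosing an engineering exponent amounts to flooring the scientific exponent down to the nearest multiple of 3. The following minimal sketch uses a hypothetical helper name and assumes a nonzero, finite Decimal. It is not sciform's implementation.

```python
from decimal import Decimal


def engineering_parts(num: Decimal) -> tuple[Decimal, int]:
    """Hypothetical helper: (mantissa, exp) with exp a multiple of 3
    and 1 <= abs(mantissa) < 1000."""
    sci_exp = num.adjusted()  # exponent that scientific notation would use
    # Floor division rounds toward -inf, so this also works for negative exponents.
    exp = 3 * (sci_exp // 3)
    return num.scaleb(-exp), exp


for text in ("123.456", "369e6", "0.0042"):
    mantissa, exp = engineering_parts(Decimal(text))
    print(text, "->", mantissa, exp)
# 123.456 -> 123.456 0
# 369e6 -> 369 6
# 0.0042 -> 4.2 -3
```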
Shifted Engineering Notation
Shifted engineering notation is the same as engineering notation except
the exponent is chosen so that the mantissa m satisfies
0.1 <= m < 100.
>>> formatter = Formatter(exp_mode="engineering_shifted")
>>> print(formatter(123.456))
0.123456e+03
>>> print(formatter(123.456, 0.001))
(0.123456 ± 0.000001)e+03
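The shifted variant only moves the window for the mantissa. Adding one to the scientific exponent before flooring to a multiple of 3 turns the range [1, 1000) into [0.1, 100). The following sketch uses a hypothetical helper and assumes a nonzero, finite Decimal.

```python
from decimal import Decimal


def shifted_engineering_parts(num: Decimal) -> tuple[Decimal, int]:
    """Hypothetical helper: (mantissa, exp) with exp a multiple of 3
    and 0.1 <= abs(mantissa) < 100."""
    sci_exp = num.adjusted()
    # Shifting by one before flooring moves the window from [1, 1000) to [0.1, 100).
    exp = 3 * ((sci_exp + 1) // 3)
    return num.scaleb(-exp), exp


print(shifted_engineering_parts(Decimal("123.456")))  # (Decimal('0.123456'), 3)
print(shifted_engineering_parts(Decimal("42")))       # (Decimal('42'), 0)
```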
Fixed Exponent
The user can coerce the exponent for the formatting to a fixed value.
>>> formatter = Formatter(exp_mode="scientific", exp_val=3)
>>> print(formatter(123.456))
0.123456e+03
To have sciform select the exponent automatically, set
exp_val="auto".
This is the default value in the global options.
Note that the forced exponent must be consistent with the requested exponent mode. For fixed point and percent modes an explicit fixed exponent must equal 0. For engineering and shifted engineering modes an explicit fixed exponent must be an integer multiple of 3. Because of this constrained behavior, it is recommended to only use a fixed exponent with the scientific exponent mode.
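The constraints in the note above can be expressed as a small validation function. This is a hypothetical sketch for illustration only. sciform's own error handling, including its exception types and messages, may differ.

```python
def check_exp_val(exp_mode: str, exp_val) -> None:
    """Hypothetical check of an explicit exp_val against exp_mode."""
    if exp_val == "auto":
        return  # automatic exponent selection is always allowed
    if exp_mode in ("fixed_point", "percent") and exp_val != 0:
        raise ValueError(f"{exp_mode} mode requires exp_val=0, got {exp_val}")
    if exp_mode in ("engineering", "engineering_shifted") and exp_val % 3 != 0:
        raise ValueError(f"{exp_mode} mode requires a multiple of 3, got {exp_val}")
    # scientific mode accepts any integer exponent


for mode, val in [("scientific", 3), ("engineering", 6), ("engineering", 4), ("percent", 2)]:
    try:
        check_exp_val(mode, val)
        print(mode, val, "ok")
    except ValueError as err:
        print(mode, val, "rejected:", err)
```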
Exponent String Replacement
sciform provides a number of formatting options for replacing
exponent strings such as 'e-03' with conventional strings such as
'm' (the SI prefix for milli) to succinctly communicate the order
of magnitude.
Exponent strings can be replaced with either SI prefixes or parts-per
identifiers.
See Supported Exponent Replacements for all default supported
replacements.
Furthermore, Formatter objects or the global options settings can be
customized to map additional translations beyond those provided by
default.
>>> formatter = Formatter(exp_mode="engineering", exp_format="prefix")
>>> print(formatter(4242.13))
4.24213 k
>>> formatter = Formatter(exp_mode="engineering", exp_format="parts_per")
>>> print(formatter(12.3e-6))
12.3 ppm
Extra Exponent Replacements
In addition to the default
exponent replacements, the user can modify the
available exponent replacements using a number of options.
The SI prefix and parts-per replacements can be modified
using the extra_si_prefixes and extra_parts_per_forms options,
respectively, and passing in dictionaries with keys corresponding to
integer exponents and values corresponding to translated strings.
The entries in these dictionaries overwrite any default translation
mappings.
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... extra_si_prefixes={-2: "c"},
... )
>>> print(formatter(3e-2))
3 c
Passing None as the value for a given exponent key will
prevent that exponent from being translated.
>>> formatter = Formatter(exp_mode="engineering", exp_format="parts_per")
>>> print(formatter(3e-9))
3 ppb
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="parts_per",
... extra_parts_per_forms={-9: None},
... )
>>> print(formatter(3e-9))
3e-09
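The override semantics above amount to a dictionary merge followed by a lookup. The following minimal sketch uses a hypothetical lookup function. DEFAULT_PARTS_PER is an illustrative subset, not sciform's full default table.

```python
from typing import Optional

# Illustrative subset of parts-per translations (not sciform's full default table).
DEFAULT_PARTS_PER = {-6: "ppm", -9: "ppb", -12: "ppt"}


def translate_exp(exp: int, extra: Optional[dict] = None) -> Optional[str]:
    """Hypothetical lookup: return the replacement text for exp, or None to keep e.g. 'e-09'."""
    table = dict(DEFAULT_PARTS_PER)
    # Extra entries overwrite the defaults; a None value disables that translation.
    table.update(extra or {})
    return table.get(exp)


print(translate_exp(-9))                # ppb
print(translate_exp(-9, {-9: None}))    # None
print(translate_exp(-3, {-3: "ppth"}))  # ppth
```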
Keys into the extra translations dictionaries must be integers and values must consist of only English alphabetic characters.
Two helper options exist to add SI prefix translations from the following set:
{-2: 'c', -1: 'd', +1: 'da', +2: 'h'}
These SI prefixes are excluded by default because they do not correspond
to the integer-multiple-of-3 exponents which are compatible with
engineering notation.
However, they can easily be included: add_c_prefix adds only the
-2: 'c' translation, while add_small_si_prefixes adds all four.
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_c_prefix=True,
... )
>>> print(formatter(0.025))
2.5 c
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_small_si_prefixes=True,
... )
>>> print(formatter(25))
2.5 da
A parts-per-thousand form, ppth, can be accessed with
the add_ppth_form option.
Note that ppth is not a standard notation for “parts-per-thousand”,
but it is one that the author has found useful.
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="parts_per",
... add_ppth_form=True,
... )
>>> print(formatter(12.3e-3))
12.3 ppth
Note that the helper flags will not overwrite exponent/string pairs already specified in the extra translations dictionary:
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_c_prefix=True,
... )
>>> print(formatter(0.012))
1.2 c
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... extra_si_prefixes={-2: "zzz"},
... add_c_prefix=True,
... )
>>> print(formatter(0.012))
1.2 zzz
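The precedence shown above behaves like dict.setdefault: helper flags only fill exponents that are not already present. The following sketch uses a hypothetical function to model that behavior. It is not sciform's internal code.

```python
def build_si_extras(extra_si_prefixes=None, add_c_prefix=False, add_small_si_prefixes=False):
    """Hypothetical merge: explicit extra_si_prefixes entries win; helper flags only fill gaps."""
    extras = dict(extra_si_prefixes or {})
    if add_c_prefix:
        extras.setdefault(-2, "c")  # setdefault never overwrites an existing key
    if add_small_si_prefixes:
        for exp, prefix in {-2: "c", -1: "d", 1: "da", 2: "h"}.items():
            extras.setdefault(exp, prefix)
    return extras


print(build_si_extras(add_c_prefix=True))               # {-2: 'c'}
print(build_si_extras({-2: "zzz"}, add_c_prefix=True))  # {-2: 'zzz'}
```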
Note that there is never merging of local and global extra
translations.
If any local extra translation settings are configured directly with
e.g. extra_si_prefixes or with a helper like
add_small_si_prefixes then no global extra translations will be
used.
>>> from sciform import GlobalOptionsContext
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... extra_si_prefixes={-4: "zzz"},
... )
>>> with GlobalOptionsContext(add_c_prefix=True):
...     print(formatter(0.012))
1.2e-02
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_c_prefix=True,
... )
>>> with GlobalOptionsContext(extra_si_prefixes={1: "zzz"}):
...     print(formatter(12.4))
1.24e+01
If all local extra translation settings are left unset then all global extra translation settings will be populated at format time. This behavior is the same as the behavior for all other options.
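The all-or-nothing rule above can be modeled in a few lines. This is a hypothetical sketch, not sciform's implementation. It assumes that an option left unset is either missing or None.

```python
EXTRA_TRANSLATION_OPTIONS = (
    "extra_si_prefixes",
    "extra_parts_per_forms",
    "add_c_prefix",
    "add_small_si_prefixes",
    "add_ppth_form",
)


def resolve_extra_translations(local: dict, global_opts: dict) -> dict:
    """Take every extra-translation option from one source: local if any is set, else global."""
    # Assumption: an unset option is either missing or None.
    local_is_set = any(local.get(name) is not None for name in EXTRA_TRANSLATION_OPTIONS)
    source = local if local_is_set else global_opts
    return {name: source.get(name) for name in EXTRA_TRANSLATION_OPTIONS}


# Local extra_si_prefixes is set, so the global add_c_prefix is not used.
print(resolve_extra_translations({"extra_si_prefixes": {-4: "zzz"}}, {"add_c_prefix": True}))
# Nothing is set locally, so the global settings are used.
print(resolve_extra_translations({}, {"add_c_prefix": True}))
```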
Rounding
sciform provides four rounding strategies controlled by the
round_mode and ndigits options.
Significant figure rounding is selected by setting round_mode="sig_fig" and passing a positive int for ndigits specifying the number of significant figures to which to round.
Digits-past-the-decimal rounding is selected by setting round_mode="dec_place" and passing an int for ndigits specifying the digit place to which to round.
All digits rounding is selected by setting round_mode="all". In this mode “all” digits of the number are shown, where the definition of “all” depends on the type of the numerical input data. See below for more details. In this mode the ndigits option has no effect.
Particle Data Group (PDG) rounding is selected by setting round_mode="pdg". See below for the definition of PDG rounding. In this mode the ndigits option has no effect.
Rounding always applies to the mantissa determined after identifying the
appropriate exponent for display based on exp_mode and exp_val.
In some cases, the rounding results in a modification to the chosen
exponent.
This is taken into account before the final presentation.
>>> formatter = Formatter(
... exp_mode="scientific",
... round_mode="dec_place",
... ndigits=2,
... )
>>> print(formatter(9.99))
9.99e+00
>>> formatter = Formatter(
... exp_mode="scientific",
... round_mode="dec_place",
... ndigits=1,
... )
>>> print(formatter(9.99))
1.0e+01
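The exponent correction after rounding can be sketched for scientific mode with dec_place rounding. If rounding carries the mantissa up to 10, the exponent is bumped and the rounding is redone. The helper below is hypothetical and assumes a nonzero, finite Decimal and round-half-to-even rounding.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def sci_dec_place(num: Decimal, ndigits: int) -> str:
    """Hypothetical helper: scientific notation rounded to ndigits past the decimal point."""
    quantum = Decimal(1).scaleb(-ndigits)  # e.g. Decimal('0.1') for ndigits=1
    exp = num.adjusted()
    mantissa = num.scaleb(-exp).quantize(quantum, rounding=ROUND_HALF_EVEN)
    if abs(mantissa) >= 10:
        # Rounding carried into a new digit (9.99 -> 10.0): bump the exponent and round again.
        exp += 1
        mantissa = num.scaleb(-exp).quantize(quantum, rounding=ROUND_HALF_EVEN)
    return f"{mantissa}e{exp:+03d}"


print(sci_dec_place(Decimal("9.99"), 2))  # 9.99e+00
print(sci_dec_place(Decimal("9.99"), 1))  # 1.0e+01
```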
By default, sciform uses
“round-to-even” or “banker’s rounding”:
>>> formatter = Formatter(
... round_mode="sig_fig",
... ndigits=2,
... )
>>> print(formatter("2.45"))
2.4
>>> print(formatter("2.35"))
2.4
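Python's decimal module provides the same round-half-to-even rule, so the difference from schoolbook round-half-up can be shown directly. This sketch only demonstrates the rounding rule and does not call sciform.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

tenths = Decimal("0.1")
for text in ("2.45", "2.35"):
    num = Decimal(text)  # built from a string, so the tie is exact
    half_even = num.quantize(tenths, rounding=ROUND_HALF_EVEN)
    half_up = num.quantize(tenths, rounding=ROUND_HALF_UP)
    print(text, half_even, half_up)
# 2.45 2.4 2.5
# 2.35 2.4 2.4
```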
All sciform inputs are converted to Decimal as the first step
for formatting before determining the mantissa and exponent and applying the
rounding algorithm.
This conversion can sometimes have surprising effects for rounding.
See Note on Decimals and Floats for more details.
Note that for value/uncertainty formatting, if the uncertainty is finite and non-zero, the rounding is applied to the uncertainty first then the value is rounded to the same decimal place (as long as it is finite).
Significant Figures
Significant figure rounding is selected by setting round_mode="sig_fig" and
ndigits equal to a positive integer (zero excluded). In this mode, after the
exponent is first calculated, the digit place of the most-significant digit of
the mantissa is identified.
Then the mantissa is rounded so that it keeps the specified number of
significant figures, counting down from that digit place.
E.g. for 12345.678 the most-significant digit appears in the
ten-thousands, or 10**4, place.
To express this number to 4 significant figures, we round it
to the tens, or 10**1, place, resulting in 12350.
>>> formatter = Formatter(
... exp_mode="engineering",
... round_mode="sig_fig",
... ndigits=4,
... )
>>> print(formatter(12345.678))
12.35e+03
Note that 1001 rounded to 1, 2, or 3 significant figures results in 1000. This demonstrates that we can’t determine how many significant figures a number was rounded to (or “how many significant figures a number has”) just by looking at the resulting string.
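Significant figure rounding reduces to a place computation followed by Decimal.quantize. The helper below is a hypothetical sketch for a nonzero, finite Decimal. It reproduces both the 12345.678 example and the 1001 example above.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig_figs(num: Decimal, ndigits: int) -> Decimal:
    """Hypothetical helper: round a nonzero, finite Decimal to ndigits significant figures."""
    if ndigits < 1:
        raise ValueError("ndigits must be a positive integer")
    msd_place = num.adjusted()             # 4 for 12345.678 (ten-thousands place)
    round_place = msd_place - ndigits + 1  # 1 for 4 significant figures (tens place)
    return num.quantize(Decimal(1).scaleb(round_place), rounding=ROUND_HALF_EVEN)


print(f"{round_sig_figs(Decimal('12345.678'), 4):f}")  # 12350
# 1001 gives the same string for 1, 2, or 3 significant figures.
for n in (1, 2, 3):
    print(n, f"{round_sig_figs(Decimal(1001), n):f}")  # 1000
```

The `:f` format spec prints the result in plain positional notation, which hides how many significant figures were requested. This is exactly the ambiguity described above.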
Digits-Past-the-Decimal Place
Decimal place, or digits-past-the-decimal, rounding is selected by setting
round_mode="dec_place" and ndigits equal to any integer.
In this mode ndigits specifies the decimal place to which the mantissa will
be rounded.
The convention for ndigits is the same as that for the built-in
round function.
E.g. ndigits=2 means to round to two digits past the decimal point,
the hundredths or 10**-2 place, so that 12.987 would be
rounded to 12.99.
>>> formatter = Formatter(exp_mode="engineering", round_mode="dec_place", ndigits=4)
>>> print(formatter(12345.678))
12.3457e+03
Unlike the built-in string formatting, it is possible to use ndigits <= 0:
>>> formatter = Formatter(
... exp_mode="fixed_point",
... round_mode="dec_place",
... ndigits=-2,
... )
>>> print(formatter(12345.678))
12300
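Because ndigits follows the built-in round convention, decimal-place rounding maps onto Decimal.quantize with a quantum of 10**-ndigits. The following minimal sketch uses a hypothetical helper name. The last line shows the built-in round for comparison.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_dec_place(num: Decimal, ndigits: int) -> Decimal:
    """Hypothetical helper: round to the 10**-ndigits place, like the built-in round()."""
    return num.quantize(Decimal(1).scaleb(-ndigits), rounding=ROUND_HALF_EVEN)


print(f"{round_dec_place(Decimal('12.987'), 2):f}")      # 12.99
print(f"{round_dec_place(Decimal('12.345678'), 4):f}")   # 12.3457
print(f"{round_dec_place(Decimal('12345.678'), -2):f}")  # 12300
print(round(12345.678, -2))                              # 12300.0 (built-in, same convention)
```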
All Digits
“All” digits rounding is selected by setting round_mode="all".
In this case, the ndigits option is ignored.
This mode attempts to display the input digits with full precision.
What exactly “full precision” means depends on the type of the input.
int, str and Decimal inputs are converted to
Decimal as the first step for formatting.
For these inputs the number of digits shown in "all" mode will be the
smaller of the number of significant digits in the number or the
decimal module precision (default 28).
float inputs are first converted to str before being converted
to Decimal.
The string conversion converts the float into the shortest string representation
that will round trip to the same value as the input float.
For the 64-bit floats used in Python this will be at most 17 digits, but it is
typically many fewer digits.
Unless the decimal module precision is decreased below 17 digits,
the result of "all" rounding will print the same digits as present in the
string representation of the input float.
>>> from decimal import Decimal
>>> formatter = Formatter(exp_mode="scientific", round_mode="all")
>>> float_num = 1/9
>>> dec_num = Decimal(float_num).normalize()
>>> print(float_num)
0.1111111111111111
>>> print(formatter(float_num))
1.111111111111111e-01
>>> print(dec_num)
0.1111111111111111049432054187
>>> print(formatter(dec_num))
1.111111111111111049432054187e-01
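The two conversion paths above can be checked with the standard library alone. This sketch does not call sciform. The normalize() output assumes the default 28-digit decimal context.

```python
from decimal import Decimal

x = 1 / 9
shortest = str(x)              # shortest string that round-trips to the same float
print(shortest)                # 0.1111111111111111
print(float(shortest) == x)    # True
print(Decimal(shortest))       # 0.1111111111111111
# Decimal(x) converts the exact binary value; normalize() rounds it to the
# context precision (28 significant digits by default).
print(Decimal(x).normalize())  # 0.1111111111111111049432054187
```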
PDG Significant Figures
The PDG rounding is selected by setting round_mode="pdg".
In this case, the ndigits option is ignored.
Typically value/uncertainty pairs are formatted with one or two
significant figures displayed for the uncertainty.
The Particle Data Group has
published an algorithm
for deciding when to
display uncertainty with one versus two significant figures.
sciform can apply the PDG rounding algorithm either to value/uncertainty
pairs or to values alone.
The algorithm is as follows:
Determine the three most significant digits of the number (without rounding). E.g. if the number is 0.004857 then these digits would be 485. Call 485 the scaled number.
If the scaled number is between 100 and 354 (inclusive) then the number is rounded and displayed to one digit below its most significant digit. This means it will have two significant digits. E.g. if the uncertainty is 3.03 then it will appear as 3.0.
If the scaled number is between 355 and 949 (inclusive) then the number is rounded and displayed to the same digit as the most significant digit. This means it will have one significant digit. E.g. if the uncertainty is 0.76932 then it will appear as 0.8
If the scaled number is between 950 and 999 (inclusive) then the number is rounded and displayed to the same digit as the most significant digit. However, rounding a scaled number of 950 or above at the hundreds place always yields 1000, so the result has two significant digits. E.g. if the number is 0.0099 then it will be displayed as 0.010.
The rationale for this algorithm is that it uses more significant digits in the bottom section of an order-of-magnitude range where the fractional error due to rounding is larger, but it saves ink in the upper section of an order-of-magnitude range where the fractional error is not as large. For a more thorough discussion see Significant digit 354 rule from Particle Data Group.
>>> formatter = Formatter(
... round_mode="pdg",
... )
>>> print(formatter(0.0123))
0.012
>>> print(formatter(0.0483))
0.05
>>> print(formatter(0.0997))
0.10
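The steps of the PDG algorithm above can be sketched for a single nonzero, finite number. For a value/uncertainty pair, the same rule would be applied to the uncertainty. This is a hypothetical helper that uses round-half-to-even to match the default rounding described earlier. It is not sciform's code.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def pdg_round(num: Decimal) -> Decimal:
    """Hypothetical sketch of the PDG rule for a nonzero, finite Decimal."""
    msd_place = num.adjusted()
    # Step 1: the three most significant digits, truncated (int() drops the rest).
    scaled = int(abs(num).scaleb(2 - msd_place))  # 0.004857 -> 485
    # Steps 2-4: 100-354 keeps two significant figures; 355-999 rounds at the
    # leading digit (950-999 then carry up, e.g. 0.0099 -> 0.010).
    sig_figs = 2 if scaled <= 354 else 1
    round_place = msd_place - sig_figs + 1
    return num.quantize(Decimal(1).scaleb(round_place), rounding=ROUND_HALF_EVEN)


for text in ("0.0123", "0.0483", "0.0997", "3.03", "0.76932", "0.0099"):
    print(text, "->", f"{pdg_round(Decimal(text)):f}")
# 0.0123 -> 0.012
# 0.0483 -> 0.05
# 0.0997 -> 0.10
# 3.03 -> 3.0
# 0.76932 -> 0.8
# 0.0099 -> 0.010
```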
Separators
sciform supports some customization of the separator
characters within formatted strings.
Different locales use different conventions for the symbol separating
the integral and fractional part of a number, called the decimal symbol.
sciform supports using a period '.' or comma ',' as the
decimal symbol.
Additionally, sciform also supports including separation
characters between groups of three digits both above the decimal symbol
and below the decimal symbol.
'', ',', '.', ' ', '_' can all be used as
“upper” separator characters and '', ' ', and '_' can
all be used as “lower” separator characters.
Note that the upper separator character must be different than the
decimal separator.
>>> formatter = Formatter(upper_separator=",")
>>> print(formatter(12345678.987))
12,345,678.987
>>> formatter = Formatter(
... upper_separator=" ",
... decimal_separator=",",
... lower_separator="_",
... )
>>> print(formatter(1234567.7654321))
1 234 567,765_432_1
NIST discourages the use of ',' or '.' as thousands separators
because they can be confused with the decimal separators depending on
the locality. See
NIST Guide to the SI 10.5.3.
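The grouping rule above (groups of three counted outward from the decimal symbol) can be sketched on a plain fixed-point string. The helper and its parameter names are hypothetical, and it assumes a non-negative input that uses '.' as its decimal point.

```python
def group_digits(text: str, upper: str = "", decimal: str = ".", lower: str = "") -> str:
    """Hypothetical helper: regroup a plain fixed-point string such as "1234567.7654321"
    (non-negative, '.' as decimal point) into groups of three digits."""
    integer, _, fraction = text.partition(".")
    # Integer digits are grouped from the decimal point leftward: 1234567 -> 1 234 567.
    head = len(integer) % 3 or 3
    int_groups = [integer[:head]] + [integer[i:i + 3] for i in range(head, len(integer), 3)]
    # Fractional digits are grouped from the decimal point rightward: 7654321 -> 765 432 1.
    frac_groups = [fraction[i:i + 3] for i in range(0, len(fraction), 3)]
    result = upper.join(int_groups)
    if fraction:
        result += decimal + lower.join(frac_groups)
    return result


print(group_digits("12345678.987", upper=","))                             # 12,345,678.987
print(group_digits("1234567.7654321", upper=" ", decimal=",", lower="_"))  # 1 234 567,765_432_1
```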
Sign Mode
sciform provides control over the symbol used to indicate whether
a number is positive or negative.
In all cases a '-' sign is used for negative numbers.
By default, positive numbers are formatted with no sign symbol.
However, sciform includes a mode where positive numbers are
always presented with a '+' symbol.
sciform also provides a mode where positive numbers include an
extra whitespace in place of a sign symbol.
This mode may be useful to match string lengths when positive and
negative numbers are being presented together, but without explicitly
including a '+' symbol.
>>> formatter = Formatter(sign_mode="-")
>>> print(formatter(42))
42
>>> formatter = Formatter(sign_mode="+")
>>> print(formatter(42))
+42
>>> formatter = Formatter(sign_mode=" ")
>>> print(formatter(42))
 42
Note that both float nan and float 0 have sign
bits which may be positive or negative.
sciform always ignores these sign bits and never puts a + or
- symbol in front of either nan or 0.
In "+" or " " sign modes nan and 0 are always preceded
by a space.
The sign symbol for ±inf is resolved the same as for
finite numbers.
>>> formatter = Formatter(sign_mode="-")
>>> print(formatter(float("-0")))
0
>>> print(formatter(float("-nan")))
nan
>>> print(formatter(float("+inf")))
inf
>>> formatter = Formatter(sign_mode="+")
>>> print(formatter(float("+0")))
 0
>>> print(formatter(float("+nan")))
 nan
>>> print(formatter(float("+inf")))
+inf
>>> formatter = Formatter(sign_mode=" ")
>>> print(formatter(float("-0")))
 0
>>> print(formatter(float("-nan")))
 nan
>>> print(formatter(float("-inf")))
-inf
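The sign rules above can be summarized in one small function that returns the text placed before the digits. This is a hypothetical sketch of the documented behavior, not sciform's implementation.

```python
import math


def sign_prefix(num: float, sign_mode: str) -> str:
    """Hypothetical helper: the sign text placed before a formatted number."""
    if math.isnan(num) or num == 0:
        # Sign bits of nan and 0 (including -0.0) are ignored.
        return "" if sign_mode == "-" else " "
    if num < 0:
        return "-"  # negatives (including -inf) always get '-'
    return {"-": "", "+": "+", " ": " "}[sign_mode]


for num in (42.0, -0.0, float("nan"), float("inf"), float("-inf")):
    print(num, [sign_prefix(num, mode) for mode in ("-", "+", " ")])
```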
Capitalization
The capitalization of the exponent character can be controlled with the capitalize option:
>>> formatter = Formatter(exp_mode="scientific", capitalize=True)
>>> print(formatter(42))
4.2E+01
The capitalize flag also controls the capitalization of nan and
inf formatting:
>>> print(formatter(float("nan")))
NAN
>>> print(formatter(float("-inf")))
-INF
Left Padding
The rounding options described above can be used to control how
many digits to the right of either the most-significant digit or the
decimal point are displayed.
It is also possible, using left padding options, to add digits to the
left of the most-significant digit.
The left_pad_char option can be used to select either whitespaces
' ' or zeros '0' as pad characters.
The left_pad_dec_place option is used to indicate to which decimal
place pad characters should be added.
E.g. left_pad_dec_place=4 indicates pad characters should be
added up to the 10**4 (ten-thousands) decimal place.
>>> formatter = Formatter(left_pad_char="0", left_pad_dec_place=4)
>>> print(formatter(42))
00042
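Left padding to the 10**left_pad_dec_place place means that the integer part gets left_pad_dec_place + 1 characters. The following minimal sketch uses a hypothetical helper and assumes a non-negative fixed-point string.

```python
def left_pad(text: str, pad_char: str, left_pad_dec_place: int) -> str:
    """Hypothetical helper: pad the integer digits of a non-negative fixed-point
    string out to the 10**left_pad_dec_place place."""
    integer, dot, fraction = text.partition(".")
    width = left_pad_dec_place + 1  # number of places from 10**0 up to 10**left_pad_dec_place
    return integer.rjust(width, pad_char) + dot + fraction


print(left_pad("42", "0", 4))          # 00042
print(repr(left_pad("1.23", " ", 2)))  # '  1.23'
```

str.rjust never truncates, so numbers that already reach past the requested place are left unchanged.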
Superscript Exponent Format
The superscript option can be chosen to present exponents in
superscript notation as opposed to e.g. e+02 notation.
>>> formatter = Formatter(exp_mode="scientific", superscript=True)
>>> print(formatter(789))
7.89×10²
Include Exponent on nan and inf
Python supports 'nan', 'inf', and
'-inf' numbers which are simply formatted to 'nan', 'inf',
and '-inf' or 'NAN', 'INF', and '-INF', respectively,
depending on capitalize.
However, if nan_inf_exp=True (default False), then, for
scientific, percent, and engineering exponent modes, these will
instead be formatted as, e.g. '(nan)e+00'.
>>> formatter = Formatter(
... exp_mode="scientific",
... nan_inf_exp=False,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
-INF
>>> formatter = Formatter(
... exp_mode="scientific",
... nan_inf_exp=True,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
(-INF)E+00
>>> formatter = Formatter(
... exp_mode="percent",
... nan_inf_exp=False,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
-INF
>>> formatter = Formatter(
... exp_mode="percent",
... nan_inf_exp=True,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
(-INF)%
Value/Uncertainty Formatting Options
For value/uncertainty formatting, the value/uncertainty pair is formatted as follows. First, significant figure rounding is applied to the uncertainty according to the specified precision. Next the value is rounded to the same position as the uncertainty. The exponent is then determined using the exponent mode and the larger of the value or uncertainty. The value and the uncertainty are then formatted into a single string according to the options below.
>>> formatter = Formatter()
>>> print(formatter(123.456, 0.789))
123.456 ± 0.789
Parentheses Uncertainty
The BIPM Guide Section 7.2.2 provides three example value/uncertainty formats:
100,021 47 ± 0,000 35
100,021 47(0,000 35)
100,021 47(35)
In the first example the value and uncertainty are shown as regular
numbers separated by a ±.
In the second example the uncertainty is shown after the value inside
parentheses.
The third example is like the second but all leading zeros and separator
characters have been removed.
In the third example it is clear to see exactly which digits of the
value are uncertain and by how much.
sciform provides the ability to realize all three of these
formatting strategies by using the paren_uncertainty and
paren_uncertainty_trim options.
>>> from sciform import Formatter, GlobalOptionsContext
>>> value = 100.02147
>>> uncertainty = 0.00035
>>>
>>> with GlobalOptionsContext(decimal_separator=",", lower_separator=" "):
...     formatter = Formatter(
...         paren_uncertainty=False,
...     )
...     print(formatter(value, uncertainty))
...
...     formatter = Formatter(
...         paren_uncertainty=True,
...         paren_uncertainty_trim=False,
...     )
...     print(formatter(value, uncertainty))
...
...     formatter = Formatter(
...         paren_uncertainty=True,
...         paren_uncertainty_trim=True,
...     )
...     print(formatter(value, uncertainty))
100,021 47 ± 0,000 35
100,021 47(0,000 35)
100,021 47(35)
paren_uncertainty_trim modifies the uncertainty by eliminating any leading
zeros and all separators unless there are significant digits above and below
the decimal separator, in which case that separator is kept.
>>> value = 100.0215
>>> uncertainty = 1.2345
>>> formatter = Formatter(
... paren_uncertainty=True,
... paren_uncertainty_trim=False,
... decimal_separator=",",
... lower_separator=" ",
... )
>>> print(formatter(value, uncertainty))
100,021 5(1,234 5)
>>> formatter = Formatter(
... paren_uncertainty=True,
... paren_uncertainty_trim=True,
... decimal_separator=",",
... lower_separator=" ",
... )
>>> print(formatter(value, uncertainty))
100,021 5(1,2345)
Note that the BIPM guide does not show any examples where the digits of the uncertainty span either a grouping or decimal separator. This means there is no official guidance on questions such as:
Should 18.4 ± 2.1 be formatted as 18.4(2.1) or 18.4(21)?
Should 18.456 4 ± 0.002 1 be formatted as 18.456 4(2 1) or 18.456 4(21)?
In trimmed parentheses uncertainty mode, sciform always removes all
upper and lower separator characters from the uncertainty, but it removes
the decimal separator only when it is to the left of the most
significant digit of the uncertainty.
By contrast, the siunitx
LaTeX package always removes all separator characters, including the
decimal separator.
The default global options have paren_uncertainty=False and
paren_uncertainty_trim=True.
Here are examples demonstrating paren_uncertainty behavior when exponent
strings are present.
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="standard",
... paren_uncertainty=True,
... )
>>> print(formatter(523.4e-3, 1.2e-3))
523.4(1.2)e-03
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="prefix",
... paren_uncertainty=True,
... )
>>> print(formatter(523.4e-3, 1.2e-3))
523.4(1.2) m
The latter example is consistent with BIPM examples.
Match Value/Uncertainty Width
If the user passes left_pad_dec_place into a Formatter,
then that decimal place will be used for left padding both the value and
the uncertainty.
sciform provides additional control over the left padding of the
value and the uncertainty by allowing the user to left pad to the
maximum of (1) the specified left_pad_dec_place, (2) the most
significant digit of the value, and (3) the most significant digit of
the uncertainty.
This feature is accessed with the left_pad_matching option.
>>> formatter = Formatter(
... left_pad_char="0",
... left_pad_dec_place=2,
... left_pad_matching=False,
... )
>>> print(formatter(12345, 1.23))
12345.00 ± 001.23
>>> formatter = Formatter(
... left_pad_char="0",
... left_pad_dec_place=2,
... left_pad_matching=True,
... )
>>> print(formatter(12345, 1.23))
12345.00 ± 00001.23
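The matching rule pads both strings to the largest of the three candidate places. The following sketch uses a hypothetical helper and assumes non-negative fixed-point strings and zero padding. It reproduces the two outputs above.

```python
def pad_pair(value: str, uncertainty: str, left_pad_dec_place: int, matching: bool) -> str:
    """Hypothetical helper for non-negative fixed-point strings, zero padding."""

    def msd_place(text: str) -> int:
        # Place of the leftmost integer digit, e.g. 4 for "12345.00", 0 for "1.23".
        return len(text.partition(".")[0]) - 1

    place = left_pad_dec_place
    if matching:
        place = max(place, msd_place(value), msd_place(uncertainty))

    def pad(text: str) -> str:
        integer, dot, fraction = text.partition(".")
        return integer.rjust(place + 1, "0") + dot + fraction

    return f"{pad(value)} ± {pad(uncertainty)}"


print(pad_pair("12345.00", "1.23", 2, matching=False))  # 12345.00 ± 001.23
print(pad_pair("12345.00", "1.23", 2, matching=True))   # 12345.00 ± 00001.23
```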
Plus/Minus Symbol Whitespace
Depending on the value of pm_whitespace, the ± symbol will either have
one full space on either side of it or not.
>>> formatter = Formatter(pm_whitespace=True)
>>> print(formatter(2, 1))
2 ± 1
>>> formatter = Formatter(pm_whitespace=False)
>>> print(formatter(2, 1))
2±1