Formatting Options
sciform provides a variety of options for converting numbers into
formatted strings.
These options include control over the rounding strategy, scientific
notation formatting, separation characters and more.
Exponent Mode
sciform supports a variety of exponent modes.
To display numbers across a wide range of magnitudes, scientific
formatting presents numbers in the form:
num = mantissa * base**exp
where exp is an integer and base is 10 or 2.
The different exponent modes control how mantissa, base and exp are
chosen for a given input number num.
Fixed Point
Fixed point notation is standard notation in which a number is displayed directly with no extra exponent.
>>> from sciform import Formatter
>>> formatter = Formatter(exp_mode="fixed_point")
>>> print(formatter(123.456))
123.456
>>> print(formatter(123.456, 0.001))
123.456 ± 0.001
Percent
Percent mode is similar to fixed point mode.
For percent mode, the number is multiplied by 100 and a % symbol is
appended to the end of the formatted string.
>>> formatter = Formatter(exp_mode="percent")
>>> print(formatter(0.12345))
12.345%
>>> print(formatter(0.12345, 0.001))
(12.3 ± 0.1)%
Scientific Notation
Scientific notation is used to display base-10 decimal numbers.
In scientific notation, the exponent is uniquely chosen so that the
mantissa m satisfies 1 <= m < 10.
>>> formatter = Formatter(exp_mode="scientific")
>>> print(formatter(123.456))
1.23456e+02
>>> print(formatter(123.456, 0.001))
(1.23456 ± 0.00001)e+02
By default the exponent is expressed using ASCII characters, e.g. e+02.
The sign symbol is always included and the exponent value is left padded
so that it is at least two digits wide.
These behaviors for ASCII exponents cannot be modified.
However, the Superscript Exponent Format mode can be used to represent
the exponent using Unicode superscript characters.
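The choice of mantissa and exponent for scientific notation can be sketched with the standard library alone. The helper name scientific_parts below is hypothetical and is not part of sciform; it only illustrates the 1 <= m < 10 rule and the two-digit signed ASCII exponent described above.

```python
from decimal import Decimal


def scientific_parts(num):
    """Return (mantissa, exp) with num == mantissa * 10**exp and 1 <= |mantissa| < 10.

    Hypothetical helper for illustration only; not sciform's implementation.
    """
    dec = Decimal(str(num))
    if dec == 0:
        return Decimal(0), 0
    exp = dec.adjusted()          # power of ten of the most significant digit
    mantissa = dec.scaleb(-exp)   # shift the decimal point by -exp places
    return mantissa, exp


mantissa, exp = scientific_parts(123.456)
# Sign always shown, exponent zero-padded to at least two digits.
print(f"{mantissa}e{exp:+03d}")  # 1.23456e+02
```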
Engineering Notation
Engineering notation is similar to scientific notation except the
exponent is chosen to be an integer multiple of 3.
In standard engineering notation, the mantissa m satisfies
1 <= m < 1000.
Engineering notation is compatible with the SI prefixes, which are
defined for any exponent that is an integer multiple of 3, e.g.:
369,000,000 Hz = 369e+06 Hz = 369 MHz
Here it may be more convenient to display the number with an exponent of 6 for rapid identification of the “Mega” order of magnitude, as opposed to an exponent of 8 which would be chosen for standard scientific notation.
>>> formatter = Formatter(exp_mode="engineering")
>>> print(formatter(123.456))
123.456e+00
>>> print(formatter(123.456, 0.001))
(123.456 ± 0.001)e+00
Shifted Engineering Notation
Shifted engineering notation is the same as engineering notation except
the exponent is chosen so that the mantissa m satisfies
0.1 <= m < 100.
>>> formatter = Formatter(exp_mode="engineering_shifted")
>>> print(formatter(123.456))
0.123456e+03
>>> print(formatter(123.456, 0.001))
(0.123456 ± 0.000001)e+03
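The difference between engineering and shifted engineering notation comes down to which multiple of 3 is selected. The following is a minimal sketch using only the standard library; engineering_parts is a hypothetical name and the code is not taken from sciform.

```python
from decimal import Decimal


def engineering_parts(num, shifted=False):
    """Return (mantissa, exp) with exp a multiple of 3.

    shifted=False gives 1 <= |mantissa| < 1000,
    shifted=True gives 0.1 <= |mantissa| < 100.
    Hypothetical helper for illustration only.
    """
    dec = Decimal(str(num))
    if dec == 0:
        return Decimal(0), 0
    sci_exp = dec.adjusted()      # exponent scientific notation would use
    if shifted:
        sci_exp += 1              # move the allowed mantissa window down by 10x
    exp = 3 * (sci_exp // 3)      # floor to the nearest multiple of 3
    return dec.scaleb(-exp), exp


print(engineering_parts(123.456))                # mantissa 123.456, exponent 0
print(engineering_parts(123.456, shifted=True))  # mantissa 0.123456, exponent 3
print(engineering_parts(369_000_000))            # mantissa 369, exponent 6
```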
Binary
Binary formatting can be chosen to display a number in scientific notation in base-2.
>>> formatter = Formatter(exp_mode="binary")
>>> print(formatter(256))
1b+08
Here the b exponent symbol indicates base-2 instead of base-10.
For binary formatting, the mantissa m satisfies 1 <= m < 2.
Binary IEC
Binary IEC mode is similar to engineering notation, except in base-2.
In this mode numbers are expressed in base-2 exponent notation, but the
exponent is constrained to be a multiple of 10, consistent with the
IEC binary prefixes.
The mantissa m satisfies 1 <= m < 1024.
>>> formatter = Formatter(exp_mode="binary_iec")
>>> print(formatter(2048))
2b+10
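The two binary modes can be illustrated with math.frexp from the standard library. This is a sketch under the mantissa ranges stated above; binary_parts is a hypothetical helper and not sciform code.

```python
import math


def binary_parts(num, iec=False):
    """Return (mantissa, exp) with num == mantissa * 2**exp.

    iec=False gives 1 <= |mantissa| < 2,
    iec=True gives 1 <= |mantissa| < 1024 with exp a multiple of 10.
    Hypothetical helper for illustration only.
    """
    if num == 0:
        return 0.0, 0
    _, e = math.frexp(abs(num))   # abs(num) == frac * 2**e with 0.5 <= frac < 1
    exp = e - 1                   # shift so the mantissa lies in [1, 2)
    if iec:
        exp = 10 * (exp // 10)    # floor to the nearest multiple of 10
    return num / 2**exp, exp


print(binary_parts(256))             # (1.0, 8)
print(binary_parts(2048, iec=True))  # (2.0, 10)
```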
Fixed Exponent
The user can coerce the exponent for the formatting to a fixed value.
>>> formatter = Formatter(exp_mode="scientific", exp_val=3)
>>> print(formatter(123.456))
0.123456e+03
To explicitly force sciform to automatically select the exponent, use
the AutoExpVal option by passing exp_val=AutoExpVal.
This is the default value in the global options.
Note that the forced exponent must be consistent with the requested exponent mode.
For fixed point and percent modes an explicit fixed exponent must equal 0.
For engineering and shifted engineering modes an explicit fixed exponent must be an integer multiple of 3.
For binary IEC mode an explicit fixed exponent must be an integer multiple of 10.
Because of this constrained behavior, it is recommended to only use a fixed exponent with the scientific or binary exponent modes.
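The consistency rules above can be written as a small predicate. This sketch only restates the rules from the text; exp_val_allowed is a hypothetical name, and how sciform itself reports an inconsistent exponent is not shown here.

```python
def exp_val_allowed(exp_mode, exp_val):
    """Return True if a fixed integer exponent is consistent with exp_mode.

    Hypothetical helper that mirrors the rules in the text above.
    """
    if exp_mode in ("fixed_point", "percent"):
        return exp_val == 0
    if exp_mode in ("engineering", "engineering_shifted"):
        return exp_val % 3 == 0
    if exp_mode == "binary_iec":
        return exp_val % 10 == 0
    # "scientific" and "binary" accept any integer exponent.
    return True


print(exp_val_allowed("scientific", 3))    # True
print(exp_val_allowed("engineering", 4))   # False
print(exp_val_allowed("binary_iec", 20))   # True
```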
Exponent String Replacement
sciform provides a number of formatting options for replacing decimal
and binary exponent strings such as 'e-03' or 'b+10' with conventional
strings such as 'm' or 'Ki' to succinctly communicate the order of
magnitude.
Decimal exponent strings can be replaced with either SI prefixes or
parts-per identifiers and binary exponent strings can be replaced with
IEC prefixes.
See Supported Exponent Replacements for all default supported
replacements.
Furthermore, it is possible to customize Formatter objects or the global
options settings to map additional translations beyond those provided by
default.
>>> formatter = Formatter(exp_mode="engineering", exp_format="prefix")
>>> print(formatter(4242.13))
4.24213 k
>>> formatter = Formatter(
... exp_mode="binary_iec",
... round_mode="sig_fig",
... ndigits=4,
... exp_format="prefix",
... )
>>> print(formatter(1300))
1.270 Ki
>>> formatter = Formatter(exp_mode="engineering", exp_format="parts_per")
>>> print(formatter(12.3e-6))
12.3 ppm
Extra Exponent Replacements
In addition to the default exponent replacements, the user can modify
the available exponent replacements using a number of options.
The SI prefix, IEC prefix, and parts-per replacements can be modified
using the extra_si_prefixes, extra_iec_prefixes and
extra_parts_per_forms options, respectively, by passing in
dictionaries with keys corresponding to integer exponents and values
corresponding to translated strings.
The entries in these dictionaries overwrite any default translation
mappings.
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... extra_si_prefixes={-2: "c"},
... )
>>> print(formatter(3e-2))
3 c
Passing None as the value for a given exponent forces that exponent to
not be translated.
>>> formatter = Formatter(exp_mode="engineering", exp_format="parts_per")
>>> print(formatter(3e-9))
3 ppb
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="parts_per",
... extra_parts_per_forms={-9: None},
... )
>>> print(formatter(3e-9))
3e-09
Keys into the extra translations dictionaries must be integers and values must consist of only English alphabetic characters.
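The overwrite and None rules can be pictured as a plain dictionary merge. The sketch below uses a deliberately partial table of SI prefixes and a hypothetical translate_exp helper; it is not how sciform stores or looks up its translations.

```python
# Partial default table for illustration only (assumption, not sciform's table).
DEFAULT_SI_PREFIXES = {-9: "n", -3: "m", 3: "k", 6: "M"}


def translate_exp(exp, extra=None):
    """Return the exponent string for a base-10 exponent.

    Entries in `extra` overwrite the defaults; a value of None disables
    translation for that exponent. Hypothetical helper.
    """
    table = {**DEFAULT_SI_PREFIXES, **(extra or {})}
    prefix = table.get(exp)
    if prefix is None:
        return f"e{exp:+03d}"     # fall back to the ASCII exponent string
    return f" {prefix}"


print("3" + translate_exp(-2))               # 3e-02
print("3" + translate_exp(-2, {-2: "c"}))    # 3 c
print("3" + translate_exp(-9, {-9: None}))   # 3e-09
```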
Two helper options exist to add additional SI prefix translations corresponding to:
{-2: 'c', -1: 'd', +1: 'da', +2: 'h'}
These SI prefixes are excluded by default because they do not correspond
to the integer-multiple-of-3 prefixes which are compatible with
engineering notation.
However, they can easily be included using the add_c_prefix and
add_small_si_prefixes options.
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_c_prefix=True,
... )
>>> print(formatter(0.025))
2.5 c
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_small_si_prefixes=True,
... )
>>> print(formatter(25))
2.5 da
A parts-per-thousand form, ppth, can be accessed with the add_ppth_form
option.
Note that ppth is not a standard notation for “parts-per-thousand”,
but it is one that the author has found useful.
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="parts_per",
... add_ppth_form=True,
... )
>>> print(formatter(12.3e-3))
12.3 ppth
Note that the helper flags will not overwrite exponent/string pairs already specified in the extra translations dictionary:
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_c_prefix=True,
... )
>>> print(formatter(0.012))
1.2 c
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... extra_si_prefixes={-2: "zzz"},
... add_c_prefix=True,
... )
>>> print(formatter(0.012))
1.2 zzz
Note that local and global extra translations are never merged.
If any local extra translation settings are configured, either directly
with e.g. extra_si_prefixes or with a helper like
add_small_si_prefixes, then no global extra translations will be used.
>>> from sciform import GlobalOptionsContext
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... extra_si_prefixes={-4: "zzz"},
... )
>>> with GlobalOptionsContext(add_c_prefix=True):
... print(formatter(0.012))
1.2e-02
>>> formatter = Formatter(
... exp_mode="scientific",
... exp_format="prefix",
... add_c_prefix=True,
... )
>>> with GlobalOptionsContext(extra_si_prefixes={1: "zzz"}):
... print(formatter(12.4))
1.24e+01
If all local extra translation settings are left unset then all global extra translation settings will be populated at format time. This behavior is the same as the behavior for all other options.
Rounding
sciform provides two rounding strategies: rounding based on
significant figures, and rounding based on decimal places.
In both cases, the rounding applies to the mantissa determined after
identifying the appropriate exponent for display based on the selected
exponent mode.
In some cases, the rounding results in a modification to the chosen
exponent (e.g. when presenting 9.99 in scientific exponent mode with
two digits past the decimal point sciform displays "9.99e+00", but with
one digit past the decimal point sciform displays "1.0e+01").
This is taken into account before the final presentation.
If the user does not specify the number of significant digits or the
digits place to which to round, then the decimal numbers are displayed
with full precision.
To explicitly request this behavior, the user may use the AutoDigits
sentinel by passing ndigits=AutoDigits.
This is the default value in the global options.
Note that surprising behavior may be observed when using float inputs.
A float input is handled by first being converted to a string to
realize the minimum number of decimal digits necessary for the float to
round trip, and is then cast to a Decimal instance before determining
the mantissa and exponent and applying the rounding algorithm.
See Note on Decimals and Floats for more details.
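The effect of going through a string before constructing a Decimal can be seen with the standard library alone. This snippet does not use sciform; it only shows why the string route yields the short, round-tripping representation.

```python
from decimal import Decimal

x = 0.1

exact = Decimal(x)           # exact binary value stored in the float
shortest = Decimal(str(x))   # shortest decimal string that round-trips

print(exact)      # a long expansion close to, but not equal to, 0.1
print(shortest)   # 0.1

# The short string still identifies the same float.
assert float(str(x)) == x
```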
Significant Figures
For significant figure rounding, first the digits place for the
most-significant digit is identified, then the number is rounded to
the specified number of significant figures below that digits place.
E.g. for 12345.678 the most-significant digit appears in the
ten-thousands, or 10⁴, place.
To express this number to 4 significant digits means we should round it
to the tens, or 10¹, place resulting in 12350.
Note that 1001 rounded to 1, 2, or 3 significant figures results in 1000. This demonstrates that we can’t determine how many significant figures a number was rounded to (or “how many significant figures a number has”) just by looking at the resulting string.
>>> formatter = Formatter(
... exp_mode="engineering",
... round_mode="sig_fig",
... ndigits=4,
... )
>>> print(formatter(12345.678))
12.35e+03
Here the ndigits input is used to indicate how many significant
figures should be included.
For significant figure rounding, ndigits must be an integer greater
than or equal to 1.
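Significant figure rounding can be sketched with decimal.Decimal by locating the most significant digit and quantizing relative to it. round_sig_figs is a hypothetical helper, and the choice of ROUND_HALF_EVEN is an assumption made for this sketch rather than a statement about sciform's rounding rule.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig_figs(num, ndigits):
    """Round num to ndigits significant figures (ndigits >= 1).

    Hypothetical helper; the rounding rule is an assumption.
    """
    dec = Decimal(str(num))
    if dec == 0:
        return dec
    top_place = dec.adjusted()                 # e.g. 4 for 12345.678
    round_place = top_place - ndigits + 1      # e.g. 1 (the tens place)
    quantum = Decimal(1).scaleb(round_place)   # 10**round_place
    return dec.quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_sig_figs(12345.678, 4) == 12350)   # True
# 1001 rounds to 1000 for 1, 2 and 3 significant figures alike.
print([round_sig_figs(1001, n) == 1000 for n in (1, 2, 3)])   # [True, True, True]
```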
Decimal Place
For decimal place rounding we specify the decimal place to which we want
to round using ndigits.
The convention for ndigits is the same as that for the built-in round
function.
E.g. ndigits=2 means to round to two digits past the decimal place,
the hundredths or 10⁻² place, so that 12.987 would be rounded to 12.99.
>>> formatter = Formatter(exp_mode="engineering", round_mode="dec_place", ndigits=4)
>>> print(formatter(12345.678))
12.3457e+03
ndigits may also be zero or negative, in which case the number is
rounded to the ones place or to a place left of the decimal point:
>>> formatter = Formatter(
... exp_mode="fixed_point",
... round_mode="dec_place",
... ndigits=-2,
... )
>>> print(formatter(12345.678))
12300
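The ndigits convention for decimal place rounding, including negative values, can be sketched with Decimal.quantize. round_dec_place is a hypothetical helper and ROUND_HALF_EVEN is an assumption for this sketch, not a documented property of sciform.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_dec_place(num, ndigits):
    """Round num to the 10**(-ndigits) place, like the built-in round().

    Hypothetical helper; the rounding rule is an assumption.
    """
    dec = Decimal(str(num))
    quantum = Decimal(1).scaleb(-ndigits)   # 10**(-ndigits)
    return dec.quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_dec_place(12.987, 2))                 # 12.99
print(round_dec_place(12345.678, -2) == 12300)    # True
```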
Automatic Rounding
If the user does not specify ndigits or the user uses AutoDigits by
passing ndigits=AutoDigits, then sciform will automatically determine
how rounding should be performed.
For single value formatting the auto rounding mode will display the
input number with full precision.
For str, int and Decimal inputs this is unambiguous.
For float inputs the float is first converted to a string and then
converted to a decimal.
This means that the float will be rounded to the minimum necessary
precision for it to “round-trip”.
See Note on Decimals and Floats for more details.
For value/uncertainty formatting, if ndigits=AutoDigits and
pdg_sig_figs=False, then the rounding strategy described in the
previous paragraph is used to round the uncertainty and the value is
rounded to the same decimal place as the uncertainty.
See Particle Data Group Significant Figures for more details.
Separators
sciform provides support for some customization of the separator
characters within formatted strings.
Different locales use different conventions for the symbol separating
the integral and fractional part of a number, called the decimal symbol.
sciform supports using a period '.' or comma ',' as the decimal symbol.
Additionally, sciform supports including separation characters between
groups of three digits both above the decimal symbol and below the
decimal symbol.
'', ',', '.', ' ', and '_' can all be used as “upper” separator
characters and '', ' ', and '_' can all be used as “lower” separator
characters.
Note that the upper separator character must be different than the
decimal separator.
>>> formatter = Formatter(upper_separator=",")
>>> print(formatter(12345678.987))
12,345,678.987
>>> formatter = Formatter(
... upper_separator=" ",
... decimal_separator=",",
... lower_separator="_",
... )
>>> print(formatter(1234567.7654321))
1 234 567,765_432_1
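The grouping behavior can be sketched as plain string manipulation: digits above the decimal symbol are grouped in threes counting from the decimal symbol leftward, and digits below it are grouped counting rightward. add_separators is a hypothetical helper that assumes an unsigned digit string with '.' as the input decimal symbol; it is not sciform's implementation.

```python
def add_separators(number_str, upper="", decimal=".", lower=""):
    """Insert separators into an unsigned digit string such as '1234567.7654321'.

    Hypothetical helper for illustration only.
    """
    int_part, _, frac_part = number_str.partition(".")

    # Group integer digits in threes, counting from the right.
    reversed_int = int_part[::-1]
    upper_groups = [
        reversed_int[i:i + 3][::-1] for i in range(0, len(reversed_int), 3)
    ][::-1]

    # Group fractional digits in threes, counting from the left.
    lower_groups = [frac_part[i:i + 3] for i in range(0, len(frac_part), 3)]

    result = upper.join(upper_groups)
    if frac_part:
        result += decimal + lower.join(lower_groups)
    return result


print(add_separators("12345678.987", upper=","))
# 12,345,678.987
print(add_separators("1234567.7654321", upper=" ", decimal=",", lower="_"))
# 1 234 567,765_432_1
```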
NIST discourages the use of ',' or '.' as thousands separators because
they can be confused with the decimal separator depending on the
locale.
See NIST Guide to the SI 10.5.3.
Sign Mode
sciform provides control over the symbol used to indicate whether a
number is positive or negative.
In all cases a '-' sign is used for negative numbers.
By default, positive numbers are formatted with no sign symbol.
However, sciform includes a mode where positive numbers are always
presented with a '+' symbol.
sciform also provides a mode where positive numbers include an extra
whitespace in place of a sign symbol.
This mode may be useful to match string lengths when positive and
negative numbers are being presented together, but without explicitly
including a '+' symbol.
>>> formatter = Formatter(sign_mode="-")
>>> print(formatter(42))
42
>>> formatter = Formatter(sign_mode="+")
>>> print(formatter(42))
+42
>>> formatter = Formatter(sign_mode=" ")
>>> print(formatter(42))
 42
Note that both float nan and float 0 have sign bits which may be
positive or negative.
sciform always ignores these sign bits and never puts a + or - symbol in
front of either nan or 0.
In "+" or " " sign modes nan and 0 are always preceded by a space.
The sign symbol for ±inf is resolved the same as for finite numbers.
>>> formatter = Formatter(sign_mode="-")
>>> print(formatter(float("-0")))
0
>>> print(formatter(float("-nan")))
nan
>>> print(formatter(float("+inf")))
inf
>>> formatter = Formatter(sign_mode="+")
>>> print(formatter(float("+0")))
 0
>>> print(formatter(float("+nan")))
 nan
>>> print(formatter(float("+inf")))
+inf
>>> formatter = Formatter(sign_mode=" ")
>>> print(formatter(float("-0")))
 0
>>> print(formatter(float("-nan")))
 nan
>>> print(formatter(float("-inf")))
-inf
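The sign rules described above, including the special handling of nan and 0, can be summarized in a short function. sign_symbol is a hypothetical helper written from the description in this section; it is not taken from sciform.

```python
import math


def sign_symbol(num, sign_mode):
    """Return the sign string for num under sign_mode '-', '+' or ' '.

    Hypothetical helper that follows the rules stated in the text.
    """
    if math.isnan(num) or num == 0:
        # Sign bits of nan and 0 are ignored; only a placeholder space
        # is used in the "+" and " " modes.
        return "" if sign_mode == "-" else " "
    if num < 0:
        return "-"
    # Positive numbers, including +inf.
    return {"-": "", "+": "+", " ": " "}[sign_mode]


print(repr(sign_symbol(42, "+")))              # '+'
print(repr(sign_symbol(float("-0"), " ")))     # ' '
print(repr(sign_symbol(float("-inf"), " ")))   # '-'
```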
Capitalization
The capitalization of the exponent character can be controlled with the capitalize option:
>>> formatter = Formatter(exp_mode="scientific", capitalize=True)
>>> print(formatter(42))
4.2E+01
>>> formatter = Formatter(exp_mode="binary", capitalize=True)
>>> print(formatter(1024))
1B+10
The capitalize flag also controls the capitalization of nan and inf
formatting:
>>> print(formatter(float("nan")))
NAN
>>> print(formatter(float("-inf")))
-INF
Left Padding
The Rounding options described above can be used to control how
many digits to the right of either the most-significant digit or the
decimal point are displayed.
It is also possible, using left padding options, to add digits to the
left of the most-significant digit.
The left_pad_char option can be used to select either whitespaces ' '
or zeros '0' as pad characters.
The left_pad_dec_place option is used to indicate to which decimal
place pad characters should be added.
E.g. left_pad_dec_place=4 indicates pad characters should be added up
to the 10⁴ (ten-thousands) decimal place.
>>> formatter = Formatter(left_pad_char="0", left_pad_dec_place=4)
>>> print(formatter(42))
00042
Superscript Exponent Format
The superscript option can be chosen to present exponents in
superscript notation as opposed to e.g. e+02 notation.
>>> formatter = Formatter(exp_mode="scientific", superscript=True)
>>> print(formatter(789))
7.89×10²
Include Exponent on nan and inf
Python supports 'nan', 'inf', and '-inf' numbers which are simply
formatted to 'nan', 'inf', and '-inf' or 'NAN', 'INF', and '-INF',
respectively, depending on capitalize.
However, if nan_inf_exp=True (default False), then, for scientific,
percent, engineering, and binary exponent modes, these will instead be
formatted as, e.g. '(nan)e+00'.
>>> formatter = Formatter(
... exp_mode="scientific",
... nan_inf_exp=False,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
-INF
>>> formatter = Formatter(
... exp_mode="scientific",
... nan_inf_exp=True,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
(-INF)E+00
>>> formatter = Formatter(
... exp_mode="percent",
... nan_inf_exp=False,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
-INF
>>> formatter = Formatter(
... exp_mode="percent",
... nan_inf_exp=True,
... capitalize=True,
... )
>>> print(formatter(float("-inf")))
(-INF)%
Value/Uncertainty Formatting Options
For value/uncertainty formatting, the value/uncertainty pair is formatted as follows. First, significant figure rounding is applied to the uncertainty according to the specified precision. Next, the value is rounded to the same decimal place as the uncertainty. The exponent is then determined using the exponent mode and the larger of the value or uncertainty. The value and the uncertainty are then formatted into a single string according to the options below.
>>> formatter = Formatter()
>>> print(formatter(123.456, 0.789))
123.456 ± 0.789
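The first two steps of this procedure, rounding the uncertainty to a number of significant figures and then rounding the value to the same decimal place, can be sketched with decimal.Decimal. round_val_unc is a hypothetical helper; it uses the default decimal rounding rule and ignores the edge case where rounding carries the uncertainty into a new leading digit.

```python
from decimal import Decimal


def round_val_unc(value, uncertainty, ndigits):
    """Round the uncertainty to ndigits significant figures and the value
    to the same decimal place.

    Hypothetical helper; assumes a positive uncertainty.
    """
    val = Decimal(str(value))
    unc = Decimal(str(uncertainty))
    round_place = unc.adjusted() - ndigits + 1   # place of the last kept digit
    quantum = Decimal(1).scaleb(round_place)
    return val.quantize(quantum), unc.quantize(quantum)


val, unc = round_val_unc(123.456, 0.789, 2)
print(f"{val} ± {unc}")  # 123.46 ± 0.79
```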
Particle Data Group Significant Figures
Typically value/uncertainty pairs are formatted with one or two significant figures displayed for the uncertainty. The Particle Data Group has published an algorithm for deciding when to display the uncertainty with one versus two significant figures. The algorithm is as follows.
Determine the three most significant digits of the uncertainty (without rounding) and read them as an integer between 100 and 999, called the scaled uncertainty below. E.g. if the uncertainty is 0.004857 then the scaled uncertainty is 485.
If the scaled uncertainty is between 100 and 354 (inclusive) then the uncertainty is rounded and displayed to one digit below its most significant digit. This means it will have two significant digits. E.g. if the uncertainty is 3.03 then it will appear as 3.0.
If the scaled uncertainty is between 355 and 949 (inclusive) then the uncertainty is rounded and displayed to the same digit as the most significant digit. This means it will have one significant digit. E.g. if the uncertainty is 0.76932 then it will appear as 0.8.
If the scaled uncertainty is between 950 and 999 (inclusive) then the uncertainty is again rounded to the place of its most significant digit. Because a scaled uncertainty of 950 or above always rounds up to 1000 at the hundreds place, the result gains a new leading digit and is displayed with two significant digits. E.g. if the uncertainty is 0.0099 then it will be displayed as 0.010.
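The steps above can be sketched with decimal.Decimal. pdg_round is a hypothetical helper written from this description; it assumes a positive, finite uncertainty and uses the default decimal rounding rule, and it is not sciform's implementation.

```python
from decimal import Decimal


def pdg_round(uncertainty):
    """Round a positive uncertainty following the PDG rule described above.

    Hypothetical helper for illustration only.
    """
    unc = Decimal(str(uncertainty))
    top_place = unc.adjusted()                  # place of the leading digit
    # Three most significant digits, truncated (not rounded), as an integer.
    scaled = int(unc.scaleb(2 - top_place))
    if scaled <= 354:
        round_place = top_place - 1             # two significant digits
    else:
        # 355..949: one significant digit.
        # 950..999: rounds up into a new leading digit, so the same
        # rounding place ends up showing two significant digits.
        round_place = top_place
    return unc.quantize(Decimal(1).scaleb(round_place))


for u in (0.0123, 0.0483, 0.0997, 0.0099):
    print(u, "->", pdg_round(u))
# 0.0123 -> 0.012
# 0.0483 -> 0.05
# 0.0997 -> 0.10
# 0.0099 -> 0.010
```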
sciform provides the ability to use this algorithm when formatting
value/uncertainty pairs by using significant figure rounding mode and
the pdg_sig_figs flag.
>>> from sciform import AutoDigits
>>> formatter = Formatter(
... round_mode="sig_fig",
... pdg_sig_figs=True,
... )
>>> print(formatter(1, 0.0123))
1.000 ± 0.012
>>> print(formatter(1, 0.0483))
1.00 ± 0.05
>>> print(formatter(1, 0.0997))
1.00 ± 0.10
If pdg_sig_figs=True then ndigits is ignored for value/uncertainty
formatting.
pdg_sig_figs is always ignored in favor of ndigits for single value
formatting.
Parentheses Uncertainty
The BIPM Guide Section 7.2.2 provides three example value/uncertainty formats:
100,021 47 ± 0,000 35
100,021 47(0,000 35)
100,021 47(35)
In the first example the value and uncertainty are shown as regular
numbers separated by a ±.
In the second example the uncertainty is shown after the value inside
parentheses.
The third example is like the second but all leading zeros and separator
characters have been removed.
In the third example it is easy to see exactly which digits of the
value are uncertain and by how much.
sciform provides the ability to realize all three of these formatting
strategies by using the paren_uncertainty and paren_uncertainty_trim
options.
>>> from sciform import Formatter, GlobalOptionsContext
>>> value = 100.02147
>>> uncertainty = 0.00035
>>>
>>> with GlobalOptionsContext(decimal_separator=",", lower_separator=" "):
... formatter = Formatter(
... paren_uncertainty=False,
... )
... print(formatter(value, uncertainty))
...
... formatter = Formatter(
... paren_uncertainty=True,
... paren_uncertainty_trim=False,
... )
... print(formatter(value, uncertainty))
...
... formatter = Formatter(
... paren_uncertainty=True,
... paren_uncertainty_trim=True,
... )
... print(formatter(value, uncertainty))
100,021 47 ± 0,000 35
100,021 47(0,000 35)
100,021 47(35)
paren_uncertainty_trim eliminates all separators which are not the
decimal separator.
>>> value = 100.0215
>>> uncertainty = 1.2345
>>> formatter = Formatter(
... paren_uncertainty=True,
... paren_uncertainty_trim=False,
... decimal_separator=",",
... lower_separator=" ",
... )
>>> print(formatter(value, uncertainty))
100,021 5(1,234 5)
>>> formatter = Formatter(
... paren_uncertainty=True,
... paren_uncertainty_trim=True,
... decimal_separator=",",
... lower_separator=" ",
... )
>>> print(formatter(value, uncertainty))
100,021 5(1,2345)
Note that the BIPM guide does not show any examples where the digits of the uncertainty span either a grouping or decimal separator. This means there is no official guidance about the following cases:
Should 18.4 ± 2.1 be formatted as 18.4(2.1) or 18.4(21)?
Should 18.456 4 ± 0.002 1 be formatted as 18.456 4(2 1) or 18.456 4(21)?
sciform formats the trimmed parentheses uncertainty mode by never
removing the decimal separator unless it is to the left of the most
significant digit of the uncertainty, but always removing all upper and
lower separator characters.
By contrast, the siunitx LaTeX package always removes all separator
characters, including the decimal.
The default global options have paren_uncertainty=False and
paren_uncertainty_trim=True.
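The trimming rule can be sketched as string manipulation on already-rounded value and uncertainty strings. paren_form is a hypothetical helper that assumes '.' as the decimal separator, no grouping separators in its inputs, and a nonzero uncertainty; it only illustrates the rule stated above.

```python
def paren_form(val_str, unc_str, trim=True):
    """Combine rounded value and uncertainty strings as value(uncertainty).

    Hypothetical helper for illustration only.
    """
    if trim:
        # Drop leading zeros, and the decimal separator only when it sits
        # to the left of the uncertainty's most significant digit.
        unc_str = unc_str.lstrip("0.")
    return f"{val_str}({unc_str})"


print(paren_form("100.02147", "0.00035", trim=False))  # 100.02147(0.00035)
print(paren_form("100.02147", "0.00035"))              # 100.02147(35)
print(paren_form("100.0215", "1.2345"))                # 100.0215(1.2345)
```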
We can look at the interplay between parentheses uncertainty mode and exponent options.
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="standard",
... paren_uncertainty=True,
... )
>>> print(formatter(523.4e-3, 1.2e-3))
(523.4(1.2))e-03
>>> formatter = Formatter(
... exp_mode="engineering",
... exp_format="prefix",
... paren_uncertainty=True,
... )
>>> print(formatter(523.4e-3, 1.2e-3))
523.4(1.2) m
We see that in the ASCII formatting mode parentheses are included around the value/uncertainty, but they are excluded in the SI prefix mode. This is consistent with the BIPM examples.
Match Value/Uncertainty Width
If the user passes left_pad_dec_place into a Formatter, then that
decimal place will be used for left padding both the value and the
uncertainty.
sciform provides additional control over the left padding of the value
and the uncertainty by allowing the user to left pad to the maximum of
(1) the specified left_pad_dec_place, (2) the most significant digit of
the value, and (3) the most significant digit of the uncertainty.
This feature is accessed with the left_pad_matching option.
>>> formatter = Formatter(
... left_pad_char="0",
... left_pad_dec_place=2,
... left_pad_matching=False,
... )
>>> print(formatter(12345, 1.23))
12345.00 ± 001.23
>>> formatter = Formatter(
... left_pad_char="0",
... left_pad_dec_place=2,
... left_pad_matching=True,
... )
>>> print(formatter(12345, 1.23))
12345.00 ± 00001.23
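The padding logic can be sketched with str.rjust on the integer part of each number. left_pad and pad_pair are hypothetical helpers that assume unsigned strings with '.' as the decimal separator; they are not sciform functions.

```python
def left_pad(number_str, pad_char, dec_place):
    """Pad the integer part so it reaches the 10**dec_place digit place."""
    int_part, dot, frac_part = number_str.partition(".")
    return int_part.rjust(dec_place + 1, pad_char) + dot + frac_part


def pad_pair(val_str, unc_str, pad_char, dec_place, matching):
    """Left pad a value/uncertainty pair, optionally to a matching width.

    Hypothetical helper for illustration only.
    """
    if matching:
        # Highest digit place occupied by either number.
        top_place = max(len(s.partition(".")[0]) for s in (val_str, unc_str)) - 1
        dec_place = max(dec_place, top_place)
    return (
        left_pad(val_str, pad_char, dec_place),
        left_pad(unc_str, pad_char, dec_place),
    )


print(" ± ".join(pad_pair("12345.00", "1.23", "0", 2, matching=False)))
# 12345.00 ± 001.23
print(" ± ".join(pad_pair("12345.00", "1.23", "0", 2, matching=True)))
# 12345.00 ± 00001.23
```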