Releases: p12tic/libsimdpp

2.1 (C++11 version)

14 Dec 16:04

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX512F,
    AVX512BW, AVX512DQ, AVX512VL, XOP
  • ARM 32-bit: NEON, NEONv2
  • ARM 64-bit: NEON, NEONv2
  • PowerPC 32-bit big-endian: Altivec, VSX v2.06, VSX v2.07
  • PowerPC 64-bit little-endian: Altivec, VSX v2.06, VSX v2.07
  • MIPS 32-bit little-endian: MSA
  • MIPS 64-bit little-endian: MSA

Supported compilers:

  • C++11 version:

    • GCC: 4.8-7.x
    • Clang: 3.3-4.0
    • Xcode 7.0-9.x
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017
  • C++98 version

    • GCC: 4.4-7.x
    • Clang: 3.3-4.0
    • Xcode 7.0-9.x
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since v2.0:

  • Various bug fixes
  • Documentation has been significantly improved. The public API is now almost
    fully documented.
  • Added support for MIPS MSA instruction set.
  • Added support for PowerPC VSX v2.06 and v2.07 instruction sets.
  • Added support for x86 AVX512BW, AVX512DQ and AVX512VL instruction sets.
  • Added support for 64-bit little-endian PowerPC.
  • Added support for arbitrary width vectors in extract() and insert().
  • Added support for arbitrary source vectors to to_int8(), to_uint8(),
    to_int16(), to_uint16(), to_int32(), to_uint32(), to_int64(), to_uint64(),
    to_float32(), to_float64().
  • Added support for per-element integer shifts to shift_r() and shift_l().
    Fallback paths are provided for SSE2-AVX instruction sets that lack
    hardware per-element integer shift support.
  • Made shuffle_bytes16(), shuffle_zbytes16(), permute_bytes16() and
    permute_zbytes() more generic.
  • New functions: popcnt, reduce_popcnt, for_each, to_mask().
  • Xcode is now supported.
  • The library has been refactored in such a way that older compilers are able
    to optimize vector emulation code paths much better than before.
  • Deprecation: implicit conversion operators to native vector types have been
    deprecated and a replacement method has been provided instead. The implicit
    conversion operators may lead to wrong code being accepted without a
    compile error on Clang.
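
As a rough illustration of what a per-element shift means, the sketch below
models it with plain arrays: every element is shifted by the count stored at
the same position of a second vector. The name shift_l_per_element and the use
of std::array are assumptions made for this example only; this is not the
libsimdpp API, and the choice to return zero for counts of 32 or more is also
an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model, not part of libsimdpp: shifts each element of
// `v` left by the count stored at the same index of `counts`.
template <std::size_t N>
std::array<std::uint32_t, N> shift_l_per_element(
        const std::array<std::uint32_t, N>& v,
        const std::array<std::uint32_t, N>& counts)
{
    std::array<std::uint32_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        // Shifting a 32-bit value by 32 or more is undefined in C++, so this
        // sketch defines the result as zero (an assumption of the example).
        r[i] = counts[i] < 32 ? v[i] << counts[i] : 0u;
    }
    return r;
}
```

A fallback path on an instruction set without hardware per-element shifts has
to produce the same element-wise result by other means, for example by handling
the elements one at a time as the loop above does.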

2.1 (C++98 version)

14 Dec 16:04

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX512F,
    AVX512BW, AVX512DQ, AVX512VL, XOP
  • ARM 32-bit: NEON, NEONv2
  • ARM 64-bit: NEON, NEONv2
  • PowerPC 32-bit big-endian: Altivec, VSX v2.06, VSX v2.07
  • PowerPC 64-bit little-endian: Altivec, VSX v2.06, VSX v2.07
  • MIPS 32-bit little-endian: MSA
  • MIPS 64-bit little-endian: MSA

Supported compilers:

  • C++11 version:

    • GCC: 4.8-7.x
    • Clang: 3.3-4.0
    • Xcode 7.0-9.x
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017
  • C++98 version

    • GCC: 4.4-7.x
    • Clang: 3.3-4.0
    • Xcode 7.0-9.x
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since v2.0:

  • Various bug fixes
  • Documentation has been significantly improved. The public API is now almost
    fully documented.
  • Added support for MIPS MSA instruction set.
  • Added support for PowerPC VSX v2.06 and v2.07 instruction sets.
  • Added support for x86 AVX512BW, AVX512DQ and AVX512VL instruction sets.
  • Added support for 64-bit little-endian PowerPC.
  • Added support for arbitrary width vectors in extract() and insert().
  • Added support for arbitrary source vectors to to_int8(), to_uint8(),
    to_int16(), to_uint16(), to_int32(), to_uint32(), to_int64(), to_uint64(),
    to_float32(), to_float64().
  • Added support for per-element integer shifts to shift_r() and shift_l().
    Fallback paths are provided for SSE2-AVX instruction sets that lack
    hardware per-element integer shift support.
  • Made shuffle_bytes16(), shuffle_zbytes16(), permute_bytes16() and
    permute_zbytes() more generic.
  • New functions: popcnt, reduce_popcnt, for_each, to_mask().
  • Xcode is now supported.
  • The library has been refactored in such a way that older compilers are able
    to optimize vector emulation code paths much better than before.
  • Deprecation: implicit conversion operators to native vector types have been
    deprecated and a replacement method has been provided instead. The implicit
    conversion operators may lead to wrong code being accepted without a
    compile error on Clang.
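
The difference between a per-element population count and a reduced one can be
sketched with a scalar model. The functions popcnt_model and
reduce_popcnt_model below are hypothetical stand-ins written for this example;
they assume that popcnt counts set bits in each element and that reduce_popcnt
sums those counts over the whole vector, and they are not the library's
implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Counts the set bits of a single 32-bit element.
inline std::uint32_t popcnt_element(std::uint32_t x)
{
    std::uint32_t n = 0;
    while (x != 0) {
        x &= x - 1; // clears the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical model of a per-element popcnt: one count per element.
template <std::size_t N>
std::array<std::uint32_t, N> popcnt_model(const std::array<std::uint32_t, N>& v)
{
    std::array<std::uint32_t, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        r[i] = popcnt_element(v[i]);
    }
    return r;
}

// Hypothetical model of a reduced popcnt: one total for the whole vector.
template <std::size_t N>
std::uint32_t reduce_popcnt_model(const std::array<std::uint32_t, N>& v)
{
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += popcnt_element(v[i]);
    }
    return total;
}
```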

2.0 (C++11 version)

20 Aug 12:49

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX-512F,
    XOP
  • ARM, ARM64: NEON
  • PowerPC: Altivec

Supported compilers:

  • C++11 version:

    • GCC: 4.8-6.x
    • Clang: 3.3-4.0
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017
  • C++98 version

    • GCC: 4.4-6.x
    • Clang: 3.3-4.0
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017

Clang 3.3 is not supported on ARM. MSVC and ICC are only supported on x86 and
x86-64.

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since 2.0-rc2:

  • Intel compiler is now supported on Windows. Newer versions of other compilers
    are now supported.
  • Various bug fixes.

Changes since 1.0:

  • Expression template-based backend. It is used only for functions that may
    benefit from micro-optimizations (e.g. when several instructions can be merged
    into one).
  • Support for vectors much longer than the native vector type. The only
    limitation is that the length must be a power of 2. The widest available
    instructions are used for the particular vector type.
  • Visual Studio and Intel Compiler support
  • AVX-512F, Altivec and NEONv2 support
  • Vector initialization is simplified, for example: int32<8> v = make_uint(2); or
    int* p = ...; v = load(p);.
  • The curiously recurring template pattern (CRTP) is used to categorize
    vector types. Function templates no longer need to be written for each
    vector type or combination of types; instead, an appropriate vector
    category may be used.
  • Each vector type can be explicitly constructed from any other vector
    with the same size.
  • Most functions accept a much wider range of vector type combinations. For
    example, bitwise functions accept any two vectors of the same size.
  • If different vector types are used as arguments to such functions, the
    return type is computed as if one or both of the arguments were "promoted"
    according to certain rules. For example, int32 + int32 --> int32, whereas
    uint32 + int32 --> uint32, and uint32 + float32 --> float32. See
    simdpp/types/tag.h for more information.
  • API break: int128 and int256 types have been removed. On some architectures
    such as AVX512 it's more efficient to have different physical representations
    for vectors with different element widths. E.g. 8-bit integer elements would
    use 256-bit vectors and 32-bit integer elements would use 512-bit vectors.
  • API break: basic_int## types have been removed. The CRTP-based type
    categorization and promotion rules make a second, inheritance-based vector
    categorization system impossible. In the majority of cases basic_int## can
    be straightforwardly replaced with uint##.
  • API break: {vector type}::make_const, {vector type}::zero and
    {vector type}::ones have been removed to simplify the library. Use the new
    make_int, make_uint, make_float, make_zero and make_ones free
    functions that produce a construct expression.
  • API break: the broadcast family of functions has been renamed to splat.
  • API break: the permute family of functions has been renamed to permute2 and
    permute4 depending on the number of template arguments taken.
  • API break: value conversion functions such as to_float32x4 have been renamed
    and now return a vector with the same number of elements as the source
    vector.
  • API break: SIMDPP_USER_ARCH_INFO now accepts any expression, not only a
    function
  • API break: unsigned conversions have been renamed to to_uintXX to reduce
    confusion.
  • API break: saturated add and sub are now called add_sat and sub_sat
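
The CRTP-based categorization mentioned in the list above can be pictured with
a toy model. Everything in the sketch below (byte_vector, toy_int32x4,
toy_float32x4, same_bits) is a hypothetical name invented for this example and
is not taken from libsimdpp. It only shows how one function template can accept
any two vector types of the same byte size without being written once per type
pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical category: "any vector that occupies Bytes bytes".
// Derived passes itself as a template argument (the CRTP part).
template <std::size_t Bytes, class Derived>
struct byte_vector {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Two toy vector types that fall into the same 16-byte category.
struct toy_int32x4 : byte_vector<16, toy_int32x4> {
    std::int32_t d[4];
    const void* bytes() const { return d; }
};

struct toy_float32x4 : byte_vector<16, toy_float32x4> {
    float d[4];
    const void* bytes() const { return d; }
};

// A single function template covers every pair of vectors in the same
// category. Passing vectors of different byte sizes fails to compile because
// Bytes cannot be deduced consistently.
template <std::size_t Bytes, class A, class B>
bool same_bits(const byte_vector<Bytes, A>& a, const byte_vector<Bytes, B>& b)
{
    return std::memcmp(a.self().bytes(), b.self().bytes(), Bytes) == 0;
}
```

The design point is that the category is resolved at compile time, so there is
no virtual dispatch and no runtime cost for the extra generality.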

No further significant API changes are planned.

2.0 (C++98 version)

20 Aug 12:50

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX-512F,
    XOP
  • ARM, ARM64: NEON
  • PowerPC: Altivec

Supported compilers:

  • C++11 version:

    • GCC: 4.8-6.x
    • Clang: 3.3-4.0
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017
  • C++98 version

    • GCC: 4.4-6.x
    • Clang: 3.3-4.0
    • MSVC: 2013, 2015, 2017
    • ICC (on both Linux and Windows): 2013, 2015, 2016, 2017

Clang 3.3 is not supported on ARM. MSVC and ICC are only supported on x86 and
x86-64.

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since 2.0-rc2:

  • Intel compiler is now supported on Windows. Newer versions of other compilers
    are now supported.
  • Various bug fixes.

Changes since 1.0:

  • Expression template-based backend. It is used only for functions that may
    benefit from micro-optimizations (e.g. when several instructions can be merged
    into one).
  • Support for vectors much longer than the native vector type. The only
    limitation is that the length must be a power of 2. The widest available
    instructions are used for the particular vector type.
  • Visual Studio and Intel Compiler support
  • AVX-512F, Altivec and NEONv2 support
  • Vector initialization is simplified, for example: int32<8> v = make_uint(2); or
    int* p = ...; v = load(p);.
  • The curiously recurring template pattern (CRTP) is used to categorize
    vector types. Function templates no longer need to be written for each
    vector type or combination of types; instead, an appropriate vector
    category may be used.
  • Each vector type can be explicitly constructed from any other vector
    with the same size.
  • Most functions accept a much wider range of vector type combinations. For
    example, bitwise functions accept any two vectors of the same size.
  • If different vector types are used as arguments to such functions, the
    return type is computed as if one or both of the arguments were "promoted"
    according to certain rules. For example, int32 + int32 --> int32, whereas
    uint32 + int32 --> uint32, and uint32 + float32 --> float32. See
    simdpp/types/tag.h for more information.
  • API break: int128 and int256 types have been removed. On some architectures
    such as AVX512 it's more efficient to have different physical representations
    for vectors with different element widths. E.g. 8-bit integer elements would
    use 256-bit vectors and 32-bit integer elements would use 512-bit vectors.
  • API break: basic_int## types have been removed. The CRTP-based type
    categorization and promotion rules make a second, inheritance-based vector
    categorization system impossible. In the majority of cases basic_int## can
    be straightforwardly replaced with uint##.
  • API break: {vector type}::make_const, {vector type}::zero and
    {vector type}::ones have been removed to simplify the library. Use the new
    make_int, make_uint, make_float, make_zero and make_ones free
    functions that produce a construct expression.
  • API break: the broadcast family of functions has been renamed to splat.
  • API break: the permute family of functions has been renamed to permute2 and
    permute4 depending on the number of template arguments taken.
  • API break: value conversion functions such as to_float32x4 have been renamed
    and now return a vector with the same number of elements as the source
    vector.
  • API break: SIMDPP_USER_ARCH_INFO now accepts any expression, not only a
    function
  • API break: unsigned conversions have been renamed to to_uintXX to reduce
    confusion.
  • API break: saturated add and sub are now called add_sat and sub_sat
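
The three promotion examples given in the list above (int32 + int32 --> int32,
uint32 + int32 --> uint32, uint32 + float32 --> float32) are consistent with a
simple ranking of element kinds. The sketch below encodes that ranking with
hypothetical tag types; the names and rank values are assumptions modeled only
on those three examples, and the actual rules are the ones defined in
simdpp/types/tag.h.

```cpp
#include <type_traits>

// Hypothetical element-kind tags; a higher rank wins when two kinds meet.
struct int_tag   { static const int rank = 0; };
struct uint_tag  { static const int rank = 1; };
struct float_tag { static const int rank = 2; };

// Picks the result kind of a binary operation as the higher-ranked operand.
template <class A, class B>
struct promote {
    typedef typename std::conditional<(A::rank >= B::rank), A, B>::type type;
};
```

Because the result type is computed entirely at compile time, a mixed-type
expression carries no runtime overhead for the promotion decision itself.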

No further significant API changes are planned.

2.0 release candidate 2 (C++11 version)

03 Apr 20:45

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX-512F,
    XOP
  • ARM, ARM64: NEON
  • PowerPC: Altivec

Supported compilers:

  • C++11 version:
    • GCC: 4.8-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015
  • C++98 version
    • GCC: 4.4-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015

Clang 3.3 is not supported on ARM. MSVC and ICC are only supported on x86 and
x86-64.

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since 1.0:

  • Expression template-based backend. It is used only for functions that may
    benefit from micro-optimizations (e.g. when several instructions can be merged
    into one).
  • Support for vectors much longer than the native vector type. The only
    limitation is that the length must be a power of 2. The widest available
    instructions are used for the particular vector type.
  • Visual Studio and Intel Compiler support
  • AVX-512F, Altivec and NEONv2 support
  • Vector initialization is simplified, for example: int32<8> v = make_uint(2); or
    int* p = ...; v = load(p);.
  • The curiously recurring template pattern (CRTP) is used to categorize
    vector types. Function templates no longer need to be written for each
    vector type or combination of types; instead, an appropriate vector
    category may be used.
  • Each vector type can be explicitly constructed from any other vector
    with the same size.
  • Most functions accept a much wider range of vector type combinations. For
    example, bitwise functions accept any two vectors of the same size.
  • If different vector types are used as arguments to such functions, the
    return type is computed as if one or both of the arguments were "promoted"
    according to certain rules. For example, int32 + int32 --> int32, whereas
    uint32 + int32 --> uint32, and uint32 + float32 --> float32. See
    simdpp/types/tag.h for more information.
  • API break: int128 and int256 types have been removed. On some architectures
    such as AVX512 it's more efficient to have different physical representations
    for vectors with different element widths. E.g. 8-bit integer elements would
    use 256-bit vectors and 32-bit integer elements would use 512-bit vectors.
  • API break: basic_int## types have been removed. The CRTP-based type
    categorization and promotion rules make a second, inheritance-based vector
    categorization system impossible. In the majority of cases basic_int## can
    be straightforwardly replaced with uint##.
  • API break: {vector type}::make_const, {vector type}::zero and
    {vector type}::ones have been removed to simplify the library. Use the new
    make_int, make_uint, make_float, make_zero and make_ones free
    functions that produce a construct expression.
  • API break: the broadcast family of functions has been renamed to splat.
  • API break: the permute family of functions has been renamed to permute2 and
    permute4 depending on the number of template arguments taken.
  • API break: value conversion functions such as to_float32x4 have been renamed
    and now return a vector with the same number of elements as the source
    vector.
  • API break: SIMDPP_USER_ARCH_INFO now accepts any expression, not only a
    function
  • API break: unsigned conversions have been renamed to to_uintXX to reduce
    confusion.
  • API break: saturated add and sub are now called add_sat and sub_sat
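
For readers unfamiliar with saturated arithmetic, the scalar sketch below shows
what a saturating add and subtract do for a single signed 8-bit element:
results that would overflow are clamped to the nearest representable value
instead of wrapping around. The functions add_sat_model and sub_sat_model are
hypothetical names for this example, not the library's add_sat and sub_sat.

```cpp
#include <cstdint>
#include <limits>

// Clamps a wide intermediate result into the int8_t range.
inline std::int8_t clamp_to_int8(int x)
{
    const int lo = std::numeric_limits<std::int8_t>::min();
    const int hi = std::numeric_limits<std::int8_t>::max();
    if (x < lo) return static_cast<std::int8_t>(lo);
    if (x > hi) return static_cast<std::int8_t>(hi);
    return static_cast<std::int8_t>(x);
}

// Hypothetical scalar model of a saturating add on one element.
inline std::int8_t add_sat_model(std::int8_t a, std::int8_t b)
{
    return clamp_to_int8(static_cast<int>(a) + static_cast<int>(b));
}

// Hypothetical scalar model of a saturating subtract on one element.
inline std::int8_t sub_sat_model(std::int8_t a, std::int8_t b)
{
    return clamp_to_int8(static_cast<int>(a) - static_cast<int>(b));
}
```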

No further significant API changes are planned.

2.0 release candidate 2 (C++03 version)

03 Apr 20:46

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX-512F,
    XOP
  • ARM, ARM64: NEON
  • PowerPC: Altivec

Supported compilers:

  • C++11 version:
    • GCC: 4.8-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015
  • C++98 version
    • GCC: 4.4-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015

Clang 3.3 is not supported on ARM. MSVC and ICC are only supported on x86 and
x86-64.

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since 1.0:

  • Expression template-based backend. It is used only for functions that may
    benefit from micro-optimizations (e.g. when several instructions can be merged
    into one).
  • Support for vectors much longer than the native vector type. The only
    limitation is that the length must be a power of 2. The widest available
    instructions are used for the particular vector type.
  • Visual Studio and Intel Compiler support
  • AVX-512F, Altivec and NEONv2 support
  • Vector initialization is simplified, for example: int32<8> v = make_uint(2); or
    int* p = ...; v = load(p);.
  • The curiously recurring template pattern (CRTP) is used to categorize
    vector types. Function templates no longer need to be written for each
    vector type or combination of types; instead, an appropriate vector
    category may be used.
  • Each vector type can be explicitly constructed from any other vector
    with the same size.
  • Most functions accept a much wider range of vector type combinations. For
    example, bitwise functions accept any two vectors of the same size.
  • If different vector types are used as arguments to such functions, the
    return type is computed as if one or both of the arguments were "promoted"
    according to certain rules. For example, int32 + int32 --> int32, whereas
    uint32 + int32 --> uint32, and uint32 + float32 --> float32. See
    simdpp/types/tag.h for more information.
  • API break: int128 and int256 types have been removed. On some architectures
    such as AVX512 it's more efficient to have different physical representations
    for vectors with different element widths. E.g. 8-bit integer elements would
    use 256-bit vectors and 32-bit integer elements would use 512-bit vectors.
  • API break: basic_int## types have been removed. The CRTP-based type
    categorization and promotion rules make a second, inheritance-based vector
    categorization system impossible. In the majority of cases basic_int## can
    be straightforwardly replaced with uint##.
  • API break: {vector type}::make_const, {vector type}::zero and
    {vector type}::ones have been removed to simplify the library. Use the new
    make_int, make_uint, make_float, make_zero and make_ones free
    functions that produce a construct expression.
  • API break: the broadcast family of functions has been renamed to splat.
  • API break: the permute family of functions has been renamed to permute2 and
    permute4 depending on the number of template arguments taken.
  • API break: value conversion functions such as to_float32x4 have been renamed
    and now return a vector with the same number of elements as the source
    vector.
  • API break: SIMDPP_USER_ARCH_INFO now accepts any expression, not only a
    function
  • API break: unsigned conversions have been renamed to to_uintXX to reduce
    confusion.
  • API break: saturated add and sub are now called add_sat and sub_sat
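
The note above about vectors longer than the native vector type can be made
concrete with a little arithmetic: a long vector is processed as several
native-width pieces. The helpers below are hypothetical and exist only for this
illustration; the native widths mentioned in the comments (16 bytes for SSE2
or NEON, 32 for AVX2, 64 for AVX-512) are well-known register sizes, but how
the library actually splits vectors internally is not described in these notes.

```cpp
#include <cstddef>

// True when n is a power of two, matching the stated length restriction.
constexpr bool is_power_of_two(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: number of native registers needed to hold a vector of
// vector_bytes bytes when the widest native register holds native_bytes.
// Both arguments are assumed to be powers of two.
constexpr std::size_t native_chunks(std::size_t vector_bytes,
                                    std::size_t native_bytes)
{
    return vector_bytes <= native_bytes ? 1 : vector_bytes / native_bytes;
}
```

Under these assumptions a 64-byte vector would occupy four 16-byte registers on
an SSE2 or NEON target and a single register on an AVX-512 target.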

No further significant API changes are planned.

2.0 release candidate (C++11 version)

16 Mar 21:59

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX-512F,
    XOP
  • ARM, ARM64: NEON
  • PowerPC: Altivec

Supported compilers:

  • C++11 version:
    • GCC: 4.8-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015
  • C++98 version
    • GCC: 4.4-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015

Clang 3.3 is not supported on ARM. MSVC and ICC are only supported on x86 and
x86-64.

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since 1.0:

  • Expression template-based backend. It is used only for functions that may
    benefit from micro-optimizations (e.g. when several instructions can be merged
    into one).
  • Support for vectors much longer than the native vector type. The only
    limitation is that the length must be a power of 2. The widest available
    instructions are used for the particular vector type.
  • Visual Studio and Intel Compiler support
  • AVX-512F, Altivec and NEONv2 support
  • Vector initialization is simplified, for example: int32<8> v = make_uint(2); or
    int* p = ...; v = load(p);.
  • The curiously recurring template pattern (CRTP) is used to categorize
    vector types. Function templates no longer need to be written for each
    vector type or combination of types; instead, an appropriate vector
    category may be used.
  • Each vector type can be explicitly constructed from any other vector
    with the same size.
  • Most functions accept a much wider range of vector type combinations. For
    example, bitwise functions accept any two vectors of the same size.
  • If different vector types are used as arguments to such functions, the
    return type is computed as if one or both of the arguments were "promoted"
    according to certain rules. For example, int32 + int32 --> int32, whereas
    uint32 + int32 --> uint32, and uint32 + float32 --> float32. See
    simdpp/types/tag.h for more information.
  • API break: int128 and int256 types have been removed. On some architectures
    such as AVX512 it's more efficient to have different physical representations
    for vectors with different element widths. E.g. 8-bit integer elements would
    use 256-bit vectors and 32-bit integer elements would use 512-bit vectors.
  • API break: basic_int## types have been removed. The CRTP-based type
    categorization and promotion rules make a second, inheritance-based vector
    categorization system impossible. In the majority of cases basic_int## can
    be straightforwardly replaced with uint##.
  • API break: {vector type}::make_const, {vector type}::zero and
    {vector type}::ones have been removed to simplify the library. Use the new
    make_int, make_uint, make_float, make_zero and make_ones free
    functions that produce a construct expression.
  • API break: the broadcast family of functions has been renamed to splat.
  • API break: the permute family of functions has been renamed to permute2 and
    permute4 depending on the number of template arguments taken.
  • API break: value conversion functions such as to_float32x4 have been renamed
    and now return a vector with the same number of elements as the source
    vector.
  • API break: SIMDPP_USER_ARCH_INFO now accepts any expression, not only a
    function
  • API break: unsigned conversions have been renamed to to_uintXX to reduce
    confusion.
  • API break: saturated add and sub are now called add_sat and sub_sat

No further significant API changes are planned.

2.0 release candidate (C++03 version)

16 Mar 22:00

The library supports the following architectures and instruction sets:

  • x86, x86-64: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, FMA3, FMA4, AVX-512F,
    XOP
  • ARM, ARM64: NEON
  • PowerPC: Altivec

Supported compilers:

  • C++11 version:
    • GCC: 4.8-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015
  • C++98 version
    • GCC: 4.4-5.3
    • Clang: 3.3-3.8
    • MSVC: 2013
    • ICC: 2013, 2015

Clang 3.3 is not supported on ARM. MSVC and ICC are only supported on x86 and
x86-64.

Newer versions of the aforementioned compilers will generally work with either
the C++11 or the C++98 version of the library. Older versions of these compilers
will generally work with the C++98 version of the library.

Changes since 1.0:

  • Expression template-based backend. It is used only for functions that may
    benefit from micro-optimizations (e.g. when several instructions can be merged
    into one).
  • Support for vectors much longer than the native vector type. The only
    limitation is that the length must be a power of 2. The widest available
    instructions are used for the particular vector type.
  • Visual Studio and Intel Compiler support
  • AVX-512F, Altivec and NEONv2 support
  • Vector initialization is simplified, for example: int32<8> v = make_uint(2); or
    int* p = ...; v = load(p);.
  • The curiously recurring template pattern (CRTP) is used to categorize
    vector types. Function templates no longer need to be written for each
    vector type or combination of types; instead, an appropriate vector
    category may be used.
  • Each vector type can be explicitly constructed from any other vector
    with the same size.
  • Most functions accept a much wider range of vector type combinations. For
    example, bitwise functions accept any two vectors of the same size.
  • If different vector types are used as arguments to such functions, the
    return type is computed as if one or both of the arguments were "promoted"
    according to certain rules. For example, int32 + int32 --> int32, whereas
    uint32 + int32 --> uint32, and uint32 + float32 --> float32. See
    simdpp/types/tag.h for more information.
  • API break: int128 and int256 types have been removed. On some architectures
    such as AVX512 it's more efficient to have different physical representations
    for vectors with different element widths. E.g. 8-bit integer elements would
    use 256-bit vectors and 32-bit integer elements would use 512-bit vectors.
  • API break: basic_int## types have been removed. The CRTP-based type
    categorization and promotion rules make a second, inheritance-based vector
    categorization system impossible. In the majority of cases basic_int## can
    be straightforwardly replaced with uint##.
  • API break: {vector type}::make_const, {vector type}::zero and
    {vector type}::ones have been removed to simplify the library. Use the new
    make_int, make_uint, make_float, make_zero and make_ones free
    functions that produce a construct expression.
  • API break: the broadcast family of functions has been renamed to splat.
  • API break: the permute family of functions has been renamed to permute2 and
    permute4 depending on the number of template arguments taken.
  • API break: value conversion functions such as to_float32x4 have been renamed
    and now return a vector with the same number of elements as the source
    vector.
  • API break: SIMDPP_USER_ARCH_INFO now accepts any expression, not only a
    function
  • API break: unsigned conversions have been renamed to to_uintXX to reduce
    confusion.
  • API break: saturated add and sub are now called add_sat and sub_sat

No further significant API changes are planned.