Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A big refresh of the datatype engine #6695

Merged
merged 9 commits into from
Jul 23, 2019

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented May 20, 2019

Faster and less error prone. this patch is a significant redesign of the internals of the datatype engine. No API or ABI changes.

Fixes #5540 (Issue with overlapping vector datatype)

@bosilca
Copy link
Member Author

bosilca commented May 22, 2019

I put together the datatype related benchmarks we exchanged on the different datatype issues and PRs in a single repo (https://github.com/bosilca/ddt_bench).

@hppritcha
Copy link
Member

bot:ompi:retest

@bosilca bosilca force-pushed the fix/vector_stride_0 branch 4 times, most recently from bdec6b2 to de6a289 Compare June 19, 2019 19:47
@jsquyres
Copy link
Member

As noted on the 2019-06-25 webex, this PR is both a performance enhancement for the DDT engine, but it's also a fix for #5540 (i.e., an issue that has been observed on the v4.0.x branch).

Copy link
Contributor

@ggouaillardet ggouaillardet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an idea, we could always define CHECKSUM to 0 or 1, and then if (CHECKSUM && convertor->flags & CONVERTOR_WITH_CHECKSUM) and let the compiler handle the dead code.
That would make the code more compact and easier to maintain.

@bosilca bosilca force-pushed the fix/vector_stride_0 branch 4 times, most recently from 586e568 to 6e0ce95 Compare June 27, 2019 19:56
@hppritcha
Copy link
Member

hppritcha commented Jul 8, 2019

@derbeyn could you re-review?

@hppritcha
Copy link
Member

@ggouaillardet could you review this PR when you have time?

Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repertition and blocklen
being a contiguous memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.

Fixes at the OMPI datatype level including:
 - Fix the create_hindexed and vector creation.
 - Fix the handling of [get|set]_elements and _count.
 - Correctly compute the dispacement for block indexed types.
 - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Merge contiguous iov in order to minimize the number of returned iovec.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
- optimize handling of contiguous with gaps datatypes.
- fixes a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Rework the to_self test to be able to be used as a benchmark.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Upon detecting a datatype loop representation skip the entire loop
according the the remaining space.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Amazing how a bad instruction scheduling can have such a drastic impact
on the code performance. With this change, the get a boost of at least
50% on the performance of data with a small blocklen and/or count.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@gpaulsen
Copy link
Member

Looks like both opal_datatype_test and ddt_test Aborted at runtime in Mellanox CI (http://bgate.mellanox.com/jenkins/job/gh-ompi-master-pr/10139). This may be a real error, or perhaps the test case needs to be updated.

@jsquyres
Copy link
Member

Copying the failed results here, because CI logs get recycled every so often:

...
07:45:45 PASS: unpack_hetero
07:45:45 PASS: position
07:45:45 PASS: checksum
07:45:45 PASS: position_noncontig
07:45:45 PASS: unpack_ooo
07:45:45 PASS: ddt_raw2
07:45:45 ../../config/test-driver: line 107:  6960 Aborted                 "$@" > $log_file 2>&1
07:45:45 FAIL: opal_datatype_test
07:45:45 PASS: external32
07:45:45 PASS: ddt_pack
07:45:45 PASS: large_data
07:45:45 ../../config/test-driver: line 107:  6978 Aborted                 "$@" > $log_file 2>&1
07:45:45 FAIL: ddt_test
07:45:45 PASS: ddt_raw

Start optimizing the code.

This commit divides the operations in 2 parts, the first, outside the
critical part, deals with partial blocks of predefined elements, and the
second, inside the critical path, only deals with full blocks of
elements. This reduces the number of expensive operations in the
critical path and results in a decent performance increase.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
@bosilca bosilca merged commit 94f26f5 into open-mpi:master Jul 23, 2019
@gpaulsen
Copy link
Member

gpaulsen commented Aug 5, 2019

@bosilca,

@hppritcha and I were discussing this PR. Would you mind PRing this back to v4.0.x (if you agree it's appropriate for v4.0.2) to resolve Issue #5540. That issue was marked as blocking a v4.0.2 release.

@bosilca
Copy link
Member Author

bosilca commented Aug 5, 2019

@hppritcha and @gpaulsen here is the 4.0 PR #6863.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with overlapping vector datatype
8 participants