(Closes #2027) fix for ArrayMixin._effective_shape bug #2685

arporter · 2024-08-07T10:40:01Z

We had a bug which would cause a crash if we had an array access like a(b) where b is actually an array.

In testing this, I also realised that the ArrayAssignment2LoopsTrans transformation would incorrectly convert something like:

c(:) = a(b)

to

do idx = 1, SIZE(c,1)
   c(idx) = a(b)
end do

This should in fact be:

do idx = 1, SIZE(c,1)
  c(idx) = a(b(idx))
end do

This also means we have a bug in examples/nemo/scripts/utils.py because it should have converted a(b) to a(b(:)).
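
As a rough illustration of why the index array must itself be indexed inside the loop, here is a minimal pure-Python sketch of the two loop forms above. It is a toy model, not PSyclone code: the function names `assign_wrong` and `assign_right` are hypothetical, and Python lists stand in for the Fortran arrays (so the 1-based Fortran indices are shifted by one).

```python
# Toy model of the Fortran assignment `c(:) = a(b)` where `b` is an
# integer index array. All names here are illustrative only.

def assign_wrong(a, b, c):
    """Mimics the incorrect loop: `a(b)` is left unchanged inside the
    loop, so every c(idx) is given a whole array rather than a scalar."""
    for idx in range(len(c)):
        c[idx] = [a[i - 1] for i in b]  # a(b) is still rank 1 here
    return c


def assign_right(a, b, c):
    """Mimics the corrected loop: c(idx) = a(b(idx))."""
    for idx in range(len(c)):
        c[idx] = a[b[idx] - 1]  # convert the 1-based index to 0-based
    return c


a = [10, 20, 30, 40]
b = [4, 1, 3]  # 1-based indices into a

print(assign_right(a, b, [0, 0, 0]))  # [40, 10, 30]
print(assign_wrong(a, b, [0, 0, 0]))  # three copies of [40, 10, 30]
```

In Fortran the incorrect form is a rank mismatch (a rank-1 value assigned to a scalar element), which is why the loop body needs `a(b(idx))`.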

codecov · 2024-08-07T10:53:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.85%. Comparing base (6aad1d4) to head (eb4f8c8).
Report is 10 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2685      +/-   ##
==========================================
- Coverage   99.86%   99.85%   -0.02%     
==========================================
  Files         354      354              
  Lines       49112    49123      +11     
==========================================
+ Hits        49048    49052       +4     
- Misses         64       71       +7

arporter · 2024-08-07T13:00:18Z

The good news: the integration tests pass.
The bad news: the performance of NEMO with OMP offload and OMP CPU threading has taken a big hit.

arporter · 2024-08-07T14:35:53Z

I've diffed the generated OMP code with that from master and the only differences are in a comment where we don't sort the order of 'impure' routine names. Perhaps the machine just happened to be busy? Oh, there are a couple of differences in some routine calls:

diff psycloned-openmp_cpu-master/icbdyn.f90 psycloned-openmp_cpu/icbdyn.f90
108c108
<       pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
---
>       pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)
diff psycloned-openmp_cpu-master/icbrst.f90 psycloned-openmp_cpu/icbrst.f90
341c341
<         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3,nlngth3)
---
>         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3(:),nlngth3(:))

but they look reassuring (as in, something has actually changed) and shouldn't affect performance at all.

arporter · 2024-08-07T15:49:12Z

I re-ran the NEMO tests and got:

[results image not preserved]

so it may just be machine noise. I'll check the generated OMP-offload code against that from master too.

arporter · 2024-08-08T08:15:58Z

The only diffs between master and this branch in the generated OMP-offload version of NEMO4 are:

$ diff psycloned-openmp_gpu{,-2027}
diff psycloned-openmp_gpu/icbdyn.f90 psycloned-openmp_gpu-2027/icbdyn.f90
111c111
<       pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
---
>       pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)
diff psycloned-openmp_gpu/icbrst.f90 psycloned-openmp_gpu-2027/icbrst.f90
365c365
<         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3,nlngth3)
---
>         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3(:),nlngth3(:))

Therefore I declare this ready for review. One for @sergisiso, @AidanChalk or @hiker.

sergisiso · 2024-08-16T13:02:49Z

I brought the PR up to date with master in order to run the integration tests again (it needed to use updated modules). With an otherwise idle system this PR shows no performance degradation 👍

sergisiso

Thanks for fixing this @arporter. I have suggested some cases that may still break the logic of this functionality, but if fixing them is complicated feel free to defer them to follow-up PRs.

sergisiso · 2024-08-20T09:40:55Z

examples/nemo/scripts/utils.py

+            if hasattr(reference, "indices"):
+                # Look at array-index expressions too.
+                for exprn in reference.indices:
+                    if (isinstance(exprn, Reference) and
+                            isinstance(exprn.symbol, DataSymbol)):
+                        try:
+                            Reference2ArrayRangeTrans().apply(exprn)
+                        except TransformationError:
+                            pass


Could we instead remove the stop_type=Reference above?

This may still fail to recurse in cases where the top reference does not have an indices attribute, e.g.:
my_struct%my_field%my_array_field(another_array)

Regardless, this was not causing a bug, right? It is just an improvement?

(The ArrayAssignment2LoopsTrans is/was already calling the same recursive Reference2ArrayRangeTrans so it should have been expanded there anyway)

sergisiso · 2024-08-20T10:00:55Z

src/psyclone/tests/psyir/nodes/array_mixin_test.py

+    shape = routine.children[child_idx].lhs._get_effective_shape()
+    assert len(shape) == 1
+    assert "SIZE(a)" in shape[0].debug_string()


Could you test if we do the right thing if we don't know the type of the symbols:
b(scalarval, arrayval) = 1

And I see this method is not overridden by ArrayMember nor ArrayOfStructuresMixin, so we could also add some tests with these constructs here.

We didn't but we do now. Added tests.

sergisiso · 2024-08-20T10:06:36Z

src/psyclone/psyir/transformations/arrayassignment2loops_trans.py

-            if num_of_ranges > 0:
-                if not found:
+            count = len([x for x in accessor.indices if isinstance(x, Range)])
+            if count:


I like the name changes, but for clarity I suggest still doing "if count > 0", even if technically it is the same.

sergisiso · 2024-08-20T10:20:41Z

src/psyclone/psyir/transformations/arrayassignment2loops_trans.py

+            # array indices must be known (because otherwise they themselves
+            # might be arrays, e.g.
+            #      a(:) = b(c) where `c` can be a scalar or an array.
+            if isinstance(reference, ArrayMixin):


Again, do we still have a problem if the ArrayMixin is not in the top-level reference? e.g.
a(:) = b%field(c)
(maybe it should walk for inner arraymixins?)

I've extended the test in arrayassignment2loops_trans_test for this situation and it works!

sergisiso · 2024-08-20T10:35:11Z

src/psyclone/psyir/transformations/arrayassignment2loops_trans.py

+                    # If the number of ranges in this access is not the same as
+                    # on the LHS then we may or may not have a scalar.


Can you add: "To disambiguate we need to know that the Reference2ArrayRangeTrans in the apply will provide the remaining ranges, and for this it will need to know the type of each index expression"? (if this is the right reasoning here)

I'm not sure. I've extended the comment - see what you think.

sergisiso · 2024-08-20T10:46:05Z

src/psyclone/psyir/transformations/arrayassignment2loops_trans.py

+                        if not isinstance(exprn, Reference):
+                            continue


I am not sure that's right. For example, in the new "test_apply_indirect_indexing", consider
ishtSi(5:8,jf) = ishtSi( iwewe,jf )
If iwewe was not a Reference but a CodeBlock instead:
ishtSi(5:8,jf) = ishtSi( (/ jpwe,jpea,jpwe,jpea /) , jf )
then it should not validate.

Similarly, it should not validate if the index was a call to a function that is not elemental and we don't know whether it returns a scalar or an array.

By contrast, I think we will probably do the right thing for expressions, e.g.:
ishtSi(5:8,jf) = ishtSi( iwewe + 1, jf )
as expr.datatype will bubble up the UnresolvedType if any part of the expression is unresolved.

Please add tests for all these.

Contrary to what you expected, we already refused to validate your first two examples. However, for the third one we haven't yet implemented support for finding the datatype when there are expressions in the array indices (#1799). I've added tests for all of these along with an associated TODO for the last one.
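
The "bubble up" idea discussed above for index expressions such as `iwewe + 1` can be sketched with a small toy model. This is only an illustration under stated assumptions: the classes `Ref`, `Literal` and `BinaryOp` and the `UNRESOLVED` marker are hypothetical stand-ins, not PSyclone's PSyIR classes, and the thread notes that the real support for expressions in array indices is still outstanding (#1799).

```python
from dataclasses import dataclass

UNRESOLVED = "unresolved"  # hypothetical stand-in for an unknown type


@dataclass
class Ref:
    """A reference to a symbol whose rank may or may not be known."""
    name: str
    rank: object = UNRESOLVED  # an int when known, UNRESOLVED otherwise


@dataclass
class Literal:
    """An integer literal, which is always a scalar (rank 0)."""
    value: int
    rank: int = 0


@dataclass
class BinaryOp:
    """An elementwise binary operation on two sub-expressions."""
    op: str
    lhs: object
    rhs: object

    @property
    def rank(self):
        ranks = (self.lhs.rank, self.rhs.rank)
        # If any operand is unresolved then so is the whole expression.
        if UNRESOLVED in ranks:
            return UNRESOLVED
        # Elementwise: a scalar combined with an array gives an array.
        return max(ranks)


def index_is_known_scalar(expr):
    """Only an index expression of known rank 0 is safe to leave as-is."""
    return expr.rank == 0


print(BinaryOp("+", Ref("iwewe"), Literal(1)).rank)     # unresolved
print(BinaryOp("+", Ref("iwewe", 0), Literal(1)).rank)  # 0
print(BinaryOp("+", Ref("iwewe", 1), Literal(1)).rank)  # 1
```

In this toy model a validation step would refuse the first and third cases, since only the second index expression is known to be a scalar.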

sergisiso · 2024-08-28T08:33:20Z

I am finding this same issue in lbclnk.f90 of NEMOv5 in my two open PRs; merging this branch into them fixes it.

arporter · 2024-08-30T16:07:51Z

Unfortunately, my tightening-up of checks on the types of expressions appearing in array-indices now means that the WHERE handling is refusing some code that is/was OK, e.g.:

  WHERE( picefr(:,:) > 1.e-10 )
    zevap_ice(:,:,1) = snow(:,3,:) * frcv(jpr_ievp)%z3(:,:,1) / picefr(:,:)

Here, we don't know the type of jpr_ievp (because it is imported) and therefore we refuse to find the shape of frcv(jpr_ievp)%z3(:,:,1). However, we can see that the whole expression already contains two ranges (:) and therefore we can conclude that jpr_ievp must be a scalar. This is going to be hard to deal with and will also conflict with @LonelyCat124's work in #2687. I therefore don't really want to touch it until that PR has been merged (unless I can find a simple fix).
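
The range-counting argument above can be written down as a small sketch. It is a toy model under stated assumptions, not the PSyclone implementation: an access is represented as a hypothetical list of `(name, indices)` pairs, with the string `":"` standing for a full range, and the helper names are invented for illustration.

```python
def count_ranges(parts):
    """Count the explicit ranges (':') over every part of an access.

    For example frcv(jpr_ievp)%z3(:,:,1) is represented as
    [("frcv", ["jpr_ievp"]), ("z3", [":", ":", "1"])].
    """
    return sum(idx == ":" for _, indices in parts for idx in indices)


def unknown_indices_must_be_scalar(lhs_parts, rhs_parts):
    """True if the explicit ranges on the RHS already account for the
    rank of the LHS, so no other index can contribute a dimension."""
    lhs_rank = count_ranges(lhs_parts)
    return lhs_rank > 0 and count_ranges(rhs_parts) == lhs_rank


# zevap_ice(:,:,1) = ... frcv(jpr_ievp)%z3(:,:,1) ...
lhs = [("zevap_ice", [":", ":", "1"])]
rhs = [("frcv", ["jpr_ievp"]), ("z3", [":", ":", "1"])]
print(unknown_indices_must_be_scalar(lhs, rhs))  # True

# c(:) = a(b): no explicit range on the RHS, so b may be an array.
print(unknown_indices_must_be_scalar([("c", [":"])], [("a", ["b"])]))  # False
```

The second case is the one from the original bug report: without type information for `b`, nothing can be concluded from the ranges alone.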

sergisiso · 2024-09-30T15:14:45Z

@arporter The WHERE branch has been merged, so this PR can probably continue now. Also note that the exclusion "lbclnk.f90",  # TODO #2685: effective shape bug was added to examples/nemo/scripts/omp_cpu_trans.py because this bug was affecting NEMOv5. We want to make sure this PR resolves it so that we can delete the exclusion.

sergisiso · 2024-09-30T15:23:21Z

Unfortunately, my tightening-up of checks on the types of expressions appearing in array-indices now means that the WHERE handling is refusing some code that is/was OK
Here, we don't know the type of jpr_ievp (because it is imported) and therefore we refuse to find the shape of

By the way, @LonelyCat124 also tightened the WHERE handling for imported symbols and I believe he had to bring "jpr_ievp" and other symbols in as locals in some tests. We have also seen in the integration tests that this did not affect NEMO performance. So, your issue may be resolved now.

arporter · 2024-10-01T12:29:50Z

@arporter The WHERE branch has been merged, so this PR can probably continue now. Also note that the exclusion "lbclnk.f90",  # TODO #2685: effective shape bug was added to examples/nemo/scripts/omp_cpu_trans.py because this bug was affecting NEMOv5. We want to make sure this PR resolves it so that we can delete the exclusion.

In what way did the bug manifest with this file? I've removed this exclusion and re-run the integration tests which were fine apart from the race condition affecting the OMP CPU results.

sergisiso · 2024-10-01T12:34:43Z

It was a PSyclone exception while processing the file. I knew this branch resolved it because I tried merging it even before the first review and saw that it was fixed here; that's why I added the TODO. It should be good now.

arporter · 2024-10-02T14:30:15Z

I have some indirect coverage changes that I need to address.

arporter added 2 commits August 7, 2024 11:30

#2027 fix bug in ArrayMixin._get_effective_shape()

40cc955

#2027 fix bug in ArrayAssignment2LoopsTrans.validate

03510ff

arporter marked this pull request as draft August 7, 2024 10:40

arporter self-assigned this Aug 7, 2024

arporter added the NG-ARCH (Issues relevant to the GPU parallelisation of LFRic and other models expected to be used in NG-ARCH), PSyIR (Core PSyIR functionality) and bug labels Aug 7, 2024

#2027 extend utils.py to convert array refs used as array indices

65e7ff9

arporter temporarily deployed to integration August 7, 2024 11:00 — with GitHub Actions Inactive

#2027 fix error in utils.py

33abf82

arporter temporarily deployed to integration August 7, 2024 11:50 — with GitHub Actions Inactive

arporter added the ready for review label Aug 8, 2024

arporter requested review from sergisiso, LonelyCat124 and hiker August 8, 2024 08:16

arporter marked this pull request as ready for review August 8, 2024 08:16

sergisiso added under review and removed ready for review labels Aug 16, 2024

Merge branch 'master' into 2027_effective_shape_bug

620a6dd

sergisiso temporarily deployed to integration August 16, 2024 09:58 — with GitHub Actions Inactive

sergisiso requested changes Aug 20, 2024

sergisiso added reviewed with actions and removed under review labels Aug 20, 2024

#2685 improve testing of ArrayAssignment2LoopsTrans

2ec7260

arporter and others added 2 commits August 30, 2024 17:28

#2685 tighten-up check on type of index expressions [skip ci]

32308ed

Merge branch 'master' into 2027_effective_shape_bug

7eccc19

arporter added 3 commits October 1, 2024 09:58

Merge branch 'master' into 2027_effective_shape_bug

71b7496

#2027 fix failing tests due to change in err msg

d8f3800

#2027 put back processing of lbclnk in omp_cpu_trans.py

1d35684

arporter temporarily deployed to integration October 1, 2024 10:16 — with GitHub Actions Inactive

sergisiso mentioned this pull request Oct 1, 2024

(towards #2671) Parallel array privatisation #2697

Open

arporter mentioned this pull request Oct 1, 2024

Reference2ArrayReference should support structure components #1858

Open

arporter added 3 commits October 2, 2024 10:28

#2027 bug fixes for array refs within struct accesses [skip ci]

4cb2a6f

#2027 further tidying of StructureReference.datatype method

60d6f77

#2027 rm unused import

beab4b6

Merge branch 'master' into 2027_effective_shape_bug

eb4f8c8
