Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use python-casacore's pending MeasurementSet creation functionality for creating Fixed Shape columns #75

Merged

Conversation

sjperkins
Copy link
Contributor

@sjperkins sjperkins commented Jan 31, 2017

Previously, h5toms created variable length data columns when creating a MeasurementSet. Catering for this variability means the MS's are not performant. This PR

  1. Modifies h5toms to use python-casacore's upcoming functionality for creating canonical Measurement Set's. This removes the need to maintain table definitions for the MS within katdal.
  2. Modfies h5toms to define fixed shape columns for DATA, MODEL_DATA, CORRECTED_DATA, FLAG, FLAG_CATEGORY, IMAGING_WEIGHTS, WEIGHTS and SIGMA, as well as the original hypercolumn definitions that were bases on these columns.

Depends on casacore/python-casacore#72.

`HCcoordnames=[], HCidnames=[]` is longer accepted as valid by
python-casacore and casacore.
Now that python-casacore can create a proper MeasurementSet, we no
longer need to rely on our own definition of it and it's subtables.
However, do create custom tiled arrays and hypercolumns for
DATA, MODEL_DATA, CORRECTED_DATA, FLAGS, FLAG, FLAG_CATEGORY,
WEIGHT, SIGMA and IMAGING_WEIGHT in the kat_ms_desc function.
Take averaging into account
@sjperkins
Copy link
Contributor Author

@ludwigschwardt Still a WIP, I'll request a review when ready.

@sjperkins
Copy link
Contributor Author

sjperkins commented Feb 1, 2017

Some indicators of performance on my laptop. From a brief look I don't think writing to the MS is the bottleneck, since my OS disk indicator shows 90 - 100 MB/s during these extended write periods. Reads burst @ perhaps 20 MB/s but very briefly by comparison (mostly < 1MB/s)

Writing static meta data...
scan   1 ( 151 samples) loaded. Target: '3C138'. Writing to disk...
Wrote scan data (1091.067221 MB) in 42.674429 s (25.567237 MBps)

scan   3 (  31 samples) loaded. Target: 'PKS 1607-841'. Writing to disk...
Wrote scan data (223.993933 MB) in 18.888033 s (11.859040 MBps)

scan   5 ( 151 samples) loaded. Target: 'SCP'. Writing to disk...
Wrote scan data (1091.067221 MB) in 50.204534 s (21.732444 MBps)

When writing model and corrected data the situation "improves":

Writing static meta data...
scan   1 ( 151 samples) loaded. Target: '3C138'. Writing to disk...
Wrote scan data (2336.817221 MB) in 43.564234 s (53.640728 MBps)

scan   3 (  31 samples) loaded. Target: 'PKS 1607-841'. Writing to disk...
Wrote scan data (479.743933 MB) in 19.703434 s (24.348240 MBps)

scan   5 ( 151 samples) loaded. Target: 'SCP'. Writing to disk...
Wrote scan data (2336.817221 MB) in 48.277194 s (48.404164 MBps)

scan   7 (  73 samples) loaded. Target: 'PKS 1607-841'. Writing to disk...
Wrote scan data (1129.719584 MB) in 23.365508 s (48.349883 MBps)

They should vary by correlation, not channels.
Set up a Data Manager Info object with tile sizes specified for each
Data Storage Manager. kat_ms_desc_and_dminfo returns both a Table
Description and Data Manager info object suitable for passing to
create_ms.
https://casa.nrao.edu/Memos/229.html#SECTION000615000000000000000 says
that DIRECTION has shape (2,) and since every row in SOURCE table
corresponds to a FIELD, the extra dimension of size one seems
unnecessary.
@bennahugo
Copy link
Contributor

These changes work quite well in our pipeline. I've been running them since Stimela 0.2.8 on a regular basis. As soon as the PRs go into python-casacore I'd recommend that we integrate.

@sjperkins sjperkins force-pushed the use-python-casacore-ms-creation branch from dbbdb7d to 5c93c80 Compare May 17, 2017 15:17
Aim for 4MB tile chunks. Further customisation of the tile size is
possible.
It automatically gets add in the C++ ms layer.
See casacore/python-casacore#72 (comment)
for more information.
@sjperkins sjperkins force-pushed the use-python-casacore-ms-creation branch from 5c93c80 to 43f27b9 Compare May 17, 2017 15:25
@ludwigschwardt
Copy link
Contributor

Is this PR ready to use now?

@sjperkins
Copy link
Contributor Author

Is this PR ready to use now?

Provisionally. However, In order to use this PR:

  1. A manual build of casascore 2.3.0 is required (but not difficult).
  2. pip install of python-casacore master branch (unpleasant).

Unfortunately, none of the functionality in points (1) and (2) is available in a kernsuite release (yet).

@sjperkins
Copy link
Contributor Author

With the:

  • inclusion of casacore 2.3.0 in kern-2
  • inclusion of casacore 2.4.0 in kern-3
  • release of python-casacore 2.2.0 in kern-3 and pypi

I think this feature is ready to be merged.

/cc @bennahugo @SpheMakh @gijzelaerr @ludwigschwardt

@gijzelaerr gijzelaerr self-requested a review November 9, 2017 11:09
@sjperkins
Copy link
Contributor Author

Could a RATT go test this works with kern-3 casacore 2.4.0 and python-casacore 2.2.0. Modify https://github.com/SpheMakh/Stimela/blob/master/stimela/cargo/base/katdal/Dockerfile to

  1. remove custom casacore 2.3.0 build
  2. Add kern-3 repo
  3. docker-apt-install casacore==2.4.0
  4. pip install python-casacore==2.2.0
  5. Retain pip install of custom katdal branch (tracking this PR).

If that works, I reckon @gijzelaerr could press the big green merge button with @ludwigschwardt noticing ;-)

@SpheMakh
Copy link

It works when using kern-3 python-casacore, But the PyPI version of python-casacore is broken

@sjperkins sjperkins force-pushed the use-python-casacore-ms-creation branch from 0f46c5f to 98bbced Compare November 23, 2017 11:32
@sjperkins
Copy link
Contributor Author

Can this be merged now that python-casacore 2.2.1 is released?

@ludwigschwardt
Copy link
Contributor

ludwigschwardt commented Nov 23, 2017

For the record, I bumped the casacore brew formula to 2.4.0 yesterday, so the new combination should work on the Mac too (time to test...).

@ludwigschwardt ludwigschwardt merged commit aec3614 into ska-sa:master Nov 27, 2017
@sjperkins sjperkins deleted the use-python-casacore-ms-creation branch November 30, 2017 06:03
ludwigschwardt added a commit that referenced this pull request May 9, 2018
Ever since PR #75 it is no longer needed to create MeasurementSets.
So long little one, you have served us well ever since the glory days
of ffinder/k7augment in 2010 :-)

This addresses JIRA ticket SR-1189.
ludwigschwardt added a commit that referenced this pull request Jul 12, 2022
This column is initialised full of zeroes and just takes up space
(4 bytes per main table entry, i.e. time-frequency-baseline cell, so
about 6-12% overhead). It can be recreated in CASA at a later stage.

The column is a leftover of good old blank.ms, which included it along
with MODEL_DATA and CORRECTED_DATA. The latter two columns were made
optional in the sneaky 41af4b8 as part of #75 in version 0.9, but this
one was missed.

Also remove the commented-out code that has tagged along forever :-)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants