-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use python-casacore's pending MeasurementSet creation functionality for creating Fixed Shape columns #75
Use python-casacore's pending MeasurementSet creation functionality for creating Fixed Shape columns #75
Conversation
`HCcoordnames=[], HCidnames=[]` is longer accepted as valid by python-casacore and casacore.
Now that python-casacore can create a proper MeasurementSet, we no longer need to rely on our own definition of it and it's subtables. However, do create custom tiled arrays and hypercolumns for DATA, MODEL_DATA, CORRECTED_DATA, FLAGS, FLAG, FLAG_CATEGORY, WEIGHT, SIGMA and IMAGING_WEIGHT in the kat_ms_desc function.
Take averaging into account
@ludwigschwardt Still a WIP, I'll request a review when ready. |
Some indicators of performance on my laptop. From a brief look I don't think writing to the MS is the bottleneck, since my OS disk indicator shows 90 - 100 MB/s during these extended write periods. Reads burst @ perhaps 20 MB/s but very briefly by comparison (mostly < 1MB/s) Writing static meta data...
scan 1 ( 151 samples) loaded. Target: '3C138'. Writing to disk...
Wrote scan data (1091.067221 MB) in 42.674429 s (25.567237 MBps)
scan 3 ( 31 samples) loaded. Target: 'PKS 1607-841'. Writing to disk...
Wrote scan data (223.993933 MB) in 18.888033 s (11.859040 MBps)
scan 5 ( 151 samples) loaded. Target: 'SCP'. Writing to disk...
Wrote scan data (1091.067221 MB) in 50.204534 s (21.732444 MBps) When writing model and corrected data the situation "improves": Writing static meta data...
scan 1 ( 151 samples) loaded. Target: '3C138'. Writing to disk...
Wrote scan data (2336.817221 MB) in 43.564234 s (53.640728 MBps)
scan 3 ( 31 samples) loaded. Target: 'PKS 1607-841'. Writing to disk...
Wrote scan data (479.743933 MB) in 19.703434 s (24.348240 MBps)
scan 5 ( 151 samples) loaded. Target: 'SCP'. Writing to disk...
Wrote scan data (2336.817221 MB) in 48.277194 s (48.404164 MBps)
scan 7 ( 73 samples) loaded. Target: 'PKS 1607-841'. Writing to disk...
Wrote scan data (1129.719584 MB) in 23.365508 s (48.349883 MBps) |
This reverts commit 4f40f94.
They should vary by correlation, not channels.
Set up a Data Manager Info object with tile sizes specified for each Data Storage Manager. kat_ms_desc_and_dminfo returns both a Table Description and Data Manager info object suitable for passing to create_ms.
https://casa.nrao.edu/Memos/229.html#SECTION000615000000000000000 says that DIRECTION has shape (2,) and since every row in SOURCE table corresponds to a FIELD, the extra dimension of size one seems unnecessary.
a5001a4
to
05b4038
Compare
These changes work quite well in our pipeline. I've been running them since Stimela 0.2.8 on a regular basis. As soon as the PRs go into python-casacore I'd recommend that we integrate. |
dbbdb7d
to
5c93c80
Compare
Aim for 4MB tile chunks. Further customisation of the tile size is possible.
Conflicts: scripts/h5toms.py
It automatically gets add in the C++ ms layer. See casacore/python-casacore#72 (comment) for more information.
5c93c80
to
43f27b9
Compare
Is this PR ready to use now? |
Provisionally. However, In order to use this PR:
Unfortunately, none of the functionality in points (1) and (2) is available in a kernsuite release (yet). |
Could a RATT go test this works with kern-3 casacore 2.4.0 and python-casacore 2.2.0. Modify https://github.com/SpheMakh/Stimela/blob/master/stimela/cargo/base/katdal/Dockerfile to
If that works, I reckon @gijzelaerr could press the big green merge button with @ludwigschwardt noticing ;-) |
It works when using kern-3 python-casacore, But the PyPI version of python-casacore is broken |
0f46c5f
to
98bbced
Compare
Can this be merged now that python-casacore 2.2.1 is released? |
For the record, I bumped the casacore brew formula to 2.4.0 yesterday, so the new combination should work on the Mac too (time to test...). |
Ever since PR #75 it is no longer needed to create MeasurementSets. So long little one, you have served us well ever since the glory days of ffinder/k7augment in 2010 :-) This addresses JIRA ticket SR-1189.
This column is initialised full of zeroes and just takes up space (4 bytes per main table entry, i.e. time-frequency-baseline cell, so about 6-12% overhead). It can be recreated in CASA at a later stage. The column is a leftover of good old blank.ms, which included it along with MODEL_DATA and CORRECTED_DATA. The latter two columns were made optional in the sneaky 41af4b8 as part of #75 in version 0.9, but this one was missed. Also remove the commented-out code that has tagged along forever :-)
Previously, h5toms created variable length data columns when creating a MeasurementSet. Catering for this variability means the MS's are not performant. This PR
DATA
,MODEL_DATA
,CORRECTED_DATA
,FLAG
,FLAG_CATEGORY
,IMAGING_WEIGHTS
,WEIGHTS
andSIGMA
, as well as the original hypercolumn definitions that were bases on these columns.Depends on casacore/python-casacore#72.