
Argument list too long #245

Open
tjgalvin opened this issue Aug 11, 2023 · 2 comments

Comments

@tjgalvin

Hi all,

A slightly strange issue has popped up that has left me scratching my head.

I was processing a collection of measurement sets in a pipeline. There is a stage early on that iterates over rows in the data table of a single measurement set, updates the visibilities after applying a rotation correction, and writes them back out. This happens in chunks. The code is available here: https://github.com/AlecThomson/FixMS/blob/main/fixms/fix_ms_corrs.py#L264
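
For readers who do not want to open the link, the chunked read, correct, write, flush pattern described above can be sketched roughly as follows. This is not the FixMS code: `update_in_chunks`, `InMemoryTable` and the `correct` callback are hypothetical names made up for illustration, and the in-memory class only mimics the `nrows`, `getcol`, `putcol` and `flush` methods of a python-casacore table so the sketch runs without a measurement set.

```python
class InMemoryTable:
    """Hypothetical stand-in mimicking the python-casacore table methods used below."""

    def __init__(self, data):
        self._cols = {"DATA": list(data)}
        self.flush_count = 0

    def nrows(self):
        return len(self._cols["DATA"])

    def getcol(self, columnname, startrow=0, nrow=-1):
        col = self._cols[columnname]
        stop = len(col) if nrow < 0 else startrow + nrow
        return col[startrow:stop]

    def putcol(self, columnname, value, startrow=0, nrow=-1):
        # Overwrite exactly as many rows as were supplied.
        self._cols[columnname][startrow:startrow + len(value)] = value

    def flush(self):
        # A real table would push buffered data to disk here.
        self.flush_count += 1


def update_in_chunks(tab, correct, chunksize, column="DATA"):
    """Read `column` in chunks, apply `correct` to each value, write back, flush."""
    start_row = 0
    nrows = tab.nrows()
    while start_row < nrows:
        nrow = min(chunksize, nrows - start_row)
        chunk = tab.getcol(column, startrow=start_row, nrow=nrow)
        corrected = [correct(v) for v in chunk]  # placeholder for the real correction
        tab.putcol(column, corrected, startrow=start_row, nrow=len(corrected))
        tab.flush()  # flush after every chunk, as in the traceback below
        start_row += len(corrected)
    return start_row


tab = InMemoryTable([1 + 0j] * 5)
rows_done = update_in_chunks(tab, lambda v: v * 1j, chunksize=2)
print(rows_done, tab.flush_count)
```

With a real measurement set the stand-in would be replaced by a table opened for writing, and the correction would act on whole visibility arrays rather than single values.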

Recently I was running a hefty series of jobs and stumbled on this error:

Encountered exception during execution:
Traceback (most recent call last):
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/fixms/fix_ms_corrs.py", line 330, in fix_ms_corrs
    tab.flush()
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 557, in flush
    self._flush(recursive)
RuntimeError: FiledesIO::write - write error in /scratch3/gal16b/split/39403/2022-04-14_110035_18.RACS.0748-43.ms/table.f1: Argument list too long

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/engine.py", line 1719, in orchestrate_task_run
    result = await call.aresult()
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 292, in aresult
    return await asyncio.wrap_future(self.future)
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/prefect/_internal/concurrency/calls.py", line 316, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
  File "/scratch3/gal16b/packages/flint/flint/ms.py", line 473, in preprocess_askap_ms
    fix_ms_corrs(
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/fixms/fix_ms_corrs.py", line 331, in fix_ms_corrs
    start_row += len(data_chunk_cor)
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 406, in __exit__
    self.close()
  File "/scratch3/gal16b/mambaforge/envs/flint/lib/python3.8/site-packages/casacore/tables/table.py", line 574, in close
    self._close()
RuntimeError: FiledesIO::write - write error in /scratch3/gal16b/split/39403/2022-04-14_110035_18.RACS.0748-43.ms/table.f1: Argument list too long

I am unsure what to make of this. I have rerun my pipeline on a smaller dataset that included this measurement set and found no issue. The specific error, Argument list too long, reads as though there was some interaction with a shell when trying to flush the buffers to disk, as if a large cp or rm command were being executed.

Would you happen to have any insight into this and the underlying behavior of the close and flush of a casacore table? Is there a set of temporary files stored somewhere, say in /dev/shm or the current working directory, that are examined? I am at a total loss as to where else to look, and it is not clear to me whether this is actually a python-casacore issue, a casacore issue or something else entirely.

@rtobar
Contributor

rtobar commented Aug 11, 2023

The error is coming from https://github.com/casacore/casacore/blob/5a8df94738bdc36be27e695d7b14fe949a1cc2df/casa/IO/FiledesIO.cc#L100-L104. This is a simple write(2) call which in principle shouldn't result in an E2BIG errno value. I suspect the underlying filesystem of /scratch3 (which one is it, do you know?) is complaining about something during the write, resulting in that non-standard error value for write.
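
To illustrate that the message is only the standard text attached to the E2BIG errno value, and not something a shell produced, here is a small sketch using only the Python standard library. The helper name `describe_errno` is made up for this example.

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and the C library's message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_errno(errno.E2BIG)
print(f"{name}: {message}")
```

On Linux with glibc the message for E2BIG is "Argument list too long", which matches the text at the end of the RuntimeError. Whatever sets errno during the write determines the wording, so the filesystem or its driver is a plausible source.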

@tjgalvin
Author

tjgalvin commented Aug 11, 2023 via email
