Output for GE p-files with identical names gets overwritten

ver003 · October 18, 2022, 10:55am

Hi,

I’ve run into a problem using GE p-files. As the file names are being reused, I have several files with the same name (but stored under individual IDs). When I went through the PDFs in the output to verify voxel placement, I discovered that the PDFs are being overwritten for p-files with the same name. Is there a way I could avoid this without changing all of my p-file names?

Thanks,
Vera

admin · October 18, 2022, 12:49pm

Hi Vera,

I heard about the exact same issue from a collaborator last week. This is difficult to fix, because it is impossible to predict at what depth of an arbitrary folder organization the file IDs become unique. For now, I’d indeed suggest renaming the files (this is good practice anyway). You could also consider transitioning to BIDS-MRS (BIDS Extension Proposal 22, BEP022: Magnetic Resonance Spectroscopy).

Best,
Georg

mmikkel · October 19, 2022, 12:02am

Hi @ver003,

For what it’s worth, GE scanners name P-files by run number, which is incremented by 512 from 00000 to 64512 and then cycles back to 00000, so names are bound to repeat sooner or later.
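
To make the arithmetic concrete, here is a small Python sketch (illustration only; the constant and function names are made up for this example) that enumerates the run numbers described above and shows where the cycle wraps:

```python
# Illustration of the run-number cycle described above.
# STEP, LAST and next_run_number are hypothetical names for this sketch.
STEP = 512
LAST = 64512

# All distinct run numbers, zero-padded to five digits.
run_numbers = [f"{n:05d}" for n in range(0, LAST + 1, STEP)]


def next_run_number(current: int) -> int:
    """Return the run number following `current`, wrapping after LAST."""
    return (current + STEP) % (LAST + STEP)


print(len(run_numbers))        # number of distinct run numbers in one cycle
print(run_numbers[:3])         # first few run numbers
print(next_run_number(LAST))   # wraps back to the start of the cycle
```

Since 64512 / 512 = 126, one cycle holds only 127 distinct run numbers, so a collection of exported files will contain repeated names fairly quickly.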

As @admin said, it’s always best to rename P-files (as soon as you export them, imo) to something interpretable.

Mark

alex · October 19, 2022, 11:34am

Hi @admin,

I agree that renaming/re-organising per BIDS would be preferable – but I’m curious how this issue would manifest with data pooled across sites/projects, where the BIDS-recommended naming will also yield conflicting names? Taking the familiar Big GABA dataset as an example, there’s basically perfect overlap in the filenames across sites… so do these need to be processed in separate batches if you want to inspect the PDF output?

Do I understand correctly, it’s only the intermediate output files (PDFs and such) which are affected, so at least the final results table should be consistent?

Hi @ver003, I have a couple of ideas for a workaround in your case, which I’ll try to implement this afternoon; maybe we can test next week?

(possible quick fix: appending a hash of the full path; probable better solution: identifying common root folder and prepending non-common components to the output file name)
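
A rough Python sketch of both ideas, just to illustrate the logic (this is not the code of any existing tool; the function names and example paths are hypothetical, and POSIX-style absolute paths are assumed):

```python
import hashlib
import posixpath


def name_with_hash(full_path: str, n_chars: int = 8) -> str:
    """Quick fix: append a short hash of the full path to the file name."""
    digest = hashlib.sha1(full_path.encode("utf-8")).hexdigest()[:n_chars]
    return f"{posixpath.basename(full_path)}_{digest}"


def names_with_unique_prefix(full_paths: list[str]) -> list[str]:
    """Alternative: drop the common root folder and keep the rest of the path.

    The remaining folder components are joined with underscores and
    prepended to the file name.
    """
    root = posixpath.commonpath(full_paths)
    return [posixpath.relpath(p, root).replace("/", "_") for p in full_paths]


# Hypothetical example: same file name stored under two different IDs.
paths = ["/exports/ID001/P00000.7", "/exports/ID002/P00000.7"]
print([name_with_hash(p) for p in paths])
print(names_with_unique_prefix(paths))
```

The hash variant is stable for a given path but not human-readable; the common-root variant is readable but depends on which other files are in the list.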

Alex.

admin · October 20, 2022, 12:58pm

Yeah, the Big GABA dataset in its current form is not BIDS-compliant for many reasons. There is no site level in the BIDS specification, so every subject needs an ID that is unique across all sites.

Multi-site or multi-center studies

This version of the BIDS specification does not explicitly cover studies with data coming from multiple sites or multiple centers (such extension is planned in BIDS 2.0). There are however ways to model your data without any loss in terms of metadata.

Option 1: Treat each site/center as a separate dataset

The simplest way of dealing with multiple sites is to treat data from each site as a separate and independent BIDS dataset with a separate participants.tsv and other metadata files. This way you can feed each dataset individually to BIDS Apps and everything should just work.

Option 2: Combining sites/centers into one dataset

Alternatively you can combine data from all sites into one dataset. To identify which site each subject comes from you can add a site column in the participants.tsv file indicating the source site. This solution allows you to analyze all of the subjects together in one dataset. One caveat is that subjects from all sites will have to have unique labels. To enforce that and improve readability you can use a subject label prefix identifying the site. For example sub-NUY001, sub-MIT002, sub-MPG002 and so on. Remember that hyphens and underscores are not allowed in subject labels.
(https://bids-specification.readthedocs.io/en/stable/06-longitudinal-and-multi-site-studies.html)
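
As a minimal sketch of Option 2 (Python, for illustration only; the helper names and the example site codes and IDs are assumptions, not part of the BIDS tooling), site-prefixed labels and a matching participants.tsv could be generated like this:

```python
import csv
import io
import re

# Subject labels may only contain letters and digits (no hyphens/underscores).
LABEL_OK = re.compile(r"^[A-Za-z0-9]+$")


def site_prefixed_id(site: str, local_id: str) -> str:
    """Build a participant ID such as 'sub-NUY001' from a site code and local ID."""
    label = f"{site}{local_id}"
    if not LABEL_OK.match(label):
        raise ValueError(f"label {label!r} must be alphanumeric")
    return f"sub-{label}"


def participants_tsv(subjects: list[tuple[str, str]]) -> str:
    """Return participants.tsv content with a 'site' column (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id", "site"])
    for site, local_id in subjects:
        writer.writerow([site_prefixed_id(site, local_id), site])
    return buf.getvalue()


print(participants_tsv([("NUY", "001"), ("MIT", "002"), ("MPG", "002")]))
```

The check in `site_prefixed_id` rejects labels such as `NUY-001` early, before they end up in folder or file names.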

We used to routinely append folder names parsed from two levels above the file name itself to most output files (apparently not for the masks @Helge @Chris_Davies-Jenkins) - the problem is that this generates really unwieldy redundant file names. There’s probably a smarter way to do this, as you suggest, but it is really impossible to predict how users organize their files. Happy to hear suggestions…

Cheers,
Georg

alex · October 20, 2022, 1:31pm

Thanks for the info.

I agree that unwieldy redundant filenames are best avoided; I’m testing an alternative which might work (I’ll post a suggestion here once it’s a little more refined), but the logic would be something like:

If there are duplicated filenames in the batch:
- Split the full path for each batch item into individual components, arrange into columns
- Drop columns with only one unique value (i.e., remove any redundant/common prefixes)
- Drop columns matching BIDS patterns (sub-<label>, ses-<label>) which also appear in the filename
- Keep everything else (any non-shared prefixes, which might include site, project, or non-BIDS-conformant subject folders) and prepend it to the eventual filename.
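
The steps above could be sketched roughly as follows (Python, illustration only and not the implementation under test; the function name, the underscore separator, the left-aligned columns, and the example paths are all assumptions of this sketch):

```python
import re
from pathlib import PurePosixPath

# Folder names that look like BIDS subject/session entities.
BIDS_ENTITY = re.compile(r"^(sub|ses)-[A-Za-z0-9]+$")


def disambiguate(full_paths: list[str]) -> list[str]:
    """Derive output names, adding folder prefixes only if names collide."""
    paths = [PurePosixPath(p) for p in full_paths]
    names = [p.name for p in paths]
    if len(set(names)) == len(names):
        return names  # no duplicates in this batch: keep plain file names

    # Split parent folders into components; pad so all rows have equal width.
    rows = [list(p.parent.parts) for p in paths]
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]

    # Keep only columns with more than one unique value.
    keep = [c for c in range(width) if len({r[c] for r in rows}) > 1]

    out = []
    for row, name in zip(rows, names):
        prefix = [
            row[c] for c in keep
            # Skip padding, and BIDS folders already encoded in the file name.
            if row[c] and not (BIDS_ENTITY.match(row[c]) and row[c] in name)
        ]
        out.append("_".join(prefix + [name]))
    return out


# Hypothetical batch with the same file name under two site folders.
batch = [
    "/data/siteA/sub-01/sub-01_press.7",
    "/data/siteB/sub-01/sub-01_press.7",
    "/data/siteB/sub-02/sub-02_press.7",
]
print(disambiguate(batch))
```

With this batch, the site folder is prepended while the redundant `sub-01` folder is dropped. Calling `disambiguate` on only the first and third path returns the plain file names, since nothing collides in that smaller batch.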

For most cases this is a long-winded way of reaching the same filename – but specifically in cases of duplicates in separate folders, those non-redundant leading components are prepended.

The major drawback of this approach is that it’s less directly predictable: in principle, a different output filename might be derived for the same spectrum depending on what else was in the processing batch with it…