Data Quality and Exclusion Criteria in Osprey’s LCModel Output

Hello MRS Experts,

I am currently working with Osprey’s in-built LCModel for processing data from a 3T Siemens MEGA-PRESS sequence, focusing on quantifying Glx and GABA+. While Osprey has made the analysis more approachable, I have encountered some issues regarding data quality and exclusion criteria. I would greatly appreciate any insights or recommendations from the community.

1. FWHM of Spectra

According to Wilson et al. (2019), a linewidth of less than 0.1 ppm is recommended for accurate metabolite quantification. In my dataset, the Cr_FWHM extracted from the QM_processed_spectra for one subject is 12.97 Hz, which I calculated as approximately 0.105 ppm. However, the LCModel output for the same session shows differing values: 0.103 ppm in the Diff spectrum and 0.095 ppm in the A spectrum.
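For reference, the Hz-to-ppm conversion above can be sketched as follows. The transmitter frequency of 123.25 MHz is an assumed nominal value for 1H at 3T; the actual value should be read from the data header. The function name is hypothetical.

```python
# Assumed nominal 1H transmitter frequency at 3T, in MHz.
# This is an assumption; read the real value from your data header.
F0_MHZ = 123.25


def hz_to_ppm(fwhm_hz, f0_mhz=F0_MHZ):
    """Convert a linewidth in Hz to ppm (1 ppm equals f0_mhz Hz)."""
    return fwhm_hz / f0_mhz


cr_fwhm_ppm = hz_to_ppm(12.97)
print(f"Cr FWHM: {cr_fwhm_ppm:.3f} ppm")
```

With these inputs the result is about 0.105 ppm, matching the value quoted above.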

Question : Which FWHM value should be used for comparison against the 0.1 ppm criterion? Does this threshold apply to the Diff spectrum, the A spectrum, or both?

2. Cr Frequency Stability

In Song et al. (2024), the Cr frequency stability metric is reported with an SD less than 0.2. I’ve noticed that Osprey provides a “freqshift” metric within QM_processed_spectra, and LCModel outputs a “Data Shift” value in the diff1.table and A.table files.

Question : How is the Cr frequency stability metric typically calculated? Should I use the SD of the “freqshift” from Osprey’s QM_processed_spectra or the SD of the LCModel “Data Shift” values? Are these reported in ppm?

3. GABAplus Macromolecule Modeling

The default LCModel script uses the ‘3to2MM’ setting for macromolecule fitting in the GABA+ range, but this results in nearly half of my GABA estimates being 0. I switched to the ‘1to1GABAsoft’ model and added a 0.4 ppm baseline knot spacing, which improved the fits significantly. Now, only one subject has a GABA estimate of 0.

Question : Should I consider using the ‘1to1GABAhard’ model instead? It provides non-zero GABA estimates across all subjects, but this setting removes the GABA+ column in the Diff1 spectra quantification. Is this trade-off acceptable, or might it introduce other issues?

4. Metabolite-Specific FWHM and SNR

Unlike Gannet, I cannot find metabolite-specific FWHM and SNR values (e.g., for GABAplus or Glx) in Osprey. Rather, I only have access to the absolute CRLB values for each metabolite.

Question : Are metabolite-specific FWHM and SNR not available in Osprey’s LCModel output, or am I missing something? Is there an alternative way to extract these values?

5. Visual Inspection and Spectral Quality

I have identified one subject with notably poor spectral quality: the FWHM is unusually high compared to other sessions, and the chemical shift drift plot shows scattered points.

Question : Is there any additional processing I can apply to salvage this data, or should I exclude this subject entirely? Would you recommend any specific steps for improving the data quality in such cases?

Song et al. (2024). Brain glutathione and GABA+ levels in autistic children. Autism Research, 17(3), 512–528.
Wilson et al. (2019). Methodological consensus on clinical proton MRS of the brain: Review and recommendations. Magnetic Resonance in Medicine, 82(2), 527–550.

Question : Which FWHM value should be used for comparison against the 0.1 ppm criterion? Does this threshold apply to the Diff spectrum, the A spectrum, or both?

The criterion applies to any spectrum since the linewidth is (largely) a reflection of the shim quality. LCModel’s FWHM estimate only contains the contribution from its estimated lineshape convolution kernel (and does not include the natural Lorentzian component); Osprey estimates the FWHM directly from the data, so I’d expect the Osprey estimates to be slightly larger. I would generally not treat the 0.1-ppm criterion as a super-hard binary cut-off.

Question : How is the Cr frequency stability metric typically calculated? Should I use the SD of the “freqshift” from Osprey’s QM_processed_spectra or the SD of the LCModel “Data Shift” values? Are these reported in ppm?

Neither. The two values you mention describe a shift that is applied during an initial modeling step to account for small shifts between the averaged spectrum and the basis set. What you’re looking for is the Cr frequency shift trajectory, which is stored in MRSCont.QM.drift.pre (before the frequency/phase corrections are applied). (I don’t think we’re currently storing these in the output QM tables, which we probably should do.)
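As a minimal sketch of the stability metric, assuming the per-transient Cr frequency values from MRSCont.QM.drift.pre have been exported from MATLAB into a plain list for one dataset: the function name and the example numbers below are made up for illustration, and the units of the exported values (ppm vs. Hz) should be checked before comparing against a published threshold.

```python
import statistics


def cr_frequency_stability(drift):
    """Return the sample SD of per-transient Cr frequency values.

    `drift` is assumed to hold one value per transient, exported
    from MRSCont.QM.drift.pre for a single dataset.
    """
    if len(drift) < 2:
        raise ValueError("need at least two transients to compute an SD")
    return statistics.stdev(drift)


# Hypothetical example values, not real data.
example_drift = [3.020, 3.025, 3.018, 3.030, 3.022, 3.027]
sd = cr_frequency_stability(example_drift)
print(f"Cr frequency SD: {sd:.4f}")
```

The SD is computed per dataset over transients, not across subjects.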

Question : Should I consider using the ‘1to1GABAhard’ model instead? It provides non-zero GABA estimates across all subjects, but this setting removes the GABA+ column in the Diff1 spectra quantification. Is this trade-off acceptable, or might it introduce other issues?

For MEGA, use 0.55 ppm baseline knot spacing and the 3to2MM setting (see “Comparison of linear combination modeling strategies for edited magnetic resonance spectroscopy at 3 T”). For this setting, do not report GABA, but report GABA+. GABA and MMs cannot be reliably estimated independently.

Question : Are metabolite-specific FWHM and SNR not available in Osprey’s LCModel output, or am I missing something? Is there an alternative way to extract these values?

Gannet uses an entirely different fitting model (no basis functions, just fitted peaks). FWHM and SNR are, to a first approximation, qualities of a dataset (not of single peaks), so there is no such metric for linear-combination modeling methods (and you do not need them).

Question : Is there any additional processing I can apply to salvage this data, or should I exclude this subject entirely? Would you recommend any specific steps for improving the data quality in such cases?

Exclude. Low SNR, high linewidth, no discernible edited GABA peak… this is not going to be informative. What region is this from?

Generally, I’d encourage you to show representative data and fits. I don’t have a clear sense of how good your data are.

Hi Admin,

Thank you so much for your detailed response—it’s been incredibly helpful for those of us relatively new to MRS. I also apologize for the delay in following up, as I’ve been testing new parameters for my data and organizing my thoughts. We would be very grateful if you could respond as quickly as you did last time.

Representative Data

All spectra were acquired from the mPFC region, with each subject undergoing two sessions.

Upon review, I noticed that the scan parameters for my MEGA-PRESS data weren’t optimal. Specifically, I found a delta frequency of -2.8 ppm in the water reference scan, a low TR (1500 ms), and a low number of averages (128), which differ from the consensus recommendations (e.g., Peek et al., 2023).

Despite this, I’m aiming to make the best of the available data. I tested various parameter combinations and found that changing opts.SpecReg from RobSpecReg to RestrSpecReg was effective. This adjustment, alongside using opts.fit.range [0.5 - 4 ppm] for fitting, generally produced smoother spectra for noisier sessions. It also reduced the overall residual (31.0% vs. 31.7%).

However, there are still cases where RestrSpecReg fails to produce smooth spectra.

You can view examples of the improvements here:
RestrSpecReg_Improvement

As requested, I’ve attached some representative data, which includes concatenated output files for both mean and individual session data:
Overview_Data

MM Fitting Strategies

I also have some additional questions regarding signal quality and macromolecule (MM) fitting. I reran the analysis with the 0.55 ppm baseline knot spacing and compared two fitting strategies: 3to2MM and 1to1GABAsoft.

In the 3to2MM model, the macromolecule signal seems to override the expected GABA signal around 3 ppm. Below is a comparison of the fitted difference spectra for the 3to2MM, 1to1GABAsoft, and 1to1GABA models. The red fitted curve for 3to2MM shows only one bump at 3 ppm, whereas the 1to1GABAsoft model shows three bumps, which seems more consistent with the green average. Interestingly, the residuals for 3to2MM (30.0%) are slightly lower than for 1to1GABAsoft (31.0%). The 1to1GABAhard model, while missing the middle bump, has a lower residual than 1to1GABAsoft (31.6%).

You can see the fitting comparison here:
MM_fitting / Fitting Method

I’ve also attached a distribution of individual data points comparing the 3to2MM and 1to1GABAsoft fits (by Max-Min normalization). In about half of the cases under 3to2MM, the GABA concentration is zero, and the GABA CRLB value is set to 999. In contrast, with the 1to1GABAsoft fit, only one data point has a GABA CRLB value of 999, indicating a fit failure. However, two GABA+ CRLB values and one Glx value under 1to1GABAsoft are beyond the 2 SD range.

Distribution for 1to1GABAsoft fitting

Distribution for 3to2MM fitting

Additionally, I’ve uploaded two individual case fitting graphs for your review:
MM_Fitting / Clean_Compare

Each PDF page contains two spectra: 1to1GABAsoft on the left and 3to2MM on the right. These cases have smoother spectra compared to others, and as you can see, the 3to2MM model doesn’t utilize the GABA component for fitting at 3 ppm, being completely dominated by the MM09 peak.

Lastly, referencing a previous forum post, some users have found that 1to1GABAsoft might be more suitable for MEGA-PRESS data. Would this also apply to my case?
Forum Reference

CRLB Screening

We are applying a CRLB ± 2 SD cutoff to filter out participants with extreme data uncertainty. Among the GABA+ CRLB values, two clear outliers stand out. I’ve attached these flagged cases for your review, along with their respective fits under both the 3to2MM and 1to1GABAsoft models. Would you recommend excluding these data points?
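The screening rule described above can be sketched roughly as follows, assuming one CRLB value per dataset and a cutoff at the group mean ± 2 SD. The function name and the example values are hypothetical; fit-failure codes such as 999 would distort the mean and SD and are assumed to have been removed beforehand.

```python
import statistics


def flag_crlb_outliers(crlb, n_sd=2.0):
    """Return indices of values outside mean +/- n_sd * sample SD."""
    mean = statistics.mean(crlb)
    sd = statistics.stdev(crlb)
    lower = mean - n_sd * sd
    upper = mean + n_sd * sd
    return [i for i, value in enumerate(crlb) if value < lower or value > upper]


# Hypothetical CRLB values (%), not real data; the last one is extreme.
example_crlb = [8.0, 9.0, 10.0, 9.5, 8.5, 9.0, 10.5, 9.0, 8.0, 30.0]
print(flag_crlb_outliers(example_crlb))
```

Note that extreme values inflate the SD they are tested against, so with small samples this rule can miss outliers that a visual check would catch.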

Problem_Spectra / CRLB_Flagged

Potentially Noisy Spectra Without CRLB Flags

We’ve also identified several spectra that we are unsure about in terms of quality, even though they weren’t flagged by CRLB values. We’ve categorized these spectra to make it easier for you to assess:

  • Sharp Spikes and Troughs
    Some spectra exhibit rapid, sharp spikes and troughs. Are these spectra acceptable for analysis?
    Problem_Spectra / Noisy GABA
  • Glx Unwanted Oscillations
    For Glx fitting at ~3.7 ppm, some spectra show unwanted oscillations. Are these out-of-voxel artifacts that should be discarded?
    Problem_Spectra / Noisy Glx
  • Baseline Spikes
    In some cases, the baseline between Glx and GABA peaks includes large spikes. Are these spectra suitable for analysis?
    Problem_Spectra / Noisy Baseline

Any insights or recommendations on how to proceed with these cases would be greatly appreciated. Should we exclude certain spectra based on these irregularities, even if they haven’t been flagged by CRLB?

Thanks again for all your help!

Hi @timo0302,

What is your subject group? The references you give deal with some special cases (autistic children, clinical applications) where it can be more challenging to get good quality data. For autistic children, some extra steps to ensure the subject is at ease in the scanner can go a long way to improve the data quality – but even so, I would expect to lose a few datasets to quality issues.

While maybe not “optimal”, these factors won’t explain any of the issues you present here:

  • Delta frequency for the water reference scan may affect the absolute quantification scaling, but shouldn’t alter the metabolite spectrum itself.
  • Lower TR may give you slightly less signal and a slightly different contribution of macromolecule signal to the GABA peak.
  • The lower number of averages will slightly degrade SNR and (hence) reproducibility.

On first inspection, I would say most of the issues you present most likely relate to subject motion and placement/shim quality.

When it comes to quality control, simply thresholding on basic metrics isn’t sufficient to catch all problematic spectra. Practices and opinions vary a bit, but removal of clear outliers is often justifiable; I would reject these. Visual inspection is also important to catch the cases which the basic metrics don’t identify, so yes, you should certainly consider rejecting spectra with clear irregularities.

As @admin mentioned, if the edited GABA peak (around 3 ppm) isn’t readily discernible then the data won’t be informative. I would reject most of the examples you supplied, since the GABA is either not clearly discernible, or very oddly shaped. It looks like you have a decent number of subjects, so you can afford to lose a few which are only contributing uncertainty.

Thanks @alex for your prompt response. I took some time to check with the experimenters involved in the MRI scanning, and here’s a bit more context regarding the subject characteristics and scan conditions:

Subject Characteristics

The participants are university students who underwent two scanning sessions: one 30 minutes before and one 30 minutes after treadmill running. The total scan duration, including both MRS and fMRI, was about an hour. The fMRI data, in this case, showed acceptable movement and quality, so we don’t expect significant motion artifacts in the MRS data either.

Interestingly, running probably didn’t negatively impact the data quality at Time 2, as the CRLB for GABA+ at Time 2 appears to be lower than at Time 1, even after excluding the two extreme values in Time 1. I’ve attached scatter plots showing the CRLB differences between the two scans for reference:

The full dataset

The dataset with extreme value excluded (circled in red)

In case of subject movement, the sequence was repeated and only the best acquisition was kept. However, it was the first time our radiographer had acquired MRS in the ACC region. I am attaching some of the voxel placements from our own scans here as well. voxel_placement

Unfortunately, we only have 20 to 30 subjects, each scanned twice, and I have only shown a few example spectra for each type of noisy data. After looking at spectra from other Osprey users, I noticed that my current data are not as consistent. MM_Fitting / Clean_Compare

What would be your judgement on the three spectra in Clean_Compare? Are they all of acceptable quality, or is only the first spectrum acceptable?

Other Files

I am also attaching other files that might be helpful in resolving the data quality issue, including the job file, and the relevant part of the scan parameter.

Defaced_GABA_Restr_1to1soft_055_jobSDAT_MEGA_LCModel.m (20.1 KB)
MRS_scan_protocol_only.pdf (65.3 KB)

Lastly, I compared the QM_processed_spectra files for the example data and my own data. SNR and FWHM do not differ much, but residual_water_ampl ranges from 0 to 6 in our data (vs. ~0.5 in the example data). The relResA column differs by orders of magnitude: it is between 1600000 and 3500000 in my data (vs. ~3 in the example data). What does that mean?

Thanks for your help!

Timo