Data augmentation techniques for MRS signals: Seeking expert advice on synthesis range

MATEUS · February 1, 2023, 4:24pm

Hello MRSHub Forum community,

I am looking for guidance on data augmentation techniques that can be used to represent the real distribution of MRS signals. I have a list of techniques I have found, but I am unsure about the recommended synthesis range for each.

Here is the list of techniques I have come across:

Zero-order phase
First-order phase
Amplitude scaling
Time shifting
Frequency shifting
Additive noise
Baseline distortion
Exponential/Gaussian line broadening

I would greatly appreciate it if you could share your knowledge on the recommended synthesis range for each technique. For example, I have read that zero-order phase correction could range somewhere between -90 and 90 degrees and exponential line broadening between 0 and 50 Hz.

It would also be valuable if you could share with me some articles and other didactic materials that you think would be helpful for me to get a deeper understanding of this topic.

Thank you in advance for your help and insights!

mmikkel · February 2, 2023, 12:38am

Hi @MATEUS,

By data augmentation, do you mean physical parameters that modulate the (true) MR(S) signal? Or are you developing a training model for deep learning?

Zero-order phasing could technically range from -180 to 180 degrees.

Line-broadening up to 50 Hz is unrealistic for empirical data (unless something goes extremely wrong). For simulations, I would set the max to around 20 Hz.
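
As a rough illustration of the two ranges above, here is a minimal NumPy sketch that draws a zero-order phase from -180 to 180 degrees and an exponential line broadening from 0 to 20 Hz, then applies both to a time-domain FID. The function name `phase_and_broaden`, the toy single-peak FID, and the acquisition parameters (2048 points, 2000 Hz spectral width) are assumptions made for the example, not values from this thread. The uniform sampling is also just one possible choice.

```python
import numpy as np

def phase_and_broaden(fid, dwell_time, phase_deg, lb_hz):
    """Apply a zero-order phase (degrees) and exponential line broadening (Hz) to a 1D FID."""
    t = np.arange(fid.shape[-1]) * dwell_time    # time axis in seconds
    phase = np.exp(1j * np.deg2rad(phase_deg))   # constant phase rotation
    decay = np.exp(-np.pi * lb_hz * t)           # adds lb_hz to the Lorentzian FWHM
    return fid * phase * decay

# Hypothetical acquisition parameters and a toy single-peak FID, for illustration only.
n_points, dwell_time = 2048, 1 / 2000.0
t = np.arange(n_points) * dwell_time
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.1)

rng = np.random.default_rng(0)
phase_deg = rng.uniform(-180.0, 180.0)   # full zero-order phase range
lb_hz = rng.uniform(0.0, 20.0)           # capped at 20 Hz for simulations
augmented = phase_and_broaden(fid, dwell_time, phase_deg, lb_hz)
```

Gaussian broadening could be sketched the same way by swapping the decay term for a Gaussian envelope in `t`.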

Here are some papers that might help you:

https://doi.org/10.1002/(SICI)1099-1492(199906)12:4<205::AID-NBM558>3.0.CO;2-1
https://doi.org/10.1002/nbm.1122
https://doi.org/10.1002/mrm.28525
https://doi.org/10.1016/j.jmr.2019.05.002
https://doi.org/10.1016/j.jmr.2020.106732
https://doi.org/10.1002/nbm.4410

MATEUS · February 2, 2023, 1:35pm

Hi @mmikkel,

I am developing a training model for Deep Learning.

I haven’t found any common practices on the Forum for applying Deep Learning models to MRS. Which techniques, and which ranges for each, do you usually use?

Thank you very much for sharing the papers, and for your attention and support.

mmikkel · February 2, 2023, 9:31pm

I don’t think standard practices for deep learning of MRS have been established yet.

These papers might be of help:

https://onlinelibrary.wiley.com/doi/full/10.1002/mrm.27166
https://onlinelibrary.wiley.com/doi/full/10.1002/mrm.27096
https://doi.org/10.1007/978-3-030-00928-1_53
https://onlinelibrary.wiley.com/doi/full/10.1002/mrm.28234
https://doi.org/10.1002/mrm.29561

agudmund · February 5, 2023, 9:12am

Hi @MATEUS
The list you have is great! Additionally, you could consider:

- Truncation:
The standard number of points in an MRS acquisition is typically 1024, 2048, or 4096. However, you could push your network to perform with, for instance, 256 or 512 points.
- Residual Water:
You may already be including a residual water signal. If so, it is important to include both positive and negative residual water signals. You may also want to consider a multi-component water signal (10.1002/mrm.27824).
- Macromolecules:
It’s worth taking some time to consider how you’ll incorporate MM into your dataset. Perhaps start with the consensus article (10.1002/nbm.4393).
- Artifacts:
Including artifacts is another way to augment your data. Take a look at Roland Kreis’ article (10.1002/nbm.891).
- Leave Signals Out:
While this is not a situation you are likely to encounter with in-vivo data, leaving out metabolite signals, especially tCr, tCho, and tNAA, will challenge your network not to rely too heavily on any one signal.
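
Three of the ideas in this list (truncation, signed residual water, and leaving signals out) can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions. The helper names, the toy basis signals with arbitrary frequency offsets, the 10% drop probability, and the maximum water amplitude are all hypothetical and would need to be replaced with a real basis set and ranges suited to your data.

```python
import numpy as np

def truncate(fid, n_keep):
    """Keep only the first n_keep time-domain points."""
    return fid[..., :n_keep]

def combine_with_dropout(basis, amplitudes, rng, drop_prob=0.1):
    """Sum weighted basis FIDs, leaving each one out with probability drop_prob."""
    kept = [name for name in basis if rng.random() >= drop_prob]
    n_points = len(next(iter(basis.values())))
    total = np.zeros(n_points, dtype=complex)
    for name in kept:
        total += amplitudes[name] * basis[name]
    return total, kept

def add_residual_water(fid, water_fid, max_amp, rng):
    """Add a residual water signal with a random sign and amplitude."""
    sign = rng.choice([-1.0, 1.0])
    return fid + sign * rng.uniform(0.0, max_amp) * water_fid

# Toy signals for illustration only: arbitrary offsets, not real chemical shifts.
n_points, dwell_time = 2048, 1 / 2000.0
t = np.arange(n_points) * dwell_time
toy_offsets_hz = {"tNAA": -330.0, "tCr": -210.0, "tCho": -185.0}
basis = {k: np.exp(2j * np.pi * f * t) * np.exp(-t / 0.1) for k, f in toy_offsets_hz.items()}
amplitudes = {"tNAA": 1.0, "tCr": 0.8, "tCho": 0.3}   # hypothetical weights
water_fid = np.exp(-t / 0.05)                         # on-resonance toy water signal

rng = np.random.default_rng(0)
fid, kept = combine_with_dropout(basis, amplitudes, rng, drop_prob=0.1)
fid = add_residual_water(fid, water_fid, max_amp=5.0, rng=rng)
short_fid = truncate(fid, 512)   # e.g. 256 or 512 instead of 2048 points
```

Returning `kept` alongside the signal makes it easy to set the training targets of the dropped metabolites to zero.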

Just a few points about your list so far:
- Zero-order Phase:
I agree with @mmikkel; I would include zero-order phase from -π through π.
- First-order phase:
I might stick with -0.20 through 0.20 radians… but I think I’ve gone as high as -0.31 through 0.31 radians. Choosing a range of pivot points is also something you could consider. Pivoting on water (~4.7 ppm) is a natural choice, but it depends on the pulse sequence.
- Amplitude Scaling:
Do you mean representing the amount of each metabolite, or scaling (standard/min-max) your inputs for the network? Unfortunately, I’m not sure there is a good answer for the former. There are likely different ranges you want to consider for each metabolite, especially if you want to capture differences in disease states. Lactate is the first that comes to mind: it can range from nonexistent in healthy tissue to a strong signal in tumors.
- Line Broadening:
I also agree with @mmikkel: 20 Hz is generally the max I use.
- Frequency Shift:
Just to mention this as well: frequency shifts of -20 Hz through 20 Hz are probably safe. I typically go out to -40 Hz through 40 Hz to challenge my networks a bit more.
- Noise:
SNR can vary greatly, but the range to use will depend on how you define SNR in your model.
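
The frequency shift, first-order phase, and noise points can be sketched as follows in NumPy. Several choices here are assumptions rather than anything stated in this thread: the first-order phase is treated as radians per ppm about the pivot, SNR is defined as the magnitude of the first FID point divided by the noise standard deviation per channel, and the function names, the toy FID, and the ppm axis are hypothetical.

```python
import numpy as np

def frequency_shift(fid, dwell_time, shift_hz):
    """Shift all resonances by shift_hz by modulating the time-domain signal."""
    t = np.arange(fid.shape[-1]) * dwell_time
    return fid * np.exp(2j * np.pi * shift_hz * t)

def first_order_phase(spectrum, ppm_axis, phi1, pivot_ppm=4.7):
    """Apply a linear phase across the spectrum (assumed units: radians per ppm)."""
    return spectrum * np.exp(1j * phi1 * (ppm_axis - pivot_ppm))

def add_noise(fid, snr, rng):
    """Add complex Gaussian noise; SNR is assumed to be |fid[0]| / noise SD."""
    sigma = np.abs(fid[0]) / snr
    noise = rng.normal(0.0, sigma, fid.shape) + 1j * rng.normal(0.0, sigma, fid.shape)
    return fid + noise

# Toy single-peak FID and a hypothetical ppm axis, for illustration only.
n_points, dwell_time = 2048, 1 / 2000.0
t = np.arange(n_points) * dwell_time
fid = np.exp(-t / 0.1).astype(complex)
ppm_axis = np.linspace(0.0, 9.4, n_points)

rng = np.random.default_rng(0)
fid_aug = frequency_shift(fid, dwell_time, rng.uniform(-20.0, 20.0))
fid_aug = add_noise(fid_aug, snr=rng.uniform(5.0, 50.0), rng=rng)  # SNR range is a placeholder
spec_aug = np.fft.fftshift(np.fft.fft(fid_aug))
spec_aug = first_order_phase(spec_aug, ppm_axis, rng.uniform(-0.20, 0.20))
```

With a different SNR definition (for example, peak height of a reference metabolite over the noise SD in the spectrum), only the line computing `sigma` would need to change.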

Hope this is helpful!
Aaron