Fit data

nPDyn relies on a builtin implementation to model and fit data, but it also provides some methods to fit your data using lmfit as the modelling and fitting backend.

In the following, we introduce data modelling using two types of data and analysis, the first being the fit of a quasi-elastic neutron scattering (QENS) measurement on a protein solution sample and the second the fit of elastic fixed-window scans (EFWS) of a protein powder sample.

The QENS data will be modelled using the following:

\[\rm S(q, \hbar \omega) = R(q, \hbar \omega) \otimes \beta_q \left[\alpha \mathcal{L}_{\Gamma} + (1 - \alpha) \mathcal{L}_{\Gamma + \gamma} \right] + \beta_{D_2O} \mathcal{L}_{D_2O} \tag{1}\]

where \(\rm q\) is the momentum transfer, \(\rm \hbar \omega\) the energy transfer, \(\rm R(q, \hbar \omega)\) is the resolution function (here a pseudo-Voigt profile), \(\rm \beta_q\) a vector of scalars accounting for detector efficiency (one scalar for each q), \(\rm \alpha\) a scalar between 0 and 1, \(\rm \mathcal{L}_{\Gamma}\) a Lorentzian accounting for center-of-mass diffusion with an explicit q-dependent width \(\rm \Gamma = D_s q^2\), where \(\rm D_s\) is the self-diffusion coefficient, \(\rm \mathcal{L}_{\Gamma + \gamma}\) a Lorentzian accounting for internal dynamics with \(\rm \gamma = \frac{D_i q^2}{1 + D_i q^2 \tau}\) (see [1]), and \(\rm \beta_{D_2O} \mathcal{L}_{D_2O}\) accounting for the signal from the \(\rm D_2O\).
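
The Lorentzians above are taken to be area-normalized, consistent with the explicit expressions written out in the lmfit section below:

\[\rm \mathcal{L}_{\Gamma}(\omega) = \frac{1}{\pi} \frac{\Gamma}{\omega^2 + \Gamma^2}\]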

The EFWS data will be modelled using a simple Gaussian to extract the mean-squared displacement (MSD) as a function of temperature:

\[\rm S(q, 0) = e^{-\frac{q^2 MSD}{6}} \tag{2}\]
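
Since the elastic intensities are normalized to their low-temperature value (see the data preparation below), the MSD can equivalently be read from the slope of the logarithm of the elastic intensity versus \(\rm q^2\):

\[\rm \ln S(q, 0) = -\frac{q^2 MSD}{6}\]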

We use the sample data in the test suite of nPDyn (from the package root directory, use cd nPDyn/tests/sample_data/) and we initialize our dataset using, for QENS:

>>> from nPDyn.dataParsers import processNexus
>>> import numpy as np
>>> qens = processNexus('lys_part_01_QENS_before_280K.nxs')
>>> vana = processNexus('vana_QENS_280K.nxs')
>>> ec = processNexus('empty_cell_QENS_280K.nxs')
>>> buffer = processNexus('D2O_QENS_280K.nxs')
>>> # Perform some data processing
>>> qens, vana, ec, buffer = (
... val.bin(5) for val in (qens, vana, ec, buffer)
... )
>>> qens, vana, buffer = (
... val - 0.95 * ec for val in (qens, vana, buffer)
... )
>>> qens, vana, ec, buffer = (
... val.get_q_range(0.4, 1.8) for val in (qens, vana, ec, buffer)
... )
>>> # Extract momentum transfers for modelling and make it 2D
>>> q = qens.q[:, None]

and for EFWS:

>>> from nPDyn.dataParsers import inxConvert
>>> efws = inxConvert.convert('D_syn_fibers_elastic_10to300K.inx', True)
>>> efws = efws.bin(5, 0)
>>> efws /= efws[:5].mean(0)
>>> efws = efws.get_q_range(0.2, 0.8)

Using builtin model backend

The builtin modelling interface has been designed to be easy to use and adapted to the multi-dimensional datasets obtained with neutron backscattering spectroscopy and to a mix of global and non-global parameters.

The basic workflow is as follows:

  1. Create a Parameters instance with parameters that can be scalars, 1D, 2D or arrays of any shape.
  2. Create a Model instance that is initiated with the previously created parameters.
  3. Add one or several Component instances or other Model instances to this model. Each component is associated with a Python function, the arguments of which can be dynamically defined at the creation of the component using an expression given as a string, as shown below.
  4. Fit your data! (A compact sketch of these steps is given right after this list; the following subsections detail them for the QENS and EFWS data.)
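
Here is a minimal sketch condensing these steps, using the same classes and the lorentzian preset that appear in the detailed walkthrough below (the parameter names scale and width are placeholders chosen for this example):

>>> from nPDyn.models import Parameters, Model, Component
>>> from nPDyn.models.presets import lorentzian
>>> pSketch = Parameters(
...     scale={'value': 1, 'bounds': (0., np.inf)},
...     width={'value': 5, 'bounds': (0., np.inf)})
>>> mSketch = Model(pSketch, 'sketch')
>>> mSketch.addComponent(Component(
...     'single Lorentzian',
...     lorentzian,
...     scale='scale',  # parameter value looked up in pSketch
...     width='width * q**2',  # q is given on the call to the fit method
...     center=0))
>>> # such a model would then be fitted with, e.g., qens.fit(mSketch, res=vana)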

For the QENS data, we first model the resolution function using a pseudo-Voigt profile. To this end, we use the builtins.modelPVoigt() builtin model from nPDyn. The same is done for the \(\rm D_2O\) background using the builtins.modelCalibratedD2O() builtin model.

Simply use:

>>> from nPDyn.models.builtins import modelPVoigt
>>> from nPDyn.models.builtins import modelCalibratedD2O
>>> vana.fit(modelPVoigt(q, 'resolution'))
>>> buffer.fit(modelCalibratedD2O(q, temp=280))

Anticipating a bit on the rest of this documentation, you can use the following to look at the fit results:

>>> from nPDyn.plot import plot
>>> plot(vana, buffer)

Create parameters

For the QENS sample, the model in equation (1) contains six parameters, namely \(\rm \beta_q\), \(\rm \alpha\), \(\rm D_s\), \(\rm D_i\), \(\rm \tau\), and \(\rm \beta_{D_2O}\). The \(\rm D_2O\) scaling \(\rm \beta_{D_2O}\) will be fixed directly in the corresponding component below, so we create the Parameters instance with the remaining five:

>>> from nPDyn.models import Parameters
>>> pQENS = Parameters(
...     beta={'value': np.zeros_like(q) + 1, 'bounds': (0., np.inf)},
...     alpha={'value': 0.5, 'bounds': (0., 1)},
...     Ds={'value': 5, 'bounds': (0., 100)},
...     Di={'value': 20, 'bounds': (0., 100)},
...     tau={'value': 1, 'bounds': (0., np.inf)},
... )

For the EFWS sample, we only have the MSD, and we use a slightly different way to instantiate the Parameters instance for demonstration purposes:

>>> from nPDyn.models import Model
>>> pEFWS = Parameters(msd=0.5)
>>> pEFWS.set('msd', bounds=(0., np.inf), fixed=False)

Instantiate a Model

Instantiating a Model is very straightforward, just use:

>>> modelQENS = Model(pQENS, 'QENS')  # for QENS data
>>> modelEFWS = Model(pEFWS, 'EFWS')  # for EFWS data

Add components

The modelQENS model should contain three components, or three lineshapes, as we can see in equation (1), namely a Lorentzian for center-of-mass diffusion, a Lorentzian for internal dynamics, and the model we used for the \(\rm D_2O\) background. We can add them using:

>>> from nPDyn.models import Component
>>> from nPDyn.models.presets import lorentzian
>>> modelQENS.addComponent(Component(
...     'center-of-mass',
...     lorentzian,
...     scale='beta * alpha',  # will find the parameters values in pQENS
...     width='Ds * q**2',  # we will give q on the call to the fit method
...     center=0))  # we force the center to be at 0
...                 # (as it is given by the convolution with resolution)
>>> # we can add, subtract, multiply or divide a model using a Component or
>>> # another Model
>>> internal = Component(
...     'internal',
...     lorentzian,
...     scale='beta * (1 - alpha)',
...     width='Di * q**2 / (1 + Di * q**2 * tau)',
...     center=0)  # we force the center to be at 0
...                # (as it is given by the convolution with resolution)
>>> modelQENS += internal
>>> # for the D2O signal, we use a lambda function to include the scaling
>>> # note this can be done automatically with the 'bkgd' and
>>> # 'volume_fraction_bkgd' arguments of the fit function.
>>> modelQENS.addComponent(Component(
...     '$D_2O$',  # we can use LaTeX for the component and model names
...     lambda x, scale: scale * buffer.fit_best(x=x)[0],
...     scale=0.95,
...     skip_convolve=True))  # we do not want to convolve this
>>>                           # component with resolution

The modelEFWS model uses the momentum transfer q as its independent variable, which will be passed later upon fitting, and it contains only one component. Here, we use:

>>> from nPDyn.models.presets import gaussian
>>> modelEFWS.addComponent(Component(
...     'EISF',
...     lambda x, scale, msd: scale * np.exp(-x**2 * msd / 6),
...     scale=1))  # scale fixed to 1, as the data were normalized above

Fit data

The class sample.Sample provides a method to fit the data.

Here, we use it and write for QENS:

>>> qens.fit(
...     modelQENS,
...     res=vana,
...     fit_method='basinhopping',
...     fit_kws={'niter': 10, 'disp': True}
... )

and for EFWS, where we set the independent variable to a column vector containing the momentum transfer q values:

>>> efws.fit(
...     modelEFWS,
...     x=efws.q[:, None]
... )
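
As for the resolution and \(\rm D_2O\) fits above, the fit results can be inspected with the plot function (using the import shown earlier):

>>> plot(qens)
>>> plot(efws)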

The fitted parameters can be saved in JSON format using (for the first observable):

>>> qens.params[0].writeParams('file_name')

Subsequently, the parameters can be imported using:

>>> qens.params[0].loadParams('file_name')

Using lmfit backend

In addition to the builtin model interface of nPDyn, the API also provides some helper functions to use the lmfit package. This package is more advanced and exhaustive than the builtin model interface, but it is less adapted to multi-dimensional datasets with global and non-global parameters.

This is where the presets and builtin models in nPDyn come into play, to make it easier to use within the analysis workflow of neutron backscattering data.

The interface with lmfit relies on the lmfit_presets.build_2D_model() function.

We present here the analysis of QENS data using equation (1).

Build model

The function lmfit_presets.build_2D_model() uses a formatted string to build a 2D model where the words flanked by curly braces {} are considered as parameters.

The resolution function and the \(\rm D_2O\) background signal can make use of the provided presets lmfit_presets.pseudo_voigt() and lmfit_presets.calibratedD2O(), we thus use:

>>> from nPDyn.lmfit.lmfit_presets import pseudo_voigt, calibratedD2O
>>> vana.fit(pseudo_voigt(q, prefix='res_'))
>>> buffer.fit(calibratedD2O(q, 0.95, 280, prefix='D2O_'))
>>> q = qens.q

To build the model for the protein sample, we use the function lmfit_presets.build_2D_model() to get the part inside square brackets in (1) and we will convolve with the resolution and add the D2O manually:

>>> from nPDyn.lmfit.lmfit_presets import build_2D_model
>>> # let us start with the formatted text for the center-of-mass term.
>>> comText = ("{beta} * {alpha} * {Ds} * {q}**2 / (np.pi * "
...            "(x**2 + ({Ds} * {q}**2)**2))")
>>> # same for the internal dynamics term
>>> jumpDiffText = ("{beta} * (1 - {alpha}) * "
...                 "{Di} * {q}**2 / (1 + {Di} * {q}**2 * {tau}) / "
...                 "(np.pi * (x**2 + ({Di} * {q}**2 / "
...                 "(1 + {Di} * {q}**2 * {tau}))**2))")
>>> # now we build the components
>>> comModel = build_2D_model(
...     q,
...     'com',
...     comText,
...     paramGlobals=['alpha', 'Ds'],
...     bounds={
...         'beta': (0., np.inf),
...         'alpha': (0, 1),
...         'Ds': (0.01, np.inf)},  # non-zero min to avoid infinites
...     defVals={'alpha': 0.5,
...              'Ds': 5,
...              'beta': 1},
...     prefix='com_')
>>> jumpDiffModel = build_2D_model(
...     q,
...     'jumpDiff',
...     jumpDiffText,
...     paramGlobals=['alpha', 'Di', 'tau'],
...     bounds={
...         'beta': (0., np.inf),
...         'alpha': (0, 1),
...         'Di': (0.01, np.inf),  # non-zero min to avoid infinites
...         'tau': (0., np.inf)},
...     defVals={'beta': 1,
...              'alpha': 0.5,
...              'Di': 30,
...              'tau': 10},
...     prefix='jd_')
>>> # and we assemble them
>>> model = comModel + jumpDiffModel
>>> # some parameters are the same for the two components,
>>> # so we set them equals using 'expr' hint
>>> model.set_param_hint('com_alpha', expr='jd_alpha')
>>> for i in range(q.size):
...     model.set_param_hint('com_beta_%i' % i, expr='jd_beta_%i' % i)

And finally, we add the \(\rm D_2O\) signal with a scaling factor:

>>> # now we add the component for the D2O signal
>>> from nPDyn.lmfit.lmfit_presets import hline
>>> scale = hline(q, prefix='bD2O_')
>>> d2OModel = scale * buffer.model
>>> d2OModel.param_hints.update(buffer.getFixedOptParams(0))
>>> fitModel = model + d2OModel
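
If needed, the assembled composite model can be checked with standard lmfit introspection, for instance to list the resulting parameter names and to create an lmfit Parameters object with all the hints applied (these are plain lmfit attributes and methods, not nPDyn-specific):

>>> # list the parameters of the composite lmfit model
>>> print(fitModel.param_names)
>>> # create a Parameters object with the parameter hints applied
>>> lmfitParams = fitModel.make_params()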

Fit data

Data fitting can be done using the same functions as when using the builtin models. The fit_method and some other keywords are different and should correspond to the keywords expected in lmfit (see lmfit documentation for details).

Here, we can simply use:

>>> qens.fit(fitModel, res=vana)

to fit the data using lmfit default parameters.
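
The fit_method keyword can also be given explicitly with any minimizer name known to lmfit, and the result inspected with the plot function as before (a short sketch, assuming fit_method is forwarded to lmfit as described above):

>>> qens.fit(fitModel, res=vana, fit_method='leastsq')
>>> from nPDyn.plot import plot
>>> plot(qens)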