IM Publications - Chemometrics

Calculation of SEC for PLS calibrations, n or n-k-1 as DF?

schultz — Wed, 16 Dec 2020 14:08:51 +0000

Forums:

Browsing through text books, norms, standards and this forum, consensus seems to be that for PLS models the SEC is calculated as

sec = sqrt(sum((y_i - y_i^hat)^2)/df)

where df = n-k-1 (n=number of calibration samples, k=PLS factors, 1 if mean centered)

However, when comparing with popular software products, the consensus there seems to be to use df=n.
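To make the difference between the two conventions concrete, here is a minimal sketch in Python. The function name `sec`, its parameters, and the numbers are illustrative assumptions and are not taken from any standard or software package.

```python
import math

def sec(y_ref, y_pred, n_factors=0, mean_centered=True, correct_df=True):
    """Standard error of calibration from calibration residuals.

    correct_df=True  -> df = n - k - 1 (the 1 only if mean centered)
    correct_df=False -> df = n
    """
    n = len(y_ref)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_ref, y_pred))
    if correct_df:
        df = n - n_factors - (1 if mean_centered else 0)
    else:
        df = n
    if df <= 0:
        raise ValueError("non-positive degrees of freedom")
    return math.sqrt(ss_res / df)

# Made-up calibration data: every residual is +/- 0.5
y_ref = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
y_pred = [10.5, 11.5, 14.5, 15.5, 18.5, 19.5, 22.5, 23.5, 26.5, 27.5]

sec_n = sec(y_ref, y_pred, correct_df=False)   # df = n = 10
sec_corr = sec(y_ref, y_pred, n_factors=3)     # df = 10 - 3 - 1 = 6
```

With n=10 and k=3 the corrected value is noticeably larger than the df=n value; the gap shrinks as n grows relative to k.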

Unscrambler documents this in its Technical Reference, and as far as I have found this appears to be the standard in other products as well.

The ISO 12099:2010 defines SEC as: "for a calibration model, an expression of the average difference between predicted and reference values for samples used to derive the model
NOTE As for definitions C.3.4 to C.3.7, in this statistic, this expression of the average difference refers to the square root of the sum of squared residual values divided by the number of values corrected for degrees of freedom, where 68 % of the errors are below this value."

My interpretation of this definition is to use df=n, otherwise the 68% condition is not met.

Academic papers seem to agree that one PLS factor > 1 df, but differ in how to find that number bigger than 1.

So it appears that the theoretically/statistically correct way is to use df=n-k-1, but that practical implementations use df=n.

Does anyone know why that is?

I have considered whether it has to do with terminology, RMSEC vs SEC, but came to the conclusion that this is not the case. In any case, if anyone could enlighten me it would be highly appreciated.

I might have missed an important point, which I would be happy to be made aware of.

Stay safe and enjoy the holidays ahead.

/jakob

Procedures for interpretation of 2Der/1Der loadings?

jvrijdag — Wed, 05 Aug 2020 10:45:04 +0000

Forums:

Chemometrics

Dear All,

I am investigating the starch and non-starch polysaccharide (NSP) composition of grains, and therefore I recently started working with IR and chemometrics. After a lot of trial-and-error (preprocessing of spectra, chemometrics software, PCA, etc.), I was delighted to see that during PCA some presumed clusters were formed.

To understand "why" some clusters were formed in PCA score plots, I turned my attention to PC loadings. Unfortunately, here I got slightly lost in interpretation. PC loadings can of course contain both positive and negative values. Also, I mainly used 2Der spectra for PCA, so every band from the original data is represented by 3 bands in the PC loadings.

So far I just picked a couple of large bands on both the positive and negative side of the loading, and tried to interpret/assign them. Since I am dealing with different polysaccharides, each of them having complex IR spectra, I would like to interpret the PC loadings as rationally as possible. In the future I would like to purchase/prepare reference polysaccharide compounds, but for now the focus is on obtaining as much useful information as possible from the PC loadings.

Is there a certain procedure to interpret such PC loadings from 2Der data? E.g., picking the largest bands (how many, typically?) or bands with certain characteristics, and assuming them to be the "central" band of the 3 in the 2Der spectra? Are there any data processing techniques for 2Der PC loadings that highlight the most crucial regions?
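The "3 bands per original band" pattern can be illustrated with a minimal sketch: a single synthetic Gaussian band, differentiated twice with a plain central second difference. The band position, width, and axis are made-up assumptions, and a real workflow would more likely use a smoothing derivative such as Savitzky-Golay.

```python
import math

def gaussian_band(x, center, width):
    """Synthetic absorption band (illustrative only)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def second_difference(y):
    """Central second difference; the two end points are dropped."""
    return [y[i - 1] - 2.0 * y[i] + y[i + 1] for i in range(1, len(y) - 1)]

x = [float(i) for i in range(0, 201)]              # hypothetical spectral axis
y = [gaussian_band(xi, 100.0, 10.0) for xi in x]   # one band centred at 100
d2 = second_difference(y)
x_d2 = x[1:-1]                                     # axis matching d2

# Central lobe: the most negative point of the second derivative
i_min = min(range(len(d2)), key=d2.__getitem__)
center_lobe = x_d2[i_min]

# Side lobes: the largest positive value on each side of the central lobe
i_left = max(range(0, i_min), key=d2.__getitem__)
i_right = max(range(i_min + 1, len(d2)), key=d2.__getitem__)
left_lobe, right_lobe = x_d2[i_left], x_d2[i_right]
```

In this idealised case the negative central lobe sits at the original band position and the two positive side lobes lie roughly 1.7 band widths on either side. In a loading the sign of the whole triplet may be flipped, and overlapping bands distort the pattern, so this only describes the single-band case.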

Also during PCA of 1Der spectra I obtained useful groupings. Similar to my previous question, are there certain established procedures to interpret PC loadings from 1Der data?

Any help will be appreciated!

Johannes

P.S.: If this discussion fits better in the "Spectroscopy" Section, please feel free to move it there!

Quotient regression (NR), questions and data sets

ptillmann — Sun, 27 Oct 2019 12:54:08 +0000

Forums:

Chemometrics

Dear friends in NIR spectroscopy,

Data sets:
David Hopkins gave in NIR news 2016 a nice introduction to Norris regression. He uses two wheat data sets "WheglA" and "WheglB", which seem to originate from Karl's Lab.

Emails to the contact address of David from the article bounce. Does anyone have these two datasets and can provide them for me?

Questions:
Does anyone know whether Karl used any kind of scatter correction prior to derivatives? If I read carefully, my impression is that the only modification applied to the absorption data is the derivatives.

Yours

Peter

Definition of bias in ISO 12099

ptillmann — Tue, 03 Sep 2019 16:27:38 +0000

Forums:

Chemometrics

Hello,
(if there is still someone awake in this forum.)

In ISO 12099:2016 the definition of bias has changed (compared to ISO 12099:2007):

It was bias = sum (NIRS - ref) / n (2007),

it is bias = sum (ref - NIRS) / n (2016).

What seems to be a small change is in fact turning the world upside down (I don't assume our Australian hosts for NIR 2019 caused it.)

In my understanding, a NIRS method with a bias of +1 unit results in NIRS values 1 unit above the targeted reference level. But the new formula changes this. My understanding is supported by third party / third industry literature.

Has anybody noticed this change?
Does anybody know why it was changed?
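A minimal sketch of the sign flip, using the two formulas exactly as quoted above. The function names and the numbers are illustrative assumptions.

```python
def bias_2007(nirs, ref):
    """bias = sum(NIRS - ref) / n, as quoted for the 2007 text."""
    return sum(p - r for p, r in zip(nirs, ref)) / len(ref)

def bias_2016(nirs, ref):
    """bias = sum(ref - NIRS) / n, as quoted for the 2016 text."""
    return sum(r - p for p, r in zip(nirs, ref)) / len(ref)

# Made-up data: NIRS reads exactly 1 unit above the reference
ref = [10.0, 12.0, 14.0, 16.0]
nirs = [r + 1.0 for r in ref]

b07 = bias_2007(nirs, ref)   # +1.0: NIRS is above the reference
b16 = bias_2016(nirs, ref)   # -1.0: same data, opposite sign
```

The magnitude is unchanged, so only the interpretation of the sign (and any bias correction applied from it) depends on which convention a report follows.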

Peter

PLS2 vs PLS1

duqqud — Fri, 24 Aug 2018 22:25:10 +0000

Forums:

Chemometrics

Hi all,

I am developing a PLS model to predict a multivariate Y set.

The PLS1 algorithm outperforms PLS2 by quite a lot. I was told this is common, but I am struggling to understand why.

Could someone explain this to me mathematically? P.S., I used the default SIMPLS algorithm.

Thanks

PLS validation option using Thermo Method Generator software

JuanG — Tue, 08 May 2018 20:10:25 +0000

Forums:

Chemometrics

Dear all,

For some time now I have had to use the Method Generator software to build PLS models on a handheld NIR. The software is easy to use for someone with a basic chemometrics background. However, when we start to look at the calculations in detail, it is not easy to find this information.

My question concerns the last step to generate the PLS model file. If you are familiar with the software, you know that there is an option in the last Model parameter pop-up window called PLS Validation (see image attached).

This option, if I have understood correctly, has the same aim as Hotelling T2 and X residuals. In this software the statistics used are called "scores (stdev)" and "Resid (stdev)". Based on that, I understand them to be the standard deviation of scores (which ones?) and the standard deviation of residuals. However, the default values are quite far from the values I expected; for example, the software proposes 5 for scores (stdev) and 15 for Resid (stdev), which is quite large compared to the results for my data set. Of course, depending on these limits the predicted spectra can be considered valid or invalid, so they are quite important parameters.

I tried to find out how these values are calculated, but I did not find any literature explaining it.

I hope someone in the community can help me understand how these values are calculated, because there is nothing about it in the Thermo user manual or other information.

Thank you in advance

Best Regards,

Juan G.


[Question] Pre-processing internal standard

miguelG — Thu, 25 Jan 2018 18:24:13 +0000

Forums:

Chemometrics

Hello everyone,
The OPUS software has an option to select which pre-processing to use. One of those pre-processing options is "Internal Standard".
My question is: does anyone know what kind of pre-processing this is? I mean the mathematics behind it, because I cannot find anything online related to this pre-processing.
Thanks!

Outlier detection using mahalanobis [Question]

miguelG — Mon, 05 Jun 2017 16:15:54 +0000

Forums:

Chemometrics

Hello everyone,

Sorry if my question is too much of a newbie one, but I have been struggling with a problem.
I want to detect outliers and I have been using the Quant software from OPUS (Bruker) to sort the outliers for me. For the construction of calibration and predictive models I use /Toolbox for MATLAB.
My question is: what is the mathematical formula for outlier detection in NIR spectra using Mahalanobis distance with PLS?
Can you please explain in some detail, because I have researched books and papers and tried many ways, but none seem to work (when compared to the values obtained by the OPUS software). Maybe I am missing something...
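For reference, the textbook form of the squared Mahalanobis distance in score space is D^2 = (t - mean)^T S^-1 (t - mean), where S is the covariance of the calibration scores. The sketch below is a generic NumPy version under that assumption; it is not the OPUS implementation, and the random matrix merely stands in for real PLS scores.

```python
import numpy as np

def mahalanobis_sq(T_cal, t_new):
    """Squared Mahalanobis distance of rows of t_new from the calibration scores.

    T_cal: (n_samples, n_factors) calibration score matrix
    t_new: (m, n_factors) or (n_factors,) scores of the samples to test
    """
    T_cal = np.asarray(T_cal, dtype=float)
    t_new = np.atleast_2d(np.asarray(t_new, dtype=float))
    mean = T_cal.mean(axis=0)
    # Sample covariance of the calibration scores (divisor n - 1)
    cov = np.atleast_2d(np.cov(T_cal, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    diff = t_new - mean
    # Row-wise quadratic form diff @ cov_inv @ diff.T, diagonal only
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)
T_cal = rng.normal(size=(50, 3))      # stand-in for calibration PLS scores
d2_cal = mahalanobis_sq(T_cal, T_cal)
d2_far = mahalanobis_sq(T_cal, T_cal.mean(axis=0) + 10.0)
```

Software packages often report a rescaled version of this quantity (for example the distance rather than its square, or a leverage of the form 1/n + D^2/(n-1)) and may use a different divisor in the covariance, so a constant factor or offset between your values and the software's is worth checking before concluding that the formula is wrong.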

Any help is appreciated!

Data interpretation of Thermo microPHAZIR

ForestBiotech — Mon, 03 Apr 2017 15:34:19 +0000

Forums:

Chemometrics

Hi all.
I'm a new user in NIR development, and I'm using the Method Generator software to analyze the data from the microPHAZIR instrument. It gives me the following results (picture), and I have some questions about their interpretation:
Is RMSE = RMSEP?
Do the Slope and Offset values correspond to the coefficients of the line equation?
For example, for the picture:
Y = 23.30146 + 0.7118447X + 0.1922865 (error)
Is this correct?

Thanks in advance

Jorge


Advice on ANN method

Nhi — Thu, 16 Feb 2017 03:30:01 +0000

Forums:

Chemometrics

Dear all,
I have just developed a calibration model in WINISI using the PLS algorithm. Now I would like to learn more about the ANN method and other software. Can you give some advice about:
- Which software should I consider (advantages, disadvantages, license fee, annual fee)?
- How many samples are enough to build a non-linear model?
- Are there sources available for learning this software?
Thanks in advance for your help,
Nlt