Application of check-all-that-apply and non-metric partial least squares regression to evaluate attribute's perception and consumer liking of apples

Attribute perception and its impact on liking are crucial in quality assessments. An approach combining analytical and sensory methods is often necessary to ensure a reliable description of quality perception without neglecting the product's characterization. Moreover, the presence of nonlinear patterns demands appropriate models. A methodology is proposed to assess quality and attributes of apples cultivated at different orchard elevations. Physicochemical, sensory and consumer tests with check-all-that-apply and penalty analysis were performed. A non-metric partial least squares model (NM-PLS) was applied with a new coding system. The methodology highlighted similar results at all assessments for taste, with no significant differences among samples. Texture was evaluated differently among assessments. Differences were found for hardness and astringency only at the panel level, while there was agreement on crispness, better described by the analytical index "average drop." The NM-PLS model confirmed sweet, intense odor and the latent attribute "notes" as the most related to liking. The methodology confirmed that differences found among trained panelists may not be relevant for consumers; therefore, an integrated approach is needed. It shows a new system to create a regression model that provides information at a specific and a global attribute level to deal with multicomponent parameters.


| INTRODUCTION
The investigation of an attribute's perception and its impact on the liking of fruit products is a main step in most quality assessments performed in the research and industrial fields. When a quality assessment is performed to obtain a deeper understanding of the relationship between the quality profile (represented by attributes) and liking, the use of a consumer-driven approach is an essential step to obtain more reliable information, since most assessments involving analytical methods and trained panelists can lead to a poor description of consumers' perception (Ares, Dauber, Fernández, Giménez, & Varela, 2014; Blaker, Plotto, Baldwin, & Olmstead, 2014). This is even more pronounced for products that are described by multicomponent attributes, such as apples. In this case, the attribute texture is very important to understand the liking of the product (Charles et al., 2018; Corollaro et al., 2014) and involves complex parameters such as crispness and juiciness, for which there is still a lack of consensus regarding which analytical method should be used (Chen & Opara, 2013). Therefore, to describe an apple's quality, the use of trained panelists may be more advantageous due to their ability to integrate different aspects of an attribute to formulate a global judgment (Contador, Shinya, & Infante, 2015), such as sound and hardness to describe an apple's crispness. A drawback of this approach is that trained assessors may describe the product differently and detect differences among samples that may be irrelevant for consumers (Chen & Opara, 2013).
On the other hand, a quality assessment based exclusively on the consumer response is not often exhaustive due to consumers not being able to describe why they like a product (ten Kleij & Musters, 2003).
Therefore, an integrated approach and proper techniques at the consumer level are key to perform a successful quality assessment.
To improve the consumer test assessment, 1-5-point scales such as Just-about-right (JAR) scales are widely applied to evaluate an attribute's perception (Ares et al., 2014; Estiaga, 2015; Hampson & Quamme, 2000; Robertson, Meredith, Senter, Okie, & Norton, 1992; Zamzami & Ariyawardana, 2019). JAR became a very popular tool to investigate the optimum intensity of sensory attributes (Estiaga, 2015) and is often combined with penalty analysis in order to obtain mean drop estimates of liking caused by suboptimal attributes (Pagès, Berthelo, Brossier, & Gourret, 2014). Currently, in order to reduce the bias linked to the use of JAR scales, where consumers may focus on sensory characteristics that they would not normally consider, penalty analysis has also been applied to Check-all-that-apply (CATA) questions (Ares et al., 2014) to evaluate liking and the perception of apple attributes. In both the JAR scale and CATA question approaches, the results of penalty analysis can be combined with regression techniques to predict liking, and this has been done by converting JAR or CATA attribute scales into dummy variables (Ares et al., 2014; Li, 2013; Xiong & Meullenet, 2006).
Multivariate or regression techniques are needed since the one-factor analysis of variance model applied to penalty analysis results does not take into account correlations among attributes (Pagès et al., 2014) and, therefore, the approach is not appropriate to predict liking from JAR or CATA data (Xiong & Meullenet, 2006). The partial least squares (PLS) model, a linear method that deals with multicollinearity of parameters (Mendes da Silva, Briano, Peano, & Giuggioli, 2020), has been proposed to fulfill these goals (Ares et al., 2014; Xiong & Meullenet, 2006).
Another important concern in the assessment of attributes and liking is the presence of nonlinear relationships among data due to the use of ordinal scales. Different authors have pointed out the incorrect use of 1-5-point scales as interval scales rather than ordinal ones in consumer research (Li, 2013; Xiong & Meullenet, 2006). When using JAR scales, little or no behavioral difference may be present between consumers who rate a product at "just about right" and "too little," while the difference between "too little" and "much too little" could be huge. This could be linked to the fact that consumers who assume "stronger positions" concerning their attitude toward an attribute are more prone to penalize the liking of products than consumers who assume "weaker positions" (De Langhe, Puntoni, & Larrick, 2017). Moreover, it is well known that different attributes may display several types of relationships with consumer liking, as explained in the Kano model (Rivière, Monrozier, Rogeaux, Pagès, & Saporta, 2006).
For example, many attributes may display mainly a linear relationship with the degree of liking or disliking, such as sweetness perception in apples, apricots, strawberries and peaches (Lado et al., 2019; Mendes da Silva, Peano, & Giuggioli, 2019; Molina et al., 2006; Stanley, Prakash, Marshall, & Schröder, 2013); however, more complex attributes regarding texture or aroma may stimulate consumer satisfaction when they are perceived in the product but not necessarily stimulate disliking when they are absent (Lado et al., 2019; Mendes da Silva, Peano, et al., 2019). Furthermore, this situation may vary across horticultural crops since consumers' expectations of attributes depend on the type of product. Thus, the use of nonlinear models and the division of attributes according to their influence on liking are essential to properly predict liking. The first goal of this work is to conduct an integrated quality assessment using analytical methods, a trained panel and a consumer-driven approach in order to evaluate differences between apples cultivated at two orchards with different levels of elevation. The second goal of this work is to build a prediction model of liking using the information obtained at all levels of the quality assessment.
"Area 2" in the northern part of Italy. Meteorological information for each chosen site was obtained from stations near the orchards (meteorological regional information center) and is reported in the
The CIEDE2000 Color-Difference index (DE 2000) was used in order to assess the color difference of samples. The DE 2000 is more complicated and more accurate since it takes into account the different sensitivities of the eyes to different colors. For brevity, the calculation is illustrated in Sharma, Wu, and Dalal (2005).
Fifteen replicates of each apple sample were compressed with a texturometer TA.XT2+ (Stable Micro Systems, Surrey, United Kingdom) with a compression platen (diameter 75 mm). Since the shape and dimension of samples may strongly influence compression tests, the replicates were cut longitudinally into two halves and each half was laid down and compressed at a pre-test speed of 5 mm/s, a test speed of 10 mm/s and a post-test speed of 10 mm/s. The distance was set to 8.0 mm and the trigger force was 5 g. The first peak of the force-distance curve was registered as the hardness of samples. The software's compression script was modified in order to register the jaggedness of the force-distance curve. A highly jagged line is expected for crispy food since the peaks produced are a result of the fracture events that occur during the test (Campbell, 2016). Therefore, the number of positive and negative peaks after the first force peak of the curve was registered, as illustrated in Figure 1, and the average drop parameter was automatically calculated by the software. This function calculates the average drop in force between consecutive peaks and troughs over a selected region. It has been suggested that crunchiness or crispness is associated with the mechanical force required to compress food until it yields suddenly and fractures into small pieces along with a pleasant acoustic output (Tunick et al., 2013). Therefore, measuring the number of peaks and the average drop parameter is expected to capture, to some extent, the crispness of apple samples.
-2003 (ISO, 2003) were used: a 1-9 continuous intensity scale with one end, "too far away from the reference," and the opposite end, "very close to the reference," for an overall quality assessment. It is important to note that the term indicated as "overall-quality" was used to describe how close to the reference the tasted samples were overall (summarizing all the sensory descriptors used in this work), as described in Mendes da Silva, Peano, et al. (2019). It was not used to express the hedonic liking of samples. A continuous 1-9 intensity scale with one end, "extremely low intensity," and the opposite end, "extremely high intensity," was used to assess descriptive sensory attributes. In this work, panelists were asked not to consider the appearance and color of the product, but to focus only on taste, texture and aroma. Therefore, the descriptive sensory attributes were: ripe apple odor, berry-like odor, hardness, crunchiness, astringency, sweetness, sourness, and aroma. Panelists were presented with four apple slices from different fruits.
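The jaggedness measures described for the compression test (number of peaks and average drop) can be sketched as follows. This is a minimal Python illustration, not the texturometer software's own routine: the function names `find_peaks_and_troughs` and `average_drop` are hypothetical, a peak or trough is assumed to be a strict local extremum, each drop is assumed to be the fall from a peak to the next trough, and the force readings are invented.

```python
def find_peaks_and_troughs(force):
    """Return index lists of strict local maxima (peaks) and minima (troughs)."""
    peaks, troughs = [], []
    for i in range(1, len(force) - 1):
        if force[i - 1] < force[i] > force[i + 1]:
            peaks.append(i)
        elif force[i - 1] > force[i] < force[i + 1]:
            troughs.append(i)
    return peaks, troughs


def average_drop(force):
    """Mean fall in force from each peak to the next trough (assumed definition)."""
    peaks, troughs = find_peaks_and_troughs(force)
    drops = []
    for p in peaks:
        following = [t for t in troughs if t > p]
        if following:  # a peak with no later trough contributes no drop
            drops.append(force[p] - force[following[0]])
    return sum(drops) / len(drops) if drops else 0.0


# Hypothetical force readings (arbitrary units) from the selected region of a curve.
curve = [0.0, 5.0, 3.0, 6.0, 2.0, 4.0, 1.0]
peaks, troughs = find_peaks_and_troughs(curve)
print(len(peaks), len(troughs), average_drop(curve))
```

Plateaus (equal neighbouring readings) are ignored by the strict comparison; a real force-distance curve would probably need smoothing or a minimum-drop threshold before counting.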

| Consumer test assessment
Forty consumers of a specific target were recruited from Cuneo, Italy, where around 60% of apple cultivation is represented by red apples.
Consumers were first asked to score their overall liking using a 9-point hedonic scale anchored at "dislike very much" (1) and "like very much" (9) for each product. A CATA question with 17 attributes related to sensory characteristics of apples was compiled. Consumers were instructed during a 1-hr lesson in order to ensure that the meaning of these terms was well understood. During the test, they were asked to check only the attributes they considered to be appropriate descriptors of each sample. The attributes were selected based on previous literature for apple (Ares et al., 2014
F I G U R E 1 Example of a force-distance curve obtained by the modified compression test's script. The number of positive (green) and negative (red) peaks and the average drop were automatically calculated between the third and fourth anchors (anchors in red)
between the means of apple samples for all sensory attributes scored by the panel test.

| Consumer test assessment
The consumer test methodology was similar to that of Ares et al. (2014). Overall liking scores were analyzed using the paired t test, considering sample as a fixed source of variation and consumer as a random effect. Frequencies of each sensory attribute from the CATA questions were determined by counting the number of consumers that used that term to describe each sample and the ideal. Only attributes that were cited by at least 20% of consumers were considered to be important for the product's perception. The McNemar test was used to assess significant differences between samples for all terms included in the CATA question.
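As a rough illustration of the McNemar comparison of CATA frequencies between two samples, the sketch below computes an exact two-sided p value from the discordant pairs. It is written in Python under stated assumptions: the helper names `discordant_counts` and `mcnemar_exact` are hypothetical, the exact binomial form of the test is assumed (the text does not say which variant was used), and the check marks are invented.

```python
from math import comb


def discordant_counts(checks_a, checks_b):
    """Count consumers who checked a term for one sample but not the other."""
    only_a = sum(1 for a, b in zip(checks_a, checks_b) if a and not b)
    only_b = sum(1 for a, b in zip(checks_a, checks_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar p value (binomial test on the discordant pairs)."""
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical CATA checks for one term by ten consumers (1 = checked).
sample_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
sample_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
b, c = discordant_counts(sample_a, sample_b)
print(b, c, mcnemar_exact(b, c))
```

Only consumers whose answers differ between the two samples inform the test; consumers who checked the term for both samples, or for neither, drop out of the calculation.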

Penalty analysis
The penalty analysis was carried out on consumer responses to determine the drop in overall liking scores associated with a deviation from the ideal for each attribute from the CATA question. However, the coding of the sample and ideal CATA responses was done differently from the binary coding proposed by Ares et al. (2014), in which two coded categories were derived from each attribute: the agreement between sample and ideal CATA questions was coded as 1 for each consumer (e.g., attribute checked or not checked for both CATA questions) and the disagreement was coded as 0 (e.g., attribute checked in the ideal but not in the sample's CATA question and vice versa).
In this work, the coding was done by splitting the disagreement situations into two codes (Table 2): −1 if an attribute was checked in the ideal but not in the sample's CATA question and 1 if the opposite was true.
The 0 code remained represented by the agreement between the ideal and sample CATA questions (for checked and non-checked attributes). Therefore, as similarly proposed in Plaehn (2012), the 0 response for CATA is analogous to the JAR response and the 1 and
in order to first classify attributes as "must be," "reversal" and "attractive" attributes, and then coding was performed specifically for each type of attribute. "Must-be" attributes are those for which consumers who expect the attribute but cannot perceive it in the product penalize the product more than consumers who do not expect the attribute in their ideal but perceive it. Considering Gala apples, sweetness is a must-be attribute since it is highly expected to be found in the product among consumers that prefer sweet apples (Plotto, 1998). "Reversal" attributes are those for which consumers who do not expect the attribute but perceive it in the tasted product penalize it more than consumers who expect the attribute but do not perceive it in the tasted product.
This can be applied to aroma notes that are not expected, such as green apple notes in sweet cultivars of apples. Finally, "attractive" attributes are those for which consumers who expect the attribute but cannot perceive it in the tasted product will penalize it, while consumers who do not expect the attribute but perceive it in the product will not penalize it, since such attributes are positive even when they are not expected. An example of an attractive attribute is crispness, which is nearly always appreciated in the product since it is related to the product's freshness, despite genotype differences and consumer preference. Therefore, for each type of attribute the penalty categories were re-coded as −2 (greater penalization), −1 (smaller penalization) and 0 (close to the ideal) as shown in Table 3.
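The two coding steps described above (ideal versus sample agreement, then re-coding by attribute type) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' script: the names `penalty_category`, `RECODE` and `recode` are hypothetical, and the re-coded values for category 1 are inferred from the verbal definitions of must-be, reversal and attractive attributes, since only the values for category −1 and category 0 are legible in Table 3.

```python
def penalty_category(ideal_checked, sample_checked):
    """Code one consumer's answer for one attribute as -1, 0 or 1."""
    if ideal_checked and not sample_checked:
        return -1  # expected in the ideal, not perceived in the sample
    if sample_checked and not ideal_checked:
        return 1   # perceived in the sample, not expected in the ideal
    return 0       # agreement (checked for both, or for neither)


# Re-coding into ordinal penalty states (-2 greater, -1 smaller, 0 none).
# The entries for category 1 are ASSUMED from the definitions in the text.
RECODE = {
    "must-be":    {-1: -2, 1: -1, 0: 0},
    "reversal":   {-1: -1, 1: -2, 0: 0},
    "attractive": {-1: -2, 1: 0, 0: 0},
}


def recode(category, attribute_type):
    """Map a -1/0/1 penalty category to its ordinal state for the given type."""
    return RECODE[attribute_type][category]


# Hypothetical consumer: expects a sweet apple but does not check "sweet" for the sample.
category = penalty_category(ideal_checked=True, sample_checked=False)
print(category, recode(category, "must-be"))
```

Keeping the attribute type in a lookup table makes it straightforward to reclassify an attribute if the penalty means suggest a different type.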
Once the penalty categories were transformed, the NM-PLS model was applied with the R package "plspm" (Sanchez, Trinchera, & Russolillo, 2017). Only attributes that were considered as deviated from the ideal for at least 20% of the consumers were used. They were grouped into latent variables that represent a global quality attribute: "texture," "taste" and "notes." In this way, it is possible to assess how a specific attribute, such as sweet, impacts overall liking without neglecting the assessment of a global quality attribute, such as taste. This approach is more in agreement with the human way of thinking, which integrates all parameters to formulate a global judgment of quality rather than considering each parameter separately (Giuggioli, Peano, & Mendes da Silva, 2020). PLS was created for the estimation of a model with metric data, as it needs numerical variables in order to estimate the relationships between latent variables and manifest variables (Russolillo, 2010). However, in this work the NM-PLS was applied since it is assumed that the new coding produced variables (attributes) that are composed of ordinal states (0, −1 and −2) depending on the degree of mean drop. Moreover, it is also expected that the mean drops for the −1 and −2 codes relative to the 0 code are different, which denotes the presence of nonlinearity, as the scale is not wide enough to assume continuity for those variables (Nappo, 2009). Therefore, all predictors were introduced as ordinal variables while overall liking scores were introduced as a continuous dependent variable. Cronbach's alpha and Dillon-
T A B L E 2 Illustration of penalty analysis coding process and the resulting penalty analysis category (0, 1 and −1)

| Physicochemical and sensory analysis of apple samples
There were significant differences in the color of apples, as shown by the colorimetric analysis (Figure 2). In particular, apples from Area 1 were redder (as shown by higher a* values) and presented a more intense color (as shown by higher Chroma values). The DE 2000 index was equal to 2.79, a value that is higher than 2.50, the threshold value used to indicate visible differences between two products (Sharma et al., 2005). Despite color differences, both samples presented very similar quality attributes for most of the parameters assessed, and there was good agreement between physicochemical data and the sensory scores given by panelists. There were no significant differences between Area 1 and Area 2 samples in TSS and TA values (Figure 3); however, when taking into account the TSS/TA ratio, the apples would be classified into two distinct classes of taste: the samples cultivated in Area 2 would be considered sweet apples due to a TSS/TA ratio over 20 (21.6), and Area 1 samples would be considered sour apples as they presented a TSS/TA ratio below 20 (15.5), according to previous research (Almeida & Gomes, 2017; Mendes da Silva, Marinoni, et al., 2019).
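The color-difference index reported above follows the CIEDE2000 formula set out in Sharma et al. (2005). As a hedged illustration only (not the authors' code), the calculation can be sketched in Python. The function name `ciede2000` and its parameter names are hypothetical, the parametric factors are assumed to be 1, and the input is a published test pair from Sharma et al. (2005) rather than the apple measurements, whose L*, a*, b* values are not given in the text.

```python
import math


def ciede2000(lab1, lab2, k_l=1.0, k_c=1.0, k_h=1.0):
    """CIEDE2000 colour difference between two CIELAB triples (L*, a*, b*)."""
    l1, a1, b1 = lab1
    l2, a2, b2 = lab2

    # Step 1: rescale a*, then compute adjusted chroma C' and hue angle h' (degrees).
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2.0
    g = 0.5 * (1.0 - math.sqrt(c_bar ** 7 / (c_bar ** 7 + 25.0 ** 7)))
    a1p, a2p = (1.0 + g) * a1, (1.0 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360.0

    # Step 2: differences in lightness, chroma and hue.
    d_l = l2 - l1
    d_c = c2p - c1p
    if c1p * c2p == 0.0:
        d_h_deg = 0.0
    else:
        d_h_deg = h2p - h1p
        if d_h_deg > 180.0:
            d_h_deg -= 360.0
        elif d_h_deg < -180.0:
            d_h_deg += 360.0
    d_h = 2.0 * math.sqrt(c1p * c2p) * math.sin(math.radians(d_h_deg / 2.0))

    # Step 3: mean values, weighting functions and the rotation term.
    l_bar = (l1 + l2) / 2.0
    c_bar_p = (c1p + c2p) / 2.0
    if c1p * c2p == 0.0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180.0:
        h_bar = (h1p + h2p) / 2.0
    elif h1p + h2p < 360.0:
        h_bar = (h1p + h2p + 360.0) / 2.0
    else:
        h_bar = (h1p + h2p - 360.0) / 2.0
    t = (1.0
         - 0.17 * math.cos(math.radians(h_bar - 30.0))
         + 0.24 * math.cos(math.radians(2.0 * h_bar))
         + 0.32 * math.cos(math.radians(3.0 * h_bar + 6.0))
         - 0.20 * math.cos(math.radians(4.0 * h_bar - 63.0)))
    d_theta = 30.0 * math.exp(-(((h_bar - 275.0) / 25.0) ** 2))
    r_c = 2.0 * math.sqrt(c_bar_p ** 7 / (c_bar_p ** 7 + 25.0 ** 7))
    s_l = 1.0 + 0.015 * (l_bar - 50.0) ** 2 / math.sqrt(20.0 + (l_bar - 50.0) ** 2)
    s_c = 1.0 + 0.045 * c_bar_p
    s_h = 1.0 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2.0 * d_theta)) * r_c

    term_l = d_l / (k_l * s_l)
    term_c = d_c / (k_c * s_c)
    term_h = d_h / (k_h * s_h)
    return math.sqrt(term_l ** 2 + term_c ** 2 + term_h ** 2 + r_t * term_c * term_h)


# Test pair 1 from Sharma et al. (2005); the published value is 2.0425.
print(round(ciede2000((50.0, 2.6772, -79.7751), (50.0, 0.0, -82.7485)), 4))
```

A value computed this way for two mean color readings could then be compared with the 2.50 visibility threshold mentioned in the text.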
Concerning the TPA analysis, there were no differences among products in the parameter hardness. Both samples produced very jagged lines, with an average number of peaks equal to 18.8 for the Area 2 sample and 17.4 for the Area 1 sample. However, only the average drop of the peaks' force displayed significant differences among samples (Figure 3), with the Area 2 samples presenting a higher value.
Therefore, results from the sensory analysis are important in order to verify which one of the two parameters (the average drop or the number of peaks) is a better indication of perceived crispness.
Results from the sensory test (Figure 4) confirmed the TSS and TA values: no differences in sweetness or sourness perception were found among samples. Concerning the texture parameters, Area 1 samples were found to be less hard and less crispy than Area 2 samples. Therefore, the average drop parameter was more in agreement with the sensory test than the number of peaks, and it is suggested to be a better indicator of crispness. To our knowledge, very few studies have assessed crispness using the same compression test applied in this work; therefore, these results still need to be confirmed in other sensory trials and for other cultivars, as each apple cultivar may respond differently to the same type of texture test (Corollaro et al., 2014).
Moreover, it is also necessary to extend the type of assessment used in this work to other horticultural crops.
T A B L E 3 Coding process of penalty categories according to attributes classification in "must-be" attributes, "reversal" attributes and "attractive" attributes
Columns, in order: must-be attributes (attributes with the lower liking mean on category −1); reversal attributes (attributes with the lower liking mean on category 1); attractive attributes
Penalty category coded for penalty analysis as −1 (attributes checked in the ideal CATA question but not in the product): −2; −1; −2
Penalty category coded for penalty analysis as 1 (attributes checked in the product CATA question but not in the ideal)
Penalty category coded as 0 (attributes checked or non-checked for both ideal and product CATA's question): 0; 0; 0
Abbreviation: CATA, check-all-that-apply.
The results of sweetness perception did not confirm the results of the analytical TSS/TA ratio, which suggested that samples differ in terms of sweetness perception. In the sensory test, the parameter was found not to be significantly different among samples.
This could be explained by the differences found in texture properties, especially in terms of crispness, which could have influenced the sweetness perception of samples.It is well known that interactions between flavor and texture attributes can lead to poor prediction of sweet taste by instrumental data (Corollaro et al., 2014).
Finally, Area 1 apples were perceived as less astringent by panelists than the Area 2 samples, even though both scores were below the scale average, indicating that the sensation was not intense for either sample. Scores were also low for berry-like odor, a common descriptor of "Gala" apples (Plotto, 1998). Even though there were no differences in the overall quality scores among samples, it is important to note that Area 2 samples presented better quality characteristics for texture attributes. These results are in agreement with the literature. Apples from orchards at higher altitudes usually display better quality characteristics than lowland apples, especially in color development (Charles et al., 2018; Neri, 2004); however, many articles have shown that differences between mountain and hillside cultivations are not consistent and may vary across apple varieties (Gregori, Bergamaschi, Berra, & Folini, 2016). As for Gala, the superior quality of mountain apples in the Italian territory is not clear and, unlike other cultivars such as Golden Delicious and Fuji, Galas are still advised for all cultivation systems (Bassi & Fideghelli, 1995; Charles et al., 2018).

| Consumer test assessment
There was good agreement between the panel test results and the consumer test for many quality attributes. As for the overall liking scores, there were no differences among samples, with 6.61 and 6.89 for Area 1 and Area 2 samples, respectively (data not shown). The McNemar test indicated differences among samples in CATA frequencies only for mealiness (Figure 5), with Area 1 samples cited more often than Area 2 samples, even though both products presented a higher and similar number of CATA citations for the crispy attribute, while mealiness had fewer citations. Therefore, it can be concluded that some consumers might have perceived differences among samples regarding the crispness of products, which is expressed in terms of mealiness in the CATA questionnaire. It is important to note that even though some consumers perceived differences among samples in terms of mealiness, most of them considered both products to be crispy. Therefore, this fact could be linked to the absence of significant differences among the overall liking scores of samples, as liking is known to be driven mainly by textural factors, as described in previous works (Charles et al., 2018; Corollaro et al., 2014). Figure 6 indicates the attributes that were cited by at least 20% of consumers, and it is possible to note that samples presented a similar number of citations for the most cited descriptors, such as crispy, juicy, sweet, and ripe apple notes. It is also clear that for the least cited attributes there is a greater difference in the number of citations, even though it is not significant.

| Penalty analysis results
Considering products were very similar, the penalty analysis was calculated for each attribute from all products.This approach is advisable since the main interest of this work is to evaluate attributes that affect Gala apples as a whole instead of apples from a specific cultivation system.With this approach the results obtained in this article are more extendable to the population (Pagès et al., 2014).
From Table 4, it is possible to observe that only the penalty coefficients of sweet and intense apple odor were significantly different from zero, and in both cases for the disagreement category −1. The disagreement category −1 indicates that both attributes were checked for the ideal product but were not checked for the tasted samples in the CATA questionnaire. Therefore, the fact that the penalty coefficient was found to be significant for both attributes indicates that consumers who did not perceive the samples as sweet and with an intense apple odor penalized the overall liking scores significantly. The percentage of consumers that penalized the samples under the two attributes was high, especially in the case of Area 2 samples with regard to the attribute intense apple odor: in this case the percentage was almost half of the participants (44.74%). It is interesting to note that, even though the attribute sweet was highly cited in the CATA question for both products, the remaining consumers, who did not cite this attribute in the sample's CATA question, had a huge impact on the overall liking mean drop. Therefore, it could be assumed that even though samples were considered to be sweet, they were not sweet enough, at least for an important part of the participants.
By assessing the coefficients, it is possible to identify many attributes with a negative penalty coefficient for the disagreement category 1, such as crispy, aromatic, citric notes and honey notes. The disagreement category 1 indicates consumers that did not check the attributes for the ideal product but checked them for the samples. A negative penalty coefficient indicates that the mean of the overall liking scores of consumers in this category is higher than the mean of consumers in the agreement categories. This means that, even though consumers did not expect to find those attributes in their ideal apple, the presence of these attributes in the samples did not lead to a drop in overall scores. This is in agreement with Plaehn (2012), who stated that "out-of-JAR categories" (1 and −1 in this work) are not necessarily associated with mean drops ("penalties") in the CATA penalty case. Therefore, these attributes are considered to be attractive, as described in Section 2, and must be coded differently from the others when creating the PLS model.
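The sign convention of the penalty coefficient can be made concrete with a small Python sketch. It is a simplification under stated assumptions: the penalty is taken as a raw difference in mean liking between the agreement group (code 0) and one disagreement category, without the significance testing of the one-dimensional model used in the paper; the function name `mean_drop` and the scores are hypothetical.

```python
from statistics import mean


def mean_drop(liking, codes, category):
    """Mean liking of the agreement group (code 0) minus mean liking of `category`.

    A positive value is a penalty (liking drops); a negative value means the
    consumers in `category` liked the product more than the agreement group.
    Returns None when either group is empty.
    """
    agree = [score for score, code in zip(liking, codes) if code == 0]
    other = [score for score, code in zip(liking, codes) if code == category]
    if not agree or not other:
        return None
    return mean(agree) - mean(other)


# Hypothetical liking scores (9-point scale) and penalty categories for one attribute.
liking = [7, 8, 6, 5, 4, 9]
codes = [0, 0, 0, -1, -1, 1]
print(mean_drop(liking, codes, -1))  # positive: a penalty
print(mean_drop(liking, codes, 1))   # negative: no penalty for this category
```

In this toy data the attribute behaves like an attractive one: missing it costs liking, while finding it unexpectedly does not.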
Even though many penalty coefficients were not significantly different from zero, a disagreement between the sample and ideal citations occurred in a large percentage of cases, especially for the attributes crispy, fibrous, sweet, sour, intense apple odor, aromatic, ripe apple notes, green apple notes and honey notes. For these attributes, a disagreement occurred at least 20% of the time and, therefore, following Ares et al. (2014), those attributes were chosen as independent variables of the PLS model. It is interesting to note that the number of consumers that penalized samples for not being sour (the −1 category) was higher than the number of consumers that did not choose this attribute in their ideal but found it in the samples. It is well described in the literature that sourness of apples may be perceived as a negative attribute (Molina et al., 2006); however, it is also well described that this attribute may enhance the flavor of the product when it is perceived along with sweet taste (Harker et al., 2002; Mendes da Silva, Marinoni, et al., 2019).
F I G U R E 6 List of attributes cited at least by 20% of consumers on check-all-that-apply (CATA) questions of Area 1 and Area 2 samples

| Prediction of liking with the NM-PLS
As a result of the penalty analysis assessment, the prediction model of liking was created using the NM-PLS model. It was created by grouping the attributes crispy and fibrous under the latent variable "texture"; sweet and sour under "taste"; and intense apple odor, aromatic, ripe apple notes, green apple notes and honey notes under "notes." As suggested by Sanchez (2013), all NM-PLS selective ratios indicated that the "texture" and "taste" variables should be classified as reflective latent variables, with both presenting Dillon-Goldstein's rho values >0.70 (0.707 in both cases), while the group "notes" was classified as formative, as the communality values of all attributes were <0.50. This was expected since sour and sweet penalties are known to be highly related in fruit products and represent the same concept (sweetness) (Pagès et al., 2014); therefore, the attributes are highly explained by the latent variable "taste," with high
T A B L E 4 The one-dimensional penalty results: the CATA attributes, the penalty categories that represent the disagreement (−1 and 1), the percentage of times the disagreement penalty categories are given for each sample (frequency of consumers columns), the penalty coefficient and its associated p value (≤.05)
developed. Few differences were detected among samples, mainly concerning color and texture attributes. Most of the results at the three levels of the assessment were in agreement, such as the similarities among samples concerning taste perception, which were confirmed by similar levels of TSS and TA, as well as by the lack of significant differences in the sweetness and sourness scores in the panel test assessment and in the CATA frequencies of sweet and sour attributes at the consumer level. Interestingly, slight differences concerning texture properties and perception were detected among the three levels of the assessment. The panel test indicated Area 2 samples as harder, crispier and more astringent, while the consumer test suggested that the Area 2 samples were less mealy but not harder (which was in agreement with the TPA test), and a higher astringency of Area 2 samples was not detected. These results confirm how important it is to implement an integrated assessment, where differences that are perceived by panelists but not by consumers are detected and valuable information from sample characterization is not neglected. Finally, an NM-PLS regression model applied to penalty analysis results, with a new coding system that takes the penalty results into account, is proposed. It successfully confirmed sweet and intense apple odor as the attributes with the highest impact on liking scores and their mean drop; however, the assessment also provided valuable information on the liking impact of global and complex quality attributes, such as "notes," "taste" and "texture." The integration of attributes showed a higher impact of "notes" on liking compared to "texture" and "taste," and these results are not in agreement with other works where texture is claimed to be the most important driver of consumer liking of apple. However, the methodology was applied with a very small number of consumers and, therefore, the consumer test results still need to be validated on a broader population sample. In the same way, the average drop in force calculated by the compression test needs to be tested on a higher number of samples and on different horticultural crops in order to better investigate its relation with crispness perception.
Ten panelists from Sata SRL (Alessandria, Italy) were selected and trained in sensory evaluation of apples as recommended by ISO 8586 (ISO, 2012). The training program was carried out weekly for 3 months and was divided into theoretical and practical sessions. Panelists were trained by discussing the definition of the quality parameters selected for sensory evaluation and by explaining the score sheet and method of scoring. The practical training first involved the use of recognition threshold tests, followed by discrimination tests, such as ranking, "duo-trio" and triangle tests, to recognize differences in intensity of the following standards: fructose and sucrose (sweet taste), citric acid (sour taste), sucrose octaacetate (bitter taste), tannic acid (astringency), ethylhexanoate (apple odor) and 4-(p-acetoxyphenyl)-2-butanone (berry-like odor). The standards were prepared at different concentrations according to the panelists' preparation level. All standards were supplied by AROXA (Cara Technology, Leatherhead, United Kingdom). Panelists were trained to use the variety clone "Royal Gala" as a reference standard in order to evaluate samples, as recommended by ISO 8586 (ISO, 2012). Before the training, the clone was assessed in order to verify that the values of TSS and TA were adequate to set the product as a benchmark of excellence, as it is recognized in the northern part of Italy. Two different continuous scales compliant with ISO 4121
such as "Gala" and "Red Delicious." Their ages were 18-25 years old and around 60% were female. It was decided to work with this specific target since young adults are more prone to place a greater value on secondary factors such as the vocationality of the territory. All samples were tasted monadically in random order to take into account position and carry-over effects. Samples were presented in plastic containers labeled with three-digit random numbers, at room temperature. Water was available for rinsing between samples. Consumers

−1
responses are analogous to an "out-of-JAR response." The penalty analysis was applied following the same principles as penalty analysis with JAR scales, using the one-dimensional model proposed by Pagès et al. (2014), and the p values of the penalty coefficients were assessed to determine whether they were significantly different from zero (≤.05). The analysis was done using the R package SensoMineR (Husson, Le, & Cadoret, 2020).

Regression model: Prediction of liking

The non-metric partial least squares (NM-PLS) model, as described by Russolillo (2010), was applied to the penalty analysis categories in order to estimate liking and to identify the weight of the deviation of each attribute from the CATA question. The CATA attributes were used as predictors and overall liking was used as the dependent variable. In order to use attributes as predictors in the model, the penalty categories of all attributes were transformed into dummy variables following an approach similar to that used in Xiong and Meullenet (2006) and Ares et al. (2014). However, considering that the out-of-JAR variables may not necessarily lead to an overall mean drop, as proposed by Plaehn (2012), the mean of each attribute's penalty categories was assessed

Goldstein's values were assessed in order to classify the global latent variables as reflective or formative, along with the loadings and communalities of the single predictors of the outer model, as performed in Mendes da Silva et al. (2020). Because no distributional hypothesis is made on the data, PLS inferential tools are usually based on resampling techniques (Nappo, 2009); therefore, in this work the model was validated through a bootstrap technique, and the confidence intervals of path coefficients and loadings were assessed.
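The two quantities that recur in this part of the method, the mean drop in liking associated with a deviating attribute response and a bootstrap confidence interval, can be illustrated with a minimal Python sketch. It is not the SensoMineR or NM-PLS procedure used in the study: it assumes a simplified two-group comparison (responses congruent with the ideal versus deviating ones) and a plain percentile bootstrap. The function names `mean_drop` and `bootstrap_ci`, and the ten liking scores, are hypothetical and serve only as an illustration.

```python
import random
from statistics import mean

def mean_drop(liking, deviates):
    """Mean liking of congruent responses minus mean liking of deviating ones.

    liking   : liking scores, one per consumer
    deviates : 1 if the consumer's response for the attribute deviates
               from the ideal, 0 otherwise (assumed coding)
    """
    congruent = [y for y, d in zip(liking, deviates) if not d]
    deviating = [y for y, d in zip(liking, deviates) if d]
    if not congruent or not deviating:
        return None  # the drop is undefined when one group is empty
    return mean(congruent) - mean(deviating)

def bootstrap_ci(liking, deviates, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean drop (consumers resampled)."""
    rng = random.Random(seed)
    n = len(liking)
    drops = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        d = mean_drop([liking[i] for i in idx], [deviates[i] for i in idx])
        if d is not None:  # skip resamples in which one group vanished
            drops.append(d)
    drops.sort()
    lower = drops[int((alpha / 2) * len(drops))]
    upper = drops[min(len(drops) - 1, int((1 - alpha / 2) * len(drops)))]
    return lower, upper

# Hypothetical toy data: ten consumers, one attribute (not from the study).
liking = [8, 7, 9, 4, 5, 8, 3, 6, 7, 4]
sweet_deviates = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]

print(mean_drop(liking, sweet_deviates))     # 7.5 - 4.0 = 3.5
print(bootstrap_ci(liking, sweet_deviates))  # interval depends on the seed
```

In the study the bootstrap was applied to the path coefficients and loadings of the NM-PLS model; the same resampling logic applies, with the model refitted on each resample instead of a mean drop being recomputed.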
a*, b* and Chroma values of Area 1 and Area 2 apple samples. Different lower-case letters (a-b) show significant differences among samples (p value ≤.05). Capital letters (N.S.) show absence of significant differences (p value ≤.05) among samples

FIGURE 3 Total soluble solid content (TSS), titratable acidity (TA), hardness and average drop values of Area 1 and Area 2 apple samples. Different lower-case letters (a-b) show significant differences among samples (p value ≤.05). Capital letters (N.S.) show absence of significant differences (p value ≤.05) among samples

FIGURE 4 Sensory analysis of Area 1 and Area 2 samples. The "*" indicates attribute values that were significantly different among samples (p value ≤.05)

FIGURE 5 Percentage of consumers who cited a specific check-all-that-apply (CATA) attribute for only one of the apple samples. The "*" indicates attributes that were significant in the McNemar test (p value ≤.05)

Table 1.
Apples were subjected to physicochemical analysis, sensory evaluation with a trained panel and a consumer test. Both samples were

TABLE 1 Information of the meteorological measurements from June to August for the two apple orchards Area 1 and Area 2