Knowledge discovery in textile field – analysis of a cotton fibre properties database

S. Dias1 & R. Vasconcelos2, M. Santos1, T. Amorim2 & L. Amaral1
1Information Systems Department, 2Textile Department, University of Minho, Portugal.

Abstract

In recent years, testing procedures in the textile field have seen great technological developments. Measures such as micronaire, length, uniformity, strength, elongation, colour and trash content are easily determined using High Volume Instrument (HVI) systems, which provide rapid and reliable results. With the USTER AFIS, another measurement system, the user can carry out process control with regard to the variation of fibre parameters such as fibre length, nep content and trash content. However, the chemical properties of cotton are obtained using laboratory methods that are more expensive and time consuming. Knowledge of these properties is very important because chemical properties can affect wet processes. In order to find relations between these properties, so that chemical properties can be predicted from physical properties, we collected a large amount of data acquired in an HVI classification system. United States Department of Agriculture (USDA) HVI calibration of cotton, laboratory conditions, and sample conditioning practices and procedures were used in the tests. Several studies have related physical properties to yarn characteristics, physical properties to each other, or the instruments used to measure cotton quality to each other. All of these studies used statistical tools to establish those relations. This paper presents the results of a study using Data Mining techniques, carried out within a research project named Cotton Properties: Inference through Data Mining Techniques, funded by the Science and Technology Foundation of the Portuguese government. The paper describes the use of Clementine in the analysis of a database storing physical and chemical properties of cotton fibres. Through the application of several techniques it is possible to produce different kinds of knowledge. In this work, Neural Networks, Association Rules and Rule Induction are used to find relationships in the available data.

© 2002 WIT Press, Ashurst Lodge, Southampton, SO40 7AA, UK. All rights reserved. Web: www.witpress.com Email [email protected] Paper from: Data Mining III, A Zanasi, CA Brebbia, NFF Ebecken & P Melli (Editors). ISBN 1-85312-925-9

1 Introduction

Cotton is an extremely important fibre in the textile industry. For that reason, many studies and developments have been carried out to better understand this fibre, among them those by Vasconcelos [8] and Lucas [9]. It is important to know all the characteristics of cotton before bales are purchased in order to improve cotton quality. Traditionally, ideal cotton fibres are said to be as white as snow, as strong as steel, as fine as silk and as long as wool. It is difficult to incorporate these specifications, favoured by cotton processors, into a manufacturing program or to set them as quantitative goals for cotton producers [1]. In practice, cotton fibre quality is defined by the properties reported for every bale by the classifying offices of the United States Department of Agriculture (USDA), which currently include micronaire, length, length uniformity index, strength, elongation, colour as reflectance (RD) and yellowness (+b), and trash content, all quantified by the High Volume Instrument (HVI).

There are many studies relating physical properties and a few relating chemical properties, but not many have related physical and chemical properties together. It is useful to relate all those characteristics. This study was undertaken in order to contribute to a better understanding of cotton properties.

This work presents the results of an analysis of a database storing cotton fibre properties, both physical and chemical, using Data Mining techniques. The paper is organised as follows: Section 2 overviews the importance of analysing cotton fibre properties. Section 3 overviews the process of Knowledge Discovery in Databases and presents the Data Mining techniques used in this work. Section 4 describes the analysis of the cotton fibre database and systematises the main results achieved with the Data Mining techniques. Section 5 concludes with some remarks.

2 The analysis of cotton fibre properties

The analysis of cotton fibre properties is extremely important because fibre quality depends on good combinations of these properties. When analysing properties, one has to distinguish between physical and chemical properties, which are obtained using different methods.

As physical properties we consider measures like micronaire, length, uniformity, strength, elongation, colour and trash content, which are easily determined using the HVI, providing rapid and reliable results.

The fibre property measurements made by HVI are among the characteristics used to describe cotton quality, and this system is currently the most widely used around the world. However, it is not the only one. The USTER AFIS is another system used to test the characteristics of cotton fibres. HVI and AFIS testing show similar results for fibre quality. With these instruments we can obtain the most important characteristics of cotton fibre easily, quickly and with high reliability. In this study, the physical cotton fibre properties were determined using the HVI system.

The chemical properties are as important as the physical ones, but their determination is more difficult. These properties are obtained through laboratory tests that are, by nature, time consuming and often have to be repeated in order to confirm the results obtained in the first test. It is important to know these properties because they can affect wet processes.

Many studies have been undertaken in order to know all the characteristics of cotton fibre and the way they work together. Some conclusions were reached, but not at a satisfactory level. It is useful to know the relationship between chemical and physical properties because it enables the identification of fibre behaviour in the manufacturing process.

All of these studies used statistical tools to establish those relations. We decided to use the Knowledge Discovery in Databases (KDD) process because the main goal of this study is to analyse and find relationships between physical and chemical properties together, which is difficult to do using other methods. With statistical tools this kind of study is almost impossible, which suggests the use of other techniques. The KDD process integrates several steps, one of which is the application of intelligent data analysis techniques, the Data Mining algorithms, used in the analysis and extraction of patterns from data. These techniques combine skills from different research areas such as artificial intelligence, statistics, machine learning and databases. For that reason these techniques were applied in this study.

3 The process of knowledge discovery in databases

KDD is the process of finding trends and patterns in data. The objective of this process is to explore large quantities of data and discover new information [6] in order to support decision-making processes or exploit the data to achieve scientific, business or operational goals.

Different tasks can be performed in the knowledge discovery process, and several techniques can be applied in the execution of each task. Some of these tasks are classification, clustering, association, prediction, estimation and summarisation. Data Mining tools offer a wide variety of algorithms to choose from. The performance of each technique depends on the task to be carried out, the quality of the available data and, most importantly, the objective of the discovery. The most popular Data Mining algorithms include neural networks, decision trees, association rules and genetic algorithms. This work addresses the use of the neural network, rule induction and association rule techniques available in the Clementine Data Mining system.

Neural Networks are data models that simulate the structure of the human brain. Like the brain, Neural Networks learn from a set of inputs and adjust the parameters of the model according to this new knowledge to find patterns in data [4]. Neural Networks learn from experience and are useful in detecting unknown relationships between a set of input data and an outcome. In this study, these algorithms are used to predict the behaviour of the fibres, in other words, to predict what will most likely happen. However, Neural Networks are difficult to interpret, and to minimise this problem another machine learning technique is used: rule induction. This technique can be used with neural networks or by itself. Rules can be induced using decision trees. Decision trees separate the data into sets of rules that are likely to have a different effect on a target variable. With these rules we can find interesting relationships and see how the properties will react according to a given output.

The other technique used in this work is Association Rules. Association models examine the extent to which values of one field depend on, or are predicted by, values of another field. These rules find things that “go together”, which is different from “predicted by” [4]. With this technique we expect to reach particular conclusions from the data we are analysing and to find associations between the available data automatically.
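As an illustration of how the strength of such a "go together" rule is usually quantified, the sketch below computes the support and confidence of a single rule over a small set of records. It is not the algorithm used by Clementine; the field names (origin, reflectance_class) and the records are hypothetical and serve only to show the calculation.

```python
def matches(record, conditions):
    """True if the record has the required value for every listed field."""
    return all(record.get(field) == value for field, value in conditions.items())


def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all records matching antecedent and consequent
    confidence = fraction of antecedent-matching records that also match
                 the consequent
    """
    n_antecedent = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical records, invented for this example only.
records = [
    {"origin": "Egypt", "reflectance_class": 4},
    {"origin": "Egypt", "reflectance_class": 4},
    {"origin": "Uganda", "reflectance_class": 2},
    {"origin": "Uganda", "reflectance_class": 4},
]

support, confidence = rule_stats(
    records, {"reflectance_class": 4}, {"origin": "Egypt"}
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

A rule with high confidence but very low support describes only a handful of samples, so both measures are normally inspected together.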

4 Knowledge discovery from a textile database

As in any project, the first requirement is to have a clear idea of the problem we want to solve. After the main goal has been defined, we have to analyse the information stored in our database. The textile database was implemented in a relational database system with Open Database Connectivity (ODBC) functionality. This functionality is needed to connect the database to Clementine, the tool used in this work, in which it is possible to implement all the steps of the KDD process.

Our database is mainly composed of two tables, one storing the physical properties and the other storing the chemical properties. Tables 1 and 2 show an extract of the physical and chemical tables from the cotton database.

Table 1 – Cotton database: some attributes from physical table

Table 2 – Cotton database: some attributes from chemical table

The physical and chemical tables were then analysed in order to find inconsistencies in the stored data.


After this, using visualization techniques, we analysed our data in order to find which properties are related, how they relate to each other, whether there is some influence between them, and many other questions that we consider important. This step is important because visualization techniques can often bring out points that we would not normally see [7].

In this phase of the work, we first analysed the physical table, relating many attributes to each other. Next we did the same using both tables from our database, trying to find trends between physical and chemical properties. In figure 1, we can see one example of a graph used in the analysis of the data. The heavy lines represent strong connections between attribute values, and the normal lines represent weaker relationships. The attributes analysed were the country of origin of the cotton (origem, with values Egipto (Egypt), Madagáscar (Madagascar), Zimbabwe and Uganda), the trash weight (escalao_wt) and the ash percentage (escalao_cinzas). With this kind of graph, we can find relationships that are difficult to find if the user only looks at the data in tabular form.

Figure 1 – Example of a web node used in the analysis
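The idea behind such a graph can be sketched as a co-occurrence count between the values of two categorical attributes, with each pair labelled as a strong or weak link. This is only a rough approximation of what a web graph displays, not Clementine's implementation; the records and the threshold are hypothetical.

```python
from collections import Counter


def link_counts(records, field_a, field_b):
    """Count how often each pair of values of the two fields occurs together."""
    return Counter((r[field_a], r[field_b]) for r in records)


def classify_links(counts, strong_threshold):
    """Label each value pair 'strong' or 'weak' from its co-occurrence count."""
    return {
        pair: ("strong" if n >= strong_threshold else "weak")
        for pair, n in counts.items()
    }


# Hypothetical records; the class values are invented for illustration.
records = [
    {"origem": "Egipto", "escalao_wt": 1},
    {"origem": "Egipto", "escalao_wt": 1},
    {"origem": "Egipto", "escalao_wt": 1},
    {"origem": "Uganda", "escalao_wt": 1},
    {"origem": "Uganda", "escalao_wt": 3},
]

counts = link_counts(records, "origem", "escalao_wt")
links = classify_links(counts, strong_threshold=3)  # threshold is an assumption
for pair, label in sorted(links.items()):
    print(pair, counts[pair], label)
```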

When we started to analyse our data, we discovered that many test values for humidity and temperature (two attributes from the physical table) were outside the range recommended by the USDA. It was therefore important to find out whether these values had any influence on the other properties. To do so, a variety of graphs were constructed relating the other properties to these two critical properties. The results show that the values outside the recommended range have no influence on the cotton properties.
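A numeric counterpart to this graphical check could look like the sketch below, which separates samples tested inside and outside the recommended conditions and compares the mean of another property between the two groups. The limits, field names and samples are placeholders chosen for illustration; they are not taken from the study and should be replaced by the actual recommended range.

```python
from statistics import mean

# Placeholder limits (assumptions, not values reported in the paper).
TEMPERATURE_RANGE = (20.0, 22.0)  # degrees Celsius
HUMIDITY_RANGE = (63.0, 67.0)     # percent relative humidity


def in_range(value, limits):
    low, high = limits
    return low <= value <= high


def split_by_conditions(samples):
    """Return (within, outside) lists according to the test conditions."""
    within, outside = [], []
    for s in samples:
        ok = in_range(s["temperature"], TEMPERATURE_RANGE) and in_range(
            s["humidity"], HUMIDITY_RANGE
        )
        (within if ok else outside).append(s)
    return within, outside


def mean_difference(samples, prop):
    """Mean of prop for out-of-range samples minus mean for in-range samples."""
    within, outside = split_by_conditions(samples)
    if not within or not outside:
        return None  # nothing to compare
    return mean(s[prop] for s in outside) - mean(s[prop] for s in within)


# Hypothetical samples, invented for this example.
samples = [
    {"temperature": 21.0, "humidity": 65.0, "strength": 30.0},
    {"temperature": 21.5, "humidity": 64.0, "strength": 32.0},
    {"temperature": 24.0, "humidity": 65.0, "strength": 31.0},
    {"temperature": 21.0, "humidity": 70.0, "strength": 31.0},
]

print(mean_difference(samples, "strength"))
```

A difference close to zero for every property would be consistent with the conclusion that the out-of-range conditions had no influence, although a proper comparison would also take the spread of the values into account.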

The next phase of the work was to perform some advanced tasks like rule induction, association and prediction. When working in this phase we can, at any moment, return to the analysis phase to carry out new analyses or to refine existing ones. The main goal in this phase is to apply advanced techniques to our data, in order to find relationships that we have not yet found. A first exercise was to perform this task on the physical table only, and in a second phase using the two tables.


Before a model is constructed, the available data set must be divided into two subsets. The first, named the Train Set, is used to identify the relationships existing in the data. The second subset, the Test Set, is used to verify the performance of the model previously generated.
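The division into Train Set and Test Set can be sketched as follows. The paper does not state the proportion used, so the 70/30 split, the random seed and the records are assumptions made only for this example.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, test_set, target):
    """Fraction of test records for which the model predicts the target value."""
    if not test_set:
        return 0.0
    hits = sum(1 for r in test_set if model(r) == r[target])
    return hits / len(test_set)


# Hypothetical records: an identifier and an elongation class (EL).
records = [{"id": i, "EL": 1 if i < 5 else 2} for i in range(10)]

train_set, test_set = train_test_split(records, test_fraction=0.3, seed=42)
print(len(train_set), len(test_set))

# A trivial stand-in model that always predicts class 1.
print(accuracy(lambda r: 1, test_set, "EL"))
```

Keeping the Test Set out of model construction is what makes the measured accuracy an estimate of performance on new samples.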

Rule induction is one of the most common forms of Knowledge Discovery. It is a technique for discovering a set of “If-Then” rules from data to classify the different cases [4]. To do so, we applied the C5.0 algorithm to our dataset. With C5.0 we obtained a set of rules that describe the relationships existing between the physical properties. In some cases the results achieved with this algorithm were not good enough, so, to improve the accuracy of the models, we used a Neural Network together with the C5.0 model. In some circumstances the two models together performed better, and in other cases C5.0 alone achieved more accurate results. These models were applied to different attributes: UI (Uniformity Index), ST (Strength), EL (Elongation), RD (Reflectance Degree), +b (yellowness content) and origem (source). Some of the results were refined until they were satisfactory. In Figure 2 we can see some rules generated by the models for the attribute EL (Elongation).
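To give a feeling for how If-Then rules are induced from data, the sketch below performs a single step of decision-tree induction: it selects the attribute with the highest information gain with respect to the target and emits one rule per value of that attribute. This is a deliberately simplified stand-in and not the C5.0 algorithm, which grows and prunes a full tree; the records and class values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in target entropy obtained by splitting on the attribute."""
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - remainder


def induce_rules(records, attributes, target):
    """Pick the best attribute and map each of its values to the majority class."""
    best = max(attributes, key=lambda a: information_gain(records, a, target))
    groups = defaultdict(list)
    for r in records:
        groups[r[best]].append(r[target])
    rules = {
        value: Counter(labels).most_common(1)[0][0]
        for value, labels in groups.items()
    }
    return best, rules


# Hypothetical class values for UI, ST and the target EL.
records = [
    {"UI": 1, "ST": 2, "EL": 1},
    {"UI": 1, "ST": 2, "EL": 1},
    {"UI": 2, "ST": 1, "EL": 2},
    {"UI": 1, "ST": 1, "EL": 2},
]

best, rules = induce_rules(records, ["UI", "ST"], "EL")
for value, predicted in rules.items():
    print(f"If {best} = {value} Then EL = {predicted}")
```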

Figure 2 – Some rules generated for the attribute EL (Elongation) stored in the physical table

These rules are possible patterns found for range 1 of EL. Rule #1 can be read as: if LEAF (impurities) is equal to 3.0 and escalao_sl50 (span length 50%) is equal to 1.0 or 2.0 and escalao_sl25 (span length 2.5%) is equal to 4.0 and escalao_rd (colour as reflectance) is equal to 4.0, then the cotton has an elongation value in range 1, which means that the fibre has a weak elongation. The patterns found so far need to be analysed by the textile experts to determine whether they are useful. A pattern is interesting if it is easily understood by humans, if it is valid on new or test data with some degree of certainty, and if it is potentially useful. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge [5].
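The reading of rule #1 given above can be written as a small predicate and checked against test samples, which is one way to quantify whether a pattern is "valid on new or test data with some degree of certainty". The attribute names escalao_sl50, escalao_sl25 and escalao_rd are our reading of the names in the text, and the samples are hypothetical.

```python
def rule_1_fires(sample):
    """Conditions of rule #1 as read in the text (attribute names assumed)."""
    return (
        sample["LEAF"] == 3.0
        and sample["escalao_sl50"] in (1.0, 2.0)
        and sample["escalao_sl25"] == 4.0
        and sample["escalao_rd"] == 4.0
    )


def rule_confidence(samples, target="EL", predicted=1):
    """Among samples covered by rule #1, the fraction with the predicted class."""
    covered = [s for s in samples if rule_1_fires(s)]
    if not covered:
        return None  # the rule does not apply to any sample
    return sum(1 for s in covered if s[target] == predicted) / len(covered)


# Hypothetical test samples, invented for this example.
samples = [
    {"LEAF": 3.0, "escalao_sl50": 1.0, "escalao_sl25": 4.0, "escalao_rd": 4.0, "EL": 1},
    {"LEAF": 3.0, "escalao_sl50": 2.0, "escalao_sl25": 4.0, "escalao_rd": 4.0, "EL": 2},
    {"LEAF": 2.0, "escalao_sl50": 1.0, "escalao_sl25": 4.0, "escalao_rd": 4.0, "EL": 1},
]

print(rule_confidence(samples))
```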

The next step was to predict fibre behaviour using, in addition, the chemical properties obtained so far. For this exercise, we had to combine the two tables into a single one, using only the important attributes. This is done in Clementine using the merge node (figure 3).

After this, the task previously presented for the physical table was repeated. In figure 4 we can see some of the rules obtained in this phase of our work.

Figure 3 – Stream that joins the two tables
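Outside Clementine, the effect of joining the two tables and keeping only selected attributes can be sketched as below. The example assumes an inner join on a shared sample identifier; the key name sample_id, the selected fields and the rows are hypothetical, since the paper does not describe the join key.

```python
def merge_tables(physical, chemical, key, keep):
    """Inner-join two lists of row dicts on key and keep only selected fields."""
    chemical_by_key = {row[key]: row for row in chemical}
    merged = []
    for row in physical:
        match = chemical_by_key.get(row[key])
        if match is None:
            continue  # no chemical record for this sample: drop it
        combined = {**row, **match}
        merged.append({field: combined[field] for field in [key, *keep]})
    return merged


# Hypothetical rows, invented for this example.
physical = [
    {"sample_id": 1, "origem": "Egipto", "escalao_wt": 2, "LEAF": 3.0},
    {"sample_id": 2, "origem": "Uganda", "escalao_wt": 1, "LEAF": 2.0},
]
chemical = [
    {"sample_id": 1, "escalao_cinzas": 3},
]

merged = merge_tables(
    physical, chemical, key="sample_id",
    keep=["origem", "escalao_wt", "escalao_cinzas"],
)
print(merged)
```

Samples without a chemical record are dropped here; whether to keep them with missing values instead is a design choice that depends on how the later models handle incomplete rows.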

Figure 4 – Some rules generated for the attribute origem (source) with both tables


Another technique used in our work is association. With association techniques we look for interesting relationships in our data and discover which properties act together.

Figure 5 presents some of the rules obtained for the attribute origem (source), using the chemical and physical tables from the cotton database.

Figure 5 – Some rules generated for the attribute origem (source) with both tables with the GRI algorithm

The results obtained with association rules are not yet satisfactory, and at the time of writing they are being studied by the textile experts on the project team.

5 Conclusions

This paper presented an approach for knowledge discovery in a textile database storing physical and chemical properties of cotton fibres. This study was undertaken due to the lack of information about the relations that exist between the physical and chemical properties.

The main goal of this study is to use intelligent algorithms in the process of analysing and relating the most important fibre properties. For that, we used Data Mining techniques and their related algorithms. The first step was the collection of the textile data and the construction of a cotton database in a relational system. After this, using visualization techniques, we analysed our dataset in order to understand it better and to find important patterns unknown so far. Finally, advanced techniques were required to classify and predict future cases. To do so, techniques like association and prediction were used, implemented through algorithms like Neural Networks and Decision Trees.

The results show that with Data Mining techniques we can identify, analyse and predict relationships between cotton properties. We can study whether two or more categories of properties are related, or study each category individually. We can also analyse whether results obtained in the HVI system that fall outside the limits imposed by the USDA have any kind of influence on the other cotton properties.


As future work, we intend to use clustering techniques. It is also important to implement a user-friendly interface for the textile experts, in order to facilitate the KDD process in the Clementine system.

Acknowledgements

This work has been partially supported by FCT (Fundação para a Ciência e a Tecnologia) project funding, contract number POCTI / 1999 / CTM / 32993.

References

[1] Bradow J.M., Davidonis G.H., Quantitation of fibre quality and the cotton production-processing interface: a physiologist's perspective, www.jcotsci.org, May 2000.

[2] Cios K., Pedrycz W., Swiniarski R., Data Mining Methods for Knowledge Discovery, Kluwer Academic Publishers, 1998.

[3] Fayyad U.M., Piatetsky-Shapiro G., Smyth P., From Data Mining to Knowledge Discovery: An overview, In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, pp. 1-34, The MIT Press, Massachusetts, 1996.

[4] http://www.spss.com/clementine, October 2001.

[5] Han J., Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.

[6] Klerfors D., Artificial Neural Networks, Saint Louis University, School of Business & Administration, May 2001.

[7] Groth R., Data Mining - Building Competitive Advantage, Prentice Hall PTR, New Jersey, 2000.

[8] Vasconcelos R.M., Contribuição à aplicação de técnicas de inteligência artificial na tecnologia da fiação, PhD Thesis, Universidade do Minho, Guimarães, 1993.

[9] Lucas F.S.N.D., Ramas de Algodão: Propriedades Físicas e Químicas, MSc Thesis, Universidade do Minho, Guimarães, 2000.

