Data Mining Course Project Report
XI'AN TECHNOLOGICAL UNIVERSITY

Course Project Report

Course: Data Mining
Major: Information Management and Information Systems
Class: 130513
Name: Jia Dandan (賈丹丹)
Student ID: 130513117
Advisor: Li Gang (李剛)
Grade:
January 3, 2016

Preface

Data mining is the extraction of useful information from large volumes of data: given a specific requirement, it finds the relevant information in a vast body of data and makes it available for that requirement. Foreign experts have predicted that, as data keeps accumulating and computers become ubiquitous, data mining will grow into a new industry in China within the next 5 to 10 years.

In the artificial intelligence community, data mining is also commonly called Knowledge Discovery in Databases (KDD); others regard data mining as one basic step of the knowledge discovery process. That process consists of three stages: (1) data preparation, (2) data mining, and (3) presentation and interpretation of results. Data mining can interact with the user or with a knowledge base. It is a technique that analyzes individual records to find regularities in large amounts of data, and it has three main steps: data preparation, pattern discovery, and pattern representation. Data preparation selects the required data from the relevant sources and integrates it into a data set for mining; pattern discovery applies some method to find the patterns the data set contains; pattern representation presents those patterns in a form that users can understand as far as possible (for example, through visualization).

Classification in data mining captures characteristic knowledge about what things of the same kind have in common and discriminating knowledge about how different kinds differ. The most typical classification method is based on decision trees. It builds a decision tree from a set of examples and is a supervised learning method. The method first forms a decision tree from a training subset (also called the window). If the tree cannot classify every object correctly, some of the exceptions are added to the window, and the process repeats until a correct decision set is formed. The final result is a tree whose leaf nodes are class names and whose internal nodes are attributes with branches, each branch corresponding to one possible value of that attribute.

Contents

1 Business Understanding
2 Data Understanding
2.1 Data description (English version)
2.2 Reading the data
2.3 Browsing the data
2.4 Assigning a role to each variable
2.5 Examining the distribution of each variable
3 Data Preparation
3.1 Reclassifying the data
3.2 Balancing the data
4 Building the Decision Tree Models
4.1 Introduction to the C5.0, CART and CHAID algorithms
4.2 Building the models
4.3 Model output
4.4 Analysis of the model results
5 Model Evaluation
6 Summary
Appendix 1: zoo.data
Appendix 2: zoo.names

1 Business Understanding

A zoo holds many animals of many kinds. Classifying the animals by their characteristics makes it easier to observe and analyze those characteristics, to manage the animals more sensibly, and to provide a reference for looking up animal information later.

2 Data Understanding

The data set, obtained from the UCI website, describes zoo animals. It records the characteristics of 99 animals: hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic and catsize. The goal is to use data mining to classify these animals into 7 types.

2.1 Data description (English version)

Source:
Creator: Richard Forsyth
Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676

Data Set Information:
A simple database containing 17 Boolean-valued attributes. The "type" attribute appears to be the class attribute. Here is a breakdown of which animals are in which type: (I find it unusual that there are 2 instances of "frog" and one of "girl"!)

Class# -- Set of animals:
1 -- (41) aardvark, antelope, bear, boar, buffalo, calf, cavy, cheetah, deer, dolphin, elephant, fruitbat, giraffe, girl, goat, gorilla, hamster, hare, leopard, lion, lynx, mink, mole, mongoose, opossum, oryx, platypus, polecat, pony, porpoise, puma, pussycat, raccoon, reindeer, seal, sealion, squirrel, vampire, vole, wallaby, wolf
2 -- (20) chicken, crow, dove, duck, flamingo, gull, hawk, kiwi, lark, ostrich, parakeet, penguin, pheasant, rhea, skimmer, skua, sparrow, swan, vulture, wren
3 -- (5) pitviper, seasnake, slowworm, tortoise, tuatara
4 -- (13) bass, carp, catfish, chub, dogfish, haddock, herring, pike, piranha, seahorse, sole, stingray, tuna
5 -- (4) frog, frog, newt, toad
6 -- (8) flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp
7 -- (10) clam, crab, crayfish, lobster, octopus, scorpion, seawasp, slug, starfish, worm

Attribute Information:
1. animal name: Unique for each instance
2. hair: Boolean
3. feathers: Boolean
4. eggs: Boolean
5. milk: Boolean
6. airborne: Boolean
7. aquatic: Boolean
8. predator: Boolean
9. toothed: Boolean
10. backbone: Boolean
11. breathes: Boolean
12. venomous: Boolean
13. fins: Boolean
14. legs: Numeric (set of values: {0,2,4,5,6,8})
15. tail: Boolean
16. domestic: Boolean
17. catsize: Boolean
18. type: Numeric (integer values in range [1,7])

Relevant Papers:
Forsyth's PC/BEAGLE User's Guide.
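As a rough illustration of how the attribute list above maps onto a single record, the following sketch parses one comma-separated row into named fields. The row is the aardvark record from the data file reproduced in Appendix 1. The helper name parse_zoo_record and the underscore in animal_name are assumptions made for this example; the report itself reads the data with Modeler, not with Python.

```python
# Field order follows the "Attribute Information" list (18 attributes).
# parse_zoo_record is a hypothetical helper, not part of the dataset or Modeler.
FIELDS = [
    "animal_name", "hair", "feathers", "eggs", "milk", "airborne", "aquatic",
    "predator", "toothed", "backbone", "breathes", "venomous", "fins",
    "legs", "tail", "domestic", "catsize", "type",
]
# legs and type are numeric; every other attribute after the name is Boolean.
NUMERIC_FIELDS = {"legs", "type"}


def parse_zoo_record(line):
    """Turn one comma-separated row into a dict keyed by attribute name."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    record = {"animal_name": values[0]}
    for name, raw in zip(FIELDS[1:], values[1:]):
        number = int(raw)
        record[name] = number if name in NUMERIC_FIELDS else bool(number)
    return record


record = parse_zoo_record("aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1")
print(record["animal_name"], record["legs"], record["type"])
```

Keeping legs and type as integers while converting the other attributes to Booleans mirrors the distinction the attribute list draws between numeric and Boolean fields.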
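2.2 Reading the data

Read the data into Modeler. In the Sources tab, select the Var. File (variable file) node and set its parameters. In the File tab, specify that the data is read from the file zoo.txt.

2.3 Browsing the data

In the Output tab, select the Table node and add it to the stream. Run the node to produce a data table. Browsing the data revealed two problem entries: the data contains two frog records and one girl record, so one frog record and the girl record were deleted.

[Figure: output of the Table node]

2.4 Assigning a role to each variable

animal name, hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, legs, tail, domestic and catsize are the model's input variables; type is the target variable. In the Field Ops tab, select the Type node, add it to the stream, and set its parameters to assign each variable's role.

[Figure: parameters of the Type node]

2.5 Examining the distribution of each variable

In the Output tab, select the Data Audit node and add it to the stream. Run the node to produce the audit table.

[Figure: output of the Data Audit node]

The data has 99 samples. Every variable except animal name is numeric, and every variable except animal name, legs and type is Boolean. Modeler computes and reports basic descriptive statistics for these variables, such as minimum, maximum, mean, standard deviation and skewness. The statistics show a wide gap between the minimum and maximum of legs. The bar charts of the numeric variables show that type 1 has the most records. Data quality is good.

3 Data Preparation

3.1 Reclassifying the data

The attributes hair, feathers, eggs, milk, airborne, aquatic, predator, toothed, backbone, breathes, venomous, fins, tail, domestic and catsize each record whether the animal has the corresponding characteristic. The values 0 and 1 are therefore poor labels and should be recoded as No and Yes.

[Figure: Settings tab of the Reclassify node]

In the Output tab, select a Table node, connect it to the Reclassify node, and run the Table node to produce the reclassified data table, shown below:

[Figure: output of the Table node]

3.2 Balancing the data

The data contains many type 1 records and comparatively few records of the other types, so the sample is balanced.

[Figure: Settings tab of the Balance node]

In the Output tab, select a Table node, connect it to the Balance node, and run the Table node to produce the balanced data table, shown below:

[Figure: output of the Table node]

4 Building the Decision Tree Models

Models are built with three algorithms: C5.0, CART and CHAID.

4.1 Introduction to the C5.0, CART and CHAID algorithms

(1) C5.0: C5.0 is a decision tree algorithm. J. R. Quinlan developed the approach in 1979 and proposed the ID3 algorithm, which mainly targets discrete attributes. Continued refinement produced C4.5, which adds discretization of continuous attributes to ID3. C5.0 is the classification algorithm that applies C4.5 to large data sets, with improvements mainly in execution efficiency and memory use. C5.0 is one of the classic decision tree algorithms: it can produce multi-way trees, its target variable is categorical, and it can generate either a decision tree or a rule set. A C5.0 model splits the sample on the field that yields the largest information gain. Each subset defined by the first split is then split again, usually on a different field, and the process repeats until the subsets cannot be split any further. Finally, the lowest-level splits are re-examined, and subsets that do not contribute significantly to the model are removed or pruned.

Advantages: C5.0 models are quite robust when data is missing or when there are many input fields; C5.0 models are easier to understand than some other model types, because the rules derived from the model have a very direct interpretation; C5.0 also offers powerful techniques for improving classification accuracy.

Basis for choosing the split variable: C5.0 uses the rate at which information entropy decreases to determine the best split variable and split threshold.

(2) CART: CART (Classification And Regression Tree) uses binary recursive partitioning. It divides the current sample set into two subsets, so every non-leaf node it generates has two branches, and the resulting decision tree is a compact binary tree. CART examines every variable and every possible split value of that variable to find the best split. For a discrete attribute with values such as {x, y, z}, there are three possible splits on that attribute ({{x, y}, {z}}, {{x, z}, {y}}, {{y, z}, {x}}), excluding the splits into the empty set and the full set. Continuous values are handled with "split points": if an attribute takes n continuous values in the sample (in sorted order), there are n-1 split points, each the mean of two adjacent values, (a[i] + a[i+1]) / 2. All candidate splits of each attribute are ranked by the amount of impurity (heterogeneity, that is, the mix of different classes) they remove. CART usually uses post-pruning: branches are cut from a fully grown tree by deleting a node's branches. The lowest nodes left unpruned become the leaves.

(3) CHAID: CHAID (Chi-squared Automatic Interaction Detection) automatically searches among multiple independent variables for the one that produces the largest difference. CHAID can produce non-binary trees, meaning some splits have more than two branches. A CHAID model requires a single target and one or more input fields. Weight and frequency fields can also be specified. CHAID is a classification method that builds a decision tree by using the chi-square statistic to determine the best split. It takes the dependent variable as the root node and, for each independent variable, forms a grouping and computes its chi-square value (chi-square test). Independent variables must be categorical or ordinal, that is, discrete; continuous variables such as age or income have to be recoded as categorical or ordinal first. If several variables give significant groupings, their significance levels (p-values) are compared and the most significant grouping is chosen for the child nodes. CHAID can automatically merge categories of an independent variable to maximize significance. Each final leaf node is a market segment.

4.2 Building the models

(1) In the Modeling tab, select the C5.0, C&R Tree and CHAID nodes and add them to the stream. Set the main parameters of each algorithm.

[Figure: Model tab of the C5.0 node]
[Figure: Analyze tab of the C5.0 node]
[Figure: Build Options tab of the C&R Tree node (1)]
[Figure: Build Options tab of the C&R Tree node (2)]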
[Figure: Build Options tab of the C&R Tree node (3)]
[Figure: Build Options tab of the C&R Tree node (4)]
[Figure: Build Options tab of the C&R Tree node (6)]
[Figure: Build Options tab of the CHAID node (1)]
[Figure: Build Options tab of the CHAID node (2)]
[Figure: Build Options tab of the CHAID node (3)]
[Figure: Build Options tab of the CHAID node (4)]
[Figure: Build Options tab of the CHAID node (5)]

(2) The resulting stream is shown in the figure:

[Figure: stream for animal classification]

4.3 Model output

[Figure: C5.0 results in text form]
[Figure: C5.0 results in graphical form]
[Figure: CART results in text form]
[Figure: CART results in graphical form]
[Figure: CHAID results in text form]
[Figure: CHAID results in graphical form]

4.4 Analysis of the model results

(1) C5.0 model

The model identified eight influencing factors: feathers, tail, backbone, milk, fins, legs, predator and airborne. feathers is the most important attribute; legs, predator and fins are unimportant. When classifying an animal, therefore, first check whether it has feathers.

feathers = yes: type 2, with no other factor considered.
feathers = no: check backbone.
    backbone = yes: check milk.
        milk = yes: type 1.
        milk = no: check fins.
            fins = yes: type 4.
            fins = no: check tail.
                tail = yes: type 3.
                tail = no: type 5.
    backbone = no: check airborne.
        airborne = yes: type 6.
        airborne = no: check predator.
            predator = yes: type 7.
            predator = no: check legs.
                legs = 0: type 7.
                legs = 2, 4, 5, 6 or 8: type 6.

(2) CART model

The model identified three influencing factors: feathers, legs and airborne. feathers is the most important attribute; the other attributes are far less important by comparison.

feathers = yes: type 2, with no other factor considered.
feathers = no: type 1.

(3) CHAID model

The model identified five influencing factors: legs, hair, aquatic, fins and toothed. legs is the most important; fins and toothed are the least important.

legs = 0: check hair.
    hair = yes: type 1, with no other factor considered.
    hair = no: check toothed.
        toothed = no: type 7.
        toothed = yes: check fins.
            fins = no: type 3.
            fins = yes: type 4.
legs = 2: check hair.
    hair = no: type 2, with no other factor considered.
    hair = yes: type 1.
legs = 4: check hair.
    hair = yes: type 1, with no other factor considered.
    hair = no: check aquatic.
        aquatic = no: type 3.
        aquatic = yes: check toothed.
            toothed = no: type 7.
            toothed = yes: type 5.
legs = 5 or 8: type 7, with no other factor considered.
legs = 6: check aquatic.
    aquatic = no: type 6.
    aquatic = yes: type 7.

5 Model Evaluation
In the Output tab of the node palette, select the Analysis node and connect it to the model result node. Run the Analysis node to obtain the analysis results.

[Figure: analysis results for C5.0]
[Figure: analysis results for CART]
[Figure: analysis results for CHAID]

The models built with C5.0 and CHAID reach prediction accuracies of 98.75% and 100% respectively, which is satisfactory. The model built with CART reaches a prediction accuracy of 51.25%, which is not.

6 Summary

Using data mining to analyze audit data and to derive normal patterns for anomaly detection helps improve the accuracy and completeness of intrusion detection systems. This course project used decision tree classification with three decision tree algorithms: C5.0, CART and CHAID. Their results differ, and so does their predictive accuracy. Each data mining method evidently has its own emphasis. In real data mining work a single method is unlikely to give satisfactory results; several methods need to be combined so that the strengths of one offset the weaknesses of another.

This course project gave me an overall understanding of data mining techniques. Various problems came up while building the models, but with careful thought and by consulting reference material I completed the project, with some difficulty. It has laid a good foundation for studying data mining in more depth later.

Appendix 1: zoo.data

aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
calf,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
carp,0,0,1,0,0,1,0,1,1,0,0,1,0,1,1,0,4
catfish,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
cavy,1,0,0,1,0,0,0,1,1,1,0,0,4,0,1,0,1
cheetah,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
chicken,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2
chub,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
clam,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,7
crab,0,0,1,0,0,1,1,0,0,0,0,0,4,0,0,0,7
crayfish,0,0,1,0,0,1,1,0,0,0,0,0,6,0,0,0,7
crow,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,0,2
deer,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
dogfish,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4
dolphin,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1
dove,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2
duck,0,1,1,0,1,1,0,0,1,1,0,0,2,1,0,0,2
elephant,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
flamingo,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,1,2
flea,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6
frog,0,0,1,0,0,1,1,1,1,1,0,0,4,0,0,0,5
frog,0,0,1,0,0,1,1,1,1,1,1,0,4,0,0,0,5
fruitbat,1,0,0,1,1,0,0,1,1,1,0,0,2,1,0,0,1
giraffe,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
girl,1,0,0,1,0,0,1,1,1,1,0,0,2,0,1,1,1
gnat,0,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6
goat,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
gorilla,1,0,0,1,0,0,0,1,1,1,0,0,2,0,0,1,1
gull,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2
haddock,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4
hamster,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,0,1
hare,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,0,1
hawk,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,0,2
herring,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
honeybee,1,0,1,0,1,0,0,0,0,1,1,0,6,0,1,0,6
housefly,1,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6
kiwi,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,0,2
ladybird,0,0,1,0,1,0,1,0,0,1,0,0,6,0,0,0,6
lark,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2
leopard,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
lion,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
lobster,0,0,1,0,0,1,1,0,0,0,0,0,6,0,0,0,7
lynx,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
mink,1,0,0,1,0,1,1,1,1,1,0,0,4,1,0,1,1
mole,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,0,1
mongoose,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
moth,1,0,1,0,1,0,0,0,0,1,0,0,6,0,0,0,6
newt,0,0,1,0,0,1,1,1,1,1,0,0,4,1,0,0,5
octopus,0,0,1,0,0,1,1,0,0,0,0,0,8,0,0,1,7
opossum,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,0,1
oryx,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
ostrich,0,1,1,0,0,0,0,0,1,1,0,0,2,1,0,1,2
parakeet,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2
penguin,0,1,1,0,0,1,1,0,1,1,0,0,2,1,0,1,2
pheasant,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2
pike,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4
piranha,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
pitviper,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,3
platypus,1,0,1,1,0,1,1,0,1,1,0,0,4,1,0,1,1
polecat,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
pony,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
porpoise,0,0,0,1,0,1,1,1,1,1,0,1,0,1,0,1,1
puma,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
pussycat,1,0,0,1,0,0,1,1,1,1,0,0,4,1,1,1,1
raccoon,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
reindeer,1,0,0,1,0,0,0,1,1,1,0,0,4,1,1,1,1
rhea,0,1,1,0,0,0,1,0,1,1,0,0,2,1,0,1,2
scorpion,0,0,0,0,0,0,1,0,0,1,1,0,8,1,0,0,7
seahorse,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4
seal,1,0,0,1,0,1,1,1,1,1,0,1,0,0,0,1,1
sealion,1,0,0,1,0,1,1,1,1,1,0,1,2,1,0,1,1
seasnake,0,0,0,0,0,1,1,1,1,0,1,0,0,1,0,0,3
seawasp,0,0,1,0,0,1,1,0,0,0,1,0,0,0,0,0,7
skimmer,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2
skua,0,1,1,0,1,1,1,0,1,1,0,0,2,1,0,0,2
slowworm,0,0,1,0,0,0,1,1,1,1,0,0,0,1,0,0,3
slug,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7
sole,0,0,1,0,0,1,0,1,1,0,0,1,0,1,0,0,4
sparrow,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2
squirrel,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,0,1
starfish,0,0,1,0,0,1,1,0,0,0,0,0,5,0,0,0,7
stingray,0,0,1,0,0,1,1,1,1,0,1,1,0,1,0,1,4
swan,0,1,1,0,1,1,0,0,1,1,0,0,2,1,0,1,2
termite,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6
toad,0,0,1,0,0,1,0,1,1,1,0,0,4,0,0,0,5
tortoise,0,0,1,0,0,0,0,0,1,1,0,0,4,1,0,1,3
tuatara,0,0,1,0,0,0,1,1,1,1,0,0,4,1,0,0,3
tuna,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,1,4
vampire,1,0,0,1,1,0,0,1,1,1,0,0,2,1,0,0,1
vole,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,0,1
vulture,0,1,1,0,1,0,1,0,1,1,0,0,2,1,0,1,2
wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1
wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6
wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7
wren,0,1,1,0,1,0,0,0,1,1,0,0,2,1,0,0,2

Appendix 2: zoo.names

1. Title: Zoo database

2. Source Information
   -- Creator: Richard Forsyth
   -- Donor: Richard S. Forsyth
             8 Grosvenor Avenue
             Mapperley Park
             Nottingham NG3 5DX
             0602-621676
   -- Date: 5/15/1990

3. Past Usage:
   -- None known other than what is shown in Forsyth's PC/BEAGLE User's Guide.

4. Relevant Information:
   -- A simple database containing 17 Boolean-valued attributes. The "type" attribute appears to be the class attribute. Here is a breakdown of which animals are in which type: (I find it unusual that there are 2 instances of "frog" and one of "girl"!)
   Class# Set of animals:
   1 (41) aardvark, antelope, bear, boar, buffalo, calf, cavy, cheetah, deer, dolphin, elephant, fruitbat, giraffe, girl, goat, gorilla, hamster, hare, leopard, lion, lynx, mink, mole, mongoose, opossum, oryx, platypus, polecat, pony, porpoise, puma, pussycat, raccoon, reindeer, seal, sealion, squirrel, vampire, vole, wallaby, wolf
   2 (20) chicken, crow, dove, duck, flamingo, gull, hawk, kiwi, lark, ostrich, parakeet, penguin, pheasant, rhea, skimmer, skua, sparrow, swan, vulture, wren
   3 (5) pitviper, seasnake, slowworm, tortoise, tuatara
   4 (13) bass, carp, catfish, chub, dogfish, haddock, herring, pike, piranha, seahorse, sole, stingray, tuna
   5 (4) frog, frog, newt, toad
   6 (8) flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp
   7 (10) clam, crab, crayfish, lobster, octopus, scorpion, seawasp, slug, starfish, worm

5. Number of Instances: 101

6. Number of Attributes: 18 (animal name, 15 Boolean attributes, 2 numerics)

7. Attribute Information: (name of attribute and type of value domain)
   1. animal name: Unique for each instance
   2. hair Boolean
   3. feathers Boolean
   4. eggs Boolean
   5. milk Boolean
   6. airborne Boolean
   7. aquatic Boolean
   8. predator Boolean
   9. toothed Boolean
   10. backbone Boolean
   11. breathes Boolean
   12. venomous Boolean
   13. fins Boolean
   14. legs Numeric (set of values: {0,2,4,5,6,8})
   15. tail Boolean
   16. domestic Boolean
   17. catsize Boolean
   18. type Numeric (integer values in range [1,7])

8. Missing Attribute Values: None

9. Class Distribution: Given above
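As a rough sketch, the C5.0 decision path described in section 4.4 can be written as a plain function and tried on a few records taken from Appendix 1. The names to_record, classify_c50 and SAMPLE_ROWS are assumptions made for this illustration, and the function is a hand transcription of the prose rules; the actual model was built in Modeler, not in Python.

```python
# Hypothetical helpers that hand-transcribe the C5.0 path from section 4.4.
# This is an illustration, not the model that Modeler produced.
FIELDS = [
    "animal_name", "hair", "feathers", "eggs", "milk", "airborne", "aquatic",
    "predator", "toothed", "backbone", "breathes", "venomous", "fins",
    "legs", "tail", "domestic", "catsize", "type",
]


def to_record(line):
    """Split one comma-separated zoo row into a dict; all but the name become ints."""
    record = dict(zip(FIELDS, line.strip().split(",")))
    for name in FIELDS[1:]:
        record[name] = int(record[name])
    return record


def classify_c50(r):
    """Follow the feathers -> backbone -> ... checks in the order section 4.4 gives."""
    if r["feathers"]:
        return 2
    if r["backbone"]:
        if r["milk"]:
            return 1
        if r["fins"]:
            return 4
        return 3 if r["tail"] else 5
    if r["airborne"]:
        return 6
    if r["predator"]:
        return 7
    return 7 if r["legs"] == 0 else 6


# A handful of rows copied from Appendix 1, one per class.
SAMPLE_ROWS = [
    "aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1",
    "chicken,0,1,1,0,1,0,0,0,1,1,0,0,2,1,1,0,2",
    "bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4",
    "frog,0,0,1,0,0,1,1,1,1,1,0,0,4,0,0,0,5",
    "pitviper,0,0,1,0,0,0,1,1,1,1,1,0,0,1,0,0,3",
    "flea,0,0,1,0,0,0,0,0,0,1,0,0,6,0,0,0,6",
    "clam,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,7",
]

for row in SAMPLE_ROWS:
    rec = to_record(row)
    print(rec["animal_name"], "labeled", rec["type"], "predicted", classify_c50(rec))
```

For these seven rows the transcribed rules agree with the type column. Not every row in Appendix 1 does: newt has tail = 1, so it falls into the type 3 branch although it is labeled type 5, which fits an accuracy just below 100%.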