Here is a quick snapshot of the data and the classification task at hand:
========================================================================
This dataset was taken from the UCI Machine Learning Repository
(http://archive.ics.uci.edu/ml/datasets.html)
1. Number of Instances: 1728
(instances completely cover the attribute space)
2. Number of Attributes (features): 6
3. Data feature descriptions:
0 - buying: vhigh, high, med, low.
1 - maint: vhigh, high, med, low.
2 - doors: 2, 3, 4, 5more.
3 - persons: 2, 4, more.
4 - lug_boot: small, med, big.
5 - safety: low, med, high.
4. Class Labels (to predict through classification):
car evaluation: unacc, acc, good, vgood
5. Missing Attribute Values: none
6. Class Distribution (number of instances per class)
There is a class imbalance (very common in real-world data sets)
class N N[%]
-----------------------------
unacc 1210 (70.023 %)
acc 384 (22.222 %)
good 69 ( 3.993 %)
vgood      65   ( 3.762 %)
========================================================================
Here is the python script which trains the model and tests its generalizability using a test set.
If the above code is executed either in the python shell (by copying and pasting the lines) or at the command prompt using: $ python (path_to_file)/thinkModelcode_carData.py
you should get an accuracy between 80% and 90% (there is a range because it all depends on WHICH rows were used for training vs. test data). The output line should look something like:
Classification accuracy of MNB = 0.901162790698
Explanation of python code (a condensed sketch of the full pipeline follows the list below):
- lines 1-6: importing several modules and packages necessary to run this script
- lines 10-23: using the urllib and csv packages to read in the data from the URL
- line 25: create a list of feature names (for reference)
- lines 27-31: converting string data into numerical form by using NumPy's unique() function
- keys --> contains string labels corresponding to numerical values assigned
- numdata --> contains numerical representations of string labels in 'data' read from URL
- lines 33-37: determine number of rows, columns. Also convert numdata to be of int array type.
- split numdata into xdata (first 6 columns) and ydata (last column of class labels)
- lines 41-46: convert each multivalued feature in xdata to a multi-column binary feature array. This conversion is done using sklearn.preprocessing.LabelBinarizer. Here's an example of what this conversion looks like:
- >>> from sklearn.preprocessing import LabelBinarizer
- >>> lbin = LabelBinarizer()
- >>> a = [2, 0, 1]
- >>> lbin.fit_transform(a)
- OUTPUT:
- array([[ 0., 0., 1.],
-        [ 1., 0., 0.],
-        [ 0., 1., 0.]])
- lines 51-55: create training and test data sets from full sample of 1728 rows:
- To create the test and training sets, we simply create an array of ALL indices:
- allIDX = [0, 1, 2,......,1725, 1726, 1727]
- random.shuffle(allIDX) "shuffles" ordered indices of allIDX to a randomized list:
- allIDX = [564, 981, 17, ...., 1023, 65, 235]
- Then we simply take the first 10% of allIDX as the test set and the remainder as the training set.
- lines 58-61: use testIDX and trainIDX with xdata to create xtest and xtrain, respectively
- lines 62-67: use sklearn's naive_bayes module to perform multinomial naive-bayes classification
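To keep the walkthrough concrete, here is a minimal sketch of that pipeline written for Python 3 (the original thinkModelcode_carData.py appears to target Python 2, so its urllib calls and line numbers differ). The UCI URL, the 90/10 split, and the variable names (keys, numdata, xdata_ml, allIDX, testIDX, trainIDX, xtrain, ytrain) follow the explanation above; treat it as an approximation, not a copy of the original script:

import csv
import urllib.request

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelBinarizer

# read the car-evaluation data from the UCI repository (1728 rows x 7 string columns)
URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data"
with urllib.request.urlopen(URL) as response:
    lines = response.read().decode("utf-8").splitlines()
data = np.array([row for row in csv.reader(lines) if row])

# encode each string column as integers with NumPy's unique()
# keys    --> string labels corresponding to the numerical values assigned
# numdata --> numerical representation of the string labels in 'data'
numdata = np.empty(data.shape, dtype=int)
keys = []
for j in range(data.shape[1]):
    col_keys, col_codes = np.unique(data[:, j], return_inverse=True)
    keys.append(col_keys)
    numdata[:, j] = col_codes

xdata = numdata[:, :6]   # first 6 columns: features
ydata = numdata[:, 6]    # last column: class labels (acc, good, unacc, vgood)

# convert each multivalued feature into a multi-column binary array
lbin = LabelBinarizer()
xdata_ml = np.hstack([lbin.fit_transform(xdata[:, j]) for j in range(xdata.shape[1])])

# shuffle ALL indices and hold out the first 10% as the test set
allIDX = np.arange(numdata.shape[0])
np.random.shuffle(allIDX)
holdout = int(0.1 * len(allIDX))
testIDX, trainIDX = allIDX[:holdout], allIDX[holdout:]

xtrain, ytrain = xdata_ml[trainIDX, :], ydata[trainIDX]
xtest, ytest = xdata_ml[testIDX, :], ydata[testIDX]

# multinomial naive-Bayes classification with sklearn
mnb = MultinomialNB()
mnb.fit(xtrain, ytrain)
print("Classification accuracy of MNB =", mnb.score(xtest, ytest))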
Hopefully, the combination of an introduction to the basics and formalism of Naive Bayes classifiers, running through a toy example on the US census income dataset, and seeing an application of Naive Bayes classifiers in the above python code (I hope you play with it beyond the basic script above!) helps solidify some of the main points and the value of using Bayes' Theorem.
Please let me know if you have any questions and, as always, comments and suggestions are greatly appreciated!
Hello, thank you for this interesting example. Just one question: why do we need to transform the multivalued features into multiple binary columns? Thanks
Hey Cozyberry, thanks for reading and for leaving a comment. The need to transform a multi-valued feature into multiple binary columns has more to do with how probabilities are programmed and computed in sklearn's multinomial naive-bayes classification functions and classes. If you wrote your own functions or used a different package, you would not need to make this transform. Please let me know if you have further questions! Thanks again!
Hey Brian, thanks for your response. I have another question concerning this transformation. Why does the example code use xdata_ml for the attribute matrix (xtrain = xdata_ml[trainIDX,:]) but the original ydata for the target matrix (ytrain = ydata[trainIDX,:])? I tried to test with ytrain = ydata_ml[trainIDX,:] and ytest = ydata_ml[0:trainIDX] and the accuracy turned out as 0.0 ><
Hi Cozy - good question and I'm glad you brought it up. The reason for not using the multi-label version of ytrain is that the NB-classifier function already accounts for multi-valued targets, unlike for the x-values. The difference here is that sometimes you actually have more than one y-value you are trying to predict, similar to having many response variables in a regression. When you specify a single multi-valued column as the response (as I did above), the function knows that you are fitting a single dependent variable and converts it to a multi-label representation internally. Does this make sense? Thanks again for the comments. Hopefully I find some time soon to write my next series, it's been too long.
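To make that concrete, here is a tiny standalone sketch (toy data, not the car data, and just my reading of sklearn's behavior): X is passed in binarized form, while y stays a single column of class labels that MultinomialNB encodes internally.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# X: one-hot / binary feature rows; y: a plain 1-D vector of class labels
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
y = np.array([0, 2, 1])          # no binarization of the target needed

mnb = MultinomialNB().fit(X, y)  # sklearn handles the class labels itself
print(mnb.predict(X))            # recovers [0 2 1] on this toy data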
HaHa, then it all makes sense. Thanks again for your quick response. It is really good to see an example here, since the official documentation for Naive Bayes focuses more on text prediction, and the car data is a more typical benchmark. I will come back again for your next series. Bow~~
Please share the code and data on GitHub or another repository. Thanks :)
Hi, I tried to use it in Python 3,
but I received this error: "data[k,:] = np.array(row)
ValueError: could not broadcast input array from shape (2) into shape (7)"
Can you help me?
yes, can you post this code?
Hi, I tried your code but I didn't feed the classifier a multi-binary matrix; I just used the matrix of features I had. It seems the performance is not affected by this. I read your explanation to Cozy, but I didn't get it. Can you explain again why you are using such a binary matrix?
ReplyDeleteHi,
ReplyDeleteI want to read the csv file from my system itself not using the url...
could you please help me out with the code to read csv file from my system..
reply as soon as early
Thanks in advance..