Think, Model, Code.....: Naive-Bayes Classification using Python, NumPy, and Scikits

Saturday, April 13, 2013

Naive-Bayes Classification using Python, NumPy, and Scikits

After a busy few months, I have finally returned to wrap up this series on Naive-Bayes classification. I have decided to use a simple classification problem borrowed (again) from the UCI Machine Learning Repository. You can read about this data set here, and download the data used in this example here. This example assumes you have Python version 2.7.x or newer, with the packages NumPy and scikit-learn installed. You can use the links provided to download and install them, or use easy_install to do the installation (an example of using easy_install to install scikit-learn is given here).

Here is a quick snapshot of the data and the classification task at hand:

========================================================================
This dataset was taken from the UCI Machine Learning Repository

(http://archive.ics.uci.edu/ml/datasets.html)

1. Number of Instances: 1728
(instances completely cover the attribute space)

2. Number of Attributes (features): 6

3. Data feature descriptions:
0 - buying: vhigh, high, med, low.
1 - maint: vhigh, high, med, low.
2 - doors: 2, 3, 4, 5more.
3 - persons: 2, 4, more.
4 - lug_boot: small, med, big.
5 - safety: low, med, high.

4. Class Labels (to predict through classification):
car evaluation: unacc, acc, good, vgood

5. Missing Attribute Values: none

6. Class Distribution (number of instances per class)
There is a sample imbalance (very common in real world data sets)

class    N      N[%]
-----------------------------
unacc    1210   (70.023 %)
acc      384    (22.222 %)
good     69     ( 3.993 %)
vgood    65     ( 3.762 %)
========================================================================
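As a quick sanity check on the snapshot above, here is a small sketch that recomputes the size of the attribute space and the class percentages from the listed counts. The variable names are illustrative, and the majority-class baseline at the end is an extra reference point I am adding, not something stated in the data set description.

```python
# Number of possible values per feature, as listed in the snapshot above.
feature_levels = {"buying": 4, "maint": 4, "doors": 4,
                  "persons": 3, "lug_boot": 3, "safety": 3}

# If the instances completely cover the attribute space, the product of the
# level counts should equal the number of instances.
n_combinations = 1
for n_levels in feature_levels.values():
    n_combinations *= n_levels

# Instances per class, as listed in the class distribution table.
class_counts = {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}
n_instances = sum(class_counts.values())
class_percent = {label: 100.0 * n / n_instances
                 for label, n in class_counts.items()}

# Accuracy of a trivial model that always predicts the most common class
# (an assumption-free reference point for judging any classifier's accuracy).
majority_baseline = max(class_counts.values()) / float(n_instances)

print(n_combinations, n_instances)
print(class_percent)
print(majority_baseline)
```

Because of the class imbalance, always guessing "unacc" would already be right about 70% of the time, which is worth keeping in mind when reading the accuracy reported later.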

Here is the Python script which trains the model and tests its generalizability using a test set.

If the above code is executed either in the Python shell (by copying and pasting the lines) or at the command prompt using: $ python (path_to_file)/thinkModelcode_carData.py
you should get an accuracy of roughly 80-90% (there is a range because it all depends on WHICH rows were used for training vs. test data). The output line should be something like:

Classification accuracy of MNB = 0.901162790698

Explanation of python code:

lines 1-6: importing several modules and packages necessary to run this script
lines 10-23: using packages urllib and csv to read in data from URL (click links for more info)
line 25: create a list of feature names (for reference)
lines 27-31: converting string data into numerical form by using NumPy's unique() function

keys --> contains string labels corresponding to numerical values assigned
numdata --> contains numerical representations of string labels in 'data' read from URL
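The reading and encoding steps described above can be sketched roughly as follows. This is not the original script: to keep it runnable offline, a few toy rows in the file's column layout are embedded as a string instead of being fetched with urllib (the row values are illustrative, not copied from the data set), and it is written for Python 3. The names keys and numdata follow the description above; everything else is an assumption.

```python
import csv
import io

import numpy as np

# Toy stand-in for the downloaded file: 6 feature columns plus the class label.
raw = """vhigh,vhigh,2,2,small,low,unacc
low,med,4,more,big,high,vgood
med,low,5more,4,med,med,acc
low,low,3,4,big,med,good
"""

# csv.reader accepts any iterable of lines, so a local file opened with
# open(path, newline="") could be passed here instead of the StringIO object.
rows = list(csv.reader(io.StringIO(raw)))
data = np.array(rows)

keys = []                                  # keys[j]: sorted string labels of column j
numdata = np.empty(data.shape, dtype=int)  # integer codes, same shape as data
for j in range(data.shape[1]):
    # return_inverse=True gives, for every row, the index of its label in `labels`
    labels, codes = np.unique(data[:, j], return_inverse=True)
    keys.append(labels)
    numdata[:, j] = codes

print(keys[0])        # string labels found in the first column
print(numdata[:, 0])  # their integer codes, row by row
```

Indexing keys[j] with numdata[:, j] maps the integer codes back to the original strings, which is handy when interpreting predictions.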

lines 33-37: determine number of rows, columns. Also convert numdata to be of int array type.

split numdata into xdata (first 6 columns) and ydata (last column of class labels)
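A minimal sketch of that split, assuming numdata is an integer array whose last column holds the class labels (the small array here is made up for illustration):

```python
import numpy as np

# Hypothetical integer-coded rows: 6 feature columns followed by the class label.
numdata = np.array([[2, 2, 0, 0, 2, 1, 2],
                    [0, 1, 2, 2, 0, 0, 3],
                    [1, 0, 3, 1, 1, 2, 0]], dtype=int)

nrows, ncols = numdata.shape
xdata = numdata[:, :-1]   # every column except the last: the features
ydata = numdata[:, -1]    # the last column: the class labels

print(nrows, ncols, xdata.shape, ydata.shape)
```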

lines 41-46: convert each multi-valued feature in xdata to a multi-column binary feature array. This conversion is done using sklearn.preprocessing.LabelBinarizer. Here's an example of what this conversion looks like:

>>> from sklearn.preprocessing import LabelBinarizer
>>> lbin = LabelBinarizer()
>>> a = [2,0,1]
>>> lbin.fit_transform(a)

OUTPUT (newer scikit-learn versions print integers instead of floats):
array([[ 0., 0., 1.],
       [ 1., 0., 0.],
       [ 0., 1., 0.]])
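Applied to the whole feature matrix, the same idea might look like the sketch below: binarize one column at a time and stack the resulting blocks side by side. The toy xdata and the name xdata_bin are assumptions for illustration, not the original script.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical integer-coded features: 4 rows, 3 features with 3 levels each.
xdata = np.array([[2, 0, 1],
                  [0, 1, 2],
                  [1, 2, 0],
                  [0, 0, 1]])

lbin = LabelBinarizer()
# One binary block per feature; each block has one column per level.
# Note: LabelBinarizer returns a single column for a feature with only two
# levels, which does not occur in the car data (every feature has 3 or 4).
blocks = [lbin.fit_transform(xdata[:, j]) for j in range(xdata.shape[1])]
xdata_bin = np.hstack(blocks)

print(xdata_bin.shape)
print(xdata_bin[0])
```

For the car data, this kind of expansion would turn the 6 original columns into 4 + 4 + 4 + 3 + 3 + 3 = 21 binary columns.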

lines 51-55: create training and test data sets from full sample of 1728 rows:

To create the test and training sets, we simply create an array of ALL indices:

allIDX = [0, 1, 2,......,1725, 1726, 1727]

random.shuffle(allIDX) "shuffles" ordered indices of allIDX to a randomized list:

allIDX = [564, 981, 17, ...., 1023, 65, 235]

Then we simply take the first 10% of allIDX as the test set, the remaining as training.
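The shuffle-and-split step can be sketched like this. The names allIDX, testIDX, and trainIDX follow the description above; the fixed seed and the use of int() to round down are my assumptions, added so the example is reproducible.

```python
import random

nrows = 1728
allIDX = list(range(nrows))   # [0, 1, 2, ..., 1727]

random.seed(0)                # hypothetical: fixed seed so reruns give the same split
random.shuffle(allIDX)        # shuffles the list in place

n_test = int(0.10 * nrows)    # first 10% of the shuffled indices
testIDX = allIDX[:n_test]
trainIDX = allIDX[n_test:]

print(len(testIDX), len(trainIDX))
```

With 1728 rows this gives a 172-row test set; the sample accuracy shown earlier, 0.901162790698, is consistent with 155 of 172 test rows being classified correctly.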

lines 58-61: use testIDX and trainIDX with xdata to create xtest and xtrain, respectively
lines 62-67: use sklearn's naive_bayes module to perform multinomial naive-bayes classification
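To make the last two steps concrete, here is a rough, self-contained sketch using sklearn.naive_bayes.MultinomialNB. It is not the original script: the data is a small synthetic stand-in for the binarized car features (the label is made to depend on the first feature), and all names other than xtest, xtrain, testIDX, and trainIDX are assumptions.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)

# Synthetic stand-in for the binarized data: two 3-level features,
# each expanded into 3 binary columns (6 columns in total).
n = 200
feat = rng.randint(0, 3, size=(n, 2))
one_hot = np.eye(3, dtype=int)
xdata = np.hstack([one_hot[feat[:, 0]], one_hot[feat[:, 1]]])
ydata = (feat[:, 0] == 2).astype(int)   # made-up label rule for illustration

# Shuffle the row indices, then use the first 10% for testing.
allIDX = rng.permutation(n)
testIDX, trainIDX = allIDX[:20], allIDX[20:]
xtest, ytest = xdata[testIDX], ydata[testIDX]
xtrain, ytrain = xdata[trainIDX], ydata[trainIDX]

# Fit multinomial naive Bayes on the training rows, score on the held-out rows.
mnb = MultinomialNB()
mnb.fit(xtrain, ytrain)
accuracy = mnb.score(xtest, ytest)       # fraction of test rows predicted correctly
print("Classification accuracy of MNB = %s" % accuracy)
```

Because the split is random, rerunning with a different seed changes which rows land in the test set, which is why the accuracy on the real data varies from run to run.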

Hopefully, the combination of an introduction to the basics and formalism of Naive Bayes classifiers, a toy example using the US census income dataset, and an application of a Naive-Bayes classifier in the Python code above (I hope you play with it beyond the basic script!) helps solidify some of the main points and the value of using Bayes' Theorem.

Please let me know if you have any questions and, as always, comments and suggestions are greatly appreciated!

40 comments:

cozyberry, June 24, 2013 at 4:59 AM
Hello, thank you for this interesting example. Just one question: why do we need to transform the multi-valued features into multiple binary columns? Thanks
sungoak, June 24, 2013 at 7:12 PM
Hey Cozyberry, thanks for reading and for leaving a comment. The transform from a multi-valued feature to multiple binary columns has more to do with how sklearn's multinomial naive-Bayes functions and classes compute probabilities than with the method itself. If you wrote your own functions or used a different package, you would not need to make this transform. Please let me know if you have further questions! Thanks again!
Roujri, August 2, 2013 at 2:42 AM
Please share the code and data on GitHub or another repository. Thanks :)
Bolinha, September 25, 2013 at 7:10 AM
Hi, I tried to use it in Python 3.

But I received this error: "data[k,:] = np.array(row)
ValueError: could not broadcast input array from shape (2) into shape (7)"

Can you help me?
Alex, October 25, 2013 at 3:01 PM
Yes, can you post this code?
peppescavo, April 22, 2014 at 8:08 AM
Hi, I tried your code, but I didn't feed the classifier a multi-column binary matrix; I just used the feature matrix I had. It seems the performance is not affected by this. I read your explanation to cozyberry, but I didn't get it. Can you explain again why you are using such a binary matrix?
Unknown, November 7, 2014 at 8:49 PM
Hi,
I want to read the CSV file from my own system rather than from the URL.
Could you please help me out with the code to read a CSV file from my system?
Please reply as soon as possible.
Thanks in advance.