Date post: | 05-Jan-2017 |
Category: | Software |
Upload: | narong-intiruk |
FUN WITH PYTHON
AGENDA
▸ Using Python to Access Web Data
▸ Using Databases with Python
▸ Processing and Visualizing Data with Python
USING PYTHON TO ACCESS WEB DATA
Access Web Data
▸ Web Requests
▸ Web Parser
▸ Web Services
▸ Web Requests
Requests Library
pip install requests  # install library
import requests
requests.get('http://www.facebook.com').text
Make a Request
# GET request
import requests
r = requests.get('http://www.facebook.com')
if r.status_code == 200:
    print("Success")
Success
# POST request
import requests
r = requests.post('http://httpbin.org/post', data={'key': 'value'})
if r.status_code == 200:
    print("Success")
Success
# Other types of request
import requests
r = requests.put('http://httpbin.org/put', data={'key': 'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
Passing Parameters In URLs
# GET request with the parameter written into the URL
import requests
r = requests.get('https://www.google.co.th/?hl=th')
if r.status_code == 200:
    print("Success")
Success
# GET request with the parameter passed via params
import requests
r = requests.get('https://www.google.co.th', params={"hl": "en"})
if r.status_code == 200:
    print("Success")
Success
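To see how a params dictionary ends up in the URL without sending anything over the network, the request can be prepared locally. This is a minimal sketch, assuming the requests package is installed; it reuses the URL and parameter from the slide, and the variable names req and prepared are only illustrative.

```python
import requests

# Build the request object but do not send it.
req = requests.Request("GET", "https://www.google.co.th", params={"hl": "en"})
prepared = req.prepare()

# The params dict is URL-encoded and appended to the URL as a query string.
print(prepared.url)
```

Printing the prepared URL is a quick way to check what a params dictionary will look like on the wire before making the real call.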
# POST request with parameters
import requests
r = requests.post("https://m.facebook.com", data={"key": "value"})
if r.status_code == 200:
    print("Success")
Success
Response Content
# Text response
import requests
data = {"email": "…..", "pass": "……"}
r = requests.post("https://m.facebook.com", data=data)
if r.status_code == 200:
    print(r.text)
'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Facebook</title><meta name="referrer" content="default" id="meta_referrer" /><style type="text/css">/*<!………………..
# Response encoding
import requests
r = requests.get('https://www.google.co.th')  # an HTML page, matching the output below
r.encoding = 'tis-620'  # set the encoding used to decode r.text
if r.status_code == 200:
    print(r.text)
'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="th"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2-hp.png" itemprop="image"><meta content="ปวงข้าพระพุทธเจ้า ขอน้อมเกล้าน้อมกระหม่อมรำลึกใน...
# Binary response
import requests
r = requests.get('https://www.google.co.th/logos/doodles/2016/king-bhumibol-adulyadej-1927-2016-5148101410029568.2-hp.png')
if r.status_code == 200:
    with open("img.png", "wb") as f:  # write the raw bytes to a file
        f.write(r.content)
Response Status Codes
# 200 response (OK)
import requests
r = requests.get('https://api.github.com/events')
if r.status_code == requests.codes.ok:
    data = r.json()  # parse the JSON body into Python objects
    print(data[0]['actor'])
{'url': 'https://api.github.com/users/ShaolinSarg', 'display_login': 'ShaolinSarg', 'avatar_url': 'https://avatars.githubusercontent.com/u/6948796?', 'id': 6948796, 'login': 'ShaolinSarg', 'gravatar_id': ''}
# Print the status code
import requests
r = requests.get('https://api.github.com/events')
print(r.status_code)
200
# 404 response (Not Found)
import requests
r = requests.get('https://api.github.com/events/404')
print(r.status_code)
404
Response Headers
# Response headers
import requests
r = requests.get('http://www.sanook.com')
print(r.headers)
print(r.headers['Date'])
{'Content-Type': 'text/html; charset=UTF-8', 'Date': 'Tue, 08 Nov 2016 14:38:41 GMT', 'Cache-Control': 'private, max-age=0', 'Age': '16', 'Content-Encoding': 'gzip', 'Content-Length': '38089', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Accept-Ranges': 'bytes'}
Tue, 08 Nov 2016 14:38:41 GMT
Timeouts
# Timeout (in seconds)
import requests
r = requests.get('http://www.sanook.com', timeout=0.001)
ReadTimeout: HTTPConnectionPool(host='github.com', port=80): Read timed out. (read timeout=0.101)
Authentication
# Basic authentication
import requests
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
print(r.status_code)
200
read more : http://docs.python-requests.org/en/master/
Quiz#1 : Tag Monitoring
1. Get the webpage : http://pantip.com/tags
2. Save it to a file every 5 minutes (time.sleep(300))
3. Use the current date and time as the filename
(How do you get the current date and time in Python? Find it on Google.)
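One possible way to build a timestamped filename uses the standard datetime module. This is only a sketch: the helper name timestamped_filename, the "tags" prefix and the "html" extension are assumptions made for illustration, not part of the quiz.

```python
from datetime import datetime

def timestamped_filename(now, prefix="tags", ext="html"):
    """Return a name such as 'tags_20161108_143841.html' for the given datetime."""
    # The format avoids ':' because it is not allowed in Windows filenames.
    return "{}_{}.{}".format(prefix, now.strftime("%Y%m%d_%H%M%S"), ext)

# Use the current local time when saving a fresh snapshot.
filename = timestamped_filename(datetime.now())
print(filename)
```

Passing the datetime in as an argument, instead of calling datetime.now() inside the helper, keeps the function easy to check with a fixed date.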
▸ Web Parser
HTML Parser : beautifulsoup
pip install beautifulsoup4  # install library
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("file.html"), "html.parser")  # parse from file
soup = BeautifulSoup("<html>data</html>", "html.parser")  # parse from text
Parse a document
from bs4 import BeautifulSoup
soup = BeautifulSoup("<html>data</html>", "html.parser")
print(soup)
<html>data</html>
# Navigating using tag names
from bs4 import BeautifulSoup
html_doc = """<html><head><title>The Dormouse's story</title></head><body><p class="title"><b>The Dormouse's story</b></p></body>"""
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.head)
print(soup.title)
print(soup.body.p)
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
<p class="title"><b>The Dormouse's story</b></p>
# Access string
from bs4 import BeautifulSoup
html_doc = """<h1>hello</h1>"""
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.h1.string)
hello
# Access attribute
from bs4 import BeautifulSoup
html_doc = '<a href="http://example.com/elsie">Elsie</a>'
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.a['href'])
http://example.com/elsie
# Get all text in the page
from bs4 import BeautifulSoup
html_doc = """<html><head><title>The Dormouse's story</title></head><body><p class="title"><b>The Dormouse's story</b></p></body>"""
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.get_text())  # call the method; without () only the bound method is printed
The Dormouse's storyThe Dormouse's story
# find_all()
from bs4 import BeautifulSoup
html_doc = """<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;"""
soup = BeautifulSoup(html_doc, "html.parser")
for a in soup.find_all('a'):
    print(a)
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
# find_all() with filters
import re
soup.find_all(id='link2')
soup.find_all(href=re.compile("elsie"))
soup.find_all(id=True)
data_soup.find_all(attrs={"data-foo": "value"})  # data_soup: a soup whose tags carry data-* attributes
soup.find_all("a", class_="sister")
soup.find_all("a", recursive=False)
soup.p.find_all("a", recursive=False)
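The filters above can be tried on a small document. This is a sketch, assuming beautifulsoup4 is installed; the three links are wrapped in a p tag here (an addition for illustration) so that the effect of recursive=False is visible.

```python
import re
from bs4 import BeautifulSoup

html_doc = """<p class="story"><a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

by_id = soup.find_all(id="link2")                    # exact attribute match
by_regex = soup.find_all(href=re.compile("elsie"))   # regex applied to the attribute value
by_class = soup.find_all("a", class_="sister")       # tag name plus CSS class
direct = soup.p.find_all("a", recursive=False)       # only direct children of <p>
top_level = soup.find_all("a", recursive=False)      # empty: <a> is not a direct child of the document

print([a["id"] for a in by_class])
print(len(direct), len(top_level))
```

With recursive=False only direct children are inspected, which is why the same call returns matches on soup.p but none on soup itself.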
# find_all() with a regular expression: re.compile(…..)
<a href="http://192.x.x.x" class="c1">hello</a>
<a href="https://192.x.x.x" class="c1">hello</a>
<a href="https://www.com" class="c1">hello</a>
find_all(href=re.compile('(https|http)://[0-9\.]'))
https://docs.python.org/2/howto/regex.html
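To check which of the three links the pattern selects, the regular expression can be tested on the href values alone with the standard re module. This is a sketch; the list name hrefs is illustrative, and search() is used because that is how Beautiful Soup applies a compiled pattern to an attribute value.

```python
import re

# Pattern from the slide: http or https followed by a digit or a dot,
# i.e. a host that looks like an IP address.
pattern = re.compile(r'(https|http)://[0-9\.]')

hrefs = ["http://192.x.x.x", "https://192.x.x.x", "https://www.com"]

# Keep only the hrefs where the pattern is found.
matched = [h for h in hrefs if pattern.search(h)]
print(matched)  # ['http://192.x.x.x', 'https://192.x.x.x']
```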
read more : https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Quiz#2 : Tag Extraction
1. Get the webpage : http://pantip.com/tags
2. Extract the tag name, tag link, and number of topics from the first 10 pages
3. Save to a file in this format: tag name, tag link, number of topics, current datetime
4. Run every 5 minutes
JSON Parser : json
built-in module
import json
json_doc = json.loads('{"key": "value"}')
# JSON string
json_doc = """{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}"""
# Parse string to object
import json
json_obj = json.loads(json_doc)
print(json_obj)
{'employees': [{'firstName': 'John', 'lastName': 'Doe'}, {'firstName': 'Anna', 'lastName': 'Smith'}, {'firstName': 'Peter', 'lastName': 'Jones'}]}
# Access json object
import json
json_obj = json.loads(json_doc)
print(json_obj['employees'][0]['firstName'])
print(json_obj['employees'][0]['lastName'])
John
Doe
# Create json doc
import json
json_obj = {"firstName": "name", "lastName": "last"}  # dictionary
print(json.dumps(json_obj, indent=1))
{
 "firstName": "name",
 "lastName": "last"
}
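The two directions, loads for parsing and dumps for serializing, can be combined into a round trip. This is a minimal sketch using only the standard library; the employee data is the slide's example, and the variable names are illustrative.

```python
import json

json_doc = '{"employees": [{"firstName": "John", "lastName": "Doe"}, {"firstName": "Anna", "lastName": "Smith"}]}'

data = json.loads(json_doc)  # JSON text -> dict containing a list of dicts
names = [e["firstName"] for e in data["employees"]]

# Modify the Python object, then serialize it and parse it back.
data["employees"].append({"firstName": "Peter", "lastName": "Jones"})
round_trip = json.loads(json.dumps(data))

print(names)
print(len(round_trip["employees"]))
```

A JSON object becomes a Python dict and a JSON array becomes a list, so the parsed result is handled with ordinary indexing and list methods.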
Quiz#3 : Post Monitoring
1. Register as a Facebook Developer on developers.facebook.com
2. Get information on the last 10 hours of posts on the page https://www.facebook.com/MorningNewsTV3
3. Save to a file in this format: post id, post datetime, number of likes, current datetime
URL: https://graph.facebook.com/v2.8/<PageID>?fields=posts.limit(100)%7Blikes.limit(1).summary(true)%2Ccreated_time%7D&access_token=
▸ Web Service
Web Service Type
SOAP Example
SOAP Request
REST
REST Request
JSON Web Service
Application
JSON
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
list, dict, key, value
read more : http://www.json.org/
Create Simple Web Service
pip install Flask-API
from flask_api import FlaskAPI  # the flask.ext.* import path was removed in Flask 1.0
app = FlaskAPI(__name__)

@app.route('/example/')
def example():
    return {'hello': 'world'}

app.run(debug=False, port=5555)
# Receive input
from flask_api import FlaskAPI
app = FlaskAPI(__name__)

@app.route('/hello/<name>/<lastName>')
def example(name, lastName):
    return {'hello': name}

app.run(debug=False, port=5555)
Quiz#4 : Top Tag Service
1. Build a getTopTagInfo web service.
2. Input: number of top topics
3. Output: tag name and number of topics for the top tags, in JSON format.
USING DATABASES WITH PYTHON
Databases
Zero configuration – SQLite does not need to be installed, as there is no setup procedure to use it.
Serverless – SQLite is not implemented as a separate server process. The process that wants to access the database reads and writes directly from the database files on disk, with no intermediary server process.
Stable Cross-Platform Database File – The SQLite file format is cross-platform. A database file written on one machine can be copied to and used on a different machine with a different architecture.
Single Database File – An SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
Compact – When optimized for size, the whole SQLite library with everything enabled is less than 400KB in size.
SQLite
built-in library : sqlite3
import sqlite3
conn = sqlite3.connect('my.db')
Workflow
1. Connect to db
2. Get cursor
3. Execute command
4. Commit (insert / update / delete) / Fetch result (select)
5. Close database
Workflow Example
import sqlite3
conn = sqlite3.connect('example.db')  # connect db
c = conn.cursor()  # get cursor
# execute 1
c.execute('''CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)''')
# execute 2
c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
conn.commit()  # commit
conn.close()  # close
Data Type
Database Storage
import sqlite3
conn = sqlite3.connect('example.db')  # store on disk
conn = sqlite3.connect(':memory:')  # store in memory
Execute
# execute
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
# executemany
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
             ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
             ('2006-04-06', 'SELL', 'IBM', 500, 53.00)]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
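The execute and executemany calls can be tried end to end against an in-memory database, so that no example.db file has to exist. This is a sketch using only the standard sqlite3 module; the table and rows come from the slides, while the final IBM query is an added illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
c = conn.cursor()
c.execute("CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)")

purchases = [
    ("2006-03-28", "BUY", "IBM", 1000, 45.00),
    ("2006-04-05", "BUY", "MSFT", 1000, 72.00),
    ("2006-04-06", "SELL", "IBM", 500, 53.00),
]
c.executemany("INSERT INTO stocks VALUES (?,?,?,?,?)", purchases)  # one INSERT per tuple
conn.commit()

# The ? placeholder binds the value safely instead of pasting it into the SQL text.
c.execute("SELECT trans, qty FROM stocks WHERE symbol=? ORDER BY date", ("IBM",))
ibm_rows = c.fetchall()
print(ibm_rows)
conn.close()
```

Using placeholders rather than string formatting avoids quoting problems and SQL injection when the value comes from user input.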
fetch
# fetchone
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
print(c.fetchone())
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
# fetchall
import sqlite3
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('SELECT * FROM stocks')
for d in c.fetchall():
    print(d)
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)
Context manager
import sqlite3
con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname varchar unique)")
# con.commit() is called automatically afterwards
with con:
    con.execute("insert into person(firstname) values (?)", ("Joe",))  # parameters must be a tuple: ("Joe",)
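The connection's context manager also rolls back when the block raises. The sketch below, using only the standard sqlite3 module, reuses the person table from the slide; the second name "Ann" and the error message are added for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname varchar unique)")

with con:  # commits automatically when the block succeeds
    con.execute("insert into person(firstname) values (?)", ("Joe",))

try:
    with con:  # rolls back the whole block when an exception is raised
        con.execute("insert into person(firstname) values (?)", ("Ann",))
        con.execute("insert into person(firstname) values (?)", ("Joe",))  # violates the unique constraint
except sqlite3.IntegrityError:
    print("couldn't add Joe twice")

names = [row[0] for row in con.execute("select firstname from person order by id")]
print(names)
con.close()
```

Because the failed block is rolled back as a unit, the row for "Ann" is discarded together with the duplicate "Joe".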
Read more :
https://docs.python.org/2/library/sqlite3.html
https://www.tutorialspoint.com/python/python_database_access.htm
Quiz#5 : Post DB
1. Register as a Facebook Developer on developers.facebook.com
2. Get information on the last 10 hours of posts on the page https://www.facebook.com/MorningNewsTV3 (post id, post datetime, number of likes, current datetime)
3. Design and create a table to store the posts
PROCESSING AND VISUALIZING DATA WITH PYTHON
Processing and Visualizing
▸ Processing : pandas
pip install pandas
high-performance, easy-to-use data structures and data analysis tools
Pandas : Series
# create series with array-like
import pandas as pd
from numpy.random import rand
s = pd.Series(rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)
a    0.690232
b    0.738294
c    0.153817
d    0.619822
e    0.4347
# create series with dictionary
import pandas as pd
d = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(d)  # with dictionary
print(s)
a    0
b    1
c    2
dtype: float64
# create series with scalar
import pandas as pd
s = pd.Series(5., index=['a', 'b', 'a', 'd', 'a'])  # index labels can be duplicated
print(s['a'])
a    5
a    5
a    5
dtype: float64
# access series data
import pandas as pd
s = pd.Series(5., index=['a', 'b', 'a', 'd', 'a'])  # index labels can be duplicated
print(s.iloc[0])  # positional access; s[0] on a labeled index is deprecated in recent pandas
print(s[:3])
5.0
a    5
b    5
a    5
dtype: float64
# series operations
import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10))
s = s + 2
s = s * s
s = np.exp(s)
print(s)
0     187.735606
1     691.660752
2      60.129741
3     595.438606
4     769.479456
5     397.052123
6    4691.926483
7    1427.593520
8     180.001824
9     410.994395
dtype: float64
# series filtering
import pandas as pd
from numpy.random import rand
import numpy as np
s = pd.Series(rand(10))
s = s[s > 0.1]
print(s)
1    0.708700
2    0.910090
3    0.380613
6    0.692324
7    0.508440
8    0.763977
9    0.470675
dtype: float64
# series with incomplete data
import pandas as pd
from numpy.random import rand
import numpy as np
s1 = pd.Series(rand(10))
s2 = pd.Series(rand(8))
s = s1 + s2  # labels missing from s2 give NaN
print(s)
0    0.813747
1    1.373839
2    1.569716
3    1.624887
4    1.515665
5    0.526779
6    1.544327
7    0.740962
8         NaN
9         NaN
dtype: float64
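Arithmetic between two Series aligns on the index, so labels present in only one operand produce NaN. The sketch below shows two common ways to handle this, assuming pandas is installed; the fixed values replace the random data of the slide so the result is reproducible.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0, 4.0])
s2 = pd.Series([10.0, 20.0])        # shorter: labels 2 and 3 are missing

plain = s1 + s2                     # missing labels give NaN
filled = s1.add(s2, fill_value=0)   # treat the missing entries as 0
dropped = plain.dropna()            # or discard the incomplete rows

print(filled.tolist())   # [11.0, 22.0, 3.0, 4.0]
print(dropped.tolist())  # [11.0, 22.0]
```

Which option fits depends on whether a missing value means "zero" or "unknown" in the data at hand.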
Pandas : DataFrame
2-dimensional labeled data structure with columns of potentially different types
# create dataframe with dict of Series
import pandas as pd
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
   one  two
a    1    1
b    2    2
c    3    3
d  NaN    4
# create dataframe with dict of lists
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df)
   one  two
0    1    4
1    2    3
2    3    2
3    4    1
# access dataframe column
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df['one'])
0    1
1    2
2    3
3    4
Name: one, dtype: float64
# access dataframe rows
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.iloc[:3])
   one  two
0    1    4
1    2    3
2    3    2
# add new column
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1, 2, 3, 2]
print(df)
   one  two  three
0    1    4      1
1    2    3      2
2    3    2      3
3    4    1      2
# show data : head() and tail()
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df['three'] = [1, 2, 3, 2]
print(df.head())
print(df.tail())
   one  two  three
0    1    4      1
1    2    3      2
2    3    2      3
3    4    1      2
# dataframe summary
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.describe())
# dataframe function: mean
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.mean())
one    2.5
two    2.5
dtype: float64
# dataframe function: correlation
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df.corr())  # calculate correlation
     one  two
one    1   -1
two   -1    1
# dataframe filtering
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[(df['one'] > 1) & (df['one'] < 3)])
   one  two
1    2    3
# dataframe filtering with isin
d = {'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
print(df[df['one'].isin([2, 4])])
   one  two
1    2    3
3    4    1
# dataframe with row data
d = [[1., 2., 3., 4.], [4., 3., 2., 1.]]
df = pd.DataFrame(d)
df.columns = ["one", "two", "three", "four"]
print(df)
   one  two  three  four
0    1    2      3     4
1    4    3      2     1
# dataframe sort values
d = [[2., 1., 3., 4.], [1., 3., 2., 4.]]
df = pd.DataFrame(d)
df.columns = ["one", "two", "three", "four"]
df = df.sort_values(["one", "two"], ascending=[True, False])  # "one" ascending, then "two" descending
print(df)
   one  two  three  four
1    1    3      2     4
0    2    1      3     4
# dataframe from csv file
df = pd.read_csv('file.csv')
print(df)
   one  two  three
0    1    2      3
1    1    2      3
2    1    2      3
file.csv
one,two,three
1,2,3
1,2,3
1,2,3
# dataframe from csv file, without header
df = pd.read_csv('file.csv', header=None)  # header=-1 is rejected by current pandas; use None
print(df)
   0  1  2
0  1  2  3
1  1  2  3
2  1  2  3
file.csv
1,2,3
1,2,3
1,2,3
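Both read_csv variants can be tried without creating file.csv by reading from an in-memory text buffer. This is a sketch, assuming pandas is installed; io.StringIO stands in for the file, and the variable names are illustrative.

```python
import io
import pandas as pd

with_header = "one,two,three\n1,2,3\n1,2,3\n1,2,3\n"
no_header = "1,2,3\n1,2,3\n1,2,3\n"

df1 = pd.read_csv(io.StringIO(with_header))             # first row becomes the column names
df2 = pd.read_csv(io.StringIO(no_header), header=None)  # columns are numbered 0, 1, 2
df2.columns = ["one", "two", "three"]                   # names can be assigned afterwards

print(df1)
print(df2)
```

After the column names are assigned, the two frames hold the same values under the same headers.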
# dataframe from html; install lxml first (pip install lxml)
df = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')  # returns a list of DataFrames
print(df[0])
Abbreviation State Name Capital Became a State
1 AL Alabama Montgomery December 14, 1819
2 AK Alaska Juneau January 3, 1959
3 AZ Arizona Phoenix February 14, 1912
Quiz#6 : Data Exploration
1. Go to https://archive.ics.uci.edu/ml/datasets/Adult to read the data description
2. Parse the data into pandas using read_csv() and set the column names
3. Explore the data to answer the following questions:
- Find the number of persons in each education level.
- Find the correlation and covariance between the continuous fields.
- Find the average age of the United-States population where income is >50K.
df[3].value_counts()  # column 3 is education when the file is read without a header
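The kinds of calls the quiz asks for can be practiced on a tiny hand-made table first. The sketch below assumes pandas is installed; every value in it is made up for illustration (these are not records from the Adult dataset), and the column names are assumptions loosely modelled on the dataset description.

```python
import pandas as pd

# Made-up sample rows, only to demonstrate the method calls.
df = pd.DataFrame({
    "age": [30, 45, 52, 23, 41, 60],
    "education": ["Bachelors", "HS-grad", "Bachelors", "HS-grad", "Masters", "Bachelors"],
    "hours_per_week": [40, 38, 50, 20, 45, 35],
    "native_country": ["United-States", "United-States", "United-States", "Cuba", "United-States", "India"],
    "income": [">50K", "<=50K", ">50K", "<=50K", ">50K", ">50K"],
})

counts = df["education"].value_counts()       # number of persons per education level
corr = df[["age", "hours_per_week"]].corr()   # correlation between numeric fields
cov = df[["age", "hours_per_week"]].cov()     # covariance between numeric fields

# Combine two conditions with &, then average one column of the selected rows.
mask = (df["native_country"] == "United-States") & (df["income"] == ">50K")
avg_age = df.loc[mask, "age"].mean()

print(counts)
print(avg_age)
```

On the real file the same calls apply once the columns have been named; the numbers above say nothing about the actual dataset.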
▸ Visualizing : seaborn
pip install seaborn
visualization library based on matplotlib
seaborn : set inline plot for jupyter
%matplotlib inline
import numpy as np
import seaborn as sns
# Generate some sequential data
x = np.array(list("ABCDEFGHI"))
y1 = np.arange(1, 10)
sns.barplot(x=x, y=y1)  # keyword arguments; recent seaborn versions no longer accept x and y positionally
seaborn : set layout
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(1, 1, figsize=(10, 10))
sns.barplot(x=[1, 2, 3, 4, 5], y=[3, 2, 3, 4, 2])
# 2 x 2 grid of subplots
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(2, 2, figsize=(10, 10))
sns.barplot(x=[1, 2, 3, 4, 5], y=[3, 2, 3, 4, 2], ax=ax[0, 0])
sns.distplot([3, 2, 3, 4, 2], ax=ax[0, 1])  # distplot is deprecated in newer seaborn in favor of histplot/displot
seaborn : axis setting
%matplotlib inline
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x=[1, 2, 3, 4, 5], y=[3, 2, 3, 4, 2])
ax.set_xlabel("number")
ax.set_ylabel("value")
seaborn : with pandas dataframe
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
d = {'x': [1., 2., 3., 4.], 'y': [4., 3., 2., 1.]}
df = pd.DataFrame(d)
f, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x='x', y='y', data=df)
seaborn : plot types
http://seaborn.pydata.org/examples/index.html
Quiz#7 : Adult Plot
1. Go to https://archive.ics.uci.edu/ml/datasets/Adult to read the data description
2. Parse the data into pandas using read_csv() and set the column names
3. Plot five charts.