Ireland « Ora-lytics

#GE2020 Comparing Party Manifestos to 2016

Posted on January 24, 2020

A few days ago I wrote a blog post about using Python to analyze the 2016 general (government) elections manifestos of the four main political parties in Ireland.

Today the two (traditional) largest parties released their #GE2020 manifestos. You can get them by following these links

The following images show the WordClouds generated for the #GE2020 Manifestos. I used the same Python code used in my previous post. If you want to try this out yourself, all the Python code is there.

First let us look at the WordClouds from Fine Gael.

Now for the Fianna Fail WordClouds.

When you look closely at the differences between the manifestos you will notice there are some common themes across the manifestos from 2016 to those in the 2020 manifestos. It is also interesting to see some new words appearing/disappearing for the 2020 manifestos. Some of these are a little surprising and interesting.

This entry was posted in Data Science, Ireland, Python, Text Mining and tagged Fianna Fail, Fine Gael, GE2020, Python, Text Mining, Word Cloud.

#GE2020 Analysing Party Manifestos using Python

Posted on January 21, 2020

The general election is underway here in Ireland with polling day set for Saturday 8th February. All the politicians are out campaigning and every day the various parties are looking for publicity on whatever the popular topic is for that day. Each day is it a different topic.

Most of the political parties have not released their manifestos for the #GE2020 election (as of date of this post). I want to use some simple Python code to perform some analyse of their manifestos. As their new manifestos weren’t available (yet) I went looking for their manifestos from the previous general election. Michael Pidgeon has a website with party manifestos dating back to the early 1970s, and also has some from earlier elections. Check out his website.

I decided to look at manifestos from the 4 main political parties from the 2016 general election. Yes there are other manifestos available, and you can use the Python code, given below to analyse those, with only some minor edits required.

The end result of this simple analyse is a WordCloud showing the most commonly used words in their manifestos. This is graphical way to see what some of the main themes and emphasis are for each party, and also allows us to see some commonality between the parties.

Let’s begin with the Python code.

1 – Initial Setup

There are a number of Python Libraries available for processing PDF files. Not all of them worked on all of the Part Manifestos PDFs! It kind of depends on how these files were generated. In my case I used the pdfminer library, as it worked with all four manifestos. The common library PyPDF2 didn’t work with the Fine Gael manifesto document.

import io
import pdfminer
from pprint import pprint
from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage

#directory were manifestos are located
wkDir = '.../General_Election_Ire/'

#define the names of the Manifesto PDF files & setup party flag
pdfFile = wkDir+'FGManifesto16_2.pdf'
party = 'FG'
#pdfFile = wkDir+'Fianna_Fail_GE_2016.pdf'
#party = 'FF'
#pdfFile = wkDir+'Labour_GE_2016.pdf'
#party = 'LB'
#pdfFile = wkDir+'Sinn_Fein_GE_2016.pdf'
#party = 'SF'

All of the following code will run for a given manifesto. Just comment in or out the manifesto you are interested in. The WordClouds for each are given below.

2 – Load the PDF File into Python

The following code loops through each page in the PDF file and extracts the text from that page.

I added some addition code to ignore pages containing the Irish Language. The Sinn Fein Manifesto contained a number of pages which were the Irish equivalent of the preceding pages in English. I didn’t want to have a mixture of languages in the final output.

SF_IrishPages = [14,15,16,17,18,19,20,21,22,23,24]
text = ""

pageCounter = 0
resource_manager = PDFResourceManager()
fake_file_handle = io.StringIO()
converter = TextConverter(resource_manager, fake_file_handle)
page_interpreter = PDFPageInterpreter(resource_manager, converter)

for page in PDFPage.get_pages(open(pdfFile,'rb'), caching=True, check_extractable=True):
    if (party == 'SF') and (pageCounter in SF_IrishPages):
        print(party+' - Not extracting page - Irish page', pageCounter)
    else:
        print(party+' - Extracting Page text', pageCounter)
        page_interpreter.process_page(page)

        text = fake_file_handle.getvalue()

    pageCounter += 1

print('Finished processing PDF document')
converter.close()
fake_file_handle.close()

FG - Extracting Page text 0
FG - Extracting Page text 1
FG - Extracting Page text 2
FG - Extracting Page text 3
FG - Extracting Page text 4
FG - Extracting Page text 5
...

3 – Tokenize the Words

The next step is to Tokenize the text. This breaks the text into individual words.

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
tokens = []

tokens = word_tokenize(text)

print('Number of Pages =', pageCounter)
print('Number of Tokens =',len(tokens))

Number of Pages = 140
Number of Tokens = 66975

4 – Filter words, Remove Numbers & Punctuation

There will be a lot of things in the text that we don’t want included in the analyse. We want the text to only contain words. The following extracts the words and ignores numbers, punctuation, etc.

#converts to lower case, and removes punctuation and numbers
wordsFiltered = [tokens.lower() for tokens in tokens if tokens.isalpha()]
print(len(wordsFiltered))
print(wordsFiltered)

58198
['fine', 'gael', 'general', 'election', 'manifesto', 's', 'keep', 'the', 'recovery', 'going', 'gaelgeneral', 'election', 'manifesto', 'foreward', 'from', 'an', 'taoiseach', 'the', 'long', 'term', 'economic', 'three', 'steps', 'to', 'keep', 'the', 'recovery', 'going', 'agriculture', 'and', 'food', 'generational',
...

As you can see the number of tokens has reduced from 66,975 to 58,198.

5 – Setup Stop Words

Stop words are general words in a language that doesn’t contain any meanings and these can be removed from the data set. Python NLTK comes with a set of stop words defined for most languages.

#We initialize the stopwords variable which is a list of words like 
#"The", "I", "and", etc. that don't hold much value as keywords
stop_words = stopwords.words('english')
print(stop_words)

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself',
....

Additional stop words can be added to this list. I added the words listed below. Some of these you might expect to be in the stop word list, others are to remove certain words that appeared in the various manifestos that don’t have a lot of meaning. I also added the name of the parties and some Irish words to the stop words list.

#some extra stop words are needed after examining the data and word cloud
#these are added
extra_stop_words = ['ireland','irish','ł','need', 'also', 'set', 'within', 'use', 'order', 'would', 'year', 'per', 'time', 'place', 'must', 'years', 'much', 'take','make','making','manifesto','ð','u','part','needs','next','keep','election', 'fine','gael', 'gaelgeneral', 'fianna', 'fáil','fail','labour', 'sinn', 'fein','féin','atá','go','le','ar','agus','na','ár','ag','haghaidh','téarnamh','bplean','page','two','number','cothromfor']
stop_words.extend(extra_stop_words)
print(stop_words)

Now remove these stop words from the list of tokens.

# remove stop words from tokenised data set
filtered_words = [word for word in wordsFiltered if word not in stop_words]
print(len(filtered_words))
print(filtered_words)

31038
['general', 'recovery', 'going', 'foreward', 'taoiseach', 'long', 'term', 'economic', 'three', 'steps', 'recovery', 'going', 'agriculture', 'food',

The number of tokens is reduced to 31,038

6 – Word Frequency Counts

Now calculate how frequently these words occur in the list of tokens.

#get the frequency of each word
from collections import Counter

# count frequencies
cnt = Counter()
for word in filtered_words:
cnt[word] += 1

print(cnt)

Counter({'new': 340, 'support': 249, 'work': 190, 'public': 186, 'government': 177, 'ensure': 177, 'plan': 176, 'continue': 168, 'local': 150, 
...

7 – WordCloud

We can use the word frequency counts to add emphasis to the WordCloud. The more frequently it occurs the larger it will appear in the WordCloud.

#create a word cloud using frequencies for emphasis 
from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(max_words=100, margin=9, background_color='white',
scale=3, relative_scaling = 0.5, width=500, height=400,
random_state=1).generate_from_frequencies(cnt)

plt.figure(figsize=(20,10))
plt.imshow(wc)
#plt.axis("off")
plt.show()

#Save the image in the img folder:
wc.to_file(wkDir+party+"_2016.png")

The last line of code saves the WordCloud image as a file in the directory where the manifestos are located.

8 – WordClouds for Each Party

Screenshot 2020-01-21 11.10.25

Remember these WordClouds are for the manifestos from the 2016 general election.

When the parties have released their manifestos for the 2020 general election, I’ll run them through this code and produce the WordClouds for 2020. It will be interesting to see the differences between the 2016 and 2020 manifesto WordClouds.

This entry was posted in Data Science, Ireland, Machine Learning, Python, Text Mining and tagged Data Science, GE2020, Ireland, Machine Learning, Python, Text Mining.

OUG Ireland Meetup 11th May

Posted on April 27, 2017 Updated on April 26, 2017

The next OUG Ireland Meetup is happening on 11th May, in the Bank of Ireland Grand Canal Dock. This is a free event and is open to every one. You don’t have to be a member to attend.

Following on from a very successful 2 day OUG Ireland Conference with over 250 attendees, we have organised our next meetup. This was mentioned during the opening session of the conference.

We typically have 2 presentations at each Meetup and on 11th May we have:

1. Oracle Analytics Cloud Service.

Oralce Analytics Cloud Service was only released a few weeks ago and we some local people who have been working with the beta and early adopter releases. They will be giving us some insights on this new product and how it compares with other analytics products like Oracle Data Visualization and OBIEE.

Running Oracle DataGuard on RAC on Oracle 12c

The second presentation will be on using Oracle DataGuard on RAC on Oracle 12c. We have a very experienced DBA talking about his experiences of using these products how to workaround some key bugs and situations to be aware of for administration purposes. Lots of valuable information to be gained.

Check out the full agenda and to register for the Meetup by clicking on this link or on the Meetup image above.

There will be some food and refreshments available for you to enjoy.

The Meetup will be in Bank of Ireland, Grand Canal Dock. This venue is a very popular locations for Meetups in Dublin.

This entry was posted in Ireland, oug_ire, UKOUG and tagged OUG Ireland, oug_ire, UKOUG.

OUG Ireland 2016: APPs Track highlights

Posted on February 18, 2016

Today I was joined by Debra Lilly who is the APPs track lead and conference chair for OUG Ireland. Debra lets us know what we can look forward to on the APPs track at this years conference.

Check out Debra’s video.

Click on the image below to get more details of the agenda and to register for this 2 day conference.

Follow the conference and OUG Ireland on twitter using #oug_ire

This entry was posted in Ireland, Irish Oracle SIG, oug_ire.

OUG Ireland 2015 is next week

Posted on March 9, 2015

The annual OUG Ireland conference is on next week on Thursday 19th March.

If you haven’t already signed up for the conference this is only a few days left to do so. Click here to go to the registrations pages.

Also don’t forget to sign Maria Colgan’s one day seminar on the Oracle 12c In-Memory Option.

As always there is a very full agenda with 7 streams, 47 presentations and several keynote presentations.

I’ll be a draw for a copy of my book and I’ll be giving away a few Oracle Press goodies too. Check out this blog post for the details and rules of the book draw.

The following are the presentations I’m planning on attending (so you know where to find me)

table.myTable { border-collapse:collapse; } table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Time	Presenter	Topic
09:10-09:30	Debra Lilley	OUG Ireland Welcome, Introduction and Opening
09:30-10:10	Jon Paul (Oracle)	Opening Keynote by Jon Paul from Oracle
10:15-11:00	Oralce Presentation	Oracle Big Data Strategy
11:00-11:25		Exhibition Hall
11:25-12:10	Antony Heljula	Real Business Value Using Predictive BI (I’ve seen this before but I worked with Antony on some of what he will be talking about)
12:15-13:00	Roel Hartman & Brendan Tierney	What Are They Thinking? With Oracle Application Express & Oracle Data Mining. (we gave this presentation at Oracle Open World back in September 2014)
12:15-13:00	Gurcan Orhan	How to handle Dev, Test & Prod with ODI
13:00-14:00		Lunch (and then freaking out before I give my second presentation)
14:00-14:45	Brendan Tierney	Predictive Queries in Oracle 12c Database (I suppose I have to turn up to my own presentation)
14:50-15:35	Roel Hartman	Hidden APEX 5 Gems Revealed (APEX 5 is due out any day now)
15:35-16:00		Exhibition Hall & Coffee (and then freaking out before I give my third presentation)
16:00-16:45	Brendan Tierney	Running R in your Oracle Database using Oracle R Enterprise (This presentation generally runs for 50 minutes)
16:50-17:35	Maria Colgan	BI, Dev & Tech Closing Keynote: Oracle Database In-Memory-The next big thing
17:35-18:35		Event Social i.e. free drink 🙂

As you can see it is going to be a busy, busy day.

I would love to attend lots of others, but being able to be in multiple places at the same time is not one of them.

NOTE:The User Group has a rule that a presenter can have a max of 2 presentations. Unfortunately we had to break this rule a week out from the conference, due to some cancellations. And that is why I’ve ended with 2.5 presentations.

This entry was posted in Conference, Ireland, oug_ire.

Book give away at OUG Ireland

Posted on March 3, 2015

The annual Oracle User Group in Ireland conference is on the 19th March in Croke Park.

I’ll be giving 2 presentations, with one each on the Development and Business Analytics tracks. Here are the details of these presentations.

table.myTable { border-collapse:collapse; } table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Time	Room	Presentation Title / Topic
14:00-14:45	InterConnect 681	Predictive Queries in Oracle 12c
16:00-16:45	Davin Suite	Running R in the Database using Oracle R Enterprise

I will be giving away a copy of my book to one luck person 🙂

How will this book give away work?

During both of my presentations I will pass around a “hat” for you to put your name or business card into. Then at end of my last presentation we will draw one name out of the hat.

But you have to be in the room to collect the book. If you are not there then I will draw out another name (and so on) until the winner is in the room.

So by attending both of my presentations you are doubling your chances of winning my book.

(Maybe this is an attempt by me to have a good attendance at my last presentation)

Plus I might have a few other Oracle Press goodies to give away too.

This entry was posted in Conference, Ireland, oug_ire.

Oracle ACEs at OUG Ireland 2015

Posted on February 22, 2015

The annual Oracle User Group in Ireland Conference will be on Thursday 19th March. This year the conference will be held in the Croke Park conference centre. This conference centre is only a short taxi ride from Dublin Airport and Dublin City Centre.

If you are planning a hotel stay for the conference I would recommend staying in a hotel in the city centre and get a taxi to/from the conference venue.

We have a large number of Oracle ACEs presenting at the conference. The following table lists the ACEs, their twitter handle and their website.

table.myTable { border-collapse:collapse; } table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Oracle ACE	Type of ACE	Twitter Name	Blog / Web Site
Brendan Tierney	ACE Director	@brendantierney	http://www.oralytics.com
Debra Lilley	ACE Director	@debralilley	http://www.debrasoracle.blogspot.ie/
Jonathan Lewis	ACE Director	@JLOracle	http://jonathanlewis.wordpress.com/
Tim Hall	ACE Director	@oraclebase	http://oracle-base.com/
Alex Nuijten	ACE Director	@alexnuijten	http://nuijten.blogspot.com/
Dhananjay Papde	ACE Associate
Stewart Bryson	ACE Director	@stewartbryson	http://www.redpillanalytics.com/
Antony Heljula	ACE	@aheljula	http://www.peakindicators.com/
Gurcan Orhan	ACE Director	@gurcan_orhan	https://gurcanorhan.wordpress.com/
Heli Helskyaho	ACE Director	@HeliFromFinland	https://helifromfinland.wordpress.com/
Marco Gralike	ACE Director	@mgralike	http://www.xmldb.nl
Roel Hartman	ACE Director	@roelh	http://roelhartman.blogspot.com/
Martin Widlake	ACE	@mdwidlake	https://mwidlake.wordpress.com/
Liron Amitzi	ACE	@amitzil	http://www.dbaces.com/
David Kurtz	ACE Director	@davidmkurtz	http://www.go-faster.co.uk/
Marcin Przepiorowski	ACE	@pioro	http://oracleprof.blogspot.com/

Make sure you check out the full agenda for the conference by clicking on the following image. Plus there is a full day session on Friday 20th March with Maria Colgan on the Oracle In-Memory option.

This entry was posted in Conference, Ireland, oug_ire.

OUG Ireland 2015 : Now open for Submissions

Posted on November 14, 2014

OUG Ireland Call for submissions is now open.

The closing date for submissions is 5th January, 2015.

and the submission webpage can be found here.

The OUG Ireland conference will be on Thursday 19th March. Yes it is only a one day conference 😦 but we will be 5 or 6 or more streams. So there will be something for everyone and plenty of choice.

On Friday 20th March we will have Maria Colgan, formally the Optimizer Lady and now the In-Memory Queen (or something like that), giving a full day workshop on the In-Database option and the Optimizer. She will also be about for the main conference on the 19th, so you can expect a presentation or two from her on the Thursday.

Agenda selection day is the 8th January, 2015. So hopefully you will be getting the acceptance emails soon after that or during week of 12th January.

There is a committee of about 10 people who are involved in selecting presentations and setting the agenda. If it was up to me then I would accept everything/everyone. So if your presentation is not accepted this time, please don’t blame me 🙂 I said YES to your presentation, I really, really did. I fought so hard to have your presentation included. If your presentation is not accepted then the blame is down to the other committee members 🙂

The conference will be held in Croke Park, and is a 15-20 minute taxi ride from the Airport.

You can follow the Conference and other OUG Ireland events using the twitter tag #oug_ire

This entry was posted in Ireland, Irish Oracle SIG.

OUG Ireland 2013 Agenda is now live

Posted on February 7, 2013

The agenda for the OUG Ireland 2013 event is now live. The event will be in the Dublin Convention Centre on the 12th March. There are lots of excellent sessions, across 7 tracks!! So there will be something (or lots of things) for everyone who works in the Oracle world here in Ireland.

I’m sure the Oracle Database track will be very popular. I wonder why!!!

Agenda : http://www.ukoug.org/2013-events/oug-ireland-2013/agenda/

Remember registration is FREE. You don’t have to be a member of the User Group to come to this event. It is open to everyone and did I mention that it is FREE. Registration is now open.

I’ll be there. Well I suppose I have to as I’ll be presenting Smile

I hope to see you there.

This entry was posted in Conference, Ireland, Irish Oracle SIG, Oracle, oug_ire.

Tom Kyte in Dublin 21st January 2013

Posted on December 18, 2012

Tom Kyte will be back in Dublin on the 21st January, 2013. He will be giving a number of presentations covering some of his popular Oracle Open World sessions and will also include a AskTom session

It will be a full day, kicking off at 9am and finishing around 3:30pm.

There is no better way to kick off the new year with a full day of FREE Oracle training and up skilling with Tom Kyte.

To register for the event send an email to marketing-ie_ie@oracle.com

As they say places are limited, so book early, I have Smile so I’ll see you there.

This entry was posted in Brendan Tierney, database, Ireland, Irish Oracle SIG, Oracle, oug_ire, Tom Kyte.

OUG Ireland 2013–Call for Presentations

Posted on December 14, 2012

The call for presentations at the OUG Ireland Conference is now open. The conference will be on Tuesday 12th March in Dublin city centre.

It is hoped to have at a number of concurrent tracks covering all the main topic areas, including application development, database administration, business intelligence, applications, etc.

If you are interested in submitting a presentation then you need to fill in some of the detail at

OUG Ireland – Submit a Paper

Follow the OUG Ireland conversation on twitter using the tag #oug_ire

call for papers

This entry was posted in Brendan Tierney, Conference, Ireland, Irish BI SIW, Oracle, oug_ire.

Update on : Adding numbers between

Posted on November 9, 2012

Over the past few days I’ve had a number of emails and comments based on my previous post. My previous post was called ‘Adding numbers between two values’. I included some PL/SQL code that can be used to add up the numbers between two values. I mentioned that this was a question that my pre-teen son (a few year pre-teen) had asked me.

There are two main solutions to the same problem. One involves just using a SELECT and the other involves using recursion. I will come back the these alternative solutions below.

But let me start off with a bit more detail and background to why I approached the problem the way that I did. The main reason is that my son is a pre-teen. Over the past couple of years he as expressed an interest in what his daddy does. We even have matching ORACLENERD t-shirts Smile

When I was working through the problem with my son I wanted to show him how to take a problem and by breaking it down into its different parts we can work out an overall solution. We can then take each of these parts and translate them into code. In this case some PL/SQL, yes it is a bit nerdy and we do have the t-shirt. The code that I gave illustrates many different parts of the language and hopefully he will use some of these features as we continue on our learning experience.

It is good sometimes to break a problem down into smaller parts. That way we can understand it better, what works and what does not work, if something does not work then we will know what bit and also leads to easier maintenance. At a later point as you develop an in-depth knowledge of certain features of a language you can then rewrite what you have to be more efficient.

All part of the learning experience.

Ok lets take a look at the other ways to answer this problem. The first approach is to just use a single SELECT statement.

SELECT sum(rownum + &&Start_Number – 1)
FROM dual
CONNECT by level <= &End_Number – &&Start_Number + 1;

An even simpler way is

SELECT sum(level)
FROM dual
CONNECT BY level between &Start_Number and &End_Number;

These queries create a hierarchical query that produce all the numbers between the Start_Number parameter and the End_Number parameter. The SUM is needed to all all the numbers/rows produced. This is nice and simple (but not that easy for by son at this point).

Thank you to everyone who contacted me about this. I really appreciated your feedback and please keep your comments coming for all my posts.

This entry was posted in Brendan Tierney, database, database design, Ireland, Oracle, OTN, oug_ire, PL/SQL, SQL, UKOUG.

Ireland

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: