Ireland

#GE2020 Comparing Party Manifestos to 2016

Posted on

A few days ago I wrote a blog post about using Python to analyze the 2016 general (government) elections manifestos of the four main political parties in Ireland.

Today the two (traditional) largest parties released their #GE2020 manifestos. You can get them by following these links

The following images show the WordClouds generated for the #GE2020 Manifestos. I used the same Python code used in my previous post. If you want to try this out yourself, all the Python code is there.

First let us look at the WordClouds from Fine Gael.

FG2020
2020 Manifesto
FG_2016
2016 Manifesto

Now for the Fianna Fail WordClouds.

FF2020
2020 Manifesto
FF_2016
2016 Manifesto

When you look closely at the differences between the manifestos you will notice there are some common themes across the manifestos from 2016 to those in the 2020 manifestos. It is also interesting to see some new words appearing/disappearing for the 2020 manifestos. Some of these are a little surprising and interesting.

Advertisement

#GE2020 Analysing Party Manifestos using Python

Posted on

The general election is underway here in Ireland with polling day set for Saturday 8th February. All the politicians are out campaigning and every day the various parties are looking for publicity on whatever the popular topic is for that day. Each day is it a different topic.

Most of the political parties have not released their manifestos for the #GE2020 election (as of date of this post). I want to use some simple Python code to perform some analyse of their manifestos. As their new manifestos weren’t available (yet) I went looking for their manifestos from the previous general election. Michael Pidgeon has a website with party manifestos dating back to the early 1970s, and also has some from earlier elections. Check out his website.

I decided to look at manifestos from the 4 main political parties from the 2016 general election. Yes there are other manifestos available, and you can use the Python code, given below to analyse those, with only some minor edits required.

The end result of this simple analyse is a WordCloud showing the most commonly used words in their manifestos. This is graphical way to see what some of the main themes and emphasis are for each party, and also allows us to see some commonality between the parties.

Let’s begin with the Python code.

1 – Initial Setup

There are a number of Python Libraries available for processing PDF files. Not all of them worked on all of the Part Manifestos PDFs! It kind of depends on how these files were generated. In my case I used the pdfminer library, as it worked with all four manifestos. The common library PyPDF2 didn’t work with the Fine Gael manifesto document.

import io
import pdfminer
from pprint import pprint
from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage

#directory were manifestos are located
wkDir = '.../General_Election_Ire/'

#define the names of the Manifesto PDF files & setup party flag
pdfFile = wkDir+'FGManifesto16_2.pdf'
party = 'FG'
#pdfFile = wkDir+'Fianna_Fail_GE_2016.pdf'
#party = 'FF'
#pdfFile = wkDir+'Labour_GE_2016.pdf'
#party = 'LB'
#pdfFile = wkDir+'Sinn_Fein_GE_2016.pdf'
#party = 'SF'

All of the following code will run for a given manifesto. Just comment in or out the manifesto you are interested in. The WordClouds for each are given below.

2 – Load the PDF File into Python

The following code loops through each page in the PDF file and extracts the text from that page.

I added some addition code to ignore pages containing the Irish Language. The Sinn Fein Manifesto contained a number of pages which were the Irish equivalent of the preceding pages in English. I didn’t want to have a mixture of languages in the final output.

SF_IrishPages = [14,15,16,17,18,19,20,21,22,23,24]
text = ""

pageCounter = 0
resource_manager = PDFResourceManager()
fake_file_handle = io.StringIO()
converter = TextConverter(resource_manager, fake_file_handle)
page_interpreter = PDFPageInterpreter(resource_manager, converter)

for page in PDFPage.get_pages(open(pdfFile,'rb'), caching=True, check_extractable=True):
    if (party == 'SF') and (pageCounter in SF_IrishPages):
        print(party+' - Not extracting page - Irish page', pageCounter)
    else:
        print(party+' - Extracting Page text', pageCounter)
        page_interpreter.process_page(page)

        text = fake_file_handle.getvalue()

    pageCounter += 1

print('Finished processing PDF document')
converter.close()
fake_file_handle.close()
FG - Extracting Page text 0
FG - Extracting Page text 1
FG - Extracting Page text 2
FG - Extracting Page text 3
FG - Extracting Page text 4
FG - Extracting Page text 5
...

3 – Tokenize the Words

The next step is to Tokenize the text. This breaks the text into individual words.

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
tokens = []

tokens = word_tokenize(text)

print('Number of Pages =', pageCounter)
print('Number of Tokens =',len(tokens))
Number of Pages = 140
Number of Tokens = 66975

4 – Filter words, Remove Numbers & Punctuation

There will be a lot of things in the text that we don’t want included in the analyse. We want the text to only contain words. The following extracts the words and ignores numbers, punctuation, etc.

#converts to lower case, and removes punctuation and numbers
wordsFiltered = [tokens.lower() for tokens in tokens if tokens.isalpha()]
print(len(wordsFiltered))
print(wordsFiltered)
58198
['fine', 'gael', 'general', 'election', 'manifesto', 's', 'keep', 'the', 'recovery', 'going', 'gaelgeneral', 'election', 'manifesto', 'foreward', 'from', 'an', 'taoiseach', 'the', 'long', 'term', 'economic', 'three', 'steps', 'to', 'keep', 'the', 'recovery', 'going', 'agriculture', 'and', 'food', 'generational',
...

As you can see the number of tokens has reduced from 66,975 to 58,198.

5 – Setup Stop Words

Stop words are general words in a language that doesn’t contain any meanings and these can be removed from the data set. Python NLTK comes with a set of stop words defined for most languages.

#We initialize the stopwords variable which is a list of words like 
#"The", "I", "and", etc. that don't hold much value as keywords
stop_words = stopwords.words('english')
print(stop_words)
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself',
....

Additional stop words can be added to this list. I added the words listed below. Some of these you might expect to be in the stop word list, others are to remove certain words that appeared in the various manifestos that don’t have a lot of meaning. I also added the name of the parties  and some Irish words to the stop words list.

#some extra stop words are needed after examining the data and word cloud
#these are added
extra_stop_words = ['ireland','irish','ł','need', 'also', 'set', 'within', 'use', 'order', 'would', 'year', 'per', 'time', 'place', 'must', 'years', 'much', 'take','make','making','manifesto','ð','u','part','needs','next','keep','election', 'fine','gael', 'gaelgeneral', 'fianna', 'fáil','fail','labour', 'sinn', 'fein','féin','atá','go','le','ar','agus','na','ár','ag','haghaidh','téarnamh','bplean','page','two','number','cothromfor']
stop_words.extend(extra_stop_words)
print(stop_words)

Now remove these stop words from the list of tokens.

# remove stop words from tokenised data set
filtered_words = [word for word in wordsFiltered if word not in stop_words]
print(len(filtered_words))
print(filtered_words)
31038
['general', 'recovery', 'going', 'foreward', 'taoiseach', 'long', 'term', 'economic', 'three', 'steps', 'recovery', 'going', 'agriculture', 'food',

The number of tokens is reduced to 31,038

6 – Word Frequency Counts

Now calculate how frequently these words occur in the list of tokens.

#get the frequency of each word
from collections import Counter

# count frequencies
cnt = Counter()
for word in filtered_words:
cnt[word] += 1

print(cnt)
Counter({'new': 340, 'support': 249, 'work': 190, 'public': 186, 'government': 177, 'ensure': 177, 'plan': 176, 'continue': 168, 'local': 150, 
...

7 – WordCloud

We can use the word frequency counts to add emphasis to the WordCloud. The more frequently it occurs the larger it will appear in the WordCloud.

#create a word cloud using frequencies for emphasis 
from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(max_words=100, margin=9, background_color='white',
scale=3, relative_scaling = 0.5, width=500, height=400,
random_state=1).generate_from_frequencies(cnt)

plt.figure(figsize=(20,10))
plt.imshow(wc)
#plt.axis("off")
plt.show()

#Save the image in the img folder:
wc.to_file(wkDir+party+"_2016.png")

The last line of code saves the WordCloud image as a file in the directory where the manifestos are located.

8 – WordClouds for Each Party

Screenshot 2020-01-21 11.10.25

Remember these WordClouds are for the manifestos from the 2016 general election.

When the parties have released their manifestos for the 2020 general election, I’ll run them through this code and produce the WordClouds for 2020. It will be interesting to see the differences between the 2016 and 2020 manifesto WordClouds.

OUG Ireland Meetup 11th May

Posted on Updated on

The next OUG Ireland Meetup is happening on 11th May, in the Bank of Ireland Grand Canal Dock. This is a free event and is open to every one. You don’t have to be a member to attend.

Following on from a very successful 2 day OUG Ireland Conference with over 250 attendees, we have organised our next meetup. This was mentioned during the opening session of the conference.

NewImage

We typically have 2 presentations at each Meetup and on 11th May we have:

1. Oracle Analytics Cloud Service.

Oralce Analytics Cloud Service was only released a few weeks ago and we some local people who have been working with the beta and early adopter releases. They will be giving us some insights on this new product and how it compares with other analytics products like Oracle Data Visualization and OBIEE.

Running Oracle DataGuard on RAC on Oracle 12c

The second presentation will be on using Oracle DataGuard on RAC on Oracle 12c. We have a very experienced DBA talking about his experiences of using these products how to workaround some key bugs and situations to be aware of for administration purposes. Lots of valuable information to be gained.

Check out the full agenda and to register for the Meetup by clicking on this link or on the Meetup image above.

There will be some food and refreshments available for you to enjoy.

The Meetup will be in Bank of Ireland, Grand Canal Dock. This venue is a very popular locations for Meetups in Dublin.

NewImage

OUG Ireland 2016: APPs Track highlights

Posted on

Today I was joined by Debra Lilly who is the APPs track lead and conference chair for OUG Ireland. Debra lets us know what we can look forward to on the APPs track at this years conference.

Check out Debra’s video.

Click on the image below to get more details of the agenda and to register for this 2 day conference.

NewImage

Follow the conference and OUG Ireland on twitter using #oug_ire

OUG Ireland 2015 is next week

Posted on

The annual OUG Ireland conference is on next week on Thursday 19th March.

If you haven’t already signed up for the conference this is only a few days left to do so. Click here to go to the registrations pages.

Also don’t forget to sign Maria Colgan’s one day seminar on the Oracle 12c In-Memory Option.

As always there is a very full agenda with 7 streams, 47 presentations and several keynote presentations.

I’ll be a draw for a copy of my book and I’ll be giving away a few Oracle Press goodies too. Check out this blog post for the details and rules of the book draw.

The following are the presentations I’m planning on attending (so you know where to find me)

table.myTable { border-collapse:collapse; } table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Time Presenter Topic
09:10-09:30 Debra Lilley OUG Ireland Welcome, Introduction and Opening
09:30-10:10 Jon Paul (Oracle) Opening Keynote by Jon Paul from Oracle
10:15-11:00 Oralce Presentation Oracle Big Data Strategy

11:00-11:25 Exhibition Hall
11:25-12:10 Antony Heljula Real Business Value Using Predictive BI

(I’ve seen this before but I worked with Antony on some of what he will be talking about)

12:15-13:00 Roel Hartman &

Brendan Tierney

What Are They Thinking? With Oracle Application Express & Oracle Data Mining.

(we gave this presentation at Oracle Open World back in September 2014)

12:15-13:00 Gurcan Orhan How to handle Dev, Test & Prod with ODI
13:00-14:00 Lunch

(and then freaking out before I give my second presentation)

14:00-14:45 Brendan Tierney Predictive Queries in Oracle 12c Database

(I suppose I have to turn up to my own presentation)

14:50-15:35 Roel Hartman Hidden APEX 5 Gems Revealed

(APEX 5 is due out any day now)

15:35-16:00 Exhibition Hall & Coffee

(and then freaking out before I give my third presentation)

16:00-16:45 Brendan Tierney Running R in your Oracle Database using Oracle R Enterprise

(This presentation generally runs for 50 minutes)

16:50-17:35 Maria Colgan BI, Dev & Tech Closing Keynote: Oracle Database In-Memory-The next big thing
17:35-18:35 Event Social i.e. free drink 🙂

As you can see it is going to be a busy, busy day.

I would love to attend lots of others, but being able to be in multiple places at the same time is not one of them.

NOTE:The User Group has a rule that a presenter can have a max of 2 presentations. Unfortunately we had to break this rule a week out from the conference, due to some cancellations. And that is why I’ve ended with 2.5 presentations.

Book give away at OUG Ireland

Posted on

The annual Oracle User Group in Ireland conference is on the 19th March in Croke Park.

I’ll be giving 2 presentations, with one each on the Development and Business Analytics tracks. Here are the details of these presentations.

table.myTable { border-collapse:collapse; } table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Time Room Presentation Title / Topic
14:00-14:45 InterConnect 681 Predictive Queries in Oracle 12c
16:00-16:45 Davin Suite Running R in the Database using Oracle R Enterprise

I will be giving away a copy of my book to one luck person 🙂

How will this book give away work?

During both of my presentations I will pass around a “hat” for you to put your name or business card into. Then at end of my last presentation we will draw one name out of the hat.

But you have to be in the room to collect the book. If you are not there then I will draw out another name (and so on) until the winner is in the room.

So by attending both of my presentations you are doubling your chances of winning my book.

(Maybe this is an attempt by me to have a good attendance at my last presentation)

Book Cover

Plus I might have a few other Oracle Press goodies to give away too.

Oracle ACEs at OUG Ireland 2015

Posted on

The annual Oracle User Group in Ireland Conference will be on Thursday 19th March. This year the conference will be held in the Croke Park conference centre. This conference centre is only a short taxi ride from Dublin Airport and Dublin City Centre.

If you are planning a hotel stay for the conference I would recommend staying in a hotel in the city centre and get a taxi to/from the conference venue.

We have a large number of Oracle ACEs presenting at the conference. The following table lists the ACEs, their twitter handle and their website.

table.myTable { border-collapse:collapse; } table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Oracle ACE Type of ACE Twitter Name Blog / Web Site
Brendan Tierney ACE Director @brendantierney http://www.oralytics.com
Debra Lilley ACE Director @debralilley http://www.debrasoracle.blogspot.ie/
Jonathan Lewis ACE Director @JLOracle http://jonathanlewis.wordpress.com/
Tim Hall ACE Director @oraclebase http://oracle-base.com/
Alex Nuijten ACE Director @alexnuijten http://nuijten.blogspot.com/
Dhananjay Papde ACE Associate
Stewart Bryson ACE Director @stewartbryson http://www.redpillanalytics.com/
Antony Heljula ACE @aheljula http://www.peakindicators.com/
Gurcan Orhan ACE Director @gurcan_orhan https://gurcanorhan.wordpress.com/
Heli Helskyaho ACE Director @HeliFromFinland https://helifromfinland.wordpress.com/
Marco Gralike ACE Director @mgralike http://www.xmldb.nl
Roel Hartman ACE Director @roelh http://roelhartman.blogspot.com/
Martin Widlake ACE @mdwidlake https://mwidlake.wordpress.com/
Liron Amitzi ACE @amitzil http://www.dbaces.com/
David Kurtz ACE Director @davidmkurtz http://www.go-faster.co.uk/
Marcin Przepiorowski ACE @pioro http://oracleprof.blogspot.com/

Make sure you check out the full agenda for the conference by clicking on the following image. Plus there is a full day session on Friday 20th March with Maria Colgan on the Oracle In-Memory option.

Ougire15 hp cfp v2

OUG Ireland 2015 : Now open for Submissions

Posted on

OUG Ireland Call for submissions is now open.

The closing date for submissions is 5th January, 2015.

and the submission webpage can be found here.

Ougire15 hp cfp v2

The OUG Ireland conference will be on Thursday 19th March. Yes it is only a one day conference 😦 but we will be 5 or 6 or more streams. So there will be something for everyone and plenty of choice.

On Friday 20th March we will have Maria Colgan, formally the Optimizer Lady and now the In-Memory Queen (or something like that), giving a full day workshop on the In-Database option and the Optimizer. She will also be about for the main conference on the 19th, so you can expect a presentation or two from her on the Thursday.

Agenda selection day is the 8th January, 2015. So hopefully you will be getting the acceptance emails soon after that or during week of 12th January.

There is a committee of about 10 people who are involved in selecting presentations and setting the agenda. If it was up to me then I would accept everything/everyone. So if your presentation is not accepted this time, please don’t blame me 🙂 I said YES to your presentation, I really, really did. I fought so hard to have your presentation included. If your presentation is not accepted then the blame is down to the other committee members 🙂

The conference will be held in Croke Park, and is a 15-20 minute taxi ride from the Airport.

You can follow the Conference and other OUG Ireland events using the twitter tag #oug_ire

OUG Ireland 2013 Agenda is now live

Posted on

The agenda for the OUG Ireland 2013 event is now live. The event will be in the Dublin Convention Centre on the 12th March. There are lots of excellent sessions, across 7 tracks!!  So there will be something (or lots of things) for everyone who works in the Oracle world here in Ireland.

I’m sure the Oracle Database track will be very popular. I wonder why!!!

Agenda : http://www.ukoug.org/2013-events/oug-ireland-2013/agenda/

Remember registration is FREE. You don’t have to be a member of the User Group to come to this event. It is open to everyone and did I mention that it is FREE.  Registration is now open.

I’ll be there. Well I suppose I have to as I’ll be presenting Smile

I hope to see you there.

agenda launch oug ireland

Tom Kyte in Dublin 21st January 2013

Posted on

Tom Kyte will be back in Dublin on the 21st January, 2013.  He will be giving a number of presentations covering some of his popular Oracle Open World sessions and will also include a AskTom session

It will be a full day, kicking off at 9am and finishing around 3:30pm.

There is no better way to kick off the new year with a full day of FREE Oracle training and up skilling with Tom Kyte.

To register for the event send an email to marketing-ie_ie@oracle.com

As they say places are limited, so book early,  I have Smile so I’ll see you there.

OUG Ireland 2013–Call for Presentations

Posted on

The call for presentations at the OUG Ireland Conference is now open. The conference will be on Tuesday 12th March in Dublin city centre.

It is hoped to have at a number of concurrent tracks covering all the main topic areas, including application development, database administration, business intelligence, applications, etc.

If you are interested in submitting a presentation then you need to fill in some of the detail at

OUG Ireland – Submit a Paper

Follow the OUG Ireland conversation on twitter using the tag  #oug_ire

call for papers

Update on : Adding numbers between

Posted on

Over the past few days I’ve had a number of emails and comments based on my previous post.  My previous post was called ‘Adding numbers between two values’. I included some PL/SQL code that can be used to add up the numbers between two values. I mentioned that this was a question that my pre-teen son (a few year pre-teen) had asked me.

There are two main solutions to the same problem. One involves just using a SELECT and the other involves using recursion. I will come back the these alternative solutions below.

But let me start off with a bit more detail and background to why I approached the problem the way that I did. The main reason is that my son is a pre-teen. Over the past couple of years he as expressed an interest in what his daddy does. We even have matching ORACLENERD t-shirts Smile

When I was working through the problem with my son I wanted to show him how to take a problem and by breaking it down into its different parts we can work out an overall solution. We can then take each of these parts and translate them into code. In this case some PL/SQL, yes it is a bit nerdy and we do have the t-shirt. The code that I gave illustrates many different parts of the language and hopefully he will use some of these features as we continue on our learning experience.

It is good sometimes to break a problem down into smaller parts. That way we can understand it better, what works and what does not work, if something does not work then we will know what bit and also leads to easier maintenance. At a later point as you develop an in-depth knowledge of certain features of a language you can then rewrite what you have to be more efficient.

All part of the learning experience.

Ok lets take a look at the other ways to answer this problem. The first approach is to just use a single SELECT statement.

SELECT sum(rownum + &&Start_Number – 1)
FROM    dual
CONNECT by level <= &End_Number – &&Start_Number + 1;

An even simpler way is

SELECT sum(level)
FROM    dual
CONNECT BY level between &Start_Number and &End_Number;

These queries create a hierarchical query that produce all the numbers between the Start_Number parameter and the End_Number parameter. The SUM is needed to all all the numbers/rows produced.  This is nice and simple (but not that easy for by son at this point).

Thank you to everyone who contacted me about this. I really appreciated your feedback and please keep your comments coming for all my posts.