Monthly Archives: December 2013

Online News Association launches Kenya’s digital future

Failed healthcare promises, the human cost of restrictions on abortion and the need for access to contraceptives to prevent unsafe abortions were some of the big stories in the Nation, the Standard and the Star in November.
The journalists who told these stories, and who together delivered the biggest week of data journalism Kenya has seen, will share their experiences of mining data and of conceiving and presenting the stories in accessible, visual formats. They will also share their hopes for an open data movement in Kenya that informs policy decisions and improves Kenyans' access to healthcare.
The five storytellers, all data journalism fellows at Internews, will display their digital media work at the inaugural event of the Online News Association Nairobi Group. The Online News Association (ONA) helps shape the future of journalism by organizing networking events, training opportunities and public discussions for the local journalism community. ONA members are thought leaders blazing a trail in digital journalism.
ONA Nairobi, initiated by Internews in Kenya, will be a space for resource sharing, collaboration and experimentation. It will showcase the talents of local media producers and innovators and support the needs of the changing journalism environment.
Internews in Kenya and ONA invite you to the inauguration of the ONA Nairobi Group, where the Internews in Kenya Data Fellowship stories will be presented.
The event will take place on Wednesday November 11 at 3 pm at the MRC.
Please join us for a lively discussion on the role of data and data journalism in improving Kenyans' access to healthcare. We welcome journalists, developers and other open data activists committed to data and digital journalism.

Simple data scraping using online tools

Scraping is a set of techniques for extracting information from sources such as web pages, PDFs or scanned images into a file type that can be analyzed further, for example a table format such as comma-separated values (.csv) or Microsoft Excel (.xls).
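To make the idea concrete, here is a minimal Python sketch of the final step of scraping: turning rows of plain text, as they might look after being copied out of a PDF, into CSV. It assumes the text has already been extracted and that every cell is a single word separated by spaces (a county name containing a space would break it). The function name and the county names and figures are hypothetical, made up for illustration.

```python
import csv
import io

# Hypothetical text as it might look after copying a table out of a PDF.
# County names and figures are invented for illustration only.
PDF_TEXT = """County 2011 2012 2013
CountyA 100 120 130
CountyB 90 95 110"""


def text_table_to_csv(text: str) -> str:
    """Split each whitespace-separated line into cells and write them as CSV rows.

    Assumes every cell is a single word; cells containing spaces would be split.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.strip().splitlines():
        writer.writerow(line.split())
    return out.getvalue()


if __name__ == "__main__":
    print(text_table_to_csv(PDF_TEXT))
```

The result is a comma-separated table that Excel or any spreadsheet program can open directly.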

There are online tools and websites that let users extract data from files by converting them. Some web-based tools for simple PDF scraping include:

The common steps for converting a PDF file on any of the three platforms are:

a) Upload your PDF file.

b) Enter a valid email address. You must be able to access this address, because the converted file will be sent there.

c) Click on the convert button. The web service will briefly process the file and, on completion, display a success dialogue box.

d) Open your email inbox to access the converted document.

As a practical example, we will convert this document, which is uploaded on Google Drive. You must have a Google account to access the file. The dataset contains projected county health development budget estimates from 2011 to 2014. We want to calculate the total county health development budget from 2010 through to 2014. We will convert the document into an Excel file so that we can use Excel's SUM function to get the total.
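For readers who prefer code to spreadsheets, the same overall total can be sketched in Python once the converted table is saved as CSV (for example through Excel's Save As dialog). This is a minimal sketch, not part of the tutorial's workflow: the function name, column names, county names and figures below are hypothetical placeholders, not the values in the Google Drive document.

```python
import csv
import io

# Hypothetical CSV export: one row per county, one column per year.
# Column names, county names and figures are invented for illustration.
SAMPLE_CSV = """County,2010,2011,2012,2013,2014
CountyA,100,110,120,130,140
CountyB,90,95,100,110,105
"""

YEARS = ["2010", "2011", "2012", "2013", "2014"]


def grand_total(csv_text: str, year_columns: list[str]) -> float:
    """Add every budget figure in the given year columns across all rows,
    like a single Excel SUM over the whole block of numbers."""
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in year_columns:
            # Strip thousands separators, if any, before converting to a number.
            total += float(row[col].replace(",", ""))
    return total


if __name__ == "__main__":
    print(grand_total(SAMPLE_CSV, YEARS))
```

With the sample data, grand_total adds all ten figures and returns 1100.0.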

1) Download the file from Google Drive.

2) From your browser navigate to http://www.pdftoexcelonline.com/

3) Click on the “Select a File” button as shown below. Browse to your saved file, select it and click Open. Then enter a valid email address to which the converted document will be sent. You can opt to use a disposable email service like http://www.mailinator.com, which lets you receive emails without signing up. Just type any name you like into the box on the Mailinator home page and click on “Check it.”

Mailinator

Enter the same address (the name you chose followed by @mailinator.com) on pdftoexcelonline.com.

First step

4) Click on the “Convert it!” button.

Convert

5) The browser will briefly show a dialogue box telling you that it is processing your document. When processing completes, you will see a screen like the one below:

Complete

6) If you check your Mailinator inbox, you should now have one email from pdftoexcel containing the converted document.

7) Download your document by clicking on the link provided.

Get file

Save the file on your computer. The downloaded file is in Excel format (.xls), so you can open it in Microsoft Excel and perform calculations on the dataset.

8) We can test our file by performing a simple sum calculation on the county budgets from 2010 to 2014. Open the file in Excel, then type Total (or Sum) as the header of column G. We will perform our calculation in cell G2: click on cell G2, then enter a SUM function like the one in the diagram below.

Summation

9) Press Enter. Cell G2 now shows the total county health development budget from 2010 to 2014 for that row's county. You can now use the fill handle to fill the formula down column G and get the totals for the remaining counties.

Fill Handle
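The SUM in G2 followed by a fill-down produces one total per county row. Here is a minimal Python sketch of that per-row calculation, again assuming a CSV export with hypothetical column names (County in column A, then one column per year from 2010 to 2014, so the new Total sits where column G would be) and made-up figures. The function name add_total_column is illustrative only.

```python
import csv
import io

# Hypothetical CSV export; column names and figures are invented for illustration.
SAMPLE_CSV = """County,2010,2011,2012,2013,2014
CountyA,100,110,120,130,140
CountyB,90,95,100,110,105
"""

YEARS = ["2010", "2011", "2012", "2013", "2014"]


def add_total_column(csv_text: str, year_columns: list[str]) -> list[dict]:
    """Return each row with an extra 'Total' entry holding the sum of its year
    columns, mirroring a SUM in G2 that is then filled down column G."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip thousands separators, if any, before adding the figures.
        row["Total"] = sum(float(row[c].replace(",", "")) for c in year_columns)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in add_total_column(SAMPLE_CSV, YEARS):
        print(row["County"], row["Total"])
```

With the sample data this prints a total of 600.0 for CountyA and 500.0 for CountyB. Writing the rows back out with csv.DictWriter would give a file with the extra Total column, just like the spreadsheet.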

By converting the data from a PDF format to an Excel format, we were able to add a computational column called Total/Sum. We would not have been able to do this in a PDF file. This is an example of data scraping.