Towards becoming a Data Scientist

The term data scientist seems to mean many different things to different people. I thought it might suit me before looking into what it actually means. After all, I am interested in statistics, behaviour, machine learning and UX evaluation. Looking at job advertisements for data scientists and data analysts showed just how wide the range of job descriptions and requirements can be. It seemed to include everything, from jobs where the main task is data entry to jobs where the main task is programming.

I found the infographic on datacamp helpful and decided to take it as my main guideline for which skills to acquire in which area.

 

However, more detailed research was needed. To find out what is actually involved, I had to look at the sharp end, at real job advertisements:

On 27/03/2016, a search for “data scientist” brought up 23 results on irishjobs.ie and 275 on jobs.ie.

These sites were chosen because both are very popular in Ireland’s job market. They are ranked 97 and 103 respectively in Ireland by http://www.alexa.com. I therefore decided to focus on irishjobs.ie, because its ranking was higher.


Of the 23 hits, most were posted by recruitment agencies. These were not considered further, because it is impossible to tell whether several agencies are advertising for the same company, which would lead to duplication. This left 8 adverts, of which one more had to be discarded, as the same advertisement had been published twice on different dates.

Of the remaining 7 ads that were live on that day, 6 required a relevant degree in either math (including statistics), computer science or engineering. Half mentioned PhD level, although that was not an essential requirement. Apart from specific requirements for the job profile (e.g. experience in customer-facing roles), all of the remaining ads wanted relevant experience, ranging from a minimum of 3 years to a minimum of 6 years.

The knowledge required was:

Agile software (2x)

Other Programming (3x)

SAS (3x)

R (3x)

Hadoop (3x)

SQL (3x)

Python (2x)

Machine Learning (3x)

 

Furthermore, one company wanted web skills such as HTML, CSS, JavaScript and PHP, and another was interested in SPSS, Scala, the L language, SQL, OLAP and MDX.

These findings seemed an interesting starting point for a comparison with the infographic from datacamp.com.
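For anyone who wants to play with these numbers, the tally above can be sorted and expressed as a share of the 7 adverts in a few lines of Python. This is only a rough sketch: the counts are copied by hand from the list, and the names skill_mentions, TOTAL_ADVERTS and share_of_adverts are made up for illustration.

```python
from collections import Counter

# Counts copied by hand from the list above (sample of 7 adverts).
skill_mentions = Counter({
    "Agile software": 2,
    "Other programming": 3,
    "SAS": 3,
    "R": 3,
    "Hadoop": 3,
    "SQL": 3,
    "Python": 2,
    "Machine Learning": 3,
})

TOTAL_ADVERTS = 7  # adverts left after removing agencies and the duplicate


def share_of_adverts(count, total=TOTAL_ADVERTS):
    """Return the percentage of adverts mentioning a skill, rounded to a whole number."""
    return round(100 * count / total)


# Print the skills from most to least mentioned.
for skill, count in skill_mentions.most_common():
    print(f"{skill}: {count} of {TOTAL_ADVERTS} adverts ({share_of_adverts(count)}%)")
```

With a sample this small the percentages should not be taken too seriously, but they make it easier to compare the ads with the infographic.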

I was astonished to find only one company requiring basic web skills. However, with the rise of coding as a subject in schools in the UK, US, India etc., it might be that they are not explicitly required because by now it is simply assumed that these skills are a given, a bit like not explicitly asking for basic literacy.

The heavy focus on mathematics and statistics, however, did make me hopeful. For one, it indicates an interesting field of work. Secondly, as long as mathematics still has the image of being a difficult thing that only a chosen few can acquire, the number of people who choose to delve into it further will stay limited, enabling mediocre people to find work, a prospect that makes my mediocre heart beat faster. It also helps to know that open courses in mathematics can be fairly cheap.

It is also interesting that the chart seems to imply that a firm footing in traditional research will continue to be in demand. This is supported by the ads used here to give an insight into practice, which often mentioned Masters or PhD degrees. According to the infographic, nearly 10% of working data scientists have a PhD (compared with about 3% of the general population in the US).

The chart suggests ‘hacking skills’ as a major factor. I assume this is deliberately described in loose terms. The ads occasionally asked for experience or knowledge of specific languages or programming skills, but it seems that in general, once people get to grips with ANY programming language thoroughly, the acquisition of new or related languages is just a question of a bit of additional work.

 

Interestingly, in this sample of ads (which is an unscientific sample of convenience) the demand for knowledge of database construction seems to be secondary, although there is a strong demand for SQL knowledge, and one ad asked for Cassandra.

Taken together, these two sources, the sample of job ads on a specific date and the chart made by datacamp.com, give a very interesting picture of what is involved in being a data scientist and of how to get there. The focus on math is interesting. As technology and tools seem to move so quickly, mathematics appears to be the one tool or area of knowledge that does not change, and is therefore worth investing time and effort in.

The other area that seems worth focusing on is long-standing programming languages such as Python and Java. They should give you a solid foundation.

SQL seems a useful tool to know, for work as a data scientist as well as for many other areas of work.

Interesting but not surprising is the complete absence of digital marketing or management knowledge from both sources, the infographic and the ads. The course in ‘Big Data’ seems to give these areas the same weight as math and databases. The way these areas are assessed makes it easy to get distracted from the areas that are worth focusing on by modules whose part in a big data course can only be guessed.

So, to continue on the path to becoming a data scientist, Python will be the next area for me to look at. I also finally want to enrol in the BSc in Mathematics degree that I have been looking at for years.

 
