Thursday, April 19, 2018

A year of working as a data scientist

I want to give a brief overview of my first year working as a data scientist. I wrote several posts about my progress, and now it is time to look back and see what was accomplished (though all of this is only a beginning). Maybe my story will help motivate people who plan to work as data scientists.


  • Started working at a bank on April 17 last year;
  • Completed machine learning specialization by Yandex and MIPT on Coursera;
  • Completed most of cs231n course;
  • Built my first pet project: online recognition of handwritten digits. This project was well received and people still use it - the app was used at least 2000 times in the current month;
  • Realized that working at the bank was neither interesting nor rewarding, and that SAS isn't fun at all. I changed my job and started working at a startup;
  • Completed the second session of the ml_open course (here is a link for the English version) and finished in 5th place;
  • During 2 months of working at the startup I did only one task, and there were no definite plans for the future. Also, several crazy things happened and I decided that I deserved something more. So I changed my job again;
  • Completed two courses (Kaggle and NLP) in the Advanced Machine Learning specialization on Coursera;
  • As part of the final task of the NLP course I made a Telegram chatbot, @amlnlpbot;
  • Started taking part in official and unofficial meetings of data scientists;
  • Continued working on my portfolio: https://erlemar.github.io/ ;
  • Realized that there was little room for professional development for me in the current company and decided to change my job once more;
  • Tried to take part in machine learning competitions, but without success. Currently I am taking part in a competition on Kaggle where prizes are given for the kernels with the most votes, here is the link;


Thoughts:

  • A year ago I couldn't imagine how amazing, motivating and useful the ods.ai community would be :) It is possible that without it I would still be stagnating in a bank;
  • It is very difficult to find a balance between practice and theory. I know that my theoretical knowledge isn't enough, especially regarding statistics and maths, but I have yet to run into problems with these areas at work, so currently I practice more;
  • Sometimes I feel that most companies (excluding big and/or advanced ones) hire DS just to have them, while having few relevant tasks or not understanding what data scientists can do;
  • Realized it is necessary to take part in competitions, even if it is only for experience;
  • I suppose it is worth investing more time into learning DL. Of course there are many interesting tasks without it, but most of them are related to marketing;
  • It is important to improve programming skills;


Plans:

  • Pay a lot of attention to DL. Complete fast.ai course or some parts of it, then try to implement popular papers;
  • Take part in competitions on Kaggle. Earn at least a silver medal;
  • Maybe try learning R to understand why so many people praise it for data processing and visualizations as well as Shiny;
  • Improve programming skills. Maybe learn Java/Scala for writing production solutions;
  • Create 1-2 more pet-projects;


Sunday, March 25, 2018

Some things change, others stay the same.

More than 1.5 years ago I decided to change my career to Data Science and Machine Learning. Since that fateful decision I have been spending most of my free time on gaining new knowledge and skills. This is quite fun, though some other things have had to be sacrificed as a result.

There are so many things to learn: fundamentals (like statistics and math), programming (improving Python skills and possibly studying a new language), analytics and ML itself.

Telegram chat-bot
Some time ago I took a course on NLP on Coursera and mostly liked it. On the one hand, it gave me a lot of useful practical and theoretical information; on the other hand, sometimes there was too much theory, and writing code in TensorFlow and debugging it is tough (thanks to this I liked Keras even more :) ).
The final task was to build a Telegram chatbot. Its main functionality was detecting the intent of a question and giving the appropriate answer. There were two main intents. If the question was related to programming, the bot should send a relevant answer from Stack Overflow. If the user simply wanted to talk, the bot should be able to chat. For the default implementation, a pre-trained chatbot model was offered.
And then there was an additional (so-called honor) assignment - the challenge was to build and train a conversational model ourselves.
I have successfully done it, here is the link to the bot and the link to the GitHub repository.
I won't bother you with technical details (if you are interested, you can read more on GitHub), I'll just describe the bot's functionality:

  • If you ask it a question related to programming (in one of the 10 most popular languages), it will look for an answer on Stack Overflow;
  • If you send a sentence with the word "weather" and a city name, it will give a weather forecast for the next 5 days;
  • The 'tweet/twitter account_name' command shows the latest tweet by the user;
  • 'today!' shows the current date and a random fact about it;
  • And, of course, the bot can talk to you! Though it is kind of dumb :)
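
To make the routing idea more concrete, here is a minimal sketch of how a message could be mapped to one of the intents listed above. This is not the code of the actual bot (see the GitHub repository for that): the function name detect_intent, the intent labels and the keyword list PROGRAMMING_HINTS are all hypothetical, and simple keyword rules stand in for a trained intent classifier.

```python
import re

# Hypothetical keyword list; a real bot would rely on a trained
# intent classifier instead of a hand-written set of words.
PROGRAMMING_HINTS = {
    "python", "java", "javascript", "c++", "c#", "php",
    "ruby", "swift", "exception", "compile",
}


def detect_intent(message: str) -> str:
    """Return a coarse intent label for an incoming chat message."""
    text = message.strip().lower()

    # Exact command: current date plus a random fact about it.
    if text == "today!":
        return "today"

    # "tweet <account>" or "twitter <account>": latest tweet lookup.
    if re.match(r"^(tweet|twitter)\s+\S+", text):
        return "tweet"

    # Split into rough word tokens, keeping "+" and "#" for c++ / c#.
    tokens = set(re.findall(r"[a-z+#]+", text))

    # Any sentence mentioning "weather" goes to the forecast handler.
    if "weather" in tokens:
        return "weather"

    # Programming-looking questions go to the Stack Overflow search.
    if tokens & PROGRAMMING_HINTS:
        return "programming"

    # Everything else is handled by the conversational model.
    return "chitchat"


if __name__ == "__main__":
    for example in ("today!", "What is the weather in London?", "Hello!"):
        print(example, "->", detect_intent(example))
```

Each label would then be passed to its own handler (Stack Overflow search, weather forecast, and so on); those handlers are left out here because they depend on external services.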


Working on this project was really fun and I learned a lot of interesting things - for example, how to host a running bot on Amazon EC2.

DonorsChoose competition on Kaggle
Founded in 2000 by a high school teacher in the Bronx, DonorsChoose.org empowers public school teachers from across the country to request much-needed materials and experiences for their students. At any given time, there are thousands of classroom requests that can be brought to life with a gift of any amount. DonorsChoose.org receives hundreds of thousands of project proposals each year for classroom projects in need of funding. Right now, a large number of volunteers is needed to manually screen each submission before it's approved to be posted on the DonorsChoose.org website.
So the fund has created a challenge on Kaggle to help build a model that automatically approves applications.
Prizes are awarded for the most upvoted kernels, so building a high-accuracy model isn't necessary.
I have decided to take part and here is my exploratory data analysis.

Other things
Sadly, I currently have little time for studying foreign languages, so I try my best to maintain what I know: my phone system language is Spanish, I try to read Spanish and German news, still review flashcards in Anki and sometimes read manga in Japanese.

Last year I started reading indie books and some of them are true gems! Recently I discovered the "Chronicles of the Black Gate" series. This wonderful series takes place in a world where humans believe that your place in society is determined by your birth. Depending on the place of your birth you could be a warrior, a slave or even a heretic. Thanks to reincarnation you can move to a better or worse place. There are multiple points of view, ranging from teenagers to adults. Character development is quite believable. A lot of characters face a crisis of belief, which strengthens the beliefs of some characters and changes or bends the beliefs of others. Success comes at a price - and sometimes it is too high! And the speeches the characters give are really impressive and strike to the core. I really liked the series.

Tuesday, January 2, 2018

My results of 2017

My main accomplishment in 2017 was successfully changing my career to Data Science. Doing it wasn't easy, but it was definitely worth it. Of course, there are many things which I have yet to learn, but this is a start and I already have a mid-level position.

I also read a lot of books this year, most of them fantasy. The most prominent were the Cosmere books by Brandon Sanderson. I have also come to appreciate indie authors and fanfiction writers more - you can find real gems there.