Published in The Startup · Jan 12, 2021
Feature Engineering for Large Datasets
A Few Tips
It is estimated that in 2025, 175 trillion gigabytes of data will be created (that is for the year 2025 only! source). On top of this, a growing number of companies and institutions are making an effort to increase data’s availability to the public; they develop APIs or create…
Feature Engineering · 5 min read

Jan 6, 2021
Follow Your Gut!
But not this time…
I only recently learnt how to code in Python. Or should I say to code at all? I still consider myself a Python newbie, and I even doubt I will ever fully master it when I see how many things I learn every time I code. …
Python · 5 min read

Dec 30, 2020
Data Analysis with Pandas
From A to Z
0 - The Data
    import numpy as np
    import pandas as pd
    import seaborn as sns
    df = sns.load_dataset('tips')
    df
Pandas · 8 min read

Dec 23, 2020
Distance as Metrics
No Pun Intended
In Machine Learning, being able to calculate distances is essential in a lot of cases. The most obvious use is with spatial data, but it is also one of the easiest ways to assess membership/similarity in supervised and unsupervised techniques. …
Python · 6 min read

Dec 16, 2020
Hypothesis Testing Applied to A/B Testing
There are a lot of decisions to make during an A/B test; most of them are made during the conception stage. In this article, I am going to focus on the steps following the data collection stage. You will not hear about sample size or power in this article (or…
6 min read

Dec 9, 2020
Getting More Out of Your Jupyter Notebooks
iPython flavoured
Jupyter notebooks are incredibly powerful for developing ideas quickly, then sharing them if need be. The notebooks run in a web browser, and support many languages out of the box (Jupyter actually stands for Julia, Python and R, when it does not refer to Galileo’s notebook). You can dynamically code…
Jupyter · 5 min read

Dec 2, 2020
Speed Up Your Git Workflow
Git is a lot of things. But one of the things I like the most about it is that it can be so painful to use, it is funny (as long as I am not the victim). From the classic XKCD comic to the Git Koans, I find those very…
Git · 4 min read

Nov 24, 2020
Geopandas: Accessible, Yet Powerful GIS With Python
You are here!
Recently, I have been doing a lot more work on data that have a spatial meaning. Lucky for me, I have always been interested in GIS and, in my professional life, I have used a lot of GIS packages such as ArcGIS, MapInfo, and QGIS. …
Python · 6 min read

Nov 18, 2020
Sampling
Focussing your efforts to understand the bigger picture
Imagine you want to know which candidate will win the next election. Ideally, you conduct a census, and you ask every single person in the country up to two questions: Will you vote in the next election? And if the answer is yes: Who are you voting for? You expect some people…
6 min read

Nov 11, 2020
Recursive Functions in Python
You might not realise this, but recursions are very common. A lot of well-known acronyms are recursive, for example: Do you know what VISA (as in the VISA card in your wallet) stands for? It is the acronym for “Visa International Service Association”. Maybe you are familiar with YAML files…
3 min read