Winning Our First Data Competition: Our 4 Takeaways

Machines learn, and so do Mutters

Posted by Javier Mermet

on October 5, 2022 · 5 mins read

Some Context On The Competition

Every year, the Faculty of Exact and Natural Sciences of the University of Buenos Aires organizes the Escuela de Ciencias Informáticas, a.k.a. the "ECI". In the words of its own page:

The goal of ECI is to offer Computer Science students and practitioners top-level intensive courses on topics not covered by the regular curricula. These courses are taught by lecturers from universities and institutions from all around the world which allows the schools’ participants to establish links for academic cooperation, as well as promote research and development activities.

The ECI's activities usually include one or two competitions hosted on MetaData, each sponsored by a partner. This year, AlixPartners sponsored the Time Series at Hopp competition. The goal was to predict principal payments for several loans based on historical information.

Why This Data Competition?

First, Mutt Data was a Gold Partner this year, and what better way to take part in the event than by working on what we know best: data.

Second, whether it's HOPP or any other data competition or event, we actively seek opportunities to get to know each other, which can be a challenge for a remote-first, async company.

Last but not least, HOPP was the perfect fit for our team. The competition had a low entry barrier, a well-defined problem statement, and a great local community around it. Additionally, its timeline of a little over a month kept it short and to the point.

Our Journey

In 2020, I created a Slack channel called #kaggle-metadata to organize a team and keep track of ML challenges on Kaggle. With few members, a handful of links as the channel's main content, and no follow-up meetings, it didn't take long to fade out. It wasn't until 2022 that we revamped the space and kicked it off properly, this time with processes and a proper plan in motion.

MLContests was a great help in curating ongoing competitions. Besides Kaggle competitions, it listed several others we wouldn't have heard of otherwise. Those of us with more experience in ML competitions helped vet them based on their goals, data, complexity, and timelines.

Once we had picked our first competition, the operational details were straightforward: form teams, and create repos and private Slack channels so that each team could self-organize without breaking competition rules. Every two weeks, we met to discuss progress, blockers, issues, learnings, and next steps, and to onboard newly interested Mutters.

Key Takeaways

Our experience preparing as a team and the competition itself left us with the following takeaways.

1) Make Learning Fun!

In previous posts, we mentioned our onboarding process and Data Office Hours as two learning moments at Mutt. What ties them together? What drives learning at Mutt? We thought we might answer those questions with two quotes from Isaac Asimov:

Quote 1) That’s another trouble with education as we now have it. People think of education as something that they can finish. And what’s more, when they finish, it’s a rite of passage.

At Mutt, we believe learning is a continuous process. No matter your degree or years of experience in the industry, there is always something to learn.

Quote 2) The trouble with learning is that most people don’t enjoy it because of the circumstances. Make it possible for them to enjoy learning, and they’ll keep it up.

At Mutt, we look for experiences like HOPP that can make learning fun. People learn the most when they are genuinely interested and invested in what they are doing, and this showed in the energy present throughout the competition. Sometimes our daily tasks are not as engaging as we'd like them to be. Data competitions teach us that, when that is the case, we can always go out and look for different learning spaces.

2) Pick Your Battles

What is the purpose of our competitions group? To create a safe space with no client deadlines and no maintainability or future-proofing concerns: a place where the team can practice whatever they want. In short, a playground for learning.

Leading the initiative means carefully planning and selecting competitions that are useful and interesting to everyone.

Before HOPP, we started with the Japan Stock Exchange data competition, only to realize that we couldn't see how well we were doing or compare our results against other competitors. Being able to compare our work, learn, and improve is a must for us.

3) Knowledge Transfer Is The Prize

The prizes were great recognition, but they were not our goal. After the competition, we asked each participating Mutter to prepare a 15-minute run-through of their submission. Little did we know that those estimated 75 minutes would end up being more than 3 hours.

We asked ourselves some key questions: What were our objectives? What had we originally wanted to do? What had been possible to implement given the constraints? And, of course, what had we learned?

This was the most enriching part of the experience. I’ve participated in other data competitions, but I was always limited to my own way of solving problems and my own experience. This time, we were able to share our experiences and, as a result, broaden our minds.

Some of the main learnings that came from our knowledge transfer were:

Feature engineering is second to none when improving your model.
Having a quick setup helps… a lot.
The time to first submission (TTFS) is crucial. A low TTFS improves morale, gives you a baseline, and motivates you to improve your score.
A big part of the score is communication-oriented: being able to explain what you’re doing and why. It’s crucial to tell a story that guides the reader through your ideas, assuming as little prior knowledge as possible. (P.S. This storytelling is what landed me first place.)
Having people around you participating in the same competition is encouraging! It allows for internal reviews, a great back and forth of questions, and a coming together of people with different backgrounds and expertise. Healthy competitiveness drives improvement.

4) Ownership

Creating and maintaining learning spaces of this nature requires more than good intentions. They need constant nurturing and a push to thrive. Someone needs to step up, take ownership, and communicate action points and next steps.

Results

The first three places went to Mutt Data! But we didn't stop there: our team also took 7th and 11th place.

We were eager to see whether the positions would hold after the final jury evaluation, and to understand whether other approaches had been better, and why. The results held up, and the feedback during the event was great.

Wrapping Up!

This data competition has left us with many lessons that we look forward to applying. Our team is already taking on a computer vision competition, and we plan to keep expanding this space to learn, get to know each other, and have fun.

As for the tech details, stay tuned for more posts in this series down the line.

We're Hiring!

We hope you’ve found this post useful, and at least mildly entertaining. If you like what you’ve read so far, have some mad dev skills, and like applying machine learning to solve tough business challenges, visit our Lever page for current job openings!