You just broke into the world of data. Now what?

Anthony Patrick Saoud
Oct 11, 2021

You have probably heard a lot about the fascinating world of data through buzzwords like data science, artificial intelligence, business intelligence, data models, pipelines, data engineering, analytics, and more. You finally decided it’s time to bite the bullet and start your career as a data analyst. So what did you really walk into, and what should you expect?

Photo credits: unsplash.com/@carlheyerdahl

It is day one on the job, and you are hoping that the courses you took before interviewing for this entry-level job taught you everything you need to succeed. The reality, however, looks very different from the courses. Online courses teach only the basics: you likely learned to write SQL queries against fictional tables, using a one-size-fits-all approach. There is a common misconception that knowing SQL or a programming language is all you need to become a good data analyst. Your success as a data analyst depends on much more than how well you write queries and code!

SQL is your hammer, and not all nails are equal.

As a data analyst, it is your job to become the expert on everything data. Here is what you need to do to succeed.

Gain a deep understanding of how tables relate to each other (primary vs. foreign keys).

You can be the greatest query builder, but if you do not understand the data model, I regret to inform you that you will not survive. It is crucial that you understand how tables are connected to each other. Learn the concept of primary vs. foreign keys, and identify them in every table you work with.

For example, suppose you need to connect the sales table to the customer table. In the customer table, customer.id is the primary key; it connects to the sales table through sales.customer_id, which is a foreign key.

A primary key is a field whose value is unique across all rows of a table, whereas a foreign key is a field that references the primary key of another table.
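To make the customer/sales example concrete, here is a minimal sketch in Python using the standard-library sqlite3 module, so it runs without a database server. Only customer.id and sales.customer_id come from the example above; the name and amount columns, the function names, and the sample rows are hypothetical, and SQL syntax differs slightly between databases.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical customer/sales model."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys once this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,  -- primary key: unique across all rows
            name TEXT NOT NULL         -- hypothetical column
        );
        CREATE TABLE sales (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer (id),  -- foreign key
            amount      REAL NOT NULL  -- hypothetical column
        );
        INSERT INTO customer (id, name) VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO sales (customer_id, amount) VALUES (1, 100.0), (1, 50.0), (2, 75.0);
        """
    )
    return conn


def sales_by_customer(conn: sqlite3.Connection) -> list:
    """Total sales per customer, joining sales.customer_id to customer.id."""
    return conn.execute(
        """
        SELECT c.name, SUM(s.amount) AS total
        FROM sales AS s
        JOIN customer AS c ON c.id = s.customer_id
        GROUP BY c.id, c.name
        ORDER BY total DESC
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(sales_by_customer(conn))  # [('Acme', 150.0), ('Globex', 75.0)]
    try:
        # Rejected: no customer with id 99 exists for the foreign key to reference.
        conn.execute("INSERT INTO sales (customer_id, amount) VALUES (99, 1.0)")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

Because the foreign key is declared, the database itself refuses a sale that points at a customer who does not exist, which is one more reason to know where the keys are before you start joining.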

Understand what is in the tables and how they are built.

You have just been given a task to pull data from the customer and sales tables. Simple, right? Just join the sales table to the customer table, and voilà, now you know who the top 10 customers are. You submit the request and close the ticket, and then your manager comes over and says the numbers look very inflated; in fact, it seems you are double counting sales. You go back to review what happened and find duplicate customer IDs in the customer table. Wait a second, that should never happen, right? Did you notice the “ingested_date” field? No, you did not. You have just discovered, the very hard way, Type 2 Slowly Changing Dimensions: instead of overwriting a customer’s record when it changes, the table adds a new row, so it keeps the full history of each customer. That discovery will lead you down the path of learning how to take only the latest record for each customer ID.

The point of the story: understand the architecture of the tables you are working with, and understand how the data is collected.
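Here is a minimal sketch of that trap and one way out, again using Python’s built-in sqlite3 module. It assumes a hypothetical customer table in which each change to a customer adds a new row stamped with ingested_date, so the same customer ID appears more than once; the city column, the query names, and all data are made up. The naive join counts every sale once per customer version, while the second query keeps only the row with the latest ingested_date for each ID before joining.

```python
import sqlite3


def build_scd2_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical Type 2 SCD: one row per version of each customer
        CREATE TABLE customer (
            id            INTEGER NOT NULL,
            name          TEXT    NOT NULL,
            city          TEXT    NOT NULL,
            ingested_date TEXT    NOT NULL,  -- ISO-8601 text sorts chronologically
            PRIMARY KEY (id, ingested_date)
        );
        CREATE TABLE sales (customer_id INTEGER NOT NULL, amount REAL NOT NULL);
        INSERT INTO customer VALUES
            (1, 'Acme',   'Boston',  '2021-01-01'),
            (1, 'Acme',   'Chicago', '2021-06-01'),  -- Acme moved: new version row
            (2, 'Globex', 'Denver',  '2021-01-01');
        INSERT INTO sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
        """
    )
    return conn


# Joins every version of a customer, so Acme's sales are counted twice.
NAIVE = """
SELECT c.name, SUM(s.amount) AS total
FROM sales AS s
JOIN customer AS c ON c.id = s.customer_id
GROUP BY c.id, c.name
ORDER BY total DESC
LIMIT 10
"""

# Keeps only the most recent version of each customer before joining.
LATEST_ONLY = """
WITH latest AS (
    SELECT c.*
    FROM customer AS c
    WHERE c.ingested_date = (
        SELECT MAX(c2.ingested_date) FROM customer AS c2 WHERE c2.id = c.id
    )
)
SELECT l.name, SUM(s.amount) AS total
FROM sales AS s
JOIN latest AS l ON l.id = s.customer_id
GROUP BY l.id, l.name
ORDER BY total DESC
LIMIT 10
"""

if __name__ == "__main__":
    conn = build_scd2_db()
    print(conn.execute(NAIVE).fetchall())        # [('Acme', 300.0), ('Globex', 75.0)]
    print(conn.execute(LATEST_ONLY).fetchall())  # [('Acme', 150.0), ('Globex', 75.0)]
```

Many databases also support window functions, where filtering a subquery on `ROW_NUMBER() OVER (PARTITION BY id ORDER BY ingested_date DESC)` equal to 1 is a common alternative way to pick the latest row; the correlated `MAX()` subquery is used here because it works on any SQLite version.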

Make your work reproducible.

After spending hours researching how to take the most recent record, you finally write that awesome query that generates your top 10 customers. You resubmit the request and call it a day. Exactly 30 days later, the same request comes in. You scramble to get the work done, but you cannot remember how to take the most recent records or how you wrote the query 30 days ago. So you spend a few more hours figuring it all out again.

How can you make this process better? It’s rather simple. Save your query as a view, and simply query that view again whenever the request comes back. Or, my preferred option, create a dashboard that you can hand to the business.
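As a sketch of the view approach, assume hypothetical customer and sales tables where customer rows carry an ingested_date. The latest-record logic can be saved once as a view (the name top_customers and the helper functions are assumptions) and rerun with a one-line SELECT whenever the request comes back. Python’s sqlite3 module is used so the example is self-contained.

```python
import sqlite3

# Hypothetical view: the "latest record" logic is written once and stored in the database.
TOP_CUSTOMERS_VIEW = """
CREATE VIEW top_customers AS
SELECT c.name, SUM(s.amount) AS total
FROM sales AS s
JOIN customer AS c ON c.id = s.customer_id
-- keep only the latest version of each customer
WHERE c.ingested_date = (
    SELECT MAX(c2.ingested_date) FROM customer AS c2 WHERE c2.id = c.id
)
GROUP BY c.id, c.name
ORDER BY total DESC
LIMIT 10;
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            id            INTEGER NOT NULL,
            name          TEXT    NOT NULL,
            ingested_date TEXT    NOT NULL,  -- ISO-8601 text sorts chronologically
            PRIMARY KEY (id, ingested_date)
        );
        CREATE TABLE sales (customer_id INTEGER NOT NULL, amount REAL NOT NULL);
        """
    )
    conn.executescript(TOP_CUSTOMERS_VIEW)
    return conn


def load(conn: sqlite3.Connection, customers: list, sales: list) -> None:
    """Append new customer versions and sales rows (hypothetical loader)."""
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    conn.commit()


def top_customers(conn: sqlite3.Connection) -> list:
    # Rerunning the report is now a one-line SELECT against the saved view.
    return conn.execute("SELECT name, total FROM top_customers").fetchall()


if __name__ == "__main__":
    conn = build_db()
    load(conn, [(1, "Acme", "2021-01-01"), (2, "Globex", "2021-01-01")],
         [(1, 100.0), (2, 75.0)])
    print(top_customers(conn))  # [('Acme', 100.0), ('Globex', 75.0)]

    # 30 days later: new data arrives, and the same SELECT picks it up.
    load(conn, [(1, "Acme Corp", "2021-02-01")], [(2, 100.0)])
    print(top_customers(conn))  # [('Globex', 175.0), ('Acme Corp', 100.0)]
```

A regular view like this stores the query rather than its results, so every SELECT runs against the current data and nothing needs refreshing. Databases that support materialized views (PostgreSQL, for example, with `REFRESH MATERIALIZED VIEW`) store the results instead, which makes them faster to read but means they must be refreshed; SQLite has no materialized views.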

What is the lesson here? Distinguish recurring work from one-time (ad hoc) work, and ask which one a request is at the moment you take it in. This will save you trouble down the road.

Stakeholder management.

In most corporations, if your data work is for internal use, your customers are the business stakeholders. Think of it this way: “My business is data, and my customers are the stakeholders.” You want those customers to be as happy as possible, or your business fails.

You need to have the following:

  1. Open communication with stakeholders.
  2. Detailed request intake, ideally through a ticketing system.
  3. An in-person meeting when you deliver a request, or a detailed write-up of what to expect from the project you just delivered. Which method to use depends on the size of the project.
  4. The ability to push back on stakeholders if a request is going to get in the way of high-priority projects (this is the manager’s job, but you never know).

Find a mentor, and carve out your career path.

I know, this sounds fluffy. You probably will not be a data analyst for the rest of your life, so you need to understand the different paths ahead of you.

You can take the technical path, which is in high demand today, or become a data expert who partners with the business. With a technical skill set, i.e. expert coding skills in Python, Scala, Java, or SQL, you will likely end up on the technical path. That does not mean coding alone will make you succeed. Understanding the business process still matters a great deal, because that process is baked into your data models.

On the other hand, you can take the less technical path and become a partner to the business. This route does not require you to be a coding expert. Instead, you are expected to understand the business inside out and proactively bring recommendations to it. You also have to be an artist: in most cases you will be building reports and dashboards for the business, and if you build ugly reports, you will get ugly feedback. So make those fonts pop.

In short, you have just joined a fun and exciting field. Combine your communication skills with your business knowledge and technical skills, and you have a winning strategy.
