// Azure Services vs. Roles in Data
Have you ever been overwhelmed by all the different services in Azure, and so confused that you don’t know what to choose or how to get started?
Here is an approach for demystifying the data services, where I try to guide you through the jungle and keep things simple.
The approach is based on roles in the team of data professionals:
- The Data Engineer
- The Data Scientist
- The IoT Engineer
- The Power BI developer
The Data Engineer
The Data Engineer is a must-have in every data organisation and is the one who moves data from sources in the business to the data storage for further work. The role also sets up delta loads, cleans up the data and implements the first levels of the data modelling. The list is long, and these are just examples of the work this role carries out on the team.
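To make the delta load idea a bit more concrete, here is a minimal sketch in plain Python. It assumes each source row carries a `modified` timestamp and an `id` key, and it uses an in-memory dict as the target. The function name `delta_load` and the row fields are made up for illustration and do not come from any Azure service.

```python
from datetime import datetime


def delta_load(source_rows, target, watermark):
    """Upsert rows changed after `watermark` into `target`; return the new watermark."""
    new_watermark = watermark
    for row in source_rows:
        if row["modified"] > watermark:
            # Basic clean-up step: trim stray whitespace from the name field.
            cleaned = dict(row, name=row["name"].strip())
            # Upsert by key, so a re-run does not create duplicates.
            target[row["id"]] = cleaned
            new_watermark = max(new_watermark, row["modified"])
    return new_watermark


# Hypothetical sample data, for illustration only.
source = [
    {"id": 1, "name": " Alice ", "modified": datetime(2024, 1, 1)},
    {"id": 2, "name": "Bob", "modified": datetime(2024, 1, 5)},
]
target = {}
# Only rows modified after 2 January 2024 are picked up.
watermark = delta_load(source, target, datetime(2024, 1, 2))
print(sorted(target), watermark)
```

In a real platform the watermark would typically be stored somewhere durable between runs, and the loop would be replaced by a pipeline or notebook step, but the principle is the same: only move what has changed since the last load.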
The services that follow this role are:
- The Lakehouse concept supported and implemented with Databricks.
- Notebooks in either Databricks or Synapse - supported by the coding languages Python, .NET, SQL and Scala.
- Data Pipelines - either from Azure Data Factory or from Synapse Pipelines (same engine, different UI - and, for now, different functionality).
- Spark job definitions - not everything can be done with triggers and pipelines in Synapse or ADF - sometimes you need to know your way around the job definitions inside Spark.
This role may also need to know its way around a SQL Server service in Azure. This depends on the organisation around the data team. If there is an infrastructure specialist and/or DBA, you could argue that those roles implement and service the SQL Server for you. If not, the Data Engineer is next in line to service and handle the SQL Server.
The Data Scientist
The Data Scientist is the one who does the ML/AI/DL work and through this supports the business with insights beyond the given data in the Data Warehouse. The work often includes a lot of data cleansing and modelling, but with different tools than the Data Engineer. The role also covers pretotyping, POCs and trial-and-error work to test hypotheses and come up with ideas for new models. Again, the above are only examples of the role’s responsibilities - not an exhaustive list.
The services that follow this role are:
- Azure ML Studio - to support the work of AI modelling and implementation in the business, so that end users benefit from the work of the Data Scientist.
- Python code and R scripts - this is not a service in itself. But I do believe it is worth mentioning, as the Python and R code needs a place to be developed and executed.
- Notebooks in either Databricks or Synapse - again, as stated in the Data Engineer role - supported by the coding languages Python, .NET, SQL and Scala.
In many organisations the Data Scientist role is mixed with some (or all) of the Data Engineer role. This demands that the person filling the role spans a lot more technology and, with that, a lot more services and knowledge.
The IoT Engineer
The IoT Engineer is the role you add to the team if you are dealing with IoT data, streaming data, time series data or log data (perhaps just different words for the same thing). The role makes sure that the data from IoT services (and other sources) is maintained in a proper way and made available to the business as needed. The role also does streaming analytics tasks - e.g. detecting outliers or driving deep analytics at run time.
The services that follow this role are:
- Azure Stream Analytics - to support the work with “live” analytics on the data being streamed into the Kusto cluster. This helps with e.g. outlier detection and other run-time analytics.
- Kusto / Azure Data Explorer / Synapse Data Explorer - the service that stores and processes the data. This service is cluster based, and the query language is fairly easy to pick up if you are used to writing T-SQL.
- Event Hubs / IoT Hub - these are the services that handle the ingestion of data from the devices or external services.
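As a rough illustration of the outlier detection mentioned for this role, here is a minimal sketch in plain Python. It flags a reading when it lies more than a chosen number of standard deviations away from the mean of the previous readings in a sliding window. The function name `detect_outliers`, the window size and the threshold are assumptions made for the example; this is not how the Azure services implement it.

```python
from collections import deque
from statistics import mean, stdev


def detect_outliers(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the mean of the previous `window` readings."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only start judging readings once the window is full.
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        # Note: flagged values still enter the window, which is a simplification.
        recent.append(value)


# Hypothetical temperature readings from a device, with one spike.
readings = [20.0, 20.1, 19.9, 20.2, 19.8, 35.0, 20.0]
outliers = list(detect_outliers(readings))
print(outliers)
```

In a streaming service the same idea would be expressed over a time window on the incoming events, but the logic of comparing each new value against recent history stays the same.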
The Power BI developer
The Power BI developer is also a must-have in every data organisation. This role is responsible for developing end user elements like data models, reports, metrics etc. This role can also be accountable for the governance inside the Power BI portal - maintenance of workspaces, naming conventions etc.
The services that follow this role are:
- Power BI Dataflows - the service from Microsoft which makes it possible to build data ingestion directly from the Power BI portal and then make this data available to end users or other tech people, who then build reports based on this data.
- Power BI Datamarts - the service that moves the data wrangling to a SQL Server endpoint and automatically builds a data model based on the entities from the wrangling. Data can be exposed directly from the SQL Server endpoint. The service does not require knowledge of the SQL language.
- Power BI - the front end service. Both the Power BI Desktop version and the Power BI service in Azure - this exposes the reports, metrics, dashboards and other artifacts built by the Power BI developer. This is also the main service the end users will access to gain value from the data platform.
All in all, that is a list of 4 main roles in the data organisation - some of them are not needed on every team, and some of them are must-haves on every team.
Remember to sign up for the next blog post in the form below.
Happy coding ☕