Power BI dataflows (previously known as the Common Data Model, CDM) are a new feature in Power BI that enables self-service data-warehousing capabilities. It is all about self-service data preparation (cleanse, aggregate, transform, integrate, refresh) inside Power BI. Before this, Power Query handled the usual ETL work, such as data wrangling, inside Power BI. But as you may have noticed, the existing Power Query capabilities are available only for limited data sets; you cannot connect to a DirectQuery source and do that work in Power Query.
If you think about existing ETL processes in your organization, there are considerable limitations in Power Query. One of them is that once you have prepared a data model, other users in your organization cannot take advantage of your ETL work. In contrast, Power BI dataflows handle ETL workloads in a centralized manner: you implement a dataflow in a workspace and share it with other users in your organization. If someone describes dataflows as "Power Query for DW/Big Data workloads", no one can fully disagree, because once you implement a dataflow it resides in Azure Data Lake Storage (Gen2), which is designed for handling Big Data workloads. You connect to on-premises data sources via the On-premises Data Gateway, the same connector that has been used to connect existing Power BI reports to on-premises data sources.
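The storage format behind this is worth a quick look. When a dataflow lands in Azure Data Lake Storage Gen2, its definition is written to a model.json file in a CDM folder alongside the entity data files. The snippet below parses a trimmed, illustrative model.json (the dataflow and entity names here are made up, not from a real workspace) just to show the shape of that metadata:

```python
# Parse a simplified, illustrative model.json document of the kind a
# dataflow writes into its CDM folder in Azure Data Lake Storage Gen2.
import json

sample_model_json = """
{
  "name": "SalesDataflow",
  "entities": [
    {"name": "Customer", "attributes": [{"name": "customerId"}, {"name": "name"}]},
    {"name": "Orders",   "attributes": [{"name": "orderId"}, {"name": "customerId"}]}
  ]
}
"""

model = json.loads(sample_model_json)

# List the entities this dataflow defines.
entity_names = [e["name"] for e in model["entities"]]
print(entity_names)  # ['Customer', 'Orders']
```

The real file carries more detail (partitions pointing at the CSV data files, refresh annotations, and so on), but the entity/attribute structure above is the core of it.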
Common Data Model (CDM)
Dataflows also include a standard schema, called the Common Data Model, that contains the most common business entities (tables) in domains such as sales, finance, and marketing, along with a number of connectors you can plug into.
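To make the idea of a standard entity concrete, here is a minimal sketch of what an entity definition looks like, expressed as plain Python data. The attribute list is illustrative only; the real CDM schema files for entities such as Account define many more fields:

```python
# A minimal, illustrative sketch of a CDM-style entity definition.
# The attributes shown are a small made-up subset, not the full CDM schema.
cdm_entity = {
    "name": "Account",
    "description": "A business with which the organization has a relationship.",
    "attributes": [
        {"name": "accountId", "dataType": "guid"},
        {"name": "name", "dataType": "string"},
        {"name": "revenue", "dataType": "decimal"},
    ],
}

def attribute_names(entity):
    """Return the attribute names declared on a CDM-style entity."""
    return [attr["name"] for attr in entity["attributes"]]

print(attribute_names(cdm_entity))
```

Because every producer and consumer agrees on these shared entity shapes, data prepared by one dataflow can be understood by other tools without renegotiating column names and types each time.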
How to Get Started with Dataflow
Dataflows are implemented in the Power BI service, so you have to log in to your Power BI service account in order to create one. I'll explain step by step how to create a dataflow and use it for reporting. Once you have created a dataflow, you can use the Power BI Desktop application to create reports on top of it. Furthermore, you can schedule its refresh times individually.
Steps to Create a Dataflow
1. Log in to the Power BI service, go to your workspace, and click the plus button to create a dataflow as shown below.
2. Click the Add New Entities button in order to create entities.
3. Just as in Power BI Desktop, you can see a number of data sources to select from.
In this demo, I'm selecting a SQL Server database that is on my local computer, so select SQL Server Database.
4. You need to provide server access details and On-premises Data Gateway credentials. (You have to download and configure the On-premises Data Gateway, which works as a data bridge between your on-premises server and Azure services such as Power BI.)
5. You will notice the experience is very similar to Power BI Desktop: you can select the tables and preview them.
6. Once you move to the next window, you can see more advanced features, just like in Power Query.
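The transformations available there are the familiar Power Query cleanse-and-aggregate steps. Sketched in plain Python purely for illustration (the column names "Region" and "Amount" are invented for this example), that kind of workload looks like this:

```python
# The kind of cleanse-and-aggregate steps you would build in the dataflow's
# Power Query editor, sketched in plain Python. Sample data is made up.
rows = [
    {"Region": " east ", "Amount": "100"},
    {"Region": "East",   "Amount": "50"},
    {"Region": "West",   "Amount": None},   # dirty row: missing amount
]

# Step 1: cleanse -- trim/normalize text, convert types, drop incomplete rows.
clean = [
    {"Region": r["Region"].strip().title(), "Amount": float(r["Amount"])}
    for r in rows
    if r["Amount"] is not None
]

# Step 2: aggregate -- group by Region and sum Amount.
totals = {}
for r in clean:
    totals[r["Region"]] = totals.get(r["Region"], 0.0) + r["Amount"]

print(totals)  # {'East': 150.0}
```

The difference with a dataflow is that these steps run centrally in the service on a schedule, so everyone in the workspace consumes the already-cleansed result.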
7. After you click Done, it moves to the window below, which allows you to view the entities and settings such as scheduled refresh.
8. You can save the dataflow with a name you prefer.
9. Once it is saved successfully, you will get a notification that lets you go to the refresh schedule.
10. It moves to the Dataflows tab, where you can provide a preferred time to refresh the data.
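Besides the scheduled refresh in the UI, dataflows can also be refreshed programmatically through the Power BI REST API. The sketch below only builds the request rather than sending it; the workspace and dataflow IDs are placeholders, and a real Azure AD access token would be required to actually call the endpoint:

```python
# Sketch of triggering a dataflow refresh via the Power BI REST API.
# The IDs below are placeholders; nothing is sent over the network here.
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def refresh_url(group_id, dataflow_id):
    """Build the dataflow-refresh endpoint URL for a workspace (group)."""
    return f"{API_ROOT}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"

def refresh_request(group_id, dataflow_id, token):
    """Prepare (but do not send) the POST request that triggers a refresh."""
    return urllib.request.Request(
        refresh_url(group_id, dataflow_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

print(refresh_url("my-workspace-guid", "my-dataflow-guid"))
```

Sending the prepared request with `urllib.request.urlopen` (or any HTTP client) kicks off an on-demand refresh, which is handy for chaining a dataflow refresh after an upstream load completes.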
How to Use an Existing Dataflow to Create Reports
In this video, I've demonstrated how to use Power BI Desktop in order to consume the dataflow you created.
I hope you now have a high-level understanding of Power BI dataflows. I will talk about more advanced use cases, such as how to use a dataflow stored in Azure Data Lake for other scenarios, in a future post.
Blog courtesy: Nisal Mihiranga