By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.
To find the errata page for this book, go to www. Then, on the book details page, click the Book Errata link. On this page you can view all errata that has been submitted for this book and posted by Wrox editors.
The forums are a Web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums.
Complete the required information to join as well as any optional information you wish to provide and click Submit. You will receive an e-mail with information describing how to verify your account and complete the joining process. You can read messages in the forums without joining P2P but in order to post your own messages, you must join. Once you join, you can post new messages and respond to messages other users post.
You can read messages at any time on the Web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.
For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books.

Why Business Intelligence?

The largest challenges most organizations face around their data are probably mirrored in yours.
Challenges include data spread across many different systems. This spread causes data to be analyzed in different ways and metrics to be calculated inconsistently, which, in turn, leads to unpredictable analysis and inappropriate actions being taken based on the data. Many times documentation is not created for reports or the metadata underneath them, and this lack of documentation is a critical problem.
With reports coming from so many different places, you run into the same problems mentioned in the previous point. These challenges require more people to know different reporting systems, lead to more administrative headaches, and so on. They cause many reporting teams to rework large portions of their reports several times, as opposed to spending that time understanding what the users are asking for and delivering more actionable information. Business Intelligence (BI) is a term that encompasses the process of getting your data out of the disparate systems and into a unified model, so you can use the tools in the Microsoft BI stack to analyze, report, and mine the data.
Business Intelligence systems take this repetitive activity out of your life.

Getting Intelligence from Data

How do you get information from data?
First, you need to understand the difference. As you learned earlier, data can come from many different places, but information requires context and provides the basis for action and decision-making. Identifying your data, transforming it, and using the tools and techniques you learn from this book will enable you to provide actionable information out of the mass of data your organization stores.
There are several ways to transform your data into actionable information, and each has its pros and cons. Typical solutions for reporting include a few different architectures. Many organizations have their own departmental reporting environments. This situation leads to a significant increase in licensing costs, since using different vendors for each department and reporting environment increases spending on hardware and software licensing, end-user training, and ramp-up time.
Some organizations find it easier to grant lots of individual users access to the data. This is not only dangerous from a security perspective, but likely to lead to performance problems, because users are not the most adept at creating their own queries in code. Also, with all the industry and federal compliance and regulation governing data access, widespread access can quickly lead to a security audit failure, especially in a publicly held company.
When teams seek out and apply different strategies, it exacerbates the original problem of data being all over the organization. It may be nice to get the automated reports and they likely serve a purpose, but they usually cannot be counted on to run an enterprise. These are the opposite of what you want to accomplish with a great BI infrastructure.
BI to the Rescue

A well-thought-out BI strategy will mitigate the problems inherent to each of the previously listed approaches. A good BI approach should provide the targeted departmental reporting that is required by those end users while adjusting the data so it can be consumed by executives through a consolidated set of reports, ad hoc analysis using Excel, or a SharePoint dashboard.
Business Intelligence provides a combination of automated reporting, dashboard capabilities, and ad hoc capabilities that will propel your organization forward. BI provides a single source of truth that can make meetings and discussions immediately more productive. Business Intelligence standardizes organizational calculations, while still giving you the flexibility to add your own and enhance the company standard. These capabilities allow everyone to speak the same language when it comes to company metrics and to the way the data should be measured across the enterprise or department.
Using a combination of a data warehouse and BI analytics from Analysis Services and Excel, you can also perform in-depth data mining against your data. This enables you to utilize forecasting, data-cluster analysis, fraud detection, and other great approaches to analyze and forecast actions.
Data mining is incredibly useful for things like analyzing sales trends, detecting credit fraud, and filling in empty values based on historical analysis. This powerful capability is delivered right through Excel, using Analysis Services for the back-end modeling and mining engine. We provide more details on that shortly, but the most important bit of information you should take away right now is that the cost of managing multiple products and versions of reporting solutions to meet departmental needs is always higher than the cost of a cohesive strategy that employs one effective licensing policy from a single vendor.
When organizations cannot agree or get together on their data strategy, you need to bring them together for the good of the organization. Realizing the value of that approach and seeing the impact it can have in your organization are the two most important first steps. SQL Server Enterprise includes industry-leading software components to build, implement, and maintain your BI infrastructure. SharePoint sits at the top as the most end-user-facing program for reporting, dashboard, and analytic capabilities.
On the next level down you see the more common end user tools, and continuing down you can see the development tools, the core components, and some of the multitude of potential data sources you can consume with the products we discuss in this book.
To see this at a glance, review the table in the book, which maps each user level to its tools: Excel, Reporting Services, Report Builder, and SharePoint Dashboards.

Try It

Your Try It for this lesson is a bit different than most others in the book. Throughout the book you will be challenged with hands-on tasks to enhance your understanding.
For this lesson, your Try It is to make sure the things you learned in this lesson stay in your mind as you learn the technologies to apply them. Ask yourself these questions as you go through the rest of the book. As this chapter is just an introductory overview, it does not have an accompanying video.

In this lesson you will learn what makes a dimensional model different and then have the opportunity to convert a simple model yourself.
An OLTP database is highly normalized to enhance the quick insertion and retrieval of data. The goal in designing a data warehouse or star schema, by contrast, is to denormalize the model in order to simplify it and to provide wider, more straightforward tables for joining and data-retrieval speed. Why do you need this denormalization in order to report on your data, you may ask? The largest reason is that you need to consolidate some of the redundancy between tables.
Consolidating redundancy will put the database into a star schema layout, which has a central fact table surrounded by a layer of dimension tables, as shown in the figure. As you can see, we have abstracted out tables such as DimProduct, DimCustomer, DimPromotion, and DimDate and put the additive and aggregative data, like sales amounts, costs, and so on, into a single fact table, FactInternetSales (more on fact tables in Lesson 3; for now, focus on the dimensions).
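As a minimal sketch, the star schema layout described above could be declared like this in T-SQL. The column lists are heavily abbreviated and the exact names are assumptions modeled on the AdventureWorks samples, not the book's full schema:

```sql
-- Simplified star schema sketch: a central fact table with foreign keys
-- out to denormalized dimension tables.
CREATE TABLE DimCustomer (
    CustomerKey          INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
    CustomerAlternateKey NVARCHAR(15),                  -- business key from the source system
    FirstName            NVARCHAR(50),
    LastName             NVARCHAR(50)
);

CREATE TABLE DimDate (
    DateKey      INT PRIMARY KEY,   -- e.g., 20100315
    FullDate     DATE,
    CalendarYear SMALLINT
);

CREATE TABLE FactInternetSales (
    CustomerKey   INT NOT NULL REFERENCES DimCustomer (CustomerKey),
    OrderDateKey  INT NOT NULL REFERENCES DimDate (DateKey),
    SalesAmount   MONEY NOT NULL,   -- additive measure
    OrderQuantity SMALLINT NOT NULL -- additive measure
);
```

Note that every relationship travels through a narrow integer surrogate key, which is what keeps the joins simple and fast.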
This abstraction allows you to implement a number of important elements that provide great design patterns for dealing with the challenges discussed later in this chapter. Consider, for example, an attribute that changes over time: how can you efficiently handle that situation in a highly normalized model? It would involve multiple tables and require some complicated updates.
We will go over this later in the lesson. How are you going to handle multiple calendars with multiple relationships to different tables if the calendars change? For instance, your fiscal calendar or accounting periods may change from year to year, but will need to maintain relationships to the previous methods for historical and trend analysis. More on this later. Some systems will have alphanumeric keys, and others will have integer or composite keys.
Integrating different key types is a real challenge that often leads to hybrid, system-crossing keys kept in sync with some complicated ETL. Hybrid keys are not necessary in the dimensional model because you abstract all of that. For instance, in the example we discuss in this lesson the customer dimension has a record of all the customers and their historically accurate information based on the dates for which the record was accurate.
Surrogate key columns are added to support the new key structure put in place. There will be more on how to do this shortly. This structure provides portability for the warehouse and the ability to integrate numerous systems despite their key differences.

How Does Dimensional Modeling Work?

Before you try some dimensional modeling for yourself, we want to show you an example.
For our example, we use the AdventureWorksR2 sample databases from Microsoft, available at www. We create a simple star schema from the reseller sales information in the OLTP version of the database. First, take notice of the differences and key elements in the figures: a new CustomerKey column was created to provide the surrogate key, and it serves as the primary key for this new dimension table.
We have modified the primary key column that is coming over from the source OLTP system to act as the alternate key. All this means is that if we need to bring in data from several systems whose primary keys have overlapped or are in different formats, we can do it with a combination of our alternate or business key and our surrogate key CustomerKey.
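The alternate (business) key is what lets you match incoming source rows back to warehouse surrogate keys during a load. A hedged sketch of that lookup follows; the staging table and its column names are hypothetical, and DimCustomer follows the naming used in this lesson:

```sql
-- During the ETL load, match each incoming row on its business
-- (alternate) key to find the warehouse surrogate key that the
-- fact table will actually store.
SELECT s.SourceCustomerID,
       d.CustomerKey            -- surrogate key used in the fact table
FROM   staging.Customer AS s    -- hypothetical staging table
JOIN   DimCustomer      AS d
       ON d.CustomerAlternateKey = s.SourceCustomerID;
```

If a second source system's keys overlap or use a different format, only the alternate-key matching changes; the surrogate keys, and everything downstream of them, stay the same.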
Much of the demographic data and store sales data was also tapped to get columns like DateFirstPurchase and CommuteDistance so you can find out more about your customers. Some of these columns could be calculated in the ETL portion of your processing by comparing information like a work and home address, for example.
For instance, now if you refer to multiple customers in a single order, you need only one customer dimension with a fact table row that has multiple key relationships to the customer table. This is much better than having a bill-to customer table and a ship-to customer table to handle subsidiaries or other issues.
A normalized model would require multiple links to a date table for multiple columns; instead, we can link directly to DimDate for these values with our numeric surrogate keys. Remember, these keys keep all the tables in the warehouse linked as your new key system. You can see the StartDate and EndDate columns and how they control the historical loading. These columns allow you to expire a row when a historical change is required. For instance, when a product line gets a new account manager, you would expire the current product line row and insert into the dimension table a new row, with an EndDate of NULL, that links to the new product manager.
Otherwise, historical reporting could mistakenly tie sales to the wrong manager. There are three main types of slowly changing dimensions: Type I, which simply overwrites the old value and keeps no history; Type II, which preserves full history by expiring the current row and inserting a new one; and Type III, which is the same as Type II but only tracks a certain number of revisions. It is common to have columns of each type in the same table; for instance, if you need to track history on last names for employees, but not on their addresses, you may have a Type II LastName column and a Type I Address column.
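The Type II expire-and-insert pattern described above can be sketched in two statements. This is a hypothetical example: the DimProduct columns, the business key value, and the AccountManager attribute are assumptions chosen to mirror the product-line scenario in the text:

```sql
-- Type II change: expire the current row for this product line...
UPDATE DimProduct
SET    EndDate = GETDATE()
WHERE  ProductAlternateKey = 'BK-M68B-42'  -- hypothetical business key
  AND  EndDate IS NULL;                    -- only the currently active row

-- ...then insert a new row carrying the new value, with EndDate = NULL
-- marking it as the current version.
INSERT INTO DimProduct (ProductAlternateKey, AccountManager, StartDate, EndDate)
VALUES ('BK-M68B-42', 'New Account Manager', GETDATE(), NULL);
```

Historical fact rows keep pointing at the expired row's surrogate key, so old sales stay tied to the old manager.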
This is perfectly acceptable and common. This design has also been proven to improve performance significantly since the main goal of a data warehouse or BI system is to extract data as quickly as possible.
This more denormalized model lends itself to the quick retrieval of data from the tables, whether to populate a cube, run a report, or load data into Excel. There are some general design tips for working with your dimension tables; for now, you have to trust us on them. Using integer surrogate keys, for example, allows for better sorting of dimension and fact tables.

Lesson Requirements

The columns you put in your table are up to you, but your dimension will need to track history. Also, the dimension table will be getting data from other sources, so it will need to be able to handle that.
The first thing you should do is identify some columns you might want in your table. Table has a number of standard product dimension columns that you can pick from. Now, in order to make these into a proper dimension table, you need to review your requirements.
Your first requirement was to make sure you can track history, so you need to make sure you have a StartDate and EndDate column so you can expire rows as they become updated.
Your next requirement was to make sure the dimension table could handle data from multiple systems, either now or in the future, which means you need to apply the best practice you learned about surrogate keys. The finished product should look something like the figure. Congratulations, you have just designed your first dimension table.
Great job! Please select Lesson 2 on the DVD to view the video that accompanies this lesson.

A fact table consists of two primary types of data: numeric measures and the surrogate keys that relate them to the dimensions. Numeric data is more common in financial or inventory situations. You need fact tables because they allow you to link the denormalized versions of the dimension tables and provide a largely, if not completely, numeric table for Analysis Services to consume and aggregate. In other words, the fact table is the part of the model that holds the dollars or count type of data that you would want to see rolled up by year, grouped by category, and so forth.
Since many OLAP tools, like Analysis Services, look for a star schema model and are optimized to work with it, the fact table is a critical piece of the puzzle. The process of designing your fact table will take several steps. First, decide on the data you want to analyze: will this be sales data, inventory data, or financial data?
Each type comes with its own design specifics. Next, decide on the grain of the table; this means that you need to decide on the lowest level of analysis you want to perform. For example, each row may represent a line item on a receipt, a total amount for a receipt, or the status of a particular product in inventory for a particular day. Finally, decide how you will load the fact table (more on this in Lesson 9). Transactions are loaded at intervals to show what is in the OLTP system.
An inventory or snapshot fact table will load all the rows of inventory or snapshot-style data for the day, always allowing the user to see the current status and information based on the date the information was loaded. Fact tables are often designed to be index light, meaning that indexes should be placed only to support reporting and cube processing that is happening directly on that table.
It is a good idea to remember that your fact tables will often be much larger in row count and data volume than your dimensions. This means you can apply several strategies to manage the tables and improve your performance and scalability. Table partitioning can significantly help your management of this type of larger fact table.
Implementing a sliding-window partitioning scheme, where you roll off old partitions and roll on new ones periodically, can drive IO and query times down and access speeds up since you will be accessing only the specific areas of data on the disk that are needed for your query.
Processing speeds for data loading and cube processing will also be faster, since newer versions of SQL Server allow for increased parallelization of queries across partitions. The details of partitioning and compression are out of the scope of this lesson, but you should investigate using them in your data warehouse because of the significant performance and disk-space improvements you will see. Compression decreases the amount of IO needed to satisfy queries, resulting in faster, more nimble tables, even when fact tables hold millions of rows.
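A minimal sketch of the two techniques just described follows. The partition boundaries, object names, and filegroup placement are placeholders, and a table only benefits from the partition scheme if it is created on that scheme:

```sql
-- Sliding-window partitioning sketch: one partition per month, so old
-- months can be switched out and new months switched in cheaply.
CREATE PARTITION FUNCTION pfOrderDate (DATE)
AS RANGE RIGHT FOR VALUES ('2010-01-01', '2010-02-01', '2010-03-01');

CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate ALL TO ([PRIMARY]);  -- placeholder filegroup

-- Page compression: trades a little CPU for substantially less IO.
ALTER TABLE FactInternetSales
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
```

In a real sliding window you would periodically split a new boundary on one end and merge (or switch out) the oldest partition on the other.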
There is some debate over whether to declare foreign key constraints on the fact table, with hardcore performance purists focused on the minimal overhead the key constraints add during a load process or heavy-duty reporting. In exchange for this minimal overhead, however, Analysis Services will optimize its processing: because it can assume these keys are keeping the data valid, it disables some of the internal checks built into the processing algorithm. This allows Analysis Services to consume the data from your fact and dimension tables much faster than if it did not have those keys.
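One common compromise, sketched below under the table and column names used earlier in this lesson (the constraint name is hypothetical), is to declare the key and disable it only during heavy load windows:

```sql
-- Declare the relationship so Analysis Services can trust the keys.
ALTER TABLE FactInternetSales WITH CHECK
ADD CONSTRAINT FK_FactInternetSales_DimCustomer
    FOREIGN KEY (CustomerKey) REFERENCES DimCustomer (CustomerKey);

-- Optionally disable the check during a bulk load window...
ALTER TABLE FactInternetSales NOCHECK CONSTRAINT FK_FactInternetSales_DimCustomer;

-- ...and re-enable it afterward, revalidating existing rows so the
-- constraint is trusted again.
ALTER TABLE FactInternetSales WITH CHECK CHECK CONSTRAINT FK_FactInternetSales_DimCustomer;
```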
To complete the lesson you need to build the table and include any valuable cost, sales, and count fact columns, along with the key columns for the important dimensions. Next are a couple of hints to get you started. Make sure you identify the dimensions that are important to your analysis and then include references to those surrogate key columns in your fact table. Check out the example in the figure; yours should look similar.

SQL Server Integration Services (SSIS) consists of a myriad of diverse components, each of which can perform a multitude of operations.
SSIS is a tool that can be used to construct high-performance workflow processes, which include, but are not limited to, the extraction, transformation, and loading (ETL) of a data warehouse. In this section we demystify some of the key components of SSIS. We focus primarily on those that are essential in the development of most SSIS packages. The Business Intelligence Development Studio (BIDS) tool allows you to develop these packages. If you want to improve the way the application starts, you can add a startup parameter to the application shortcut.
Close and reopen BIDS. You should notice that the splash screen that appeared when you initially opened BIDS no longer appears. Select Integration Services Project from the Template pane. Before proceeding we should provide a brief explanation of solutions and projects. A solution is a container of projects. One solution can include several Integration Services projects, but a solution can also contain other types of projects.
For example, you can also include a Reporting Services project in the same solution as the Integration Services project. An Integration Services project, on the other hand, is a container of packages. The project will typically contain several data sources and SSIS packages. As a result, you can create a project within a solution that is specific to the task that you are trying to accomplish or complete. In addition to creating solutions and projects and designing packages, BIDS will allow you to save copies of the packages to an instance of SQL Server or the file system, create a deployment utility that can install multiple packages and any dependencies to an instance of SQL Server or the file system at once, and also provide multiple tools that can assist in monitoring and troubleshooting your packages.
You can view the package inside the Solution Explorer. The Solution Explorer will also show the projects and solution. Then select Projects and Solutions in the navigation pane of the Options screen that appears.
In the later sections of this book we will explain how to create data sources and packages. However, note that any data sources created in the Solution Explorer can be used by any packages created within the same project. The next tab is Data Flow, which is where the actual movement and cleansing of data will take place.
Then there is the Event Handlers tab, which can be used to perform actions like those that can be performed on the Data Flow and Control Flow tabs. Event handlers can be constructed for the entire package or for a specific task in the package on a specific tab.
Each of the three aforementioned tabs is accompanied by a corresponding toolbox. The Package Explorer tab provides a tree view that groups and lists each item. Since a package can contain many tasks, Connection Managers, and other items, this summarized view makes it easier to navigate, locate, and configure items within a specific package.
In addition to the four tabs, the designer also includes the Toolbox and Connection Manager sections. Each tab has a corresponding toolbox, except the Package Explorer tab. The selected tab will determine the contents of the toolbox. Regardless of the active tab, each task can be dragged onto the design surface and configured as needed.
When the Control Flow tab is active, the toolbox will contain two types of items: tasks and containers. If the Data Flow tab is active, it will contain source, transformation, and destination tasks. You can design workflows on the Event Handlers design surface just as you would on the Control Flow and Data Flow tabs, using the toolbox that corresponds to the Event Handlers tab.
The final section, the Connection Manager, allows you to create connections to various types of data sources. The difference between connections created in the Connection Manager and those created in the Solution Explorer under the Data Sources folder is that the connections created in the Connection Manager section are exclusive to a specific package.
In other words, the connections created in the Connection Manager can be used only within the containing package. Therefore, if you realize you are creating the same connection over and over again within different packages you may want to create it under the Data Sources folder in the Solution Explorer. That way it can be reused by every package within your project. If you look toward the bottom of the Toolbox section you will notice two tabs. One is labeled Toolbox and the other Variables.
When you click the Variables tab, a list of all the system variables and user-defined variables will appear. If they do not, click the show system variables icon and the show all variables icon (this icon appears blue). This will cause all the variables to appear. We will show you how to create user-defined variables in Lesson 5 of the book.
No matter the type of variable, system or user-defined, it can be used in the package to hold data, assist in control flow, and audit changes, among other things. In later lessons of this book, we will introduce additional elements that will assist in the development of these packages. As mentioned earlier, Control Flow orchestrates the entire workflow of the package. It contains three elements: The containers add structure and order to the package workflow. These tasks primarily consist of loop and sequence containers.
The precedence constraints connect and control the flow of tasks from one to the other. The Data Flow tab provides a different type of functionality from the Control Flow tab. It becomes active only once a Data Flow task is added to the design surface of the Control Flow tab. The Data Flow tab contains three types of tasks: sources, transformations, and destinations. The data flow sources include types such as OLE DB, ADO.NET, and others. The data flow transformations are tasks that can assist you in transforming, massaging, or cleansing the data so that it meets your business requirements.
For example, you may need to add a column to a result set that contains the first and last name, as opposed to having them in two separate fields. In this case, you could use a Derived Column task that will take advantage of the expression language built into SSIS.
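The Derived Column transformation would use the SSIS expression language for this concatenation (roughly FirstName + " " + LastName). For readers more comfortable in SQL, the equivalent result expressed as T-SQL looks like the following sketch, using the Person.Person table referenced later in this lesson:

```sql
-- What the Derived Column transformation produces here, expressed as
-- an equivalent T-SQL query: one new FullName column alongside the
-- original two columns.
SELECT FirstName,
       LastName,
       FirstName + ' ' + LastName AS FullName
FROM   Person.Person;
```

The advantage of doing it in the Derived Column task rather than in the source query is that the logic travels with the package and works against any source, not just SQL Server.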
The final type of task, data flow destinations, contains tasks that allow you to load data into various data sources. Much like data flow sources, the destinations include types such as ADO.NET, OLE DB, and others. Once you have implemented all the necessary containers, tasks, and precedence constraints on both the control and data flows, you can debug your package by executing it in BIDS. You can execute a package several different ways in BIDS.
One way to run a package is to press F5 on your keyboard; another is to click the green icon on the toolbar. Finally, you can right-click the package in the Solution Explorer and select Execute Package. Any way will run the package. If a warning occurs, it will be designated with a warning icon. If an error occurs, there will be at a minimum two statements indicating the failure. The first will provide the details of the error, and the second usually just states that the task has failed.
The results will be similar to what is shown in the figure. Once these steps are complete, you will create an extract file that contains a list of the name of every person from the Person.Person table in the AdventureWorks database. The download for this lesson is available at www. To extract data from a database, you need to complete the following steps. Accept the default for the text box labeled Location.
Accept the default for the drop-down list labeled Solution. Click OK. If Package. Ensuring that the Control Flow tab is active, drag a Data Flow task onto the design surface. Right-click the task and select Rename from the context menu. Type Extract Person Data. Click the New button. Double-click the Data Flow item; the Data Flow tab is now active.
Person. Click OK. Drag a Derived Column task onto the design surface. Rename the derived column Add Full Name. In the lower section of the screen, select from the column Derived Column. Drag a Flat File Destination task onto the design surface. Drag the green arrow from the Derived Column task onto the Flat File Destination task. Select Delimited from the Flat File Format screen that appears.
Then the Flat File Connection Manager will appear. Click Browse and select a directory on your computer where the flat file will reside. Name the file FullName. Accept all the defaults and click OK. Ensure that each column is mapped accordingly, for example, FirstName to FirstName.
Right-click the package in the Solution Explorer and select Execute Package. When the package is complete, it will resemble the screen shown in the figure. If you click the Progress tab, you can view each individual step that was executed during the package run. Click the square blue button on the menu bar to stop the package.
Browse to the location on your computer that you specified earlier. Open the FullName file to view the extracted names. Please select Lesson 4 on the DVD to view the video that accompanies this lesson.
As mentioned in Lesson 4, each tab has a toolbox that contains items specific to that tab. You have three types of items available to you in the control flow. First, there are two types of items available in the Control Flow tab toolbox: tasks and containers. One final item available to you in the control flow, one not contained in the toolbox, is the precedence constraint. These items can be configured with a graphical user interface, with the exception of the Data Flow task.
These containers include the loop and sequence containers. In addition, you can drag multiple items within a container, including other containers. If there are multiple containers on the control flow design surface, all the items contained within a container must complete before any subsequent tasks will start.
This is, of course, if they are linked together by precedence constraints. If they are not linked, they may execute in parallel. Each task has a graphical user interface (GUI) that can be used to configure the properties required to run the task.
You can access each GUI by double-clicking the task. The container may loop over a list of files and insert the contents of the files into a database. A precedence constraint will connect a minimum of two items, a task to a task, a task to a container, a container to a container or a container to a task. The precedence constraint will determine if the next task or container will be executed based on the state success, complete, fail of the constraint. Each item that is added to the design surface automatically includes a connector or precedence constraint.
These constraints relate and connect the items to each other. One thing to note is that the constraint defaults to success. For example, if you want the Foreach Loop to execute whenever the Script task completes, whether or not it succeeded, you should select Completion from the Value drop-down list.

Connection Manager

Since many of the tasks, and at least one of the containers, may require a connection to a data source, you must create a connection manager for each source required by the tasks included in the control flow.
However, you can create a connection to a data source in the Solution Explorer. The connection can be used by all packages contained within the same project. If you suspect that your control flow will contain several small sets of actions or any repeating actions, then the first step is to add a container to the design surface.
This is an optional step in the control flow design because some control flows contain only one task, in which case adding a container does not serve much of a purpose. The next step is to add a task, either directly on the design surface or to a container that already exists on the design surface. Once you have added and configured all your containers and tasks, the final step is to connect them using precedence constraints.
If the variable that is specified in the expression is not greater than zero, the next step in the workflow will not execute. This lesson uses the HumanResources.EmployeePayHistory table in the AdventureWorks database. You will also need to know the name of your SQL Server.
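An expression on a precedence constraint of the kind just described might look like the following (a sketch; rowCount stands in for whatever integer variable your package uses and is not a name from the book):

```
@[User::rowCount] > 0
```

When the constraint is evaluated, the downstream task runs only if the expression is true (and, depending on the evaluation operation you choose, only if the preceding task also succeeded).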
To complete this lesson you will create an SSIS package. The package will loop through the contents of a table in the AdventureWorks database.
As it is looping, the package will update a column in each row. The download for this lesson is available at www. Right-click the connection in the Connection Manager, select Rename, and change the name of the connection to AdventureWorks. Then click the white space in the Control Flow designer, right-click, and select Variables.
Select Integration Services Project. Name the project BI Trainer Project 5. Ensure that the Control Flow tab is active. Click the item in the Variable menu labeled New Variable. Type results for the variable name and change the data type to Object. Click the toolbox tab, then drag an Execute SQL task onto the design surface. Select New Variable from the Variable menu again. Type businessEntityID for the variable name and accept Int32 as the data type.
Then click in the text box labeled Connection. This will activate the drop-down list. Select AdventureWorks from the drop-down list.
Click in the text box labeled SQLStatement and an ellipsis button will appear. Click the ellipsis and enter a query against the HumanResources.EmployeePayHistory table. Click the Result Set item in the left navigation pane.
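The query text itself is not reproduced in this excerpt. Since the package loops over the returned rows and captures a full result set, a query of roughly this shape would serve (a sketch against the AdventureWorks schema, not necessarily the book's exact query):

```sql
-- Hypothetical query for the outer Execute SQL task; the full
-- result set is stored in the Object variable User::results.
SELECT BusinessEntityID
FROM HumanResources.EmployeePayHistory;
```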
Click Add. Type 0 in the Result Name column. In the Variable Name column, select User::results. Drag a Foreach Loop container onto the Control Flow design surface and double-click it. Click the Collection item in the left navigation pane. Choose User::results as the source variable and select the Rows in the first table radio button.
Click the Variable Mappings item in the left navigation pane. Choose the User::businessEntityID variable and type 0 in the Index column. Double-click the Execute SQL task inside the container. Select AdventureWorks from the Connection drop-down list. Click the ellipsis next to the text box labeled SQLStatement.
Click the Parameter Mapping item in the left navigation pane.
In the Parameter Name column, type 0. See Figure for an example. Right-click your package in the Solution Explorer and select Execute. The Execute SQL task outside of the container will immediately turn green, and the one on the inside will blink from yellow to green, eventually turning green. Once this occurs, the package has successfully completed.
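The statement for the inner Execute SQL task is not shown in this excerpt. A parameterized statement of the kind the lesson describes, updating a column for the current row, might look like this (the column being updated is an assumption for illustration):

```sql
-- Hypothetical statement for the inner Execute SQL task. The ?
-- placeholder is mapped, on the Parameter Mapping page, to the
-- User::businessEntityID variable under parameter name 0.
UPDATE HumanResources.EmployeePayHistory
SET ModifiedDate = GETDATE()
WHERE BusinessEntityID = ?;
```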
Please select Lesson 5 on the DVD to view the video that accompanies this lesson. In this lesson you will focus on designing and implementing processes that extract data from various sources, then transform or cleanse the data, and finally load the data into a destination source. The data flow toolbox exposes several data sources, transformation tasks, and destination sources that may play a role in the architecture of your data warehouse ETL process.
Each of the aforementioned items can be connected to each other by various paths. These paths connect the inputs and outputs of the components on the data flow design surface.
Initially, the Data Flow tab resembles Figure. To add a Data Flow task, you can click the hyperlink on the Data Flow tab, or you can activate the Control Flow tab and drag a Data Flow task onto the control flow design surface. No matter which approach you choose, a Data Flow task will be added to the design surface of the control flow. Unlike with the other tasks added to the design surface of the control flow, double-clicking the Data Flow task will activate the corresponding Data Flow tab.
Multiple Data Flow tasks can be placed on the control flow design surface. If you do this, you can switch back and forth between the data flows by selecting the data flow of choice from the drop-down list labeled Data Flow Task, which is located above the data flow design surface. If a connection manager was created earlier, when you were designing the control flow, it can be reused in the data flow. The data flow supports several source types, including ADO.NET data sources.
If you are connecting to a SQL Server or Oracle database, you can extract directly from a table or view or you can use a query to source the data.
If you choose an ADO.NET source and have not yet created a connection manager, you will need to create one. Note that choosing an entire table rather than a query could detrimentally affect the performance of your SSIS task: all the data from the selected table will be returned, and any filtering will be performed on the client side. As with most of the configurable properties of an SSIS task, you can use a variable in place of hard-coded text.
For example, if you want to source the query or table dynamically based on some condition in the work flow, you can assign the variable a value. Then you can use that variable as the value for your table or query. One thing to note is that if your expression syntax is incorrect, the expression will turn red. If you hover over the expression or click OK, you will be provided with detailed information about the error. Once the error is corrected, if any existed in the first place, click OK.
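For instance, a property expression that builds the source query from a string variable might look like the following (TableName is a hypothetical variable used only for illustration, not a name from the book):

```
"SELECT * FROM " + @[User::TableName]
```

If the variable's value were set to a table name earlier in the workflow, the source would extract from that table at run time.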
Data Flow Transformations Once you have created a data source, it is time to perform any transformations that may be needed on the data.
SSIS provides several tasks to assist in cleansing the data before it is loaded into its final destination. Some tasks can split the data based on a condition or look data up based on matching keys between two data sources. One commonly used transformation is the Derived Column task, whose purpose is either to create a new column in the existing data set or to replace an existing column.
To configure this task, you must first drag a connection from a data source onto the task. For example, assume that FirstName and LastName are two columns available in the tree. If you wanted to add a FullName column whose value was a concatenation of the two columns, with FirstName first, LastName last, and a space between the two, you would type an appropriate expression in the column labeled Expression in the lower pane.
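In SSIS expression syntax, the concatenation described above is:

```
FirstName + " " + LastName
```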
For example, if you wanted to load only rows that meet a certain condition, you could use a Conditional Split task. Or if you wanted to load the data into multiple destinations you could use a Multicast task. Data Flow Paths You can connect the items that have been added to the Data Flow designer by using the data flow paths that protrude from each element.
At a minimum each item will have a success path green arrow and an error path red arrow. To use these paths, simply drag one from an item onto the next item you want to execute. At that point, the items will execute in the sequence that you have specified. In some cases, you can customize the output of a data flow path. For example, with the Lookup task you can specify what if anything is output from the error flow path.
Depending on the configuration, two additional columns may be added to the output columns; alternatively, if any rows do not match, the component execution can fail, or any rows that do not match can be sent down the error path.
Data Flow Designer

The data flow provides you with the tools needed to build an effective and efficient data cleansing or ETL process. The typical steps involved in designing the data flow are as follows:

1. Add a data flow source to the design surface (do not forget to add a connection manager).
2. Add as many transformation tasks as are required to meet your business needs.
3. Add the data flow destination(s).
4. Connect the data flow source to the transformation tasks, and perform any additional connections using the data flow paths exposed by each source and transformation.
5. Connect the final transformations to the destinations using the data flow paths.
The five preceding steps are typical of most data flows for building a data warehouse ETL process; additional sources, transformations, and destinations may be included as you determine your needs. In the Try It for this lesson, you will use a Lookup task to determine the last time each employee received a pay increase.
If the person does not have any pay data, you will add a Derived Column task to the data flow; the task will add two columns to the output set. On the other hand, if the person does have pay data and has been working for more than 10 years, you will increase the pay rate by 10 percent. Finally, you will export the data to a flat file destination.

Lesson Requirements

To complete this lesson you will need the AdventureWorks database, which can be downloaded from www.
This lesson uses the HumanResources.Employee table. Right-click the connection in the connection manager and select Rename. Name the project BI Trainer Packages 6. Drag a Data Flow task onto the control flow design surface.
Double-click the Data Flow task or click the Data Flow tab. Add a source that extracts from the HumanResources.Employee table, then drag a Lookup task onto the design surface. Double-click the Lookup task and click Connection in the left window pane. Select the query radio button and paste the lesson's lookup query into the text box below it. Click Columns in the left window pane. In the list of columns labeled Available Lookup Columns, ensure that only the RateChangeDate and Rate columns are checked.
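The lookup query itself is not preserved in this excerpt. Since the lookup returns each employee's most recent RateChangeDate and Rate, a query of roughly this shape would work (a sketch against the AdventureWorks schema, not necessarily the book's exact query):

```sql
-- Most recent pay change per employee (hypothetical lookup query).
SELECT BusinessEntityID, RateChangeDate, Rate
FROM (
    SELECT BusinessEntityID, RateChangeDate, Rate,
           ROW_NUMBER() OVER (PARTITION BY BusinessEntityID
                              ORDER BY RateChangeDate DESC) AS rn
    FROM HumanResources.EmployeePayHistory
) AS latest
WHERE rn = 1;
```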
Double-click the Derived Column task. In the lower pane, in the column labeled Derived Column Name, type Rate. In the column labeled Derived Column, select Replace 'Rate'. Drag a Union All transformation onto the design surface.
Drag the green arrow from the Derived Column task onto the Union All task. Add a Conditional Split task onto the design surface. Drag the green arrow from the Union All task onto the Conditional Split. Double-click the Conditional Split task.
Then type the condition expression in the column labeled Condition. Drag the green arrow from the Conditional Split task onto the newly added Derived Column task and double-click the derived column. Drag a flat file destination onto the design surface, then drag the green data flow path onto the flat file destination.
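Neither the split condition nor the derived column expression survives in this excerpt. Based on the lesson description (a 10 percent raise for employees with more than 10 years of service), expressions of roughly this shape would fit; the HireDate column is an assumption taken from the AdventureWorks HumanResources.Employee table, not from the book. The Conditional Split condition, in SSIS expression syntax:

```
DATEDIFF("yyyy", HireDate, GETDATE()) > 10
```

And the Derived Column expression replacing Rate with a 10 percent increase:

```
Rate * 1.10
```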
Double-click the flat file destination and click New. Select the radio button labeled Delimited on the Flat File Format screen. Browse to a location on the C: drive for the output file, then click Mapping in the left pane. Please select Lesson 6 on the DVD to view the video that accompanies this lesson.
You may find yourself implementing the same solution over and over again, for example, moving data from one database to another.
You would typically do this by using the data flow, as in Figure: the data flow source is connected to a data flow destination, which is where the data will be exported. You may place some data flow transformations between the source and destination, but ultimately the goal is the same: moving data from one data source to another.
You may also encounter a situation in which you need to import data from multiple text files into one single destination. You would typically do this by using a Foreach Loop container on the control flow, as in Figure. To configure the container, select Foreach File Enumerator in the drop-down list labeled Enumerator, then specify a file path in the text box labeled Folder. Next, click Variable Mappings in the left navigation pane. When configuring a Foreach Loop container that will move through a list of files, you typically declare a variable that will hold each file name as the container loops through the list.
Select that variable from the column labeled Variable and accept the value of 0 for the column labeled Index. Click OK and the container is configured. Finally, you may be asked to move and rename the file(s) after the data has been imported.
This is often a method that is used to ensure that the data is not imported more than once. By double-clicking the File System task, you can configure source and destination locations.
You can also specify whether you want to move, copy, or rename the file. To use the task, begin by deciding on the operation you want to perform; do this by selecting the choice from the drop-down list labeled Operation. If you are trying to move and rename a file in one step, choose the Rename file option. Then you will decide on the destination and source locations. As the package loops over each file, it uses a data flow to import the data into a database and finally moves the file to a new location.
Then you should create a folder on your C: drive; in that folder, create two folders, one named DataFiles and the other Archive.

Step-by-Step

Open and execute Lesson7Create. Name the project BI Trainer Packages 7. Right-click the control flow designer and select Variables from the context menu.
Add a new string variable named DestinationFileLocation. Set the value to C: Add a new string variable named SourceFileLocation. Add a new string variable named SourceFileName. Drag a Foreach Loop container onto the control flow design surface.
Double-click the container. Select Collection from the left navigation pane. Ensure that Foreach File Enumerator is selected in the drop-down list labeled Enumerator. In the text box labeled Folder, type or paste the path to the DataFiles folder on your C: drive. Click Variable Mappings in the left navigation pane. Select User::SourceFileName from the column labeled Variable and accept the default of 0 in the column labeled Index.
Right-click in the Connection Managers section of the control flow and select New File Connection. The File Connection Manager Editor will appear. Paste or type the destination file path in the text box labeled File. Rename the file connection to Archive.
Right-click the file connection and select Properties. Locate Expression in the Properties window. Click the ellipsis.
In the Property column, select ConnectionString. Click the ellipsis in the column labeled Expression and enter the expression value. Then drag a Data Flow task into the Foreach Loop container.
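The expression value itself is not preserved in this excerpt. Given the variables declared earlier in this lesson, an expression along these lines would compute the archive destination at run time (the exact form in the book may differ):

```
@[User::DestinationFileLocation] + @[User::SourceFileName]
```

Because the expression is attached to the Archive connection's ConnectionString property, the connection points at a new file name on every iteration of the loop.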
Activate the Data Flow tab. Drag a flat file source onto the data flow designer. Type or paste C: Click Columns in the left navigation pane. Click OK twice. Double-click the destination and select dbo. Click Mappings in the left navigation pane.
Activate the Control Flow tab. Drag a File System task into the Foreach Loop container. Drag the precedence constraint from the Data Flow task onto the File System task. Double-click the File System task. Select Archive from the drop-down list labeled DestinationConnection. Select Source from the drop-down list labeled SourceConnection.
After executing the package, open the Archive folder on your C: drive. You will notice that the three files have been renamed and moved to this location. Please select Lesson 7 on the DVD to view the video that accompanies this lesson.
Before SSIS, building the logic to load a dimension would require thousands of lines of code. However, because SSIS includes a Slowly Changing Dimension Wizard in the data flow, many of the challenges have been removed from designing and creating the extraction, transformation, and loading of your dimension table. One specific advantage of the task is that these types can be set on a column-by-column basis.
A Type 0 dimension, which is specified as a fixed attribute, does not make any changes to the column. For example, you may want to ensure that all TaxIDs for a customer or vendor remain the same; therefore, you would set the column as a fixed attribute. A Type 1 dimension (changing attribute) will update the value of a column, but does not track the history of the change. Finally, a Type 2 dimension (historical attribute) will track column data changes.
For example, if the discount percentage for a promotion changes, and it was specified as a Type 2 column, the original row expires, and a new row with the updated data is inserted into the dimension table.
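The Slowly Changing Dimension Wizard generates this logic for you, but the Type 2 behavior described above boils down to an expire-and-insert pattern like the following (a simplified sketch; the table and column names are hypothetical, not from the book):

```sql
-- Expire the current row for the changed promotion...
UPDATE dbo.DimPromotion
SET RowEndDate = GETDATE()
WHERE PromotionAltKey = @PromotionAltKey
  AND RowEndDate IS NULL;

-- ...then insert a new current row carrying the updated discount.
INSERT INTO dbo.DimPromotion
    (PromotionAltKey, DiscountPct, RowStartDate, RowEndDate)
VALUES
    (@PromotionAltKey, @NewDiscountPct, GETDATE(), NULL);
```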
If you are just starting to get a handle on Microsoft BI tools, this course provides you with just the right amount of information to perform basic business analysis and reporting.
With this crucial resource, you will explore how this newest release serves as a powerful tool for performing extraction, transformation, and load (ETL) operations. A team of SQL Server experts deciphers this complex topic and provides detailed coverage of the new features of the product release. The authors present you with a new set of SSIS best practices based on years of real-world experience, and include case studies and tutorial examples to illustrate advanced concepts and techniques.
Please note that we have removed the version number from the database names. If you already have a versioned database like the AdventureWorksR2 database, though, this will work for the purposes of the demo.