środa, 17 października 2018

IDisposable monad (IUsable) - part 2

Previously I introduced the IUsable<T> monad, which can be used to compose computations in a safe way and guarantee that all the resources are disposed. The examples contained many calls to AsUsableOnce<T>, which amplified IDisposable types. In this post I will explain how we can add an additional operator to simplify that code.

We will start with the same example as before. Please notice the presence of calls to AsUsableOnce<T>. They were necessary because the SelectMany operator expected a Func<TOuter, IUsable<TInner>> selector and AsUsableOnce<T> was one of the ways to provide the required return type.

using System;

// NuGet
using Dapper;
using System.Data.SqlClient;
using Microsoft.Extensions.Logging.TraceSource;

namespace UsableRepositories
{
    public class OrdersRepository
    {
        const string deleteLines = "DELETE OrderLines WHERE OrderId = @Id";
        const string deleteOrder = "DELETE Orders WHERE OrderId = @Id";

        private readonly string connectionString;

        public OrdersRepository(string connectionString) =>
            this.connectionString = connectionString;

        public void Delete(Guid id) =>
            DeleteUsable(id).Value();

        public IUsable<int> DeleteUsable(Guid id) =>
            from trace in new TraceSourceScope("Execute: DeleteUsable").AsUsableOnce()
            from conn in new SqlConnection(connectionString).AsUsableOnce()
            from tran in conn.BeginTransaction().AsUsableOnce()
            select conn.Execute(deleteLines, id, tran)
                + conn.Execute(deleteOrder, id, tran);
    }
}

Knowing that IUsable<T> fits perfectly into scenarios that use IDisposable resources, we can optimize for this case. Just as we created the SelectMany operator to work with IUsable<T>, we can create a new operator dedicated to IDisposable. Since both operators have the same signature, we need to put them in separate classes.

public static class UsableDisposable
{
    public static IUsable<TResult> SelectMany<TOuter, TInner, TResult>(
        this IUsable<TOuter> outerUsable,
        Func<TOuter, TInner> innerDisposableSelector,
        Func<TOuter, TInner, TResult> resultSelector)
        where TInner : IDisposable
    {
        return new SelectManyDisposableUsable<TOuter, TInner, TResult>(
            outerUsable, innerDisposableSelector, resultSelector);
    }

    private class SelectManyDisposableUsable<TOuter, TInner, T> : IUsable<T>
        where TInner : IDisposable
    {
        private readonly IUsable<TOuter> source;
        private readonly Func<TOuter, TInner> collectionSelector;
        private readonly Func<TOuter, TInner, T> resultSelector;

        public SelectManyDisposableUsable(
            IUsable<TOuter> outerUsable,
            Func<TOuter, TInner> innerDisposableSelector,
            Func<TOuter, TInner, T> resultSelector)
        {
            this.source = outerUsable;
            this.collectionSelector = innerDisposableSelector;
            this.resultSelector = resultSelector;
        }

        public TResult Use<TResult>(Func<T, TResult> func)
        {
            return source.Use(outer =>
            {
                using (var inner = collectionSelector(outer))
                {
                    return func(resultSelector(outer, inner));
                }
            });
        }
    }
}

The biggest differences from the operator introduced in the previous post are the selector signature, which accepts TInner directly, and the where constraint on the type parameter. Just as before, we had to introduce a type to hold the state (SelectManyDisposableUsable).

With this operator we can simplify original example to the following form:

using System;

// NuGet
using Dapper;
using System.Data.SqlClient;
using Microsoft.Extensions.Logging.TraceSource;

namespace UsableRepositories
{
    public class OrdersRepository
    {
        const string deleteLines = "DELETE OrderLines WHERE OrderId = @Id";
        const string deleteOrder = "DELETE Orders WHERE OrderId = @Id";

        private readonly string connectionString;

        public OrdersRepository(string connectionString) =>
            this.connectionString = connectionString;

        public void Delete(Guid id) =>
            DeleteUsable(id).Value();

        public IUsable<int> DeleteUsable(Guid id) =>
            from trace in new TraceSourceScope("Execute: DeleteUsable").AsUsableOnce()
            from conn in new SqlConnection(connectionString)
            from tran in conn.BeginTransaction()
            select conn.Execute(deleteLines, id, tran)
                + conn.Execute(deleteOrder, id, tran);
    }
}

There is still one call to AsUsableOnce<T> to enter the monad, but once we are there, we can compose it with IDisposable types without leaving it! Of course, as long as we stay in the monad we benefit from all its properties, which in this case means guaranteed disposal of all the resources.

What is a little surprising, and amazing at the same time, is that the C# compiler can correctly pick the right extension method for each composition pattern. When you compose two usables, it selects one operator, but when you compose a usable with an IDisposable it selects another. There is one quirk: when you have a class that implements IUsable<T>, the compiler might still want to select the IDisposable version. This can easily be solved by adding an as IUsable<T> cast to force the right operator. The same trick also works if somebody decides to implement both interfaces on the same class.
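
For example, here is a hedged sketch of that cast inside a repository like the one above. UsableOrderSource is a hypothetical class that implements IUsable<SqlConnection>; without the cast the compiler could try to bind to the IDisposable-based operator, and the cast forces the usable-based one.

public IUsable<int> CountOrdersUsable() =>
    from trace in new TraceSourceScope("Execute: CountOrders").AsUsableOnce()
    from conn in (new UsableOrderSource(connectionString) as IUsable<SqlConnection>)
    select conn.ExecuteScalar<int>("SELECT COUNT(*) FROM Orders");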

The operators introduced in the previous post and this one are by far the most important for working with IUsable<T>. In the next post I will introduce other operators that can further simplify the code in specific scenarios.

References

  1. It is not overengineering - IDisposable monad (IUsable) - part 1
  2. Usable Extensions - Source code

IDisposable monad (IUsable) - part 1

Introduction

I've always said that C# is my favorite functional programming language. It is not a functional-first language so you have to be careful, but the results are worth it, as FP code tends to be clearer and more maintainable. Recently, I discovered that a new book had been published on the topic of FP in C# - "Functional Programming in C#: How to write better C# code" by Enrico Buonanno. I learned about it from a newsletter listing recommended books on Safari Books Online. This was in fact a great recommendation!

Functional Programming in C#: How to write better C# code

I would recommend this book to everyone who writes software in C#. Even if it doesn't turn you into an FP enthusiast, it will definitely change the way you think about certain problems and make you a better C# programmer. It is full of examples, patterns and references to complete source code hosted on GitHub.

The chapter I enjoyed the most was "Chapter 11. Lazy computations, continuations, and the beauty of monadic composition". I first learned how to use monads in C# from an amazing, 13-part series of blog posts by Eric Lippert. In his book, Enrico offered a slightly different approach. Instead of focusing on the theory, he prepared many examples that guide the reader through a series of problems and solutions to finally land on a beautifully composed program. One of these examples was the Middleware<T> monad, which is similar to continuation, but adds behavior both before and after a given function. This was a new pattern to me and very intriguing!

The requirement to define behavior before and after some operation is very common. Every time you work with resources that have to be released (e.g. streams) you need to write setup and teardown code. The Middleware<T> delegate is defined as a higher-order function which accepts a function to act on the resource/state (this pattern was first introduced in the book in section 1.5.1, "Encapsulating setup and teardown into a HOF"). When working with Middleware<T>, callers specify only the function they want to apply to the resource, but they have no control over how the resource is created or how it is released. This is a very powerful pattern because you cannot get it wrong. No matter what you do, the teardown code will be executed.
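
As a minimal sketch of that setup/teardown HOF idea (using a System.IO file stream rather than the book's types), the caller supplies only the function to apply; acquisition and disposal stay inside the helper, so the teardown always runs:

public static TResult WithFile<TResult>(string path, Func<FileStream, TResult> func)
{
    using (var stream = File.OpenRead(path))
    {
        return func(stream);
    }
}

// Usage: the caller never controls how the stream is created or released.
// long length = WithFile(@"C:\temp\data.bin", stream => stream.Length);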

Another great thing about Middleware<T> is that you can compose instances to form data pipelines. For example, you could combine a "stopwatch" middleware with an "open database connection" middleware and an "open transaction scope" middleware to define a pipeline for working with databases in your application. I recommend reading the book for other great examples.

Problem statement

In the book Middleware<T> was defined as the following delegate with a set of extension methods.

public delegate dynamic Middleware<T>(Func<T, dynamic> cont);

public static class Middleware
{
    public static T Run<T>(this Middleware<T> mw) => mw(t => t);

    // [...]

    public static Middleware<RR> SelectMany<T, R, RR>
        (this Middleware<T> @this, Func<T, Middleware<R>> f, Func<T, R, RR> project)
        => cont => @this(t => f(t)(r => cont(project(t, r))));
}

There is one important element in this definition worth discussing in detail - Middleware<T> accepts and returns dynamic! Why is this a big deal? Remember when I praised this pattern for making sure you cannot get it wrong? While it still guarantees that the teardown code executes, with dynamic there are other traps you may fall into. The author also identified this as a problem:

The problem is that T (the input to the continuation) and R (its output) are not known at the same time. For example, suppose you want to create a Middleware instance from a function such as Connect, which has signature

public static R Connect<R>(ConnectionString connString, Func<SqlConnection, R> func)

The continuation accepted by Connect takes a SqlConnection as input, so we can use Connect to define a Middleware<SqlConnection>. That means the T type variable in Middleware<T> resolves to SqlConnection, but we don't yet know what the given continuation will yield, so we can't yet resolve the R type variable in Connect<R>.

Unfortunately, C# doesn't allow us to "partially apply" type variables; hence, dynamic. So although conceptually we're thinking of combining HOFs of this type

(T → R) → R

we’re modeling them as follows:

(T → dynamic) → dynamic

Later you'll see that you can still work with Middleware without compromising on type safety.

~Functional Programming in C#: How to write better C# code

After I read this section, it stayed in my head for a long time. Although Enrico explained how to alleviate this problem with the Run<T> method, I couldn't stop thinking about it. C# doesn't allow us to "partially apply"... Does it mean that the C# type system isn't flexible enough to define Middleware<T>? What a great riddle!

I decided to run some experiments and try to answer the following question:

Is it possible to define Middleware<T> in C# in a type-safe manner without falling back to dynamic?

Solution

I started by trying to understand the bigger picture of what problem Middleware<T> is supposed to solve. It defines a resource of type T and accepts a function to apply to the resource. This function can return an arbitrary type, which is unknown at the time when Middleware<T> is created. This coupling is the main problem. As Enrico correctly pointed out, C# doesn't allow us to "partially apply" type parameters, so solving this problem requires splitting the generic parameters <T, TResult> across two constructs. It is impossible to do with a single delegate, but C# has much more to offer. After all, it is an object-oriented language.

Interface

When I was looking for a way to represent Middleware<T> as an object, I found an elegant solution which I will call IUsable<T> to distinguish it from the middleware.

public interface IUsable<T>
{
    TResult Use<TResult>(Func<T, TResult> func);
}

In its definition, IUsable<T> has two type parameters: T for the resource type, declared on the interface, and TResult for the result of the computation, declared on the Use method. This separation is the key to the partial application of type parameters. When you create an instance of IUsable<T> you specify only one type parameter (the partial application), and when you want to use the resource with Use<TResult> you specify the second type parameter. Pretty simple when you think about it.

For this interface to follow the monadic composition pattern, I also need to define a few additional operators.

Amplification

First of all, we need a way to go from a regular value to IUsable<T> (amplify the type). This can be achieved with the following extension methods: AsUsable<T> and AsUsableOnce<T>. I decided to create both options because it is very common to use an IDisposable resource just once. Since there is a disconnect between the time a usable is instantiated and the time when the value is actually used, I created specialized classes to keep the state.

public static partial class Usable
{
    public static IUsable<T> AsUsable<T>(this T value) =>
        new ValueUsable<T>(value);

    public static IUsable<T> AsUsableOnce<T>(this T value)
        where T : IDisposable =>
        new UsableOnce<T>(value);

    internal class ValueUsable<T> : IUsable<T>
    {
        private readonly T value;

        public ValueUsable(T value) =>
            this.value = value;

        public TResult Use<TResult>(Func<T, TResult> func) =>
            func(this.value);
    }

    internal class UsableOnce<T> : IUsable<T>
        where T : IDisposable
    {
        private readonly T value;

        public UsableOnce(T value) =>
            this.value = value;

        public TResult Use<TResult>(Func<T, TResult> func)
        {
            using (this.value)
            {
                return func(value);
            }
        }
    }
}
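
A minimal usage sketch (using the non-disposing AsUsable<T> path so the value can be used more than once): the resource type T is fixed when the usable is created, while TResult is chosen independently at each Use<TResult> call site.

IUsable<DateTime> usableNow = DateTime.UtcNow.AsUsable();

string iso = usableNow.Use(now => now.ToString("o")); // TResult = string
int year   = usableNow.Use(now => now.Year);          // TResult = int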

Composition

Second, we need a way to compose two usables. Since this post is about C#, I will demonstrate an implementation of SelectMany with three type parameters. Although it is not the signature of the canonical bind operator, this is the code you want in your application, as this method unlocks the power of composition with the LINQ expression language. Similarly to the amplification section, here I also decided to create a specialized inner class to save the usable and selector delegates for the time when the usable is used (lazy evaluation).

public static partial class Usable
{
    public static IUsable<TResult> SelectMany<TOuter, TInner, TResult>(
        this IUsable<TOuter> outerUsable,
        Func<TOuter, IUsable<TInner>> innerUsableSelector,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        return new SelectManyUsable<TOuter, TInner, TResult>(
            outerUsable, innerUsableSelector, resultSelector);
    }

    internal class SelectManyUsable<TOuter, TInner, T> : IUsable<T>
    {
        private readonly IUsable<TOuter> outerUsable;
        private readonly Func<TOuter, IUsable<TInner>> innerUsableSelector;
        private readonly Func<TOuter, TInner, T> resultSelector;

        public SelectManyUsable(
            IUsable<TOuter> outerUsable,
            Func<TOuter, IUsable<TInner>> innerUsableSelector,
            Func<TOuter, TInner, T> resultSelector)
        {
            this.outerUsable = outerUsable;
            this.innerUsableSelector = innerUsableSelector;
            this.resultSelector = resultSelector;
        }

        public TResult Use<TResult>(Func<T, TResult> func)
        {
            return outerUsable.Use(outerScope =>
            {
                return innerUsableSelector(outerScope).Use(innerScope =>
                {
                    return func(resultSelector(outerScope, innerScope));
                });
            });
        }
    }
}
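
To make the connection to the query syntax explicit, here is a sketch of what a two-resource LINQ expression desugars to - a single SelectMany call with a result selector. Dapper's Execute extension and a connectionString parameter are assumed, as in the repository example later in this post.

public static IUsable<int> DeleteOrderUsable(string connectionString, Guid id) =>
    new SqlConnection(connectionString).AsUsableOnce().SelectMany(
        conn => conn.BeginTransaction().AsUsableOnce(),
        (conn, tran) => conn.Execute(
            "DELETE Orders WHERE OrderId = @Id", new { Id = id }, tran));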

Extraction

Finally, we need a way to extract the value from the usable (exit the monad). One natural option is to actually use the usable and calculate the result. I decided, however, to add a helper method to easily extract the value from a usable. Please notice that it applies the identity function and returns the resource outside of the Use<TResult> scope. This is potentially risky behavior, as one could try to extract a disposable resource just to discover that it has already been disposed.

public static partial class Usable
{
    public static T Value<T>(this IUsable<T> usable)
    {
        return usable.Use(value => value);
    }
}
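
For example, here is a sketch of the risk described above, using the same types as the repository example below (a connectionString variable is assumed):

// AsUsableOnce wraps the connection in a using block inside Use, so by the
// time Value() returns it, Dispose has already been called.
SqlConnection conn = new SqlConnection(connectionString)
    .AsUsableOnce()
    .Value();

// Any attempt to open or query conn here fails - the resource escaped the
// Use<TResult> scope only after it was disposed.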

That's it! These are the elements we need to replicate the behavior of Middleware<T> in a type-safe way. Now, let's see how to use it all in an application.

Example

To demonstrate how to use IUsable<T>, I will use the same example that was introduced in the book. It shows how to build a repository (Repository pattern) to access Orders stored in a SQL database. We will focus on the Delete method, which should have the following properties:

  • Delete operation executes in the TraceSourceScope to keep the logs clean.
  • When we delete orders we also need to delete order lines within the same transaction.
  • All disposable resources are released.

The following code satisfies these requirements. Please notice how DeleteUsable was implemented. It is a single LINQ expression which combines a trace, a connection and a transaction to create the right scope for executing SQL statements. All the resources are amplified to IUsable<T> with AsUsableOnce<T>, which guarantees that they are going to be released. The computation in select combines two steps. Finally, there is the Delete method to simplify the interface in case callers don't need to compose the delete operation with other operations in the system.

In the next post I will show how we can introduce new SelectMany operator to remove AsUsableOnce calls.

using System;

// NuGet
using Dapper;
using System.Data.SqlClient;
using Microsoft.Extensions.Logging.TraceSource;

namespace UsableRepositories
{
    public class OrdersRepository
    {
        const string deleteLines = "DELETE OrderLines WHERE OrderId = @Id";
        const string deleteOrder = "DELETE Orders WHERE OrderId = @Id";

        private readonly string connectionString;

        public OrdersRepository(string connectionString) =>
            this.connectionString = connectionString;

        public void Delete(Guid id) =>
            DeleteUsable(id).Value();

        public IUsable<int> DeleteUsable(Guid id) =>
            from trace in new TraceSourceScope("Execute: DeleteUsable").AsUsableOnce()
            from conn in new SqlConnection(connectionString).AsUsableOnce()
            from tran in conn.BeginTransaction().AsUsableOnce()
            select conn.Execute(deleteLines, id, tran)
                + conn.Execute(deleteOrder, id, tran);
    }
}

Performance analysis

We've seen how IUsable<T> can help with writing succinct code, but did we have to sacrifice performance to achieve it? To investigate the impact on performance, I prepared a benchmark which compares the classical using statement, Middleware<T> and IUsable<T> in a typical scenario where you want to execute some operation in the context of three disposable resources. I selected TraceSourceScope as the resource because it is super fast, thus the benchmark should highlight the cost of composition rather than the cost of acquiring and releasing resources.

The benchmark was defined with BenchmarkDotNet, which is a great benchmarking library/runner for .NET.

public class UsableBenchmark
{
    [Params(1000)]
    public int N;

    [Benchmark]
    public void UsingStatement()
    {
        using (var s1 = new TraceSourceScope(1))
        using (var s2 = new TraceSourceScope(10))
        using (var s3 = new TraceSourceScope(100))
        {
            Trace.WriteLine(s1.Operation + s2.Operation + s3.Operation);
        }
    }

    [Benchmark]
    public void Middleware()
    {
        var middleware =
            from s1 in TraceSource(1)
            from s2 in TraceSource(10)
            from s3 in TraceSource(100)
            select s1.Operation + s2.Operation + s3.Operation;

        Trace.WriteLine(middleware.Run());
    }

    [Benchmark]
    public void Usable_AsUsableOnce()
    {
        var usable =
            from s1 in new TraceSourceScope(1).AsUsableOnce()
            from s2 in new TraceSourceScope(10).AsUsableOnce()
            from s3 in new TraceSourceScope(100).AsUsableOnce()
            select s1.Operation + s2.Operation + s3.Operation;

        Trace.WriteLine(usable.Value());
    }

    [Benchmark]
    public void Usable_CustomUsable()
    {
        var usable =
            from s1 in new UsableTraceSourceScope(1)
            from s2 in new UsableTraceSourceScope(10)
            from s3 in new UsableTraceSourceScope(100)
            select s1.Operation + s2.Operation + s3.Operation;

        Trace.WriteLine(usable.Value());
    }


    // [...]
}
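
The benchmark can be executed with the standard BenchmarkDotNet runner; a minimal sketch of the entry point:

using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args) =>
        BenchmarkRunner.Run<UsableBenchmark>();
}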

After running this benchmark for a couple of minutes I received the following result. It was full of surprises!

Method              | N    | Mean     | Error    | StdDev
UsingStatement      | 1000 | 765.5 us | 7.996 us | 7.089 us
Middleware          | 1000 | 621.4 us | 8.591 us | 7.615 us
Usable_AsUsableOnce | 1000 | 623.0 us | 8.354 us | 7.406 us
Usable_CustomUsable | 1000 | 622.0 us | 8.825 us | 7.823 us

First of all, there was no difference between Middleware<T> and IUsable<T>. This was great news because it means that the extra state required to implement IUsable<T> balances the impact of dynamic in Middleware<T>. Second, a very practical observation was that there was no difference between composing usables with AsUsableOnce and creating a custom class. Finally, both Middleware<T> and IUsable<T> showed a noticeable improvement over the classical using statement. This one is hard for me to trust, but I haven't spotted anything wrong in the benchmark that would make me reject this result, so I decided to keep it here.

Conclusions

Through this post I wanted to demonstrate that it is possible to implement the Middleware<T> pattern in a type-safe way without falling back to dynamic. The solution involved combining object-oriented code with functional programming concepts to work around the limitations of the C# type system. Although IUsable<T> was defined as an interface and not a delegate, I could define the necessary operators to fully implement the monad pattern and enable composition through LINQ expressions.

One could ask why go through all this effort to implement this pattern when you can simply use the using statement to manage the lifetime of your resources. I'll answer with a quotation from Enrico's book because I think he captured it well.

One fundamental difference between the functional and imperative style is that imperative code relies on statements; functional code relies on expressions. In essence, expressions have a value; statements don’t. While expressions such as function calls can have side effects, statements only have side effects, so they don’t compose.

~Functional Programming in C#: How to write better C# code; section 5.5.1. Expressions vs. statements

Yes, it is all about side effects and composition! Next time I will show how we can add a few specialized operators to help us with IDisposable composition. This will allow us to skip the AsUsableOnce calls used throughout the examples in this post.

PS. I'm still looking for a good name for the library. So far I have selected "usable" to emphasize the relationship to the using statement, because cleaning up (disposing) resources is one of the most important use cases for IUsable<T>. If you have better ideas, please leave them in the comments section.

References:

  1. Functional Programming in C#: How to write better C# code - Amazon
  2. Functional Programming in C#: How to write better C# code - Code samples
  3. Fabulous adventures in coding - Monads
  4. Middleware - Code definition
  5. Middleware - Code examples
  6. Usable Extensions - Source code
  7. BenchmarkDotNet - Powerful .NET library for benchmarking

piątek, 8 grudnia 2017

How to create dynamic OData query in Power BI?

Power BI allows you to import data from any service that exposes an OData endpoint. You can either select "OData feed" in the "Get Data" window or go directly to the Power Query editor and use the OData.Feed function. Although the documentation doesn't explain it clearly, this function can operate in two different modes: you can pass either the URL of the feed or a complete OData query. The latter option is particularly interesting as it allows us to build dynamic queries; however, there is a price!

Let's start with a simple example that loads work items from Visual Studio Team Services using the recently announced OData endpoint [2]. In the example I load all the work items that have been created in the last 7 days. Please notice that I use the DateTimeZone.UtcNow function to get the current date, subtract 7 days, and convert the result to the ISO 8601 format.

let
    since = DateTimeZone.UtcNow() - #duration(7, 0, 0, 0),
    Source = OData.Feed("https://stansw.analytics.visualstudio.com/_odata/WorkItems?" 
        & "$filter=CreatedDate lt " & DateTimeZone.ToText(since, "yyyy-MM-ddThh:mm:ssZ")
        & "&$select=WorkItemId, Title"),
    #"Removed Other Columns" = Table.SelectColumns(Source,{"WorkItemId", "Title"})
in
    #"Removed Other Columns"

This code will work great in Power BI Desktop, but when you publish it, you won't be able to schedule refresh, which defeats the purpose of publishing. Why would you share stale data?

The problem is the string concatenation. Power BI refuses to detect the host of a dynamic URL, and thus it cannot figure out which credentials should be applied to the dataset.

In order to fix it, we need to switch to the other mode and pass just the feed URL. This might not work for an arbitrarily complex OData query, but it should be sufficient for the majority of real-world queries.

The solution boils down to starting with the feed and relying on the Power Query feature called "query folding" to translate function calls into operations natively supported by the data source. In our case we are interested in the functions Table.SelectRows and Table.SelectColumns, which can be folded. Please notice that not every function can be folded! A good example is DateTimeZone.UtcNow, which has to be called upfront so that only its result is passed to the filtering expression.

let
    since = DateTimeZone.UtcNow() - #duration(7, 0, 0, 0),
    Source = OData.Feed("https://stansw.analytics.visualstudio.com/_odata"),
    WorkItems_table = Source{[Name="WorkItems", Signature="table"]}[Data],
    #"Filter Rows" = Table.SelectRows(WorkItems_table, each [CreatedDate] < since),
    #"Select Columns" = Table.SelectColumns(#"Filter Rows",{"WorkItemId", "Title"})
in
    #"Select Columns"

During execution, this code will be translated to the following OData query, which is exactly what we wanted.

https://stansw.analytics.visualstudio.com/_odata/WorkItems?
  $filter=CreatedDate lt 2017-12-01T15:54:56.8747859Z
  &$select=WorkItemId,Title

If you publish the dataset to Power BI, the data source will be correctly detected and after you specify credentials you will be able to refresh the data and define a schedule.

I showed how to apply this solution to the Visual Studio Team Services feed. It is, however, general and will work for your feeds as well.

References

  1. Power Query M function reference - OData.Feed
  2. VSTS Analytics OData now publicly available
  3. Date and time format - ISO 8601

czwartek, 7 grudnia 2017

Syntax highlighting for DAX

I'm not sure if you noticed, but one of my previous posts had a code snippet in DAX, and apart from being nicely formatted with DAX Formatter, its syntax was highlighted as well. In order to create that snippet I added DAX support to highlight.js, the amazing library I use to build my blog.

With this contribution you can format your DAX code as well.

MaxLength =
VAR Separator = "; "
RETURN
    MAXX (
        'Work Items - Today',
        1 + LEN ( [Tags] )
            - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
    )

The pull request to the main project is still pending, so you might need to cherry-pick my commit and build the library yourself.

https://github.com/isagalaev/highlight.js/pull/1560

Apart from adding it to the main project, I've also contributed it to the fork which is used by Microsoft in their documentation pipeline. Using this fork is another way to enable this feature on your websites.

https://github.com/DuncanmaMSFT/highlight.js

Hopefully all pull requests will be completed soon and everyone will be able to use it by default.

References:

  1. It is not overengineering! - How to filter multivalued column in Power BI?
  2. DAX Formatter
  3. highlight.js

sobota, 22 lipca 2017

How to migrate VSTS reports from Excel to Power BI?

Disclaimer: I'm the author of the Open in Power BI extension.

If you are using Visual Studio Team Services and you want to see the status of your project in the form of Power BI reports, you are probably using the great VSTS Open in Excel extension that has been available for years now. It allows you to load the results of your queries into Excel. It has been very popular among Power BI enthusiasts because of the great integration between Power BI and Excel.

I've talked to many people who set up elaborate data pipelines which pump data from VSTS into Excel and then into Power BI Service so that it can power rich reports. The most typical problem with this approach is how to trigger the data refresh operation. Power BI can handle refresh operations provided that you use one of the supported data sources. Unfortunately, as you can read on the "VSTS Open in Excel" extension page, it requires "Visual Studio 2017" or "Team Foundation Server Office Integration 2017" to run. This makes it impossible to use directly in the Power BI Service. You can work around this problem by manually refreshing the Excel workbook and publishing the updated data. In this post I will briefly describe what the Open in Power BI extension does and how it can help in such scenarios.

The official documentation describes this extension as follows:

Use this extension to create stunning reports in Power BI based on the Work Item Queries saved in Team Services. You can configure your reports to be refreshed daily to make sure your reports are always up to date. When you modify your Query by adding extra filters or columns these changes will be automatically reflected in Power BI upon next refresh.

What is particularly interesting is that with this extension you configure data refresh in Power BI Service. This is possible because it was built on top of Power Query functions which are part of the Visual Studio Team Services Power BI Data Connector [3]. These functions have been available in the Service and in the Desktop since January 9, 2017. Let's see how we can use the extension in our scenario.

Problem statement

Given an existing report built on top of the VSTS Open in Excel extension, migrate it to Open in Power BI, publish the dataset to Power BI Service, and configure refresh.

Solution

First, let's look into how one could build a reporting pipeline on top of the VSTS Open in Excel extension. As always, it all starts with a WIQL query. This time it is a "Flat" query, which is a flat table of work items (I like to be creative with names).

Create reports based on "VSTS Open in Excel" extension

I can click the "Open in Excel" button and the query result will appear in Excel. Please notice that the columns have the right names and types - neat!

I don't need to make any changes here. I can simply save the spreadsheet locally and switch to Power BI Desktop to import the data.

I can do it by selecting Get Data and then Excel.

This will show me all the tables I can import. Please notice that there are two tables I can choose from: VSTS_e394e0b4... and Sheet1. This is because when you open a query in Excel, the data is imported into a table embedded in a spreadsheet. You can refer to it either way, but VSTS_e394e0b4... is a much better option. It will fetch not only the data but also metadata such as column names and types. As a result, the Power Query code will be cleaner.

For my query this produced the following result. Column names were imported correctly, while types had to be set explicitly in code. This code was automatically generated and it got the types almost right - only Tags is set to type any instead of type text. This column had no data, thus Power BI failed to detect its type. I would need to manually set it to text if I wanted to perform some transformations later, as sketched below.
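
A minimal Power Query M sketch of that manual fix (the #table step is a hypothetical stand-in for the imported table; in the real query it would be the #"Changed Type" step shown below):

let
    Source = #table({"ID", "Tags"}, {{1, null}}),
    #"Fixed Tags Type" = Table.TransformColumnTypes(Source, {{"Tags", type text}})
in
    #"Fixed Tags Type"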

In order to make this migration more realistic, I will customize the query. One of the popular customizations I've seen in the past is splitting Area Path into multiple level columns. These columns make it easy to create roll-up reports. I've updated the Power Query code accordingly.

let
    Source = Excel.Workbook(File.Contents("C:\Sources\example.xlsx"), null, true),
    VSTS_e394e0b4_9fbb_4489_b0ea_f38c7103f8bb_Table = Source{[Item="VSTS_e394e0b4_9fbb_4489_b0ea_f38c7103f8bb",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(VSTS_e394e0b4_9fbb_4489_b0ea_f38c7103f8bb_Table, {
        {"ID", Int64.Type}, 
        {"Work Item Type", type text}, 
        {"Title", type text}, 
        {"Assigned To", type text}, 
        {"State", type text}, 
        {"Tags", type any}, // Type detection failed for Tags!
        {"Area Path", type text}}),

    // -- Customization -------------------------------------------------------
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type","Area Path",
        Splitter.SplitTextByDelimiter("\", QuoteStyle.Csv),
        {"Area Path.1", "Area Path.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter", {
        {"Area Path.1", type text}, 
        {"Area Path.2", type text}})
in
    #"Changed Type1"

After loading the dataset I can publish it to Power BI Service, but there are some problems with scheduling refresh as you can see in the screen below.

First, you need to install a Data Gateway because Power BI Service cannot access the path C:\Sources\example.xlsx. It is also worth noticing that the gateway has to run on the same machine where the file is stored and you cannot move the file; otherwise the refresh operation will fail. Second, if you modify the query and add more columns, they will not show up in the dataset. Finally, refresh will only fetch data that is already present in the spreadsheet. This means that unless you manually open the Excel file and press the Refresh button in the Team toolbar, you will end up with stale data.

There are some workarounds, such as saving the workbook to SharePoint and scripting the refresh operation, but they are more complicated than what an average user could set up.

Create reports based on "Open in Power BI" extension

Right now we should have a pretty good understanding of how one can use Excel to build reports in Power BI. Let's see how to achieve a similar result with the Open in Power BI extension.

Just as before, I start with the query result page where I can find the Open in Power BI button. This time, instead of opening an intermediate app (Excel), it launches Power BI Desktop straight away with a dataset ready to refresh.

When I open the Query Editor I can immediately see some differences. Probably the biggest one is that there are two queries: Work Items and Functions. The latter is darker because it is not a table; it is a Power Query module that exports functions used in the other queries.

I will ignore Functions and look into how I can modify the Work Items query to apply the same customization as before. The query text is shorter and arguably looks cleaner. It starts with the definition of parameters, and then Functions[WiqlRunFlatQueryById]() is used to fetch the table. Column names and types are the same, so I can use my old code directly. The only thing that I need to update is the name of the variable that is passed to the Table.SplitColumn function. Previously it was Table.SplitColumn(#"Changed Type", ...) and now it is Table.SplitColumn(Source, ...). That's it, no other changes!

let
    url = "https://stansw.visualstudio.com", 
    project = "vsts-open-in-powerbi",
    id = "d5349265-9c9d-4808-933a-c3d27b731657",

    Source = Functions[WiqlRunFlatQueryById](url, [Project = project], id),

    // -- Customization -------------------------------------------------------
    #"Split Column by Delimiter" = Table.SplitColumn(Source,"Area Path",
        Splitter.SplitTextByDelimiter("\", QuoteStyle.Csv),
        {"Area Path.1", "Area Path.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter", {
        {"Area Path.1", type text}, 
        {"Area Path.2", type text}})
in
    #"Changed Type1"

Now I can publish the dataset to Power BI Service and inspect the settings page. The first thing I notice is in the Gateway connection section - Connect directly is the only available option. I do not need to install anything on my machine to get the refresh going!

The Data source credentials section is equally interesting. There is one entry: Visual Studio Team Services. Because the Open in Power BI extension is built on top of functions from the official Data Connector, I can use the standard authentication mechanism [3].

Finally, I can expand the Scheduled refresh section and define how often I want my dataset to be refreshed.

Conclusions

The Open in Power BI extension was created with Power BI scenarios in mind, thus it is much easier to use than building a pipeline with VSTS Open in Excel. At the end of the day both solutions give me a table, so any customization I create in Power Query for one solution can easily be moved to the other. The strongest advantage of the Open in Power BI approach is that it works straight away in Power BI Service without a gateway.

References:

  1. Visual Studio Team Services Marketplace - VSTS Open in Excel
  2. Visual Studio Team Services Marketplace - Open in Power BI
  3. Visual Studio Team Services - Functions available in Power BI Data Connector

niedziela, 2 lipca 2017

How to filter multivalued column in Power BI?

Recently someone asked me what the best way is to filter on a multivalued column in Power BI. The question was in the context of the Tags property of the Work Items - Today table in Visual Studio Team Services [1]. Although the original question was very specific, the solution I came up with can be generalized to multivalued columns in any data model.

What is a multivalued column?

A multivalued column is a database design pattern where, instead of normalizing and splitting data across multiple tables, you keep multiple values in a single column. You typically see it in data warehouses where normalization would lead to overly granular fact tables. One of the best examples is a Categories column in a Product table, where you want to allow users to select multiple values but you don't want to create a separate Categories table.

In the context of Visual Studio Team Services there is one table where this pattern was applied - Work Items - Today. It contains the Tags column, which is a "; "-delimited list of tags, like in the example below.

Work Item Id | Title                                | Tags
1            | Add column A to table B              | database; milestone1
2            | Create migration script              | database
3            | Improve performance of slow queries  | performance; database
...          | ...                                  | ...

Problem statement

Given a table with a multivalued column, prepare a data model that will allow users to easily filter on distinct values.

For example, we can start with the table below, which has a multivalued Tags column.

let
    Source = #table(
        {"Work Item Id", "Title", "Tags"},
        {
            { "1", "Add column A to table B", "database; milestone1" },
            { "2", "Create migration script", "database" },
            { "3", "Improve performance of slow queries", "performance; database" }
        })
in
    Source

If we simply selected Tags for the slicer, it would produce the following result. Instead of individual values, users could only select the combinations that appear in the dataset. That's not what we want.

A much better design is to extract distinct values from the Tags column so that we can build the following slicer.

Solution

The solution I would like to show you is based on a post by SQLJason where he talks about handling delimited rows [2]. I modernized and improved it a little to cover columns of arbitrary length and to avoid contaminating the model with auxiliary tables. The idea stays the same and can be broken down into the following steps.

  1. Create a temporary index table.
  2. Apply a CROSSJOIN operation and convert the source table from wide to long format.
  3. Define relationships.

In my previous post, "Creating index table in DAX", I explained how to create an index table for a given N. Here, N should be selected as the maximum number of elements in the multivalued column.

MaxLength =
VAR Separator = "; "
RETURN
    MAXX (
        'Work Items - Today',
        1 + LEN ( [Tags] )
            - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
    )

Now we can use this DAX expression to create the Indexes table.

Indexes =
VAR Separator = "; "
RETURN
    FILTER (
        SUMMARIZE (
            ADDCOLUMNS (
                CALENDAR (
                    DATE ( 2000, 1, 1 ),
                    DATE ( 2000
                        + MAXX (
                            'Work Items - Today',
                            1 + LEN ( [Tags] )
                                - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
                        ), 1, 1 )
                ),
                "Index", YEAR ( [Date] ) - 2000
            ),
            [Index]
        ),
        [Index] > 0
    )
Index
1
2
3

We do not need to store this table in our model. Instead, we can simply save it in a DAX variable and reuse it later.

The final expression consists of the following operations:

  1. Save separator in the variable.
  2. Create index table.
  3. Add TagsCount to the Work Items - Today table to keep track of the index range.
  4. Apply CROSSJOIN with the Indexes table.
  5. Filter out indexes that are outside of the range.
  6. Use PATHITEM to extract a single value from the multivalued field by index and save it in the Tag column.
  7. Summarize to reduce the set of columns in the output table.

The complete expression is shown below.
Tags =
VAR Separator = "; "
VAR Indexes =
    FILTER (
        SUMMARIZE (
            ADDCOLUMNS (
                CALENDAR (
                    DATE ( 2000, 1, 1 ),
                    DATE ( 2000
                        + MAXX (
                            'Work Items - Today',
                            1 + LEN ( [Tags] )
                                - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
                        ), 1, 1 )
                ),
                "Index", YEAR ( [Date] ) - 2000
            ),
            [Index]
        ),
        [Index] > 0
    )
RETURN
    SUMMARIZE (
        ADDCOLUMNS (
            FILTER (
                CROSSJOIN (
                    ADDCOLUMNS (
                        'Work Items - Today',
                        "TagsCount", 1
                            + ( LEN ( [Tags] ) - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) ) )
                                / LEN ( Separator )
                    ),
                    Indexes
                ),
                [Index] <= [TagsCount]
            ),
            "Tag", PATHITEM ( SUBSTITUTE ( [Tags], Separator, "|" ), [Index] )
        ),
        [Work Item Id],
        [Tag]
    )

It will produce the following result. This table captures the relationship between work items and tags.

Work Item Id | Tag
1            | database
2            | database
3            | performance
1            | milestone1
3            | database

Now we need to define relationships and specify the cross-filtering direction. The first task is easy because Power BI will most likely automatically detect the relationship, like in the example below.

The automatic relationship is a standard one-to-many relationship, which means that it will allow us to filter Tags based on the Work Items - Today selection. That is exactly the opposite of what we need.

Double-click on the relationship to open the advanced editor and under "Cross filter direction" select "Both".

Finally, create a new slicer with the Tag field from the newly created Tags table to get the best filtering experience! You can also try out the amazing Smart Filter custom visual, which fits this scenario perfectly.

References:

  1. Team Services & TFS - Available data tables in the Power BI Data Connector for Team Services
  2. Split a Delimited Row into Multiple Rows using DAX Queries
  3. It is not overengineering - Creating index table in DAX
  4. Custom visuals for Power BI - Smart Filter by OKViz
  5. SQLBI - Best Practices Using SUMMARIZE and ADDCOLUMNS
  6. Power BI Documentation - Calculated tables in Power BI Desktop

sobota, 1 lipca 2017

Creating index table in DAX

Sometimes, when you create advanced calculations in DAX, you need an auxiliary index table which contains increasing numbers, like in the example below.

Index
1
2
...

There are several ways one can approach this problem. Marco Russo described some of them for the general case of creating static tables in his article [1]. Here I would like to present a different approach which works great for the more specific problem of index tables.

Problem statement

Create a table with one column where rows contain increasing values from 1 to 5.

Index
1
2
3
4
5

Solution

In Power BI we can create such a table with Power Query or directly in DAX using calculated tables. In the first approach we would use the List.Numbers function (see the sketch below). The second approach is more challenging, as there is no such function in DAX. Luckily, there is the CALENDAR function, which can be used to generate a table!
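
A minimal sketch of the Power Query alternative mentioned above, using List.Numbers to generate the values 1 to 5 and convert them into a single-column table:

let
    Source = List.Numbers(1, 5),
    IndexTable = Table.FromList(Source, Splitter.SplitByNothing(), {"Index"})
in
    IndexTable

The rest of this post focuses on the DAX approach built around CALENDAR.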

IndexTable =
VAR Length = 5
VAR DummyDate = DATE ( 2000, 1, 1 )
RETURN
    CALENDAR ( DummyDate, DummyDate + Length )
Date
2000-01-01 00:00:00
2000-01-02 00:00:00
2000-01-03 00:00:00
2000-01-04 00:00:00
2000-01-05 00:00:00

Now we just need to clean it up by converting the dates to numbers, adding a new Index column, and removing the original Date column [4].

IndexTable =
VAR Length = 5
VAR DummyDate = DATE ( 2000, 1, 1 )
RETURN
    FILTER (
        SUMMARIZE (
            ADDCOLUMNS (
                CALENDAR ( DummyDate, DummyDate + Length ),
                "Index", INT ( [Date] ) - INT ( DummyDate )
            ),
            [Index]
        ),
        [Index] > 0
    )

References:

  1. Create Static Tables in DAX Using the DATATABLE Function
  2. Power Query M function reference - List.Numbers
  3. DAX Function Reference - CALENDAR
  4. Best Practices Using SUMMARIZE and ADDCOLUMNS
  5. Calculated tables in Power BI Desktop