Some time ago we wrote about the Excel link and data import of our platform (see The Excel Link). In this article, we would like to take a closer look at the technical details of data import and at how the data is stored afterwards. As already mentioned, Excel is widespread and used in many companies anyway. To some extent, the datasets are already available in Excel; in other cases they can simply be exported to Excel. This is why we decided to base our data import on Excel.
From data import…
Of course, there are cases in which data is automatically imported overnight, but that's not what this is all about. We want to import data during our daily work without having to go and fetch a coffee or even take a lunch break while it's in progress. In many cases, datasets have to be imported over and over again because they keep changing or expanding. That is why import must be fast enough not to disrupt the normal working routine. We have managed it – like this:
We use key columns to identify datasets. If the key columns of a row in OPTANO correspond to those of a row in the Excel file, then we have the same dataset. In other words, only the data which varies must be imported, and the rest can be kept. Naturally, datasets which are not yet available will be imported completely. So instead of deleting everything and replacing it with new data, only what wasn't available is actually imported. A wonderful side effect of this procedure is that the data object remains the same, so all references to this object remain valid – and it is a great deal faster.
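The idea above can be sketched in a few lines. This is a minimal illustration, not OPTANO's actual implementation: the function name and the dict-based data model are our own, and the real product works on typed .NET objects. The point it demonstrates is the same, though: rows are matched by their key columns, only changed values are written, missing rows are inserted, and existing objects are mutated in place so references to them survive.

```python
def diff_import(existing, incoming, key_cols):
    """Merge incoming rows into existing rows, keyed by key columns.

    existing/incoming: lists of dicts; key_cols: tuple of column names.
    Mutates matching rows in place (object identity is preserved) and
    returns (inserted, updated, unchanged) counts.
    """
    # Index the current data by its key columns for O(1) lookup.
    index = {tuple(row[k] for k in key_cols): row for row in existing}
    inserted = updated = unchanged = 0
    for row in incoming:
        key = tuple(row[k] for k in key_cols)
        target = index.get(key)
        if target is None:
            existing.append(dict(row))  # dataset not present yet: import it
            inserted += 1
        elif any(target[c] != row[c] for c in row if c not in key_cols):
            for c in row:               # only the values that differ are written
                if c not in key_cols:
                    target[c] = row[c]
            updated += 1
        else:
            unchanged += 1              # identical dataset: nothing to do
    return inserted, updated, unchanged
```

For example, importing `[{"id": 1, "v": "a"}, {"id": 2, "v": "x"}, {"id": 3, "v": "c"}]` into `[{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]` with key `("id",)` leaves row 1 untouched, updates row 2 in place, and inserts row 3.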
But what use is the fastest data import if it takes ages to store the data afterwards? Storing via the Entity Framework took too long in our opinion, so we invested a lot of time and energy into making sure that fast storage follows fast data import. We are now placing our bets on table-valued parameters in combination with so-called stored procedures. Here, instead of countless SQL statements, a single table is built in SQL Server memory. Based on this table, stored procedures then store the data. The result is fewer connections and shorter round-trip times – storing goes by in no time! (Anybody who wants to learn more can read about this at: https://msdn.microsoft.com/de-de/library/bb510489.aspx.)
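The performance argument is really about round trips, and that can be illustrated without a database at all. The sketch below is purely a model, not real client code: `FakeServer`, `execute_tvp` and `usp_SaveRows` are invented names that stand in for a SQL Server connection, a table-valued-parameter call and a stored procedure. It only shows why one call carrying a whole table beats one statement per row.

```python
class FakeServer:
    """Toy stand-in for a database connection that counts round trips."""

    def __init__(self):
        self.round_trips = 0
        self.rows = []

    def execute(self, statement, params):
        # One network round trip per individual SQL statement.
        self.round_trips += 1
        self.rows.append(params)

    def execute_tvp(self, procedure, table):
        # One round trip carries the entire in-memory table; the stored
        # procedure then persists it server-side in a single pass.
        self.round_trips += 1
        self.rows.extend(table)


rows = [(i, f"value-{i}") for i in range(1000)]

naive = FakeServer()
for r in rows:                          # one INSERT statement per row
    naive.execute("INSERT ...", r)

batched = FakeServer()
batched.execute_tvp("usp_SaveRows", rows)  # single call with a table-valued parameter

print(naive.round_trips, batched.round_trips)  # 1000 vs 1
```

With real SQL Server, the batched path additionally lets the stored procedure run set-based inserts instead of row-by-row statements, which is where most of the speed-up described above comes from.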
Look how fast we are now!
Of course we couldn’t wait to see how fast this all was! In a nutshell, we conducted a small trial run with various numbers of datasets – from 250 to 50,000 – and, just for OPTANO, even with 1,000,000 datasets of 12 columns each. We conducted a comparative test between OPTANO and Entity Framework 6.1. The timing included the creation of new objects, filling them with values and storing them in the database. Finally, we took the median over 10 test runs (with the best and the worst excluded). The convincing result can be seen in the diagram below. The comparison clearly shows: OPTANO needs only 10% of the time – just 2.5 minutes for 1,000,000 datasets with 12 columns!
Note: 1,048,576 ≈ 10^6 is the maximum number of rows that Excel can process. This means that OPTANO is well prepared for the largest Excel spreadsheets possible. See also: https://support.office.com/en-us/article/Excel-specifications-and-limits-CA36E2DC-1F09-4620-B726-67C00B05040F
Note: Entity Framework is a data access technology recommended by Microsoft, see also: https://msdn.microsoft.com/en-us/data/ef.aspx
Author: Sabrina Geismann