Frequently Asked Questions

What EmcienScan does with your data and how it calculates its results

1) What do you mean by "Connected Strength"?

Connected Strength is measured based on a type of correlation between every pair of columns. Because data can be both numeric and categorical, EmcienScan uses mathematics involving information theory to measure and rank the data.
To see how results from EmcienScan compare to results from other statistical or analysis software, try computing a correlation matrix using only numerical data and note the similarity to your results in EmcienScan. However, unlike most other statistical software, EmcienScan can compute the correlations for non-numeric data as well.
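
EmcienScan's exact calculation is not described here. As a rough illustration of how an information-theoretic measure can score a pair of categorical columns, the sketch below computes normalized mutual information in Python. The function names `entropy` and `normalized_mutual_information` and the sample columns are hypothetical, and the normalization (dividing by the smaller column entropy) is one common choice among several, not EmcienScan's documented method.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def normalized_mutual_information(xs, ys):
    """Mutual information of two columns, scaled to roughly [0, 1].

    Illustrative only: an assumption, not EmcienScan's actual formula.
    """
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # entropy of the paired values
    mutual_information = hx + hy - hxy
    denom = min(hx, hy)
    return mutual_information / denom if denom > 0 else 0.0


# Hypothetical sample columns.
color = ["red", "red", "blue", "blue", "green", "green"]
size = ["S", "S", "M", "M", "L", "L"]   # fully determined by color
coin = ["H", "T", "H", "T", "H", "T"]   # unrelated to color

strong = normalized_mutual_information(color, size)
weak = normalized_mutual_information(color, coin)
print(f"color vs size: {strong:.3f}")
print(f"color vs coin: {weak:.3f}")
```

A score near 1 means one column largely determines the other; a score near 0 means the columns share little information. Numeric columns would first need to be grouped into bins before a measure like this could be applied to them.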

2) How does the system of five dots convey Connected Strength?

The amount of green filling in the dots indicates the relative connected strength or quality of a dataset or field. For example:  
Five full green dots indicate that a specific field or many of the fields in the dataset are connected. The dataset or specific field contains a large amount of information allowing the user to discover relationships between fields and the key drivers for analyzing a desired outcome.
Five empty gray dots indicate that a specific field or all of the fields in the dataset are not connected at all. Knowing that data is disconnected is just as valuable as knowing the key drivers within a dataset, and provides a measurement of the quality of data before analysis.

One dot outlined in green indicates that the selected field or the majority of fields within the dataset are slightly connected.

Most fields or datasets will fall somewhere between the two extremes. This is fairly representative of natural data, which contains natural patterns as well as disparate data points.

3) If a variable has twice as many dots as another variable, does that mean that it's twice as connected?

In general, the more dots a variable has, the more connected it is. However, to add clarity and make the results more understandable, the variables are sorted with a decay function to emphasize differences in connected strength. So a variable with twice as many dots will be slightly more than twice as connected.

4) Can EmcienScan analyze data across two separate tables?

EmcienScan discovers connections and correlations within a single table or view. To discover connections across two or more tables, create a single view that combines the tables and scan that combined view. A tool such as EmcienConnect makes this easy; contact your Emcien representative for more information.
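
If you prefer to build the combined view yourself, a plain SQL view is one way to do it. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table, column, and view names (`customers`, `orders`, `orders_with_customers`) are hypothetical, and the syntax for creating views in your own database may differ.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- One row per order, with the matching customer's columns attached.
    CREATE VIEW orders_with_customers AS
    SELECT o.order_id, o.amount, c.customer_id, c.region
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id;
""")

# The view behaves like a single table with a header for every column.
rows = conn.execute(
    "SELECT * FROM orders_with_customers ORDER BY order_id"
).fetchall()
for row in rows:
    print(row)
```

The resulting view has one row per order and carries columns from both tables, so a scan of it can relate `region` to `amount` even though they are stored separately.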

5) Does EmcienScan analyze all of my data or only a sample?

EmcienScan randomly samples across databases and files to ensure accuracy in discovery while delivering results very quickly. For every data set, EmcienScan calculates how much it needs to sample and reports that number in the top right of the home page. If you would like to test whether the autodetect feature works for your data sets, try running the same data set multiple times to see whether the results change.
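
The idea behind repeating a scan can be sketched outside the product: draw several independent random samples from the same data and check how much a summary statistic moves between them. The Python sketch below is only an analogy under stated assumptions. The synthetic data, the sample size of 5,000, and the helper `sample_mean` are hypothetical, and it does not reproduce how EmcienScan chooses its sample.

```python
import random
import statistics


def sample_mean(rows, sample_size, seed):
    """Mean of a random sample drawn without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(rows, sample_size))


# Synthetic column of 100,000 values standing in for a real data set.
population_rng = random.Random(0)
rows = [population_rng.gauss(100, 15) for _ in range(100_000)]

# Five independent "runs", each using a different random sample.
estimates = [sample_mean(rows, 5_000, seed) for seed in range(5)]
spread = max(estimates) - min(estimates)

print("estimates:", [round(e, 2) for e in estimates])
print("spread between runs:", round(spread, 2))
```

A small spread between runs suggests the sample is large enough for the results to be stable; a large spread suggests that more data needs to be sampled.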

6) Is the input data stored anywhere?

EmcienScan does not keep any of the previously analyzed data aside from the few sample rows shown within the application for readability of the analysis.

7) Will EmcienScan find non-linear relationships?

EmcienScan will find all relationships within data, regardless of whether the data is numerical or categorical. Linear relationships will show a higher amount of connected strength than relationships of higher orders, but non-linear relationships and correlations will still show a high amount of predictability within the software.

8) What format does data have to be in for it to be scanned?

EmcienScan can explore any data that is organized in rows and columns, with each column having a header.
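
For a file-based source, you can check this layout before scanning. The Python sketch below assumes a comma-separated file and uses a hypothetical helper, `check_tabular`, to confirm that every column has a header and that every row has the same number of fields as the header. It is an illustration of the requirement, not part of EmcienScan.

```python
import csv
import io


def check_tabular(text):
    """Return (header, row_count), or raise ValueError if the layout is irregular."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header or any(not name.strip() for name in header):
        raise ValueError("every column needs a non-empty header")
    row_count = 0
    for row_no, row in enumerate(reader, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"data row {row_no}: expected {len(header)} fields, got {len(row)}"
            )
        row_count += 1
    return header, row_count


# Hypothetical sample data: a header line followed by two data rows.
sample = "customer_id,region,amount\n1,East,25.0\n2,West,15.0\n"
header, row_count = check_tabular(sample)
print(header, row_count)
```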

9) Can I use the results from EmcienScan in my traditional statistical model?

Yes! There are various ways to use these outputs in your current models or simulations. For instance, EmcienScan can discover hidden predictable variables in various tables that may have previously gone unnoticed, and supplies groups of tightly-knit related columns that can aid in the beginning steps of analysis.

What you need to know to get EmcienScan up and running

1) How much of a load does EmcienScan put on my database?

EmcienScan makes only a single SQL SELECT query against your database for each scan. If the database you are analyzing would be hindered by a read query, you can use EmcienScan along with a data virtualization solution that provides caching for queries and views.

2) Does EmcienScan need to be installed on a server, or can it be installed on a computer or laptop?

EmcienScan was designed to scan large amounts of data very quickly. With that in mind, we recommend installing it on a server with a large amount of RAM, many cores, and ample hard drive space. However, if you don't have access to a large server and only have a laptop or desktop computer for installation, EmcienScan can still explore your data, although it may scan more slowly and the maximum data size it can explore will be smaller.

3) Can I install the EmcienScan Virtual Machine on top of another Virtual Machine?

We strongly suggest that EmcienScan be installed as a virtual appliance through a hypervisor or on an OS running directly on the host hardware. However, it is possible in the Proof of Concept stage to install it as a virtual machine on top of another virtual machine, as long as the virtualization software supports that capability and hardware-assisted virtualization is exposed to the guest OS.