Data Analysis
Analyzing data
Printing the analyses
Analyzing parts or subsets of your data
Using Access query design (QBE editor)
Creating a cross-tabulation or pivot analysis in Excel
Frequency distribution (histogram) of numerical characters
Comparing characters with external classifications
Summarizing items
List of all analysis queries and reports available
Analysis queries provided
Analysis reports provided
Analysis charts/graphs provided
Next table of contents: Editing data
Previous table of contents
Main table of contents
Several queries or reports to analyze your data can be selected from a pick list. You must first select a group of queries or reports under Analysis type. All analysis queries or reports of this group are then available in the second pick list Open analysis or report. Click the OK button to run the analysis. See List of all analysis queries and reports available for a complete list of all available analyses.
Some queries will generate views of the data, while other queries will perform updates to the data. A warning message will appear before any data are updated. If you have a full version of Microsoft Access (i.e. not a run-time version), you can select whether you want the query to run directly or to see it in design mode first. For notes about printing see the next chapter printing analyses. The checkboxes for interaction with Microsoft Excel are explained further below.
Note that most queries will automatically be deleted after use. If you want to keep a query to run it directly from the database explorer, you can use the File, Save as command in the menu. Save the query under a different name (e. g., add a "1") to avoid it being deleted when closed. If you want to compare the results of several queries, you must use saved queries, since you can only open one query at a time using the current dialog box. If you select a report, a report specific to your descriptor project will be created and opened. Reports are not deleted after use.
The main analysis dialog box also provides a button to directly access the interactive identification Identify.
The following chapters explain further details about the analysis options:
Printing the analyses
Analyzing parts or subsets of your data
Cross-tabulation (crosstab) queries, pivot analysis in Excel
List of all analysis queries and reports available
Analysis queries provided
Analysis reports provided
Analysis charts/graphs provided
Contents of Analysis
Introduction to the main DeltaAccess dialog box
Main table of contents
Normally no special printing command is required. You can print out queries, forms, and reports/graphs using the standard Microsoft Access menu commands. To avoid wasting paper, you can switch into print preview mode first. You may want to adjust column widths and heights, or switch to landscape printing to make the query fit well. When you view a report you will see a page preview, which should be identical to what your printer produces.
Formatting analysis queries for printing:
Most analysis queries are not preformatted for printing on a certain page format. To adjust page layout and formatting, select the portrait or landscape page orientation from the page/print setup (File menu). Change the page margins if necessary. Adjust the column width to your needs and perhaps hide columns that you do not want to have printed. To change the sequence of columns mark them (move the mouse on the column label, and click while the mouse cursor has the form of a down-pointing arrow) and drag them to their new position. Sometimes it may be preferable to have narrow columns and to increase the line height to display several lines of text for each record. Use the horizontal line between the record selectors on the left side of the table, and drag it up or down. You can also change the sorting order or filter your data before you print them. Use page preview (in the File, not in the View menu!) to check the results before printing.
Note that if you save the query under a new name, most of these settings are saved. Yet, whenever you create the same query for a new descriptor project using the DeltaAccess dialog box, you will have to repeat this procedure. Please contact me if you think that the sequence of fields should always be different.
In contrast to queries, reports are already provided with a certain page setup. Since in Europe we use A4 page format, you may have problems with the American letter format. I tried to take this into account, but it might not be working. You may have to adjust the report in design mode, e. g., by making the graph object frame smaller. If you send me an adjusted version of a report, I will change the setup in the next version of DeltaAccess accordingly.
Contents of Analysis
Main table of contents
Analyzing parts or subsets of your data
Sometimes you may want to restrict your analysis to a subset of your data. You can create character or item subsets, in order to reduce the number of characters in a report, or to run character state crosstab queries only on those items which fulfill a certain condition (e. g., are in some genus, or contain a specific character state). Use the link option together with the subset option of the management action create new project to create such subsets. If you disable the create editing forms option, the subsets will be created much faster and will use very little space in your database.
You can also design "on the fly" subsets if you modify the SQL queries, which form the basis of the analysis options. Some basic procedures for modifying queries are explained in the following section Using query design.
Microsoft Excel: Sometimes it may be faster to simply export your data to Microsoft Excel and delete unwanted rows or columns before you format and print your results. Excel is considerably more powerful in formatting tables than Microsoft Access. You can print labels at any angle, join cells, center headings over multiple cells, or use lines and other graphical elements to format your results. I especially recommend Excel if you want to design your own charts. Modifying and fine-tuning charts/graphs is considerably more transparent in Excel than it is in Access (although both programs use the same applet for the charts). The disadvantage of using Excel is that you have to repeat the export and chart creation process each time you change your data, while with MS Access, you can save a modified query or a report including a chart for later use. Microsoft Access queries are dynamic, i.e., they use your most current data whenever they are executed. Many queries will even update their data while they are open, if a colleague changes the data they are based upon (Access calls this a "Dynaset"). In a network you may get different results on two consecutive runs of an analysis, because as colleague changed something during the previous 5 minutes! Consult the next chapter on how to export data to Microsoft Excel.
Contents of Analysis
Main table of contents
Using Access query design (QBE editor)
DeltaAccess frequently uses queries instead of reports. Reports are much more powerful in formatting, but they are difficult to modify. Queries, on the other hand, are much easier to understand and to modify. You can create your own reports based on queries if you need more sophisticated formatting. Microsoft Access provides a report wizard and an autoreport function, which make this easy for many types of reports. Select the saved query in the database explorer and select Insert, Report from the menu.
Some modifications to queries can be performed in datasheet view and are discussed in the previous chapter Printing the analyses.
To make the best use of the standard queries provided by DeltaAccess, you can modify them in design view. You must have a full version of Microsoft Access to do this, the run-time version lacks the necessary design tools.
The MS Access QBE (query by example) editor is opened if you click on the design button () in the tool bar, or select View, Design from the menu. You can also open the query directly in design mode using the design-mode-option in the main analysis dialog box.
In the upper part of the QBE window you see a graphical representation of the table included, and the relations between them. You can add or remove tables, and change the relations between them. Doing so requires some understanding of the internal structures of the database (i.e. the entity relation model, see the chapter Information model).
In the lower part of the window you see a datasheet of the attributes (fields) present as columns in the query. Changing the design in this part is much simpler and you should start trying to understand this part first.
The first row of the datasheet determines the name of the attribute. This name may have two parts: first the name shown to the user, then after a colon (':'), the name of the table attribute (= field name). Thus 'IID \ CID=: IID' means: this is the field IID, but the column label is 'IID \ CID='.
Depending on the type of query and the settings of your version of Access, you may or may not see a row containing the table name to which an attribute belongs, a row where you can define that the records will be sorted by one or several attributes, and a row with checkboxes where you can define whether attributes are visible (checkbox on) or hidden.
The most important rows for modifying queries start below the row with checkboxes. In the English version of Access, the rows are labeled criteria. This part can be used to restrict the number records which will be returned when the query is executed. Using criteria can be a very fast way to analyze many temporary item or character subsets, without creating subsets as full linked subset projects. If you want to restrict a character usage crosstab query to the first 20 items and to characters 1, 3, 6, 8, 9, 10, 11, 12, you should enter 'Between 1 And 20' as criteria for the IID attribute, and 'IN (1, 3, 6, 8, 9, 10, 11, 12)' as criteria for the CID attribute. Both criteria should be entered without the quotation marks.
Consult your Microsoft Access documentation for further explanations about the use of the qbe (query by example) editor. It requires a little effort to learn at first, but then it really is a wonderful tool. For more general information about Microsoft Access queries see the Contents of the Access 97 help file, Introduction to Microsoft Access 97, Queries:
Queries: What they are and how they work
Ways to customize a query
Ways to specify multiple criteria in a query
Ways to work with data in a query's datasheet
Example of a crosstab query
Join types and how they affect query results
and Working with Queries, Creating and Modifying Queries:
Create a query
Design or modify a query
About using the asterisk in the query design grid
About working with blank fields in queries
and many further topics under the heading Working with Queries.
Note that even the reports supplied get most of their power from the queries used for their recordsource. If you view a report in design mode, you can modify the query behind the recordsource property of the report.
Contents of Analysis
Main table of contents
Creating a cross-tabulation or pivot analysis in Excel
DeltaAccess provides only simple crosstab queries, which are limited in several ways. They are limited to 2 dimensions, they can not provide column sums (row sums are complicated, but possible) and they will leave fields with a count of '0' empty. The latter is convenient for a quick overview, but inconvenient if you want to analyze your data further, e. g., in Excel. Because of these limitations, I recommend using the provided crosstab for quick direct analyses and error checking, and the Excel pivot wizard for more sophisticated analyses.
In the analysis section of the DeltaAccess dialog box you will find a checkbox Analyze in MS Excel, which will automatically export the results of the query to Microsoft Excel. You must have Excel 97 installed on your computer.
If the export option provided by DeltaAccess does not work (e. g., because of version or registry problems), check if the file has been created (either in your default data directory or in the directory where DeltaAccess.mdb is placed). You can also try to use the standard Microsoft Access functionality: Select Tools menu (German version: Extras menu), Office Links, Analyze in MS Excel (for other languages: Tools is the 3rd-last menu, Office Links the 3 entry there) and the data will be displayed in Excel. Note that Excel (97, version 8) is limited to 65536 rows. If your query returns more rows (look at the record count at the bottom, next to the navigation buttons), you must first restrict your query so that it returns less data. In the DeltaAccess export dialog box you can also find an option for a simple export of all your project data to Microsoft Excel, but this will be of limited value for larger projects.
The pivot function of Excel is one of the most powerful functions on your PC. The checkbox Create pivot table in the DeltaAccess dialog box tries to automatically create a pivot table for you. Currently this functionality is provided only for few queries (e. g. Char-state usage Char 1 x 2 (list), and ... (list, names)). You need a list type analysis output, i.e., you can not create an Excel pivot table from an Access crosstab type analysis. If you always get error messages during the automatic creation of pivot tables with DeltaAccess, either Microsoft Office may not be correctly installed, or the code may not work with all language versions of Excel 97.
To manually create a pivot table, put the cursor into the cell where the data you want to analyze start and select pivot wizard from the Data menu of MS Excel. The first 2 pages can usually just be accepted as they are. On the third page you must define which variable you want to display where. The result variable, often called X or Count should be dragged into the main part of the table displayed on this form. By default is will be entered as Sum of ..., which is correct for most purposes. (In fact, since Microsoft Access already groups the data into counts, there will usually be nothing to sum at all. A pivot table could itself do the grouping, which Microsoft Access already did, but this would be much less effective.) Further, you must drag the other variables into the column and row headers, or into the page (upper left box, this will result in a separate page for each value in this variable). You can drag more than one variable into the row or the column headers. The sequence of multiple row or column header variables can be changed and defines the hierarchy of the pivot table. On the last page of the wizard, you should decide to create the pivot table on a new Excel sheet. The option button that is available on this page gives you access to several pivot options, but you do not have to change them the first time.
There are more options for a pivot table than you can imagine. Some of the more important options are:
You may also right-click on any of the variables (the gray, button-like cells). The context menu entry Field will give you options specific to this variable. Please consult your Excel documentation for further details on pivot tables.
It is possible to integrate these pivot tables into Microsoft Access forms or reports (provided Microsoft Excel is installed). You will have to create these reports yourself, because they will be specific to the combination of characters you currently analyze. If anybody creates generally useful reports, forms, or queries in any way, which do not entirely depend on his or her own project, I would be grateful to receive them and will integrate such reports into the next releases of DeltaAccess.
For more general information about the interaction of Microsoft Access and Microsoft Excel see the Contents of the Access 97 help file, Sharing Data Between Applications, Sharing Data with Microsoft Excel:
Export a table or query to a spreadsheet
Save an object's output as a Microsoft Excel or rich-text format file
What is a PivotTable?
See also Introduction to character cross tabulation for a discussion of the general properties of cross-tabulation analyses and statistics.
Contents of Analysis
Main table of contents
Frequency distribution (histogram) of numerical characters
Currently, DeltaAccess offers no frequency distributions for numerical characters. This chapter explains how to retrieve the relevant information and how to create a frequency distribution using Microsoft Excel. An independent tool will be introduced in a future version of DeltaAccess, together with the mapping of categorical and numerical character states.
To create a frequency distribution of numerical characters, using a combination of DeltaAccess and Microsoft Excel, try the following steps:
In DeltaAccess, open the analysis dialog box and select the Analysis type Edit or view data. Select the query Mean, or middle of ranges/min/max of 1 numerical char. Check the checkbox Analyze in MS Excel and click on the OK button.
Continue the analysis in Microsoft Excel, which will normally be opened automatically with the Excel file into which the data were exported. The Excel spreadsheet contains the information about the character you selected, the measurement unit, the item names, and a column displaying the statistical attributes (mean, ranges, etc.) combined into a single formatted string. The final column (CalcMean) calculates a single "middle" value for each item. This is the mean if available, otherwise the middle of upper and lower range (or minimum/maximum, if ranges are lacking as well).
In the column to the right of the CalcMean column, enter the class borders for the intervals into which you want to group the values:
To count all items where the calculated mean of the character in question is in these intervals: |
Enter these values as class borders: |
smaller or equal 10: |
10 |
greater 10, smaller or equal 20: |
20 |
greater 20, smaller or equal 40: |
40 |
greater 60, smaller or equal 60: |
60 |
greater 60: |
As you can see, the upper class border is included in the range, i.e. a value of 10 would be included in the first interval in the example above. You now have two options:
a) Create a frequency table using a wizard:
This requires that you have installed the Analysis add-in supplied together with Excel. At the end of the end of the Tools menu you should then have an entry Data analysis. Select this entry and select Histogram from the list in the dialog box. Follow the instructions, selecting the data range (note: you should select the exact range containing your values, not the whole column) and the class border range ("bin") in your spreadsheet.
The wizard can optionally create columns with cumulative percentages, and create a second copy of the frequency table sorted by frequency instead of class borders (Pareto diagram). It can also automatically create a chart for you.
The disadvantage of the wizard method is that the data are static, i.e., not calculated through a formula. Updating data will not automatically result in an update of the frequency table. For this reason and because some users may not have the Analysis add-in installed, a manual method is described below, as an alternative to using the wizard.
b) Create a frequency table manually:
In the column to the right of the class border column, select an area with one cell more than you have entered class borders. Keeping the cells selected, type "=FREQUENCY(" on the keyboard. Select the column with the CalcMean values, type a ";" on the keyboard, and select the column or area with the class borders. Close the function by typing ")" and press Shift-Ctrl-Enter to enter it as an array formula. The selected cells should now contain the frequency distribution. In the formula bar, the function should look like "{=FREQUENCY(F:F;G:G)}". If the braces (= "{}") around the function are missing, you have not used Shift-Ctrl-Enter. Do not type the braces themselves; the formula will then look correctly but will not work!
Note for users of non-English versions: Unfortunately, in contrast to Access, Excel does not automatically translate the English function names for you. The appropriate function can be found in the "Statistical" category of the function wizard (click on the button labeled fx in the toolbar).
The result (note the area which was selected before entering the array formula) could look like:
To create a histogram chart based on these results, click on the ChartWizard button () while the cells containing the frequency function are still selected. Normally you can accept the default (Chart type column, sub-type 2D clustered columns). Click Next, accept the defaults for data range (the series are in columns), click on the second tab of this step, labeled Series. Go to Category (X) axis labels at the bottom, select the range of cells with the class borders (the cell addresses are automatically entered if you select an area with the mouse). Click Next again. In the options step you can enter titles for your chart and uncheck the box to show the legend (Legend tab). Click Finish to create the chart. The labels on the category axis are slightly confusing, because they correspond to the class borders, but are placed in the center of each class. For publication, you may wish to add an additional column containing the corresponding intervals (compare the table above) and use this column for the Category (X) axis labels.
Consult also the chapter Character cross tabulation (numeric) for further details.
Contents of Analysis
Main table of contents
Comparing characters with external classifications, phylogenetic trees, or cluster analyses
|
A frequent request is to compare trees derived from external analyses with data in a DELTA data set. The tree may be derived from classical morphological studies, statistical multivariate analyses or from molecular data (e. g. DNA sequences). DeltaAccess currently does not offer any sophisticated analyses to map character distributions on phylogenetic trees. The current chapter describes how the use of character cross-tabulation nevertheless provides some interesting insights. |
To some extent, relevant features of tree diagrams can be reduced to categorical characters. In the following hypothetical example:
You could be interested in the main bifurcation and analyze groups A versus group B, or you could compare the 3 groups 1, 2, and 3. If the distances on the tree correlate to distances in the data set (e. g., in the example above, the distance between groups 2 and 3 is larger than between items 1 and 2), you should select groups which encompass the major distances in the tree. Although an analysis which traverses the above tree further to the right would be possible, it would be difficult to interpret the results of such an analysis, because it yields to many groups on the same level. This is less of a problem if only a subtree is analyzed, e. g., in a comparison of item1 to item 5.
To perform the analysis in DeltaAccess, you would:
Using a statistical analysis tool (e. g. SAS, categorical data analysis) you can analyze the significance of the differences. You could use a chi-square test, a loglinear test ("G-test"), or Fisher's exact test to test the assumption of independence of the two factors. If the null-hypothesis of the test can not be rejected (no significant result), the two factors are independent. This means that the classification factor ("tree-character") does not influence the character you are analyzing. Such a result would be expected, if the members of each group were chosen at random.
Note that chi-square or loglinear tests are valid only if a sufficient number of samples in each group are present. The expected frequencies under the assumption of independence should not fall below a certain number (which depends on the test used). Most statistical packages will issue a warning if this condition is violated, but you might want to verify it yourself. Although Fisher's exact test will be valid for low frequencies, this does not necessarily imply that the result is very meaningful if the null hypothesis could not be rejected. As with other tests, the probability of accepting a false null hypothesis is usually not calculated. The error "beta" (as opposed to the error of rejecting a true null hypothesis = the significance level "alpha") depends on the sample size and the minimum required difference.
Contents of Analysis
Main table of contents
Next