Difference between revisions of "Database files in IFs"

Revision as of 22:03, 6 November 2017

IFs Historical Database Files

IFs uses Microsoft Access files to store its data and the data dictionary (metadata). All data files are in the “C:/My Documents/Users/Public/IFs/Data” folder. The data and related files are listed below:

  • IFsHistSeries.Mdb is the largest and most frequently used IFs data file. It contains more than three thousand data tables, each with 186 rows (one row of data per country) and several columns (one column per year). The figure below shows data from an IFsHistSeries table.

Screenshot from IFsHistSeries

  • DataDict.Mdb is the data dictionary file. Its table contains one row of metadata (e.g., definition, unit, source, date of last update) for each of the data tables in IFsHistSeries.Mdb.

Screenshot from DataDict

  • IFs.Mdb is the Microsoft Access file that contains several IFs data tables. One of these tables, "Country Translation", is required for automated IFs data import/update. The Country Translation table maintains (and updates) a concordance list between the country names used by IFs and those used by data sources.
  • IFsWVSCohort.Mdb is the file that contains data from waves of the World Values Survey, a global survey of cultural values conducted by the University of Michigan.
  • IFsDataImport.Mdb is an MS Access database that holds the data series imported using the IFs software's automated single-series 'import' interface.
  • IFsDataImportBatch.Mdb is the Access database that houses the data series imported using the IFs software's automated batch import interface.
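
The concordance idea behind the Country Translation table can be sketched in a few lines of Python. This is only an illustration: the dictionary entries, the function name, and the sample rows are hypothetical and are not taken from the actual table or from the IFs import code.

```python
# Hypothetical concordance: source-specific country names -> IFs country names.
# These entries are illustrative only, not copied from the Country Translation table.
CONCORDANCE = {
    "Viet Nam": "Vietnam",
    "Russian Federation": "Russia",
}


def translate_countries(rows, concordance):
    """Re-key rows by IFs country name and collect source names with no match."""
    matched = {}
    unmatched = []
    for source_name, values in rows.items():
        ifs_name = concordance.get(source_name.strip())
        if ifs_name is None:
            unmatched.append(source_name)
        else:
            matched[ifs_name] = values
    return matched, unmatched


# Made-up source data keyed by the source's own country names.
source_rows = {"Viet Nam": [61.0, 63.5], "Atlantis": [1.0, 2.0]}
matched, unmatched = translate_countries(source_rows, CONCORDANCE)
print(matched)
print(unmatched)
```

Names that come back in the unmatched list would need a new concordance entry before the series could be imported cleanly.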

IFs Data Table Naming Convention

Names of all data tables in IFsHistSeries.Mdb start with the prefix “Series”. The “Series” prefix is followed by an issue-area prefix, e.g., “Ag” for agriculture or “Ed” for education. This second-tier prefix may be followed by additional prefixes (e.g., "EdSec" for secondary education) or may be absent altogether (e.g., in some of the earlier imports).

Series names may also carry a suffix, usually to distinguish among sources for the same or similar series.

No spaces or symbols (other than %) are allowed in a series name.
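
A minimal Python sketch of these naming rules follows. It assumes that "no symbols" means only letters, digits, and % are permitted after the "Series" prefix. The function names are hypothetical and are not part of the IFs software.

```python
import re

# "Series" prefix followed by one or more letters, digits, or "%".
# Assumption: underscores, hyphens, and other punctuation count as symbols.
SERIES_TABLE_PATTERN = re.compile(r"Series[A-Za-z0-9%]+")


def is_valid_table_name(name: str) -> bool:
    """Return True if the whole name follows the convention described above."""
    return SERIES_TABLE_PATTERN.fullmatch(name) is not None


def table_name_for(variable: str) -> str:
    """Build the table name by prepending the 'Series' prefix to a variable name."""
    return "Series" + variable


print(is_valid_table_name("SeriesWSSJMPSanitationRural%Improved"))
print(is_valid_table_name("Series WSS JMP"))
print(table_name_for("WSSJMPSanitationRural%Improved"))
```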

IFs DataDict Columns

The DataDict.Mdb file serves as a reference for all series in the IFsHistSeries.Mdb file. Every series in IFsHistSeries has an entry in DataDict containing all of the metadata on that series. The data dictionary lists each variable, the groups to which it belongs (e.g., Agriculture, Economics), its subgroup (e.g., Trade, Consumption), and additional identifying information, including whether the data is a Series (Yes/No), CoVaTra, or Cohort. It also includes a definition of the variable and a column for an extended definition provided by the data source.

The data dictionary has columns identifying the years for which a series has data, the source of the data, the original source (e.g., a series may have been pulled from the FAO website but may have originated as World Bank research), the name of the series in the source, and an identifier for which team member last updated the series and when. It also includes instructions on how data should be aggregated or disaggregated for provincial models (e.g., by population or GDP distribution). Some additional information used by the model is also supplied: whether nulls in the data should be treated as zeroes, whether a series is used in the preprocessor, whether it is compared to other forecasts, the number of decimal places to read, and any formula applied to the data.
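
As an illustration of aggregating by population, the sketch below computes a population-weighted average for a group of countries. Treating a population-based aggregation rule as a weighted average is an assumption made for this example, and the country labels and numbers are made up. The IFs model's actual aggregation code is not shown in this document.

```python
def weighted_average(values, weights):
    """Weighted mean over countries that have a value; None (null) values are skipped."""
    total = 0.0
    weight_sum = 0.0
    for country, value in values.items():
        if value is None:
            continue
        weight = weights[country]
        total += value * weight
        weight_sum += weight
    # With no usable values, return None rather than dividing by zero.
    return total / weight_sum if weight_sum else None


# Hypothetical inputs: a percent series and population in millions.
sanitation = {"A": 50.0, "B": 80.0, "C": None}
population = {"A": 10.0, "B": 30.0, "C": 5.0}

group_value = weighted_average(sanitation, population)
print(group_value)
```

Country C has no value, so it contributes neither to the numerator nor to the weights. Treating its null as zero instead would pull the group value down.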

DataDict Inputs

The following describes each column in the DataDict and provides an example of each.

Variable: Name of variable you assign (Automatically generated from import)

                ex. WSSJMPSanitationRural%Improved

Table: Automatically generated 

                ex. SeriesWSSJMPSanitationRural%Improved

Group: Group or category you assign to the variable (Automatically generated from import)

                ex. Infrastructure, Water, Health

Subgroup: Subgroup you assign to the variable (Automatically generated from import)

                ex. Sanitation

Series: Yes

CoVaTra: No

Cohort: No

Definition: Specific definition of the variable that will be displayed in IFs

               ex. Proportion of Rural population served with Improved Sanitation (%)

Extended Source Definition: Additional information about the variable if needed

               ex. Likely to ensure hygienic separation of human excreta from human contact. They include: 1) Flush/pour flush to (piped sewer system, septic tank, or pit latrine) 2) Ventilated improved pit (VIP) latrine 3) Pit latrine with slab 4) Composting toilet

Units: Specification of the units the variable is measured by

               ex. Percent

Currency:

Years: Years that the data set covers (Automatically generated, but make sure it is accurate)

               ex. 2000-2015

Source: Website name the data source is from

               ex. WSS JMP WHO/UNICEF JMP

Original Source: URL of data source

               ex. https://washdata.org

Notes: Any additional notes and the initials of the person that pulled the data

               ex. EB

Last IFs Update: Automatically generated

Aggregation: Tells the model how to aggregate country values to groups within the model. Will automatically generate if classified during import.

               ex. POP

Disaggregation: Tells the model how to disaggregate country values to subnational bodies when the model is broken out into its subnational form (Automatically generates).

               ex. GDP

Treat Nulls as 0's: If checked, will treat nulls in the data set as zeroes

Proprietary: Leave blank

Name in Source: Information on how the data was pulled so someone can replicate exactly

               ex. Sum of Rural Latrines, Septic Tanks, Sewer Connections

Used in Preprocessor: If checked, data set is used in preprocessor

Used in Preprocessor File Name: File name that is used in the preprocessor. If Used in Preprocessor is checked, this should be filled in

Compare Other Forecasts: Leave blank

Code in Source: Only used in batch pulls

Decimal Places: Decimal places for data in data series

                ex. 4

Country Concordance: Country concordance used to import data into IFs. (Will automatically generate)

                ex. IFs Country

Formula: Option during import to manipulate the data

                ex. *100
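
To make the role of several of these columns concrete, here is a minimal Python sketch of one DataDict row, together with a helper that applies the Formula and Decimal Places settings to a raw value. The class, field names, and helper are hypothetical. The sketch also assumes a formula is a single operator followed by a number (as in the *100 example), which may not cover everything the import interface accepts.

```python
from dataclasses import dataclass


@dataclass
class DataDictEntry:
    """Hypothetical in-memory view of a subset of the DataDict columns."""
    variable: str
    table: str
    group: str
    subgroup: str
    definition: str
    units: str
    years: str
    source: str
    original_source: str
    aggregation: str
    disaggregation: str
    decimal_places: int
    formula: str = ""


def apply_formula(value, formula, decimal_places):
    """Apply a one-operator formula such as '*100', then round the result."""
    if formula:
        operator, operand = formula[0], float(formula[1:])
        if operator == "*":
            value = value * operand
        elif operator == "/":
            value = value / operand
        elif operator == "+":
            value = value + operand
        elif operator == "-":
            value = value - operand
        else:
            raise ValueError(f"unsupported formula: {formula!r}")
    return round(value, decimal_places)


# Field values are the examples given in the column descriptions above.
entry = DataDictEntry(
    variable="WSSJMPSanitationRural%Improved",
    table="SeriesWSSJMPSanitationRural%Improved",
    group="Infrastructure, Water, Health",
    subgroup="Sanitation",
    definition="Proportion of Rural population served with Improved Sanitation (%)",
    units="Percent",
    years="2000-2015",
    source="WSS JMP WHO/UNICEF JMP",
    original_source="https://washdata.org",
    aggregation="POP",
    disaggregation="GDP",
    decimal_places=4,
    formula="*100",
)

# A made-up raw value stored as a fraction in the source, converted to percent.
converted = apply_formula(0.612345678, entry.formula, entry.decimal_places)
print(converted)
```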