SAS Application in Demography III - MD360P46R

Title:	Demografické aplikace SAS III
Czech title:	Demografické aplikace SAS III
Guaranteed by:	Department of Demography and Geodemography (31-360)
Faculty:	Faculty of Science
Actual:	from 2017
Semester:	winter
E-Credits:	3
Examination process:	winter s.:combined
Hours per week, examination:	winter s.:1/1, Ex [HT]
Capacity:	unlimited
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
State of the course:	taught
Language:	Czech
Level:	specialized
Note:	enabled for web enrollment

Guarantor:	prof. RNDr. Jitka Rychtaříková, CSc.
Teacher(s):	prof. RNDr. Jitka Rychtaříková, CSc.
Incompatibility :	MD360P46
Is incompatible with:	MD360P46

Annotation -

Last update: prof. RNDr. Jitka Rychtaříková, CSc. (29.06.2021)

The course introduces students to the SAS software through practical examples. In addition, selected SAS/STAT procedures are taught so that students can apply multivariate statistical methods in demography. SAS/STAT procedures covered: BOXPLOT, ANOVA, FACTOR, STDIZE, CLUSTER, DISTANCE, TREE, VARCLUS.

Literature -

Last update: prof. RNDr. Jitka Rychtaříková, CSc. (29.06.2021)

Obligatory literature:

Base SAS 9.4 Statistical Procedures

SAS/STAT 14.2 User's Guide.

Hendl, J. (2004): Přehled statistických metod zpracování dat. Praha: Portál.

Requirements to the exam -

Last update: prof. RNDr. Jitka Rychtaříková, CSc. (29.06.2021)

Examination: written. Prerequisites for the exam are the final written test (program preparation) and active participation in lessons.

Syllabus -

Last update: prof. RNDr. Jitka Rychtaříková, CSc. (04.07.2021)

1. The BOXPLOT Procedure (SAS/STAT). A box-and-whisker plot, also called a box plot, displays the mean, quartiles, and minimum and maximum observations for a group. The length of the box represents the interquartile range (the distance between the 25th and the 75th percentiles), the symbol in the box interior represents the mean, the horizontal line in the box interior represents the median, and the vertical lines issuing from the box (the whiskers) extend to the minimum and maximum values of the analysis variable. With BOXSTYLE=SKELETAL, the whiskers are drawn from the edges of the box to the extreme values of the group. With BOXSTYLE=SCHEMATIC, a whisker is drawn from the upper edge of the box to the largest observed value within the upper fence and from the lower edge of the box to the smallest observed value within the lower fence; the fences lie 1.5 interquartile ranges above and below the box.
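The quantities that the two box styles draw can be sketched outside SAS. The following Python example is only an illustration on made-up data: the function name `box_plot_summary` and the sample values are hypothetical, and the quartile definition used by Python's `statistics.quantiles` (the "inclusive" method) is an assumption that need not match the percentile definition chosen in SAS.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities drawn by skeletal and schematic box plots."""
    data = sorted(values)
    # Three quartile cut points; "inclusive" is one of several definitions.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # length of the box
    lower_fence = q1 - 1.5 * iqr       # schematic style: lower fence
    upper_fence = q3 + 1.5 * iqr       # schematic style: upper fence
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "mean": statistics.fmean(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Skeletal whiskers reach the extreme values of the group.
        "skeletal_whiskers": (data[0], data[-1]),
        # Schematic whiskers stop at the most extreme values inside the fences.
        "schematic_whiskers": (inside[0], inside[-1]),
        "outside_fences": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical group of nine observations with one unusually large value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this made-up group the skeletal whiskers span 2 to 30, while the schematic upper whisker stops at 9 and the value 30 is reported as lying outside the fences.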

2. The UNIVARIATE Procedure (Base SAS Statistical Procedures). Descriptive (summary) statistics based on moments (mean, variance, standard deviation, coefficient of variation, skewness, kurtosis), quantiles, mode(s), extreme values, frequencies. Confidence intervals for the mean, standard deviation, and variance. FREQ and WEIGHT statements. Histograms (HISTOGRAM) and their options (fitted parametric distributions, nonparametric kernel density estimates, graphic options). Placement of a box or a table of summary statistics in the graph (INSET). Quantile-quantile plots (Q-Q plots) and probability-probability plots (P-P plots). Grouping data or creating comparative plots with the CLASS statement. Rounding values of a variable (ROUND). Goodness-of-fit tests for a variety of distributions, including the normal.

3. The FREQ Procedure (Base SAS Statistical Procedures). Creating one-way and n-way frequency and contingency (crosstabulation) tables. Goodness-of-fit tests for equal proportions or specified null proportions, and confidence limits. Testing for association in a crosstabulation table. TABLES statement (specifies the type of table; for 2x2 tables, odds ratios and relative risks). TEST (chi-square test), Pearson correlation coefficient, Spearman rank correlation coefficient. WEIGHT statement.
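For a 2x2 table, the two measures named above follow directly from the four cell counts. The Python sketch below is an illustration only; the function name `two_by_two_measures`, the row and column layout, and the counts are hypothetical.

```python
def two_by_two_measures(a, b, c, d):
    """Relative risk and odds ratio for a 2x2 table.

    Assumed layout (hypothetical):
                   event   no event
    exposed          a        b
    not exposed      c        d
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio

# Made-up counts: 20 of 100 exposed and 10 of 100 unexposed have the event.
rr, oddsr = two_by_two_measures(20, 80, 10, 90)
print(rr, oddsr)
```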

4. The CORR Procedure (Base SAS Statistical Procedures). Pearson product-moment correlation (a parametric measure of association for two variables that measures the strength and direction of a linear relationship), Spearman rank-order correlation (a nonparametric measure of association based on the ranks), Kendall’s tau-b coefficient (a measure of association based on the number of concordances and discordances in paired observations). Pearson, Spearman, and Kendall partial correlations (PARTIAL statement; a partial correlation measures the strength of a relationship between two variables while controlling for the effect of other variables). FREQ and WEIGHT statements are available.
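The difference between the Pearson and Spearman coefficients can be seen on a small example. The Python sketch below is an illustration, not SAS output; the helper names (`ranks`, `pearson`, `spearman`) and the data are hypothetical. It computes the Spearman coefficient as the Pearson coefficient of the ranks, with tied values given average ranks.

```python
import math
import statistics

def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 4), r_spearman)
```

Because the relationship is monotone, the rank-based coefficient equals 1, while the Pearson coefficient stays slightly below 1 since the relationship is not linear.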

5. ODS Graphics (Base SAS). The ODS Graphics procedures, sometimes called Statistical Graphics procedures, produce plots for exploratory data analysis. The SGPLOT procedure creates line plots, scatter plots, histograms, area plots, etc. Different types of scatter plots, also in panels and with different layouts. The SGSCATTER procedure creates a paneled graph of scatter plots for multiple combinations of variables. The SGPIE procedure produces pie charts and donut charts. The SGPANEL procedure creates a panel of graph cells for the values of one or more classification variables.

6. Standardization Procedures: Standard, Stdize.

· The STANDARD Procedure (Base SAS). The procedure standardizes variables in a SAS data set to a given mean and standard deviation, and it creates a new SAS data set containing the standardized values.

· The STDIZE Procedure (SAS/STAT). The STDIZE procedure standardizes one or more numeric variables in a SAS data set by subtracting a location measure and dividing by a scale measure. A variety of location and scale measures are provided.
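Both procedures rest on the same arithmetic: subtract a location measure and divide by a scale measure, then optionally rescale to a target mean and standard deviation. The Python sketch below illustrates this on made-up data; the function names `standardize` and `rescale` are hypothetical, and only two of the possible location and scale pairs are shown (mean with standard deviation, and minimum with range).

```python
import statistics

def standardize(values, location, scale):
    """Subtract a location measure and divide by a scale measure."""
    return [(v - location) / scale for v in values]

def rescale(values, target_mean, target_std):
    """Standardize to z-scores, then shift and stretch to the target moments."""
    z = standardize(values, statistics.fmean(values), statistics.stdev(values))
    return [target_mean + target_std * v for v in z]

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical values of one numeric variable

# Location = mean, scale = sample standard deviation (z-scores).
z_scores = standardize(data, statistics.fmean(data), statistics.stdev(data))

# Location = minimum, scale = range (values mapped onto the interval 0 to 1).
range_scaled = standardize(data, min(data), max(data) - min(data))

# Standardization to a given mean (50) and standard deviation (10).
scores = rescale(data, 50.0, 10.0)
print(z_scores, range_scaled, scores)
```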

7. The ANOVA Procedure (SAS/STAT). Analysis of variance (ANOVA) for balanced data. The goal is to test for differences among the means of the levels and to quantify these differences. The classification variable is specified in the CLASS statement. Response variables must be numeric. Tukey’s multiple comparison tests for each level of the main effects can be produced. The GLM procedure handles unbalanced data.
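The F statistic of a one-way analysis of variance compares the variability between group means with the variability within groups. The Python sketch below shows the computation for a balanced design on made-up data; the function name `one_way_anova_f` and the three groups are hypothetical, and the p-value (which requires the F distribution) is deliberately left out.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical balanced data: three levels with three observations each.
f_value, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df_between, df_within)
```

For these made-up groups the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13 with 2 and 6 degrees of freedom.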

8. Macro Facility. Macro variables, macro functions, macro statements, macro programs. Defining and invoking a macro variable. Macros with parameters: positional and keyword macro arguments. Defining and invoking a macro program. Including external macros, autocall macro libraries.

9. The FACTOR Procedure (SAS/STAT) performs a variety of common factor and component analyses and rotations. The purpose of common factor analysis is to explain the correlations or covariances among a set of variables in terms of a limited number of unobservable, latent variables. Topics include factor extraction by principal component analysis, factor rotation, factor loadings, and factor scores. FREQ and WEIGHT statements.

10. The DISTANCE Procedure (SAS/STAT) computes various measures of distance, dissimilarity, or similarity between the observations (rows) of an input data set, which can contain numeric or character variables, or both, depending on which proximity measure is used. Various nonparametric and parametric methods can be used for standardizing variables.
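As an illustration of what a distance matrix between observations looks like, the Python sketch below computes Euclidean distances between the rows of a small made-up data set. The function name `euclidean_distance_matrix` and the data are hypothetical, and only one of the many proximity measures is shown.

```python
import math

def euclidean_distance_matrix(rows):
    """Symmetric matrix of Euclidean distances between all pairs of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Hypothetical observations described by two numeric variables.
observations = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = euclidean_distance_matrix(observations)
for row in distances:
    print(row)
```

The diagonal is zero and the matrix is symmetric, so hierarchical clustering only needs the entries below (or above) the diagonal.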

11. The CLUSTER Procedure (SAS/STAT). The purpose of cluster analysis is to place objects into groups or clusters suggested by the data, not defined a priori, such that objects in a given cluster tend to be similar to each other in some sense, and objects in different clusters tend to be dissimilar. The CLUSTER procedure performs hierarchical clustering of observations. The data can be coordinates or distances. Scaling or transforming variables. Computing Euclidean distances, or using the DISTANCE procedure to supply a distance matrix. Eleven clustering methods are available. Creating an output data set (OUTTREE=) from which the TREE procedure draws a tree diagram.

12. The TREE Procedure (SAS/STAT). The TREE procedure reads a data set created by the CLUSTER or VARCLUS procedure and produces a tree diagram (dendrogram or phenogram). Horizontal or vertical tree diagram. ID statement (identifies objects).

13. The VARCLUS Procedure (SAS/STAT) divides a set of numeric variables into disjoint or hierarchical clusters. Associated with each cluster is a linear combination of the variables in the cluster. This linear combination can be either the first principal component (the default) or the centroid component. The first principal component is a weighted average of the variables that explains as much variance as possible. Centroid components are unweighted averages of either the standardized variables (the default) or the raw variables. A dendrogram of variable clusters is displayed.

Last update: prof. RNDr. Jitka Rychtaříková, CSc. (29.06.2021)

1) The BOXPLOT Procedure (SAS/STAT). A box-and-whisker plot, also called a box plot, displays the median, optionally the mean, the quartiles, the minimum and maximum in a given group, and outliers and extreme values. The length of the box equals the interquartile range IQR (the difference between the upper and lower quartiles); the line inside the box marks the median; the upper whisker corresponds to the maximum (or to the maximum within Q3 + 1.5 IQR) and the lower whisker to the minimum (or to the minimum within Q1 - 1.5 IQR). BOXSTYLE=SKELETAL (the whiskers show the minimum and maximum). BOXSTYLE=SCHEMATIC (the whiskers show the maximum within Q3 + 1.5 IQR and the minimum within Q1 - 1.5 IQR); BOXSTYLE=SCHEMATICID and BOXSTYLE=SCHEMATICIDFAR display outliers and extreme values, respectively. Placing summary statistics in the plot: INSET, INSETGROUP.

2) The UNIVARIATE Procedure (Base SAS). Statistics based on moments (mean, variance, standard deviation, coefficient of variation, skewness, kurtosis), quantiles, mode, outlying values, frequencies. Confidence intervals for the mean, standard deviation, and variance. FREQ and WEIGHT statements. Histogram (HISTOGRAM), fitting curves to the histogram (parametric distributions, nonparametric kernel density estimate). Placing text and summary statistics in the plot (INSET). Probability plot (PROBPLOT) and Q-Q plot (QQPLOT). Grouped data and creating comparative plots by group, CLASS statement. Rounding values of variables (ROUND). Testing statistics. The MEANS Procedure (Base SAS).

3) The FREQ Procedure (Base SAS). Creating one-way and multiway frequency tables. Testing for association in contingency tables. (TABLES defines the type of table; specifically 2x2 tables, with the odds ratio and relative risk.) TEST (chi-square test, Pearson correlation coefficient, Spearman rank correlation coefficient). WEIGHT statement.

4) The CORR Procedure (Base SAS). Pearson correlation coefficient (a parametric measure of association between two variables). It measures the strength and direction of a linear association. Spearman rank correlation coefficient (a nonparametric measure of association based on ranks), Kendall rank correlation coefficient (Kendall's tau, a measure based on the number of concordances and discordances in paired observations). Pearson, Spearman, and Kendall partial correlation coefficients (PARTIAL statement). FREQ and WEIGHT statements.

5) ODS Graphics (Base SAS). Creating "statistical" graphs with advanced options for formatting, colors, etc. The SGPLOT Procedure creates many different kinds of graphs: scatter plots, histograms, line plots, bar charts, regression output, and others. The SGSCATTER Procedure supports advanced creation of scatter plots in various arrangements. The SGPANEL Procedure arranges graphs by group; the individual graphs can also be formatted differently.

6) Data transformation, (statistical) standardization: STANDARD, STDIZE.

· The STANDARD Procedure (Base SAS). Standardization to a given mean and a given standard deviation; a new data set is created that contains the standardized variables (standardized data, or standardized scores).

· The STDIZE Procedure (SAS/STAT). Standardization with respect to a given location constant and scale. Offers more options than PROC STANDARD.

7) The ANOVA Procedure (SAS/STAT). Analysis of variance for balanced data. The effect of one factor on the dependent variable. Main and interaction effects of categorical independent variables on a quantitative dependent variable. The F test statistic for differences among group means. The GLM procedure for balanced and unbalanced data.

8) MACROS. Types and use. Macro variables, assigning values and using them. A macro as a user-defined part of a program intended for repeated use. Macros in the SAS program library and their application.

9) The FACTOR Procedure (SAS/STAT). The aim is to describe the behavior of a set of variables using a smaller number of new variables, the factors. Principal component method, factor loadings, factor rotation, factor scores. FREQ and WEIGHT statements.

10) The DISTANCE Procedure (SAS/STAT). Computing the matrix of distances between objects for different types of variables.

11) The CLUSTER Procedure (SAS/STAT). The aim of cluster analysis is to divide objects into groups based on their similarity. Transformation of variables. Limited computation of the distance matrix.

12) The TREE Procedure (SAS/STAT). Creating a dendrogram, horizontal or vertical, and further formatting.

13) The VARCLUS Procedure (SAS/STAT). Grouping of variables. A method of variable reduction.