Program

October 17

13:50 – 14:00

Reception

14:00 – 14:45

Amos Golan

Model Misspecification: Classical Statistics vs. Info-Metrics

With imperfect and incomplete information, it is quite common to misspecify a model. This problem exists not only in the social and behavioral sciences, where the underlying models are often a mystery, but also in the other sciences. Traditionally, misspecification deals with the basic issues of model selection (such as the choice of functional form or moment specification), variable selection, and frequently the choice of likelihood or of the statistical inferential method itself. Within the info-metrics framework – the science of modeling, reasoning, and drawing inferences under conditions of noisy and insufficient information – misspecification may appear in three ways. The first concerns the specification of the constraints (the functional form used, based on the input information). The second concerns the choice of the criterion, or decision function. Whether specified correctly or not, the constraints and the criterion together determine the solution. The third concerns misspecification of the priors. In this talk, I am concerned with the first two fundamental misspecifications: the constraints and the criterion. (Note, however, that the empirical problem of variable selection for a specific model is similar across all inferential methods, so I do not discuss it here.)
I will discuss these misspecification issues and contrast classical methods with info-metrics. I will demonstrate some of the main issues via a simple example in which I investigate power law distributions using Shannon entropy and Empirical Likelihood. I show that although both yield the same prediction, one of them is misspecified. But which one?

My talk will be based on my new book ‘Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information’ (http://info-metrics.org/), in which I develop and examine the theoretical underpinnings of info-metrics and provide extensive interdisciplinary applications.
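
As a hedged illustration of the power-law example mentioned in the abstract above, the following is a standard textbook derivation and not necessarily the formulation used in the talk. Maximizing Shannon entropy subject to a constraint on the mean of log x yields a power law. The discrete support x_1, ..., x_K, the observed log-mean μ, and the multiplier λ are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{p}\quad & H(p) = -\sum_{k=1}^{K} p_k \log p_k \\
\text{s.t.}\quad & \sum_{k=1}^{K} p_k \log x_k = \mu, \qquad \sum_{k=1}^{K} p_k = 1 \\
\Longrightarrow\quad & p_k = \frac{x_k^{-\lambda}}{\sum_{j=1}^{K} x_j^{-\lambda}}
\end{aligned}
```

Here λ is the Lagrange multiplier on the log-mean constraint, chosen so that the constraint holds; the denominator is the normalization implied by the second constraint.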

14:45 – 15:30

Enrico Ciavolino (University of Salento; SARA Lab)

Streaming Generalized Cross Entropy

Real-time data generation has become pervasive in many fields: from the social sciences to aerospace and neuroimaging, the continuous and rapid production of new data makes its treatment and analysis difficult. Each time a new data unit arrives, the parameters have to be re-estimated using the entire previous batch of data, resulting in a slow and expensive estimation process.

Motivated by these needs, an approach for dealing with streaming data, the Streaming Generalized Cross Entropy (SGCE), has been developed and will be introduced in this talk. SGCE consists of a two-step procedure in which the information generated from a batch of data is used to update the parameter estimates when a new observation arrives. This approach streamlines and speeds up parameter estimation.

15:30 – 16:00

Coffee break

16:00 – 16:30

Carlos Pires (Institute Dom Luiz (IDL), Faculty of Sciences, University of Lisbon)

Minimum mutual information constrained by imposed cross statistics: properties, estimation, and examples

Mutual information I(X,Y) between random variables X and Y arises from their statistical dependence (e.g. due to linear or nonlinear correlations). Here, we solve a maximum-entropy variational problem, providing the Minimum Mutual Information (MinMI) that is guaranteed under prescribed Gaussian marginal distributions and imposed cross moments up to fourth order (correlation, co-skewness and co-kurtosis tensor components). MinMI is generally overestimated because of spurious dependencies emerging from finite samples. Therefore, we provide asymptotic formulas for the bias, variance and pdf of MinMI estimators, which depend on the statistics of the estimators of the imposed moments. Finally, MinMI behavior is studied in two situations. First, we generate long synthetic samples for different linear and/or nonlinear (X,Y) relationships, for a range of signal-to-noise ratios and different noise types (additive, multiplicative, Gaussian, non-Gaussian). Second, MinMI is estimated for a short finite sample of climatic variables. From that, we conclude that a trade-off must be made between the maximum order of the imposed cumulant statistics and its impact on the MinMI bias.
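
As a hedged illustration of the simplest special case of the problem described above, assume standard Gaussian marginals and only the correlation ρ imposed (the symbols ρ and I_min are notation assumed here). The maximum-entropy joint distribution is then the bivariate Gaussian, which gives the well-known closed form below. The higher-order constraints mentioned in the abstract (co-skewness, co-kurtosis) are not covered by this formula.

```latex
I_{\min}(X,Y) = -\tfrac{1}{2}\,\log\!\left(1-\rho^{2}\right),
\qquad
I(X,Y) \;\ge\; I_{\min}(X,Y)
\quad \text{for any joint pdf with these marginals and correlation } \rho .
```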

16:30 – 17:00

António Xavier (CEFAGE, University of Évora)

An approach based on entropy to disaggregate agricultural data at the field level

Portuguese agriculture faces complex challenges at the beginning of a new programming period, and disaggregated information is crucial for policy analysis and evaluation. This communication presents an alternative approach to disaggregating agricultural land-use data at the pixel level. The proposed approach combines several techniques (HJ-Biplot, cluster analysis, dasymetric mapping and cross-entropy) and is implemented in two steps. First, prior information is estimated by applying an HJ-Biplot and cluster analysis together with a dasymetric mapping methodology. Then, the estimated prior information is used in a cross-entropy model to disaggregate data at the pixel level in a context of incomplete information. This approach is applied to the Algarve and Alentejo regions in southern Portugal. The results show a significant correlation between observed and estimated land uses.
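
The second step described above can be sketched, under strong simplifying assumptions, as a minimum cross-entropy adjustment of a prior table to known totals. The sketch below is not the authors' model: it assumes the only constraints are pixel areas (row totals) and regional land-use totals (column totals), in which case the minimum cross-entropy solution can be computed by iterative proportional fitting. The function name, the toy prior and all numbers are hypothetical.

```python
import numpy as np

def min_cross_entropy_fit(prior, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Adjust a strictly positive prior table to given row and column totals.

    Iterative proportional fitting converges to the table closest to the
    prior in cross-entropy (Kullback-Leibler) terms among all tables with
    the required margins. Assumes sum(row_totals) == sum(col_totals).
    """
    p = np.asarray(prior, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        p *= (row_totals / p.sum(axis=1))[:, None]   # match pixel areas
        p *= (col_totals / p.sum(axis=0))[None, :]   # match regional totals
        if np.abs(p.sum(axis=1) - row_totals).max() < tol:
            break
    return p

# Hypothetical toy data: 4 pixels (rows) x 3 land uses (columns).
prior = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.7, 0.2, 0.1],
                  [0.3, 0.3, 0.4]])
pixel_area = np.array([10.0, 10.0, 10.0, 10.0])   # hectares per pixel
regional_use = np.array([20.0, 12.0, 8.0])        # hectares per land use

estimate = min_cross_entropy_fit(prior, pixel_area, regional_use)
```

Each row of `estimate` is the estimated split of one pixel's area across land uses; rows sum to the pixel areas and columns to the regional totals.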

17:00 – 17:30

Aníbal Galindro (CETRAD, University of Trás-os-Montes and Alto Douro)

Wine productivity per farm size: A maximum entropy application

The size of a farm is one of the factors that influence its productivity; the very small wine farms that characterize the Portuguese Demarcated Douro Region (Baixo Corgo, Cima Corgo and Douro Superior) are therefore considered a limiting factor in the profitability of those farms. To assess the problem, the farms were categorized into nine size ranges. Since those variables outnumbered the seven available observations, the Generalized Maximum Entropy (GME) estimator was used. According to the simulations, larger farms (with an area greater than 20 ha) in Douro Superior and Cima Corgo show higher marginal productivity given the current state of the region. In contrast, the results for Baixo Corgo suggest that medium-sized farms (with areas between 2 and 5 ha) yield higher marginal increments to the region's productivity.

An Info-Metrics tutorial will be given by Professor Amos Golan at the Faculty of Economics, University of Porto, from October 15 to 19, including a seminar entitled “Data Confession, Torture and Truth: Classical Statistics vs. Info-Metrics” on October 19, 1–2 pm, room 613. Please contact Professor Elvira Silva (esilva@fep.up.pt) for additional details.


Organizing Committee

Andreia Dionísio, CEFAGE, University of Évora

Maria Conceição Costa, CIDMA, University of Aveiro

Elvira Silva, CEF.UP, University of Porto

Pedro Macedo, CIDMA, University of Aveiro

Sponsors

This workshop is supported in part by the Portuguese Foundation for Science and Technology (FCT – Fundação para a Ciência e a Tecnologia), through CIDMA – Center for Research and Development in Mathematics and Applications, within project UID/MAT/04106/2013.