class: top, left, title-slide

# [R]etrieving SEC data

## R-Ladies Madrid Meetup

### Ana Guardia

### November 2019

---
class: center, middle, logo, letragrande

# Contact details

<a href="mailto:r.popiula@gmail.com"><i class="fa fa-paper-plane fa-fw"></i> r.popiula gmail</a><br>
<a href="http://twitter.com/anuskig"><i class="fa fa-twitter fa-fw"></i> anuskig</a><br>
<a href="http://github.com/popiula"><i class="fa fa-github fa-fw"></i> popiula</a><br>

---
class: center, middle, logo

# [R] in finance

---
class: logo, letragrande

# Financial institutions that use [R]

.pull-left[
American Express
<br>ANZ
<br>Bank of America
<br>Barclays Bank
<br>Bajaj Allianz Insurance
<br>Bharti AXA Insurance
<br>BlackRock
<br>Citibank
<br>Dun & Bradstreet
<br>Fidelity
<br>HSBC
<br>JP Morgan
]

.pull-right[
KeyBank
<br>Lloyds Banking
<br>RBS
<br>Standard Chartered
<br>UBS
<br>Wells Fargo
<br>Goldman Sachs
<br>Morgan Stanley
<br>PNC Bank
<br>Citizens Bank
<br>Fifth Third Bank
]

Source: Quora

---
class: logo, letragrande

<br><br><br>

[Barclays' Gordon: Risk managers to be trained in Python and R](https://www.bobsguide.com/guide/news/2019/Nov/20/barclays-gordon-risk-managers-to-be-trained-in-python-and-r/)
<br>By Rebekah Tunstead | 20 November 2019

“I use the data to ask a lot of questions around risk, around our portfolio, where we are exposed, and really just looking for very quick answers. I don’t mind what solutions or software they are using… it’s just 'can I trust the data?' and 'can I make some key decisions with that data?' and the faster I get that answer back with certainty, the happier I am,” said Gordon.

“What we have done though, is now I am retraining a lot of my risk managers [to] use Python and R, so that they can actually access the data and answer the questions themselves, and that has been great.
Now we can trust the data, we can actually start deploying it.”

---
class: logo, letragrande

<br><br><br>

[Bank of America uses R for reporting](https://blog.revolutionanalytics.com/2014/06/bank-of-america-uses-r-for-reporting.html)
<br>By David Smith | June 23, 2014

“[R] is also catching on on Wall Street. Traditionally, banking analysts would pore over Excel files late into the night, but now R is increasingly being used for financial modeling, particularly as a visualization tool, says Niall O’Connor, vice president at Bank of America. “R makes our mundane tables stand out,” he says.”

---
class: logo, letragrande

# People

[Joshua Ulrich](http://www.joshuaulrich.com/) | [Dirk Eddelbuettel](http://dirk.eddelbuettel.com/) | Jeffrey Ryan | Brian Peterson | Peter Carl

[Carlos Gil Bellosta](https://www.datanalytics.com/bio/)

[Jonathan Regenstein](https://resources.rstudio.com/authors/jonathan-regenstein)

+ They create and maintain finance packages
+ They organize [R/Finance](http://www.rinfinance.com/previous.html), a conference held every year in Chicago

# R-Ladies?
[Conney Marulanda](https://finanzaszone.com/)

---
background-image: url("img/scott-webb-z1623zxkgkU-unsplash.jpg")
background-size: cover
class: center, middle, inverse

# Domain-specific packages

---
class: logo

<br><br><br>

.pull-left[
<!-- -->
]

.pull-right[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> package </th>
   <th style="text-align:right;"> downloads </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> tidyverse </td>
   <td style="text-align:right;"> 446333 </td>
  </tr>
</tbody>
</table>
]

---
class: logo

<br><br><br>

.pull-left[
<!-- -->
]

.pull-right[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> package </th>
   <th style="text-align:right;"> downloads </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> quantmod </td>
   <td style="text-align:right;"> 182929 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> TTR </td>
   <td style="text-align:right;"> 179185 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> PerformanceAnalytics </td>
   <td style="text-align:right;"> 34045 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tidyquant </td>
   <td style="text-align:right;"> 14930 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> highcharter </td>
   <td style="text-align:right;"> 8460 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> RQuantLib </td>
   <td style="text-align:right;"> 2335 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> fAssets </td>
   <td style="text-align:right;"> 2018 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> fredr </td>
   <td style="text-align:right;"> 1823 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> scorecard </td>
   <td style="text-align:right;"> 1353 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> alfred </td>
   <td style="text-align:right;"> 571 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> pedquant </td>
   <td style="text-align:right;"> 250 </td>
  </tr>
</tbody>
</table>
]

<!-- --- -->
<!-- class: logo -->
<!-- <br><br><br> -->
<!-- .pull-left[ -->
<!-- ```{r echo = FALSE} -->
<!-- # width="800px", height="400px", align = "center" -->
<!-- 
library("ggplot2"); library("dlstats"); require("gridExtra"); require("dplyr") -->
<!-- x <- cran_stats(c("riskParityPortfolio", "IKTrading", "FinancialInstrument", "quantstrat", -->
<!-- "blotter")) -->
<!-- colnames(x) <- c("inicio", "fecha", "descargas", "paquete") -->
<!-- if (!is.null(x)) { -->
<!-- head(x) -->
<!-- ggplot(x, aes(fecha, descargas, group=paquete, color=paquete)) + -->
<!-- geom_line() + geom_point(aes(shape=paquete)) -->
<!-- } -->
<!-- ``` -->
<!-- ] -->
<!-- .pull-right[ -->
<!-- ```{r echo = FALSE} -->
<!-- if (!is.null(x)) { -->
<!-- f <- max(x$fecha) -->
<!-- y <- x %>% filter(fecha == f) %>% select(paquete, descargas) %>% arrange(desc(descargas)) -->
<!-- # ggplot(data=y, aes(x=paquete, y=descargas, fill=paquete)) + geom_bar(stat="identity", position="stack") -->
<!-- knitr::kable(y, row.names = F, format = 'html') -->
<!-- } -->
<!-- ``` -->
<!-- ] -->

---
class: logo

<br><br><br>

.pull-left[
<!-- -->
]

.pull-right[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> package </th>
   <th style="text-align:right;"> downloads </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> zoo </td>
   <td style="text-align:right;"> 498172 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> xts </td>
   <td style="text-align:right;"> 217284 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tseries </td>
   <td style="text-align:right;"> 168155 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> timetk </td>
   <td style="text-align:right;"> 17455 </td>
  </tr>
</tbody>
</table>
]

---
class: logo, letragrande

### Trading, risk, and portfolio management

[quantmod](https://cran.r-project.org/web/packages/quantmod/index.html) - Quantitative Financial Modelling & Trading Framework for R.

[TTR](https://cran.r-project.org/web/packages/TTR/index.html) - Functions and data to construct technical trading rules with R.

[PerformanceAnalytics](https://cran.r-project.org/web/packages/PerformanceAnalytics/index.html) - Econometric tools for performance and risk analysis.

[RQuantLib](https://cran.r-project.org/web/packages/RQuantLib/index.html) - [Quantlib.org](https://www.quantlib.org/)

[scorecard](https://cran.r-project.org/web/packages/scorecard/index.html) - Credit Risk Scorecard

[pedquant](https://cran.rstudio.com/web/packages/pedquant/index.html) - Public Economic Data and Quantitative Analysis

[fAssets](https://cran.r-project.org/web/packages/fAssets/index.html) - Analysing and Modelling Financial Assets.

[IKTrading](https://rdrr.io/github/pdrano/IKTrading/) - not on [CRAN](https://cran.r-project.org/) - [github](https://github.com/pdrano/IKTrading/)

[Rblpapi](https://cran.r-project.org/web/packages/Rblpapi/index.html) - API for communicating with Bloomberg

[FinancialInstrument](https://cran.r-project.org/web/packages/FinancialInstrument/index.html) - [tidyquant](https://cran.r-project.org/web/packages/tidyquant/index.html) - [highcharter](https://cran.r-project.org/web/packages/highcharter/index.html)

---
class: logo, letragrande

### Time series

[zoo](https://cran.r-project.org/web/packages/zoo/index.html) - S3 Infrastructure for Regular and Irregular Time Series.

[xts](https://cran.r-project.org/web/packages/xts/index.html) - eXtensible Time Series.

[tseries](https://cran.r-project.org/web/packages/tseries/index.html) - Time series analysis and computational finance.

[timetk](https://cran.r-project.org/web/packages/timetk/index.html)

### Open data

[fredr](https://cran.r-project.org/web/packages/fredr/index.html) - [alfred](https://cran.r-project.org/web/packages/alfred/index.html) - St. Louis Fed

[wbstats](https://cran.r-project.org/web/packages/wbstats/index.html) - [WDI](https://cran.r-project.org/web/packages/WDI/index.html) - World Bank

[data360r](https://tcdata360.worldbank.org/tools/data360r) - published by the World Bank - [launch announcement](https://blogs.worldbank.org/opendata/introducing-data360r-data-power-r)

and many more in [this CRAN task view](https://cran.r-project.org/web/views/Finance.html)

---
background-image: url("img/giraffe_forest_social.jpg")
background-size: cover
class: center, middle, inverse

# R is fun

<br>

## There are endless resources for learning...

## find your own path

---
class: middle, logo, letragrande

# in Spanish...

[Curso de introducción a R en castellano de la Universidad de Valencia](https://www.uv.es/vcoll/curso_r.html)

[R para profesionales de los datos: una introducción](https://www.datanalytics.com/libro_r/index.html)

[Ciencia de datos para curiosos](https://bookdown.org/martinmontaneb/CienciaDeDatos/)

[UC3M course](http://ocw.uc3m.es/estadistica/aprendizaje-del-software-estadistico-r-un-entorno-para-simulacion-y-computacion-estadistica)

[Introducción a la programación en R](https://rsanchezs.gitbooks.io/rprogramming/content/index.html)

---
class: middle, logo, letragrande

# or in English...
[Welcome to the Wonderful World of TEACUPS, GIRAFFES, & STATISTICS](https://tinystats.github.io/teacups-giraffes-and-statistics/)

[An Introduction to R](https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Writing-your-own-functions)

[Data science with R](https://garrettgman.github.io/)

[What they forgot to teach you about R](https://rstats.wtf/)

[Advanced R course](https://privefl.github.io/advr38book/index.html)

[Cookbook for R](http://www.cookbook-r.com/)

[The R book](https://www.cs.upc.edu/~robert/teaching/estadistica/TheRBook.pdf)

[Various books written with the bookdown package](https://bookdown.org/)

---

# Tutorials for creating packages

[R package primer - a minimal tutorial](https://kbroman.org/pkg_primer/)

[R packages](http://r-pkgs.had.co.nz/)

[Making an R package](http://portal.stats.ox.ac.uk/userdata/ruth/APTS2012/Rcourse10.pdf)

[Creating packages Leisch](https://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf)

# Advice from [Best Practices for Scientific Computing](https://arxiv.org/abs/1210.0530)

+ Write programs for people, not computers.<br>
+ Automate repetitive tasks<br>
+ Use the computer to record history<br>
+ Make incremental changes<br>
+ Use version control<br>
+ Don’t repeat yourself (or others)<br>
+ Plan for mistakes<br>
+ Optimize software only after it works correctly<br>
+ Document design and purpose, not mechanics<br>
+ Collaborate<br>

---
class: center, middle, logo

# C[R]eating a package: sec13f

---
class: logo, middle, center, letragrande

# Why structure code into functions?

# Why package them?

## name <- function(arg_1, arg_2, …) expression

## install.packages("tidyverse")

# reuse + share

---
class: logo

# Why write an R package?

"R packages and the Comprehensive R Archive Network (CRAN) are incredibly important features of R. R packages provide a **simple way to distribute R code and documentation**.
Packages on CRAN are basically guaranteed to be installable, as they are regularly built, installed, and tested on multiple systems.

And R packages really are **quite simple to create**. It used to be that the documentation format was a big pain and so a big barrier to writing a package. But Roxygen2 has greatly simplified that part, and so it should no longer be a barrier.

Write an R package to **keep track of the miscellaneous R functions that you write and reuse**. If they’re in a package, it’ll be easier to keep track of them, and so you’ll be much more likely to reuse them.

Write an R package to distribute the data and software that accompany a paper. This really is the easiest way to distribute R code and associated data.

R packages can be big and important, but that shouldn’t scare you off. I can’t emphasize enough: assembling a few R functions within a package will make it way easier for you to use them regularly. You don’t need to distribute the package to anyone."

From R package primer

---
class: logo, middle

.pull-left[
]

.pull-right[
<br>

# Automated bulk processing of SEC Form 13F

[https://github.com/Popiula/sec13f.es](https://github.com/Popiula/sec13f.es)
]

---
class: logo, letragrande, center, middle

# Motivation

## Best ideas

The largest holdings of the hedge funds that performed best over the past year outperform the market benchmark by 7%.

---
class: logo, letragrande, center, middle

# Form 13F

United States
<br> Institutional investors
<br> More than 100 million dollars
<br> Required to report their portfolio holdings quarterly
<br> Securities market regulator: SEC
<br> Publicly accessible

---
class: logo, middle, center

---
class: logo, center, middle, letragrande

# Goal

Develop an **open-source** tool to **automatically** build an **indexed database** with all the information from SEC Form 13F.

An automated process covering every stage of building a database from the information contained in the form.

---
class: logo, center, middle

# Process automation

---
class: logo, center, middle

# Extraction

---
class: logo, center, middle

# Structuring/transformation

---
class: logo, middle, center

# Example dataframes

---
class: logo, center, middle

# Loading/storage

---
class: logo, center, middle

# Tools

---
class: logo, middle, center

---
class: logo, middle, center

---
class: logo, middle, center

---
class: logo, middle, center

---
class: logo, middle, center

---
class: logo, middle, center

---
class: logo, middle, center

---
class: logo, middle, center

Filing pattern

---
class: logo, middle, center

## Filing delay statistics

Censored data

The decreasing mean can be explained by future observations that I do not have yet (but do have for the early years), which will raise the mean for the most recent years.

---
class: logo, middle, center

---
class: logo, center, middle

# Comments/questions/reflections

---
class: logo, center, middle

# 💜 Thank you!

---
background-image: url("img/sharon-mccutcheon-H_FbsufW7yw-unsplash.jpg")
background-size: cover
class: middle, inverse

## Credits

Presentation created with the R package [Xaringan](https://github.com/yihui/xaringan) and [Alison Hill's template for R-Ladies](https://alison.rbind.io/post/2017-12-18-r-ladies-presentation-ninja/).
<br>

Images from [Unsplash](https://unsplash.com) by: [Jon](https://unsplash.com/@j_mk18), [Orlova Maria](https://unsplash.com/@orlovamaria), [Sharon McCutcheon](https://unsplash.com/@mccutcheon)
<br>

Illustrations by [Desirée de Leon](http://desiree.rbind.io/) under a [Creative Commons 4.0 International license (CC BY-NC-ND 4.0)](https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.es)
<br>

Download statistics: [dlstats](https://cran.r-project.org/web/packages/dlstats/index.html) package