math :/

another month! so much learning and so little posting! but i was in thailand for 3 weeks. where instead of ignoring both my blog and my learning, I just ignored my blog.

and wrote such exciting programs as…

“what time is it in san francisco right now?”

and

“a program which yells at you if you don’t type in all caps”

seriously guys, i’m on my way.

but really actually seriously, i am getting on my way. every day i experience the simultaneous joy of realizing how far i’ve come and frustration of how far i have to go.

and somewhere in between those things is a less extreme realization. what you’re actually ACTUALLY really shitty at.

which brings me to today’s post: MATH

i can’t remember if i mentioned this in my inaugural post and i’m too lazy to look, but the reason I didn’t get into this when i SHOULD HAVE back in college was because of honors calculus. Like the elitist I was then and am now, I was too good for the dumbed down “business calculus” class that my finance degree suggested. And as an honor student, I was also too good for regular calculus. So Kari enrolled in honors calc, only to develop emotional complexes with math, having been the only one dumb enough to take the class without having taken calc at some point before.

Long story short, they went too fast and I worked harder for that B- (my only B in college btw) than I ever worked before or since, up until I took on programming.

And thus I thought I sucked at math. and felt justified giving up computer science before I really tried.

and then i slowly crept into programming. inching my way, unintentionally for years. driven by an obsession for organized problem solving. a need to define myself as “a smart one.” a love of automation. a passion for creating things.

and then….

i started doing algorithms. working through problems from coworkers, friends, books.

and mother fuck, if my math shit isn’t coming back to haunt me.

granted, I was at a coding interview talk tonight and the speaker (hai gayle) asked how many people had ever used a binary tree in their professional career, and 2 people raised their hand.

STILL! this shit that i fear I am too visual for haunts me. these arbitrary concepts and theorems and formulas!

so here comes the fun part. where i self-deprecate using examples 🙂

hokay so *today* i fucked up a midpoint formula. so like, the loops were perfect, the variables well named, the approach was sound. but where was the bug? well i forgot how to fucking calculate a midpoint between 2 numbers.

so once i debugged it, realized where the error was and REFUSED to google the answer, I spent 16 minutes drawing pictures of number lines, reversing out the stupid formula.

btw- (n1 + n2) / 2
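
as a quick sketch in Javascript (the function names are made up for illustration, and the rounded-down variant assumes the two numbers are array positions, which is a guess at the use case):

```javascript
// midpoint between two numbers: add them up, cut the total in half
function midpoint(n1, n2) {
  return (n1 + n2) / 2;
}

// assumption: when the numbers are array positions you usually want a
// whole number, so measure the gap, halve it, round down, add to the start
function midIndex(lo, hi) {
  return lo + Math.floor((hi - lo) / 2);
}

console.log(midpoint(2, 10)); // 6
console.log(midIndex(0, 7));  // 3
```

the second version is the same formula rearranged, just rounded so it can be used as a position in a list.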

anyway, i had more examples but I spent all day doing recursion and it’s midnight and i’m tired of looking at my laptop (even tho it’s all f.lux-y golden)

night nerds.

Advanced “medium-sized” data skills

Although I’ve been doing an absolute shit job updating and have lost countless amazing insights by ignoring my little blog, I’m going to immediately jump back into a rant and not bother summarizing the last 5 months. Okay maybe at the end.

Tho I will say this, I do believe I am over the learning curve at this point. I’ve learned how to decode stack overflow answers and am mostly google-sufficient. This is a very comfortable place to be in. Feels like I’m *actually* learning relevant things and moving in an *actual* direction instead of stumbling blindly from shitty half-baked advice to overly-granular explanations of irrelevant nuances.

Today I want to cautiously mention big data. Very cautiously. I will approach this subject with the same judgement-free complaining that I give all subjects here.

So, about 3 months ago, my brain started becoming uncomfortable with the inconsistencies in the way people were talking about databases. So I started a document of every word I didn’t think had a clear definition and took it to one of our analytics guys and said “explain these things.”

For fun, here’s the list:

Airpal vs. Sql Pro – Are these interfaces? Why different ones?

Presto: def= “a cutting edge technology that facebook has been using at scale for the last year” <– wtf does that mean?

Production vs. Slave- which dbs have slaves and which don’t? How do slaves actually work?

Mysql-what is this actually? A language? a database? A server?

Monolithic?

Things I’ve heard people say about Hive:

“The data lives in Hive”

“Hive is an interface”

“Hive is a platform”

“Hive is a database warehouse facility”

“Hive sits on top of Hadoop”

“Hive *acts* like a database”

map-reduce- a framework that hive and hadoop are based off?

namespace?

Hadoop = manages the data

data itself sits on the servers disks as files

Hive – acts like a database

HQL?

so you get the point. the problem was, with everyone i talked to, engineers, analysts, the list got longer and longer as they used more buzz words to explain already confusing things. One of my major pet peeves with both engineering and accounting is how the same innocuous words are used to describe vastly different and actually very simple things. I’ve made it a rule that in my meetings no one is allowed to use the words “reconcile,” “system,” “account,” “sits” or “data element.”

The other challenge is you can’t always default to “explain it to me like i’m a 5 year old” because then it’s so high level, it’s not useful. The real trick, and the thing domain experts often fail at, is knowing how to explain something in someone else’s terms. For me, what finally clicked was comparing database things in terms of excel, which I understand backwards and forwards.

Depending on your level this may or may not be helpful but without further ado, here’s today’s ah-ha moment:

2 kinds of databases worth talking about. Relational and Document. Relational databases have been around since the 70s, they’re basically set up to hold a bunch of excel spreadsheets. Document databases are the latest trendy thing. They’re basically set up to hold a bunch of word documents. They’re less structured but fit a specific need.

Think about when you’re planning a trip and you open a google spreadsheet and then realize quickly that you don’t really know what all the rows and columns should be. Should you divvy it out by city? By activity? Do you need times or just dates? So you switch to opening a google doc and just brain dump everything and call it “Thailand 2014.” Then later when you’re more organized, you move all the pieces back into a google spreadsheet since you have a better idea of how to organize it.

That’s honestly a really accurate metaphor for document versus relational databases.

An obvious question is, how the eff would you query a structureless database?

And that’s where map reduce comes in. Hadoop, btw, is just a way to DO map reduce, which is a technique for processing data.

And here is the insanely simple way that it works-

Going back to my Thailand example, let’s say I want to know how many times I mentioned each city so I can start prioritizing what to hit. I would write a map reduce job, which is done by using programming-friendly languages. And it works in 2 parts.

1. Mapping- I’m going to tell it to make me a list of every word in every document I have and give it a count. So I’ll have something like this:

Bangkok 1

Ko Pha Ngan 1

Bangkok 1

Bangkok 1

2. Reduce- Then I really just do a pivot table so it looks like this:

Bangkok 3

Ko Pha Ngan 1

So why is this so annoyingly trendy right now? Well, the real work of map reduce is the mapping. And if I have my documents on 2 different servers, it can run those at the same time, making it roughly twice as fast. It basically scales in speed with the number of servers. Versus mysql which has all your excel documents in one place and has to do everything in that same place.
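
for the code-inclined, a rough sketch of the two steps in Javascript. everything here is made up for illustration: the trip notes, the two pretend “servers,” and the function names. it also cheats a little by only looking for a fixed list of city names instead of counting every word, and it runs the two map steps one after the other rather than truly in parallel.

```javascript
// hypothetical trip notes, split across two pretend "servers"
const serverA = ["Bangkok temples", "ferry to Ko Pha Ngan"];
const serverB = ["Bangkok street food", "Bangkok night market"];
const cities = ["Bangkok", "Ko Pha Ngan"];

// map: emit a [city, 1] pair every time a city shows up in a document
function mapDocs(docs) {
  const pairs = [];
  for (const doc of docs) {
    for (const city of cities) {
      if (doc.includes(city)) pairs.push([city, 1]);
    }
  }
  return pairs;
}

// reduce: the pivot table step, add up the 1s for each city
function reducePairs(pairs) {
  const totals = {};
  for (const [city, n] of pairs) {
    totals[city] = (totals[city] || 0) + n;
  }
  return totals;
}

// each server's map step doesn't need the other one, which is the whole trick
const mapped = [...mapDocs(serverA), ...mapDocs(serverB)];
const counts = reducePairs(mapped);
console.log(counts); // Bangkok 3, Ko Pha Ngan 1
```

the map step never needs to know what the other server is doing, which is why it can be handed out to as many machines as you have.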

Another analogy I like is, instead of sending your boyfriend to Target to pick up 20 things, you send 20 friends to 20 different Targets to pick up 1 thing each. Much faster.

And that, I swear to god, is it.

Well, except that map-reduce is hard for non-programmers to write so there’s a zillion programs that translate more SQL-type languages into map-reduce so that analysts can leverage it. This is where HIVE comes in.

So there ya go. I won’t even touch all the elitism, snark or the phrase “data-driven journalism” for now. But there ya go, Big data as explained by google docs and pivot tables.

words words words… dictionary!

Not gonna lie, i’m pretty proud of myself today. Not for any remarkable coding feat but for how I worked around not knowing shit but still asked questions properly.

Here’s how things went down…
So I built a fairly impressive dashboard through a combination of piecing together existing code and adding some new models and controllers and blah blah blah.
And while I was pretty proud, I still feel like I don’t have a fundamental understanding of what I did and how i would fix it when it broke.

So I set off to do the grunt work. Building a brand new page and switching out every piece of code to make sure I understood the structure and rules.

Which, okay, random side vent- this whole ruby “magic” thing is such crap for beginners. It basically makes everything just “work” and it’s impossible to really understand why. This is the second time I’m finding myself on the “visibility and referential integrity > simple and easy” side.

Anyway, so i’m trying to figure out what kind of a word “index” is. When you write “def index” in a controller file, then make an index views file it works. But type in “index1” and it fails.
So i’m trying to understand if this is a Ruby specific “word” or if it’s been mapped somewhere else in the code base.
And asking the question like that isn’t exactly elegant.

So, I google around to find a nice glossary of terms and there isn’t quite one that easily maps back to the code base i’m working with. they’re all more general.

And I’m staring at my pretty color coded sublime text file and it hits me! There has to be a settings file that tells it what colors to put the different words in. So I google around, find the github repo and boom- instant glossary of every kind of word in your syntax with its color.

So next time I need to ask what something does I can look at the mapping, realize all the orange ones are “functions” and not only sound smart, but most likely answer the question myself with google.

Not that anyone reading this would ever need this but for posterity, the code-

https://github.com/n00ge/Sublime-Text-2-Packages/blob/master/Color%20Scheme%20-%20Default/Monokai.tmTheme

how to curb stomp a macbook pro

step 1- convince your engineering team to let you configure a dev environment on your computer

step 2- insist on figuring it out yourself

step 3- do not ask for help

step 4- throw your computer out the window in a violent rage

long story short- I am set up to start building shit and it took me a week to figure out how to get to the point where i could even pretend to start learning to code.

to dumb it down (since no one ever does)- a dev environment is basically a copy of all the code and databases that run something, but it sits on your local desktop. you play with changes here, then when you’re ready, push a copy of the code into the production world.

what they don’t tell you is how BLOODY FREAKING IMPOSSIBLE it is to set this up.
so you’re young and fresh and ready to start learning to code, and the FIRST thing you have to do is actually more challenging than writing the fucking code itself.

so be warned- it’s hard. the documents don’t account for 90% of the shit you need to do because they were written by someone on an ENTIRELY DIFFERENT COMPUTER. and they forgot that they already fucking HAD an SSH key on github. and they already had xcode. and they had access to the password to create the database.yml file. and oh yeah, this was written back when the most recent version of rails was 4.1.1 and no one has upgraded so if you don’t SPECIFICALLY install a random old version, yours won’t work, but you do THEN have to separately add a bunch of Gems or it won’t work. oh and you’ll obviously need to reconfigure a bunch of shit so you have access to actually install some of the stuff where you can’t call out SUDO because it’s in the library.

so yeah. fuck you last week.

i will say however, that when you get it to work for the first time, you WILL feel like you’ve just performed the single greatest engineering feat in the history of the company. and everyone should lower their eyes in your presence because you are a god among tech.

engineering is so dramatic, damn.

shi++. gets. real.

get it?? sh(i++)?? i++???

how do I have friends…

anyway, today’s ah-ha moment comes from my first attempt in the big bad world of programming. After my complete mastery of linux (haha), I decided it was time to get some real cred and start learning to code. 

But where to begin? The internet is extremely unhelpful here. They offer everything from free online tutorials to $20,000 monthlong bootcamps. And which LANGUAGE?! and is there a fundamentals piece I should look into? What about algorithms? And even if I know how the language works where do I… uhm… put it? How does it get from words on my screen to a functioning app?

Since my meeting with our engineering lead was 6 hours away and I couldn’t wait, I just grabbed the source code from one of the dashboards I hope to learn to build and started playing with the code, naturally first changing the text to things like “Kari is amazing.”

that’s when the confusion started. went something like this.

me- so everyone is saying we use ruby but this all looks like html and java what… like… IS Ruby?

 friend- it generates the HTML code.

me- but i thought it was a language just like html…and why are people telling me to learn Ruby when, if it generates HTML, it seems like I’d need to learn that first?

friend- it’s a different language. sorta.

me- so is it like I type in <Ruby> Code code code </Ruby> and it spits out a bunch of HTML equivalent?

friend- no, it’s much more complex. you need like servers and shit.

me- so once I install all the things does ruby create all the code? and why does the HTML I’m looking at have hardcoded data when I know the data is a result of an underlying query? where does that live?

And then the ah-ha. where an engineer realizes how much they have to back up when explaining things to someone without a CS degree.

Here is the lesson-

There’s 2 kinds of languages (probably more elegant ways to categorize the nuances but for this discussion, 2): Client side code and server side code.

Javascript, HTML and CSS are all client side. The stuff you “see” basically. that’s why you can steal it from a webpage and replicate the visual on your own.

THEN there’s server side code. This is Ruby, SQL etc. You can’t view this outright, but it helps generate the client side code. 

I’m still a little unclear on how they work together and how/where they live, but this is a fairly HUGE important thing to realize when blindly figuring out which “language” to start learning first.
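
one rough way to picture how the two sides fit together, sketched in Javascript only because it can run on either side (in this story the server part would be Ruby). `fakeQuery`, `renderDashboard` and the numbers are all invented for illustration:

```javascript
// pretend this is the database query the server runs (made-up rows)
function fakeQuery() {
  return [
    { name: "signups", value: 42 },
    { name: "refunds", value: 7 },
  ];
}

// "server side": turn the rows into a plain string of HTML
function renderDashboard(rows) {
  const items = rows.map((r) => `<li>${r.name}: ${r.value}</li>`).join("");
  return `<ul>${items}</ul>`;
}

// this string is what gets shipped to the browser
const html = renderDashboard(fakeQuery());
console.log(html);
// <ul><li>signups: 42</li><li>refunds: 7</li></ul>
```

the string that comes out is all the browser ever sees, which would explain why the data looks hardcoded when you view the page source: the query already ran somewhere else.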

also variables are done in a really stupid way in Javascript.

why does this:
 
<script>
var x = 1
var i = 2
{x=x+i}
alert(x);
</script>
 
produce the same result as this:
 
<script>
var x = 1
var i = 2
{y=x+i}
alert(y);
</script>
 
is stupid and feels unorganized. but it’s more likely that I just don’t fully understand it yet. 
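
a hedged sketch of what seems to be going on, assuming these run as plain non-strict scripts: the braces are just a block, and assigning to a name that was never declared quietly creates a global. the `new Function` wrapper is only there to pin down strict vs. non-strict mode for the demo, and `z` is a made-up name.

```javascript
// 1. declared with var: the braces are just a block and var ignores them
const declared = new Function("var x = 1; var i = 2; { x = x + i } return x;");

// 2. never declared: non-strict mode quietly creates a global called y
const undeclared = new Function("var x = 1; var i = 2; { y = x + i } return y;");

// 3. same mistake under strict mode: assigning to undeclared z throws
const strict = new Function('"use strict"; var x = 1; var i = 2; { z = x + i } return z;');

console.log(declared());   // 3
console.log(undeclared()); // 3
try {
  strict();
} catch (err) {
  console.log(err.name);   // ReferenceError
}
```

so both snippets alert 3, but the second one leaves a stray global `y` lying around. putting "use strict" at the top turns that silent behavior into an error.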
til next time!
</blog post>

 

 

adventures in tmux

i know. I KNOW. Varsity right? I can’t take credit, well I sorta can. I was crazy excited after accidentally hitting command+D and splitting the pane, not realizing it only duplicated. This led Max to suggest I go discover the multiplexing god that is tmux.

But like most things I’m finding, most of the forums and wikis and tutorials assume a certain level of knowledge and fail to mention the EXTREMELY IMPORTANT basics.

What i was trying to do- uhm. split a pane. the, like, FIRST thing you should be able to do.

The problem- The prefix (as i learned later it was called) CTL+B and the ” key weren’t working.

The solution- the plus symbol means “at the same time” while the <“> means directly following it.

So yeah, while ALT+SHIFT+TAB is insanely common in pane management, they go and decide it’s now 2 separate commands and don’t bother explaining it ANYWHERE.

This is what makes learning this shit from scratch so infuriating.

It says <Prefix + %>, do i type that whole thing?

or not the <> part?

is “prefix” part of the code or does it mean something that i need to input?

and why does it sometimes say CTL+B and sometimes CTL-B?

there’s like 80% consistency and EVERYONE SEEMS TO UNDERSTAND THE 20% THAT’S NOT EXPLAINED ANYWHERE

AND WHY DON’T THEY COLOR CODE THIS IN STACK OVERFLOW

AND WHY DOESN’T THE CONFIG FILE COME WITH IT? AND HOW DO I CREATE MY OWN CONFIG FILE SO I CAN JUST COPY THIS THAYER GUY’S CRAP.

;ALSKDFJA;LSFJ

the lesson- you better have friends, a computer science degree or an insane amount of patience.

xo-k

/tmp lesson learned the hard way

As my friend Max so elegantly said when I made my first big fuck up “most programs make it so you can’t shoot yourself in the foot too badly, working in the terminal there’s no such thing”

So for reasons I can’t figure out, I had to save files to the tmp folder when connected to my company’s database server. Which, w/e makes no difference to me.

Until I waited 2 patient days for a query to finish running then restarted my computer…

turns out /tmp is nothing like Desktop/temp and everything gets deleted when you shut down.

the bright side- Having no alternatives but to make the query run faster I learned the black magic that is indexing.

Having only an average understanding of database engineering, I thought indexes worked differently than they actually do.

What i thought they did- Act like a table of contents in a book. So if i only need my query to pull stuff out of Chapter 3, it prevents it from looking through the whole book. This makes total sense if you have repeated data like male/female etc. But what if every record is unique? What’s the point of indexing on, say, the primary key?

What it actually does- Using my book analogy again, indexes can also be a table of contents filled with just page numbers, so it can jump right to the page without scanning the entire book for it.
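
a loose sketch of the difference in Javascript. the table is made up, and a `Map` is standing in for the index (real databases typically use tree structures, so treat this as the idea rather than how mysql actually does it):

```javascript
// made-up table: 100,000 rows, each with a unique id
const rows = [];
for (let id = 0; id < 100000; id++) {
  rows.push({ id, state: id % 2 ? "WA" : "CA" });
}

// no index: flip through every page until we hit the one we want
function scanFor(id) {
  let checked = 0;
  for (const row of rows) {
    checked++;
    if (row.id === id) return { row, checked };
  }
  return { row: undefined, checked };
}

// "index": a lookup from id to position, built once up front
const index = new Map(rows.map((row, pos) => [row.id, pos]));

// with the index: jump straight to the page number
function lookup(id) {
  const pos = index.get(id);
  return pos === undefined ? undefined : rows[pos];
}

console.log(scanFor(99999).checked); // 100000 rows looked at
console.log(lookup(99999).id);       // 99999, found in one jump
```

that’s why an index helps even when every value is unique: it skips the search, not just the irrelevant chapters.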

The query then ran in 16 seconds. Which made me decide it’s sorcery…

 

OSX nuances suck out the better part of the day

If you were wondering what triggered this journey, it was all about my breakup with excel.

I was asked to do some analysis on a data set with 6 million rows, basically determining a US state for each row, using crap data, requiring an irritatingly long Case When statement in SQL and/or IF statement in excel. First I needed to join to tables that didn’t exist (like a table of US Zip codes) which required me to learn setting up local databases.

And when sql pro couldn’t export a query that size, I had to learn how to do everything from the command line. Which brings us to the story of how I earned my right to ask questions (or more appropriately, convinced people I’ve done the grunt work before I come to you for help).

It was a simple journey into the world of sed. Sed being one of the most useful first things people learn. Sed being a way to find and replace without opening a file. quite useful when your file is 6 million rows.

But of course what I wanted to do wasn’t simple and I found only a single post that explained this little trick.

What I was trying to do: Change a file from tab delimited (how mysql outputs a file) into comma delimited (how mysql needs it to import it). (And while i’m at it, why does mysql default it to tab delimited anyway when it’s just going to want it in CSV later? asshole.)

The problem: Figuring out why all the code that said ‘s/TAB/,/g’ didn’t work.

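The solution: that code assumes GNU sed (the linux flavor), which understands a typed-out escape like \t for tab. if you’re on a mac (which i don’t know why you wouldn’t be), the sed that comes with it doesn’t, so you have to put a real Tab character in the command with the keyboard shortcuts CTL+v Then CTL+i.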
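
the same find-and-replace sketched in Javascript, mostly to show what the sed one-liner is doing. the function name and the sample row are made up, and it assumes no field already contains a comma (if one does, the output won’t be a valid CSV):

```javascript
// same idea as sed's s/TAB/,/g : swap every tab for a comma
function tabsToCommas(line) {
  return line.replace(/\t/g, ",");
}

console.log(tabsToCommas("98101\tSeattle\tWA")); // 98101,Seattle,WA
```

the `g` on the end means “every match on the line,” same as in sed. without it only the first tab gets replaced.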

The bright side is, i learned a lot about the smart way to test errors, the head command, and became a lot better at quickly creating, editing and moving files around.

The lesson- Prefix all google searches with “OSX terminal” from now on.

 

Asking questions when you know nothing (Jon Snow)

So I was well on my way in the terminal, listing directories, CHANGING directories (omg CD), viewing files, creating files. So exciting! Fuck you Word and excel, I do everything from the command line now. Your inferior “UI” is useless to me.

But like most things I set out to do, I jumped in head first and tried to solve a very complicated problem.

What I tried to do: Open a list of URLs and save a PDF of the content.

Why this was more challenging than expected: They’re internal encrypted websites requiring a login.

What I tried (found on google) : Bypassing the browser completely and using an HTML to PDF converter called wkhtmltopdf

Now, in theory, I understand the problem and why the solution could work. What I got stuck on was even more retarded. And thus again, the point of this blog.

So what is wkhtmltopdf? It’s an open source LGPL command line tool… k… great. so I type in the code that people are referencing in various forums, no luck. Eventually i figure out you have to like… download something.

So I work out my own somewhat accurate metaphor. Linux (or unix, still unclear on the difference) has basically a dictionary, and to do other cool stuff you need to kinda download other dictionaries. Then you can use the words in those dictionaries. What makes things more confusing to a noob is most people download these dictionaries by first downloading a dictionary called Homebrew. Then you can use the word brew. but not til you download it. a;lskdjf

The instructions on the site are naturally crystal clear:

  1. Download a precompiled binary or build from source

Instead of googling what a precompiled binary is or trying to decipher what they mean by the irritatingly innocuous “source,” I decided it would be faster to “ASK AN ENGINEER!” and that’s when the lolz began.

I trek across the office to the land of Herschel backpacks and imported Japanese tea. And attempt to ask my question. it sounded something like this:

“So. I need help… uhm “installing” like a … “tool” maybe? or “client” i’ve seen it called also… or like something about a binary something…. or a kit? Basically i need to make use commands that i can’t use now… uh work.”

After a pause, he proceeded to install homebrew, download the tool/kit/client and dumb it down for me in a very gentle, kind way.

The lesson-

Linux has a list of words, commands are like verbs and i haven’t mapped out the rest of the syntax yet. To use the verb “install” in order to download something, you first need to install homebrew (or a similar program) that’s kind of like an excel add-on.

The epilogue- Apparently you can’t bypass a browser if you need to login and can’t replicate the cookies (thanks Max for spending an evening helping me try) and I settled for an applescript which did the job just fine.

 

Scripting and Louboutins- The inaugural post

The requisite explanation entry, which no one ever reads, assuming you’ve done a good job naming your blog.

However, there’s a few different angles this could be coming from, so I’ll attempt to give this little project a well-deserved introduction.

I’m a girl. and i’m learning to code. and there aren’t a ton of people trying to learn this stuff from scratch. and it’s hilarious. and i need a place to vent. and completely geek out. and my friends (most of whom are programmers) find my little “rants” amusing. so maybe there are people out there afraid to ask really really dumb questions to really really smart people. i’m not one of those people. and i really really enjoy going against the hipster grain. asking a car full of people what song is playing, and then following up by asking what else Mumford and Sons have done.

maybe moving to Seattle after college was a culture shock of people pretending to know things they didn’t know. and i relished the friendships spawned from a “thank you for asking, i had no idea what they were talking about either.”

and now that i’m a financial analyst for a start up in Silicon Valley, I’ve found a new breed of hipsters; the engineer. The overworked, too smart for his own good, well-dressed, bikes to work, built his own wireless stereo system in his cement-walled Soma Loft, knows more about Sushi than you, engineer.

But being a girl has one important advantage. Anything technical I do is disproportionately impressive. and I can milk the “aw she wants to learn linux” out of my friends for at least the next few years.

and so, i’m starting a blog because you too should be frustrated when stack overflow doesn’t clearly explain which part is the code and which part you have to fill in your own variables for. and you too should giggle the first time you make your computer talk to you. and you too should realize that shitty part of your job can probably be automated.

It’s worth mentioning that I’m writing this inaugural post at work on a Friday, watching tmux tutorials in a polka dot dress, pearls and Christian Louboutins, killin an hour because i wrote a script that runs every query i need for the next week.

<nerd>…