
2017-02-02
2017-01-31

Thoughts After Having Read the Blog Post "Bulk Log Analytics With Hive"

By w̶i̶e̶l̶d̶l̶i̶n̶u̶x̶.̶c̶o̶m̶ author Morgan Jassen

After having encountered the persona of @hypertextranch (https://twitter.com/HypertextRanch), I occasionally try to follow what they have published to see what I can learn.

I was glad to come across this tweet, https://twitter.com/wordpress...40384, which linked to the post "Bulk Log Analytics With Hive" (https://data.blog/2016...-with-hive/) by XIAO (https://data.blog/author/hypertextranch/). It seems to be one of the first posts on "Data for Breakfast", a data blog from Automattic that appeared to be new in Nov. 2016.

My take-aways, a mix of what I learned and what it reminded me of:
- Analysis of huge (terabyte-scale) data can be seen as "easy" with the right techniques and technology.
- It will probably take multiple servers, or a cluster, to manipulate the data quickly.
- Big data can be organized so that it can be queried much like a SQL table.
- Fitting data technologies together like this to get the desired reports is, in a way, like building a house frame from wood boards, or cooking a delicious dish from raw ingredients, or making finished crafts from craft pieces. The only difference is that the pieces are digital, not physical.
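
To illustrate the "treated like a SQL table" point at a toy scale, here is a minimal sketch. It does not use Hive at all: it uses Python's built-in sqlite3 module as a stand-in, and the log format, table name, and column names are all made up for this example. The idea is only that once raw log lines are mapped to columns, an ordinary SQL query can summarize them.

```python
import sqlite3

# Hypothetical log lines in the form "day<TAB>blog_id<TAB>views".
# This format is invented for the sketch; it is not from the article.
RAW_LOG_LINES = [
    "2016-11-01\tblog_a\t120",
    "2016-11-01\tblog_b\t45",
    "2016-11-02\tblog_a\t98",
    "2016-11-02\tblog_b\t60",
]


def parse_line(line):
    """Split one raw log line into typed column values."""
    day, blog_id, views = line.split("\t")
    return day, blog_id, int(views)


def views_per_blog(lines):
    """Load the lines into an in-memory table and total the views per blog."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE page_views (day TEXT, blog_id TEXT, views INTEGER)"
        )
        conn.executemany(
            "INSERT INTO page_views VALUES (?, ?, ?)",
            (parse_line(line) for line in lines),
        )
        return conn.execute(
            "SELECT blog_id, SUM(views) FROM page_views "
            "GROUP BY blog_id ORDER BY blog_id"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for blog_id, total in views_per_blog(RAW_LOG_LINES):
        print(blog_id, total)
```

A real cluster would spread the same kind of query over many machines and far more data, but the shape of the question (group, sum, sort) stays the same.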

In addition, I tried to find out online how to pronounce "SerDe" (Apache Hive's "SerDe" -- serialization/deserialization -- which was referenced in the article). I found a video* where the speaker pronounced it sərdiː (in IPA phonetic spelling), or "Sir D." (in my own made-up phonetic spelling, using words that I know). So for now I'll go with that.
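
As a rough picture of what the two halves of "SerDe" mean, here is a small sketch in plain Python. It is not Hive's actual SerDe interface (which is written in Java); the field names and the tab-separated format are assumptions made up for this example. Deserializing turns a stored line into a row with named, typed fields, and serializing turns a row back into a stored line.

```python
# Hypothetical column order for a tab-separated log line (assumed for this sketch).
FIELDS = ("day", "blog_id", "views")


def deserialize(line):
    """Turn one stored text line into a row (a dict with typed values)."""
    day, blog_id, views = line.rstrip("\n").split("\t")
    return {"day": day, "blog_id": blog_id, "views": int(views)}


def serialize(row):
    """Turn a row back into its stored text form."""
    return "\t".join(str(row[name]) for name in FIELDS)


if __name__ == "__main__":
    stored = "2016-11-01\tblog_a\t120"
    row = deserialize(stored)
    print(row)
    print(serialize(row) == stored)
```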

Thank you to @hypertextranch on twitter and XIAO on data.blog, for having authored and/or published these.

In conclusion, I'm glad to have read the article because I learned some general things about fitting technologies together to read and report on large data, and also I learned some specific things about Apache Hive, SerDe, and how Automattic processes blog metrics.

*Video link: "What is SerDe ? Hadoop Training , Apache Hive Training (1)" https://www.youtube.com/watch?v=ri3dqc-rt5s

[2017-03-02 Update: I added the paragraph near the end saying "Thank you to @hypertextranch...", to acknowledge thanks and try to show appreciation for the resources.]

[2019-03-11 edit: Moved to: https://i̶n̶v̶e̶s̶t̶o̶r̶w̶o̶r̶k̶e̶r̶.̶c̶o̶m̶/2017/... .html.]