Introduction to Blockchain

While reading a book on Blockchain recently (Daniel Drescher’s “Blockchain Basics”), I made the following notes:

A. DEFINITION

  • The term blockchain is ambiguous; it has different meanings for different people, depending on the context. It can refer to:
    • A data structure
    • An algorithm
    • A suite of technologies
    • A group of purely distributed peer-to-peer systems with a common application area
  • Blockchain can be thought of as a purely distributed peer-to-peer system of ledgers, managed by an algorithm, which negotiates the informational content of ordered and connected blocks of data, in order to achieve and maintain its integrity. Managing and clarifying ownership is the most prominent application case of the blockchain (but not the only one).

    [Figure: Concepts and principles of a ledger]

 

B. PROBLEM AREA

  • Problem of ownership:
    • A proof of ownership has three elements:
      • Identification of the owner
      • Identification of the object being owned
      • Mapping the owner to the object
    • ID cards, birth certificates, and driver’s licenses as well as serial numbers, production dates, production certificates, or a detailed object description can be used in order to identify owners and objects.
    • The mapping between owners and objects can be maintained in a ledger, which plays the same role as a witness in a trial.
    • Having only one ledger is risky since it can be damaged, destroyed, or forged. In this case, the ledger is no longer a trustworthy source for clarifying ownership.
    • Instead of using only one central ledger, one can utilize a group of independent ledgers for documenting ownership and clarify ownership requests based on the version of reality on which the majority of ledgers agrees.
    • It is possible to create a purely distributed peer-to-peer system of ledgers by using the blockchain-data-structure. Each blockchain-data-structure represents one ledger and is maintained by one node of the system. The blockchain-algorithm is responsible for letting the individual nodes collectively arrive at one consistent version of the state of ownership. Cryptography is used to implement identification, authentication, and authorization.
    • Integrity of a purely distributed peer-to-peer system of ledgers is found in its ability to make true statements about ownership and to ensure that only the lawful owner can transfer his or her property rights to others.
  • Problem of double spending
    • Double spending can refer to:
      • A problem caused by copying digital goods
      • A problem that may appear in a distributed peer-to-peer system of ledgers
      • An example of violating the integrity of distributed peer-to-peer systems
    • It’s a vulnerability of purely distributed peer-to-peer systems of ledgers, and blockchain is a means to solve this problem
  • The core problem to be solved by the blockchain is achieving and maintaining integrity in a purely distributed peer-to-peer system that consists of an unknown number of peers with unknown reliability and trustworthiness.
  • In order to design a purely distributed peer-to-peer system of ledgers for managing ownership, one has to address the following tasks:
    • Describing ownership
    • Protecting ownership from unauthorized access
    • Storing transaction data
    • Preparing ledgers to be distributed in an untrustworthy environment
    • Forming a system of distributed ledgers
    • Verifying new transactions and adding them to the ledgers
    • Deciding which ledgers represent the truth
[Figure: Concepts of ownership]

 

C. TRANSACTIONS

  • Transaction data provide the following information for describing a transfer of ownership:
    • An identifier of the account that initiates the transaction and is to transfer ownership to another account
    • An identifier of that account that is to receive ownership
    • The amount of the goods to be transferred
    • The time the transaction is to be done
    • A fee to be paid to the system for executing the transaction
    • A proof that the owner of the account who hands off ownership agrees with that transfer
  • The complete history of transaction data is an audit trail that provides evidence of how people acquired and handed off ownership.
  • Any transaction not being part of that history is regarded as if it never happened.
  • A transaction is executed by adding it to the history of transaction data and allowing it to influence the result of aggregating them.
  • The order in which transaction data are added to the history must be preserved in order to yield identical results when aggregating these data.
  • In order to maintain integrity, only those transaction data are added to the blockchain-data-structure that fulfill the following three criteria:
    • Formal correctness
    • Semantic correctness
    • Authorization
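
As an illustration, the transaction fields listed above could be collected in a record like this (a hypothetical shape; the field names are mine, not the book’s):

```javascript
// Sketch of a transaction record following the fields listed above.
// All names and values are illustrative.
const transaction = {
  sender: 'account-A',    // identifier of the account handing off ownership
  recipient: 'account-B', // identifier of the account receiving ownership
  amount: 5,              // amount of goods to be transferred
  timestamp: Date.now(),  // time the transaction is to be done
  fee: 0.1,               // fee paid to the system for executing the transaction
  signature: 'placeholder-signature', // proof that the sender agrees (see the Security section)
};

console.log(transaction.sender, '->', transaction.recipient, transaction.amount);
```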

 

D. SECURITY

  • Identifying data from their digital fingerprint
    • Hash functions transform any kind of data into a number of fixed length, regardless of the size of the input data.
    • There are many different hash functions that differ among others with respect to the length of the hash value they produce.
    • Cryptographic hash functions are an important group of hash functions that create digital fingerprints for any kind of data.
    • Cryptographic hash functions exhibit the following properties:
      • Provide hash values for any kind of data quickly
      • Deterministic
      • Pseudorandom
      • One-way usage
      • Collision resistant
    • Hash values can be used:
      • To compare data
      • To detect whether data that were supposed to stay unchanged have been altered
      • To refer to data in a change-sensitive manner
      • To store a collection of data in a change-sensitive manner
      • To create computationally expensive tasks
  • The major goal of cryptography is to protect data from being accessed by unauthorized people. Main cryptographic activities are:
    • Encryption: Protecting data by turning them into cypher text by utilizing a cryptographic key
    • Decryption: Turning cypher text back into useful data by utilizing a matching cryptographic key
[Figure: Calculating hash values]

  • Asymmetric cryptography always uses two complementary keys: cypher text created with one of these keys can only be decrypted with the other key and vice versa. When utilizing asymmetric cryptography in real life, these keys are typically called the public key and private key in order to highlight their role. The public key is shared with everyone, while the private key is kept secret. For this reason, asymmetric cryptography is also called public-private-key cryptography.
  • There are two classical use cases of public and private keys:
    • Everyone uses the public key to encrypt data that can only be decrypted by the owner of the corresponding private key. This is the digital equivalent to a public mailbox where everyone can put letters in but only the owner can open it.
    • The owner of the private key uses it to encrypt data that can be decrypted by everyone who possesses the corresponding public key. This is the digital equivalent to a public notice board that proves authorship.
  • The blockchain uses asymmetric cryptography in order to achieve two goals:
    • Identifying accounts: User accounts are public cryptographic keys.
    • Authorizing transactions: The owner of the account who hands off ownership creates a piece of cypher text with the corresponding private key. This piece of cypher text can be verified by using the corresponding public key, which happens to be the number of the account that hands off ownership.
  • Digital signatures serve two purposes:
    • Identify its author uniquely
    • State agreement of its author with the content of a document and authorize its execution
  • In the blockchain, digital signatures of transactions are cryptographic hash values of transaction data encrypted with the private key that corresponds to the account that hands off ownership.
  • Digital signatures in the blockchain can be traced back uniquely to one specific private key and to one specific transaction in one process.
[Figure: Asymmetric cryptography]

There are two keys: a white key and a black key. Together they form the pair of corresponding keys. The original message is encrypted with the black key, which yields cypher text represented by the black box containing white letters. The original message can also be encrypted with the second key, which yields different cypher text represented by the white box containing black letters. For didactic reasons, the colors of the boxes representing cypher text and the colors of the keys used to produce them are identical in order to highlight their relation: The black key yields black cypher text, while the white key produces white cypher text. Black cypher text can only be decrypted with the white key and vice versa. The trick to asymmetric cryptography is that you can never decrypt cypher text with the key that was used to create it.

 

E. DATA STRUCTURE

  • The blockchain-data-structure is a specific kind of data structure that is made up of ordered units called blocks.
  • Each block of the blockchain-data-structure consists of a block header and a Merkle tree that contains transaction data.
  • The blockchain-data-structure consists of two major data structures: an ordered chain of block headers and Merkle trees.
  • One can imagine the ordered chain of block headers as being the digital equivalent to an old-fashioned library card catalog, where the individual catalog cards are sorted according to the order in which they were added to the catalog.
  • Having each block header referencing its preceding block header preserves the order of the individual block headers and blocks, respectively, that make up the blockchain-data-structure.
  • Each block header in the blockchain-data-structure is identified by its cryptographic hash value and contains a hash reference to its preceding block header and a hash reference to the application-specific data whose order it maintains.
  • The hash reference to the application-specific data is typically the root of a Merkle tree that maintains hash references to the application-specific data.
[Figure: Simplified blockchain-data-structure containing four transactions]

 

F. STORING DATA

  • The steps to be performed in order to add new transaction data to the blockchain-data-structure are:
    • Create a new Merkle tree that contains all new transaction data to be added.
    • Create a new block header that contains both a hash reference to its preceding header and the root of the Merkle tree that contains the new transaction data.
    • Create a hash reference to the new block header, which is now the current head of the blockchain-data-structure.
  • Changing data in the blockchain-data-structure requires renewing all hash references starting with the one that directly points to the manipulated data and ending with the head of the whole blockchain-data-structure as well as all hash references in between them.
  • The blockchain-data-structure pursues a radical all-or-nothing approach when it comes to changing its data: One either changes the whole data structure completely, starting from the point that causes the change up to the head of the whole chain, or one had better leave it unchanged in the first place.
  • All half-hearted, halfway through, or partial changes will leave the whole blockchain-data-structure in an inconsistent state, which will be detected easily and quickly.
  • Changing the blockchain-data-structure completely is a very elaborate process on purpose.
  • The high sensitivity of the blockchain-data-structure regarding changes is due to the properties of hash references.

 

G. DATA STORE PROTECTION

  • The blockchain protects the history of transaction data from manipulation and forgery by storing transaction data in an immutable data store.
  • The history of a transaction is made immutable by utilizing two ideas:
    • Storing the transaction data in the change-sensitive blockchain-data-structure, which when being changed requires rewriting the data structure starting at the point that causes the change until the head of the whole chain.
    • Requiring the solution of a hash puzzle for writing, rewriting, or adding every single block header in the blockchain-data-structure.
  • The hash puzzle is unique for each block header because it depends on its unique content.
  • The need to rewrite the blockchain-data-structure when it is changed and the costs of doing so make it unattractive to manipulate the history of transaction data in the first place.
  • Requiring the solution of a hash puzzle for every writing, rewriting, or adding of block headers in the blockchain-data-structure turns it into an append-only data store.
  • A block header contains at least the following data:
    • A hash reference to the header of its preceding block
    • The root of a Merkle tree that contains transaction data
    • The difficulty of its hash puzzle
    • The time when solving the hash puzzle was started
    • The nonce that solves the hash puzzle
[Figure: Hash puzzle required to be solved when adding a new block to the blockchain-data-structure]

 

H. VERIFYING AND ADDING TRANSACTIONS

  • The blockchain-algorithm is a series of rules and instructions that governs the way in which transaction data are processed and added to the system.
  • The challenge solved by the blockchain-algorithm is to keep the system open to everyone while ensuring that only valid and authorized transactions are added.
  • The blockchain-algorithm utilizes the carrot-and-stick approach, combined with competition and peer control.
  • The major idea of the blockchain-algorithm is to allow all nodes of the system to act as supervisors of their peers and reward them for adding valid and authorized transactions and for finding errors in the work of others.
  • Due to the rules of the blockchain-algorithm, all nodes of the system have an incentive to process transactions correctly and to supervise and point out any mistakes made by the other peers.
  • The blockchain-algorithm is based on the following concepts:
    • Validation rules for transaction data and block headers
    • Reward for submitting valid blocks
    • Punishment for counteracting the integrity of the system
    • Competition among peers for earning reward based on processing speed and quality
    • Peer control
  • The rules of the competition establish a two-step rhythm that governs the work of every node in the network. At any given point in time, all nodes of the system are in one of two phases:
    • Evaluating a new block that was created by others
    • Trying hard to be the next node that creates a new block that has to be evaluated by all others
  • The working rhythm is imposed by the arrival of messages at the individual nodes.
  • The majority of honest nodes and their striving for reward will outweigh the attempts of dishonest nodes to counteract the integrity of the system.

 

I. TRANSACTION HISTORY CHOICE

  • Delays in sending new blocks across the network, or two nodes creating new blocks at nearly the same time, cause the blockchain-data-structure to grow into the shape of a tree (or a columnar cactus) whose branches arise from a common trunk and represent conflicting versions of the transaction history.
  • Selecting an identical version of the transaction history is a collective decision-making problem.
  • Distributed consensus is an agreement among the members of a purely distributed peer-to-peer system in a collective decision-making problem.
  • The collective decision-making problem of the blockchain is characterized by the following facts:
    • All nodes operate in the identical environment, consisting of the network, nodes that maintain their individual copies of the blockchain-data-structure, and the blockchain-algorithm that governs the behavior of the nodes.
    • The decision-making problem is to select the identical transaction history across all nodes.
    • All nodes strive to maximize their individual income earned as a reward for adding new valid blocks to the blockchain-data-structure.
    • In order to achieve their goals, all nodes send their new blocks to all their peers to have them examined and accepted. As a result, each node leaves its individual footprint in the environment that is the collectively maintained blockchain-data-structure.
    • All nodes use the identical criterion for selecting a history of transaction data.
  • The longest-chain-criterion states that each node independently chooses the path of the tree-shaped blockchain-data-structure that contains the most blocks.
  • The heaviest-chain-criterion states that each node independently chooses that path of the tree-shaped blockchain-data-structure that has the highest aggregated difficulty.
  • Selecting one path of the tree-shaped blockchain-data-structure has the following consequences:
    • Orphan blocks
    • Reclaimed reward
    • Clarifying ownership
    • Reprocessing of transactions
    • A growing common trunk
    • Eventual consistency
    • Robustness against manipulations
  • The deeper down the authoritative chain a block is located:
    • The further in the past it was added
    • The more time has passed since its inclusion in the blockchain-data-structure
    • The more common effort has been spent on adding subsequent blocks
    • The less it is affected by random changes of the blocks that belong to the longest chain
    • The less likely it will be abandoned
    • The more accepted it is by the nodes of the system
    • The more anchored it is in the common history of the nodes
  • The fact that certainty concerning the inclusion of blocks in the authoritative chain increases as time goes by and as more blocks are added is called eventual consistency.
  • A 51 percent attack is an attempt to gather or control the majority of the whole voting power in a collective decision-making process, with the goal of turning blocks that are part of the authoritative chain into orphan blocks and establishing a new authoritative chain that contains a transaction history that is more favorable from the attacker’s point of view.
  • A 51 percent attack has the following characteristics:
    • Economically: Changing the allocation of ownership rights by changing the collective history of transaction data.
    • Decision making: Gathering the majority of voting power in order to enforce a desired result.
    • Technically: Undermining the integrity of the system.
    • Architecturally: Establishing, at least temporarily, a hidden element of centrality that changes the state of the system.
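
The longest-chain and heaviest-chain criteria above can be sketched as follows (the branch and block shapes are illustrative):

```javascript
// Sketch: choosing the authoritative branch among conflicting versions
// of the transaction history. Each branch is an array of blocks.

// Longest-chain criterion: the path with the most blocks wins.
function longestChain(branches) {
  return branches.reduce((best, b) => (b.length > best.length ? b : best));
}

// Heaviest-chain criterion: the path with the highest aggregated difficulty wins.
function heaviestChain(branches) {
  const weight = branch => branch.reduce((sum, blk) => sum + blk.difficulty, 0);
  return branches.reduce((best, b) => (weight(b) > weight(best) ? b : best));
}

const a = [{ difficulty: 2 }, { difficulty: 2 }, { difficulty: 2 }]; // 3 blocks, weight 6
const b = [{ difficulty: 5 }, { difficulty: 4 }];                    // 2 blocks, weight 9
console.log(longestChain([a, b]) === a);  // true: more blocks
console.log(heaviestChain([a, b]) === b); // true: higher aggregated difficulty
```

As the example shows, the two criteria can disagree; each node simply applies the system’s single agreed-upon criterion independently.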

 

J. INTEGRITY

  • The blockchain utilizes fees for compensating its peers for contributing to the integrity of the system.
  • The instrument of payment used to compensate peers (e.g. Bitcoin) has an impact on major aspects of the blockchain such as:
    • Integrity
    • Openness
    • The distributed nature
    • The philosophy of the system
  • Desirable properties of an instrument of payment for compensating peers are:
    • Being available in digital form
    • Being accepted in the real world
    • Being accepted in all countries
    • Not being subject to capital movement restrictions
    • Being trustworthy
    • Not being controlled by one single central organization or state
  • A cryptocurrency is an independent digital currency whose ownership is managed by a blockchain that uses it as an instrument of payment for compensating its peers for maintaining the integrity of the system.

 

K. LIMITATIONS

  • The openness of the blockchain and the absence of any form of central control are the fundamentals of its functioning but can also cause limitations for its adoption.
  • Major technical limitations of the blockchain are:
    • Lack of privacy
    • The security model
    • Limited scalability
    • High costs
    • Hidden centrality
    • Lack of flexibility
    • Critical size
  • The most important nontechnical limitations of the blockchain are:
    • Lack of legal acceptance
    • Lack of user acceptance
  • Technical limitations of the blockchain can be overcome by improving the existing technology or by introducing conceptual changes.
  • The nontechnical limitations of the blockchain can be overcome by educational and legislative initiatives.

 

L. OTHER TYPES OF BLOCKCHAIN

  • The blockchain inherently contains the following conflicts:
    • Transparency vs. privacy: On the one hand, transparency is needed for clarifying ownership and preventing double spending, but on the other hand, its users require privacy.
    • Security vs. speed: On the one hand, protecting the history of transaction data from being manipulated is done by utilizing the computationally expensive proof of work, but on the other hand, speed and scalability are required in most commercial contexts.
  • The transparency vs. privacy conflict has its root in the allocation of reading access rights to the blockchain-data-structure.
  • The security vs. speed conflict has its root in the allocation of writing access rights to the blockchain-data-structure.
  • Solving the transparency vs. privacy conflict led to the following versions of the blockchain:
    • Public blockchains grant reading access and the right to create new transactions to all users or nodes.
    • Private blockchains limit reading access and the right to create new transactions to a preselected group of users or nodes.
  • Solving the security vs. speed conflict led to the following versions of the blockchain:
    • Permissionless blockchains grant writing access to everyone. Every user or node can verify transaction data and create and add new blocks to the blockchain-data-structure.
    • Permissioned blockchains grant writing access only to a limited group of preselected nodes or users that are identified as trustworthy through an onboarding process.
  • Combining these restrictions pairwise led to the emergence of four different kinds of blockchains.
  • Restricting reading or writing access results in consequences on the following properties of the blockchain:
    • The peer-to-peer architecture
    • The distributed nature
    • Its purpose
  • The blockchain-technology-suite adds value even in restricted environments for the following reasons:
    • The number of nodes can vary due to technical failures or downtime.
    • Every distributed system faces the adversities of networks that make communication on the level of individual messages unreliable.
    • Even an on-boarding process may not guarantee the trustworthiness of nodes at a 100 percent level.
    • Even trustworthy nodes may yield wrong results due to technical failures.

 

M. BLOCKCHAIN USAGE

  • The blockchain can be considered a purely distributed data store with additional properties such as being immutable, append-only, ordered, time-stamped, and eventually consistent.
  • Being a generic data store means that the blockchain can store a wide range of data, which in turn makes it usable in a wide range of application areas.
  • Based on its properties, we can identify the following generic-use patterns of the blockchain:
    • Proof of existence
    • Proof of nonexistence
    • Proof of time
    • Proof of order
    • Proof of identity
    • Proof of authorship
    • Proof of ownership
  • Specific application areas of the blockchain that have already received attention or may receive attention in the future are:
    • Payments
    • Cryptocurrencies
    • Micropayments
    • Digital assets
    • Digital identity
    • Notary services
    • Compliance and audit
    • Tax
    • Voting
    • Record management
  • When analyzing specific blockchain applications or blockchain services, some questions need to be answered:
    • What kind of blockchain is used?
    • Are the requirements for using the blockchain fulfilled?
    • What is the added value of using a purely distributed peer-to-peer system?
    • What is the application idea?
    • What is the business case?
    • How are peers compensated for contributing resources to the system?

 

N. FUTURE

  • The blockchain has been and will continue to be the subject of further improvements and developments such as variations in its implementation, improving efficiency, improving scalability, and conceptual advances.
  • Smart contracts, zero-knowledge proofs, and alternative ways to achieve consensus are major areas of conceptual advancement of the blockchain.
  • Besides its technical merits, the blockchain may be honored for the following long-term accomplishments:
    • Disintermediation
    • Automation
    • Standardization
    • Streamlining processes
    • Increased processing speed
    • Cost reduction
    • Shift toward trust in protocols and technology
    • Making trust a commodity
    • Increased technology awareness
  • Possible disadvantages of the blockchain are:
    • Lack of privacy
    • Loss of personal responsibility
    • Loss of jobs
    • Reintermediation
  • Possible usages of the blockchain to be seen in the future are:
    • Limited enthusiast projects
    • Large-scale commercial projects
    • Governmental projects

 

O. SUMMARY

  • The blockchain is a purely distributed peer-to-peer system that addresses the following aspects of managing ownership:
    • Describing ownership: History of Transaction Data
    • Protecting ownership: Digital Signature
    • Storing transaction data: Blockchain-Data-Structure
    • Preparing ledgers for being distributed: Immutability
    • Distributing ledgers: Gossip-Style Information Forwarding Through a Network
    • Processing new transactions: Blockchain-Algorithm
    • Deciding which ledger represents the truth: Distributed Consensus
  • Analyzing the blockchain involves the following aspects:
    • The application goal
    • Its properties
    • Its internal functioning
  • The blockchain has two application goals:
    • Clarifying ownership
    • Transferring ownership
  • The blockchain fulfills its application goals while exhibiting the following qualities:
    • Highly available
    • Censorship proof
    • Reliable
    • Open
    • Pseudoanonymous
    • Secure
    • Resilient
    • Eventually consistent
    • Keeping integrity
  • Internally the blockchain consists of components that are either specific or agnostic to the application goal of managing ownership.
  • The application-specific components of the blockchain are:
    • Ownership logic
    • Transaction data
    • Transaction processing logic
    • Transaction security
  • The application-agnostic components are:
    • The blockchain-technology-suite
    • The purely distributed peer-to-peer architecture
  • The blockchain-technology-suite consists of:
    • Storage logic
    • Consensus logic
    • Data processing logic
    • Asymmetric cryptography

 

If you find the subject of Blockchain interesting and would like to get a more in-depth understanding of it, I strongly encourage you to read Daniel Drescher’s “Blockchain Basics”.

 

Cheers!

ECMAScript ES6 (ES2015) changes overview

I’ve been playing with ReactJS a bit recently, and was pleasantly surprised to see the great changes the JavaScript language has undergone over the last two years or so.

This made me realize that I need to study those changes in more detail, which is how this blog entry came to existence 🙂

According to Wikipedia, “ECMAScript (or ES) is a scripting-language specification, standardized by the European Computer Manufacturers Association. (…) JavaScript is the best-known implementation of ECMAScript since the standard was first published, with other well-known implementations including JScript and ActionScript” (anyone remembering the Flash platform authored by Macromedia?).

In June 2015, the sixth edition of ECMAScript (ES6) was introduced; it was later renamed to ECMAScript 2015 (ES2015).

Among the design objectives that the TC39 (Ecma Technical Committee 39) team defined for the new version of the language were:

  • Goal 1: Be a better language (for writing: complex applications, libraries (possibly including the DOM) shared by those applications, code generators)
  • Goal 2: Improve interoperation (i.e. adopt de facto standards where possible)
  • Goal 3: Versioning (keep versioning as simple and linear as possible)

Some of the new constructs that caught my attention:

 

1. let/const vs. var

In ES5, you declare variables via var. Such variables are function-scoped: their scope is the innermost enclosing function.

In ES6, you can additionally declare variables via let and const. Such variables are block-scoped: their scope is the innermost enclosing block.

let is roughly a block-scoped version of var.

const works like let, but creates variables whose values can’t be changed.

var num = 0;

if (num === 0) {
  let localSpeed = 100;
  var globalSpeed = 200;

  for (let i = 0; i < 5; i++) {
    num += (localSpeed + globalSpeed) * 1;
  }

  console.log(typeof i);  // undefined
}

console.log(typeof localSpeed);  // undefined
console.log(typeof num);  // number
console.log(typeof globalSpeed);  // number

General advice by Dr. Axel Rauschmayer (author of Exploring ES6):

  • Prefer const. You can use it for all variables whose values never change.
  • Otherwise, use let – for variables whose values do change.
  • Avoid var.

 

2. IIFEs vs. blocks

In ES5, you had to use a pattern called IIFE (Immediately-Invoked Function Expression) if you wanted to restrict the scope of a variable tmp to a block:

(function () {  // open IIFE
  var tmp = ···;
  ···
}());  // close IIFE

console.log(tmp);  // ReferenceError

In ECMAScript 6, you can simply use a block and a let declaration (or a const declaration):

{  // open block
  let tmp = ···;
  ···
}  // close block

console.log(tmp);  // ReferenceError

 

3. concatenating strings vs. template literals

In ES5, you put values into strings by concatenating those values and string fragments:

function printCoord(x, y) {
  console.log('('+x+', '+y+')');
}

In ES6 you can use string interpolation via template literals:

function printCoord(x, y) {
  console.log(`(${x}, ${y})`);
}

Template literals also help with representing multi-line strings.
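
For example, line breaks inside a template literal are preserved, so no '\n' concatenation is needed:

```javascript
// A multi-line string via a template literal.
const poem = `Roses are red,
violets are blue.`;

console.log(poem.split('\n').length); // 2
```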

 

4. function expressions vs. arrow functions

In ES5, such callbacks are relatively verbose:

var arr = [1, 2, 3];
var squares = arr.map(function (x) { return x * x });

In ES6, arrow functions are much more concise:

const arr = [1, 2, 3];
const squares = arr.map(x => x * x);

 

5. for vs. forEach() vs. for-of

Prior to ES5, you iterated over Arrays as follows:

var arr = ['a', 'b', 'c'];
for (var i=0; i<arr.length; i++) {
  var elem = arr[i];
  console.log(elem);
}

In ES5, you have the option of using the Array method forEach():

arr.forEach(function (elem) {
  console.log(elem);
});

A for loop has the advantage that you can break from it; forEach() has the advantage of conciseness.

In ES6, the for-of loop combines both advantages:

const arr = ['a', 'b', 'c'];
for (const elem of arr) {
  console.log(elem);
}

If you want both index and value of each array element, for-of has got you covered, too, via the new Array method entries() and destructuring:

for (const [index, elem] of arr.entries()) {
  console.log(index+'. '+elem);
}

 

6. Handling multiple return values

A. via arrays

In ES5, you need an intermediate variable (matchObj in the example below), even if you are only interested in the groups:

var matchObj = /^(\d\d\d\d)-(\d\d)-(\d\d)$/.exec('2999-12-31');
var year = matchObj[1];
var month = matchObj[2];
var day = matchObj[3];

In ES6, destructuring makes this code simpler:

const [, year, month, day] = /^(\d\d\d\d)-(\d\d)-(\d\d)$/.exec('2999-12-31');

(The empty slot at the beginning of the Array pattern skips the Array element at index zero.)

B. via objects

In ES5, even if you are only interested in the properties of an object, you still need an intermediate variable (propDesc in the example below):

var obj = { foo: 123 };
var propDesc = Object.getOwnPropertyDescriptor(obj, 'foo');
var writable = propDesc.writable;
var configurable = propDesc.configurable;

console.log(writable, configurable);  // true true

In ES6, you can use destructuring:

const obj = { foo: 123 };
const {writable, configurable} = Object.getOwnPropertyDescriptor(obj, 'foo');
console.log(writable, configurable);  // true true

 

7. Handling parameter default values

In ES5, you specify default values for parameters like this:

function foo(x, y) {
  x = x || 0;
  y = y || 0;
  ···
}

ES6 has nicer syntax:

function foo(x=0, y=0) {
  ···
}

 

8. Handling named parameters

A common way of naming parameters in JavaScript is via object literals (the so-called options object pattern):

selectEntries({ start: 0, end: -1 });

Two advantages of this approach are that the code becomes more self-descriptive and that it is easier to omit arbitrary parameters.

In ES5, you can implement selectEntries() as follows:

function selectEntries(options) {
  var start = options.start || 0;
  var end = options.end || -1;
  var step = options.step || 1;
  ···
}

In ES6, you can use destructuring in parameter definitions and the code becomes simpler:

function selectEntries({ start=0, end=-1, step=1 }) {
  ···
}
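One caveat worth noting (my addition, not part of the text above): calling selectEntries() with no argument at all would throw, because there is nothing to destructure. Providing a default empty object for the whole parameter avoids this. The body returning the three values is a placeholder for illustration:

```javascript
// Hypothetical variant of selectEntries() with a default empty object,
// so that a call without any arguments also works
function selectEntries({ start = 0, end = -1, step = 1 } = {}) {
  return { start, end, step }; // placeholder body
}
console.log(selectEntries());            // { start: 0, end: -1, step: 1 }
console.log(selectEntries({ step: 2 })); // { start: 0, end: -1, step: 2 }
```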

 

9. arguments vs. rest parameters

In ES5, if you want a function (or method) to accept an arbitrary number of arguments, you must use the special variable arguments:

function logAllArguments() {
  for (var i=0; i<arguments.length; i++) {
    console.log(arguments[i]);
  }
}

In ES6, you can declare a rest parameter (args in the example below) via the … operator:

function logAllArguments(...args) {
  for (const arg of args) {
    console.log(arg);
  }
}

Rest parameters are even nicer if you are only interested in trailing parameters:

function format(pattern, ...args) {
  ···
}

Handling this case in ES5 is clumsy:

function format(pattern) {
  var args = [].slice.call(arguments, 1);
  ···
}

 

10. apply() vs. the spread operator (…)

In ES5, you turn arrays into parameters via apply().

ES6 has the spread operator for this purpose.

A. Math.max() example

ES5 – apply():

Math.max.apply(Math, [-1, 5, 11, 3])

ES6 – spread operator:

Math.max(...[-1, 5, 11, 3])

B. Array.prototype.push() example

ES5 – apply():

var arr1 = ['a', 'b'];
var arr2 = ['c', 'd'];

arr1.push.apply(arr1, arr2); // arr1 is now ['a', 'b', 'c', 'd']

ES6 – spread operator:

const arr1 = ['a', 'b'];
const arr2 = ['c', 'd'];

arr1.push(...arr2); // arr1 is now ['a', 'b', 'c', 'd']

 

11. concat() vs. the spread operator (…)

The spread operator can also (non-destructively) turn the contents of its operand into Array elements. That means that it becomes an alternative to the Array method concat().

ES5 – concat():

var arr1 = ['a', 'b'];
var arr2 = ['c'];
var arr3 = ['d', 'e'];

console.log(arr1.concat(arr2, arr3)); // [ 'a', 'b', 'c', 'd', 'e' ]

ES6 – spread operator:

const arr1 = ['a', 'b'];
const arr2 = ['c'];
const arr3 = ['d', 'e'];

console.log([...arr1, ...arr2, ...arr3]); // [ 'a', 'b', 'c', 'd', 'e' ]

 

12. function expressions in object literals vs. method definitions

In JavaScript, methods are properties whose values are functions.

In ES5 object literals, methods are created like other properties. The property values are provided via function expressions.

var obj = {
  foo: function () {
    ···
  },
  bar: function () {
    this.foo();
  }, // trailing comma is legal in ES5
}

ES6 has method definitions, special syntax for creating methods:

const obj = {
  foo() {
    ···
  },
  bar() {
    this.foo();
  },
}

 

13. constructors vs. classes

ES6 classes are mostly just more convenient syntax for constructor functions.

A. Base classes

In ES5, you implement constructor functions directly:

function Person(name) {
  this.name = name;
}
Person.prototype.describe = function () {
  return 'Person called '+this.name;
};

In ES6, classes provide slightly more convenient syntax:

class Person {
  constructor(name) {
    this.name = name;
  }
  describe() {
    return 'Person called '+this.name;
  }
}

Note the compact syntax for method definitions – no function keyword needed.

Also note that there are no commas between the parts of a class.

B. Derived classes

Subclassing is complicated in ES5, especially when it comes to referring to super-constructors and super-properties.

This is the canonical way of creating a sub-constructor Employee of Person:

function Employee(name, title) {
  Person.call(this, name); // super(name)
  this.title = title;
}

Employee.prototype = Object.create(Person.prototype);
Employee.prototype.constructor = Employee;
Employee.prototype.describe = function () {
  return Person.prototype.describe.call(this) // super.describe()
    + ' (' + this.title + ')';
};

ES6 has built-in support for subclassing, via the extends clause:

class Employee extends Person {
  constructor(name, title) {
    super(name);
    this.title = title;
  }
  describe() {
    return super.describe() + ' (' + this.title + ')';
  }
}

 

14. custom error constructors vs. subclasses of Error

In ES5, it is impossible to subclass the built-in constructor for exceptions, Error.

The following code shows a work-around that gives the constructor MyError important features such as a stack trace:

function MyError() {
  var superInstance = Error.apply(null, arguments); // Use Error as a function
  copyOwnPropertiesFrom(this, superInstance);
}
MyError.prototype = Object.create(Error.prototype);
MyError.prototype.constructor = MyError;

function copyOwnPropertiesFrom(target, source) {
  Object.getOwnPropertyNames(source).forEach(function(propKey) {
    var desc = Object.getOwnPropertyDescriptor(source, propKey);
    Object.defineProperty(target, propKey, desc);
  });
  return target;
}

In ES6, all built-in constructors can be subclassed, which is why the following code achieves what the ES5 code can only simulate:

class MyError extends Error {
}
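A quick check (my own sketch, repeating the class from above) that such a subclass behaves like a regular Error – instances pass both instanceof tests and carry a message:

```javascript
// A subclass of Error, as in the text above
class MyError extends Error {}

const err = new MyError('boom');
console.log(err instanceof MyError); // true
console.log(err instanceof Error);   // true
console.log(err.message);            // 'boom'
```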

 

15. objects vs. Maps

Using the language construct object as a map from strings to arbitrary values (a data structure) has always been a makeshift solution in JavaScript. The safest way to do so is by creating an object whose prototype is null. Then you still have to ensure that no key is ever the string ‘__proto__’, because that property key triggers special functionality in many JavaScript engines.

The following ES5 code contains the function countWords that uses the object dict as a map:

var dict = Object.create(null);

function countWords(word) {
  var escapedWord = escapeKey(word);
  if (escapedWord in dict) {
    dict[escapedWord]++;
  } else {
    dict[escapedWord] = 1;
  }
}

function escapeKey(key) {
  if (key.indexOf('__proto__') === 0) {
    return key+'%';
  } else {
    return key;
  }
}

In ES6, you can use the built-in data structure Map and don’t have to escape keys. As a downside, incrementing values inside Maps is less convenient.

const map = new Map();
function countWords(word) {
  const count = map.get(word) || 0;
  map.set(word, count + 1);
}

Another benefit of Maps is that you can use arbitrary values as keys, not just strings.
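A small sketch of that benefit (my own example): objects can serve as Map keys, and lookup is by identity, not by structure:

```javascript
// Objects as Map keys; a structurally equal but different object does not match
const keyObj = { id: 1 };
const metadata = new Map();
metadata.set(keyObj, 'entry');
console.log(metadata.get(keyObj));    // 'entry'
console.log(metadata.get({ id: 1 })); // undefined (a different object)
```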

 

16. New string methods

A. indexOf vs. startsWith

if (str.indexOf('x') === 0) {} // ES5
if (str.startsWith('x')) {} // ES6

B. indexOf vs. endsWith

function endsWith(str, suffix) { // ES5 – lastIndexOf, so an earlier occurrence of the suffix isn't matched by mistake
  var index = str.lastIndexOf(suffix);
  return index >= 0 && index === str.length-suffix.length;
}
str.endsWith(suffix); // ES6

C. indexOf vs. includes

if (str.indexOf('x') >= 0) {} // ES5
if (str.includes('x')) {} // ES6

D. join vs. repeat (the ES5 way of repeating a string is more of a hack):

new Array(3+1).join('#') // ES5
'#'.repeat(3) // ES6

 

17. New Array methods

A. Array.prototype.indexOf vs. Array.prototype.findIndex

The latter can be used to find NaN, which the former can’t detect:

const arr = ['a', NaN];
arr.indexOf(NaN); // -1
arr.findIndex(x => Number.isNaN(x)); // 1

As an aside, the new Number.isNaN() provides a safe way to detect NaN (because it doesn’t coerce non-numbers to numbers):

isNaN('abc') // true
Number.isNaN('abc') // false

B. Array.prototype.slice() vs. Array.from() (or the spread operator)

In ES5, Array.prototype.slice() was used to convert Array-like objects to Arrays. In ES6, you have Array.from():

var arr1 = Array.prototype.slice.call(arguments); // ES5
const arr2 = Array.from(arguments); // ES6

If a value is iterable (as all Array-like DOM data structures are by now), you can also use the spread operator (…) to convert it to an Array:

const arr1 = [...'abc']; // ['a', 'b', 'c']
const arr2 = [...new Set().add('a').add('b')]; // ['a', 'b']

C. apply() vs. Array.prototype.fill()

In ES5, you can use apply(), as a hack, to create an Array of arbitrary length that is filled with undefined:

// Same as Array(undefined, undefined)
var arr1 = Array.apply(null, new Array(2)); // [undefined, undefined]

In ES6, fill() is a simpler alternative:

const arr2 = new Array(2).fill(undefined); // [undefined, undefined]

fill() is even more convenient if you want to create an Array that is filled with an arbitrary value:

// ES5
var arr3 = Array.apply(null, new Array(2)).map(function (x) { return 'x' }); // ['x', 'x']

// ES6
const arr4 = new Array(2).fill('x'); // ['x', 'x']

fill() replaces all Array elements with the given value. Holes are treated as if they were elements.
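A quick demonstration of the hole behavior (my own example):

```javascript
// fill() treats holes like elements: every index gets overwritten
const sparse = new Array(3); // three holes
sparse[0] = 'a';
console.log(sparse.fill('x')); // ['x', 'x', 'x']
```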

 

18. CommonJS modules vs. ES6 modules

Even in ES5, module systems based on either AMD syntax or CommonJS syntax have mostly replaced hand-written solutions such as the revealing module pattern.

ES6 has built-in support for modules. Alas, no JavaScript engine supports them natively yet. But tools such as browserify, webpack, or jspm let you use ES6 syntax to create modules, making the code you write future-proof.

A. Multiple exports in CommonJS

//------ lib.js ------
var sqrt = Math.sqrt;
function square(x) {
  return x * x;
}
function diag(x, y) {
  return sqrt(square(x) + square(y));
}
module.exports = {
  sqrt: sqrt,
  square: square,
  diag: diag,
};

//------ main1.js ------
var square = require('lib').square;
var diag = require('lib').diag;

console.log(square(11)); // 121
console.log(diag(4, 3)); // 5

Alternatively, you can import the whole module as an object and access square and diag via it:

//------ main2.js ------
var lib = require('lib');

console.log(lib.square(11)); // 121
console.log(lib.diag(4, 3)); // 5

B. Multiple exports in ES6

In ES6, multiple exports are called named exports and handled like this:

//------ lib.js ------
export const sqrt = Math.sqrt;
export function square(x) {
  return x * x;
}
export function diag(x, y) {
  return sqrt(square(x) + square(y));
}

//------ main1.js ------
import { square, diag } from 'lib';

console.log(square(11)); // 121
console.log(diag(4, 3)); // 5

The syntax for importing modules as objects looks as follows (line A):

//------ main2.js ------
import * as lib from 'lib'; // (A)

console.log(lib.square(11)); // 121
console.log(lib.diag(4, 3)); // 5

C. Single exports in CommonJS

Node.js extends CommonJS and lets you export single values from modules, via module.exports:

//------ myFunc.js ------
module.exports = function () { ··· };

//------ main1.js ------
var myFunc = require('myFunc');
myFunc();

D. Single exports in ES6

In ES6, the same thing is done via a so-called default export (declared via export default):

//------ myFunc.js ------
export default function () { ··· } // no semicolon!

//------ main1.js ------
import myFunc from 'myFunc';
myFunc();

 

 

That would be it,

Cheers!

 


Writing files to Hadoop HDFS using Scala

If you’ve been wondering whether storing files in Hadoop HDFS programmatically is difficult, I have good news – it’s not.

For the purpose of this example, I’ll be using my recent favorite language – Scala.

 

Here’s what you need to do:

  1. Start a new SBT project in IntelliJ
  2. Add the “hadoop-client” dependency (Important: the client version must match the version of the Hadoop server you’ll be writing files to)
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-client" % "2.7.0"
    )
    
  3. Check in Hadoop configuration the value of “fs.default.name” property (/etc/hadoop/core-site.xml). This will be the URI you need in order to point the app code at your Hadoop Cluster
  4. Write a few lines of code
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    
    object Hdfs extends App {
    
      def write(uri: String, filePath: String, data: Array[Byte]) = {
        System.setProperty("HADOOP_USER_NAME", "Mariusz")
        val path = new Path(filePath)
        val conf = new Configuration()
        conf.set("fs.defaultFS", uri)
        val fs = FileSystem.get(conf)
        val os = fs.create(path)
        os.write(data)
        os.close() // close the stream first, so the data is flushed before the FileSystem is closed
        fs.close()
      }
    }
    
  5. Use the code written above
      Hdfs.write("hdfs://0.0.0.0:19000", "test.txt", "Hello World".getBytes)
    

 

That’s all there is to it, really.

Cheers 🙂

Installing Hadoop on Windows 8 or 8.1

I was installing Hadoop 2.7.0 recently on a Windows platform (8.1) and thought I’d document the steps, as the procedure isn’t that obvious (existing documentation on how to do it is outdated in a few places).

 

Basic info:

  • Official Apache Hadoop releases do not include Windows binaries, so you have to download sources and build a Windows package yourself.
  • Do not run the installation from within Cygwin. Cygwin is not required/supported anymore.
  • I assume you have a JDK already installed (ver. 1.7+)
  • I assume you have Unix command-line tools (like: sh, mkdir, rm, cp, tar, gzip) installed as well. These tools must be present on your PATH. They come with the Windows Git package that can be downloaded from here, or you can also use win-bash (here) or GnuWin32.
  • If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
  • Do not use Visual Studio Express (It does not support compiling for 64-bit)
  • Google’s Protocol Buffers must be installed in exactly version 2.5.0 (not newer, this is a hard-coded dependency …weird)
  • Several tests that are executed while building the Hadoop Windows package require the “Create Symbolic Links” privilege. Therefore, the ‘mvn package’ command must be executed from the Command Line in “Administrator mode”.

 

Installation:

  1. Download Hadoop sources tarball from here.
  2. Make sure you have JAVA_HOME in your “Environment Variables” set up properly (in my case it was “c:\Program Files\Java\jdk1.8.0_40”)
  3. Download Maven binaries from here.
  4. Add ‘bin’ folder of maven to your path (in “Environment Variables”)
  5. Download Google’s Protocol Buffers in version 2.5.0 (no other version, including 2.6.1 will work) from here.
  6. Download and install CMake (Windows Installer) from here.
  7. Download and install “Visual Studio 2010 Professional” (Trial is enough) from here (Web Installer) or here (ISO Image)
  8. Alternatively (to the step no 7 above), you can install “Windows SDK 8.1” from here.
  9. Add the location of newly installed MSBuild.exe (c:\Windows\Microsoft.NET\Framework64\v4.0.30319;) to your system path (in “Environment Variables”).
  10. Because you’ll be running the Maven ‘package’ goal from the Command Line (cmd.exe) in “Administrator mode” (aka. “Elevated mode”), it is important that in steps no 4 and 9 above, you’re updating the “PATH” in “System variables” section, and not in “User variables for logged-in user” section.
  11. Run cmd in “Administrator Mode” and execute: “set Platform=x64” (assuming you want 64-bit version, otherwise use “set Platform=Win32”)
  12. Now, while still in cmd, execute:
    mvn package -Pdist,native-win -DskipTests -Dtar
    
  13. After the build is complete, you should find hadoop-2.7.0.tar.gz file in “hadoop-2.7.0-src\hadoop-dist\target\” directory.
  14. Extract the newly created Hadoop Windows package to the directory of choice (eg. c:\hdp\)

 

Testing:

  1. We’ll be configuring Hadoop for a Single Node (pseudo-distributed) Cluster.
  2. As part of configuring HDFS, update the files:
    1. near the end of “\hdp\etc\hadoop\hadoop-env.cmd” add the following lines:
        set HADOOP_PREFIX=c:\hdp
        set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
        set YARN_CONF_DIR=%HADOOP_CONF_DIR%
        set PATH=%PATH%;%HADOOP_PREFIX%\bin
      
    2. modify “\hdp\etc\hadoop\core-site.xml” with the following:
      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://0.0.0.0:19000</value>
        </property>
      </configuration>
      
    3. modify “\hdp\etc\hadoop\hdfs-site.xml” with:
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
      </configuration>
      
    4. Finally, make sure “\hdp\etc\hadoop\slaves” has the following entry:

        localhost
      
    5. and create c:\tmp directory as the default configuration puts HDFS metadata and data files under \tmp on the current drive
  3. As part of configuring YARN, update files:
    1. add the following entries to “\hdp\etc\hadoop\mapred-site.xml”, replacing %USERNAME% with your Windows user name:
      <configuration>
        <property>
          <name>mapreduce.job.user.name</name>
          <value>%USERNAME%</value>
        </property>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
        <property>
          <name>yarn.apps.stagingDir</name>
          <value>/user/%USERNAME%/staging</value>
        </property>
        <property>
          <name>mapreduce.jobtracker.address</name>
          <value>local</value>
        </property>
      </configuration>
      
    2. modify “\hdp\etc\hadoop\yarn-site.xml”, with:
      <configuration>
        <property>
          <name>yarn.server.resourcemanager.address</name>
          <value>0.0.0.0:8020</value>
        </property>
        <property>
          <name>yarn.server.resourcemanager.application.expiry.interval</name>
          <value>60000</value>
        </property>
        <property>
          <name>yarn.server.nodemanager.address</name>
          <value>0.0.0.0:45454</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
          <name>yarn.server.nodemanager.remote-app-log-dir</name>
          <value>/app-logs</value>
        </property>
        <property>
          <name>yarn.nodemanager.log-dirs</name>
          <value>/dep/logs/userlogs</value>
        </property>
        <property>
          <name>yarn.server.mapreduce-appmanager.attempt-listener.bindAddress</name>
          <value>0.0.0.0</value>
        </property>
        <property>
          <name>yarn.server.mapreduce-appmanager.client-service.bindAddress</name>
          <value>0.0.0.0</value>
        </property>
        <property>
          <name>yarn.log-aggregation-enable</name>
          <value>true</value>
        </property>
        <property>
          <name>yarn.log-aggregation.retain-seconds</name>
          <value>-1</value>
        </property>
        <property>
          <name>yarn.application.classpath</name>
          <value>%HADOOP_CONF_DIR%,%HADOOP_COMMON_HOME%/share/hadoop/common/*,%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,%HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/*,%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*</value>
        </property>
      </configuration>
      
  4. Because Hadoop doesn’t recognize JAVA_HOME from “Environment Variables” (and has problems with spaces in pathnames):
    1. copy your JDK to some dir (eg. “c:\hdp\java\jdk1.8.0_40”)
    2. edit “\hdp\etc\hadoop\hadoop-env.cmd” and update
        set JAVA_HOME=c:\hdp\java\jdk1.8.0_40
      
    3. initialize Environment Variables by running cmd in “Administrator Mode” and executing: “c:\hdp\etc\hadoop\hadoop-env.cmd”
  5. Format the FileSystem
      c:\hdp\bin\hdfs namenode -format
    
  6. Start HDFS Daemons
      c:\hdp\sbin\start-dfs.cmd
    
  7. Start YARN Daemons
      c:\hdp\sbin\start-yarn.cmd
    
  8. Run an example YARN job
      c:\hdp\bin\yarn jar c:\hdp\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.0.jar wordcount c:\hdp\LICENSE.txt /out
    
  9. Check the following pages in your browser:
      Resource Manager:  http://localhost:8088
      Web UI of the NameNode daemon:  http://localhost:50070
      NodeManager web interface:  http://localhost:8042
    

 

Voilà.

 

 


Connecting remote JVM over JMX using VisualVM or JConsole

There are many posts all over the Internet on how to do it right, but unfortunately none worked for me (Debian behind a firewall on the server side, reached over VPN from my local Mac). Therefore, I’m sharing below the solution that worked for me.

 

1. Check server ip

hostname -i

 

2. use JVM params:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=[jmx port]
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=[server ip from step 1]

 

3. Run application

 

4. Find pid of the running java process

 

5. Check all ports used by JMX/RMI

netstat -lp | grep [pid from step 4]

 

6. Open all ports from step 5 on the firewall

 


 

Cheers!

SSH Linux login without password

Below is probably the quickest way to achieve this:

 

1. Generate SSH key (if you don’t have one already)

ssh-keygen -t rsa

 

2. Use SSH to create a remote directory ~/.ssh

ssh username@dev.company.com mkdir -p .ssh

 

3. Append your public key to .ssh/authorized_keys on remote host

cat ~/.ssh/id_rsa.pub | ssh username@dev.company.com 'cat >> .ssh/authorized_keys'

 

 

That’s it!

SSH Key Authentication with GitLab

Every time I start building a product for a new company, one of the first steps is creating a repository and uploading an SSH key. Instead of browsing the web looking for a reminder on how to do it, I decided I’d post the quickest solution here.

 

1. Enter the following command in the Terminal window (Mac OS X)

ssh-keygen -t rsa

 

2. Accept default location and leave password blank (or not, up to you)

 

3. The key will get generated

Your identification has been saved in /Users/mariuszprzydatek/.ssh/id_rsa.
Your public key has been saved in /Users/mariuszprzydatek/.ssh/id_rsa.pub.
The key fingerprint is:
ce:80:76:66:5b:5d:d2:29:3d:64:66:65:e8:d3:aa:5e mariuszprzydatek@Mariuszs-MacBook-Pro.local
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|         .       |
|        E .      |
|   .   . o       |
|  o . . S .      |
| + + o . +       |
|. + o = o +      |
| o...o * o       |
|.  oo.o .        |
+-----------------+

 

4. The private key (id_rsa) stays in the .ssh directory and is used to prove your identity. The public key (id_rsa.pub) is the key you’ll be uploading to your GitLab account.

 

5. Copy your public key to the clipboard

pbcopy < ~/.ssh/id_rsa.pub

 

6. Paste the key to GitLab

 

GitLab SSH Key Authentication

 

 

Cheers!