Carpenter
Carpenter is a simple tool to manage Apache Parquet files.

This 12-minute video explains the specifics of Apache Parquet and how to use Carpenter to manage such files:
Video: Carpenter in 12 minutes

With Carpenter, you can open, visualize, and save Parquet files.
You can also import CSV files to create new Parquet files, and export existing Parquet files as CSV files.
With full CharSep integration, you can also instantly open the current file in CharSep for further processing and transformations.

Before creating and saving a Parquet file, you can define its model metadata.
For each column, you can set the column label, the column data type, and a default value.
Together with CharSep and S3Man, Carpenter completes your toolbox for CSV, Parquet, and Amazon S3-based file management.

Known limitations:
Currently, to stay compatible with CSV files and keep the structure simple, only simple (primitive) data types are supported. Complex types, such as nested groups, lists, or maps, are not.

Download and Install

For standard Windows users, a wrapped-exe version is available by clicking here. As a wrapped executable, it searches your PC for the most suitable Java Virtual Machine and uses it to run the program.

This version is the simplest way to run Carpenter; however, it has some limitations.

If you want to run Carpenter on Linux or macOS, prefer not to download an .exe file, or want to apply specific memory settings, use the JAR file instead (see below).

JAR file download

Carpenter is also available as an executable JAR file: click here and save the file to your local disk (less than 800 KB). It is freeware: you can download and use it on any computer without admin or install privileges. No installation is required; just copy the JAR file and run it. An executable JAR runs with your system's default memory settings, so once it is downloaded you can also launch it with the following command (paths are omitted so you can adapt them to your system):

javaw -cp carpenter.jar com.s3man.s3man.Carpenter

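This lets you set specific memory parameters, e.g. -Xms and -Xmx to set the initial and maximum Java heap size, or force a 32-bit or 64-bit memory model. It also lets you add extra charsets to the classpath. For example, the following command starts Carpenter with a 512 MB initial heap and a 2 GB maximum heap:

javaw -Xms512m -Xmx2g -cp carpenter.jar com.s3man.s3man.Carpenter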

Configuration is stored in an s3man.properties text file. If this file is missing on first execution, it is created in the directory the JAR file is launched from.

You can download the .ico file here if you want to set an icon for Carpenter.

Help details

File menu and toolbar

Option Open: opens a Parquet file. The file structure is displayed in the left table and the file content in the right table.
Option Save: saves a new Parquet file based on the displayed structure and content.
Option Import CSV file: opens a dialog to define the column separator, whether the first row of the imported file contains headers, and the codepage. You can then select a file from disk and open it in the table.
Option Export file: opens a dialog to export the loaded file to a CSV, HTML, XML, or JSON file, or to the clipboard. You can choose whether to include headers, limit the export to the rows selected in the "Content" table, and select the subset of columns to export. At the bottom of the dialog, you can also switch from UTF-8 Unicode to a codepage of your choice.
Button "Info": on the toolbar, the "Info" button displays the proposed schema structure for the current file.
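The import and export options above correspond to ordinary CSV handling. The Python sketch below is a hypothetical illustration, not Carpenter's code: `import_csv` and `export_csv` are made-up names, the "column1, column2, ..." names generated when there is no header row are an assumption, and only the standard `csv` module is used to show how the separator, header flag, codepage, row selection, and column subset interact.

```python
import csv
import io


def import_csv(data: bytes, separator: str = ";", has_header: bool = True,
               codepage: str = "utf-8"):
    """Hypothetical helper mirroring the Import CSV dialog options."""
    # The codepage decides how raw bytes become text.
    text = data.decode(codepage)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=separator))
    if has_header and rows:
        headers, rows = rows[0], rows[1:]
    else:
        # No header row: generate placeholder names (an assumption).
        width = max((len(r) for r in rows), default=0)
        headers = [f"column{i + 1}" for i in range(width)]
    return headers, rows


def export_csv(headers, rows, include_headers=True, selected_rows=None,
               columns=None, separator=";", codepage="utf-8") -> bytes:
    """Hypothetical counterpart of the Export dialog options."""
    # Column subset: keep requested columns, in their original order.
    keep = [i for i, h in enumerate(headers) if columns is None or h in columns]
    # Row selection: None means "export every row".
    chosen = rows if selected_rows is None else [rows[i] for i in selected_rows]
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    if include_headers:
        writer.writerow([headers[i] for i in keep])
    for row in chosen:
        # Short rows are padded with empty cells.
        writer.writerow([row[i] if i < len(row) else "" for i in keep])
    # The target codepage decides how the text is encoded on output.
    return out.getvalue().encode(codepage)


# Usage: a semicolon-separated file saved in the Windows-1252 codepage.
sample = "name;price\nCafé;3.50\nTea;2.00\n".encode("cp1252")
headers, rows = import_csv(sample, separator=";", has_header=True, codepage="cp1252")
print(headers)  # ['name', 'price']
print(rows)     # [['Café', '3.50'], ['Tea', '2.00']]
data = export_csv(headers, rows, selected_rows=[0], columns={"name"})
print(repr(data.decode("utf-8")))  # 'name\nCafé\n'
```

Decoding with the wrong codepage is the usual cause of garbled accented characters, which is why the codepage is chosen at both import and export time.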

Structure table

This table displays the list of columns, with a label, type, and optionally a default value for each column.
Double-click a row in this table to edit the file's schema. The edited schema affects how the column content is converted, and it is used when saving a new Parquet file.
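The three per-column settings (label, data type, default value) can be pictured as a small column model that drives conversion. The Python sketch below is illustrative only: `ColumnSpec`, `SIMPLE_TYPES`, `convert_rows`, and `schema_text` are hypothetical names, the list of simple types is an assumption, and treating the default as the value used for empty cells is this sketch's interpretation, not documented Carpenter behavior. `schema_text` renders the model in Parquet's textual "message" notation; whether the "Info" button shows exactly this form is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical simple types: converter, Parquet physical type, annotation.
SIMPLE_TYPES = {
    "string": (str, "binary", "(STRING)"),
    "int": (int, "int64", ""),
    "double": (float, "double", ""),
    "boolean": (lambda s: s.strip().lower() in ("true", "1", "yes"), "boolean", ""),
}


@dataclass
class ColumnSpec:
    """One row of the Structure table: label, data type, optional default."""
    label: str
    datatype: str = "string"
    default: Optional[str] = None

    def convert(self, raw: str):
        # Empty cells fall back to the default; no default gives None (null).
        text = raw if raw != "" else self.default
        if text is None:
            return None
        return SIMPLE_TYPES[self.datatype][0](text)


def convert_rows(schema, rows):
    """Apply each column's conversion to the matching cell of every row."""
    return [[col.convert(cell) for col, cell in zip(schema, row)] for row in rows]


def schema_text(schema, name="schema"):
    """Render the column model in Parquet's textual message notation."""
    lines = [f"message {name} {{"]
    for col in schema:
        _, physical, annotation = SIMPLE_TYPES[col.datatype]
        suffix = f" {annotation}" if annotation else ""
        lines.append(f"  optional {physical} {col.label}{suffix};")
    lines.append("}")
    return "\n".join(lines)


schema = [
    ColumnSpec("name"),
    ColumnSpec("qty", "int", default="0"),
    ColumnSpec("price", "double"),
]
print(convert_rows(schema, [["Tea", "", "2.5"], ["Café", "4", ""]]))
# [['Tea', 0, 2.5], ['Café', 4, None]]
print(schema_text(schema))
```

Changing a column's data type in this model changes both the converted values and the declared Parquet type, which mirrors how an edited schema is described as affecting conversion and the saved file.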

Content table

This table displays the content of the loaded file. Its right-click context menu lets you search the table, view the distinct values of a column, and access other functions.

The bottom display

A log of messages and activities is displayed below the tables. Underneath it are the row and column counts for the current file, the number of selected rows, and the source of the most recently loaded file.

Releases