
Carpenter is a simple tool to manage Apache Parquet files.
This 12-minute video explains the specifics of Apache Parquet and how to use Carpenter to manage such files:
Video: Carpenter in 12 minutes
With Carpenter, you can open, visualize, and save Parquet files.
You can also import CSV files to create new Parquet files, and export existing Parquet files as CSV files.
With full Charsep integration, you can also instantly open the loaded file in Charsep for further processing and transformations.
Before creating and saving a Parquet file, you can define the model metadata.
For each column, you can set the column label, the column datatype, and a default value.
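As a rough illustration of what this model metadata amounts to, the sketch below keeps one label, datatype, and optional default value per column, and renders the columns in the textual "message type" notation that Parquet schemas are commonly printed in. The class names (ColumnDef, SchemaSketch) and the type names (STRING, INT32, INT64, DOUBLE, BOOLEAN) are hypothetical choices for this example, not Carpenter's internal code, and every column is assumed to be optional so that empty cells can be stored as nulls. A Parquet schema has no notion of a default value, so in this sketch the default stays in ColumnDef only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one column's metadata: a label, a simple datatype name,
// and an optional default value (null when no default is set).
final class ColumnDef {
    final String label;
    final String type;
    final String defaultValue;

    ColumnDef(String label, String type, String defaultValue) {
        this.label = label;
        this.type = type;
        this.defaultValue = defaultValue;
    }
}

class SchemaSketch {
    // Maps the assumed simple type names onto Parquet primitive types.
    // Every field is "optional" so that empty cells can be written as nulls.
    static String toParquetField(ColumnDef col) {
        switch (col.type) {
            case "INT32":   return "optional int32 " + col.label + ";";
            case "INT64":   return "optional int64 " + col.label + ";";
            case "DOUBLE":  return "optional double " + col.label + ";";
            case "BOOLEAN": return "optional boolean " + col.label + ";";
            // Anything else is treated as text: binary with the STRING annotation.
            default:        return "optional binary " + col.label + " (STRING);";
        }
    }

    // Renders all columns in Parquet's textual "message type" notation.
    // The default value is not rendered: Parquet schemas cannot express it.
    static String toMessageType(String name, List<ColumnDef> cols) {
        StringBuilder sb = new StringBuilder("message " + name + " {\n");
        for (ColumnDef col : cols) {
            sb.append("  ").append(toParquetField(col)).append('\n');
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<ColumnDef> cols = new ArrayList<>();
        cols.add(new ColumnDef("city", "STRING", null));
        cols.add(new ColumnDef("population", "INT64", "0"));
        System.out.println(toMessageType("schema", cols));
    }
}
```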
In conjunction with Charsep and S3Man, Carpenter completes your toolbox for CSV, Parquet, and Amazon S3-based file management.
Known limitations:
For now, to ensure compatibility with CSV files and keep the structure simple, only simple datatypes are processed; complex (nested) datatypes are not supported.
Download and Install
A wrapped-exe version is available for standard Windows users by clicking here. It is a wrapped executable, meaning that it searches your PC for the most suitable Java Virtual Machine to run the program.
This version is the simplest way to run Carpenter; however, it has some limitations:
- Windows only (any version of Windows with a Java Virtual Machine 1.8 or later)
- It may be wrongly detected as a virus by some antivirus software, since the executable is not signed
- Memory settings are predefined: 100 MB minimum, 300 MB maximum (this should allow correct usage on common PCs)
If you want to run it on Linux or macOS, prefer not to download an exe file, or want to apply specific memory settings, use the jar file instead (see below).
JAR file download
Carpenter is also available as an executable jar file: click here and save the file to your local disk (less than 800 KB). It is freeware: you can download and use it on any computer without admin or install privileges. No installation is required; just copy the jar file and run it. Since an executable jar uses your system's default memory settings, you can also run it with the following command once downloaded (paths are omitted so you can adapt them to your system):
javaw -cp carpenter.jar com.s3man.s3man.Carpenter
This lets you set specific memory parameters, e.g. -Xms and -Xmx to set the initial and maximum Java heap size, or force a 32-bit or 64-bit data model. It also lets you add extra charsets to the classpath.
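To check which heap limits the JVM actually applied after passing -Xms and -Xmx, a tiny standalone class can print what java.lang.Runtime reports. HeapCheck is a hypothetical helper written for illustration; it is not shipped with Carpenter.

```java
// Hypothetical standalone helper (not part of Carpenter) that prints the heap
// limits the running JVM is using, e.g. to confirm -Xms / -Xmx were applied.
class HeapCheck {
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (close to the -Xmx value).
        System.out.println("max heap   : " + toMegabytes(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved (starts around the -Xms value).
        System.out.println("total heap : " + toMegabytes(rt.totalMemory()) + " MB");
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free heap  : " + toMegabytes(rt.freeMemory()) + " MB");
    }
}
```

For example, running java -Xms100m -Xmx300m HeapCheck should report a maximum heap of roughly 300 MB (some JVMs report slightly less than the -Xmx value). The same options go before -cp when launching Carpenter, e.g. javaw -Xms100m -Xmx300m -cp carpenter.jar com.s3man.s3man.Carpenter.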
Configuration is stored in an s3man.properties text file, which is created in the directory the jar file is launched from if it is missing on first execution.
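The "create the file if it is missing" behaviour can be sketched with java.util.Properties. This sketch assumes the file is resolved against the current working directory (the user.dir system property), which is one reading of "where the jar file is launched from". ConfigSketch and loadOrCreate are hypothetical names, and no property keys are shown because the page does not document them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Sketch of a load-or-create configuration file (hypothetical, not Carpenter's code).
class ConfigSketch {
    static Properties loadOrCreate(Path file) throws IOException {
        Properties props = new Properties();
        if (Files.exists(file)) {
            // Later runs: read the existing key=value entries.
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        } else {
            // First run: write an (empty) properties file that can be edited later.
            try (OutputStream out = Files.newOutputStream(file)) {
                props.store(out, "created on first execution");
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: "where the jar is launched from" = current working directory.
        Path file = Paths.get(System.getProperty("user.dir"), "s3man.properties");
        Properties props = loadOrCreate(file);
        System.out.println(file + " holds " + props.size() + " entries");
    }
}
```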
You can get the .ico file for setting an icon here.
Help details
File menu and toolbar
Option Open: opens a Parquet file. The file structure is displayed in the left table, and the file content in the right table.
Option Save: saves a new Parquet file based on the displayed structure and content.
Option Import CSV file: opens a dialog to define the column separator, whether the first row of the imported file contains headers, and the codepage. You can then select a file from disk and open it in the table.
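The three import settings map naturally onto a reader: the separator decides how each line is split, the header flag decides whether the first line supplies column labels, and the codepage becomes the java.nio.charset.Charset used to decode the file. CsvImportSketch below is a hypothetical, deliberately naive reader: it splits on the separator and does not handle quoted fields containing separators or line breaks, which a real CSV import must.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Naive sketch of the CSV import settings: separator, header row, codepage.
// Limitation: quoted fields containing the separator or line breaks are NOT handled.
class CsvImportSketch {
    final List<String> headers = new ArrayList<>();    // stays empty without a header row
    final List<List<String>> rows = new ArrayList<>(); // one list of cells per data line

    static CsvImportSketch read(BufferedReader reader, char separator, boolean hasHeaders)
            throws IOException {
        CsvImportSketch result = new CsvImportSketch();
        // Quote the separator so characters such as '|' are not treated as regex syntax.
        Pattern splitter = Pattern.compile(Pattern.quote(String.valueOf(separator)));
        boolean firstLine = true;
        String line;
        while ((line = reader.readLine()) != null) {
            // Limit -1 keeps trailing empty cells, e.g. "pear;" -> ["pear", ""].
            List<String> cells = Arrays.asList(splitter.split(line, -1));
            if (firstLine && hasHeaders) {
                result.headers.addAll(cells);
            } else {
                result.rows.add(cells);
            }
            firstLine = false;
        }
        return result;
    }

    // The selected codepage becomes the Charset used to decode the file's bytes.
    static CsvImportSketch readFile(Path file, char separator, boolean hasHeaders,
                                    Charset codepage) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, codepage)) {
            return read(reader, separator, hasHeaders);
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "name;qty\napple;3\npear;\n";
        CsvImportSketch csv = read(new BufferedReader(new StringReader(sample)), ';', true);
        System.out.println(csv.headers + " " + csv.rows);
    }
}
```

A file saved in a Windows codepage could then be read with readFile(path, ';', true, Charset.forName("windows-1252")).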
Option Export file: opens a dialog to export the loaded file to a CSV file, an HTML file, the clipboard, or an XML or JSON file. You can choose whether to include headers, limit the export to the rows selected in the "Content" table, and select the subset of columns to export. At the bottom of the dialog, you can also switch from UTF-8 to a codepage of your choice.
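The CSV export options can be sketched in the same spirit: an includeHeaders flag, a list of selected row indices, a list of column indices to keep, and a Charset parameter (StandardCharsets.UTF_8 or another codepage). CsvExportSketch is a hypothetical name, and values are written without CSV quoting, so this only outlines how the options combine; it is not Carpenter's exporter.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Sketch of CSV export options: headers on/off, selected rows only, column subset,
// output encoding. Simplification: no quoting or escaping of cell values.
class CsvExportSketch {
    static void write(Writer out, List<String> headers, List<List<String>> rows,
                      List<Integer> selectedRows, List<Integer> columns,
                      char separator, boolean includeHeaders) throws IOException {
        if (includeHeaders) {
            writeLine(out, headers, columns, separator);
        }
        for (int r : selectedRows) {
            writeLine(out, rows.get(r), columns, separator);
        }
    }

    // Writes only the chosen columns of one row, in the order given by `columns`.
    private static void writeLine(Writer out, List<String> cells, List<Integer> columns,
                                  char separator) throws IOException {
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                out.write(separator);
            }
            out.write(cells.get(columns.get(i)));
        }
        out.write('\n');
    }

    // The Charset encodes the output: UTF-8 or whichever codepage the user selected.
    static void writeFile(Path file, Charset charset, List<String> headers,
                          List<List<String>> rows, List<Integer> selectedRows,
                          List<Integer> columns, char separator, boolean includeHeaders)
            throws IOException {
        try (Writer out = Files.newBufferedWriter(file, charset)) {
            write(out, headers, rows, selectedRows, columns, separator, includeHeaders);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> headers = Arrays.asList("name", "qty", "price");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("apple", "3", "1.20"),
                Arrays.asList("pear", "5", "0.80"),
                Arrays.asList("plum", "7", "2.10"));
        StringWriter out = new StringWriter();
        // Rows 0 and 2 are "selected"; only the name and price columns are exported.
        write(out, headers, rows, Arrays.asList(0, 2), Arrays.asList(0, 2), ';', true);
        System.out.print(out);
    }
}
```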
Button "Info": on the toolbar, the "Info" option displays the proposed schema structure for the current file.
Structure table
This table displays the list of columns, with a label, type, and optionally a default value for each column.
By double-clicking a row in this table, you can edit the schema of the file. The edited schema determines how the column content is converted, and is used when saving a new Parquet file.
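One way to picture how the edited schema influences conversion: each raw text cell is parsed according to its column's datatype, and an empty cell first takes the column's default value (the v2024.01.20 release notes describe applying defaults to empty cells). ConvertSketch, the type names, and the choice to let invalid numbers throw NumberFormatException are assumptions made for this example, not Carpenter's documented behaviour.

```java
// Sketch (hypothetical) of schema-driven cell conversion with default values.
class ConvertSketch {
    static Object convert(String raw, String type, String defaultValue) {
        // An empty cell takes the column's default value, if one is defined.
        String text = (raw == null || raw.isEmpty()) ? defaultValue : raw;
        if (text == null) {
            return null; // empty cell and no default: stored as a null value
        }
        switch (type) {
            // Numeric parsing trims spaces; invalid input throws NumberFormatException.
            case "INT32":   return Integer.valueOf(text.trim());
            case "INT64":   return Long.valueOf(text.trim());
            case "DOUBLE":  return Double.valueOf(text.trim());
            // Boolean.valueOf: "true" (any case) -> true, anything else -> false.
            case "BOOLEAN": return Boolean.valueOf(text.trim());
            default:        return text; // STRING: kept exactly as read
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("42", "INT32", null)); // prints 42 (an Integer)
        System.out.println(convert("", "INT64", "0"));    // empty cell, default "0": prints 0
        System.out.println(convert("", "STRING", null));  // no default: prints null
    }
}
```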
Content table
This table displays the content of the loaded file. The right-click context menu lets you search the table, view the distinct values of a column, and access other functions.
The bottom display
A log of messages and activities is displayed below the tables. Underneath it are the row and column counts for the current file (and the number of selected rows), and the source of the most recently loaded file.
Releases
- Added a new column details option to the context menu
- Updated the integrated Charsep release
- v2024.01.25:
- Integration with Charsep: function to open the Parquet file in a Charsep grid for rich profiling and maintenance of information. Either the full Parquet file or only the selected rows can be opened
- New Edit menu with basic structure-change functions: move columns left and right, delete a column, delete selected rows
- On grids, clicking the same header again changes the sort order; Ctrl-click still sorts in descending order
- Fixed defects in datatype handling in context menu functions
- v2024.01.20:
- New function to convert files directly on disk, from CSV to Parquet and from Parquet to CSV
- CSV import can be processed direct-to-file to prevent memory issues with large files
- "Wizard" to automatically analyze an imported CSV file and infer its data types
- When a default value is defined for a column, it is applied to all empty cells in the column
- Fixed header label handling to comply with the Parquet format
- Columns can no longer be dragged manually on the grid
- The column list can no longer be sorted directly
- Processing time is shown on the log console when saving a Parquet file or exporting a CSV file
- v2024.01.01:
- This is the initially published version of the tool, with basic features for loading and saving a Parquet file, displaying its structure and content, and importing/exporting a CSV file.