Sunday, December 10, 2023

Kinesis Data Streaming , AWS GLUE

 


https://www.youtube.com/watch?v=We5Jr4GGLL0
















https://www.youtube.com/watch?v=dQnRP6X8QAU



Consuming Data from AWS Kinesis Data Firehose in Apache Flink

If you have data coming into a Kinesis Data Firehose delivery stream and you want to process this data using Apache Flink, you could set up Firehose to deliver the data to an S3 bucket. Flink can then consume the data from S3 using the appropriate file input connector.

Here is a high-level approach to this scenario:

  1. Configure your Kinesis Data Firehose delivery stream to deliver data to an Amazon S3 bucket.
  2. Use Apache Flink's FileInputFormat or any available S3 connectors to read the data files from the S3 bucket.

Here's an example of how you might set up a Flink job to read data from files stored in S3:

java
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
import org.apache.flink.streaming.api.functions.source.FileReadFunction;
import org.apache.flink.streaming.api.functions.source.FileSource;
import org.apache.flink.streaming.api.functions.source.TimestampedFileInputSplit;

// Define the path to the S3 bucket
String inputPath = "s3://your-s3-bucket/path-to-files/";

// Create a FileSource builder and set the FileInputFormat
FileSource.TimestampedFileInputSplitBuilder<String> builder = FileSource.forRecordStreamFormat(new SimpleStringFormat(), new Path(inputPath))
    .monitorContinuously(Duration.ofMinutes(1)); // Check for new files every minute

// Create the actual source function
DataStreamSource<String> source = env.fromSource(
    builder.build(),




As of my last update in September 2021, the Kinesis Producer Library (KPL) was designed primarily for Java-based applications and there wasn't a direct equivalent provided by AWS for Swift or Node.js. The KPL offers features that help facilitate efficient and reliable streaming of data into a Kinesis Data Stream, such as batching, retries, and aggregation of records.

For languages other than Java, developers usually use the AWS SDKs available for their language of choice to interact with Kinesis Data Streams. Here are the alternatives for Swift and Node.js:

For Node.js:

AWS provides the AWS SDK for JavaScript, which can be used in Node.js applications to interact with Kinesis. You can send data to Kinesis Data Streams using the putRecord or putRecords API calls. Here's a simple example using putRecord in Node.js: